Time span: 2006-2009
STEVIN (Spraak- en Taaltechnologische Essentiële Voorzieningen In het Nederlands)
F. Van Eynde, I. Schuurman, V. Vandeghinste
Other participant: R.U.Groningen (coordinator)
A large corpus of written Dutch texts (1,000,000 words) is syntactically annotated and manually corrected, following the CGN/D-COI annotation guidelines. In addition, the full D-COI corpus (499,000,000 words) is syntactically annotated automatically. For the manually corrected corpus, part-of-speech (PoS) tagging and lemmatization are corrected as well.
The project aims to extend the available syntactically annotated corpora for Dutch, both in size and in their coverage of text genres and topical domains. In addition, various browse and search tools for syntactically annotated corpora will be further developed and made available. Their potential for applications in corpus linguistics will be tested and evaluated.
CCL is responsible for the correction of PoS and lemmatization and for part of the manually corrected syntactic annotation.
More information about Lassy can be found here.