Centre for Computational Linguistics: Projects
Time Span: 1995 - 1996
Flemish Government (Speech and Language Technology for Dutch)
G. Adriaens, B. Tersago, W. Peters, I. Schuurman & F. Van Eynde
The ANNO Project ("Een Geannoteerde Publieke Gegevensbank voor het Geschreven Nederlands/An Annotated Public Database for Written Dutch") is sponsored by the Flemish Research Initiative in Speech and Language Technology. It aims to lay the foundations for the compilation and linguistic annotation of a multi-functional Flemish text corpus.
The textual material of the ANNO corpus is derived from BRTN (Belgian Radio and Television) radio news broadcasts and the current affairs programme ACTUEEL. It comprises both text that was written to be spoken and transcribed interviews.
The texts are annotated with part-of-speech tags by the WOTAN tagger, which was developed by the TOSCA group at the Katholieke Universiteit Nijmegen.
Phonological information is either retrieved from the CELEX database or added by the grapheme-to-phoneme converter developed at the Centre for Dutch Language and Speech at the Universitaire Instelling Antwerpen.
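The two sources of phonological information can be pictured as a lexicon lookup with a rule-based fallback. The sketch below is only an illustration of that idea, not the ANNO implementation: the function name `phonological_form`, the dictionary standing in for a lexical database such as CELEX, and the `toy_g2p` placeholder converter are all hypothetical, and the transcription strings are dummy values rather than real phonemic forms.

```python
from typing import Callable, Dict, Tuple


def phonological_form(
    word: str,
    lexicon: Dict[str, str],
    g2p: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (transcription, source) for a word.

    The lexicon is consulted first; the grapheme-to-phoneme
    converter is used only when the word is not listed.
    """
    key = word.lower()
    if key in lexicon:
        return lexicon[key], "lexicon"
    return g2p(key), "g2p"


# Hypothetical stand-ins: dummy transcriptions, not real phonemic data.
toy_lexicon = {"nieuws": "LEX:nieuws"}


def toy_g2p(word: str) -> str:
    # Placeholder for a real rule-based converter.
    return "G2P:" + word


listed = phonological_form("Nieuws", toy_lexicon, toy_g2p)
unlisted = phonological_form("actueel", toy_lexicon, toy_g2p)
```

Recording the source alongside the transcription makes it possible to tell later which forms came from the lexicon and which were generated automatically.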
Part of the corpus has been annotated with discourse information, and part has been annotated by the KEPER system, which adds morphological information. Syntactic information is provided by the METAL system, likewise for only part of the corpus. Text typology and text structure will be formalised in an SGML document type definition.
A small demo based on a single file is available. It shows the morphosyntactic, morphological, grammatical and discourse annotation. The demo is in Dutch.
In the course of the project we compiled a description and typology of language corpora and related concepts, together with a list of all corpora available for Dutch. The description, typology and list are in Dutch and can be found here. We would like the list to be exhaustive and therefore welcome additions, suggestions and remarks. Please send them to the ANNO-team.
Schuurman, I. (1997): ANNO: a multi-functional Flemish text corpus, in: J. Landsbergen et al. (eds.), CLIN VII. Papers from the Seventh CLIN meeting. IPO, Technische Universiteit Eindhoven, pp. 161-176
Layout:
webmaster@ccl.kuleuven.ac.be
Information Provider: Centrum voor Computerlinguïstiek
Comments to the Webmaster:
Ineke.Schuurman@ccl.kuleuven.ac.be
(C) Copyright 1996, CCL.
All Rights Reserved.