Centre for Computational Linguistics: Projects
ATraNoS: Automatic Transcription and Normalisation of Speech
Time Span: 2000 - 2004
Institute for Innovation in Science and Technology (IWT), STWW program
F. Van Eynde, V. Vandeghinste, P. Dirix
Other participants:
ESAT/PSI (K.U.Leuven), ELIS (R.U.Gent), CNTS (U.Antwerpen)
The CCL has two tasks in this project. The first task concerns the
reduction of the number of Out-of-Vocabulary words by building a large
scale lexicon for the speech recognizer that takes the existence of
compound words in Dutch into account.
The second task concerns the 'translation' of the transcribed speech input
into 'subtitle Dutch'.
1. Reduction of Out-of-Vocabulary words.
One of the main causes of speech recognition errors is the occurrence of
Out-of-Vocabulary words. In Dutch, word compounding is a productive process,
so it is impossible to list all the Dutch words that should be
in the recognizer's lexicon.
A lexicon was built that consists of a list of non-compound words, a list of
compound parts (called the Quasi-word List), and a Perl module that allows
for online word compounding. The Perl module uses a ruleset
describing the compounding rules for Dutch. As the rules tend to
overgenerate, a statistical measure estimating the likelihood of a compound
is introduced to improve compounding accuracy.
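The interplay between the word list, the Quasi-word List and the statistical filter can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the project's module: the toy frequencies, the single "modifier + head" rule, the unigram log-probability score and the threshold value are all assumptions made for illustration.

```python
import math

# Hypothetical toy data; the real lexicon and Quasi-word List are not shown on this page.
WORDS = {"boek": 120, "handel": 80, "stad": 150, "huis": 200, "deur": 90}
QUASI_WORDS = {"stads": 40, "boeken": 60}  # compound parts that only occur inside compounds
TOTAL = sum(WORDS.values()) + sum(QUASI_WORDS.values())


def candidate_splits(word):
    """Return every two-part split allowed by one assumed rule:
    modifier (word or quasi-word) followed by a head (word)."""
    splits = []
    for i in range(2, len(word) - 1):
        left, right = word[:i], word[i:]
        if right in WORDS and (left in WORDS or left in QUASI_WORDS):
            splits.append((left, right))
    return splits


def log_score(parts):
    """Assumed likelihood measure: sum of unigram log-probabilities of the parts."""
    freqs = {**WORDS, **QUASI_WORDS}
    return sum(math.log(freqs[p] / TOTAL) for p in parts)


def best_split(word, threshold=-8.0):
    """Return the highest-scoring split, or None if no split reaches the threshold."""
    scored = [(log_score(s), s) for s in candidate_splits(word)]
    scored = [item for item in scored if item[0] >= threshold]
    return max(scored)[1] if scored else None


print(best_split("stadshuis"))
print(best_split("boekhandel"))
```

Raising the threshold rejects more candidate compounds, which is one simple way to trade coverage against overgeneration.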
2. Translation of transcribed speech into Subtitle Dutch
Subtitles are not a verbatim transcription of what is said in a TV program. Often
some parts of what is said are left out, for instance because
the speaking rate is too high to fit all the words on the
screen and still comply with the 6-second rule. A system is therefore designed
that parses the incoming sentence and decides which parts of the input
sentence should be retained in the output sentence.
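The idea of retaining only part of a parsed sentence can be sketched as follows. This is an illustration under stated assumptions, not the project's system: the constituents and their priorities are given by hand instead of coming from a parser, and the character budget `max_chars` is a hypothetical stand-in for whatever limit the display time imposes.

```python
def compress(constituents, max_chars):
    """Drop optional constituents until the sentence fits the budget.

    constituents: list of (text, priority) pairs in sentence order.
    Priority 0 marks an obligatory part; higher numbers are dropped first.
    Both the priorities and max_chars are assumptions of this sketch.
    """
    kept = list(constituents)

    def text_of(items):
        return " ".join(text for text, _ in items)

    while len(text_of(kept)) > max_chars:
        optional = [c for c in kept if c[1] > 0]
        if not optional:
            break  # only obligatory parts remain; nothing more can be removed
        kept.remove(max(optional, key=lambda c: c[1]))
    return text_of(kept)


sentence = [
    ("De minister", 0),
    ("heeft", 0),
    ("gisteren", 1),
    ("eigenlijk", 2),
    ("het voorstel", 0),
    ("ingetrokken", 0),
]
print(compress(sentence, 45))
```

With a budget of 45 characters the two adverbs are removed and the obligatory core of the sentence is kept.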
More information can be found on the ATraNoS website.
Layout:
webmaster@ccl.kuleuven.ac.be
Information Provider: Centrum voor Computerlinguïstiek
Comments to the Webmaster:
Ineke.Schuurman@ccl.kuleuven.ac.be