The Third CLIF Symposium on Language and Speech Technology

February 5, 2008

Justus-Lipsiuszaal

Erasmushuis 08.16

Blijde-Inkomststraat 21





Program

09.20 Opening
09.30 Automatic semantic frame detection using semi-supervised methods (Koen Deschacht, LIIR, K.U.Leuven)
10.00 Sub-sentential alignment (Lieve Macken, LT3, Hogeschool Gent)
10.30 Coffee break
11.00 Specification of discourse representations (Markus Egg, Rijksuniversiteit Groningen)
11.50 Authorship attribution on a large set of authors (Kim Luyckx, CNTS, Universiteit Antwerpen)
12.20 Enhancing corpus linguistics methods for tree-structured data (Scott Martens, CCL, K.U.Leuven)
12.50 Lunch break
14.10 Computational approaches to non-verbal communication (Emiel Krahmer, Tilburg University)
15.00 Dealing with cross-lingual aspects in spoken name recognition (Frederik Stouten, ELIS, Universiteit Gent)
15.30 Coffee break
16.00 The application of multigrams to the problem of language acquisition (Joris Driesen, ESAT, K.U.Leuven)
16.30 Split time warping of speech for noise robust and speaker independent automatic dialogue replacement (ADR) (Pieter Soens, ETRO, V.U.Brussel)

   


Registration

There is no registration fee, but we would like to know in advance whether you intend to participate. If you do, write to scott.martens@ccl.kuleuven.be and specify whether or not you will take part in the coffee breaks and whether or not we should reserve lunch for you. The lunch will take place in Alma-1. We would like to have your answer by January 29.

  

Abstracts

GUEST LECTURES

Specification of discourse representations

Markus Egg
Rijksuniversiteit Groningen

The sheer number of ways of arranging n atomic discourse units (ADUs) into a specific discourse configuration (maximally, the Catalan number of n-1) seems overwhelming as soon as one tries to analyse real-life discourses. While underspecified approaches to the description of discourse allow one to handle such large sets of discourse configurations in terms of heavily underspecified representations, such representations are by themselves only a first step in discourse processing, because they are far too vague to be of use in potential applications of discourse analysis, e.g., automatic summarisation or machine translation.

Two more factors worsen this problem. First, there has been a shift from syntax-based segmentation of discourse to a semantics-based one, which assigns ADU status to specific subsentential phrases as well (e.g., PPs indicating a purpose). This considerably increases the number of ADUs, which in turn leads to a higher number of possible configurations of discourse. Second, the derivation of a suitable discourse structure goes beyond arranging ADUs into a discourse configuration; it also involves the selection of discourse relations between discourse segments. Since only a small fraction of discourse relations is signalled lexically (e.g., by conjunctions), there is often only partial information on discourse relations available. This calls for an integrated underspecified treatment of discourse configuration and discourse relations, but this makes discourse representations even more underspecified, in that it multiplies the number of possible structures that an underspecified representation assigns to a given discourse.

This presentation is devoted to possible strategies to reduce this number. I will show how syntactic information can be used for this aim and give some preliminary evidence for a second kind of information that is relevant here, viz., interdependencies between the relations and the configurations within discourse structures. I will show the integration of this information into underspecified representations of discourse and give some estimations of the impact of this integration in terms of reduced numbers of potential structures for a given discourse.
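
To give a feel for the growth mentioned in this abstract, the bound "the Catalan number of n-1" can be computed directly. The following is only an illustrative sketch: it assumes that a discourse configuration is a binary tree over n ADUs in a fixed order, and the function names are hypothetical, not taken from the talk.

```python
from math import comb


def catalan(k: int) -> int:
    """Return the k-th Catalan number, C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def max_configurations(n_adus: int) -> int:
    """Upper bound on binary configurations over n_adus ordered ADUs.

    Assumption: a configuration is a binary tree whose leaves are the
    ADUs in their textual order, which gives C_(n-1) possible trees.
    """
    if n_adus < 1:
        raise ValueError("need at least one ADU")
    return catalan(n_adus - 1)


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, max_configurations(n))
```

Under this assumption, ten ADUs already admit 4862 configurations, before any choice of discourse relations is taken into account.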

Computational approaches to non-verbal communication

Emiel Krahmer
Tilburg University

There is currently a growing interest in building so-called Virtual Humans (also referred to as Embodied Conversational Agents): lifelike computer characters which can communicate with humans through spoken language and which can support verbal information with appropriate non-verbal cues of face and body. Even though non-verbal communication has been studied extensively in the past, using these insights for the development of virtual humans is difficult, since most of the previous insights are too impressionistic and imprecise to lend themselves to a straightforward computational treatment. In this talk, I will first give a general overview of current work on the development of virtual humans, and then zoom in on recent experimental work aimed at getting a more precise understanding of non-verbal behavior in humans. Finally, I will discuss how techniques from computational linguistics can be used to implement these insights.


PRESENTATIONS OF ONGOING PHD WORK

Automatic semantic frame detection using semi-supervised methods
Koen Deschacht
supervisor: Marie-Francine Moens
LIIR, K.U.Leuven

We have created novel semi-supervised methods for the detection of semantic frames and recognition of corresponding semantic roles in English sentences. We concentrate on semantic frames that describe typical actions of characters in a video transcript, with the defining characteristic that the actions, characters and other circumstances have to be visible in the described video. We have manually annotated a training set of video transcripts. Our models use approximate inference techniques that integrate probabilistic topic models with information about the syntactic structure of the sentence. Because of the low performance of these models when learning in a completely unsupervised way, we turned to semi-supervised techniques. We implemented two different Markov Chain Monte Carlo sampling methods, i.e., a Gibbs and a Metropolis-Hastings sampler, which both sample from unlabeled data. We present the theoretical background, characteristics and use of our methods in the concrete setting of the analysis of video transcripts of the television series "Buffy the Vampire Slayer". We conclude by discussing the performance of the unsupervised and semi-supervised models compared with a completely supervised method trained with a maximum entropy classifier, and discuss the value and limitations of the models.

Sub-sentential alignment
Lieve Macken
supervisor: Walter Daelemans
LT3, Hogeschool Gent

In this talk, we describe a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Sub-sentential alignments are used, among other things, to create phrase tables for statistical phrase-based machine translation (SMT) systems. However, a stand-alone sub-sentential alignment module is also useful for human translators if incorporated in CAT tools, e.g. sophisticated bilingual concordance systems, or in sub-sentential translation memory systems. We conceive our sub-sentential aligner as an iterative process consisting of two cycles. The first cycle focusses on anchor chunks, i.e. chunks that can be linked with a very high precision based on lexical correspondences and syntactic similarity. In the second cycle, we focus on the more complex translational correspondences based on observed translation shift patterns. The anchor chunks of the first cycle will be used to limit the search space in the second cycle.

Authorship attribution on a large set of authors
Kim Luyckx
supervisor: Walter Daelemans
CNTS, Universiteit Antwerpen

Much of the research in authorship attribution is performed on a closed-class task, which is an artificial situation. Hardly any corpora - except for some based on blogs - have more than ten candidate authors. The Personae corpus consists of 145 essays written by BA-level students on a single topic. This study systematically investigates the effectiveness of lexical and syntactic features that have proven useful in the field of stylometry. Syntactic features like part-of-speech n-grams are generally accepted as not being under the author's conscious control and therefore provide good clues for predicting gender or authorship. The corpus allows the computation of the degree of variability encountered in text on a single topic for different (types of) features when taking into account a relatively large set of authors. This will be a useful complementary resource in a field dominated by studies potentially overestimating the importance of these features in experiments discriminating between only two or a small number of authors.

Enhancing corpus linguistics methods for tree-structured data
Scott Martens
supervisor: Frank Van Eynde
CCL, K.U.Leuven

There are a variety of well-developed methods in computer science for indexing and analyzing sequences, and corpus linguistics has over the years assimilated these methods into its practices. However, the structures deployed by linguistic theories are generally non-linear. At practically all levels and in all types of linguistic theories, non-sequential representations dominate the field - from typed feature structures to syntactic dependency trees. Linguistic theories have, since long before the computer era, entailed claims rooted in these non-sequential representations. Many linguistic theories involve structures that can be treated as directed acyclic graphs: tree structures. Algorithms for analyzing such structures are relatively new to computer science and are not yet well integrated into corpus linguistics.
This presentation will offer a brief overview of one recent class of tree-structured data mining algorithms with linguistic applications, as well as an effort to supplement classical information-theory-based techniques for analyzing sequential data with techniques derived from algorithmic information theory and minimum description length methods that are better suited to more complex non-sequential data structures. There will also be some discussion of the kinds of empirical linguistic problems these approaches can hope to address.

Dealing with cross-lingual aspects in spoken name recognition
Frederik Stouten
supervisor: Jean-Pierre Martens
ELIS, Universiteit Gent

The development of an automatic speech recognizer (ASR) that can accurately recognize spoken names belonging to a large lexicon is still a big challenge. One of the bottlenecks is that many names contain elements of foreign-language origin, and native speakers can adopt very different pronunciations of these elements, ranging from completely nativized to completely foreignized pronunciations.
In this presentation, a recently proposed method for improving the recognition of foreign proper names spoken by native speakers is discussed. The main idea is to combine the standard acoustic model scores with scores emerging from a phonologically inspired back-off model that was trained on native speech only. This method does not require the development of any foreign phoneme models on foreign speech data. By applying the method to a baseline Dutch recognizer (comprising Dutch acoustic models), the name error rate for French and English names could be reduced by a considerable amount.

The application of multigrams to the problem of language acquisition
Joris Driesen
supervisor: Hugo Van hamme
ESAT, K.U.Leuven

It is remarkable that infants are able to automatically learn the acoustic, lexical and grammatical patterns of a language. Human learners appear to isolate units out of large and apparently unsegmented streams, using the statistical structure embedded within these streams. Moreover, they can do this significantly better than current automatic speech recognition (ASR) systems, even though ASR systems make use of expert knowledge from audiology, phonology and linguistics. The automatic discovery of the structure of speech - a task in which billions of infants have succeeded - is not attempted by these systems. In an attempt to build a system that automatically discovers the structure in speech, we address the problem of finding lexical items in a stream of symbolically transcribed input (i.e. phone symbols). In this report, we extend the multigram algorithm that was first proposed by Deligne and Bimbot to account for ambiguity in the automatically generated symbol stream. To this end, an acoustic model is used to extract uncertain phonetic information from the sound input in the form of a phone lattice. We show that this yields significantly better recognition results. Finally, we propose a statistical method for grounding the discovered lexical items in the environment.
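
As a rough illustration of the kind of segmentation a multigram model performs, the sketch below finds the most probable split of a phone string into variable-length units by dynamic programming (a Viterbi search). It is not the extension described in this abstract: it works on a single unambiguous symbol string rather than a phone lattice, it omits the iterative re-estimation of unit probabilities, and the unit inventory, probabilities and function name are made up for the example.

```python
import math


def viterbi_segment(symbols, unit_logprob, max_len=3):
    """Return the most probable segmentation of `symbols` into units.

    `unit_logprob` maps tuples of symbols to log-probabilities
    (hypothetical values here; a real system would estimate them).
    Units are assumed independent, so a segmentation is scored by
    the sum of the log-probabilities of its units.
    """
    n = len(symbols)
    best = [-math.inf] * (n + 1)  # best[i]: best score for symbols[:i]
    back = [0] * (n + 1)          # back[i]: start of the last unit ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            lp = unit_logprob.get(tuple(symbols[start:end]))
            if lp is None or best[start] == -math.inf:
                continue
            if best[start] + lp > best[end]:
                best[end] = best[start] + lp
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("no segmentation with the given unit inventory")
    # Follow the back-pointers from the end of the string to recover the units.
    units, end = [], n
    while end > 0:
        start = back[end]
        units.append(tuple(symbols[start:end]))
        end = start
    units.reverse()
    return units, best[n]


if __name__ == "__main__":
    # Toy inventory with invented probabilities.
    inventory = {("a",): 0.2, ("b",): 0.2, ("c",): 0.1, ("a", "b"): 0.5}
    logp = {unit: math.log(p) for unit, p in inventory.items()}
    print(viterbi_segment(list("abcab"), logp))
```

With this toy inventory, the string "abcab" is split into the units "ab", "c", "ab", because the two-phone unit is more probable than its parts taken separately.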

Split time warping of speech for noise robust and speaker independent automatic dialogue replacement (ADR)
Pieter Soens
supervisor: Werner Verhelst
ETRO, V.U.Brussel

In soundtrack post-production for film, video or television series, it is often necessary or desirable to replace the original dialogues recorded live on the set with re-recorded studio dialogues. The original location recordings are often unsuitable for use in the final soundtrack, either because they are corrupted by background noise or because the quality of the performance is unacceptable. This "dialogue replacement" is known to introduce a lot of mismatches between the words the audience perceives and the actual lip and mouth movements in the picture. To resolve this problem, synchronization systems have been developed that allow for automatically replacing the original location recordings with the studio dialogues. However, these systems lack robustness and often deliver time-scaled dialogue that is either insufficiently synchronized with the reference dialogue, of poor quality, or both. In this presentation, we propose both modifications to the basic state-of-the-art system for automatic time synchronization and techniques that improve the robustness of such a system.
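
The abstract does not spell out the synchronization algorithm, so the following is only a sketch of textbook dynamic time warping (DTW), the standard way of aligning two time series of different lengths. It is not the split time warping method of the talk. The one-dimensional feature values and the function name are assumptions for illustration; a real system would compare frames of spectral features.

```python
def dtw_path(ref, src):
    """Align `src` to `ref` with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of index pairs
    (i, j) meaning ref[i] is matched with src[j]. The local distance
    is the absolute difference of scalar features (an assumption).
    """
    n, m = len(ref), len(src)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - src[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match both frames
                                 cost[i - 1][j],      # repeat src frame
                                 cost[i][j - 1])      # repeat ref frame
    # Trace the cheapest path back from the end of both sequences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    reference = [0, 1, 2, 3]     # e.g. features of the location recording
    studio = [0, 1, 1, 2, 3]     # the same contour, spoken more slowly
    print(dtw_path(reference, studio))
```

The resulting path says which studio frames correspond to which reference frames; a time-scaling step would then stretch or compress the studio recording along that path.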