In the first part of this talk, I will introduce DoReCo, an initiative to create a multilingual reference corpus comprising at least 10,000 words per language for at least 50 languages. DoReCo draws on fieldwork-based language documentation collections, extracting narrative texts that are already transcribed, translated into a major language, and morphologically analyzed. Within DoReCo, we convert these data to a common file format and time-align them at the phoneme level using the MAUS software. In the second part, I will present two cross-linguistic studies based on a subset of this corpus. The first investigates word lengthening as a function of utterance-final position. The second, still ongoing, investigates pause probabilities before nouns versus verbs and relates the findings to the typological observation that prefixes are less common on nouns than on verbs.