Event category: SRPP
SRPP 22/05/2026 Katia Chirkova
SRPP 10/04/2026 Megan Dailey
SRPP The past and present of stop vocalization in Danish
Lenition is common in both phonological processes and sound change. In opening processes, which involve increasing aperture, labial and dorsal consonants tend to retain some labial or dorsal constriction even when they fully vocalize. What happens with coronal consonants that vocalize is less straightforward, since vowel production rarely involves the tongue tip or blade. Vocalization of coronal consonants often involves a reconfiguration, or loss, of the coronal gesture, as in approximants like [j ɹ]. Danish has historically seen extreme lenition in the unaspirated stop series /b d g/ in prosodically weak positions, a process which has had fascinating phonetic and phonological consequences, not least because /d/-vocalization has not obviously led to a reconfiguration of the coronal gesture.
In Modern Standard Danish, the reflexes of /b d g/ in prosodically strong positions are voiceless unaspirated [p t k], whereas in weak positions, the reflexes are typically semivocalic. /b/ somewhat inconsistently surfaces as [ʊ̯] or [p] in weak positions. /d/ surfaces as the ‘soft d’, a sound which is often transcribed as [ð], but is well known to be more open than a fricative. /g/ either surfaces as [ʊ̯ ɪ̯] or elides in weak positions, determined mostly by the identity of the preceding vowel.
This talk will present and discuss the outcome of stop vocalization in Modern Standard Danish, with particular focus on /d/-vocalization. Several open questions remain about the stop vocalization process, including: (1) What exactly is the acoustic and articulatory outcome of /d/-vocalization? I will present analyses of spontaneously produced corpus data and recently collected electromagnetic articulography data probing this question. (2) How does stop vocalization unfold in the long and short term? This will be discussed with reference to historical written sources, and I will outline the goals of a recently started project aimed at tracking and analyzing the relatively recent past of /d/-vocalization in a large longitudinal speech corpus. (3) Does phonetic and phonological evidence support an analysis in which [p t k] are linked to semivowels in the synchronic phonology of Danish? This question will be discussed in light of the new data.
SRPP Invariant encoding of phonetic features supports speech processing in early infancy
From birth or after only a few months, infants are already capable of discriminating subtle differences in speech sounds (Jusczyk & Derrah, 1987; Kuhl, 1983) and of overriding the acoustic variability produced by changes in talkers' voices, speaking rate, and coarticulation (Eimas & Miller, 1980; Hillenbrand, 1984; Mehler et al., 1988) to categorize speech sounds. To explore what neural representation supports these early perceptual abilities for speech, we measured the brain responses of 3-month-old infants (N=30) listening to a set of natural syllables, using a 128-channel EEG system. The syllables consisted of either a consonant followed by a vowel or vice versa, and varied along two orthogonal consonantal phonetic features (voicing and place of articulation) and two vocalic phonetic features (height and backness).
Using multivariate pattern analysis (MVPA), we extend previous results by showing that preverbal infants encoded all phonetic features, no matter how and by whom they were pronounced. More importantly, we demonstrate that these phonetic features are encoded independently of the position of the phoneme within a syllable, suggesting that, like adults, preverbal infants possess a position-invariant code for phonetic content. Next, we examine how this phonetic representation evolved across time and show that infants extracted and combined phonetic features in parallel, enabling them to identify phonemes. Finally, we also ran the same experiment with neonates (N=25) and compared the content and dynamics of the phonetic encoding. Preliminary results suggest that already at birth, the brain encodes vocalic phonetic features independently of context. Overall, this study sheds light on the neural representation behind infants' early perceptual abilities for speech and contributes to a better understanding of the encoding mechanisms that support rapid language acquisition in the first months of life.
SRPP Philhellenism and phonetics: Hubert Pernot's research on the dialects of Chios
For nearly half a century, from 1898 to 1946, Pernot worked on the dialects of the island of Chios, which passed from the Ottoman Empire to Greece in 1913. His research gave rise to three volumes entitled Études de linguistique néo-grecque, devoted to the phonetics, morphology, and lexicology of the Chios dialects. These studies draw on the cylinder recordings that Pernot made on the island as early as 1898-1899. They also belong to a particular political context, as Hubert Pernot and his mentor Jean Psichari campaigned for the attachment of Chios to the Greek state from the end of the 19th century. Our talk will therefore revisit the dual political and scientific dimension of Pernot's work, in order to recall the beginnings of experimental phonetics applied to Greek dialects, as well as the use of Pernot's research for political ends before 1913.
SRPP Beyond reaction time: Articulatory evidence of a perception-production link in speech using the Stimulus-Response Compatibility paradigm
This talk will introduce our ongoing project investigating the link between perception and production in speech. In the Stimulus-Response Compatibility (SRC) paradigm, participants are typically prompted to produce a target syllable while being presented with either congruent or incongruent distractors. Responses tend to be slower in incongruent trials (the covert imitation effect), reflecting competition between perception-driven and goal-driven motor plans. The short response-distractor time lag in the SRC task design makes it well suited to studying the engagement of the motor system by speech perception during speech planning. Our aim is to obtain finer-grained insights into the nature of the perception-production link using electromagnetic articulography (EMA).
The discussion will draw on preliminary analyses of a subset of data from ten L1 British English speakers, using /ɹa/ and /va/ as prompt and distractor syllables. Reaction time (RT) analysis based on acoustic data shows a clear covert imitation effect for /ɹa/ but not for /va/. The timing of maximal displacement of the tongue tip (TT) for /ɹa/ and the lower lip (LL) for /va/ also followed a similar pattern. Time-varying position trajectories and tangential velocity profiles, however, show evidence of TT gestural intrusion in /va/ productions in the incongruent trials (i.e., with the distractor /ɹa/). Such a between-condition difference in TT activity, despite the lack of a clear congruency effect in the RT measurements, might result from a greater degree of TT activation during speech planning due to perception of the distractor stimulus, demonstrating that motor patterns activated through observation alone might in part be executed. The implications of the behavioural results will be discussed in the light of articulatory complexity, multimodal speech perception, and cognitive sensorimotor theories.
SRPP Elements of the prosody of Beja (Cushitic, Sudan)
Beja, a Cushitic language of Sudan, is today well known and well described (Wedekind et al. 2005, Vanhove 2017, etc.). However, described as tonal by some authors (Mous 2012, Hellmuth & Pearce 2020) and as accentual by others (Hudson 1973, Mous 2022), its prosodic system remains poorly understood beyond some of its distributional properties, particularly at levels above that of the prosodic word.
In this highly exploratory study, we will propose initial elements of a descriptive analysis of the different levels of the prosodic hierarchy of Beja, from morae up to intonational phrases, via phonological phrases. Among the questions we will address in the course of this examination, beyond that of the accentual or tonal status of Beja (which will be assessed against Hyman's typology (2006, 2009, etc.)), we will discuss some parameters conditioning the surface length of vowels (the language has a phonemic opposition between short and long vowels), the role that vowel sonority might play in the correlates associated with prominence, and the location of boundary tones. At the end of the talk, the intonation associated with questions and with focused elements will be briefly discussed.
SRPP Flapping vs. tapping in the Japanese rhotic: Evidence from X-ray microbeam and EPG corpora
One highly debated question in Japanese phonetics is whether the rhotic ‘/r/’ should be classified as a flap or a tap (Vance, 1987; Okada, 1999; Arai et al., 2007) – consonants distinguished by tangential vs. direct movement trajectories of the active articulator with respect to the passive articulator (Ladefoged & Maddieson, 1996; Derrick & Gick, 2011). Although the flap is considered to be the primary allophone in some descriptive accounts (e.g., Okada, 1999), articulatory evidence for flapping has been limited (Sudo et al., 1972, based on electropalatography, EPG) or absent altogether (Maekawa, 2023, based on dynamic MRI). The methods used in these and other studies, however, did not always allow for an examination of the tongue tip movement or its contact with the palate with sufficient temporal and/or spatial resolution. These studies have also largely considered /r/ flanked by vowels, where it is strongly coarticulated (cf. Katz et al., 2018).
In this talk, I revisit the question of flapping vs. tapping by examining the fine-grained dynamics of Japanese rhotic productions in two corpora of read speech – the X-ray microbeam (XRMB) speech production database of Japanese (Hashi, 2000, 19 speakers, 144 tokens) and the EPG Cross-Language Articulatory Database (Kochetov et al., 2017, 5 Japanese speakers, over 1800 tokens). The results show that flap realizations of the Japanese rhotic are considerably more common than previously reported. This was specifically the case in utterance-initial position, where all speakers in the XRMB corpus produced at least some rhotic tokens with a preparatory raising/retraction of the tongue tip, followed by a rapid downward/fronting movement of the articulator in the proximity of the alveolar ridge. Similarly, three out of five speakers in the EPG corpus produced many word-initial rhotics with a closure advancing from the postalveolar to front alveolar regions. In both sets of data, following back vowels favoured flapping, while following front vowels, as well as the intervocalic position, favoured tapping.
Based on these results, we may conclude that the Japanese rhotic is inherently a flap (cf. Okada, 1999), as this configuration is presumably its intended articulatory target. The tap allophone appears in contexts less favourable for flapping due to the stronger overlap with neighbouring vowel gestures. This is reminiscent of the variation found for North American English flap/tap allophones of /t, d/ next to rhotics and non-rhotic vowels (Derrick & Gick, 2011) and is broadly similar to the cross-linguistically common appearance of tap/approximant realizations of phonemic trills in aerodynamically unfavourable contexts (Ladefoged & Maddieson, 1996).
SRPP A long-form single-speaker real-time MRI speech dataset and benchmark
We release the USC Long Single-Speaker (LSS) dataset containing real-time MRI video of vocal tract dynamics and simultaneous audio obtained during speech production. This unique dataset contains roughly one hour of video and audio data from a single native speaker of American English, making it one of the longer publicly available single-speaker datasets of real-time MRI speech data. Along with the raw articulatory and acoustic data, we release derived representations of the data that are suitable for a range of downstream tasks. These include video cropped to the vocal tract region, sentence-level splits of the data, restored and denoised audio, and region-of-interest time series. We also benchmark this dataset on articulatory synthesis and phoneme recognition tasks, providing baseline performance for these tasks which future research can aim to improve upon.