Event category: SRPP
SRPP 22/05/2026 Katia Chirkova
SRPP 10/04/2026 Megan Dailey
SRPP The past and present of stop vocalization in Danish
Lenition is common in both phonological processes and sound change. In opening processes, which involve increasing aperture, labial and dorsal consonants tend to retain some labial or dorsal constriction even when they fully vocalize. What happens with coronal consonants that vocalize is less straightforward, since vowel production rarely involves the tongue tip or blade. Vocalization of coronal consonants often involves a reconfiguration, or loss, of the coronal gesture, as in approximants like [j ɹ]. Danish has historically seen extreme lenition in the unaspirated stop series /b d g/ in prosodically weak positions, a process which has had fascinating phonetic and phonological consequences, not least because /d/-vocalization has not obviously led to a reconfiguration of the coronal gesture.
In Modern Standard Danish, the reflexes of /b d g/ in prosodically strong positions are voiceless unaspirated [p t k], whereas in weak positions, the reflexes are typically semivocalic. /b/ somewhat inconsistently surfaces as [ʊ̯] or [p] in weak positions. /d/ surfaces as the ‘soft d’, a sound which is often transcribed as [ð], but is well known to be more open than a fricative. /g/ either surfaces as [ʊ̯ ɪ̯] or elides in weak positions, determined mostly by the identity of the preceding vowel.
This talk will present and discuss the outcome of stop vocalization in Modern Standard Danish, with particular focus on /d/-vocalization. Several open questions remain about the stop vocalization process, including: (1) What exactly is the acoustic and articulatory outcome of /d/-vocalization? I will present analyses of spontaneously produced corpus data and recently collected electromagnetic articulography data probing this question. (2) How does stop vocalization unfold in the long and short term? This will be discussed with reference to historical written sources, and I will outline the goals of a recently started project aimed at tracking and analyzing the relatively recent past of /d/-vocalization in a large longitudinal speech corpus. (3) Does phonetic and phonological evidence support an analysis in which [p t k] are linked to semivowels in the synchronic phonology of Danish? This question will be discussed in light of the new data.
SRPP Invariant encoding of phonetic features supports speech processing in early infancy
From birth or after only a few months, infants are already capable of discriminating subtle differences in speech sounds (Jusczyk & Derrah, 1987; Kuhl, 1983) and of overriding the acoustic variability produced by changes in talkers' voices, speaking rate, and coarticulation (Eimas & Miller, 1980; Hillenbrand, 1984; Mehler et al., 1988) to categorize speech sounds. To explore what neural representation supports these early perceptual abilities for speech, we measured the brain responses of 3-month-old infants (N=30) listening to a set of natural syllables, using a 128-channel EEG system. The syllables consisted of either a consonant followed by a vowel or vice versa, and varied along two orthogonal consonantal phonetic features (voicing and place of articulation) and two vocalic phonetic features (height and backness).
Using multivariate pattern analysis (MVPA), we extend previous results by showing that preverbal infants encoded all phonetic features, no matter how and by whom they were pronounced. More importantly, we demonstrate that these phonetic features are encoded independently of the position of the phoneme within a syllable, suggesting that, like adults, preverbal infants possess a position-invariant code for phonetic content. Next, we examine how this phonetic representation evolved across time and show that infants extracted and combined phonetic features in parallel, enabling them to identify phonemes. Finally, we also ran the same experiment with neonates (N=25) and compared the content and dynamics of the phonetic encoding. Preliminary results suggest that already at birth, the brain encodes vocalic phonetic features independently of context. Overall, this study sheds light on the neural representation behind infants' early perceptual abilities for speech and contributes to a better understanding of the encoding mechanisms that support rapid language acquisition in the first months of life.
SRPP Philhellenism and phonetics: Hubert Pernot's research on the dialects of Chios
For nearly half a century, from 1898 to 1946, Pernot worked on the dialects of the island of Chios, which passed from the Ottoman Empire to Greece in 1913. His research gave rise to three volumes entitled Études de linguistique néo-grecque, devoted to the phonetics, morphology, and lexicology of the Chios dialects. These studies draw on the cylinder recordings that Pernot made on the island as early as 1898-1899. They also belong to a particular political context, as Hubert Pernot and his mentor Jean Psichari campaigned for the attachment of Chios to the Greek state from the end of the 19th century. Our talk will therefore revisit the dual political and scientific dimension of Pernot's work, in order to recall the beginnings of experimental phonetics applied to Greek dialects, as well as the use of Pernot's research for political ends before 1913.
SRPP Beyond reaction time: Articulatory evidence of a perception-production link in speech using the Stimulus-Response Compatibility paradigm
This talk will introduce our ongoing project investigating the link between perception and production in speech. In the Stimulus-Response Compatibility (SRC) paradigm, participants are typically prompted to produce a target syllable while being presented with either congruent or incongruent distractors. Responses tend to be slower in incongruent trials (the covert imitation effect), reflecting competition between perception-driven and goal-driven motor plans. The short response-distractor time lag in the SRC task design makes it well suited to studying the engagement of the motor system by speech perception during speech planning. Our aim is to obtain finer-grained insights into the nature of the perception-production link using electromagnetic articulography (EMA).
The discussion will draw on preliminary analyses of a subset of data from ten L1 British English speakers, using /ɹa/ and /va/ as prompt and distractor syllables. Reaction time (RT) analysis based on acoustic data shows a clear covert imitation effect for /ɹa/ but not for /va/. The timing of maximal displacement of the tongue tip (TT) for /ɹa/ and the lower lip (LL) for /va/ also followed a similar pattern. Time-varying position trajectories and tangential velocity profiles, however, show evidence of TT gestural intrusion in /va/ productions in the incongruent trials (i.e., with the distractor /ɹa/). Such a between-condition difference in TT activity, despite the lack of a clear congruency effect in the RT measurements, might result from a greater degree of TT activation during speech planning due to perception of the distractor stimulus, demonstrating that motor patterns activated through observation alone might in part be executed. The implications of the behavioural results will be discussed in the light of articulatory complexity, multimodal speech perception, and cognitive sensorimotor theories.
SRPP Elements of the prosody of Beja (Cushitic, Sudan)
Beja, a Cushitic language of Sudan, is today well known and well described (Wedekind et al. 2005, Vanhove 2017, etc.). However, described as tonal by some authors (Mous 2012, Hellmuth & Pearce 2020) and as accentual by others (Hudson 1973, Mous 2022), its prosodic system remains poorly understood beyond some of its distributional properties, particularly at levels above that of the prosodic word.
In this highly exploratory study, we will propose initial elements of a descriptive analysis of the different levels of the prosodic hierarchy of Beja, from morae up to intonational phrases, via phonological phrases. Among the questions we will address in the course of this examination, beyond that of the accentual or tonal status of Beja (which will be assessed against Hyman's typology (2006, 2009, etc.)), we will discuss some parameters conditioning the surface length of vowels (the language has a phonemic opposition between short and long vowels), the role that vowel sonority might play in the correlates associated with prominence, and the location of boundary tones. At the end of the talk, the intonation associated with questions and with focused elements will be briefly discussed.
SRPP Flapping vs. tapping in the Japanese rhotic: Evidence from X-ray microbeam and EPG corpora
One highly debated question in Japanese phonetics is whether the rhotic ‘/r/’ should be classified as a flap or a tap (Vance, 1987; Okada, 1999; Arai et al., 2007) – consonants distinguished by tangential vs. direct movement trajectories of the active articulator with respect to the passive articulator (Ladefoged & Maddieson, 1996; Derrick & Gick, 2011). Although the flap is considered to be the primary allophone in some descriptive accounts (e.g., Okada, 1999), articulatory evidence for flapping has been limited (Sudo et al., 1972, based on electropalatography, EPG) or absent altogether (Maekawa, 2023, based on dynamic MRI). The methods used in these and other studies, however, did not always allow for an examination of the tongue tip movement or its contact with the palate with sufficient temporal and/or spatial resolution. These studies have also largely considered /r/ flanked by vowels, where it is strongly coarticulated (cf. Katz et al., 2018).
In this talk, I revisit the question of flapping vs. tapping by examining the fine-grained dynamics of Japanese rhotic productions in two corpora of read speech – the X-ray microbeam (XRMB) speech production database of Japanese (Hashi, 2000, 19 speakers, 144 tokens) and the EPG Cross-Language Articulatory Database (Kochetov et al., 2017, 5 Japanese speakers, over 1800 tokens). The results show that flap realizations of the Japanese rhotic are considerably more common than previously reported. This was specifically the case in utterance-initial position, where all speakers in the XRMB corpus produced at least some rhotic tokens with a preparatory raising/retraction of the tongue tip, followed by a rapid downward/fronting movement of the articulator in the proximity of the alveolar ridge. Similarly, three out of five speakers in the EPG corpus produced many word-initial rhotics with a closure advancing from the postalveolar to front alveolar regions. In both sets of data, following back vowels favoured flapping, while following front vowels, as well as the intervocalic position, favoured tapping.
Based on these results, we may conclude that the Japanese rhotic is inherently a flap (cf. Okada, 1999), as this configuration is presumably its intended articulatory target. The tap allophone appears in contexts less favourable for flapping due to the stronger overlap with neighbouring vowel gestures. This is reminiscent of the variation found for North American English flap/tap allophones of /t, d/ next to rhotics and non-rhotic vowels (Derrick & Gick, 2011) and is broadly similar to the cross-linguistically common appearance of tap/approximant realizations of phonemic trills in aerodynamically unfavourable contexts (Ladefoged & Maddieson, 1996).
SRPP A long-form single-speaker real-time MRI speech dataset and benchmark
We release the USC Long Single-Speaker (LSS) dataset containing real-time MRI video of vocal tract dynamics and simultaneous audio obtained during speech production. This unique dataset contains roughly one hour of video and audio data from a single native speaker of American English, making it one of the longer publicly available single-speaker datasets of real-time MRI speech data. Along with the raw articulatory and acoustic data, we release derived representations of the data that are suitable for a range of downstream tasks. These include video cropped to the vocal tract region, sentence-level splits of the data, restored and denoised audio, and region-of-interest time series. We also benchmark this dataset on articulatory synthesis and phoneme recognition tasks, providing baseline performance for these tasks which future research can aim to improve upon.