SRPP: Articulation and prosodic prominence: Evidence from habitual and loud speech

Speakers vary the degree of vocal effort in speech production to successfully convey a message to listeners. Vocal effort can be globally increased over entire utterances, as in loud speech (Lindblom, 1990; Mefferd & Green, 2010). Furthermore, it can vary within an utterance to locally highlight important information, which is then referred to as prosodic prominence (Cho, 2004; Roessig & Mücke, 2019). This study investigates the interaction of the two levels of vocal effort: If individuals speak loudly with a globally high effort, can they still encode prominence relations as local effort variations? In this talk, I will present a study on articulatory correlates of prosodic prominence (different focus structures) in habitual and loud speech. 20 German speakers were recorded using 3D Electromagnetic Articulography to capture lip and tongue kinematics. Speakers were engaged in an interactive question-answer task, which elicited two focus conditions (related to local variation of vocal effort) and two speaking styles (related to global variation of vocal effort). The study shows that focus is systematically encoded in supra-laryngeal articulation as a function of local vocal effort variation. Crucially, this can not only be observed in habitual but also in loud speech, a speaking style in which vocal effort is globally increased. The results can be interpreted as underlining the flexibility of the prosodic system in adapting to communicative demands.

References
Cho, T. (2004). Prosodically conditioned strengthening and vowel-to-vowel coarticulation in English. Journal of Phonetics, 32, 141–176. https://doi.org/10.1016/S0095-4470(03)00043-3.
Lindblom, B. (1990). Explaining Phonetic Variation: A Sketch of the H&H Theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and Speech Modelling (pp. 403–439). Kluwer Academic Publishers. https://doi.org/10.1007/978-94-009-2037-8_16.
Mefferd, A. S., & Green, J. R. (2010). Articulatory-to-acoustic relations in response to speaking rate and loudness manipulations. Journal of Speech, Language, and Hearing Research, 53(5). https://doi.org/10.1044/1092-4388(2010/09-0083).
Roessig, S., & Mücke, D. (2019). Modeling Dimensions of Prosodic Prominence. Frontiers in Communication, 4, art. 4: 1-19. https://doi.org/10.3389/fcomm.2019.00044.

SRPP: Towards inclusive automatic speech recognition

Automatic speech recognition (ASR) is increasingly used, e.g., in emergency response centers, domestic voice assistants, and search engines. Because of the paramount role spoken language plays in our lives, it is critical that ASR systems are able to deal with the variability in the way people speak (e.g., due to speaker differences, demographics, different speaking styles, and differently abled users). ASR systems promise to deliver objective interpretation of human speech. Practice and recent evidence, however, suggest that state-of-the-art (SotA) ASR systems struggle with the large variation in speech due to, e.g., gender, age, speech impairment, race, and accents. The overarching goal of our project is to uncover bias in ASR systems and to work towards proactive bias mitigation in ASR. In this talk, I will present systematic experiments aimed at quantifying, identifying the origin of, and mitigating the bias of state-of-the-art ASR systems on speech from different, typically low-resource, groups of speakers, with a focus on bias related to gender, age, regional accents and non-native accents.
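To make the idea of quantifying bias concrete, the sketch below compares word error rates (WER) across speaker groups. This is only an illustration: the abstract does not name a metric, so the use of WER, the function names (`word_errors`, `wer_by_group`), the group labels, and the toy transcripts are all assumptions rather than the project's actual method or data.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """Pool errors and reference words per group, then divide."""
    errors, words = defaultdict(int), defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    return {group: errors[group] / words[group] for group in words}


# Hypothetical (group, reference transcript, ASR output) triples.
samples = [
    ("native", "turn on the light", "turn on the light"),
    ("native", "call my sister", "call my sister"),
    ("non_native", "turn on the light", "turn of the night"),
    ("non_native", "call my sister", "call my sister"),
]

rates = wer_by_group(samples)
# A positive gap means the system makes more errors on the second group.
gap = rates["non_native"] - rates["native"]
print(rates, gap)
```

Under this assumption, a WER gap between groups is one simple indicator of bias; it says nothing by itself about where the bias originates.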

SRPP: Somatosensory feedback enables categorical perception of vowels

This talk will focus mainly on experimental results that were at the core of Jean-François Patri’s thesis, defended in 2020, and that we published in PNAS in 2020 (Patri, J. F., Ostry, D. J., Diard, J., Schwartz, J. L., Trudeau-Fisette, P., Savariaux, C., & Perrier, P. (2020). Speakers are able to categorize vowels based on tongue somatosensation. Proceedings of the National Academy of Sciences, 117(11), 6255-6263). These results demonstrate that participants can categorize French vowels in the absence of auditory feedback, on the basis of somatosensory feedback alone.
To this end, we developed an original tongue-positioning task in which participants had to reach and hold different tongue postures in the region of the vowels /e, ɛ, a/, following a guidance procedure that made no reference to speech production. Guidance relied on an on-screen display of targets in a distorted tongue space, in which the actual tongue shape was not recognizable. Once the tongue was positioned, subjects had to identify the vowel associated with the tongue posture reached, in the absence of any auditory feedback. Our results indicate that vowels can be categorized on the basis of somatosensory feedback alone, with an accuracy similar to that of the auditory perception of whispered sounds.
We will discuss the implications of these results for a model of speech production control.

SRPP: An experimental investigation of Sevillian Spanish metathesis

Sevillian Spanish is undergoing a metathesis change in /s/-voiceless stop sequences, whereby coda /s/ debuccalizes to [h] and then metathesizes with the following stop (e.g. /pasta/: [pahta] → [patha]). While the phonetic reality of this change is well-established (e.g. Ruch & Peters 2016), the phonological behavior of the resulting [Ch] sequences has not been investigated. In most languages, [Ch] sequences are aspirated stops. It has been proposed that Sevillian [Ch] sequences may be coalescing into single segments (O’Neill 2009), and this would represent an oddity in the realm of sound change. In this talk, I present results from a series of experiments testing the underlying representation and possible causes of this metathesis change. Behavioral evidence from two perception tasks suggests that Sevillian listeners still treat [Ch] sequences as underlying /s/-voiceless stop sequences: they map [h] in [Ch] sequences to an /s/ on a preceding word, and they treat syllables preceding [Ch] as if they were still closed by [h] for the purposes of stress assignment. Finally, results from a cross-linguistic ABX task show that the cause of laryngeal metathesis in Sevillian (or in other languages) is not likely to be perceptual. The results have implications for our understanding of segments, clusters, laryngeal metathesis, and the relative rarity of preaspirated segments cross-linguistically.

SRPP: A whole tongue approach to gutturals in Levantine Arabic using Generalized Additive Mixed Modelling of Tongue surfaces

Guttural consonants (i.e., uvular, pharyngealized and pharyngeal) in Arabic are argued to form a natural class due to phonological patterning and use of a common oro-sensory zone in the pharynx (McCarthy, 1994; Sylak-Glassman, 2014a, 2014b). Yet, phonetic studies have failed to find a single phonetic exponent to explain this patterning. In fact, many studies have tried to quantify this patterning by looking at changes within the root of the tongue. In this study, I use Generalized Additive Mixed Modelling to quantify whole-tongue changes as obtained from Ultrasound Tongue Imaging. Using various quantification methods (2D and 3D difference splines), I show how gutturals use a common area in the vocal tract, which is indeed located at the tongue root, but also at the tongue dorsum and body. The observed patterns point towards a gradient rather than categorical change. The phonological feature that can explain these patterns is predominantly the feature [+retracted], which is a subcomponent of the feature [+constricted epilaryngeal tube] (following the predictions of the Laryngeal Articulator Model (LAM); Esling, 2005; Esling et al., 2019). However, tongue root, dorsum and body changes cannot be simply quantified by the feature [+retracted]. I discuss implications for an alternative formal account.

References
Esling, J. (2005). There Are No Back Vowels: The Laryngeal Articulator Model. The Canadian Journal of Linguistics, 50(1), 13–44. https://doi.org/10.1353/cjl.2007.0007
Esling, J., Moisik, S., Benner, A., & Crevier-Buchman, L. (2019). Voice Quality: The Laryngeal Articulator Model. Cambridge University Press. https://doi.org/10.1017/9781108696555
McCarthy, J. J. (1994). The phonetics and phonology of Semitic pharyngeals. In P. A. Keating (Ed.), Phonological Structure and Phonetic Form (pp. 191–233). Cambridge University Press. https://doi.org/10.1017/CBO9780511659461.012
Sylak-Glassman, J. (2014a). An Emergent Approach to the Guttural Natural Class. Proceedings of the Annual Meetings on Phonology, 1–12. https://doi.org/10.3765/amp.v1i1.44
Sylak-Glassman, J. (2014b). Deriving Natural Classes: The Phonology and Typology of Post-Velar Consonants. University of California, Berkeley.

SRPP: Are geminate consonants always long consonants? Acoustic-phonetic properties of Polish geminates

Polish is a language with true lexical geminates that form minimal pairs with their singleton counterparts. What is more, Polish, unlike many other geminating languages, allows both single-articulated and rearticulated geminate realisations. It seems to be a unique feature, since geminates are traditionally considered to be long counterparts of corresponding singletons and rearticulation is not discussed in any comprehensive accounts of gemination in the world’s languages. In this talk, I will demonstrate acoustic-phonetic characteristics of Polish geminates in order to trigger a discussion on how rearticulation may be incorporated into phonetic and phonological theories of gemination.

SRPP: Perceptually-motivated influences on nasal coarticulatory variation in French

The current study was designed to investigate whether the imitability of nasal coarticulation is affected by the phonological status of vowel nasality in French. First, does the contrastive status of vowel nasality in French lead speakers to different patterns of coarticulatory imitation for words that have a nasal vowel minimal pair relative to words that do not? Furthermore, imitation has also been viewed as a process that facilitates intelligibility. Prior work has found that American English speakers imitate an increased degree of nasal coarticulation for lexical items that pose particular challenges for perception (Zellou et al., 2016). Thus, we also ask: Do intelligibility factors influence patterns of coarticulatory nasality in French? Specifically, we compare phonetic imitation in trials where there is pressure to be intelligible (i.e., an interlocutor needs clarification to identify the target word) with trials where there is less such pressure (i.e., the interlocutor responds correctly).

SRPP: Nature and origin of phonological features: a developmental perspective

A number of fundamental questions have run through theoretical debates about the phonological feature since its earliest conceptions in the early 1900s, in the work of the Prague Linguistic Circle (e.g. Jakobson 1941; see also Dresher 2016 for a historical overview). These questions have also evolved with the advent of the theory of Universal Grammar (Chomsky 19757; Chomsky & Halle 1968), which posits a set of phonological features innate to every human being. Other approaches to phonology radically oppose this theory and reject the very existence of the phonological feature as a psychologically real unit (Vihman & Croft, 2007).

In this presentation, we will adopt an emergentist approach to phonological representations (Pierrehumbert 2003; Mielke 2008), including the phonological feature, based on two interrelated hypotheses: features are indeed real, but they are not innate; they must be acquired by the first-language learner. Drawing on phonetic (perceptual, articulatory) and phonological (distributional, prosodic) considerations, and also taking into account other levels of representation (e.g. lexical), we will discuss a set of facts about phonological development in child speech. These observations underline, on the one hand, the importance of the phonological feature for understanding phonological data. On the other hand, phonological features cannot be innate; they represent language-specific knowledge, and the emergence of this knowledge is very clearly reflected in child data.

SRPP: R Three Ways: Capturing the dynamics of Scottish word-final /r/, using DCT and GAMMs

Sounds can be represented in terms of ‘static’ acoustic measures, e.g. from a single timepoint, or a summary mean, or through ‘dynamic’ trajectories taken across the course of a segment. Sóskuthy [1] outlines an effective continuum from static, through less dynamic methods, such as Discrete Cosine Transformation (DCT), which forces trajectories to fixed reference shapes and whose coefficients can be hard to interpret, to the more intuitive outputs of Generalized Additive Mixed Models (GAMMs), whose flexible reference points permit closer approximation and visualization of trajectories. As we might expect, dynamic analyses reveal further insights over static measures into social-phonological contrasts (e.g. vowels, sibilants [2, 3]), though the inherently dynamic nature of rhotics means that dynamic analysis of /r/ has been used to characterise these sounds for a long time [4]. However, comparison of different dynamic techniques for interpreting the same feature is less usual [5].

This paper considers the relative contribution of static, less and more dynamic acoustic representations, specifically mean, DCT and GAMM, in specifying the role of linguistic, social and regional factors for Scottish word-final /r/ over the 20th century. Largely auditory analyses of Scottish /r/ report changes from apical trills/taps to postalveolar, retroflex and now bunched approximants favoured by middle-class females; long-term coda /r/ weakening has also been observed for urban Central Belt vernaculars [6]. The acoustic signature of a lowered third formant is found for approximant /r/; taps, trills, and weakened /r/ show high and/or rising F3 [7].

21-point F3 formant tracks (>49ms) were taken from all instances of pre-segmented Scottish word-final /r/, extracted from 711 speakers covering geographical, social and ethnic diversity across an apparent-/real-time span of 100+ years; likely erroneous measures were removed against existing hand-measures (36,845 tokens, 275 words). The first three DCT coefficients, capturing the trajectories’ mean, slope and curvature, were modelled for following context and lexical stress, and gender, dialect, ethnicity and decade of birth, using LME in R, controlling for speech rate, (log)/r/ duration, (log)lexical frequency, and speaker/word. GAMMs were fitted separately to male and female speaker subsets, with smooths by (log)duration, stress, following context, and dialect, ethnicity, and decade of birth, and random smooths for speaker/word.
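As a rough illustration of how the first three DCT coefficients relate to a trajectory's mean, slope and curvature, here is a minimal pure-Python sketch. The function name `dct_coefficients`, the scaling convention, and the linear F3 track are assumptions made for illustration only; the abstract does not describe the study's own implementation, which was carried out in R.

```python
import math


def dct_coefficients(track, n_coef=3):
    """Return the first n_coef DCT-II coefficients of a formant track.

    Scaling is an illustrative assumption: coefficient 0 is scaled by 1/N so
    that it equals the track mean; higher coefficients are scaled by 2/N.
    """
    n = len(track)
    coefs = []
    for k in range(n_coef):
        # Project the track onto the k-th cosine basis function.
        total = sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                    for i, x in enumerate(track))
        scale = 1.0 / n if k == 0 else 2.0 / n
        coefs.append(total * scale)
    return coefs


# Hypothetical 21-point F3 track (Hz) falling linearly from 2600 to 2200.
track = [2600 - 20 * i for i in range(21)]
c0, c1, c2 = dct_coefficients(track)

print(f"c0 (mean):      {c0:.1f}")
print(f"c1 (slope):     {c1:.1f}")
print(f"c2 (curvature): {c2:.1f}")
```

For this purely linear falling track, the zeroth coefficient is the mean (2400 Hz), the first is positive because the half-cosine basis also falls across the segment, and the second is effectively zero because the track has no curvature.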

All measures show that Scottish word-final /r/ is influenced by linguistic, regional, ethnic and social factors. DCT analysis provides robust identification of key differences and interactions for the whole dataset; GAMMs permit more refined examination of contrasts of interest. For example, DCT shows how gender interacts with decade of birth: those born most recently show lowered F3 trajectories, especially female speakers, likely reflecting a gendered shift from taps to (more bunched) approximants. GAMMs show a similar pattern, but enable better inspection of differences between groups in trajectory shapes and variability over time.

References
1. Sóskuthy, M. Evaluating generalised additive mixed modelling strategies for dynamic speech analysis. J. Phon. 84, (2021).
2. Watson, C. I. & Harrington, J. Acoustic evidence for dynamic formant trajectories in Australian English vowels. JASA. 106, 458–468 (1999).
3. Reidy, P. F. Spectral dynamics of sibilant fricatives are contrastive and language specific. JASA. 140, 2518–2529 (2016).
4. Plug, L. & Ogden, R. A parametric approach to the phonetics of postvocalic /r/ in Dutch. Phonetica 60, 159–186 (2003).
5. Tanner, J. Structured phonetic variation across dialects and speakers of English and Japanese. (McGill University, 2020).
6. Stuart-Smith, J. & Lawson, E. Scotland: Glasgow/the Central Belt. in Listening to the Past (ed. Hickey, R.) 171–98 (CUP, 2017).
7. Lawson, E., Stuart-Smith, J. & Scobbie, J. M. The role of gesture delay in coda /r/ weakening: An articulatory, auditory and acoustic study. JASA. 143 (2018).

SRPP: Russian assimilatory palatalization as incomplete neutralization

Incomplete neutralization refers to small but significant phonetic traces of underlying contrasts in phonologically neutralizing contexts. The present study examines whether Russian assimilatory palatalization in C+j sequences also results in incomplete neutralization with respect to underlying palatalized consonants. Russian contrasts plain and palatalized consonants, e.g., /p/ vs. /pʲ/, with the “plain” stops possibly having a secondary articulation involving retraction of the tongue dorsum (velarization/uvularization). However, Russian also has stop-glide sequences that form near-minimal pairs with palatalized stops: e.g., /pjot/ ‘drink (3ps pres)’ vs. /pʲok/ ‘bake (3ps past).’ In the environment preceding palatal glides, the contrast between palatalized and plain consonants is neutralized, due to the palatalization of the plain stop: /pjot/ → [pʲjot] (assimilatory palatalization). The purpose of the study is to explore whether the neutralization is complete. To do so, we conducted an electromagnetic articulography (EMA) experiment examining temporal coordination and the spatial position of the tongue body in derived and underlyingly palatalized consonants. Articulatory results from four native speakers of Russian (one male) revealed that gestures in both conditions are coordinated as complex segments; however, there are differences across conditions consistent with the residual presence of a tongue dorsum retraction gesture in the “plain” obstruents. We conclude that neutralization of the plain-palatal contrast in Russian is incomplete: consonants that undergo assimilatory palatalization exhibit inter-gestural coordination characteristic of palatalized consonants, along with residual evidence of an underlying tongue dorsum retraction (velarization/uvularization) gesture.