SRPP: Talker identity from acoustic voice variability

What makes your voice yours? Human voices, our “auditory faces,” are inherently social, involving a speaker, a signal, a listener, and their interaction. Neither voice perception nor production can be understood without consideration of the dynamic, variable signals that shape utterances. Our team has identified a suite of measures that constitute a psycho-acoustic model of voice quality, paving the way for a long overdue refinement in characterizing talker voice variation. In this talk, I introduce a series of interdisciplinary studies of voice quality that tackle the challenge of identifying which of the model’s indices account for perceptually relevant acoustic variance within and among speakers. These studies investigate vocal and perceptual behaviors of many individuals from various backgrounds, employing computational tools to analyze large arrays of high-dimensional data to characterize voice variation within and across speakers and across voice qualities, speaking styles, emotions, and dialects or languages. The overarching hypothesis is that the same small set of acoustic variables characterizes acoustic variability across voices but that much of what characterizes individual speakers is idiosyncratic. Our investigation incorporates a broad range of language and communicative contexts, including identifying acoustic spaces for speech produced in different speaking styles and under different emotions, languages with and/or without tone and/or phonation contrasts, severely pathologic voices, and their perceptual consequences. Our findings serve as a basis for research on voice production and recognition, and for clinical diagnosis and/or treatment of deviation in voice quality.

SRPP: Paradigm uniformity effects in French liaison

In French, some words ending in a vowel use a consonant-final variant before vowel-initial words (e.g. grand [ɡʁɑ̃] ∼ [ɡʁɑ̃t] ‘big-MASC’). The consonants occurring at the end of consonant-final variants are called liaison consonants. Liaison consonants are challenging for phonological theory because of evidence that they pattern ambiguously between stable word-final consonants and word-initial consonants. Some researchers have proposed specific phonological representations to account for this ambiguous behavior, including floating consonants and gradient underlying representations.
In this talk, I will propose an alternative account where the ambiguous patterning of liaison consonants is analyzed as a paradigm uniformity effect: in a word1–word2 sequence, the liaison consonant ends up being ambiguous between a stable word-final consonant and a word-initial consonant because of a pressure to make contextual variants of word1 and word2 similar to their citation forms (i.e. words as pronounced in isolation).
I will use two case studies to support this analysis: (i) a study of liaison enchaînée in Swiss French using acceptability judgments, and (ii) a phonetic study of liaison consonants in affrication contexts (t#i) in Quebec French. I will show that the data of Study 1 and Study 2 can be modeled using a probabilistic grammar including independently motivated paradigm-uniformity constraints, without any need for special phonological representations.

SRPP: Assessing breathing individuality in the interaction between speech and limb movement

In the late 1980s, several physiological studies observed a within-speaker consistency in breathing characteristics at rest across days (Shea, 1987) or even years (Benchetrit, 1989). This within-speaker consistency is characterized by breathing cycle patterns such as the duration, the volume of air inspired or the cycle shape, and was also found within activities involving breathing, such as physical effort, but not between contexts such as breathing at rest and breathing during physical activity. Indeed, ventilation can be highly modulated by physical exertion: as the muscles’ demand for oxygen increases, breathing becomes increasingly stressed, the duration of cycles becomes shorter and their amplitude increases. Breathing is also intrinsic to speech: speech lengthens the breathing cycles via the control of exhalation, and shortens the inhalation duration. Speech breathing is a specific way to control ventilation while supporting speech planning and phonation constraints. It is highly variable between speakers but also within the same speaker, depending on utterance properties. Can we still observe consistency over time in speakers’ breathing profiles despite these variations? Is this potential speech breathing individuality modulated by limb motion? We addressed these questions by analyzing the breathing profiles of 25 native speakers of German performing a narrative task on two days under different limb movement conditions. The individuality of breathing profiles over conditions and days was assessed by adopting methods used in physiological studies that investigated a ‘ventilatory personality’. Our results suggest that speaker-specific breathing profiles in a narrative task are maintained over days and that they stay consistent despite light physical activity. These results are discussed with a focus on better understanding what speech breathing individuality is, how it can be assessed, and the research perspectives that this concept opens up.

SRPP: Effects of conventions and social context on tune interpretation

Traditionally, a clear distinction has been drawn between the phonological and phonetic levels of intonation analysis, with the former conveying linguistic (e.g., illocutionary) and the latter paralinguistic (e.g., affective) meanings. However, a growing body of evidence reveals that tune meaning is multidimensional and flexible, with the choice of a tune depending on both linguistic and paralinguistic purposes. In this talk, I will present collaborative work on the effects of tune choice on listeners’ interpretation of affective meanings. By means of two behavioral experiments, I will show that (1) listeners exploit their knowledge of the conventional association between illocutionary acts (requests, offers) and intonation (rising, falling) to infer certain kinds of affects (concerning speaker authority, mood or sincerity), and (2) this ‘inferential process’ is partially modulated by listeners’ knowledge about the speaker–addressee social relationship. Taken together, these results reinforce findings that the phonological contour is a fundamental cue for perlocutionary/affective meanings, and that such meanings are partly context-dependent.

SRPP: Interpretable comparison between auditory brainstem response and intermediate convolutional layers in deep neural networks

Can we build models of language acquisition from raw acoustic data in an unsupervised manner? Can deep convolutional neural networks learn to generate speech using linguistically meaningful representations? In this talk, I propose that language acquisition can be modeled with Generative Adversarial Networks (GANs) and that such modeling has implications both for the understanding of language acquisition and for the understanding of how deep neural networks learn internal representations. I propose a technique that allows us to wug-test neural networks trained on raw speech. I further propose an extension of the GAN architecture in which learning of meaningful linguistic units emerges from a requirement that the networks output informative data. With this model, we can test what the networks can and cannot learn, how their biases match human learning biases (by comparing both behavioral and neural data with networks’ outputs), how they represent linguistic structure internally, and what GANs’ innovative outputs can teach us about productivity in human language. This talk also makes a more general case for probing deep neural networks with raw speech data, as dependencies in speech are often better understood than those in the visual domain and because behavioral data on speech acquisition are relatively easily accessible.

SRPP: Long-distance coarticulation in Arabic: Vowels, pharyngealization and gemination

This study investigated anticipatory vowel-to-vowel coarticulation in Arabic, and sought to determine the degree to which it is affected by the pharyngealization and length of intervening consonants. Speakers of Egyptian Arabic were recorded saying sentences containing nonsense sequences of the form /baɁabaCV:/, where C was chosen from {/t/, /tˤ/, /t:/, /tˤ:/} and V was a long vowel /i:/, /a:/ or /u:/. Analysis of the first and second formants of the recorded vowels revealed that (a) vowel-to-vowel coarticulatory effects could sometimes extend to a distance of three vowels before the context vowel; (b) the consonant-to-vowel effects associated with pharyngealization were consistently seen at similar distances, while also decreasing in magnitude at greater distances from the triggering consonant; and (c) effects related to intervening consonant length were idiosyncratic, and in particular did not lead to consistent blocking of vowel-to-vowel effects. An exception was one speaker who showed significant vowel-to-vowel effects at all three measured distances that were effectively blocked in the pharyngealized consonant condition.
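One common way to quantify anticipatory vowel-to-vowel coarticulation of the kind described above is to compare a target vowel’s formant values across context-vowel conditions at each distance from the trigger. The sketch below illustrates the idea with made-up F2 values (the numbers and the distance-indexed layout are purely illustrative, not the study’s data):

```python
# Sketch: anticipatory V-to-V coarticulation measured as the F2 difference of a
# target vowel between front (/i:/) and back (/u:/) context-vowel conditions,
# at 1, 2 and 3 vowels' distance from the context vowel.
# All values are hypothetical means in Hz, chosen only to show a decreasing effect.
f2_hz = {
    1: {"i:": 1450.0, "u:": 1250.0},  # adjacent to the context vowel
    2: {"i:": 1400.0, "u:": 1310.0},
    3: {"i:": 1380.0, "u:": 1350.0},  # three vowels away
}

def coarticulation_effect(by_context: dict) -> float:
    """Effect size at one distance: F2 in the front-vowel context minus
    F2 in the back-vowel context. Larger values = stronger coarticulation."""
    return by_context["i:"] - by_context["u:"]

for distance in sorted(f2_hz):
    print(distance, coarticulation_effect(f2_hz[distance]))
# With these illustrative numbers the effect shrinks with distance: 200, 90, 30 Hz.
```

A non-zero difference at distance 3 would correspond to finding (a) above; blocking by an intervening consonant would show up as the difference collapsing toward zero in that condition.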

SRPP: Effects of background noise on speech communication across the lifespan

When conversing in less than ideal or “challenging” conditions, such as in background noise, talkers continuously monitor the success of the communication. In cases of communication breakdown, they modify their speech in an attempt to make themselves more intelligible to the listener. These modifications include a range of acoustic-phonetic (e.g., slower, more intense and hyper-articulated speech) and linguistic adaptations (e.g., higher-frequency words, shorter and simpler sentences), often broadly referred to as “clear speech”. It has been shown that these speech modifications are modulated by complex interactions between various talker-related (e.g., age, regional accent), listener-related (e.g., age, hearing acuity, linguistic competence) and environment-related factors (e.g., room acoustics, background noise type; Mattys et al., 2012).

Our recent Economic and Social Research Council funded research project at University College London focused on a few of these factors, namely how speech modifications and communication difficulty vary as a function of age and background noise type. In this project, we collected sensory, cognitive, speech production and perception data as well as self-evaluations of speaking and listening effort from 114 healthy Southern British English speaking participants aged between 8 and 80 years. For speech production, we recorded age- and sex-matched pairs while they carried out the “spot the difference” diapix task using the DiapixUK picture sets (Baker and Hazan, 2011) in conditions varying in the amount of informational masking (three voices in the background) and energetic masking (speech-shaped noise) present. A secondary task (pressing a bell when hearing a dog barking but withholding the response when hearing a car horn honking) was added to make the task more cognitively demanding, thus reflecting real-life multitasking situations. After completing each diapix task, both participants completed a paper-based questionnaire, answering questions about communicative difficulty and listening/speaking effort using an 11-point Likert scale. Baseline sensory and cognitive measures of hearing (pure tone audiogram), speech perception (coordinate response measure task, CCRM), cognitive function (tests of expressive vocabulary, letter-number sequencing, letter-digit substitution) and a standardised questionnaire of auditory disability (SSQ) were also collected.

In this talk, I will summarize the main results from the speech production and perception tasks (the diapix and CCRM tasks) and from the self-report measures (ratings of speaking and listening effort) that (some, not all!) show distinct developmental trajectories for different types of noise (speech vs. non-speech). Overall, our results suggest that when the background noise has a higher cognitive load, as in the case of others’ speech, children and older talkers need to exert more vocal effort to ensure successful communication. I will discuss these findings within the communication effort framework.

SRPP: The phonology of Zwara Berber and its silent stress

Berber is a typological treasure chest. While it has a conventional Afro-Asiatic syllable structure, the distribution of its segments over syllable positions is striking. In this talk, I will illustrate this on the basis of the dialect spoken in Zwara (Zuwarah), a coastal city in western Libya. While vowels only occupy syllable peaks, consonants appear in both C-positions and V-positions without exception. Since both /j w/ and /i u/ exist, this means that vowels and glides contrast in syllable peaks. In addition, the dialect has geminate versions of all consonants. While always requiring a mora, geminates can appear in nearly all positions in syllable structure. Notably, they cannot appear in an onset-plus-peak position, making the beginning of the syllable rime an unbridgeable boundary for them.

A frequent location for geminates is the rime-plus-onset location, whereby the rime favours a vowelless first half of the geminate. Rimes do not contrast /əC/ and /C/; the realization of [ə] depends on the type of C, with voiceless obstruents typically lacking a preceding schwa. These first halves of geminates frequently occur in the stressed syllable, so that many words have ‘silent stress’. For instance, in /a.ˈws.su/ ‘humid period’ and /m.ˈmˁχ.χrˁ/ ‘late’ the stressed syllables are /ws/ and /mˁχ/ respectively, where the obstruent is the syllable peak. The voicelessness of the syllable peak and the following onset will interrupt the f0 contour at a point where a pitch peak is expected.

A question that arises is whether speakers apply ‘segmental intonation’, a shift in the spectrum of voiceless fricatives and plosive bursts detected by Oliver Niebuhr in German. To investigate this, we recorded four repetitions of 12 words with /χ f s ʃ k q/ in stressed position in four carrier sentences intended to elicit three intonation conditions and a stress shift condition. In addition, we added one word as a control condition for word-final stressed /s/, a situation in which Niebuhr found ‘segmental intonation’ effects between declaratives and interrogatives. Each of the friction portions in words with /χ f s ʃ/ was segmented into three equal parts (4 repetitions × 9 words × 3 friction portions × 4 conditions = 432 friction segments), while the bursts of /k q/ were treated as single portions (4 repetitions × 4 words × 4 conditions = 64), giving 496 friction/burst portions in all. These were rated for perceived pitch in an AX task using a 7-point scale by five judges in a pilot experiment. Results show segmental intonation effects, indicating that these spectral shifts are not a purely automatic consequence of articulation.
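As a sanity check on the stimulus counts above, the factorial design can be enumerated programmatically (a minimal sketch; the variable names are illustrative labels, not the authors’ terminology):

```python
# Enumerate the segmented portions in the recording design described above.
REPETITIONS = 4
CONDITIONS = 4  # three intonation conditions + one stress-shift condition

# 9 words contain the fricatives /χ f s ʃ/; each friction phase is cut into 3 parts.
fricative_words, friction_parts = 9, 3
friction_segments = REPETITIONS * fricative_words * friction_parts * CONDITIONS

# 4 words contain the plosives /k q/; each token contributes one burst portion.
plosive_words = 4
burst_segments = REPETITIONS * plosive_words * CONDITIONS

total = friction_segments + burst_segments
print(friction_segments, burst_segments, total)  # 432 64 496
```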

SRPP: Final obstruent voicing in Lakota: Phonetic evidence and phonological implications

Juliette Blevins (The Graduate Center, CUNY), Ander Egurtzegi (CNRS – IKER UMR-5478) & J. Ullrich (The Language Conservancy)

Final obstruent devoicing is a common sound pattern in the world’s languages, found in languages as diverse as Catalan, Dutch, Lithuanian, and Zaza (Blevins 2006; Iverson & Salmons 2011). This sound pattern constitutes a clear case of parallel or convergent phonological evolution. In contrast, final obstruent voicing is claimed to be extremely rare, with some approaches explicitly predicting its non-existence (Kiparsky 2006, 2008). Phonetic-historical accounts, on the other hand, explain skewed patterns of voicing in terms of common phonetically-based devoicing tendencies, allowing for rare cases of final-obstruent voicing under special conditions (Blevins 2006, 2015).

In this talk, phonetic and phonological evidence is offered for final-obstruent voicing in Lakota, an indigenous Siouan language of the Great Plains of North America. In Lakota, oral stops /p/, /t/, and /k/ are regularly pronounced as [b], [l], and [ɡ] in word- and syllable-final position when phrase-final devoicing and pre-obstruent devoicing do not occur (e.g. tópa ‘four’, tób ‘four (cont.)’, tóbtopa ‘by fours’). We first present a phonetic study that tests whether /p/ and /k/ show phonetic voicing in syllable-final position as well as properties of oral stops, in order to rule out interpretations of voicing as a secondary feature of lenition. Then, we offer a historical account of this unlikely sound pattern of final stop voicing, and an explanation for its rarity: final voicing is a consequence of an earlier, conditioned intervocalic voicing of *p, *t, *k to [b], [d], [ɡ], preserved only when the final vowel was devoiced or lost. Under this account, the historical origins for final stop voicing are tied to retiming of the final vowel gesture.

SRPP: Studying speech rate cross-linguistically: Resource building and case studies on final lengthening and pause probabilities

In the first part of this talk, I will introduce DoReCo, an initiative to create a multilingual reference corpus, consisting of at least 10,000 words for at least 50 languages. DoReCo extracts from fieldwork-based language documentation collections narrative texts that are already transcribed, translated into a major language, and morphologically analyzed. Within DoReCo, we convert these data to a common file format and time-align them at the phoneme level using the MAUS software. In the second part of this talk, I will present two cross-linguistic studies on a subset of this corpus: One study investigates word lengthening as a function of utterance-final position. Another, still ongoing study investigates pause probabilities before nouns vs. verbs and relates findings to the fact that, typologically, there are fewer prefixes on nouns vs. verbs.
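The pause-probability study described above amounts to conditioning the presence of a pre-word pause on the word’s part of speech. The sketch below illustrates the computation over a toy token list (the tuple format and the example tokens are hypothetical, not DoReCo’s actual file schema):

```python
# Sketch: pause probability before nouns vs. verbs from time-aligned annotations.
# Each token is (word, part_of_speech, preceded_by_pause); in real aligned data
# "preceded_by_pause" would be derived from a silent interval before word onset.
from collections import Counter

tokens = [
    ("the", "DET", False), ("hunter", "NOUN", True), ("saw", "VERB", False),
    ("a", "DET", False), ("bird", "NOUN", True), ("and", "CONJ", False),
    ("left", "VERB", True),
]

pauses, totals = Counter(), Counter()
for _, pos, has_pause in tokens:
    totals[pos] += 1
    pauses[pos] += has_pause  # booleans sum as 0/1

for pos in ("NOUN", "VERB"):
    print(pos, pauses[pos] / totals[pos])  # NOUN 1.0, VERB 0.5 on this toy data
```

On real corpus data one would of course aggregate per language and test the noun/verb difference statistically rather than compare raw proportions.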