SRPP: Using MRI to examine the posterior place of articulation in Semitic emphatics: Pharyngealization in Arabic and ejectives in Tigrinya

(together with Marissa Barlaz, Ryan Shosted, Sharon Rose, Zhi-Pei Liang, Brad Sutton)

The phonemic inventories of many languages in the Semitic family (e.g. Arabic, Soqotri, Jibbali, Mehri) include a set of emphatic sounds (Versteegh, 2001). These are doubly articulated speech sounds that have a primary oral constriction and a simultaneous secondary back constriction in the (velo)pharyngeal and/or glottal region. This talk will investigate the articulatory configuration associated with the secondary back constriction in two Semitic languages: Arabic and Tigrinya. The emphatics of Arabic are pharyngealized, and thus have a secondary constriction resulting from a retracted tongue root. In contrast, the secondary articulation of the Tigrinya ejectives is produced with a glottalic egressive airstream: the vocal folds are tightly adducted and the larynx is raised in a piston-like motion, thereby constricting the pharynx and increasing the pressure of the supralaryngeal air trapped in the cavity between the vocal folds and the oral constriction (Ladefoged, 1993). Upon release of the oral constriction, “the entrapped high-pressure air will momentarily burst forth in a short sharp explosion” (Catford, 2001:22). In Semitic cognates, the pharyngealized consonants of Arabic correspond to ejective consonants in languages spoken in the southern part of the Arabian Peninsula and the Horn of Africa (Bellem & Watson, 2014; Shosted & Rose, 2011). An example is the correspondence between pharyngealized /sˤ/ in Arabic /ħisˤaːr/ (enclosure) and ejective /s’/ in Tigrinya /ħas’ur/ (fence, enclosure).
In this study, we use ultra-fast Magnetic Resonance Imaging (MRI) based on the implementation of partial-separability models (Fu et al., 2015), aided by automatic image processing techniques, to examine and compare the back articulation of Arabic pharyngealized /sˤ/ and Tigrinya ejective /k’/. With this non-invasive method, it is possible to observe, in real time and with high spatio-temporal resolution, the articulatory configuration and gestures that occur in the relatively inaccessible pharyngeal region of interest in this study. The results show clear retraction of the tongue root and dorsum in the case of the pharyngealized consonants of Arabic, resulting in a more constricted pharynx. This is also observed in the ejective consonants of Tigrinya, in addition to forward expansion of the upper posterior pharyngeal wall. The pharynx is clearly more constricted in the latter type, reducing the volume of the supralaryngeal cavity behind the oral constriction in order to achieve the high pressure required for producing the sharp burst of ejectives.
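The pressure mechanism described above can be made concrete with Boyle's law: at constant temperature, reducing the volume of the sealed cavity between the glottis and the oral constriction raises the pressure of the trapped air proportionally. A minimal numeric sketch; the pressure and volume figures are illustrative assumptions, not measurements from the study:

```python
def pressure_after_compression(p0, v0, v1):
    """Boyle's law for a sealed, isothermal cavity:
    p0 * v0 = p1 * v1  ->  p1 = p0 * v0 / v1."""
    return p0 * v0 / v1

# Hypothetical figures: air at atmospheric pressure (101.3 kPa) in a
# 20 ml cavity compressed to 16 ml by larynx raising and pharyngeal
# constriction before the ejective release.
print(pressure_after_compression(101.3, 20.0, 16.0))  # ≈ 126.6 kPa
```

Even this modest volume reduction yields a pressure well above atmospheric, which is released as the sharp burst characteristic of ejectives.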

SRPP: Learning the intonation of foreign languages using a gesture-controlled vocal synthesizer

This talk explores how vocal synthesis controlled in real time by hand gestures (chironomy) can be used by non-native speakers to practise the intonation of a foreign language. Such practice addresses three sources of difficulty in intonation learning. First, it can train the ear to perceive unfamiliar features in speech by presenting them through the visual and kinesthetic modalities. Second, controlling pronunciation with hand gestures bypasses ingrained patterns in the natural voice that are difficult to correct. Finally, vocal synthesis enables a learner to focus on the suprasegmental level without being preoccupied with fine phonetic detail on the segmental level. I present findings from two experiments, a pilot with non-native speakers of French and a study with francophone learners of English, discussing lessons learned and perspectives for future directions.

SRPP: Real-time Magnetic Resonance Imaging for Phonetic Research: Current Studies

The presentation will illustrate the use of real-time MRI to analyse articulatory manoeuvres of speech organs whose study has hitherto generally been restricted to very few speakers, namely velum movement and vertical larynx movement. The analyses are taken from a large corpus of German data from over 30 speakers. In addition, we will emphasize the flexibility of RT-MRI for extending consideration to additional articulators where required.
The overall aim of the analysis of velum kinematics is to improve our understanding of the phonetic forces that can lead diachronically to contrastive vowel nasalization. Specifically, we will look firstly at quite subtle temporal phenomena, namely differences in anticipatory nasal coarticulation related to voicing of the post-nasal consonant in VNC sequences. Secondly, we will consider spatial effects related to the influence of different vowel categories on the amount of velum opening in both nasal and non-nasal consonantal contexts.
The second part of the talk will consider larynx height in vowel production. We exploit the properties of the German vowel system in an attempt to give a more balanced picture of whether larynx height is more closely related to vowel height or to rounding. Previous results in the literature are quite messy (particularly for vowel height), which in turn is related to the pervasive problem of inter-speaker variability, particularly pronounced for larynx height. We look briefly at whether the amount of vowel-specific modulation of larynx height can be related to speaker anatomy and speaker preferences for lip protrusion.

References
Carignan, C., Coretta, S., Frahm, J., Harrington, J., Hoole, P., Joseph, A., Kunay, E., & Voit, D. (2021). Planting the Seed for Sound Change: Evidence from Real-Time MRI of Velum Kinematics in German. Language, 97(2), 333-364.
Hoole, P., Coretta, S., Carignan, C., Kunay, E., Joseph, A., Voit, D., & Frahm, J. (2020). Control of larynx height in vowel production revisited: A real-time MRI study. International Seminar on Speech Production. https://issp2020.yale.edu/S08/hoole_08_03_033_poster.pdf
Kunay, E. (2021). Vowel nasalization in German: a real-time MRI study. Dissertation, LMU München, Faculty for Languages and Literatures. DOI: 10.5282/edoc.29340. URN: urn:nbn:de:bvb:19-293408

SRPP: Clara Ponchard and Amélie Elmerich

The contribution of automatic processing to discriminating normal and pathological prosodic variation
Clara Ponchard, Laboratoire de Phonétique et Phonologie

Parkinson’s disease is a neurodegenerative disease characterized by the destruction of the dopaminergic neurons involved in motor control. The loss of dopaminergic neurons may reach 50% by the time of clinical diagnosis and increases rapidly for up to 4 years after diagnosis. Early, objective diagnostic markers are therefore urgently needed. Vocal deficits are generally among the first functions disrupted in Parkinson’s disease; they can be observed as early as 5 years before diagnosis, well before the other motor systems begin to show the effects of decreasing dopamine.
This fact, combined with the rapid technical improvements in acoustic analysis over recent decades, has made acoustic analysis an attractive tool for clinical assessment, evaluation of treatment efficacy, remote monitoring of disease progression, and early diagnosis. Kreiman et al. (2020) compiled an inventory of studies of the vocal markers used to characterize the voice in experimental work and found that the parameters used to assess the voice in Parkinson’s disease have remained practically unchanged since at least the late 1980s.
In this study, we propose a novel method for analyzing the speech disorders associated with Parkinson’s disease through the observation of aerodynamic phenomena. To this end, we conduct a multi-parametric study to identify which vocal markers make it possible to locate the pathological along the continuum between normal and pathological speech.
For the aerodynamic component, we study intra-oral pressure and estimated subglottal pressure; for the acoustic component, fundamental frequency and production duration. Using automatic processing, we have automated the analysis and extraction of these descriptors in order to handle a large amount of data and to exploit machine-learning methods to identify patterns in the data that do not conform to the expected behavior.
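The final step, flagging productions that do not conform to the expected behavior, can be sketched with a simple z-score rule. This is a minimal illustration on invented data, not the pipeline used in the study:

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Mark values whose z-score exceeds the threshold as anomalous."""
    m, s = mean(values), stdev(values)
    return [abs(v - m) / s > threshold for v in values]

# Hypothetical estimated subglottal pressure values (hPa), one per token;
# the sixth token deviates strongly from the speaker's usual behavior.
pressures = [7.9, 8.1, 8.0, 7.8, 8.2, 3.1, 8.0, 7.9]
print(flag_outliers(pressures))
```

Only the sixth token is flagged here; in practice, multivariate methods over all descriptors (pressure, f0, duration) would replace this single-feature rule.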

Kreiman, J., & Gerratt, B. (2020). Acoustic Analysis and Voice Quality in Parkinson Disease. In Automatic Assessment of Parkinsonian Speech, Springer International Publishing, pp. 1-23.

———————————

Sinonasal polyposis: aerodynamic disturbance and quality-of-life measures. Case study and perspectives.
Amélie Elmerich, Laboratoire de Phonétique et Phonologie

I will first present the study that was the starting point of my thesis. Certain pathologies, such as primary sinonasal polyposis (PNS), disturb the resonance and quality of speech sounds by obstructing the sinonasal cavities (Hong & Jung, 1997). When surgery is performed, the communication between the sinuses and the nasal cavities, and their anatomy, are modified, which disturbs the passage of air and, ultimately, nasal resonance. Few studies (Borel, 2005; Giron & Mas, 2016; Elmerich, 2019) have looked at the impact of PNS on speech, or on patients’ quality of life. The objective of this study was to evaluate the impact of PNS and its surgery on the quality of life of 4 patients and to compare these results with the aerodynamic data acquired with the EVA2™ workstation (Teston & Galindo, 1995). The results obtained with the VHI questionnaires showed an improvement in quality of life after surgery for 3 patients. The aerodynamic data of one patient were related to these results. I will then present the perspectives of this study and the first results of my thesis: 1) the multi-parametric dimension (articulatory, acoustic, aerodynamic, perceptual) and 2) the design of a new aerodynamic recording device facilitating the link between the different approaches.

SRPP: Modeling the Phonetic and Linguistic Continuums: an Engineer’s Perspective

Nowadays, machine learning is everywhere, promising a new golden age in which “artificial intelligence” will effortlessly solve all kinds of problems, such as autonomous driving, developing new vaccines, and talking and behaving just like normal human beings. Beyond this naive and oversimplistic vision stand some actual facts: the unprecedented collection of data, combined with advances in mathematical learning theory and ever-increasing computational resources, is radically changing the methodological approaches in various research areas. The field of linguistics is one of them: with new speech, text, and other types of communication in many languages recorded every day, it is now possible to study languages empirically from a data-centric perspective.
However, data is not sufficient by itself: one also needs to design theoretical and practical models of speech and languages. This remains a challenging endeavor that is far from over. In this talk, I will explore the evolution of the modeling of the speech signal from an engineer’s perspective: why a model was necessary in the first place, what aspect of the data was explored, and how the field changed with the advent of deep learning techniques. I will then explore how this “speech engineering legacy” can be exploited for the study of languages. In particular, I will focus on the problem of modeling continuums: how can we model the phonetic and linguistic continuums, and can we use these tools to establish a notion of “distance” between sounds and between languages?

SRPP: The Phonotactic Knowledge of Mandarin Chinese

Native speakers of a language have strong intuitions not only about what the existing words of their language are, but also about which novel forms are phonologically possible or impossible. These intuitions are assumed to be guided by phonotactic knowledge of the language. Many factors may play a role in this phonotactic knowledge: apart from the lexical statistics of a language, grammatical phonotactic constraints can also shape it. First, using a non-word acceptability judgement experiment, I found that Mandarin native speakers show gradient acceptability among various types of missing syllables. Missing syllables that violate principled phonotactic constraints (in this case, the Obligatory Contour Principle) received lower ratings than those that do not, and this effect of phonotactic constraints is independent of lexical statistics. Second, in a lexical decision experiment, a more ‘online’ paradigm, I found that Mandarin speakers rejected the OCP-violating missing syllables faster than the non-violating ones. Results from these two experiments suggest that grammatical constraints are part of Mandarin speakers’ phonotactic knowledge. Third, the observed phonotactic judgements were then modelled by a maximum entropy phonotactic grammar (Hayes & Wilson 2008). The well-formedness predictions offered by the output grammar are overall a good reflection of speakers’ acceptability ratings obtained experimentally, but there are also a number of systematic mismatches. These mismatches indicate that phonological learning is a biased process: some phonotactic patterns are easier to learn and thus have stronger effects on non-word judgement, whereas other patterns are harder to learn and have limited effects. The biases introduced are (i) a phonetic naturalness bias, (ii) an allophony bias and (iii) a suprasegmental bias. Adding these biases improved the performance of the phonotactic grammar.
This indicates that phonotactic knowledge can largely be determined from the lexicon, but grammatical constraints and multiple biases also have effects.
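The maximum-entropy grammar mentioned above (Hayes & Wilson 2008) assigns each form a harmony score, the weighted sum of its constraint violations, and a well-formedness score proportional to exp(-harmony). A toy sketch; the constraints, their definitions, and the weights below are invented for illustration and are not the study's trained grammar:

```python
import math

# Illustrative constraints: each maps a form to a violation count.
constraints = {
    # Toy OCP: penalize adjacent identical segments.
    "OCP": lambda f: sum(a == b for a, b in zip(f, f[1:])),
    # Toy cluster constraint: penalize two initial non-vowels.
    "*ComplexOnset": lambda f: int(len(f) > 1 and f[0] not in "aeiou"
                                   and f[1] not in "aeiou"),
}
weights = {"OCP": 3.0, "*ComplexOnset": 1.5}

def harmony(form):
    """Weighted sum of constraint violations (higher = worse)."""
    return sum(weights[name] * viol(form) for name, viol in constraints.items())

def maxent_score(form):
    """Unnormalized MaxEnt well-formedness: exp(-harmony)."""
    return math.exp(-harmony(form))

# An OCP-violating form scores lower than a non-violating one:
print(maxent_score("paa") < maxent_score("pai"))  # True
```

Learning biases of the kind discussed above can be modelled in this setting as priors on the constraint weights, making some constraints easier to weight highly than others.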

SRPP: Gender, femininity and the voice

Fine phonetic detail carries social information. Intra- and inter-individual variation in speech patterns is a source of information used to index and infer numerous facets of a speaker’s social characteristics (e.g., age, gender, sexual orientation, ethnicity, personality characteristics). In my talk, I will focus on phonetic cues of femininity/masculinity used in production and perception. What makes a voice sound feminine? Can listeners infer the level of self-ascribed femininity of a speaker from his or her voice? Results from our study with German participants, combining perceived and self-rated femininity ratings, point to a common ground between speakers and listeners in which they negotiate the social space of gender through speech. One reason suggested to explain this finding is that voice characteristics indicate the underlying hormonal status (e.g. testosterone level) of a speaker and are thus relevant in a mating context. The role of hormones as a source of variation in fine phonetic detail is discussed by looking at the relationship between pregnancy and voice, and by giving insights into an ongoing project investigating hormone levels, intra- and inter-individual variation, and perceived attractiveness.

SRPP: On Tashlhit Root Structure and its Implications for the Organization of the Lexicon

The present work is an attempt to investigate the notion of roots in Amazigh (Berber), more particularly in Tashlhit, from a theoretical and a psycholinguistic perspective, contributing to the debate between two views of morphological theory: the root-based and the word-based. The study aims in particular to explore whether the root is a significant morphological unit in the Tashlhit lexicon, on the one hand, and to provide further arguments against the exclusive consonantality of roots in Tashlhit, on the other. With this end in view, we investigated the lexical properties of root structure in Tashlhit by distinguishing between two types of roots, the vocalic and the consonantal. At the theoretical level, the analysis is carried out under the premises of Optimality Theory (Prince and Smolensky, 1993/2004; McCarthy and Prince, 1993, 1995). Facts from the verbal and nominal morphology of the language are presented to account for its linguistic system through constraint ranking. At the psycholinguistic level, we followed the assumption that linguistic phenomena are not exempt from extralinguistic factors (Berent & Shimron, 1997, 2003; Frisch & Zawaydeh, 2001; Prunet, Béland & Idrissi, 2000). More specifically, we discussed data from auditory supraliminal priming experiments based on measuring participants’ reaction times. The results of our theoretical and empirical analyses show that the root is an essential morphemic unit which plays an important role in understanding language processing. We showed that roots in Tashlhit, similarly to semantic features, have some psycholinguistic reality and hence have significant implications for the organization of the Tashlhit lexicon. Purely phonological properties, however, did not facilitate lexical access, leading to the conclusion that phonology plays no role in word recognition processes in Tashlhit.
In addition, we provided arguments in favor of the coexistence of both consonantal and vocalic roots in the Tashlhit lexicon.

SRPP: Assessing pronunciation in an L2: can we avoid referring to an L1 model?

Assessing pronunciation in an L2 is a problematic task, whether it is performed by humans or by machines. Learners’ pronunciation can be evaluated by its resemblance to native pronunciation (nativelikeness, or authenticity of pronunciation) or by its degree of intelligibility/comprehensibility. Obviously, evaluating pronunciation in terms of authenticity implies reference to a native model assumed to represent the learners’ target (for example, RP or GA English, Parisian French, etc.); this is debatable in the case of language courses in a foreign country, and even more so in the context of ELF (English as a Lingua Franca) and International English. In this presentation, we explore the possibility of assessing L2 pronunciation without reference to a native model (that is, intrinsically), in the spirit of intelligibility and comprehensibility. We test whether it is possible to assess L2 pronunciation by measuring the distance between phonetic realizations of phonological categories in the L2, under the assumption that a greater distance leads to better intelligibility and comprehensibility. This idea builds on recent methodologies from sociophonetics measuring the degree of overlap between realizations of different vowel categories. We present results obtained on several spoken corpora of learners of L2 English and L2 French, and we compare them with the results of comprehensibility tests carried out with native listeners, as well as with traditional approaches relying on a native model.
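One way to operationalize the distance between realizations of two phonological categories, assuming for illustration that each category is a set of (F1, F2) measurements (the tokens below are invented, not corpus data), is a Euclidean distance between category means scaled by the pooled standard deviation:

```python
from statistics import mean, stdev

def category_distance(cat_a, cat_b):
    """Distance between the mean F1/F2 of two vowel categories, each
    dimension scaled by the pooled standard deviation; a rough
    separability measure (larger = more distinct categories)."""
    dist2 = 0.0
    for dim in range(2):  # 0 = F1, 1 = F2
        a = [token[dim] for token in cat_a]
        b = [token[dim] for token in cat_b]
        pooled_sd = (stdev(a) + stdev(b)) / 2
        dist2 += ((mean(a) - mean(b)) / pooled_sd) ** 2
    return dist2 ** 0.5

# Hypothetical (F1, F2) tokens in Hz for one learner's /i/ and /ɪ/:
i_tokens  = [(280, 2250), (300, 2300), (290, 2280)]
ih_tokens = [(400, 2000), (420, 1950), (410, 1980)]
print(category_distance(i_tokens, ih_tokens))
```

Overlap-based measures from sociophonetics, such as the Pillai score, refine the same idea by taking the full distributions into account rather than only the means.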

SRPP: Does sonority have a phonological basis?

Sonority (or segmental strength) is a central concept in phonology. Syllable phonotactics in particular, i.e. the distribution of consonants and vowels within syllables, is explained elegantly, though incompletely, by the Sonority Hierarchy and the Sonority Sequencing Principle (Selkirk 1984, Zec 1988, Clements 1990, a.o.).

Challenges to Sonority Theory, mostly connected to consonant sequences, have been known for a while and accounted for with theoretical patches such as the assumption of extrasyllabic segment positions. Sibilants can occur as the first member of syllable-initial consonant clusters, resulting in sonority reversals or plateaus if they are followed by other obstruents. In addition to such plateaus, sonority reversals involving sonorants preceding obstruents, as found in Slavic languages, for instance, also violate the Sonority Sequencing Principle. The imperative of high sonority in the nucleus already received a blow from Bell’s (1978) study on syllabic consonants. Zec (1995) noted that coda inventories can be discontinuous on the sonority scale; e.g., Kiowa bans only fricatives from the coda and allows the other major consonant classes, including stops. She concluded that codas are also subject to general markedness constraints, e.g., *[+continuant])$ (a ban on [+continuant] segments in the coda).

In this talk, we will look at sonority thresholds for syllable nuclei and codas and see that the assumption that admission of a segment class at a certain stratum on the sonority hierarchy implies well-formedness of all higher-sonority classes in these positions (e.g., Zec 2007 on syllabicity) is not necessarily warranted. Some languages present with syllabic sibilants (e.g., varieties of Chinese; Duanmu 2007, Shao 2020) but do not allow other, higher-sonority consonants as syllable nuclei. Others only accept syllabic nasals, or nasals and sibilants, to the exclusion of the higher-sonority liquids. The Bolivian language Chipaya also displays only sibilant nuclei (in addition to vocalic nuclei), as well as complex onsets in which the inner consonant is a velar or post-velar fricative, while sonorant consonants are not attested in this position. This restriction on complex onsets is predicted neither by Minimum Sonority Distance (Harris 1982, Selkirk 1984) nor by Sonority Dispersion (Clements 1990).

On the basis of these and other data, I argue that sonority is only epiphenomenal and that the real driving forces creating the impression of sonority sequencing are general system markedness as well as positional markedness. Building on a proposal by Krämer & Zec (2020, in prep.), cross-linguistic as well as language-internal sonority inconsistencies, as attested for nasals, liquids and sibilants, are explained by the assumption of language-specific variation in the use and specification of the features [±continuant] and [±strident]. I propose that Sonority Sequencing is a side effect of syntagmatic contrast maximization, a constraint that might ultimately be derived from the Obligatory Contour Principle (Leben 1973). Adjacent segments within syllables should be maximally different in terms of major-class and manner feature specifications. In a sequence of two segments, the two maximally different ones in terms of major-class features are a stop consonant and a vowel. In a demisyllable with two consonants preceding the vowel, the internal consonant is most suitable if it is maximally different from both the consonant to its left and the vowel to its right, resulting in the dispersion effect noted by Clements (1990). The hypothesis that apparent Sonority Sequencing is a side effect of contrast maximization and sonority-independent markedness is supported not only by SSP violations and sonority-threshold paradoxes but also by observations on cross-linguistic variation in consonant inventories and some trends in L1 acquisition.
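The reversals and plateaus discussed above can be checked mechanically. The sketch below counts Sonority Sequencing violations in an onset cluster under a conventional textbook-style sonority scale; the scale values are illustrative assumptions, not the proposal made in the talk:

```python
# Illustrative sonority scale (higher = more sonorous); a common
# textbook-style assignment: stops < fricatives < nasals < liquids < glides.
SONORITY = {"p": 1, "t": 1, "k": 1, "b": 1, "d": 1, "g": 1,
            "f": 2, "s": 2, "z": 2,
            "m": 3, "n": 3,
            "l": 4, "r": 4,
            "j": 5, "w": 5}

def ssp_violations(onset):
    """Count adjacent pairs in an onset where sonority fails to rise
    toward the nucleus (plateaus count as violations)."""
    son = [SONORITY[c] for c in onset]
    return sum(1 for a, b in zip(son, son[1:]) if a >= b)

print(ssp_violations("pr"))  # 0: rising sonority, well-formed
print(ssp_violations("st"))  # 1: sibilant-obstruent plateau/reversal
print(ssp_violations("rt"))  # 1: sonorant-obstruent reversal (Slavic-type)
```

The talk's point is precisely that attested clusters like the last two make such a checker leak, which is why contrast maximization and markedness, rather than a fixed scale, are proposed as the driving forces.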