The role of prosody in early speech processing and language acquisition

This talk will present a series of near-infrared spectroscopy studies investigating how newborn infants process speech prosody. Specifically, it will show that newborns have already gained experience with the prosody of their native language through prenatal exposure, and that they use prosody as a privileged cue that overrides other aspects of language structure, such as word order.

Articulation, acoustics, and perception : Evidence from cross-linguistic perception of onset consonant clusters

Cross-language research on consonant cluster production has shown that consonant clusters in different languages are produced with different degrees of articulatory timing lag. For instance, German onset consonant clusters are produced with relatively shorter inter-consonant lag than Georgian ones. The present study examines perceptual sensitivity to these cross-linguistic timing differences.

Native listeners of Georgian, German, French, and Japanese are tested on an AXB similarity judgment task using stimuli that include consonant clusters produced by German and Georgian speakers. Stimuli are /bla, gla, gna/ syllables recorded along with articulatory (EMA) data. Short-lag German tokens and long-lag Georgian tokens are selected as A and B, with Xs of varying degrees of lag chosen from either Georgian or German recordings. Results show that all four groups of listeners are sensitive to the cross-linguistic differences in articulatory timing lag : when the timing lag of X is closer to A (or B), participants are more likely to choose A (or B, respectively) [Georgian : β=0.69, p<0.001 ; German : β=0.74, p<0.001 ; French : β=0.44, p<0.001 ; Japanese : β=0.48, p<0.01]. This finding suggests that adult listeners are capable of discerning non-native sub-phonemic details regardless of their native phonotactics. Effects of sub-phonemic details on similarity judgments by the four listener groups are also investigated. These details include different measures of articulatory lag and acoustic properties that are known to be related to inter-gestural timing within clusters (e.g., presence or duration of vocalic release). Implications of possible task effects will be discussed.

Dissociations in the development of articulatory timing : Implications for representation and control in speech production

Articulatory timing refers to the temporal coordination of speech articulators to achieve motor goals in sequence. Given this definition, timing can be thought of either as a motor speech skill or as a language behavior. Whereas stable coordination patterns emerge with neuromotor maturation and speech motor practice, goal sequencing emerges with the acquisition of language. Although this neat division within timing behavior that references « speech » versus « language » is compatible with a traditional performance/competence distinction, it is at odds with the interactionist perspective adopted in behavior-focused research on the acquisition of speech-language. Nonetheless, the division provides a good characterization of findings from two developmental studies we conducted recently. In particular, the findings from one study suggest a dissociation in the representation and execution of temporal patterns within words. The findings from the other suggest differential development of schwa reduction and coarticulation in DET+N sequences. In this talk, I will present both studies and endeavor to resolve the seeming contradiction between these findings and an interactionist perspective by incorporating them into a developmentally sensitive model of speech production.

SRPP : Multiple speakers

Speaker 1 : Christelle Exare (LPP, UMR7018, CNRS/Sorbonne Nouvelle)

Title : Intrusive tokens of aspiration in French learners’ L2 English

Abstract :
This dissertation describes salient yet variable intrusive tokens of aspiration, often represented by /h/ or [h], in French learners’ L2 English productions (e.g., I hate pasta instead of I ate pasta). The phoneme /h/ is weak in Indo-European languages. Historically, the consonant has undergone progressive lenition and exhibits strong intralinguistic and extralinguistic (diachronic, dialectal and stylistic) variation. The glottal fricative /h/ at English word onsets is characterised by i) an open glottis and ii) the supraglottal configuration of the following vowel. In the present study, the onsets of English words are analysed in three types of data : i) a text read by 8 native English speakers and 10 French learners of English, ii) spontaneous speech elicited from 25 French learners and iii) a perception test taken by 30 French-speaking students. The frequency of intrusive tokens of aspiration at L2 English word onsets shows high inter- and intra-speaker variability. Importantly, however, they only surface i) in absolute initial position or ii) after a vocalic sound. A pause, glottalisation and aspiration are three processes that increase the time span between two vowels in hiatus. Glottalisation and aspiration both correspond to glottal tension. Illicit tokens of aspiration can be regarded as instances of hypercorrection, which may result from i) incomplete assimilation of the English [ʔ] vs. [h] phonetic contrast, ii) optional phonological repair of *#V, iii) a glottal constriction gesture that fails to reach its target (i.e., inchoative glottalisation) and an intrusive gesture of glottal opening.
Phonetic corrective feedback in L2 learning is proposed. It aims to raise the learner’s awareness of i) glottal control for aspiration, glottalisation, and continuous modal voicing across word boundaries and ii) some syllabic specificities of French and English that make word boundaries potential stumbling blocks in French learners’ L2 English.

Keywords : phonetics, L2 English, intrusion, [h], aspiration, glottis, hiatus, boundary, initial position


Speaker 2 : Simon Landron (LPP, UMR7018, CNRS/Sorbonne Nouvelle)

Title : The voicing contrast in French oral stops as produced by Taiwanese speakers

Abstract :
This dissertation deals with the acquisition of French voiceless stops /p t k/ and voiced stops /b d g/ by 11 female Taiwanese learners of L2 French at intermediate to advanced level. The linguistic situation in Taiwan is described as diglossic : most speakers speak two languages, mainly Mandarin Chinese and Taiwanese. Mandarin Chinese has the stops /p t k pʰ tʰ kʰ/, while Taiwanese has /b g p t k pʰ tʰ kʰ/. An acoustic analysis of CVCVCVC logatomes, where C = /b d g p t k/ and V = /a i u/, shows considerable heterogeneity among speakers : the cues used by native French speakers to distinguish voiceless from voiced stops are used only inconsistently by the non-native speakers. The influence of Mandarin Chinese is noted. A perception test shows poorer discrimination of the consonant pairs /b p/, /d t/ and /g k/ in CV syllables when V = /a/ than when V = /i u/. The results suggest that these non-native listeners tend, first, to discriminate French stops better when the VOT of the voiceless stops is longer and, second, to ignore the negative VOT of voiced stops. In perception, the cues used in Mandarin Chinese to discriminate between aspirated and unaspirated stops thus also seem to be used in French. No sign of influence from Taiwanese was found, although that language does have a voicing contrast.

Keywords : phonetics, acoustics, perception, pronunciation, voicing, stop consonant, VOT, v-ratio, French, Mandarin Chinese, Taiwanese, French as a Foreign Language (FLE)

SRPP : Multiple speakers

Speaker 1 : Alejandrina Cristia (LSCP, CNRS/ENS)

Title : How little is enough ? Input to early language across cultures

Abstract :
The last 50 years have witnessed a surge of studies on early language acquisition, with much experimental research studying the perception of verbal input in industrialized societies. This evidence suggests that early forms of language-specific phonological, lexical, and syntactic processing can appear even in the first two years of life. Moreover, some have argued that these early experiences both provide the foundations of abstract language structures evident later on, and set the « rhythm » of later speech perception, such that children in rich linguistic environments learn language (at least phonology and the lexicon) faster than their less fortunate peers. In the case of lexical processing, for instance, a 5x difference in number of words spoken to the child would lead to a doubling of vocabulary size by age 2, which is akin to an astounding 6-month delay. This has led to public campaigns, saliently the « Thirty million word gap » in the USA, aimed at encouraging parents to speak more to their young children. Such conclusions are based mostly on the study of individual variation that is partially correlated with socioeconomic status in populations that can be described as WEIRD (Western, Educated, Industrialized, Rich, and Democratic), an acronym designed to convey the fact that such populations are not representative of humankind in synchronic or diachronic terms. In this talk, I will provide data demonstrating that differences :
a) within USA samples varying in maternal education generalize to some but not other WEIRD countries (e.g., UK)
b) between WEIRD samples from different countries (i.e., Argentina versus USA) can be as large as those in (a)
c) between WEIRD and non-WEIRD samples (e.g., the hunter-farmer Tsimane population in the Bolivian Amazon) are considerably larger (in the order of 10x) than those in (a)
It follows from (a-c) that one should observe systematic differences among WEIRD countries, and even greater ones between WEIRD and traditional societies in terms of, at least, phonological and lexical development. It also follows logically from the idea that rate of learning is fixed by input quantity that populations in traditional societies with no formal education should speak languages with smaller vocabularies than WEIRD societies. I discuss the challenges in assessing these two predictions, and the scientific and societal implications that would follow from ratifying or ruling out these predictions.


Speaker 2 : Leo Wetzels and Geraldo Faria

Title : Bakairi and the Feature ‘Voice’

Abstract :
It has been claimed by many that the feature [-voice] plays no role in the early (or lexical) phonology of any language. Statements of this nature can be found in Cho (1990a,b), Lombardi (1991, 1996), Iverson & Salmons (1996), among many others. The feature [-voice] is said to be ‘unmarked’, or, almost equivalently, is regarded as a ‘default’ feature in the phonological grammar of every language. As such, the role of [-voice] should be confined to the phonetic component, or, at the very most, it should be active only in the postlexical component (cf. Lombardi 1996). One does not, therefore, expect to find a language where the feature [-voice] must be specified at the level of lexical representation, or participates in lexical rules of any kind, including rules of assimilation and dissimilation.
In this presentation we will illustrate the complex and intriguing lexical phonology of the feature [voiceless] in Bakairi, an indigenous language of Brazil.

Analysing confusion matrices through the computation of Information Transfer Rates : An illustration based on a study of the perception of nasal consonants and vowels in channel-vocoded speech.

The analysis of human speech perception data often relies on overall performance, but also on the interpretation of « qualitative » errors through the exploration of « confusion matrices ». Such confusion matrices are also frequently used in « machine » or « statistical » classification experiments (using either supervised or unsupervised methods, e.g. LDA, logistic regression, clustering…). Within the framework of a project investigating the perception of nasal consonants and vowels in « channel-vocoded » speech and by cochlear-implanted deaf listeners, I have developed an R package devoted to computing « Information Transfer Rate » analyses as described in Miller & Nicely (1955) : a mathematical tool that provides quantitative measurements of qualitative classification errors. Although this approach was introduced as early as the mid-50s by Miller & Nicely (1955), based on Shannon’s (1948) Information Theory, and was later extended by Wang & Bilger (1973), these tools have recently been reintroduced in the analysis of speech classification tasks (see Christiansen & Greenberg, 2012). I have since started developing an R package (iteR) devoted to helping users analyse and manipulate confusion matrices within this framework. This package will also be compared with David van Leeuwen’s sinfa.R script (https://github.com/davidavdav/sinfa), which is itself based on Wang & Bilger’s (1973) work. Both tools are complementary, and iteR may later integrate sinfa.R procedures.
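
As a rough illustration of the kind of quantity involved, the sketch below computes the information transmitted from stimulus to response for a confusion matrix of raw counts (rows = stimuli, columns = responses), and its ratio to the stimulus entropy, following the general mutual-information formulation associated with Miller & Nicely (1955). It is written in Python rather than R, and the function names (stimulus_entropy, transmitted_information, relative_transmission) are hypothetical : they are not taken from iteR or sinfa.R, whose interfaces are not described here.

```python
import math


def stimulus_entropy(confusions):
    """Entropy H(x), in bits, of the stimulus distribution (row totals)."""
    total = sum(sum(row) for row in confusions)
    entropy = 0.0
    for row in confusions:
        p_i = sum(row) / total
        if p_i > 0:
            entropy -= p_i * math.log2(p_i)
    return entropy


def transmitted_information(confusions):
    """Mutual information T(x;y), in bits, between stimuli and responses."""
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    transmitted = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                p_ij = count / total
                p_i = row_sums[i] / total
                p_j = col_sums[j] / total
                transmitted += p_ij * math.log2(p_ij / (p_i * p_j))
    return transmitted


def relative_transmission(confusions):
    """T(x;y) / H(x): share of the stimulus information that is transmitted."""
    entropy = stimulus_entropy(confusions)
    return transmitted_information(confusions) / entropy if entropy > 0 else 0.0


# Hypothetical two-category example: 10% of the responses are confusions.
confusions = [[9, 1], [1, 9]]
print(round(relative_transmission(confusions), 3))  # 0.531
```

A matrix with no confusions gives a relative transmission of 1, and a matrix whose responses are independent of the stimuli gives 0 ; the feature-based analyses mentioned above would apply the same computation to matrices collapsed over phonetic features.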

Acoustic variability and stability in speech and song – perception and production

In my talk, I will present results from two lines of my research, speech development and disorders. Both lines intend to elucidate the role of acoustic, and in particular, temporal variability and stability in the perception and production of verbal and non-verbal expressions.

In the first part, I will focus on how acoustic variability informs us about the distinction between verbal and musical functions. Here, the effects of pitch and temporal variability / stability on perception will be discussed by examining phenomena at the boundary between speech and song. As an introduction, I will present acoustic correlates that have been found to underlie the “speech-to-song transformation”, a perceptual illusion in which repeated presentations of a spoken phrase make us perceive it as song. The boundaries between speech and song are also fuzzy in the case of infant-directed communication, a “musilanguage” infants are confronted with during their first months of life. New results will be presented that shed light on the acoustic (and in particular temporal) variability in infant-directed speech and singing, and on the perception of infant-directed expressions by infants and adults.

The second part aims to address the role of temporal variability in speech production as a marker of speech disorders. I will focus on my recent research on stuttering, a developmental motor speech disorder which is characterized by severe disruptions of the flow of speech. Temporal aspects (variability and timing) were measured in verbal and musical auditory-motor tasks in children, adolescents and adults who stutter. Results reveal that temporal variability and timing are altered in stuttering in the verbal as well as the non-verbal domain. The results will be discussed in light of the hypothesis that stuttering is linked to a deficit in predictive timing during speech production and, potentially, auditory-motor coupling.

Segmental and prosodic prominence interactions

In this talk I would like to argue 1) that universal salience scales of segmental and prosodic elements exist (in a given context), and 2) that the mappings between the two categories go through an abstract process. We shall examine a variety of phonological patterns and some of our experimental results supporting these ideas.

It is well known in synchronic and diachronic phonology that segmental properties and prosodic positions interact. One such example is English schwa, a phonetically modest vowel that is never stressed. Among the many aspects of loanword phonology and L2 phonetics that we have been working on, our main interest has been the phonological interpretation and expression of segmental properties by prosodic elements and, conversely, the impact of prosodic position on segmental perception. Our approach is to assume that the phonological encoding of the salience of sounds involves an abstract process. This is indicated by the fact that, while the prominence of phonetic elements is universally fixed, their incorporation into phonological grammars varies among languages, but essentially in an implicational manner that respects the prominence scales.

To introduce the idea of mapping between segmental and prosodic salience, I shall first provide examples of epenthetic vowels in stress assignment, of interactions between epenthetic vowels and tones, and of epenthesis constraining syllable shape in loanword phonology. Secondly, I shall present arguments for the relation between perception and phonological patterns based on the auditory function and phonological data from various fields. We shall observe there an implicational phonological pattern across languages. Finally, to stress the abstract and synchronic aspect of the mapping process, I shall present part of our experimental results on how speakers of White Hmong, a codaless language, interpret English codas in terms of tones.

Production and perception of German by French speakers : transmission & learning

Learning a foreign language raises production and perception problems related to the phonetic differences between the two languages.
We examined the differences between German and French in order to analyse the problems of production and perception of German in speakers whose native (or dominant) language is French. The expected difficulties concern the vowel length contrast, the production of engma (the velar nasal), and fricatives such as /h/ and /ç/.

The segmental analyses were based on two spoken corpora of German :
the first comprises 20 native speakers of French and 20 German-speaking speakers (FLACGS),
and the second was recorded longitudinally on four occasions over the course of a semester. This second corpus contains productions by 30 students who took a German pronunciation course at Paris 3. The students were divided into two groups, one of which additionally benefited from visual aids (spectrograms) in class and in their individual feedback.

In addition, we give an overview of ongoing work on the perception of German by French subjects. It consists of two behavioural discrimination tests (location of lexical stress, vowel length, and the /h/ – /ʔ/ contrast) and an EEG experiment.

Learners in both groups improved their German pronunciation over the course of a semester. The production of consonants, notably /h/, improved, whereas the production of long and short vowels remains problematic. We do not observe a clear positive effect of the visual aids on the learners’ progress. It is nevertheless likely that visual aids help to raise linguistic awareness.

How native and non-native listeners process schwa reduction in French : A combined eye-tracking and ERP study

Words are often reduced (e.g., English yesterday pronounced /jɛʃeɪ/ instead of /jɛstədeɪ/). Native listeners generally understand reduced forms effortlessly. Non-native listeners of a language, in contrast, can have problems understanding reduced forms. The question is to what extent highly proficient learners suffer from reduction, and which mechanisms may be responsible for their problems.

We investigated these questions in a combined EEG and eye-tracking experiment with French native and Dutch non-native listeners of French. We focused on schwa reduction in the first syllable of French nouns (e.g., /rkɛ̃/ for /rəkɛ̃/ requin ‘shark’). Schwa-reduced and unreduced nouns were presented in the middle of sentences and were not predictable from the preceding context.

Participants were asked to listen carefully to the spoken sentences, and to look at the screen. During the presentation of the spoken sentence, participants saw a display of four line drawings. Each display consisted of a depiction of the target word (e.g., requin), a phonological competitor (e.g., rideau /rido/ ‘curtain’), and two neutral distractors (e.g., voiture /vwatyr/ ‘car’ and fleur /flœr/ ‘flower’). Eye movements and EEG were recorded simultaneously throughout the experiment.

The EEG data show no N400 effect of reduction in the natives. Natives seem to activate the representations of reduced forms as easily as those of unreduced forms. Unlike the natives, the non-natives showed an N400 only for unreduced forms, not for reduced forms. This suggests that the non-natives did not activate the meaning of reduced forms. The eye-tracking data reveal that the non-natives considered competitors more seriously and for a longer stretch of time than the natives. Interestingly, when the non-natives heard a reduced target, it was mainly the phonological competitor that interfered with the identification of the target word, whereas when they heard a full form, both the phonological competitor and the neutral distractors were fixated. Taken together, the data suggest that highly proficient learners suffer more from reduction than natives do. They have more problems in accessing representations of reduced words and seriously consider other lexical candidates during lexical search.