A gender bias in syntax:
evidence from French and English corpora
Yanis da Cunha & Anne Abeillé
– LingLunch, October 13th –
Introduction: gender biases in syntax
Methodology: a corpus study
Results
Discussion
Gender | Subject | Object |
---|---|---|
Men | 88% | 70% |
Women | 12% | 30% |
Results from Langue Française
(French journal, 1969-1971 & 2008-2017)
Role Noun (RN), disambiguation point
Shorter reading times when masculine role nouns were subjects (1a) rather than objects (1b).
→ Evidence from corpora and experiments for a gender effect in syntactic functions:
Alignment theory (Aissen, 1999; Levshina, 2021). Three ingredients:
The syntactic gender bias is reminiscent of prominence alignment (Esaulova, 2015)
→ Prediction to be tested
Alignment has two types of effects across languages: categorical or gradient (Bresnan et al., 2001)
→ Obligatory voice alternation in Lummi (Jelinek & Demers, 1983)
a. *The man knows me
b. xcitŋsə ə cə swəyʔqəʔ
know.PASSIVE by the man
‘I am known by the man’
Subject > Object
1st pers. > 3rd pers.
→ Preferences in construction alternations:
→ Investigation of gender biases in syntactic function
Are only linguists biased (genre-specific bias), or is the syntactic gender bias more general?
Using English and French corpora
→ French is a grammatical gender language (Gygax et al., 2019)
For inanimate nouns and pronouns, grammatical gender is arbitrary
a. Marie\(_i\), elle\(_i\) est grande ‘Mary is tall’
b. La voiture\(_i\), elle\(_i\) est grande ‘The car is big’
For human nouns, grammatical gender is interpreted as social gender
Une\(_{fem}\) ministre ‘Female minister’
Un\(_{masc}\) ministre ‘Male minister’
→ English is a natural gender language (Gygax et al., 2019)
Nouns don’t have gender, except lexically gendered nouns
Father, actress, queen
Gender in pronouns reflects social gender
He (male) vs. she (female)
Language | Corpus | Modality | Genre | Annotations |
---|---|---|---|---|
English | EWT | Written | Web | Dependencies |
English | GUM | Mixed | Fiction, newspaper, conversation | Dependencies |
English | Lines | Written | Fiction, technical | Dependencies |
English | Partut | Mixed | Talks, legal texts, Wikipedia | Dependencies |
English | PUD | Written | Newspaper, Wikipedia | Dependencies |
French | C-Oral-Rom | Spoken | Conversation | Dependencies |
French | CFPP | Spoken | Conversation | Dependencies |
French | CRFP | Spoken | Conversation | Dependencies |
French | Frantext | Written | Fiction, non-fiction | POS-tagged |
French | FrWac | Written | Web | POS-tagged |
French | FTB | Written | Newspaper | Dependencies |
All subject and object nouns
1. I notified her grandmother with the time of the departure. (EWT)
2. J’ai rencontré sa fille (CFPP)
‘I met her daughter’
All 3rd person subject and object personal pronouns
English: he, she, him, her, it
3. And she is the STAR of the family. (EWT)
French: il, elle, le, la
4. Elle est née en 1917. (CFPP)
‘She was born in 1917’
Exclusion:
Linear sequence: Punctuation + Determiner + Noun + Verb + Determiner + Noun
Le chauffeur\(_{masc}\) actionne les essuie-glaces\(_{masc}\) (Le Triomphe de Thomas Zins, Jung Matthieu 2018, Frantext)
‘The driver activates the windshield wipers’
Votre fils\(_{masc}\) apprendra la voltige\(_{fem}\) (L’Enfant des Lumières, Françoise Chandernagor 1995, Frantext)
‘Your son will learn aerobatics’
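The linear-sequence extraction used for the POS-tagged corpora can be sketched as a sliding-window match over tag sequences. This is only an illustration: the tag names (`PUNCT`, `DET`, `NOUN`, `VERB`) and the function `find_svo_candidates` are hypothetical, and the actual tagsets and query tools of Frantext and FrWac may differ.

```python
# Hypothetical tag names; the real corpus tagsets may use other labels.
PATTERN = ["PUNCT", "DET", "NOUN", "VERB", "DET", "NOUN"]


def find_svo_candidates(tagged):
    """Return (subject, verb, object) word forms for every window of tokens
    whose POS tags match PATTERN exactly.

    `tagged` is a list of (form, pos) pairs in text order.
    """
    hits = []
    n = len(PATTERN)
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        if [pos for _, pos in window] == PATTERN:
            # Window positions: 0 punct, 1 det, 2 noun, 3 verb, 4 det, 5 noun
            hits.append((window[2][0], window[3][0], window[5][0]))
    return hits


# The leading "." stands for the end of the previous sentence.
example = [
    (".", "PUNCT"), ("Le", "DET"), ("chauffeur", "NOUN"),
    ("actionne", "VERB"), ("les", "DET"), ("essuie-glaces", "NOUN"),
    (".", "PUNCT"),
]
print(find_svo_candidates(example))
# → [('chauffeur', 'actionne', 'essuie-glaces')]
```

Requiring punctuation before the first determiner restricts matches to sentence-initial noun phrases, so that the first noun can be taken as the subject and the second as the object without a dependency parse.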
A sentence from the French TreeBank:
# textID = id270359
# author = LAZARE FRANCOISE
# gender = woman
1 Si si C CS s=s 11 mod
2 les le D DET g=m|n=p|s=def 3 det
3 programmes programme N NC g=m|n=p|s=c 7 suj
4 d’ de P P _ 3 dep
5 économies économie N NC g=f|n=p|s=c 4 obj.p
6 sont être V V m=ind|n=p|p=3|t=pst 7 aux.pass
7 respectés respecter V VPP g=m|m=part|n=p|t=past 1 obj.cpl
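A minimal sketch of how such an annotated sentence could be read to retrieve noun arguments, their grammatical gender, and the author's gender. The column names (`form`, `cpos`, `feats`, `deprel`, ...) and the function names are assumptions made for illustration, and `obj` as the direct-object label is assumed by analogy with `suj`; this is not the extraction script used in the study.

```python
SENTENCE = """\
# textID = id270359
# author = LAZARE FRANCOISE
# gender = woman
1 Si si C CS s=s 11 mod
2 les le D DET g=m|n=p|s=def 3 det
3 programmes programme N NC g=m|n=p|s=c 7 suj
4 d’ de P P _ 3 dep
5 économies économie N NC g=f|n=p|s=c 4 obj.p
6 sont être V V m=ind|n=p|p=3|t=pst 7 aux.pass
7 respectés respecter V VPP g=m|m=part|n=p|t=past 1 obj.cpl
"""


def parse_sentence(text):
    """Split a sentence block into metadata (# key = value) and token rows."""
    meta, tokens = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, value = line[1:].partition("=")
            meta[key.strip()] = value.strip()
            continue
        # Assumed column order: id, form, lemma, coarse POS, POS, features, head, relation
        idx, form, lemma, cpos, pos, feats, head, deprel = line.split()
        feat_dict = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
        tokens.append({"id": int(idx), "form": form, "lemma": lemma, "cpos": cpos,
                       "pos": pos, "feats": feat_dict, "head": int(head), "deprel": deprel})
    return meta, tokens


def nouns_with_function(tokens, functions=("suj", "obj")):
    """Return (form, grammatical gender, function) for nouns bearing one of `functions`."""
    return [(t["form"], t["feats"].get("g"), t["deprel"])
            for t in tokens
            if t["cpos"] == "N" and t["deprel"] in functions]


meta, tokens = parse_sentence(SENTENCE)
print(meta["gender"])               # → woman
print(nouns_with_function(tokens))  # → [('programmes', 'm', 'suj')]
```

Here `économies` is not returned because it is the object of a preposition (`obj.p`), not an argument of the verb.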
Thanks to Liam Duignan who collaborated with me
Language | POS | # data points | # human | # masculine | # feminine |
---|---|---|---|---|---|
English | Noun | 32885 | 1357 | 773 | 361 |
English | Pronoun | 5275 | 3111 | 2371 | 740 |
French | Noun | 151537 | 18601 | 72265 | 59019 |
French | Pronoun | 14014 | NA | 10466 | 3548 |
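As a quick reading aid, the masculine share among gendered items can be computed directly from the counts in the table. The helper name `masculine_share` is hypothetical. Note that for French nouns the masculine and feminine counts exceed the number of human nouns, so they appear to be counts of grammatical gender over all nouns, inanimates included.

```python
# (masculine, feminine) counts copied from the table above
counts = {
    ("English", "Noun"): (773, 361),
    ("English", "Pronoun"): (2371, 740),
    ("French", "Noun"): (72265, 59019),
    ("French", "Pronoun"): (10466, 3548),
}


def masculine_share(masc, fem):
    """Proportion of masculine items among masculine + feminine items."""
    return masc / (masc + fem)


for (lang, pos), (masc, fem) in counts.items():
    print(f"{lang} {pos}: {masculine_share(masc, fem):.1%} masculine")
```

These are raw proportions over all syntactic functions; they do not by themselves show a subject/object asymmetry, which is what the regression models test.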
Bayesian logistic regression modeling with the brms package in R (Bürkner, 2017)
→ Prominence alignment:
Subject > Object
Human > Inanimate
Consistent across languages/corpora
→ Prominence alignment:
Subject > Object
Definite > Indefinite
Consistent across languages/corpora
→ Prominence alignment:
Subject > Object
Pronoun > Noun
Consistent across languages/corpora
→ Just human nouns
→ Hypothetical prominence alignment:
Subject > Object
Male > female
→ Inanimate nouns in French
→ Evidence for a differential interpretation of gender on humans and inanimates (Corbett, 1991; Elpers et al., 2022).
Interaction between grammatical gender and animacy in French
→ Masculine humans are more likely to be subjects
→ Grammatical gender plays a role only for humans (i.e., when it is interpreted as social gender)
- \(\beta_{\text{Masc:Hum}} = 0.33\)
- CI = [0.14, 0.53]
- \(P(\beta_{\text{Masc:Hum}} > 0) = 100\%\)
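To make these figures easier to interpret, here is a rough sketch that converts the coefficient from the log-odds scale to an odds ratio and approximates the posterior probability of a positive effect. It rests on two assumptions that are not stated on the slide: that the interval is a 95% credible interval, and that the posterior is approximately normal. brms computes such probabilities from posterior draws, so this is only a sanity check, and `summarize` is a hypothetical helper.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def summarize(beta, ci_low, ci_high):
    """Odds ratio, approximate posterior SD, and approximate P(beta > 0).

    Assumes a 95% interval and a roughly normal posterior (both assumptions).
    """
    odds_ratio = math.exp(beta)
    sd = (ci_high - ci_low) / (2 * 1.96)
    p_positive = normal_cdf(beta / sd)
    return odds_ratio, sd, p_positive


odds_ratio, sd, p_pos = summarize(0.33, 0.14, 0.53)
print(f"odds ratio ≈ {odds_ratio:.2f}")  # → odds ratio ≈ 1.39
print(f"P(beta > 0) ≈ {p_pos:.1%}")
```

Under these assumptions, the interaction multiplies the odds of being a subject by about 1.39 for masculine human nouns, and the approximate probability of a positive effect rounds to 100%, in line with the reported value.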
Brough et al. (2020) show an effect of social perspective on word order:
This is called a like-me effect or a me-first preference (Cooper & Ross, 1975)
→ Let’s take into account speaker gender in our data
→ Human nouns in French
→ Like-me effect (Brough et al., 2020)
- \(\beta_{\text{Masc:Male}} = 0.56\)
- CI = [0.29, 0.83]
- \(P(\beta_{\text{Masc:Male}} > 0) = 100\%\)
→ Strong evidence for an interaction between speaker gender and argument gender
Social cognition studies show that men are perceived as more agentive (Koenig et al., 2011)
And agents are mainly subjects
It is possible that men are more often subjects because they are more often agents
→ Let’s take into account semantic roles
→ Human nouns and pronouns
Passive subjects & active objects:
Passive subjects are more often masculine than active objects
→ So the syntactic bias is not reducible to semantic roles
- \(\beta_{\text{PassiveSubj}} = 0.31\)
- CI = [0.08, 0.54]
- \(P(\beta_{\text{PassiveSubj}} > 0) = 99\%\)
→ Human nouns and pronouns
Active & passive subjects:
Active subjects are more often masculine than passive subjects
→ So semantics may also play a role
- \(\beta_{\text{ActiveSubj}} = 0.51\)
- CI = [0.37, 0.65]
- \(P(\beta_{\text{ActiveSubj}} > \beta_{\text{PassiveSubj}}) = 91\%\)
A third possible confounding factor: topicality.
In French newspapers, Huet et al. (2013) found that 50% of articles mention men, while only 10% mention women.
We thus expect men to be talked about more, and so to be topics more often.
Discourse topics are realized as pronouns (Lambrecht, 1994) and pronouns are more often subjects.
→ Men are more often topics
- \(\beta_{\text{Male}} = 0.42\)
- CI = [0.29, 0.55]
- \(P(\beta_{\text{Male}} > 0) = 100\%\)
The gender bias in syntactic functions can be found in multiple corpora and genres
→ It is not specific to linguists
→ Even in a language like English without grammatical gender
Discourse tendencies for men to be topics, agents and subjects
→ The syntax bias does not reduce to semantics
Speaker gender plays a role in the syntactic gender bias: like-me effect (Brough et al., 2020)
→ Interactions between social and linguistic structures
Gender could be integrated into the prominence alignment model (Esaulova, 2015)
→ Traditional scale: human > non-human animate > inanimate (Aissen, 2003)
→ Suggested scale: male > female > non-gendered animate > inanimate
→ If humanness plays a role, social gender may play a role
→ Predicted effects should be tested experimentally (Bornkessel-Schlesewsky & Schlesewsky, 2009)
→ Take gender into account in experimental syntax (corpus studies and experiments)
Gender was not annotated in Bresnan’s work on dative alternation for instance (Bresnan & Ford, 2010)
Gender may be controlled for in some experiments but rarely reported:
Language with 3 genders (Greek)
Gender effect on differential object marking and subject inversion in Spanish
Disentangling function assignment and word order preferences in Greek (Exling on Wednesday!)
Merci !
Thank you!
Testing the interaction between argument gender and social gender:
→ Strong evidence for an interaction
Pronouns in French and English
Nouns in French, pronouns in English