A gender bias in syntax:
evidence from French and English corpora

Yanis da Cunha & Anne Abeillé
– LingLunch, October 13th –

Outline

Introduction: gender biases in syntax
Methodology: a corpus study
Results
Discussion

Introduction

Gender biases in syntax

Previous findings

1. Corpus studies

Gender biases in linguistic examples from research papers
Both in English (Cépeda et al., 2021; Kotek et al., 2021) and French (Richy & Burnett, 2020)

	Subject	Object
Men	88%	70%
Women	12%	30%

Results from Langue Française
(French journal, 1969-1971 & 2008-2017)

Previous findings

2. Eye-tracking experiments

Gender effect in relative clause processing in German (Esaulova & Von Stockhausen, 2015)
Ambiguity resolution uses gender cues

Role noun (RN): Flugbegleiterin; disambiguation point: the auxiliary (hat/haben)

Die Flugbegleiterin_femSG, die viele Tourist-en_masc/-innen_fem_PL beobachtet hat_SG/haben_PL, ist aufmerksam
a. Subject RN: ‘The flight attendant, who has observed many tourists_fem/masc, is attentive.’
b. Object RN: ‘The flight attendant, whom many tourists_fem/masc have observed, is attentive.’

Shorter reading times when masculine role nouns were subjects (1a) rather than objects (1b).

→ Evidence from corpus and experiments for gender effect in syntactic functions:

Men are more likely to be subjects
Women are more likely to be objects

Prominence & alignment

Alignment theory (Aissen, 1999; Levshina, 2021). Three ingredients:

arguments bear so-called referential prominence features: animacy, definiteness, person…
prominence and syntactic functions are hierarchically ordered (e.g. animate > inanimate)
function assignment to an argument is constrained by alignment with prominence.
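Aissen (1999) derives such alignment constraints by harmonically aligning two scales. The Python sketch below illustrates that recipe under simplifying assumptions: a binary function scale (Subj > Obj) is crossed with a prominence scale, yielding one fixed constraint ranking per function. The constraint labels (`*Subj/Inan`, etc.) and the function name `harmonic_alignment` are illustrative, not taken from the talk.

```python
from typing import List, Tuple


def harmonic_alignment(high: str, low: str, scale: List[str]) -> Tuple[List[str], List[str]]:
    """Align a binary function scale (high > low) with a prominence scale.

    `scale` lists feature values from most to least prominent. Returns two
    constraint rankings, each ordered from highest- to lowest-ranked."""
    # The high function (e.g. subject) is penalized most for the least prominent values.
    high_ranking = [f"*{high}/{value}" for value in reversed(scale)]
    # The low function (e.g. object) is penalized most for the most prominent values.
    low_ranking = [f"*{low}/{value}" for value in scale]
    return high_ranking, low_ranking


subj, obj = harmonic_alignment("Subj", "Obj", ["Hum", "Anim", "Inan"])
print(" >> ".join(subj))  # *Subj/Inan >> *Subj/Anim >> *Subj/Hum
print(" >> ".join(obj))   # *Obj/Hum >> *Obj/Anim >> *Obj/Inan
```

Read this way, an inanimate subject violates the highest-ranked subject constraint. That is the formal counterpart of "inanimates are dispreferred as subjects".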

The syntactic gender bias is reminiscent of prominence alignment (Esaulova, 2015)

→ Prediction to be tested

Prominence & alignment

Alignment has two types of effects across languages: categorical or gradient (Bresnan et al., 2001)

Categorical effects

→ Obligatory voice alternation in Lummi (Jelinek & Demers, 1983)

a. *The man knows me

b. xcitŋsə ə cə swəyʔqəʔ
   know.PASSIVE by the man
   ‘I am known by the man’

Subject > Object
1st pers. > 3rd pers.

Gradient effects

→ Preferences in construction alternations:

Dative alternation in English (Bresnan & Ford, 2010)
Active/passive alternation in English (Hundt et al., 2018)
Active/passive in French (Da Cunha & Abeillé, 2020)

Our study

→ Investigation of gender biases in syntactic functions

Are only linguists biased (genre-specific bias), or is the syntactic gender bias more general?
Using English and French corpora

French
English

→ French is a grammatical gender language (Gygax et al., 2019)

For inanimate nouns and pronouns, grammatical gender is arbitrary

a. Marie_i, elle_i est grande ‘Mary is tall’
b. La voiture_i, elle_i est grande ‘The car is big’
For human nouns, grammatical gender is interpreted as social gender

Une_fem ministre ‘Female minister’
Un_masc ministre ‘Male minister’

→ English is a natural gender language (Gygax et al., 2019)

Nouns don’t have gender, except lexically gendered nouns

Father, actress, queen
Gender in pronouns reflects social gender

He (male) vs. she (female)

Methodology

Selected corpora

Language	Corpus	Modality	Genre	Annotations
English	EWT	Written	Web	Dependencies
English	GUM	Mixed	Fiction, newspaper, conversation	Dependencies
English	LinES	Written	Fiction, technical	Dependencies
English	ParTUT	Mixed	Talks, legal texts, Wikipedia	Dependencies
English	PUD	Written	Newspaper, Wikipedia	Dependencies
French	C-ORAL-ROM	Spoken	Conversation	Dependencies
French	CFPP	Spoken	Conversation	Dependencies
French	CRFP	Spoken	Conversation	Dependencies
French	Frantext	Written	Fiction, non-fiction	POS-tagged
French	frWaC	Written	Web	POS-tagged
French	FTB	Written	Newspaper	Dependencies

Sampling

For dependency corpora

All subject and object nouns
1. I notified her grandmother with the time of the departure. (EWT)
2. J’ai rencontré sa fille (CFPP)
  ‘I met her daughter’
All 3rd person personal pronouns in subject and object function
- English: he, she, him, her, it
  3. And she is the STAR of the family. (EWT)
- French: il, elle, le, la
  4. Elle est née en 1917. (CFPP)
  ‘She was born in 1917’
Exclusion:
- Copular verbs (être/be, rester/remain…)

Sampling

For POS-tagged corpora

Linear sequence: Punctuation + Determiner + Noun + Verb + Determiner + Noun
- Punctuation acts as a sentence boundary
- NP + V + NP is assumed to be a transitive SVO sequence

Le chauffeur_masc actionne les essuie-glaces_masc (Le Triomphe de Thomas Zins, Jung Matthieu 2018, Frantext)
‘The driver activates the windshield wipers’
Votre fils_masc apprendra la voltige_fem (L’Enfant des Lumières, Françoise Chandernagor 1995, Frantext)
‘Your son will learn aerobatics’
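The linear pattern above can be matched with a sliding window over tagged tokens. This minimal Python sketch assumes the corpus has already been mapped to coarse tags named `PUNCT`, `DET`, `NOUN` and `VERB`. These tag names and the function `find_svo` are hypothetical, and the actual Frantext and frWaC tagsets would need a mapping step first.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, coarse POS tag)

# Hypothetical coarse tagset; real Frantext/frWaC tags need mapping to these first.
PATTERN = ["PUNCT", "DET", "NOUN", "VERB", "DET", "NOUN"]


def find_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject noun, verb, object noun) for each window matching PATTERN."""
    hits = []
    width = len(PATTERN)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if [tag for _, tag in window] == PATTERN:
            # Offsets 2, 3 and 5 hold the first noun, the verb and the second noun.
            hits.append((window[2][0], window[3][0], window[5][0]))
    return hits


sentence = [(".", "PUNCT"), ("Le", "DET"), ("chauffeur", "NOUN"),
            ("actionne", "VERB"), ("les", "DET"), ("essuie-glaces", "NOUN")]
print(find_svo(sentence))  # [('chauffeur', 'actionne', 'essuie-glaces')]
```

Exact matching trades recall for precision: noun phrases with adjectives, or verbs with auxiliaries, are skipped rather than risk misidentifying the subject and object.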

A sentence from the French Treebank (FTB):

# textID = id270359
# author = LAZARE FRANCOISE
# gender = woman

1 Si si C CS s=s 11 mod
2 les le D DET g=m|n=p|s=def 3 det
3 programmes programme N NC g=m|n=p|s=c 7 suj
4 d’ de P P _ 3 dep
5 économies économie N NC g=f|n=p|s=c 4 obj.p
6 sont être V V m=ind|n=p|p=3|t=pst 7 aux.pass
7 respectés respecter V VPP g=m|m=part|n=p|t=past 1 obj.cpl
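Subjects and objects, with their grammatical gender and the metadata gender, can be read directly off such an excerpt. The Python sketch below assumes whitespace-separated columns in the order shown (index, form, lemma, coarse POS, fine POS, features, head, relation) and metadata comments of the form `# key = value`. The helper names are hypothetical. Only common nouns (`NC`) with the relation `suj` or `obj` are kept, so the prepositional object `économies` (`obj.p`) is excluded.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'g=m|n=p|s=c' into {'g': 'm', 'n': 'p', 's': 'c'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_sentence(lines: List[str]) -> Tuple[Dict[str, str], List[List[str]]]:
    """Split one sentence block into metadata ('# key = value') and token rows."""
    meta: Dict[str, str] = {}
    rows: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            key, _, value = line[1:].partition("=")
            meta[key.strip()] = value.strip()
        elif line:
            rows.append(line.split())  # split() accepts tabs or spaces
    return meta, rows


def core_nouns(rows: List[List[str]]) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (form, relation, grammatical gender) for common-noun subjects/objects."""
    for cols in rows:
        form, pos, feats, rel = cols[1], cols[4], cols[5], cols[7]
        if pos == "NC" and rel in ("suj", "obj"):
            yield form, rel, parse_feats(feats).get("g")


excerpt = """\
# textID = id270359
# author = LAZARE FRANCOISE
# gender = woman
1 Si si C CS s=s 11 mod
2 les le D DET g=m|n=p|s=def 3 det
3 programmes programme N NC g=m|n=p|s=c 7 suj
4 d’ de P P _ 3 dep
5 économies économie N NC g=f|n=p|s=c 4 obj.p
6 sont être V V m=ind|n=p|p=3|t=pst 7 aux.pass
7 respectés respecter V VPP g=m|m=part|n=p|t=past 1 obj.cpl
""".splitlines()

meta, rows = read_sentence(excerpt)
print(meta["gender"], list(core_nouns(rows)))  # woman [('programmes', 'suj', 'm')]
```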

Annotation

Syntactic function: subject, object
Grammatical category: noun, pronoun
Grammatical gender in some French corpora: masculine, feminine
Speaker gender in French corpora (in metadata)
Definiteness: definite, indefinite
Construction: active, passive, intransitive

Animacy and grammatical gender for nouns in French using Flexique (Bonami et al., 2013)
→ Automatic annotation

Animacy and social gender for nouns in English
→ Manual annotation

Thanks to Liam Duignan, who collaborated with me

Studied samples

Size and annotation coverage of the studied samples
Language	POS	# data points	# human	# masculine	# feminine
English	Noun	32885	1357	773	361
English	Pronoun	5275	3111	2371	740
French	Noun	151537	18601	72265	59019
French	Pronoun	14014	NA	10466	3548

Statistical analysis

Bayesian logistic regression models fitted with the brms package in R (Bürkner, 2017)
- Predicts a binary outcome (e.g. subject vs. object)
- Random intercepts for verb lemma and corpus

When the posterior probability P of an effect's direction (e.g. P(β > 0)) exceeds 90%, we take this as strong evidence for the effect.
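The 90% criterion can be computed directly from posterior draws of a coefficient. The Python sketch below uses simulated draws as a stand-in for a fitted brms model: a normal distribution centered on 0.33, with made-up values that are not the study's posterior. It also assumes that the reported CIs are 95% credible intervals. `posterior_summary` is a hypothetical helper.

```python
import random
import statistics
from typing import List, Tuple


def posterior_summary(draws: List[float], threshold: float = 0.90
                      ) -> Tuple[float, Tuple[float, float], float, bool]:
    """Posterior mean, 95% credible interval, P(beta > 0), and whether
    P(beta > 0) passes the 'strong evidence' threshold."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points, 2.5% apart
    interval = (cuts[0], cuts[-1])            # 2.5% and 97.5% quantiles
    p_positive = sum(d > 0 for d in draws) / len(draws)
    return statistics.fmean(draws), interval, p_positive, p_positive > threshold


# Made-up draws standing in for a fitted model's posterior (NOT the study's).
rng = random.Random(42)
draws = [rng.gauss(0.33, 0.10) for _ in range(4000)]
mean, (lo, hi), p_pos, strong = posterior_summary(draws)
print(f"beta = {mean:.2f}, CI = [{lo:.2f}, {hi:.2f}], P(beta > 0) = {p_pos:.0%}, strong = {strong}")
```

For an effect expected to be negative, the same logic applies to P(β < 0).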

Results

Alignment between syntactic functions and prominence features

Subject frequency vs. noun animacy

Human nouns are more often subjects
Inanimate nouns are less often subjects

→ Prominence alignment:
Subject > Object
Human > Inanimate

Consistent across languages/corpora
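Each "subject frequency" comparison in this section reduces to the share of subjects among all subject and object tokens for each level of a feature. Below is a minimal Python sketch on toy data, not the study's counts. Unlike the Bayesian model, these raw proportions do not control for verb lemma or corpus.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def subject_share(observations: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each feature value to the proportion of its tokens that are subjects.

    `observations` holds (feature value, function) pairs, with function
    'subj' or 'obj'."""
    totals: Counter = Counter()
    subjects: Counter = Counter()
    for value, function in observations:
        totals[value] += 1
        if function == "subj":
            subjects[value] += 1
    return {value: subjects[value] / totals[value] for value in totals}


# Toy data, not corpus counts.
toy = [("human", "subj"), ("human", "subj"), ("human", "obj"),
       ("inanimate", "subj"), ("inanimate", "obj"),
       ("inanimate", "obj"), ("inanimate", "obj")]
print(subject_share(toy))  # {'human': 0.6666666666666666, 'inanimate': 0.25}
```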

Subject frequency vs. NP definiteness

Definite NPs are more often subjects
Indefinite NPs are less often subjects

→ Prominence alignment:
Subject > Object
Definite > Indefinite

Consistent across languages/corpora

Subject frequency vs. pronominality

Pronouns are more often subjects
Nouns are less often subjects

→ Prominence alignment:
Subject > Object
Pronoun > Noun

Consistent across languages/corpora

Subject frequency vs. grammatical gender

→ Inanimate nouns in French

Gender plays a role for human nouns (social gender)
But not for inanimate nouns (grammatical gender)

→ Evidence for a differential interpretation of gender on humans and inanimates (Corbett, 1991; Elpers et al., 2022).

Syntactic function biases: a Bayesian model

Interaction between grammatical gender and animacy in French

→ Masculine humans are more likely to be subjects

→ Grammatical gender plays a role only for humans (i.e. when it is interpreted as social gender)

- β_Masc:Hum = 0.33
- CI =[0.14, 0.53]
- P(β_Masc:Hum > 0) = 100%.
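Because β is on the log-odds scale, exponentiating it gives a more intuitive effect size. The slide does not state the model's coding, so the following is an assumption: treatment coding with feminine and inanimate as reference levels and "subject" as the modeled outcome. Under that coding, exp(β_Masc:Hum) is the factor by which the masculine-vs-feminine odds ratio of being a subject is larger for human nouns than for inanimate ones. A small Python sketch:

```python
import math


def to_odds_ratio(beta, ci):
    """Exponentiate a log-odds coefficient and its interval bounds."""
    return math.exp(beta), (math.exp(ci[0]), math.exp(ci[1]))


or_, (lo, hi) = to_odds_ratio(0.33, (0.14, 0.53))  # beta_Masc:Hum from the slide
print(f"OR = {or_:.2f}, CI = [{lo:.2f}, {hi:.2f}]")  # OR = 1.39, CI = [1.15, 1.70]
```

Because exp is monotone, exponentiating the interval bounds gives the interval on the odds-ratio scale. The same conversion applied to β_Masc:Male = 0.56 gives a factor of about 1.75.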

Explanations for the gender bias

Like-me effect
An agency-based bias
A topicality-based bias

1. Like-me effect

Brough et al. (2020) show an effect of social perspective on word order:

Both men and women put men first in their sentences
But men do this more strongly than women

This is called a like-me effect or a me-first preference (Cooper & Ross, 1975)

→ Let’s take into account speaker gender in our data

1. Like-me effect

→ Human nouns in French

Male arguments are more often subjects
Male speakers do this more strongly

→ Like-me effect (Brough et al., 2020)

- β_Masc:Male = 0.56
- CI = [0.29, 0.83]
- P(β_Masc:Male > 0) = 100%

→ Strong evidence for an interaction between speaker gender and argument gender

2. An agency-based bias

Social cognition studies show that men are perceived as more agentive (Koenig et al., 2011)

And agents are mainly realized as subjects

It is possible that men are more often subjects because they are more often agents

→ Let’s take into account semantic roles

2. An agency-based bias

→ Human nouns and pronouns

Passive subjects & active objects:
- Same semantic role = patient
- Different syntactic functions
Passive subjects are more often masculine than active objects

→ So the syntactic bias is not reducible to semantic roles

- β_PassiveSubj = 0.31
- CI = [0.08, 0.54]
- P(β_PassiveSubj > 0) = 99%

2. An agency-based bias

→ Human nouns and pronouns

Active & passive subjects:
- Different semantic roles
- Same syntactic function = subject
Active subjects are more often masculine than passive subjects

→ So semantics may also play a role

- β_ActiveSubj = 0.51
- CI = [0.37, 0.65]
- P(β_ActiveSubj > β_PassiveSubj) = 91%
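A probability such as P(β_ActiveSubj > β_PassiveSubj) is computed from the joint posterior draws. Both coefficients are estimated against the same baseline (assumed here to be active objects), so they are correlated, and comparing their two CIs side by side would not give this probability. Below is a Python sketch with made-up, deliberately correlated draws; these are not the study's posterior.

```python
import random
from typing import Sequence


def prob_greater(draws_a: Sequence[float], draws_b: Sequence[float]) -> float:
    """Share of posterior draws in which coefficient a exceeds coefficient b.

    Both sequences must be aligned draw by draw (same MCMC iteration at the
    same index), so that their correlation is preserved."""
    if len(draws_a) != len(draws_b):
        raise ValueError("draw sequences must have the same length")
    return sum(a > b for a, b in zip(draws_a, draws_b)) / len(draws_a)


# Made-up, correlated draws: NOT the study's posterior.
rng = random.Random(1)
passive = [rng.gauss(0.31, 0.12) for _ in range(4000)]
active = [p + rng.gauss(0.20, 0.15) for p in passive]  # shares noise with `passive`
p = prob_greater(active, passive)
print(f"P(beta_ActiveSubj > beta_PassiveSubj) = {p:.0%}")
```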

3. A topicality-based bias

A third possible confounding factor: topicality.

In French newspapers, Huet et al. (2013) show that 50% of articles mention men, while only 10% mention women.
We therefore expected men to be talked about more, and thus to be topics more often.
Discourse topics are realized as pronouns (Lambrecht, 1994) and pronouns are more often subjects.

3. A topicality-based bias

Men are more often pronominalized than women in English

→ Men are more often topics

- β_Male = 0.42
- CI = [0.29, 0.55]
- P(β_Male > 0) = 100%

Discussion

The gender bias in syntactic functions can be found in multiple corpora and genres
→ It is not specific to linguists
→ Even in a language like English without grammatical gender
Discourse tendencies for men to be topics, agents and subjects
→ The syntactic bias does not reduce to semantics
Speaker gender plays a role in the syntactic gender bias: like-me effect (Brough et al., 2020)
→ Interactions between social and linguistic structures
Gender could be integrated into the prominence alignment model (Esaulova, 2015)
→ Traditional scale: human > non human animate > inanimate (Aissen, 2003)
→ Suggested scale: male > female > non-gendered animate > inanimate
→ If humanness plays a role, social gender may play a role
→ Predicted effects should be tested experimentally (Bornkessel-Schlesewsky & Schlesewsky, 2009)

Take-home message:

→ Take gender into account in experimental syntax (corpus studies and experiments)

Gender was not annotated in Bresnan’s work on the dative alternation, for instance (Bresnan & Ford, 2010)
Gender may be controlled for in some experiments but is rarely reported:
- Soares et al. (2019) for pro-drop in Portuguese
- De la Fuente (2015) on anaphora resolution in English and Spanish
- Pozniak (2018) on subject inversion in relative clauses in French

Ongoing and future work:

A language with three genders (Greek)
Gender effect on differential object marking and subject inversion in Spanish
Disentangling function assignment and word order preferences in Greek (Exling on Wednesday!)

Thank you!

References

Abeillé, A., An, A., & Shiraïshi, A. (2018). L’accord de proximité du déterminant en français. Discours. Revue de Linguistique, Psycholinguistique et Informatique. A Journal of Linguistics, Psycholinguistics and Computational Linguistics, 22.

Abeillé, A., Clément, L., & Liégeois, L. (2019). Un corpus annoté pour le français : Le French Treebank. TAL, 60, 19–43. https://halshs.archives-ouvertes.fr/halshs-02560207

Ahrenberg, L. (2015). Converting an English-Swedish parallel treebank to Universal Dependencies. Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), 10–19.

Aissen, J. (1999). Markedness and subject choice in optimality theory. Natural Language & Linguistic Theory, 17(4), 673–711. https://doi.org/10.1023/A:1006335629372

Aissen, J. (2003). Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory, 21(3), 435–483. https://doi.org/10.1023/A:1024109008573

An, A., & Abeillé, A. (2021). Closest conjunct agreement with attributive adjectives. Journal of French Language Studies, 1–28.

ATILF. (2022). Base textuelle Frantext. ATILF-CNRS & Université de Lorraine [Online]. https://www.frantext.fr/

Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), 209–226. https://doi.org/10.1007/s10579-009-9081-4

Bonami, O., Caron, G., & Plancq, C. (2013). Flexique: An inflectional lexicon for spoken French. Actes du Quatrième Congrès Mondial de Linguistique Française, 2583–2596.

Bornkessel-Schlesewsky, I., & Schlesewsky, M. (2009). The role of prominence information in the real-time comprehension of transitive constructions: A cross-linguistic approach. Language and Linguistics Compass, 3(1), 19–58.

Bresnan, J., Dingare, S., & Manning, C. D. (2001). Soft constraints mirror hard constraints: Voice and person in English and Lummi. In M. Butt & T. Holloway King (Eds.), Proceedings of the LFG01 conference (pp. 13–32). Stanford: CSLI Publications.

Bresnan, J., & Ford, M. (2010). Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language, 168–213.

Brough, J., Branigan, H., Harris, L., & Rabagliati, H. (2020). The influence of race and gender on perspective-taking during language production.

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80, 1–28.

Cépeda, P., Kotek, H., Pabst, K., & Syrett, K. (2021). Gender bias in linguistics textbooks: Has anything changed since Macaulay & Brice 1997? Language.

Cooper, W. E., & Ross, J. R. (1975). World order. Papers from the Parasession on Functionalism, 63–111.

Corbett, G. G. (1991). Gender. Cambridge University Press.

Da Cunha, Y., & Abeillé, A. (2020). L’alternance actif/passif en français : Une étude statistique sur corpus écrit. Discours, 27.

De la Fuente, I. (2015). Putting pronoun resolution in context: The role of syntax, semantics, and pragmatics in pronoun interpretation [PhD thesis]. Université Paris Diderot.

Debaisieux, J.-M., & Benzitoun, C. (Eds.). (2020). Orféo: Un corpus et une plateforme pour l’étude du français contemporain. Armand Colin.

Elpers, N., Jensen, G., & Holmes, K. J. (2022). Does grammatical gender affect object concepts? Registered replication of Phillips and Boroditsky (2003). Journal of Memory and Language, 127, 104357. https://doi.org/10.1016/j.jml.2022.104357

Esaulova, Y. (2015). The prominence of gender information in on-line language processing: Cross-linguistic evidence of implicit gender hierarchies [PhD thesis].

Esaulova, Y., & Von Stockhausen, L. (2015). Cross-linguistic evidence for gender as a prominence feature. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01356

Gygax, P. M., Elmiger, D., Zufferey, S., Garnham, A., Sczesny, S., Stockhausen, L. von, Braun, F., & Oakhill, J. (2019). A language index of grammatical gender dimensions to study the impact of grammatical gender on the way we perceive women and men. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.01604

Huet, T., Biega, J., & Suchanek, F. M. (2013). Mining history with Le Monde. Proceedings of the 2013 Workshop on Automated Knowledge Base Construction - AKBC ’13, 49–54. https://doi.org/10.1145/2509558.2509567

Hundt, M., Röthlisberger, M., & Seoane, E. (2018). Predicting voice alternation across academic Englishes. Corpus Linguistics and Linguistic Theory, 17(1), 189–222. https://doi.org/10.1515/cllt-2017-0050

Jelinek, E., & Demers, R. A. (1983). The agent hierarchy and voice in some Coast Salish languages. International Journal of American Linguistics, 49(2), 167–185. https://doi.org/10.1086/465780

Koenig, A. M., Eagly, A. H., Mitchell, A. A., & Ristikari, T. (2011). Are leader stereotypes masculine? A meta-analysis of three research paradigms. Psychological Bulletin, 137(4), 616–642. https://doi.org/10.1037/a0023557

Kotek, H., Dockum, R., Babinski, S., & Geissler, C. (2021). Gender bias and stereotypes in linguistic example sentences. Language.

Lambrecht, K. (1994). Information structure and sentence form: Topic, focus, and the mental representations of discourse referents (Vol. 71). Cambridge University Press.

Levshina, N. (2021). Communicative efficiency and differential case marking: A reverse-engineering approach. Linguistics Vanguard, 7. https://doi.org/10.1515/lingvan-2019-0087

McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N., & Lee, J. (2013). Universal dependency annotation for multilingual parsing. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 92–97. https://aclanthology.org/P13-2017

Mollin, S. (2013). Pathways of change in the diachronic development of binomial reversibility in late modern American English. Journal of English Linguistics, 41(2), 168–203.

Mollin, S. (2014). The (ir)reversibility of English binomials: Corpus, constraints, developments (Vol. 64). John Benjamins Publishing Company.

Pozniak, C. (2018). Le traitement des relatives dans les langues : Une approche comparative et multifactorielle [PhD thesis]. http://www.theses.fr/235767611

Richy, C., & Burnett, H. (2020). Jean does the dishes while Marie fixes the car: A qualitative and quantitative study of social gender in French syntax articles. Journal of French Language Studies, 30(1), 47–72.

Silveira, N., Dozat, T., De Marneffe, M.-C., Bowman, S., Connor, M., Bauer, J., & Manning, C. D. (2014). A gold standard dependency corpus for English. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), 2897–2904.

Soares, E. C., Miller, P. H., & Hemforth, B. (2019). The effect of verbal agreement marking on the use of null and overt subjects: A quantitative study of first person singular in Brazilian Portuguese. Fórum Linguístico, 16(1), 3579–3600.

Zeldes, A. (2017). The GUM corpus: Creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3), 581–612. https://doi.org/10.1007/s10579-016-9343-x

Like-me effect: a Bayesian model

Testing the interaction between argument gender and speaker gender:

β_Masc:Male = 0.56
CI = [0.29, 0.83]
P(β_Masc:Male > 0) = 100%

→ Strong evidence for an interaction

Ordering preference: subject inversion in FTB

Gender bias across genres

Pronouns in French and English

Gender bias across genres

Nouns in French, pronouns in English
