A gender bias in syntax:
evidence from French and English corpora

Yanis da Cunha & Anne Abeillé
– LingLunch, October 13th –

Outline

  1. Introduction: gender biases in syntax

  2. Methodology: a corpus study

  3. Results

  4. Discussion

Introduction

Gender biases in syntax

Gender bias in syntax, three domains:

  • Word order: men before women in coordinations, e.g. père et mère ‘father and mother’ (Abeillé et al., 2018; Mollin, 2013, 2014)
  • Agreement: unlike-gender coordinations trigger masculine agreement (An & Abeillé, 2021; Corbett, 1983)
  • Syntactic function: men are more likely to be subjects (Kotek et al., 2021; Richy & Burnett, 2020)

Previous findings

1. Corpus studies

        Subject   Object
Men     88%       70%
Women   12%       30%

Results from Langue Française
(French journal, 1969-1971 & 2008-2017)
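
To read the table as a single effect size, one can compare the odds of a male referent in subject versus object position. The sketch below assumes the percentages are column-wise shares (each column sums to 100%); the variable and function names are illustrative and not taken from the original study.

```python
# Shares of male/female referents per syntactic function, as in the table.
# Assumption: percentages are column-wise (each function sums to 100%).
shares = {
    "subject": {"men": 0.88, "women": 0.12},
    "object": {"men": 0.70, "women": 0.30},
}


def odds(p_men: float, p_women: float) -> float:
    """Odds of a male versus a female referent."""
    return p_men / p_women


subject_odds = odds(shares["subject"]["men"], shares["subject"]["women"])
object_odds = odds(shares["object"]["men"], shares["object"]["women"])

# Odds ratio: how much higher the male/female odds are for subjects.
odds_ratio = subject_odds / object_odds
print(f"odds ratio (subject vs. object) = {odds_ratio:.2f}")
```

Under this reading, the odds of a male referent are about three times higher in subject position than in object position.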

Previous findings

2. Eye-tracking experiments

Role Noun (RN), disambiguation point

  1. Die Flugbegleiterin\(_{fem.SG}\), die viele Tourist-en\(_{masc.PL}\)/-innen\(_{fem.PL}\) beobachtet hat\(_{SG}\)/haben\(_{PL}\), ist aufmerksam
    a. Subject RN: ‘The flight attendant, whom many tourists\(_{fem/masc}\) have observed, is attentive.’
    b. Object RN: ‘The flight attendant, who has observed many tourists\(_{fem/masc}\), is attentive.’

Shorter reading times when masculine role nouns were subjects (1a) rather than objects (1b).


→ Evidence from corpora and experiments for a gender effect on syntactic functions:

  • Men are more likely to be subjects
  • Women are more likely to be objects

Prominence & alignment

Alignment theory (Aissen, 1999; Levshina, 2021). Three ingredients:

  • arguments bear so-called referential prominence features: animacy, definiteness, person…
  • prominence and syntactic functions are hierarchically ordered (e.g. animate > inanimate)
  • function assignment to an argument is constrained by alignment with prominence.

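[Diagram: alignment between argument prominence and function assignment]

  • Argument prominence: Animate > Inanimate; Definite > Indefinite; 1st/2nd person > 3rd person; Pronoun > Noun
  • Function assignment: Subject > Object
  • More frequent: prominent argument → Subject, non-prominent argument → Object
  • Less frequent: prominent argument → Object, non-prominent argument → Subject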

The syntactic gender bias is reminiscent of prominence alignment (Esaulova, 2015)

[Diagram: the same alignment with gender as the prominence feature]

  • Argument prominence: Male > Female
  • Function assignment: Subject > Object

→ Prediction to be tested

Prominence & alignment

Alignment has two types of effects across languages: categorical or gradient (Bresnan et al., 2001)

Categorical effects

→ Obligatory voice alternation in Lummi (Jelinek & Demers, 1983)

  1. a. *The man knows me
     b. xcitŋsə ə cə swəyʔqəʔ
        know.PASSIVE by the man
        ‘I am known by the man’

Subject > Object

1st pers. > 3rd pers.

Gradient effects

→ Preferences in construction alternations:

Our study

→ Investigation of gender biases in syntactic function

  • Are only linguists biased (a genre-specific bias), or is the syntactic gender bias more general?

  • Using English and French corpora

→ French is a grammatical gender language (Gygax et al., 2019)

  1. For inanimate nouns and pronouns, grammatical gender is arbitrary

    a. Marie\(_i\), elle\(_i\) est grande ‘Mary is tall’
    b. La voiture\(_i\), elle\(_i\) est grande ‘The car is big’

  2. For human nouns, grammatical gender is interpreted as social gender

    Une\(_{fem}\) ministre ‘female minister’
    Un\(_{masc}\) ministre ‘male minister’

→ English is a natural gender language (Gygax et al., 2019)

  1. Nouns don’t have gender, except lexically gendered nouns

    Father, actress, queen

  2. Gender in pronouns reflects social gender

    He (male) vs. she (female)

Methodology

Selected corpora

Language  Corpus      Modality  Genre                             Annotations
English   EWT         Written   Web                               Dependencies
English   GUM         Mixed     Fiction, newspaper, conversation  Dependencies
English   Lines       Written   Fiction, technical                Dependencies
English   Partut      Mixed     Talks, legal texts, Wikipedia     Dependencies
English   PUD         Written   Newspaper, Wikipedia              Dependencies
French    C-Oral-Rom  Spoken    Conversation                      Dependencies
French    CFPP        Spoken    Conversation                      Dependencies
French    CRFP        Spoken    Conversation                      Dependencies
French    Frantext    Written   Fiction, non-fiction              POS-tagged
French    FrWac       Written   Web                               POS-tagged
French    FTB         Written   Newspaper                         Dependencies

Sampling

For dependency corpora

  • All subject and object nouns

    1. I notified her grandmother with the time of the departure. (EWT)

    2. J’ai rencontré sa fille (CFPP)
      ‘I met her daughter’

  • All 3rd person subject and object personal pronouns

    • English: he, she, him, her, it
      3. And she is the STAR of the family. (EWT)

    • French: il, elle, le, la
      4. Elle est née en 1917. (CFPP)
      ‘She was born in 1917’

  • Exclusion:

    • No copular verbs (être/be, rester/remain…)

Sampling

For POS-tagged corpora

  • Linear sequence: Punctuation + Determiner + Noun + Verb + Determiner + Noun

    • Punctuation acts as a sentence boundary
    • NP+V+NP is assumed to be a transitive SVO sequence
  1. Le chauffeur\(_{masc}\) actionne les essuie-glaces\(_{masc}\) (Le Triomphe de Thomas Zins, Jung Matthieu 2018, Frantext)
    ‘The driver activates the windshield wipers’

  2. Votre fils\(_{masc}\) apprendra la voltige\(_{fem}\) (L’Enfant des Lumières, Françoise Chandernagor 1995, Frantext)
    ‘Your son will learn aerobatics’
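
The linear pattern can be sketched as a sliding window over POS-tagged tokens. This is only an illustration: the tag names (`PUNCT`, `DET`, `NOUN`, `VERB`) and the function `find_svo_candidates` are assumptions, and the actual tagsets of Frantext and FrWac may differ.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, coarse POS tag)

# Assumed tag names; the corpora's real tagsets may use other labels.
PATTERN = ["PUNCT", "DET", "NOUN", "VERB", "DET", "NOUN"]


def find_svo_candidates(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject noun, verb, object noun) for every window whose
    tags match PATTERN. The leading punctuation acts as a sentence
    boundary, and NP+V+NP is assumed to be a transitive SVO sequence."""
    hits = []
    size = len(PATTERN)
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        if [tag for _, tag in window] == PATTERN:
            hits.append((window[2][0], window[3][0], window[5][0]))
    return hits


# Example 1 above, preceded by the punctuation of the previous sentence.
tokens = [
    (".", "PUNCT"), ("Le", "DET"), ("chauffeur", "NOUN"),
    ("actionne", "VERB"), ("les", "DET"), ("essuie-glaces", "NOUN"),
    (".", "PUNCT"),
]
print(find_svo_candidates(tokens))
```

A purely linear match like this cannot tell real transitive clauses from accidental NP+V+NP strings, which is why the SVO reading is stated as an assumption.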

Annotation

A sentence from the French TreeBank:

# textID = id270359
# author = LAZARE FRANCOISE
# gender = woman

1 Si si C CS s=s 11 mod
2 les le D DET g=m|n=p|s=def 3 det
3 programmes programme N NC g=m|n=p|s=c 7 suj
4 d’ de P P _ 3 dep
5 économies économie N NC g=f|n=p|s=c 4 obj.p
6 sont être V V m=ind|n=p|p=3|t=pst 7 aux.pass
7 respectés respecter V VPP g=m|m=part|n=p|t=past 1 obj.cpl
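
A sample like the one above can be read programmatically. As a minimal sketch, assume the eight whitespace-separated columns are ID, form, lemma, coarse POS, fine POS, features, head and dependency label; this reading is inferred from the sample and is not a documented format. All function and field names are hypothetical.

```python
SAMPLE = """\
# textID = id270359
# author = LAZARE FRANCOISE
# gender = woman

1 Si si C CS s=s 11 mod
2 les le D DET g=m|n=p|s=def 3 det
3 programmes programme N NC g=m|n=p|s=c 7 suj
4 d’ de P P _ 3 dep
5 économies économie N NC g=f|n=p|s=c 4 obj.p
6 sont être V V m=ind|n=p|p=3|t=pst 7 aux.pass
7 respectés respecter V VPP g=m|m=part|n=p|t=past 1 obj.cpl
"""

# Assumed mapping from dependency labels to the two studied functions.
FUNCTIONS = {"suj": "subject", "obj": "object"}


def parse_feats(feats: str) -> dict:
    """Turn 'g=m|n=p' into {'g': 'm', 'n': 'p'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def extract_arguments(text: str) -> list:
    """Collect nouns labelled as subject or object, with their gender."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and metadata
            continue
        _id, form, lemma, cpos, _pos, feats, _head, deprel = line.split()
        if cpos == "N" and deprel in FUNCTIONS:
            rows.append({
                "form": form,
                "lemma": lemma,
                "function": FUNCTIONS[deprel],
                "gender": parse_feats(feats).get("g"),
            })
    return rows


print(extract_arguments(SAMPLE))
```

Here only programmes is kept: économies bears the label obj.p (object of a preposition), which this sketch does not count as a verbal object.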


  • Syntactic function: subject, object
  • Grammatical category: noun, pronoun
  • Grammatical gender in some French corpora: masculine, feminine
  • Speaker gender in French corpora (in metadata)
  • Definiteness: definite, indefinite
  • Construction: active, passive, intransitive
  • Animacy and grammatical gender for nouns in French using Flexique (Bonami et al., 2013)
    → Automatic annotation


  • Animacy and social gender for nouns in English
    → Manual annotation

Thanks to Liam Duignan who collaborated with me

Studied samples



Size and annotation coverage of the studied samples
Language  POS      # data points  # human  # masculine  # feminine
English   Noun     32885          1357     773          361
English   Pronoun  5275           3111     2371         740
French    Noun     151537         18601    72265        59019
French    Pronoun  14014          NA       10466        3548

Statistical analysis


  • Bayesian logistic regression modeling with the brms package in R (Bürkner, 2017)

    • Predicts a binary outcome (e.g. subject vs. object)
    • Random intercepts for verb lemma and corpus


  • When the posterior probability P exceeds 90%, we take this as strong evidence for an effect.
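
The posterior probability of an effect can be estimated as the share of posterior draws of a coefficient that lie above zero. The sketch below illustrates this decision rule in Python rather than R; the function names are hypothetical and the draws are toy values for illustration only, not results from the study.

```python
from typing import Sequence


def posterior_probability(draws: Sequence[float], threshold: float = 0.0) -> float:
    """Share of posterior draws strictly above `threshold`, i.e. an
    estimate of P(beta > threshold)."""
    if not draws:
        raise ValueError("need at least one posterior draw")
    return sum(d > threshold for d in draws) / len(draws)


def is_strong_evidence(draws: Sequence[float], cutoff: float = 0.90) -> bool:
    """Apply the 90% rule: strong evidence when P(beta > 0) exceeds cutoff."""
    return posterior_probability(draws) > cutoff


# Toy draws (NOT from the study): 19 of the 20 values are positive.
toy_draws = [
    0.21, 0.35, -0.02, 0.40, 0.18, 0.29, 0.33, 0.05, 0.47, 0.26,
    0.31, 0.12, 0.38, 0.22, 0.44, 0.09, 0.27, 0.36, 0.15, 0.30,
]
print(posterior_probability(toy_draws))
print(is_strong_evidence(toy_draws))
```

The same function covers comparisons between two coefficients: apply it to the draw-by-draw differences between them.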

Results

Alignment between syntactic functions and prominence features

Subject frequency vs. noun animacy



  • Human nouns are more often subjects
  • Inanimate nouns are less often subjects


→ Prominence alignment:
Subject > Object
Human > Inanimate


Consistent across languages/corpora

Subject frequency vs. NP definiteness



  • Definite NPs are more often subjects
  • Indefinite NPs are less often subjects


→ Prominence alignment:
Subject > Object
Definite > Indefinite


Consistent across languages/corpora

Subject frequency vs. pronominality



  • Pronouns are more often subjects
  • Nouns are less often subjects


→ Prominence alignment:
Subject > Object
Pronoun > Noun


Consistent across languages/corpora

Subject frequency vs. social gender



→ Human nouns only


  • Male arguments are more often subjects
  • Female arguments are less often subjects


→ Hypothetical prominence alignment:
Subject > Object
Male > Female

Subject frequency vs. grammatical gender



→ Inanimate nouns in French


  • Gender plays a role for human nouns (social gender)
  • But not for inanimate nouns (grammatical gender)


→ Evidence for a differential interpretation of gender on humans and inanimates (Corbett, 1991; Elpers et al., 2022).


Syntactic function biases: a Bayesian model



Interaction between grammatical gender and animacy in French

Masculine humans are more likely to be subjects

→ Grammatical gender plays a role only for humans (i.e. when it is interpreted as social gender)


- βMasc:Hum = 0.33
- CI = [0.14, 0.53]
- P(βMasc:Hum > 0) = 100%
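
Since the model is a logistic regression, its coefficients are on the log-odds scale, so exponentiating them gives odds ratios. A minimal sketch, assuming the outcome is coded with subject as the positive class and that the reported interval is on the same log-odds scale; the variable and function names are illustrative.

```python
import math

# Values reported above for the Masc:Hum interaction (log-odds scale).
beta = 0.33
ci_low, ci_high = 0.14, 0.53


def to_odds_ratio(log_odds: float) -> float:
    """Convert a log-odds coefficient to an odds ratio."""
    return math.exp(log_odds)


odds_ratio = to_odds_ratio(beta)
or_low, or_high = to_odds_ratio(ci_low), to_odds_ratio(ci_high)
print(f"odds ratio = {odds_ratio:.2f} [{or_low:.2f}, {or_high:.2f}]")
```

Under these assumptions, the effect of masculine gender on the odds of being a subject is multiplied by roughly 1.39 for human nouns compared with inanimate nouns.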

Explanations for the gender bias

  • Like-me effect
  • An agency-based bias
  • A topicality-based bias

1. Like-me effect


Brough et al. (2020) show an effect of social perspective on word order:

  • Both men and women put men first in their sentences
  • But men do this more strongly than women

This is called a like-me effect or a me-first preference (Cooper & Ross, 1975)


→ Let’s take into account speaker gender in our data

1. Like-me effect




→ Human nouns in French

  • Male arguments are more often subjects
  • Male speakers do this more strongly

Like-me effect (Brough et al., 2020)


- βMasc:Male = 0.56
- CI = [0.29, 0.83]
- P(βMasc:Male > 0) = 100%

→ Strong evidence for an interaction between speaker gender and argument gender

2. An agency-based bias


Social cognition studies show that men are perceived as more agentive (Koenig et al., 2011)

And agents are mainly subjects

It is possible that men are more often subjects because they are more often agents


→ Let’s take into account semantic roles

2. An agency-based bias



→ Human nouns and pronouns

  • Passive subjects & active objects:

    • Same semantic role = patient
    • Different syntactic functions
  • Passive subjects are more often masculine than active objects

→ So the syntactic bias is not reducible to semantic roles


- βPassiveSubj = 0.31
- CI = [0.08, 0.54]
- P(βPassiveSubj > 0) = 99%

2. An agency-based bias



→ Human nouns and pronouns

  • Active & passive subjects:

    • Different semantic roles
    • Same syntactic function = subject
  • Active subjects are more often masculine than passive subjects

→ So semantics may also play a role


- βActiveSubj = 0.51
- CI = [0.37, 0.65]
- P(βActiveSubj > βPassiveSubj ) = 91%

3. A topicality-based bias


A third possible confounding factor: topicality.


In French newspapers, Huet et al. (2013) find that 50% of articles mention men, while only 10% mention women.
We thus expect men to be talked about more and to be topics more often.
Discourse topics are realized as pronouns (Lambrecht, 1994), and pronouns are more often subjects.

3. A topicality-based bias




  • Men are more often pronominalized than women in English

→ Men are more often topics


- βMale = 0.42
- CI = [0.29, 0.55]
- P(βMale > 0) = 100%

Discussion

  • The gender bias in syntactic functions can be found in multiple corpora and genres
    → It is not specific to linguists
    → Even in a language like English without grammatical gender

  • Discourse tendencies for men to be topics, agents and subjects
    → The syntax bias does not reduce to semantics

  • Speaker gender plays a role in the syntactic gender bias: like-me effect (Brough et al., 2020)
    → Interactions between social and linguistic structures

  • Gender could be integrated into the prominence alignment model (Esaulova, 2015)
    → Traditional scale: human > non-human animate > inanimate (Aissen, 2003)
    → Suggested scale: male > female > non-gendered animate > inanimate
    → If humanness plays a role, social gender may play a role
    → Predicted effects should be tested experimentally (Bornkessel-Schlesewsky & Schlesewsky, 2009)

Take-home message:

Take gender into account in experimental syntax (corpus studies and experiments)

  • Gender was not annotated in Bresnan’s work on dative alternation for instance (Bresnan & Ford, 2010)

  • Gender may be controlled for in some experiments but rarely reported:

    • Soares et al. (2019) for pro-drop in Portuguese
    • De la Fuente (2015) on anaphora resolution in English and Spanish
    • Pozniak (2018) on subject inversion in relative clauses in French


Ongoing and future work:

  • A language with three genders (Greek)

  • Gender effect on differential object marking and subject inversion in Spanish

  • Disentangling function assignment and word order preferences in Greek (Exling on Wednesday!)



Merci !

Thank you!

References

Abeillé, A., An, A., & Shiraïshi, A. (2018). L’accord de proximité du déterminant en français. Discours. Revue de Linguistique, Psycholinguistique Et Informatique. A Journal of Linguistics, Psycholinguistics and Computational Linguistics, 22.
Abeillé, A., Clément, L., & Liégeois, L. (2019). Un corpus annoté pour le français : Le French Treebank. TAL, 60, 19–43. https://halshs.archives-ouvertes.fr/halshs-02560207
Ahrenberg, L. (2015). Converting an English-Swedish parallel treebank to Universal Dependencies. Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), 10–19.
Aissen, J. (1999). Markedness and subject choice in optimality theory. Natural Language & Linguistic Theory, 17(4), 673–711. https://doi.org/10.1023/A:1006335629372
Aissen, J. (2003). Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory, 21(3), 435–483. https://doi.org/10.1023/A:1024109008573
An, A., & Abeillé, A. (2021). Closest conjunct agreement with attributive adjectives. Journal of French Language Studies, 1–28.
ATILF. (2022). Base textuelle Frantext. ATILF-CNRS & Université de Lorraine [Online]. https://www.frantext.fr/
Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), 209–226. https://doi.org/10.1007/s10579-009-9081-4
Bonami, O., Caron, G., & Plancq, C. (2013). Flexique: An inflectional lexicon for spoken French. Actes Du Quatrième Congrès Mondial de Linguistique Française, 2583–2596.
Bornkessel-Schlesewsky, I., & Schlesewsky, M. (2009). The role of prominence information in the real-time comprehension of transitive constructions: A cross-linguistic approach. Language and Linguistics Compass, 3(1), 19–58.
Bresnan, J., Dingare, S., & Manning, C. D. (2001). Soft constraints mirror hard constraints: Voice and person in English and Lummi. In M. Butt & T. Holloway King (Eds.), Proceedings of the LFG01 conference (pp. 13–32). Stanford: CSLI Publications.
Bresnan, J., & Ford, M. (2010). Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language, 168–213.
Brough, J., Branigan, H., Harris, L., & Rabagliati, H. (2020). The influence of race and gender on perspective-taking during language production.
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80, 1–28.
Cépeda, P., Kotek, H., Pabst, K., & Syrett, K. (2021). Gender bias in linguistics textbooks: Has anything changed since Macaulay & Brice 1997? Language.
Cooper, W. E., & Ross, J. R. (1975). World order. Papers from the Parasession on Functionalism, 63–111.
Corbett, G. G. (1991). Gender. Cambridge University Press.
Da Cunha, Y., & Abeillé, A. (2020). L’alternance actif/passif en français : Une étude statistique sur corpus écrit. Discours, 27.
De la Fuente, I. (2015). Putting pronoun resolution in context: The role of syntax, semantics, and pragmatics in pronoun interpretation [PhD thesis]. Université Paris Diderot.
Debaisieux, J.-M., & Benzitoun, C. (Eds.). (2020). Orféo: Un corpus et une plateforme pour l’étude du français contemporain. Armand Colin.
Elpers, N., Jensen, G., & Holmes, K. J. (2022). Does grammatical gender affect object concepts? Registered replication of Phillips and Boroditsky (2003). Journal of Memory and Language, 127, 104357. https://doi.org/10.1016/j.jml.2022.104357
Esaulova, Y. (2015). The prominence of gender information in on-line language processing: Cross-linguistic evidence of implicit gender hierarchies [PhD thesis].
Esaulova, Y., & Von Stockhausen, L. (2015). Cross-linguistic evidence for gender as a prominence feature. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01356
Gygax, P. M., Elmiger, D., Zufferey, S., Garnham, A., Sczesny, S., Stockhausen, L. von, Braun, F., & Oakhill, J. (2019). A language index of grammatical gender dimensions to study the impact of grammatical gender on the way we perceive women and men. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.01604
Huet, T., Biega, J., & Suchanek, F. M. (2013). Mining history with le monde. Proceedings of the 2013 Workshop on Automated Knowledge Base Construction - AKBC ’13, 49–54. https://doi.org/10.1145/2509558.2509567
Hundt, M., Röthlisberger, M., & Seoane, E. (2018). Predicting voice alternation across academic Englishes. Corpus Linguistics and Linguistic Theory, 17(1), 189–222. https://doi.org/10.1515/cllt-2017-0050
Jelinek, E., & Demers, R. A. (1983). The agent hierarchy and voice in some Coast Salish languages. International Journal of American Linguistics, 49(2), 167–185. https://doi.org/10.1086/465780
Koenig, A. M., Eagly, A. H., Mitchell, A. A., & Ristikari, T. (2011). Are leader stereotypes masculine? A meta-analysis of three research paradigms. Psychological Bulletin, 137(4), 616–642. https://doi.org/10.1037/a0023557
Kotek, H., Dockum, R., Babinski, S., & Geissler, C. (2021). Gender bias and stereotypes in linguistic example sentences. Language.
Lambrecht, K. (1994). Information structure and sentence form: Topic, focus, and the mental representations of discourse referents (Vol. 71). Cambridge University Press.
Levshina, N. (2021). Communicative efficiency and differential case marking: A reverse-engineering approach. Linguistics Vanguard, 7. https://doi.org/10.1515/lingvan-2019-0087
McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N., & Lee, J. (2013). Universal dependency annotation for multilingual parsing. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 92–97. https://aclanthology.org/P13-2017
Mollin, S. (2013). Pathways of change in the diachronic development of binomial reversibility in Late Modern American English. Journal of English Linguistics, 41(2), 168–203.
Mollin, S. (2014). The (ir)reversibility of English binomials: Corpus, constraints, developments (Vol. 64). John Benjamins Publishing Company.
Pozniak, C. (2018). Le traitement des relatives dans les langues : Une approche comparative et multifactorielle [PhD thesis]. http://www.theses.fr/235767611
Richy, C., & Burnett, H. (2020). Jean does the dishes while Marie fixes the car: A qualitative and quantitative study of social gender in French syntax articles. Journal of French Language Studies, 30(1), 47–72.
Silveira, N., Dozat, T., De Marneffe, M.-C., Bowman, S., Connor, M., Bauer, J., & Manning, C. D. (2014). A gold standard dependency corpus for english. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), 2897–2904.
Soares, E. C., Miller, P. H., & Hemforth, B. (2019). The effect of verbal agreement marking on the use of null and overt subjects: A quantitative study of first person singular in Brazilian Portuguese. Fórum Linguístico, 16(1), 3579–3600.
Zeldes, A. (2017). The GUM corpus: Creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3), 581–612. https://doi.org/10.1007/s10579-016-9343-x

Like-me effect: a Bayesian model



Testing the interaction between argument gender and speaker gender:

  • βMasc:Male = 0.56
  • CI = [0.29, 0.83]
  • P(βMasc:Male > 0) = 100%

→ Strong evidence for an interaction

Ordering preference: subject inversion in FTB


Gender bias across genres

Pronouns in French and English

Gender bias across genres

Nouns in French, pronouns in English