Linguistic Bias in Crowdsourced Biographies: A Cross-lingual Examination

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Biographies make up a significant portion of Wikipedia entries and are a source of information and inspiration for the public. We examine a threat to their objectivity, linguistic biases, which are pervasive in human communication. Linguistic bias, the systematic asymmetry in the language used to describe people as a function of their social groups, plays a role in the perpetuation of stereotypes. Theory predicts that we describe people who are expected (because they are members of our own in-groups or are stereotype-congruent) with more abstract, subjective language than we use for others. Abstract language has the power to sway our impressions of others, as it implies stability over time. Extending our monolingual work, we consider biographies of intellectuals in the English- and Greek-language Wikipedias. We use our recently introduced sentiment analysis tool, DidaxTo, which extracts domain-specific opinion words, to build lexicons of subjective words in each language and for each gender, and we compare the extent to which abstract language is used. Contrary to expectation, we find evidence of gender-based linguistic bias, with women being described more abstractly than men. However, this is limited to English-language biographies. We discuss the implications of using DidaxTo to monitor linguistic bias in texts produced via crowdsourcing.
Original language: English
Title of host publication: Multilingual Text Analysis
Publisher: World Scientific Publishing Co.
Publication status: Published - 2019

