Wikipedia:Wikipedia Signpost/2023-04-03/Recent research

Recent research

Language bias: Wikipedia captures at least the "silhouette of the elephant", unlike ChatGPT


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


"A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube"

This arXiv preprint (which according to the authors grew out of a student project for a course titled "Critical Thinking in Data Science" at Harvard University) finds that

The blind men and the elephant
(wall relief in Northeast Thailand)

Regarding Wikipedia, the authors note it "is an encyclopedia that provides summaries of knowledge and is written from a neutral point of view", concluding that

While the authors employ some quantitative methods to study the bias on the other three sites (particularly Google), the Wikipedia part of the paper is almost entirely qualitative in nature. It focuses on an in-depth comparison of a small set of (quite apparently non-randomly chosen) article topics across languages, not unlike various earlier studies of language bias on Wikipedia (e.g. on the coverage of the Holocaust in different languages, see our previous coverage here and here). Unfortunately, the paper fails to cite such such earlier research (which has also included quantitative results, such as those represented in the "Wikipedia Diversity Observatory", which among other things includes data on topic coverage across 300+ Wikipedia languages) – despite asserting "there has been a lack of investigation into language bias on platforms such as Google, ChatGPT, Wikipedia, and YouTube".

The first and largest part of the paper's examination of Wikipedia's coverage concerns articles about Buddhism and various subtopics, in the English, French, German, Vietnamese, Chinese, Thai, and Nepali Wikipedias. The authors indicate that they chose this topic starting out from the observation that

Somewhat in confirmation of this hypothesis, they find that

However, the authors also observe that "randomness is involved to some degree in terms of topic coverage on Wikipedia", defying overly simplistic predictions of biases based on intellectual traditions. E.g.

A second, shorter section focuses on comparing Wikipedia articles on liberalism and Marxism across languages. Among other things, it observes that the "English article has a long section on Keynesian economics", likely due to its prominent role in the New Deal reforms in the US in the 1930s. In contrast,

Among other proposals for reducing language bias on the four sites, the paper proposes that

Returning to their title metaphor, the authors give Wikipedia credit for at least "show[ing] a rough silhouette of the elephant", whereas e.g. Google only "presents a piece of the elephant based on a user's query language". However, this "silhouette – topic coverage – differs by language. [Wikipedia] writes in a descriptive tone and contextualizes first-person narratives and subjective opinions as cultural, historical, or religious phenomena." YouTube, on the other hand, "displays the 'color' and 'texture' of the elephant as it incorporates images and sounds that are effective in invoking emotions." But its top-rated videos "tend to create a more profound ethnocentric experience as they zoom into a highly confined range of topics or views that conform to the majority's interests".

The papers singles out the new AI-based chatbots as particularly problematic regarding language bias:

On the other hand, the paper's examination of the biases of "ChatGPT-Bing" [sic] highlights among other concerns its reliance on Wikipedia among the sources it cites in its output:

Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"A systematic review of Wikidata in Digital Humanities projects"

From the abstract:


"Leveraging Wikipedia article evolution for promotional tone detection"

From the abstract:


"Detection of Puffery on the English Wikipedia"

From the abstract:


"Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences"

From the abstract:

Among other examples given in the paper, a Wikidata statement involving the items and properties Q217760, P54, Q221525 and P580 is mapped to the Wikipedia sentence "On 30 January 2010, Wiltord signed with Metz until the end of the season."

Judging from the paper's citations, the authors appear to have been unaware of the Abstract Wikipedia project, which is pursuing a closely related effort.


"The URW-KG: a Resource for Tackling the Underrepresentation of non-Western Writers" on Wikidata

From the paper:


References

Uses material from the Wikipedia article Wikipedia:Wikipedia Signpost/2023-04-03/Recent research, released under the CC BY-SA 4.0 license.