Wikipedia:Wikipedia Signpost/2024-01-31/Recent research
Croatian takeover was enabled by "lack of bureaucratic openness and rules constraining [admins]"
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
A "lack of bureaucratic openness and rules constraining administrator behavior" enabled nationalist takeover of Croatian Wikipedia
- Reviewed by Bri and Tilman Bayer
A paper titled "Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias" (accepted for publication in the CSCW 2024 proceedings) examines the well-known case of the Croatian Wikipedia's hijacking by far-right nationalists (from at least 2011 to 2020), and asks why the similarly situated Serbian, Bosnian and Serbo-Croatian Wikipedias managed to escape this fate.
As summarized in a post by the University of Washington's Center for an Informed Public (an interdisciplinary center involving UW's Information School, School of Law, and Department of Human Centered Design & Engineering), on the Croatian Wikipedia
This has already been documented in detail in a report commissioned by the Wikimedia Foundation (see e.g. prior Signpost coverage: "Croatian Wikipedia: capture and release", Disinformation report, 2021-06-27 and "Wikimedia Foundation builds 'Knowledge Integrity Risk Observatory' to enable communities to monitor at-risk Wikipedias", Recent research, 2022-11-28). As summarized in the present paper, "In part, the [WMF's] report attributed Croatian Wikipedia’s capture to a unique situation in which there were distinct Wikipedia editions for the standardized national variants of a pluricentric language: Bosnian-Croatian-Montenegrin-Serbian (BCMS), sometimes referred to as Serbo-Croatian. This explanation, however, raises the question of why Serbian and Bosnian Wikipedia did not appear to suffer Croatian [Wikipedia's] fate."
To answer this question, the authors focus in particular on the comparison with Serbian Wikipedia (the largest of the four BCMS language Wikipedias; a Montenegrin Wikipedia does not exist currently, whereas the Serbo-Croatian Wikipedia, while catering to all the national variants, was deemed to be a less attractive takeover target due to its smaller audience and lack of "national resonance"). Their findings point at weak policies and norms that allowed capture to happen, especially the lack of policies around blocking, and the importance of integrity amongst the community's bureaucrats (users who can grant and remove admin permissions).
The researchers used a grounded theory approach, specifically a "qualitative analysis of interview data with a range of participants in Croatian and Serbian Wikipedia and in the broader Wikipedia community" (15 interviews in total). Based on this,
The authors state that their paper is the first academic work they know of "that has considered how distributed influence operations target, become deeply engaged with, and are facilitated by institutional and organizational arrangements within peer production communities like Wikipedia".
Among the limitations acknowledged in the paper, "none of its authors are fluent BCMS speakers. As a result, interviews were conducted in English." However, they attempted to compensate for this potential loss of relevant interviewees by also examining policy-related talk page discussion using Google Translate.
Perhaps more seriously, while the paper's insights certainly deserve wide attention from everyone concerned with similar issues in the Wikimedia movement, they are based on a single case; the authors note "that Croatian Wikipedia reflects only one potential path." They point to the case of Chinese Wikipedia, where "infiltration concerns" had led the Wikimedia Foundation to ban several admins in 2021 (Signpost coverage), illustrating "government pressure" as an important additional factor with which "Future research could extend our framework". However, the authors do not mention that the Chinese Wikipedia case also provides important information relevant to factors that their paper did focus on and draw conclusions about. For example, the Chinese Wikipedia community decided early on to build a single language project instead of separate ones for national variants of the Chinese language, aided by an (at the time) innovative automatic conversion system. As summarized in a 2009 paper,
One can't help wondering if a similar "anti-regionalism policy" could have been an effective "antidote" against Croatian nationalism, too, and whether using a similar technology-aided conversion between writing systems of Serbo-Croatian early on could have helped maintain Serbo-Croatian Wikipedia as a common locus of collaboration instead of being overtaken by the nationally focused Croatian and Serbian Wikipedia. (Both the Serbian and Serbo-Croatian Wikipedia did eventually adopt automatic conversion systems.) Unfortunately, the present interview study fails to address such questions.
Briefly
- See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
- Compiled by Tilman Bayer
"Why do you need 400 photographs of 400 different Lockheed Constellations" on Commons?
From the abstract:
These 32 interviews are apparently the same as those that already served as the basis of an earlier, related paper by the same authors (cf. our review: "Unpacking Stitching between Wikipedia and Wikimedia Commons: Barriers to Cross-Platform Collaboration").
"From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?"
From the abstract:
"NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages"
Including loan words in a training corpus for natural language processing, a linguistic-computational technique closely interrelated with recent advances in artificial intelligence, can degrade the fidelity of the model that is supposed to represent the native language, not the language of the loan words. According to the authors, Indonesian language Wikipedias (there are several) suffer from this defect because of their relatively high fraction of loan words. From a Twitter/X thread by one of the authors of this preprint:
"Loanword identification based on web resources: A case study on Wikipedia"
From the abstract:
From the introduction:
"Time Lag Analysis of Adding Scholarly References to English Wikipedia"
From the abstract:
See also:
- excerpt
- slide deck
- Our review of an earlier, related paper by the same authors: "The first scholarly references on Wikipedia articles, and the editors who placed them there"
"Wikipedia as a tool for contemporary history of science: A case study on CRISPR"
From the abstract:

Discuss this story
While we don't have an explicit anti-regionalism policy we do have MOS:COMMONALITY which encourages use of English that will be understood across the English speaking world. This might be called an anti-insularity 'policy'. All the best: Rich Farmbrough 20:13, 31 January 2024 (UTC).