Wikipedia:Wikipedia Signpost/2022-10-31/Recent research
Disinformatsiya: Much research, but what will actually help Wikipedia editors?
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Think tank publishes report on "Information Warfare and Wikipedia"
The Institute for Strategic Dialogue, a London-based think tank, earlier this month published a report (co-authored with a company called CASM Technology) focusing "on information warfare on Wikipedia about the invasion of Ukraine"; see also this issue's "In the media" (summarizing media coverage of the report) and "Disinformation report" (providing context in the form of various other concrete cases).
As summarized in the abstract:
The report offers a great overview of Wikipedia's existing mechanisms for dealing with such issues, based on numerous conversations with community members and other experts. However, the literature review indicates that the authors – despite confidently telling Wired magazine "We've never tried to analyze Wikipedia data in that way before" – were unfamiliar with much of the existing academic research (e.g. about finding alternative accounts, aka sockpuppets, of abusive editors); the 39 references cited in the report include only a single peer-reviewed research paper. Likewise, despite the hope that their findings could yield "new tools" (Wired) that would support combating disinformation on Wikipedia, there is no indication that the authors were aware of past and ongoing research-supported product development efforts to build such tools, by the Wikimedia Foundation and others, some of which are outlined below. On Twitter, the lead author stated that "We're going to be doing more research on information warfare on Wikipedia with a new project kicking off later this month [October]", so perhaps some of these gaps can still be bridged.
How existing research efforts could help editors fight disinformation
Exactly two years ago, in the run-up to the 2020 US elections, the Wikimedia Foundation published a blog post noting concerns about a "rising rate and sophistication of disinformation campaigns" on the internet by coordinated actors, about elections and other topics such as the global pandemic or climate change, and providing a summary of how Wikipedia specifically was addressing such threats.
After mentioning the volunteer community's "robust mechanisms and editorial guidelines that have made the site one of the most trusted sources of information online" and announcing an internal anti-disinformation task force at the Foundation (which reportedly still exists, although one former member recently stated they were unaware of its current work areas) as well as "strengthened capacity building by creating several new positions, including anti-disinformation director and research scientist roles," the post focused on summarizing how
With the US mid-term elections imminent and independent researchers apparently being unaware of these research projects at the Foundation (see above), now seems a good time to take a look at how they have developed in the meantime. As "some of the tools used or soon available to be used by editors", the October 2020 post listed the following:
- According to the corresponding project page on Meta-wiki, this project (launched in 2017) is still in progress, but it already resulted in a paper presented at the 2019 World Wide Web Conference ("Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability"). The project page mentions plans "to work in close contact with the [developers of Citation Hunt, an existing community tool that identifies unsourced statements for editors to fix] and the Wikipedia Library communities. We will pilot a set of recommendations, powered by the new citation context dataset, to evaluate if our classifiers can help support community efforts to address the problem of unsourced statements." (A minimal sketch of this kind of "citation needed" classification appears after this list.)
- (See also "Improving Wikipedia Verifiability with AI" and "Countering Disinformation by Finding Reliable Sources", below)
- This project resulted in a working prototype by December 2020, but it does not appear to have been put into production yet or made available as a tool for its intended audience (checkuser community members). It had been preceded by earlier research efforts at the Foundation as well as various independent academic publications that also tackled this detection problem.

- This project (a collaboration with researchers from Korea's KAIST, the Chinese University of Hong Kong, and Taiwan's NCKU) has since been completed, resulting in a dataset that aligned Wikipedia and Wikidata statements using natural language processing (NLP) techniques. The project documentation on Meta-wiki doesn't mention an implementation of the system that would be directly usable by editors. (A toy illustration of what such an alignment involves appears after this list.)
- This report (English Wikipedia version: User:HostBot/Social media traffic report) was made available to Wikipedia editors as a pilot project in spring 2020, which concluded at the end of 2021. (It implemented a recommendation from a 2019 report about "Patrolling on Wikipedia".) The research project page on Meta-wiki lists several conclusions, including:
- "the organic traffic coming from external platforms like YouTube and Facebook that link to Wikipedia articles as context for examining the credibility of content is not having a significant deleterious impact on Wikipedia or placing an additional burden on patrollers." (mitigating earlier concerns that had been voiced by the Wikimedia Foundation and others when YouTube introduced these links back in 2018))
- " early evidence suggests that the Social Media Traffic Report as an intervention has not led to a substantial change in patrolling behavior around these articles"
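As a rough illustration of the "citation needed" classification idea behind the unsourced-statements project mentioned above, here is a minimal sketch. It is not the project's actual model (the 2019 paper describes a more sophisticated machine-learning classifier trained on Wikipedia citation data); the example sentences, labels, and model choice (TF-IDF features with logistic regression) are made up for illustration only.

<syntaxhighlight lang="python">
# Minimal, illustrative sketch of a "citation needed" sentence classifier.
# This is NOT the model from the "Citation Needed" paper; it only demonstrates
# the general idea, using a tiny made-up training set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled examples: 1 = statement likely needs a citation.
sentences = [
    "The city has a population of 1.2 million according to the 2020 census.",
    "The film was a critical and commercial failure.",
    "Paris is the capital of France.",
    "The senator was accused of accepting bribes from lobbyists.",
]
labels = [1, 1, 0, 1]

# TF-IDF features plus logistic regression stand in for the paper's neural model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(sentences, labels)

# Score a new, unsourced sentence; a higher probability means it is more likely
# to need a source.
candidate = "The company laid off a third of its workforce in 2021."
print(model.predict_proba([candidate])[0][1])
</syntaxhighlight>

In line with the plans quoted above, scores of this kind could be used to rank which unsourced statements a tool such as Citation Hunt surfaces to editors first.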
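Similarly, to make the Wikipedia/Wikidata alignment project above more concrete: "aligning" a statement means finding the article sentence that expresses a given Wikidata claim. The snippet below is only a toy baseline using word overlap, with an invented claim and invented sentences; the actual project used NLP techniques and produced a dataset rather than a tool.

<syntaxhighlight lang="python">
# Toy illustration of aligning a (verbalised) Wikidata claim with the Wikipedia
# sentence most likely to express it. Not the project's NLP pipeline; the claim
# and sentences are made up, and plain token overlap stands in for real NLP.

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercased token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# A Wikidata-style statement (subject, property, value) written out as text.
claim = "Douglas Adams educated at St John's College"

# Sentences from the corresponding Wikipedia article (made up for illustration).
sentences = [
    "Adams was born in Cambridge in March 1952.",
    "He was educated at Brentwood School and St John's College, Cambridge.",
    "The Hitchhiker's Guide to the Galaxy began as a radio comedy.",
]

best = max(sentences, key=lambda s: token_overlap(claim, s))
print(best)  # the sentence most plausibly supporting the claim
</syntaxhighlight>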
Furthermore, the 2020 post mentioned the ORES system, which was already widely used at the time; a brief example of querying its API follows.
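For readers unfamiliar with it, ORES exposes a public REST API that returns machine-learning scores for individual revisions, such as how likely an edit is to be damaging or made in good faith. The sketch below queries it for a single English Wikipedia revision; the revision ID is an arbitrary placeholder, and the URL and response layout follow the v3 API as documented at ores.wikimedia.org, which may of course change.

<syntaxhighlight lang="python">
# Query the public ORES API for scores of one English Wikipedia revision.
# The revision ID below is an arbitrary placeholder.
import requests

rev_id = 1089219070  # placeholder revision ID
url = (
    "https://ores.wikimedia.org/v3/scores/enwiki/"
    f"?models=damaging|goodfaith&revids={rev_id}"
)

response = requests.get(url, headers={"User-Agent": "example-script/0.1"}, timeout=30)
response.raise_for_status()
scores = response.json()["enwiki"]["scores"][str(rev_id)]

# Each requested model returns a prediction plus per-class probabilities.
for model_name, result in scores.items():
    print(model_name, result["score"]["prediction"], result["score"]["probability"])
</syntaxhighlight>

Scores like these already power, for example, the highlighting of likely problematic edits in the Recent Changes feed on many wikis.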
These efforts appear to be part of the WMF Research team's "knowledge integrity" focus, announced in February 2019 in one of four "white papers that outline our plans and priorities for the next 5 years".
Briefly
- See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
- The Wikimedia Foundation's Research and Security teams are requesting input from researchers "to help prioritize the release of data that can be useful for your research", such as per-country pageview and editor numbers.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
Facebook/Meta research on "Improving Wikipedia Verifiability with AI"
From the abstract:
See also research project page on Meta-wiki
"Countering Disinformation by Finding Reliable Sources: a Citation-Based Approach"
From the abstract:
New book: A Discursive Perspective on Wikipedia: More than an Encyclopaedia?
From the publisher's description:
"Russian Wikipedia vs Great Russian Encyclopedia: (Re)construction of Soviet Music in the Post-Soviet Internet Space"
From the abstract:
Discuss this story