Wikipedia:Wikipedia Signpost/2021-11-29/Recent research

Recent research

Vandalizing Wikipedia as rational behavior


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


Vandalizing Wikipedia as rational behavior

A paper presented last year at the International Conference on Social Media and Society studies possible rational motivations for Wikipedia vandalism.

The author observes that "vandalism-related research has tended to focus on the detection and removal of vandalism, with relatively little attention paid to understanding vandals themselves". This can be readily confirmed by searching the archives of this newsletter for "vandalism"; one exception is a 2018 study that asked students to guess why their classmates vandalize Wikipedia ("Only 4% of students vandalize Wikipedia – motivated by boredom, amusement or ideology (according to their peers)").

The theoretical framework used to study such rational motivations is "rational choice theory (RCT) as applied in value expectancy theory (VET)". It conceptualizes the expected utility of a choice (such as engaging in an act of vandalism) as the sum, over all possible outcomes, of the product of "the probability of some outcome O [...] and the utility valuation U of that same outcome".
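
As a rough illustration of the expected-utility calculation described above (a sketch by way of example, not code or data from the paper), the sum of probability times utility over outcomes can be written in a few lines of Python. The function name `expected_utility`, the outcome labels, and all probabilities and utility values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from typing import Sequence, Tuple

# One outcome is a (probability, utility) pair. Both numbers are
# illustrative assumptions, not values reported in the paper.
Outcome = Tuple[float, float]


def expected_utility(outcomes: Sequence[Outcome]) -> float:
    """Return the sum over outcomes of probability * utility."""
    total_probability = sum(p for p, _ in outcomes)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Hypothetical choice 1: vandalize.
#   50% chance the edit is reverted and the editor is blocked (utility -1.0)
#   50% chance the edit stays visible for a while (utility 2.0)
vandalize = [(0.5, -1.0), (0.5, 2.0)]

# Hypothetical choice 2: do nothing (one certain outcome, utility 0.0).
abstain = [(1.0, 0.0)]

print(expected_utility(vandalize))  # 0.5
print(expected_utility(abstain))    # 0.0
```

Under these made-up numbers, vandalizing has the higher expected utility, which is the sense in which the framework treats it as a potentially "rational" choice. Raising the probability or the cost of being caught lowers that value.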

Based on a sample of 141 vandalism edits (from the English Wikipedia), the author proposes an ontology of Wikipedia vandalism, extending classifications used in previous vandalism detection studies (e.g. blanking, misinformation, "image attack", "link spam") with a few new ones: "Attack graffiti" (i.e. "attack an individual or group") and "Community-related Graffiti" (expressing "opposition to community, norms, or policies").

The quantitative part of this mixed methods paper "examine[s] vandalism from four groups: users of a privacy tool Tor Browser, those contributing without an account, those contributing with an account for the first time, and those contributing with an account but having some prior edit history". Tor Browser edits are generally blocked automatically on Wikipedia, so the Tor edits in the dataset are those that slipped through this mechanism. This raises the question of whether some or many of them involved the editor trying several times to get around the block, which would set these editors apart from less dedicated vandals in the other groups.

The observation that contributing under an account requires more effort (i.e. creating that account, and logging into it) than contributing as an IP editor motivates the author's first hypothesis: "(H1) users who have created accounts will vandalize less frequently". She finds it confirmed by the examined edit data.

Secondly, the author hypothesizes that "the least identifiable individuals are more likely to produce vandalism that has high-risk repercussions" (H2) because value expectancy theory "suggests that identifiability acts as a constraint on deviant behavior." The author finds this hypothesis partially supported. Among other findings, "Tor-based users are substantially more likely than other groups to engage in large-scale vandalism and least likely to engage in the lowest risk type of vandalism, that which communicates friendly and sociable intent."

In motivating her third hypothesis, the author observes that "the groups under study differ by how they are treated by community policies. Newcomers are targeted for social interventions to welcome, train, and retain them. Wikipedia invites IP-based editors to create accounts as well as welcoming them. However, Tor-based editors generally experience rejection." The resulting hypothesis is "(H3) Members of excluded groups are more likely to strike against the community targeting them," operationalized as a higher rate of vandalism in the "community-related" category (e.g. directly attacking Wikipedia norms or policies).

The paper contains various other interesting observations that might make it worth reading for Wikipedia editors who spend time dealing with vandalism and related community policies. To pick just one example, the author highlights that vandalism can also have positive effects, referring to a 2014 paper. That earlier study combined interviews with editors and a quantitative analysis of a dataset that included edit numbers by editor experience level, page watcher numbers, pageview numbers and other data from the English Wikipedia. It found that "novice contributors’ participation has a direct negative effect on the quality of goods produced [i.e. newbie edits decreased article quality on average], but a positive indirect effect because it acts as a cue for expert contributors to improve the quality of those goods that consumers [i.e. Wikipedia readers] are most interested in." It also found "that the positive direct effect of article consumption [i.e. pageviews] on expert editing patterns is fully mediated by novice contributions. Results [...] support the theory that experts are unaware of demand [i.e. experienced editors do not usually check traffic levels of the articles they edit] but they are stimulated to respond to article consumption if consumers signal demand for that particular good through their contributions as novice producers."

Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Language biases in Wikipedia's "information landscapes"

From the abstract and conclusions:

"Universal structure" of collective reactions to individual actions found in Twitter, Wikipedia and scientific citations

From the abstract:

"Novel Version of PageRank, CheiRank and 2DRank for Wikipedia in Multilingual Network Using Social Impact"

From the abstract:

(see also earlier coverage of related research that applied such ranking metrics to graphs of Wikipedia articles)

"Modeling Popularity and Reliability of Sources in Multilingual Wikipedia"

From the accompanying blog post:


Uses material from the Wikipedia article Wikipedia:Wikipedia Signpost/2021-11-29/Recent research, released under the CC BY-SA 4.0 license.