Wikipedia:Wikipedia Signpost/2020-01-27/Recent research

Recent research

How useful is Wikipedia for novice programmers trying to learn computing concepts?

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Wikipedia as a learning resource (for programmers)

Reviewed by Isaac Johnson

"Understanding Wikipedia as a Resource for Opportunistic Learning of Computing Concepts" by Martin P. Robillard (McGill University) and Christoph Treude (University of Adelaide), published at SIGCSE 2020, examines the utility of Wikipedia articles about computing concepts for novice programmers. The researchers recruited 18 students with varying computer science backgrounds to read Wikipedia articles about computing concepts that were new to them. They used a sample of four Wikipedia articles that appear frequently in Stack Overflow posts (Dependency injection; Endianness; Levenshtein distance; Regular expression). As a side note, in the sample of 44 million Stack Overflow posts that the authors processed, 360 thousand (0.8%) contained a Wikipedia link, pointing to 40 thousand different Wikipedia articles in aggregate. The authors indicate that the rate of linking to Wikipedia is similar on the subreddit r/programming. The participants were instructed to use a think-aloud method, talking through what they were doing and thinking as they tried to learn about the concept. The authors then analyzed the transcripts of these think-aloud sessions to determine which themes were consistent across the students.

The researchers identified the following challenges to learning from Wikipedia:

  • Concept Confusion: if vocabulary or notation has a different meaning in other contexts, this can confuse those readers who think they know what they're reading (but don't).
  • Need for Examples: explanations are not always enough. Examples are often desired.
  • New Terminology: encountering too many unfamiliar terms can be frustrating for readers.
  • Trivia Clutter: peripheral information that is not core to learning a concept can make it hard to find the most useful information, especially for readers who are not native speakers.
  • Unfamiliar Notation: math notation and code in articles is generally not explained, which can create confusion for the reader if they are not familiar with it.

While the authors conclude that linking to more structured learning resources from Stack Overflow and related forums might be beneficial, this research clearly provokes some thought about how Wikipedia might become a more effective learning resource. For instance, page previews are a clear improvement for readers who are not familiar with the concepts mentioned in an article. The other challenges emphasize the value of surfacing examples in articles, not relying on mathematical notation alone to explain a concept, and having a clear lede paragraph. Two other thoughts about this research:

  • The authors describe how computer programmers often end up at Wikipedia by way of Stack Overflow posts that link to Wikipedia as a means of better understanding concepts mentioned in an answer. The ability of these communities to build on Wikipedia is a really lovely example of beneficial re-use. It has been examined more widely in work by Vincent et al. (see this past write-up).
  • As machine translation is explored as a means of supporting content creation (e.g., via the content translation tool) or of providing access to knowledge in other languages (e.g., automatic translations in search), it is useful to understand which articles are particularly difficult for novices to learn from, such as the computing concepts studied in this research. This is content that likely becomes even more confusing if imperfect machine translation leads to odd sentence structure or word choice. Perhaps it should be prioritized for cleanup by native speakers.


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer and Miriam Redi


"A systematic literature review on Wikidata"

From the abstract:

See also earlier coverage of a related paper by Piscopo et al., "First literature survey of Wikidata quality research", and the following preprint:


"Wikidata from a Research Perspective -- A Systematic Mapping Study of Wikidata"

From the abstract:


"Example from Wikipedia with a correct and an incorrect example extracted, as well as non-matching literals marked in the abstract" (from "Extracting Literal Assertions ...")

"Extracting Literal Assertions for DBpedia from Wikipedia Abstracts"

From the abstract:


"Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph"

From the abstract and accompanying note:


"Who Models the World?: Collaborative Ontology Creation and User Roles in Wikidata"

From the abstract:

See also comments about the paper by Daniel Mietchen, and earlier coverage of a related paper: "Participation patterns on Wikidata"


"The Evolution of Power and Standard Wikidata Editors: Comparing Editing Behavior over Time to Predict Lifespan and Volume of Edits"

From the abstract:


"Following the footsteps of giants: Modeling the mobility of historically notable individuals using Wikipedia"

This study found that the places to which historically notable people migrated are limited to a few locations, depending on discipline and opportunities.


"GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies"

This preprint presents a tool for extracting multilingual and gender-balanced parallel corpora at sentence level, with document and gender information.


"On the Relation of Edit Behavior, Link Structure, and Article Quality on Wikipedia"

This study found that on Wikipedia, controversial and high-quality articles differ from others according to metrics that quantify editing and linking behavior.


"Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering"

This preprint presents a new graph-based recurrent retrieval approach to answer multi-hop open-domain questions through the Wikipedia link graph.


A photomontage currently used as the lead illustration in both the English and Spanish Wikipedias' articles about the Falklands War (Guerra de las Malvinas)

"Collectively biased representations of the past: Ingroup Bias in Wikipedia articles about intergroup conflicts"

From the abstract:


"People tend to do more when collaborating with more people" on Wikipedia

From the abstract:


Uses material from the Wikipedia article Wikipedia:Wikipedia Signpost/2020-01-27/Recent research, released under the CC BY-SA 4.0 license.