Wikipedia:Wikipedia Signpost/2024-09-26/Recent research

Recent research

Article-writing AI is less "prone to reasoning errors (or hallucinations)" than human Wikipedia editors


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


"Wikicrow" AI less "prone to reasoning errors (or hallucinations)" than human Wikipedia editors when writing gene articles

A preprint titled "Language Agents Achieve Superhuman Synthesis of Scientific Knowledge" introduces PaperQA2, a retrieval-augmented generation system for answering questions over the scientific literature, and WikiCrow, an agent built on top of it that generates Wikipedia-style articles.

It was published by "FutureHouse", a San Francisco-based nonprofit working on "Automating scientific discovery" (with a focus on biology). FutureHouse was launched last year with funding from former Google CEO Eric Schmidt (at which time it was anticipated that it would spend about $20 million by the end of 2024). Generating Wikipedia-like articles about science topics is only one application of "PaperQA2, FutureHouse's scientific RAG [retrieval-augmented generation] system", which is designed to aid researchers. (For example, FutureHouse also recently launched a website called "Has Anyone", described as a "minimalist AI tool to search if anyone has ever researched a given topic.")

In more detail, the researchers "engineered a system called WikiCrow, which generates cited Wikipedia-style articles about human protein-coding genes by combining several PaperQA2 calls on topics such as the structure, function, interactions, and clinical significance of the gene." Each call contributes a section of the resulting article (somewhat similar to another recent system, see our review: "STORM: AI agents role-play as 'Wikipedia editors' and 'experts' to create Wikipedia-like articles"). The prompts include the instruction to "Write in the style of a Wikipedia article, with concise sentences and coherent paragraphs".
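To make the section-by-section pipeline concrete, here is a minimal illustrative sketch (ours, not from the paper) of assembling such an article from one retrieval-augmented call per topic. The function `answer_with_paperqa` is a hypothetical stand-in for a single PaperQA2 query; the real system additionally handles full-text literature retrieval, citation formatting, and quality checks.

```python
# Illustrative sketch (not from the paper): a WikiCrow-style article is
# assembled from one retrieval-augmented generation call per section.
SECTIONS = ["Structure", "Function", "Interactions", "Clinical significance"]

PROMPT_TEMPLATE = (
    "Write in the style of a Wikipedia article, with concise sentences "
    "and coherent paragraphs, about the {topic} of the human gene {gene}."
)

def answer_with_paperqa(question: str) -> str:
    """Hypothetical stand-in for a single PaperQA2 RAG call."""
    raise NotImplementedError

def write_gene_article(gene: str) -> str:
    # One PaperQA2 call per topic; each call contributes one section.
    sections = []
    for topic in SECTIONS:
        question = PROMPT_TEMPLATE.format(topic=topic.lower(), gene=gene)
        sections.append(f"== {topic} ==\n{answer_with_paperqa(question)}")
    return "\n\n".join(sections)
```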

At an average cost of $5.50 per article, the generated articles tended to be longer than their Wikipedia counterparts and were rated as higher quality, at least according to the paper's evaluation method:

For the judgment of whether a particular statement was "supported" by the cited references, the concrete question asked of the graders (described as "expert researchers" in the paper) was:

In addition, among other more detailed instructions, the graders were advised to mark a statement as "correct as cited" even if it was not directly supported by the cited source, as long as the statement consisted of "broad context" judged to be "undergraduate biology student common knowledge" (akin to an extreme interpretation of WP:BLUE).

The fact that these rating criteria appear to be more liberal than Wikipedia's own, combined with the well-known general reputation of LLMs for generating hallucinations, makes the "WikiCrow displayed significantly higher precision" result rather remarkable. The authors double-checked it by examining the data more closely:

A human writer and a creature with the head and wings of a crow, both sitting and typing on their own laptops, experiencing mild hallucinations (DALL-E illustration)
Very scientifically accurate depictions of hallucinations experienced by human editors (left) and WikiCrow (right). Not from the paper.

The authors caution that this result about Wikipedians "hallucinating" more frequently than AI is specific to their "WikiCrow" system (and the task of writing articles about genes), and should not be generalized to LLMs overall:

A previous, less capable version of the WikiCrow system had already been described in a December 2023 blog post, which discussed in more detail the motivation for focusing on the task of writing Wikipedia-like articles about genes. Rather than seeing it as an arbitrary benchmark demo for their LLM agent system (then in its earlier version, PaperQA), the authors described it as motivated by longstanding shortcomings of Wikipedia's gene coverage that seriously hamper the work of researchers who have come to rely on Wikipedia:

These challenges of covering the large number of relevant genes are not news to Wikipedians working in this area. Back in 2011, several papers in a special issue of Nucleic Acids Research on databases had explored Wikipedia as a database for structured biological data, e.g. asking "how to get scientists en masse to edit articles" in this area, and presenting English Wikipedia's "Gene Wiki" taskforce (which is currently inactive). In a 2020 article in eLife, a group of 30 researchers and Wikidata contributors similarly "describe[d] the breadth and depth of the biomedical knowledge contained within Wikidata," including its coverage of genes in general ("Wikidata contains items for over 1.1 million genes and 940 thousand proteins from 201 unique taxa") and human genetic variants ("Wikidata currently contains 1502 items corresponding to human genetic variants, focused on those with a clear clinical or therapeutic relevance"). But it seems that at least from the point of view of the FutureHouse researchers, Wikidata's gene coverage is not a substitute for Wikipedia's, perhaps because it does not offer the same kind of factual coverage (see also the review of a related dissertation below).
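Coverage figures like the ones quoted from the eLife paper can be spot-checked against today's Wikidata through its public SPARQL endpoint. The following Python sketch is ours rather than from any of the papers discussed; the identifiers used are, to the best of our knowledge, Q7187 ("gene"), Q15978631 ("Homo sapiens"), P31 ("instance of"), P279 ("subclass of") and P703 ("found in taxon"), and a count over a set this large may run into the endpoint's timeout.

```python
# Sketch: counting human gene items on Wikidata via the public SPARQL
# endpoint. Assumed identifiers: Q7187 = gene, Q15978631 = Homo sapiens,
# P31 = instance of, P279 = subclass of, P703 = found in taxon.
import requests

QUERY = """
SELECT (COUNT(?gene) AS ?count) WHERE {
  ?gene wdt:P31/wdt:P279* wd:Q7187 ;  # instance of (a subclass of) gene
        wdt:P703 wd:Q15978631 .       # found in taxon: Homo sapiens
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "gene-coverage-check/0.1 (example)"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"][0]["count"]["value"])
```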

The current paper is not peer-reviewed, but conveys credibility by disclosing ample detail about the methodology for building and evaluating the PaperQA2 and WikiCrow systems (also in an accompanying technical blog post), and by releasing the underlying source code and data. The PaperQA2 system is available as an open-source software package. (This includes a "Setting to emulate the Wikipedia article writing used in our WikiCrow publication". However, the paper cautions that the released version does not include some additional tools that were used, and in particular does not provide "access to non-local full-text literature searches", which are "often bound by licensing agreements".) The generated articles are available online in rendered form and as Markdown source (see full list below, with links to their Wikipedia counterparts for comparison). The annotated expert ratings have been published as well.
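For readers who want to try the released package themselves, its README at the time of writing suggested usage along the following lines. This sketch is based on the package documentation rather than on the paper, and the exact API (including the settings object and its `paper_directory` parameter) should be treated as an assumption that may change between versions:

```python
# Sketch based on the paper-qa package README at the time of writing;
# the exact API may differ between versions.
from paperqa import Settings, ask

# Ask a question against a local folder of PDFs using PaperQA2's default
# retrieval-augmented generation pipeline.
answer = ask(
    "What is the clinical significance of the human gene BRCA1?",
    settings=Settings(paper_directory="my_papers"),  # assumed parameter name
)
print(answer)
```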

The authors acknowledge "previous work on unconstrained document summarization, where the document must be found and then summarized, and even writing Wikipedia-style articles with RAG" (i.e. the aforementioned STORM project). But they highlight that

The "crow" moniker (already used in a predecessor project called "ChemCrow", an LLM agent working on chemistry tasks) is inspired by the fact that "Crows can talk – like a parrot – but their intelligence lies in tool use."

List of gene articles generated by WikiCrow
Notes:
  • The second column (the list of rendered articles) was obtained from the search box dropdown list at https://wikicrow.ai/. The other two columns were derived from it.
  • Despite the paper's statement that these are "240 articles on genes that already have non-stub Wikipedia articles", the dropdown list appears to contain only 235, some of which don't seem to have an equivalent English Wikipedia article. (See also List of human protein-coding genes 1 etc.)

Using Wikipedia's categories and list pages to build a knowledge graph separate from Wikidata

From the abstract of a dissertation titled "Exploiting semi-structured information in Wikipedia for knowledge graph construction":

Why would one want to use Wikipedia as a source of structured data and build a new knowledge graph when Wikidata already exists? First, the thesis argues that Wikidata — even though it has surpassed other public knowledge graphs in the number of entities — is still very incomplete, especially when it comes to information about long-tail topics:

On the other hand, an automated process for extracting structured information from Wikipedia may not yet be reliable enough to import the result directly without manual review:

Chapter 3 ("Knowledge Graphs on the Web") contains detailed comparisons of Wikidata with other public knowledge graphs, with observations including the following:

(see also an earlier paper co-authored by the same author, titled "Knowledge Graphs on the Web – an Overview")

Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Refining Wikidata Taxonomy using Large Language Models"

A Wikidata taxonomy (from "city or town" to "entity") before and after refinement

From the abstract:

From the "Evaluation" section:

"Psychiq and Wwwyzzerdd: Wikidata completion using Wikipedia"

Video demonstrating the Wwwyzzerdd browser extension

From the abstract:

"Bridging Background Knowledge Gaps in Translation with Automatic Explicitation"

From the paper:


"Relevant Entity Selection: Knowledge Graph Bootstrapping [from Wikidata] via Zero-Shot Analogical Pruning"

From the abstract:


"Assembling Hyperpop: Genre Formation on Wikipedia"

From the abstract:


"After all, who invented the airplane? Multilingualism and grassroots knowledge production on Wikipedia"

From the abstract:

"Excerpt on first powered flights in the (Portuguese Wikipedia's) Avião article" (figure from the paper)
