Wikipedia:Wikipedia Signpost/2023-01-01/Technology report

Technology report

Wikimedia Foundation's Abstract Wikipedia project "at substantial risk of failure"

Could Abstract Wikipedia fail?

People sitting around an outdoor dining table with a view of a lake and Swiss landscape in the background
Members of the Foundation's Abstract Wikipedia team with the Google.org Fellows and others at an offsite in Switzerland this August. Left-hand side of the table, from front to back: Ariel Gutman, Ori Livneh, Maria Keet, Sandy Woodruff, Mary Yang, Eunice Moon. At head of table: Rebecca Wambua. Right-hand side of the table, front to back: Olivia Zhang, Denny Vrandečić, Edmund Wright, Dani de Waal, Ali Assaf, James Forrester

In 2020, the Wikimedia Foundation began working on Abstract Wikipedia, which is envisaged to become the first new Wikimedia project since Wikidata's launch in 2012, accompanied and supported by the separate Wikifunctions project. Abstract Wikipedia is "a conceptual extension of Wikidata", where language-independent structured information is rendered in an automated way as human-readable text in a multitude of languages, with the hope that this will vastly increase access to Wikipedia information in hitherto underserved languages. Both Abstract Wikipedia and Wikifunctions are the brainchild of longtime Wikimedian Denny Vrandečić, who also started and led the Wikidata project at Wikimedia Deutschland before becoming a Google employee in 2013, where he began to develop these ideas before joining the Wikimedia Foundation staff in 2020 to lead their implementation.

An evaluation published earlier this month calls the project's future into question:

That Fellowship was part of a program by Google.org (the philanthropy organization of the for-profit company Google) that enables Google employees to do pro-bono work in support of non-profit causes. The Fellow team's tech lead was Ori Livneh, himself a longtime Wikipedian and former software engineer at the Wikimedia Foundation (2012–2016), where he founded and led the Performance Team before joining Google. The other three Google Fellows who authored the evaluation are Ariel Gutman (holder of a PhD in linguistics and author of a book titled "Attributive constructions in North-Eastern Neo-Aramaic", who also published a separate "goodbye letter" summarizing his work during the Fellowship), Ali Assaf, and Mary Yang.

The evaluation examines a long list of issues in detail, and ends with a set of recommendations centered around the conclusion that –

Among other things, the Fellows caution the Foundation to not "invent a new programming language. The cost of developing the function composition language to the required standard of stability, performance, and correctness is large ..." They propose that –

  • "Wikifunctions should extend, augment, and refine the existing programming facilities in MediaWiki. The initial version should be a central wiki for common Lua code", Lua being "an easy-to-learn and general purpose programming language, originally developed in Brazil" that is already widely used on Wikimedia projects, with the added benefit of satisfying "a long-standing community request" (for a "Central repository for gadgets, templates and Lua modules", which had been the third most popular proposal in the 2015 Community Wishlist Survey).

Regarding Abstract Wikipedia, the recommendations likewise center on limiting complexity and aiming to build on existing open-source solutions if possible, in particular for the NLG (natural language generation) part responsible for converting the information expressed in the project's language-independent formalism into a human-readable statement in a particular language:

The Foundation's answer

A response authored by eight Foundation staff members from the Abstract Wikipedia team (published simultaneously with the Fellows' evaluation) rejects these recommendations. They begin by acknowledging that although "Wikidata went through a number of very public iterations, and faced literally years of criticism from Wikimedia communities and from academic researchers[, the] plan for Abstract Wikipedia had not faced the same level of public development and discussion. [...] Barely anyone outside of the development team itself has dived into the Abstract Wikipedia and Wikifunctions proposal as deeply as the authors of this evaluation."

However, Vrandečić's team then goes on to reject the evaluation's core recommendations, presenting the expansive scope of Wikifunctions as a universal repository of general-purpose functions a done deal mandated by the Board (the Wikimedia Foundation's top decision-making authority), and accusing the Google Fellows of "fallacies" rooted in "misconception":

(The team doesn't elaborate on why the Foundation's trustees shouldn't be able to amend that May 2020 mandate if, two and a half years later, its expansive scope does indeed risk causing the entire project to fail.)

The evaluation report and the WMF's response are both lengthy (at over 6,000 and over 10,000 words, respectively), replete with technical and linguistic arguments and examples that are difficult to summarize here in full. Interested readers are encouraged to read both documents in their entirety. Nevertheless, below we attempt to highlight and explain a few key points made by each side, and to illuminate the underlying principal tensions about decisions that are likely to shape this important effort of the Wikimedia movement for decades to come.

What is the scope of the new "Wikipedia of functions"?

In an April 2020 article for the Signpost (published a few weeks before the WMF board approved his proposal), Vrandečić explained the concept of Abstract Wikipedia and a "wiki for functions" using an example describing political happenings involving San Francisco mayor London Breed:

In other words, the proposal for what is now called Wikifunctions combined two kinds of functions required for Abstract Wikipedia ("constructors" like the elect example and "translators" or renderers for natural language generation that produce the human-readable Wikipedia text) with a much more general "new form of knowledge assets", functions or algorithms in the sense of computer science. While the examples Vrandečić highlighted in that April 2020 Signpost article are simple calculations such as unit conversions that are already implemented on Wikipedia today using thousands of Lua-based templates (e.g. {{convert}} for the inches to centimeters translation), the working paper published earlier that month (recommended in his Signpost article for "technical aspects") evokes a much more ambitious vision:

Indeed, a function examples list created on Meta-Wiki in July 2020 already lists much more involved cases than unit conversion functions, e.g. calculating SHA256 hashes, factorizing integers or determining the "dominant color" of an image. It is not clear (to this Wikimedian at least) whether there will be any limits in scope. Will Wikifunctions become a universal code library eclipsing The Art of Computer Programming in scope, with its editors moderating disputes about the best proxmap sort implementation and patrolling recent changes for attempts to covertly insert code vulnerabilities?

This ambitious vision of Wikifunctions as spearheading a democraticizing revolution in computer programming (rather than just providing the technical foundation of Abstract Wikipedia) appears to fuel a lot of the concerns raised in the Fellows' evaluation, and conversely motivate a lot of the pushback in the Abstract Wikipedia team's answer.

Would adapting existing NLG efforts mean perpetuating the dominance of "an imperialist English-focused Western-thinking industry"?

Another particularly contentious aspect is the Fellows' recommendation to rely on existing natural language generation tools, rather than building them from the ground up in Wikifunctions. They write:

The Foundation's response argues that this approach would fail to cover the breadth of languages envisaged for Abstract Wikipedia:

(However, in a response to the response, Keet – a computer science professor at the University of Cape Town who volunteers for Abstract Wikipedia – disputed the Foundation's characterization of her concerns, stating that her "arguments got conflated into a, in shorthand, 'all against GF' that your reply suggests, but that is not the case.")

The Abstract Wikipedia team goes on to decry Grammatical Framework as a –

Uses material from the Wikipedia article Wikipedia:Wikipedia Signpost/2023-01-01/Technology report, released under the CC BY-SA 4.0 license.