End of Term Web Archive
The End of Term Web Archive is an archival project that preserves U.S. federal government websites during administration changes.
Background
The End of Term Web Archive was set up following a 2008 announcement from the National Archives and Records Administration (NARA) that it would not archive government websites during the transition, having carried out such crawls in 2000 and 2004. The 2004 federal web harvest can be accessed at the National Archives alongside congressional web harvests, which begin with the 109th United States Congress.
The first project partners were the Library of Congress, George Washington University Libraries, Stanford University Libraries, University of North Texas Libraries, the U.S. Government Publishing Office, California Digital Library and the Internet Archive, all members of the International Internet Preservation Consortium (IIPC). The project was initially sketched out after a General Assembly of the IIPC in 2008. NARA and the Environmental Data & Governance Initiative (EDGI) joined the 2020/21 project.
The project
The project archives websites and documents for public access and research use. The 2008, 2012, 2016, and 2020 End of Term datasets can be downloaded in bulk. As of February 2025, the 2004 datasets are still being inventoried and there is a plan to move a copy of all datasets into Amazon Web Services.
A University of North Texas (UNT) study of the risk to document files found that 83% of PDFs on the .gov domain in 2008 were missing four years later. Such turnover is consistent with the requirement to manage websites, but the official status of these sites means that changes may be of interest to the public and watchdog groups. Evidence of the demand for continued access to historical web material can be found in an announcement made by the EPA in response to concerns about changes in 2017, stating that pages from the previous administration would be carefully archived. These snapshot pages were clearly marked to distinguish them from contemporary content.
The archive prioritizes sites covering areas regarded as likely to be updated or removed over the period of transition. The public are encouraged to nominate important sites, and these nominations are combined with broad crawls of government domains to create the collection. Although the collection is extensive (the 2016 crawl preserved 11,382 sites), it stops short of being comprehensive. Researchers have used these collections to examine the history of climate change policy and the reuse of suspended U.S. government Twitter accounts.
The 2024 crawl began in January 2024, using a URL Nomination Tool developed by the University of North Texas.