Vanishing public data: How journalists can fight back

Across the U.S., environmental reporters are facing an increasingly difficult challenge: the disappearance of public data. Government records that once detailed scientific data — such as climate change and extreme weather risks — are being removed or made harder to find, often without explanation.

For journalists who depend on these datasets to tell stories, this is more than a nuisance. It’s a threat to their work.

To adapt, many are turning to new resources. Some now rely on international sources. Others are working with digital archivists who are preserving what’s been lost. Projects like the End of Term Web Archive are now playing critical roles in salvaging data that would otherwise disappear.

Mark Graham, director of the Wayback Machine at the Internet Archive, is one of the people leading this preservation work. Since 2004, Graham and his team have been part of the End of Term Web Archive, a collaborative effort dedicated to preserving U.S. government websites and datasets during presidential transitions when website purges are most likely to occur.

“The work to preserve and make available material published by the U.S. government that may be changed, or removed, from public access starts long before those changes or deletions,” Graham wrote in an email to the Institute.

From 2024 and 2025 alone, the Internet Archive houses more than 2 petabytes of archived federal websites, all available through the Wayback Machine and downloadable by researchers and journalists.

A pattern of disappearance

Some categories of data are more vulnerable than others. 

According to the Environmental Data & Governance Initiative, environmental justice and equity-related information has been among the hardest hit under the second Trump administration.

In the first 100 days of Trump’s second term, EDGI documented a barrage of removals of environmental data from agency websites, including the complete takedown of the EPA’s Environmental Justice site and the flagship EJScreen tool for environmental justice screening and mapping.

These actions build on patterns from Trump’s first term. In April 2017, the administration redirected the page for climate change on EPA’s website to a page claiming the site was being updated. That redirect remained for 18 months before ultimately leading to 404 errors.

By early 2025, EDGI observed the systematic removal of terms such as “equity” and “environmental justice” from federal pages. Entire sections were renamed from “Equity and Environmental Justice” to “Population Impacts,” and webpages highlighting the EPA’s support for African American and LGBTQ+ communities were deleted.

Early in 2025, a new wave of takedowns targeted federal agency websites. 404 Media documented that “more than 2,000 datasets have disappeared from data.gov since Trump was inaugurated” in January 2025 alone.

Once data has been removed, recovering it is only part of the challenge. Verifying its accuracy, finding complete versions, and preserving context are just as important.

Tools for recovering data

Graham pointed journalists to several resources beyond the Internet Archive itself:

  • The Data Rescue Project, a joint initiative among data professionals and curators, has developed a public tracker of data rescue efforts. 
  • The Harvard Library Innovation Lab has published more than 311,000 datasets harvested from federal websites in recent years through its Source Cooperative platform.
  • The Public Environmental Data Project, a volunteer-led coalition involving Boston University and the Harvard Climate and Health CAFE Research Coordinating Center, has assembled a large inventory of federal tools and datasets, including the CDC’s Social Vulnerability Index and Environmental Justice Index.
  • Investigative Reporters and Editors has also been quietly downloading and preserving more than 120 federal datasets, ranging from FDA adverse event reports to the OSHA workplace safety database. Members of IRE can request access.
  • EDGI and its partners are working to rebuild some of the equity screening tools that were taken down, so far reviving six of the nine removed.

For those seeking to preserve at-risk pages themselves, the Internet Archive remains one of the most accessible tools.

Pages can be saved through the Wayback Machine’s “Save Page Now” tool. Files can also be uploaded directly to the Internet Archive using the “Upload” feature. The Wayback Machine also offers public search over a full-text index of those pages.

These tools give individuals the power to preserve online content.
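For reporters comfortable with a little scripting, the same checks can be automated. The sketch below is one possible approach, not a method described by Graham or the Internet Archive staff quoted here. It assumes the Wayback Machine's public availability endpoint (https://archive.org/wayback/available) and the "Save Page Now" address pattern (https://web.archive.org/save/ followed by the page URL). The function names and the example.gov address are hypothetical, chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints: the Wayback Machine availability API and the
# "Save Page Now" URL prefix. Check the Internet Archive's documentation
# before relying on them in production.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_query(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response_body):
    """Return the archived copy's URL from the JSON reply, or None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def save_page_now_url(page_url):
    """Build the address that asks the Wayback Machine to capture a page."""
    return SAVE_PREFIX + page_url


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine whether a capture of page_url exists."""
    query = availability_query(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


if __name__ == "__main__":
    page = "https://www.example.gov/report"  # hypothetical page
    print(lookup(page) or "No capture found; try: " + save_page_now_url(page))
```

A result of None means no capture was found for that address, which is a signal to submit the page for archiving before it changes or disappears. Any snapshot that is returned should still be checked against other copies for completeness and context.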

“Don’t expect a publisher to do the job of a library,” Graham said. “This is especially true about the world’s largest publisher, the U.S. government.”

For journalists, that means taking a more proactive approach. Archiving sources, backing up documents, and building relationships with preservation efforts are no longer optional — they’re part of the job. 

As public data becomes more unstable, ensuring continued access may depend on the people who rely on it most.
