The Internet Archive is Far Too Fragile

A weeklong site outage is a damning indictment of an unserious organization.

Ben Zotto
Oct 15, 2024
Archive.org home page, October 10, 2024.

As I write this, the Internet Archive (repository of all manner of documents, books, magazines, historical records, audio recordings, film, you-name-it) has been inaccessible on the web for an entire week. The proximate cause of the outage was shared by founder and director Brewster Kahle: recovery from a coordinated series of hacks including data theft, security breach, and aggressive denial-of-service attacks on October 8–10.

This was, of course, repugnant vandalism of one of the web’s best resources, and I don’t envy the Archive’s ops team for their surely heroic efforts in locking down and restoring the services—work that, as of this writing, is well into its seventh straight day.

But the more important issue is: how could this possibly have happened in the first place? Malicious actors are not a new threat to prominent websites. The Internet Archive may be a nonprofit, but it is no fly-by-night operation. It’s an organization that’s nearly three decades old and had a budget of $37 million in 2019—surely more today. The Internet Archive is rooted in the first wave of the world wide web; it’s older than Facebook and Twitter and YouTube, predates Wikipedia and even Google. So this is not some credulous crew that just got off the bus. What happened?

I’ve written before about the Internet Archive’s inexplicable and infuriating tendency to see themselves not as a resource that the whole world relies on for research, reading, and reference, but instead as someone’s scrappy best-effort passion project. And that is nowhere more obvious than in this current fiasco. Weeklong site outages are basically unheard of for any site in the 21st century, let alone for one of the best-known (and most important) internet services. Network operations people traditionally talk about their site uptimes in terms of “nines,” with “five nines” (99.999% uptime) being a typical service standard that works out to about five minutes of downtime in total during a calendar year. Seven days offline already caps this year’s uptime at about 98%, so the Internet Archive is currently down to just a single nine.
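For anyone who wants to check the "nines" arithmetic, here is a minimal sketch. It assumes a 365-day year and takes the seven days of downtime from the "entire week" mentioned above; the function names are my own, purely for illustration, and not any standard ops tooling.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for N nines of uptime (5 -> 99.999%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


def availability_percent(days_down: float, days_in_period: float = 365) -> float:
    """Uptime percentage after being down for days_down out of days_in_period."""
    return 100 * (1 - days_down / days_in_period)


def count_nines(availability_pct: float) -> int:
    """Number of leading nines in an uptime percentage (98.08 -> 1, 99.95 -> 3)."""
    if not 0 <= availability_pct < 100:
        raise ValueError("availability must be in [0, 100)")
    unavailability = 1 - availability_pct / 100
    # The small epsilon guards against floating-point error at exact boundaries.
    return math.floor(-math.log10(unavailability) + 1e-9)


if __name__ == "__main__":
    print(f"five nines budget: {allowed_downtime_minutes(5):.2f} minutes/year")
    week_down = availability_percent(7)  # hypothetical: exactly 7 days down
    print(f"after a week down: {week_down:.2f}% uptime, "
          f"{count_nines(week_down)} nine(s)")
```

Under those assumptions, five nines allows roughly 5.26 minutes of downtime per year, while a single week offline leaves about 98.08% uptime for the year: one nine.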

Meanwhile, the only status updates they are offering are via Brewster Kahle’s personal Twitter/X account, to which he seems to be adding brief tidbits just once a day, including such wry professionalisms as “yippie!”

An organization with a $37 million budget and a staff of over a hundred is relying entirely on its founder/director doing casual one-liners about one of the most disastrous web outages anywhere in recent memory. “Quirky” is fine as a corporate personality, but reserve that for the good times. Take the bad times seriously—your users certainly do.

I love the Internet Archive. Without it, my own research and writing efforts, on which I spend an inordinate amount of my time, would be substantially limited—and this past week has been a harsh reminder of that; I’ve had to put off multiple projects while they sort this out. Online communities that I participate in rely on the Archive as a place to find and make permanent all manner of documentation and ephemera. I also donate to the Internet Archive—sometimes a little, sometimes a lot—because it’s a very good thing and it matters to me.

But I’m astounded by this manifest unseriousness. The home page for the site, after all these days, still offers only a link to social media accounts for status. And it concludes, as of this writing, with: “Our patrons have asked how they can support: PayPal.” A $37 million organization that’s gone dark for a week is taking pizza money via PayPal?

Kahle has reassured users that “The data is safe.” This is like a mechanic reassuring you that your car is definitely not infested with snakes when you drop it off for an air conditioning repair. Why would anyone think the Internet Archive’s data (which is, of course, the archive itself) was at risk? Companies get hacked all the time. They keep backups. They have people on call with pagers. The network operations people have plans (plans A, B and C) and process runbooks at the ready for all manner of failures. Then, there are usually vendors on call who can, at a moment’s notice, help a customer with emergency hardware, software, or ops consulting.

The present situation at the Archive suggests that the Internet Archive has not prioritized any of that. I mean, I’m glad to know the data is safe, but… is the organization? If I can’t trust them to recover from a malicious attack (something that happens to prominent websites with some regularity) in a week, let alone a few hours, then how sure can I be, as an uploader, a user, and a donor, of how reliable and “archival” they really are over the long term?

Something seems to have gone awry in the priorities of an organization that has allowed this to happen. Kahle is an icon, a deeply passionate leader, and by all accounts a very decent man. But a leader is responsible for the culture, too, and it’s clear that the Internet Archive’s values are largely about libraries and archives, and apparently much less about actual access (in the most literal sense). Do they think no one uses their web site? The Internet Archive needs to worry about all of these things, lest they lose the long-term trust of researchers, contributors and donors who give their mission shape and importance.

I wish the Internet Archive and Kahle the best for a speedy recovery. I guess I’ll be downloading archived documents going forward, rather than assuming I’ll be able to access them later via the website. I hope this event triggers serious consideration at the organization around what they are prioritizing and spending their time and donations on, and how to best signal strength and permanence to the rest of the internet that relies on them.

Update October 17: The collection archive is in its ninth day of downtime. Founder Kahle has not said anything at all for the last 48 hours, but the head of the Wayback Machine (Mark Graham) responded to someone who asked when the main archive would be back online. His response was “days…”

Update October 20: The collection archive is in its twelfth consecutive day of downtime. Two days ago, the Washington Post ran an article covering the outage under the headline “The world’s largest internet archive is under siege—and fighting back.” In it, Kahle responds to the hackers’ statement of “Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach?” with a simple “They’re not wrong.” It is, at least, heartening to hear Kahle acknowledge that core issue, because it’s difficult to draw any other conclusion from the ongoing disastrous outage.

Update October 25: After a few days of unstable availability, the collection archive appears to be back online. Half of search is still broken (you can search by metadata but not within text) and logins are not permitted, so uploading content is disabled. But basic access is back, which is better than nothing. Notably, the response time for search and the site in general appears to be much snappier than it traditionally is, though it’s hard to know if that’s because of improvements or simply lower traffic at the moment.
