Poor usability and careless downtime are hampering one of the internet’s most important resources.
The Internet Archive, a critical research and content resource, was completely offline for well over an hour yesterday. Why? Because there was a windy storm in San Francisco, and they lost power.
PG&E is the local power utility, and serves the quiet residential neighborhood where the Archive has both its office and, in a quirk that’s charming and worrisome, its datacenter. Brewster Kahle is the founder, director, and spiritual leader of the organization and service, and all he is able to do, apparently, is refresh the power company’s status page like any random household hoping the food in the fridge won’t spoil. This is an absolutely insane way to approach site reliability at one of the most important resources on the net.
The Archive hosts the “Wayback Machine,” of course, which has been preserving web site content from blinking out of existence for over 20 years. But it’s also the home to an absolutely endless collection of scanned and digitized material: books, yes, but also full runs of magazines, ephemera, a huge cache of vintage software, audio and television and film recordings. The Archive does its own digitization, partners with third-party archives, and gets uploads from dedicated volunteers, and it makes all of this stuff available to serious researchers and the curious alike. Unfortunately, they seem to run the service as if it were only a quirky little side project, and it is riddled with not just poor reliability but also embarrassingly outdated web design and usability.
All documents on the Archive appear in a web viewer, which allows paging and zooming and search. But that core web viewer only works correctly in desktop browsers and is nonfunctional on mobile. If I open a document on archive.org on my phone, I find that trying to pinch to zoom and page through will quickly have me stuck in a corner of a PDF page, unable to move or scroll to the spot I want. This has been the case for years! I don’t know what the explanation for this is, but I know that the mobile web is old enough that children born when the iPhone launched are now themselves teenagers using smartphones. There’s just no excuse for this. A huge amount of real work is done on mobile devices. Why aren’t they supported?
The Archive’s search functionality is also lousy. You have to choose between searching metadata and text content, and you can’t search through one and then scope down with the other. This makes it impossible to say, for example, “Find all the old issues of Datamation magazine, and then search for the term IBM only within those documents and no others.” Or, let’s say you want to find the name “MacGuyver,” but only within San Francisco historic city directories. Filtering (or searching) by flexible date ranges — another obvious mode — is also fussy and difficult. Search result snippet UX is barely-there; a janky service like newspapers.com is worlds better at this, to say nothing of Google Books.
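To be concrete about the kind of scoped search I mean, here is a minimal sketch in Python. It is purely illustrative: the `Document` record, the sample data, and the function names are all hypothetical, and none of this reflects how archive.org actually stores or queries anything. The only point is that filtering by metadata first and then searching text within the survivors is not an exotic operation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Document:
    """Hypothetical record: a few metadata fields plus extracted text."""
    identifier: str
    title: str
    year: int
    text: str


def filter_by_metadata(
    docs: Iterable[Document],
    title_contains: Optional[str] = None,
    year_range: Optional[Tuple[int, int]] = None,
) -> List[Document]:
    """Stage 1: keep only documents whose metadata matches."""
    selected = []
    for doc in docs:
        if title_contains and title_contains.lower() not in doc.title.lower():
            continue
        if year_range and not (year_range[0] <= doc.year <= year_range[1]):
            continue
        selected.append(doc)
    return selected


def search_text(docs: Iterable[Document], term: str) -> List[Document]:
    """Stage 2: case-insensitive substring search within the given documents."""
    needle = term.lower()
    return [doc for doc in docs if needle in doc.text.lower()]


def scoped_search(
    docs: Iterable[Document],
    term: str,
    title_contains: Optional[str] = None,
    year_range: Optional[Tuple[int, int]] = None,
) -> List[Document]:
    """Metadata filter first, then text search only within that subset."""
    return search_text(filter_by_metadata(docs, title_contains, year_range), term)


# Made-up sample data, for illustration only.
SAMPLE = [
    Document("datamation-1975-03", "Datamation, March 1975", 1975,
             "IBM announces a new mainframe."),
    Document("datamation-1982-07", "Datamation, July 1982", 1982,
             "Minicomputer vendors gain ground."),
    Document("byte-1981-01", "Byte, January 1981", 1981,
             "IBM is rumored to be building a personal computer."),
]

if __name__ == "__main__":
    for hit in scoped_search(SAMPLE, "IBM", title_contains="Datamation"):
        print(hit.identifier)
```

With the sample data, the Byte issue mentions IBM but is excluded by the metadata filter, which is exactly the "within those documents and no others" behavior described above. A real system would use an index rather than a linear scan, but the two-stage shape is the same.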
Text search within documents is flaky: usually it works, sometimes it tells you there are no results (when there certainly are). If you’re not used to this unreliability, you might easily miss something important. As recently as 2021, I ran into a spate of strange search errors that told me “Sorry, the text may still be processing.” I emailed the Archive’s support about this, and I received this in response:
Hi,
Thanks for contacting us.
Text search is undergoing maintenance. It could be up to a few days before it is fully functional.
Hope this helps.
Thanks for using archive.org
A few days? For maintenance?
With no warning and misleading messaging, a major component of a crucial research service had gone offline for an unknown amount of time. I was on a deadline, and I needed search-within-document to complete the project. I responded to ask if there was some central place where I could follow the status, and this was the answer: “The nature of the interruption is not always predictable. While I agree it is a good idea, we currently do not maintain a status page.” Yikes.
(Instead of waiting, I had to download dozens of documents as PDFs to my computer, and had to use local tools to search them individually.)
Look, this is clown stuff. It’s junior varsity. The Internet Archive may be a nonprofit, and I understand they do more than just operate a web front end. But when was the last time you can remember search failing in Wikipedia — or the whole site going down for hours? Wikipedia is also a nonprofit.
I’m a historian and researcher and I rely on the Internet Archive for all sorts of critical material that is essentially unobtainable elsewhere. It’s an organization with an important mission. I donate to them every year, and more than just pocket change; it’s a resource that is seriously worthy of support.
I’m harsh in my criticisms here out of love. I would not find this so frustrating if it didn’t matter so much. The Internet Archive may not be a top-10 web site in terms of traffic, but it has evolved a crucial role as part of the nervous system of the internet itself. Can you imagine Mark Zuckerberg or Satya Nadella helplessly refreshing the power company’s page because Instagram or Microsoft Teams went dark after a tree knocked over a utility pole? I didn’t think so. Those organizations (like almost all modern web operations, for-profit and otherwise) have evolved a culture of accessibility and reliability for their services, and they design and build accordingly.
It’s time for the Internet Archive to grow up. No organization is perfect or fully resourced, and no service will be fully reliable for every second of every day forever. But you can’t expect people to take you seriously when you don’t even take yourselves seriously. Please, archive.org, claim the mantle of pillar of the internet that your logo suggests you aspire to. Your legions of committed users are waiting.