Digital Archives

Internet Archive

One of the most fascinating things about the Internet when I first started using it twenty years ago was the sense of having the world at your fingertips. I remember downloading Mosaic for the first time and discovering this thing they called a search engine, in this case an early version of Yahoo. It was organized by categories, so you could search and drill down from topic to topic.

Eventually Google supplanted Yahoo, and then, back in 2004, conceived the project of digitizing every known book. I’ve used this to track down quotes to their sources (and sometimes discovered that there was no source for the quote attributed to a particular author). I’ve downloaded 19th-century sermon collections that are available for free. I understand that in recent years Google has slowed down these efforts. Some would contend this reflects a shift in mission toward using search data for marketing, but it also reflects the fact that they’ve digitized over 20 million books! And they’ve been hampered by some lawsuits along the way.

Another outfit that has been digitally archiving books and an incredible array of other materials is the Internet Archive. The Internet Archive is a non-profit effort launched in 1996 in San Francisco that includes text, audio, moving pictures, software, and, significantly, archived web pages. I discovered, for example, that you can look at a collection of archived campaign webpages from 1996. One of the challenges of the Internet is its ephemerality. Have you ever come across a weblink that no longer works or a page that no longer exists? The Internet Archive may be the place that still has a record of it. From their homepage you can use their Wayback Machine to enter an old URL and see if it is in their archives.

One of the other standout features of the Internet Archive for the computer geek is its old software, from MS-DOS games to VisiCalc for the Apple II. They even have an emulator that allows you to play the games in your browser. Yes, you can play Oregon Trail again!

One writer described the Internet Archive as “a chaotic, beautiful mess.” Indeed, among the places you can go from their homepage are a free audiobook collection, a Grateful Dead collection, the Biodiversity Heritage Library, the Iraq War Collection, the Portuguese Web Archive, and that collection of MS-DOS software!

The question, of course, is whether you can find what you are looking for. My sense is that Google’s search algorithms are better for getting you in the neighborhood of what you are looking for. But the Internet Archive is just a fun place to snoop around, and you can do it from your own home.

It occurs to me that one of the big questions around the future of libraries and archives is how both to preserve materials in physical form and to keep preserving digitized materials, including media that only ever existed in a digital format. This matters especially because of the weird paradox that digital materials often degrade far faster than the printed page. It makes me wonder if a paper journal of my daily doings will last longer than my social media presence on Facebook’s servers. Of course, who is going to want to study either?

At any rate, exploring all this reminded me that librarians and archivists today face very different challenges than they once did: preserving not only printed primary source materials but also the digital record of our society. It will be an interesting task to figure out what is important enough to save and what is just ephemera!