The Minecraft Archive Project

Hi, I'm Leonard Richardson. When I was growing up in the 1990s, my favorite computer game was a blocky little thing called ZZT. Lots of games had level editors, but ZZT came with its own programming language, allowing you to script your own adventures and puzzles.

The kids who grew up playing ZZT are now artists, game designers, and programmers. But many of the worlds they created are gone. ZZT worlds were shared through BBSes and online services like CompuServe. When the Internet took over, those services shut down and the worlds were lost. It's estimated that only half the ZZT worlds ever created still survive.

In the early 2010s I realized that history was repeating itself. This time, the blocky game with the embedded programming language is Minecraft. Kids and teenagers are creating worlds, putting a lot of work into them, and sharing them on unreliable file-hosting sites. In the 2020s they'll be artists, game designers, and programmers. They'll get nostalgic, start thinking back on the game that showed them how fun it was to create their own worlds... and it'll all be gone.

The Minecraft Archive Project is my attempt to stop that from happening. Minecraft is much more popular than ZZT ever was, and I don't think I can save more than a fraction of one percent of its cultural history, but without this project pretty much all of it is doomed.

In the Collection

I periodically refresh the MAP by capturing new data from a couple of different sites:

My focus is on packaged binaries (Minecraft maps, resource packs, and mods) but I also capture images (screenshots and skins).

The collection is so large that even showing you an index is a daunting task, but I've created a lot of tables to summarize it. Here's the grand total of all the images and binaries I've captured as of February 2016:

Number of images | Size of image collection | Number of binaries | Size of binary collection
1.7m | 246 GiB | 951k | 2.57 TiB

The repository repository

I clone every Git repository I learn about in the course of performing a capture. I also periodically run a GitHub search for "minecraft" and related terms, and clone every repository that shows up. Since a version control repository contains its own history, I put all the repositories in the same place; there's no need to separate them by the date of capture.
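The scripts themselves aren't shown here, but the search-and-clone step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the MAP's actual code: it assumes GitHub's public REST search endpoint (`/search/repositories`), makes a `git clone --mirror` of each new repository, and refreshes repositories already on disk in place, which is what keeping every repository in one place allows. The function names are hypothetical, and authentication, rate limits, and the API's cap on search results are not handled.

```python
import json
import subprocess
import time
import urllib.parse
import urllib.request
from pathlib import Path

SEARCH_API = "https://api.github.com/search/repositories"


def search_url(term, page, per_page=100):
    """Build a GitHub repository-search URL for one page of results."""
    params = urllib.parse.urlencode({"q": term, "per_page": per_page, "page": page})
    return f"{SEARCH_API}?{params}"


def clone_targets(payload, root):
    """Turn one page of search results into (clone URL, local directory) pairs."""
    targets = []
    for item in payload.get("items", []):
        owner, name = item["full_name"].split("/", 1)
        targets.append((item["clone_url"], Path(root) / owner / name))
    return targets


def capture(term, root, max_pages=10, pause=10.0):
    """Clone every repository a search turns up; refresh ones already on disk."""
    for page in range(1, max_pages + 1):
        with urllib.request.urlopen(search_url(term, page)) as resp:
            payload = json.load(resp)
        targets = clone_targets(payload, root)
        if not targets:
            break  # ran out of search results
        for url, dest in targets:
            if dest.exists():
                # A mirror carries its own history, so fetching new commits
                # into the existing copy is enough; no need to re-clone.
                cmd = ["git", "-C", str(dest), "remote", "update", "--prune"]
            else:
                dest.parent.mkdir(parents=True, exist_ok=True)
                cmd = ["git", "clone", "--mirror", url, str(dest)]
            subprocess.run(cmd, check=False)
        time.sleep(pause)  # crude politeness delay between search requests
```

For example, `capture("minecraft", "repos")` would mirror each result under `repos/<owner>/<name>`; running it again later only fetches new history.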

Game | Number of repositories | Total size
Minecraft | 18k | 391 GiB
Bukkit | 10k | 30 GiB
Other "craft" projects | 7k | 121 GiB
Total | 35k | 542 GiB

The "craft" section is a mixed bag. It mostly contains Minecraft clones or games inspired by Minecraft, and projects that have nothing to do with games at all—they just have a name like "FooCraft" that sounds Minecraft-ish.
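How repositories get sorted into these rows isn't described here. Purely as a hypothetical illustration (the function name and keyword rules are assumptions), a simple keyword check shows why the "craft" bucket ends up a mixed bag: anything whose name contains "craft" matches.

```python
def bucket(name, description=""):
    """Assign a repository to one of the table's rows by keyword (hypothetical rules)."""
    text = f"{name} {description}".lower()
    # Check the most specific terms first: "minecraft" itself contains "craft".
    if "bukkit" in text:
        return "Bukkit"
    if "minecraft" in text:
        return "Minecraft"
    if "craft" in text:
        # Catches Minecraft clones, but also anything that merely sounds Minecraft-ish.
        return 'Other "craft" projects'
    return None  # not part of this collection
```

Under these rules `bucket("AircraftTracker")` lands in the "craft" row, which is exactly the kind of false positive described above.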

How?

There's no secret, really. I just wrote a lot of Python scripts and let them run for a really long time. When one script finishes, I run the next one in the sequence. I go into some detail about my process in a 2015 blog post.
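Chaining long-running scripts this way can be done with a small driver. The sketch below is an assumption about how that could look, not the author's actual setup; the stage script names are hypothetical placeholders.

```python
import subprocess
import sys

# Hypothetical stage names; the actual MAP scripts aren't described on this page.
PIPELINE = [
    [sys.executable, "find_new_uploads.py"],
    [sys.executable, "download_binaries.py"],
    [sys.executable, "clone_repositories.py"],
]


def run_pipeline(stages):
    """Run each stage to completion, in order, stopping at the first failure."""
    completed = []
    for cmd in stages:
        result = subprocess.run(cmd)  # blocks until this stage finishes
        if result.returncode != 0:
            print(f"stage failed ({result.returncode}): {' '.join(cmd)}", file=sys.stderr)
            break
        completed.append(cmd)
    return completed
```

Calling `run_pipeline(PIPELINE)` runs the stages back to back. Stopping at the first failure means a later stage never works from a half-finished earlier one.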

Where?

There are three copies of the Minecraft Archive Project. My personal copy is the master copy. I keep a backup copy at my workplace at NYPL Labs, and I periodically send copies to Jason Scott of the Internet Archive.

Making the archive publicly available is a tricky proposition. It's a really huge dataset, and most of it is still online... somewhere... if you know to look for it. I also don't have the time to do a proper presentation. At this point my only concern is making sure the dataset makes it to the future intact.
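Keeping several copies intact usually comes down to fixity checking: recording a checksum for every file and periodically comparing each copy against it. This page doesn't say whether the MAP does this. The sketch below is a hypothetical illustration using SHA-256 manifests (the function names are my own); established packaging formats such as BagIt formalize the same idea.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large binaries don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(master, copy):
    """Report files missing from the copy and files whose contents differ."""
    missing = sorted(set(master) - set(copy))
    damaged = sorted(k for k in master.keys() & copy.keys() if master[k] != copy[k])
    return missing, damaged
```

A manifest built from the master copy can be saved (for example with `json.dump`) and compared against one built from each backup; anything reported as missing or damaged needs to be re-copied from a good source.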

Spin-off projects

Using the data from the initial MAP capture in 2014, I created these projects:

ESC: the Ephemeral Software Collection

As the Minecraft Archive Project grew, I started getting data from sites like CurseForge and GitHub which contain both Minecraft and non-Minecraft stuff. I started the Ephemeral Software Collection to hold the non-Minecraft stuff. Before long, the ESC became larger than the Minecraft Archive Project that spawned it—over four terabytes as of February 2016.

You can see an overview of the ESC here. It's an eclectic collection, but I generally think of it as containing software that's at risk of being lost, forgotten or destroyed. It contains the equivalent of the Minecraft Archive Project for games other than Minecraft. It also contains mods, add-ons, software created for one-off events like game jams, implementations of classic computer games, software that exists in a copyright or trademark grey area, experimental code, and stuff I just think might be interesting or useful later.

I'm not even going to try to archive a comprehensive collection of ephemeral software, but I figure I might as well collect what I can, since it's mostly a matter of letting a script run and filling up old hard drives.

What I Didn't Capture

I could spend my whole life archiving this stuff, but... I don't want to. Once in a while—my plan is once a year—I run the basic Minecraft capture scripts on Planet Minecraft and the Minecraft forum. Everything else I get, I consider a bonus.

Whenever I discover or hear about some new dataset of ephemeral software, I put it on the following list and then forget about it. If you're inspired by what I've done with the Minecraft Archive Project and the Ephemeral Software Collection, a great way to show your appreciation would be to tackle one of these projects. Otherwise we'll see if these sites are still around when I retire.

If you happen to run one of these sites and would like to contribute a mirror to the MAP or ESC or the Internet Archive, and make sure your users' creations don't get lost, please send me email at leonardr@segfault.org.

Adding to the Minecraft Archive Project

The holy grail of the Minecraft Archive Project is a way to automatically archive active public Minecraft servers. There's no technical obstacle to doing this: walking around on a server streams the chunks to the client, and there are even mods for archiving the streamed chunks. But I've never gotten these mods to work, and making the process automatic, across hundreds of thousands of servers running different versions of Minecraft, requires work and resources far beyond what I can bring to the project. Thinking of applying for a digital preservation grant? Try this project out.

I would like to set up a dead-drop email address where people can send their zipped-up Minecraft worlds to explicitly put them in the MAP without publishing them anywhere else. This creates a lot of problems that I don't have time to deal with, so I haven't made any serious attempt at this.

Getting into the more achievable goals, there are more Minecraft maps at Minecraft Maps, MinecraftDL, 9Minecraft, etc. I don't even know if these sites have anything new or if it's all duplicates of things I already have. I haven't gone through them because adding a new site to the rotation is a lot of work, and these collections are very small compared to the Minecraft forum or Planet Minecraft.
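Finding out whether a smaller site has anything new doesn't have to be done by hand. As a hypothetical sketch (not something described on this page; the function names are assumptions), each downloaded file can be hashed and kept only if its exact bytes aren't already in the collection:

```python
import hashlib
from pathlib import Path


def digest(path):
    """SHA-256 of a file's bytes, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def known_digests(collection_root):
    """Hash every file already in the collection."""
    return {digest(p) for p in Path(collection_root).rglob("*") if p.is_file()}


def unseen(candidate_paths, known):
    """Return candidates whose exact bytes aren't already in the collection."""
    fresh = []
    for p in candidate_paths:
        d = digest(p)
        if d not in known:
            fresh.append(p)
            known.add(d)  # also skips duplicates within the new batch
    return fresh
```

This only catches byte-identical copies. A map that was re-zipped before being re-uploaded will hash differently, so it would still show up as "new".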

Back in May 2014 I archived maps from Minecraft World Share and Minecraft World Map, but I haven't been back. It's a similar situation—they have a couple thousand maps but the collection is relatively small and doesn't grow quickly the way Planet Minecraft does.

The Technic Platform hosts thousands (not sure of the exact number) of Minecraft mod packs.

Adding to the Ephemeral Software Collection

My top wishlist item for the Ephemeral Software Collection is a way to archive all the Super Mario Maker levels. I have no idea how to do this—I suspect you need to mod a Wii U.

Why am I concerned about Super Mario Maker? Because of what happened to WarioWare: D.I.Y. Four years after this DS game was released, Nintendo shut down the servers that allowed you to share your minigames. Now the only way to collect old D.I.Y. levels is to buy old cartridges and rip them.

Steam Workshop hosts millions of add-ons for over 300 games, as well as screenshots and links to hosted videos. It seems extraordinarily difficult to download the files, though. I think it's impossible if you don't own the games, and you'll probably need to hack a Steam client if you want to download the add-ons in a systematic way.

I'd like someone to archive all the board game rules and other files on BoardGameGeek. I regularly archive BGG game metadata for the Loaded Dice project, but getting the files is a much trickier proposition.

YouTube hosts petabytes of gaming videos, and there's no way to save it all, but it should be possible to archive a gameplay video for every game in MobyGames. It's also especially important to archive gameplay videos for mobile and online games, which can die as soon as the game studio shuts down a server.

Since ZZT started me on this project in the first place, I should make sure to mirror the ZZT archive, as well as the archive of its cousin MegaZeux.

The Terraria forums have links to mods and maps.

Hacked console ROMs (Super Mario World, Sonic the Hedgehog, etc.): there's a big collection at Romhacking.net. I'm sure other people have private collections of these, so it's not as big a deal.

A wide variety of mods (and prerelease versions of games in development) at ModDB. Similarly, mobile games at SlideDB.

Civilization add-ons at CivFanatics.

The Sims mods at Mod the Sims.

Kerbal Space Program mods at Kerbal Stuff and KSP mods.

Nexus Mods hosts over 100,000 add-ons for over 200 games.

Glorious Trainwrecks archives thousands of quickly-created games.

In general

Whenever we humans create a new art form, the early stuff gets lost. It's not considered "art", it doesn't fit into the existing archives, it's a pain to collect, expensive to keep around, and nobody's in charge of saving it. So it gets lost. This is especially true for art forms favored by children or other people who aren't considered artists.

Time passes, and we regret the loss. We cherish every scrap that survives. Ninety percent of humanity's early films are gone, and a lot of the ten percent is crap, but we preserve it all because there's nothing else like it. Sometimes the crap turns out to be pretty good after all: pulp sci-fi and noir. Even ephemera, things that never get raised to the level of "art", become valuable as windows into the past: account books, restaurant menus, road maps, receipts.

I believe all this stuff is art and I want to save it. But even if history disagrees with me, and the MAP and the ESC are classified as ephemera, that's fine too. In the long run, it's all ephemera.