Minecraft Archive Project - Capture Details

I append to this page a summary of what I obtained in each new Minecraft Archive Project capture. Each table shows the amount of new data obtained in that capture.

2016 February

Site Collection Images Image size Binaries Binary size
minecraftforum.net Maps 2k 1 GiB 1k 12 GiB
Mods 5k 1 GiB 1k 2 GiB
Resource packs 1k < 1 GiB 288 1 GiB
Skins 172 < 1 GiB - -
Maps (MCPE) 146 < 1 GiB 44 < 1 GiB
Mods (MCPE) 459 < 1 GiB 93 < 1 GiB
Resource packs (MCPE) 58 < 1 GiB 16 < 1 GiB
planetminecraft.com Maps 34k 13 GiB 2k 17 GiB
Mods 1k < 1 GiB 38 < 1 GiB
Resource packs 1k < 1 GiB 60 < 1 GiB
Servers 14k 4 GiB - -
Skins 7k < 1 GiB - -

2015 December

Site Collection Images Image size Binaries Binary size
bukkit.org Bukkit plugins 30k 4 GiB 83k 9 GiB
feed-the-beast.net Modpacks - - 307 14 GiB
curseforge.com Maps 3k 2 GiB 3k 42 GiB
Mods 16k 3 GiB 22k 28 GiB
Resource packs 5k 3 GiB 5k 76 GiB
Customizations 215 < 1 GiB 233 < 1 GiB
Modpacks 1k < 1 GiB 4k 86 GiB
minecraftforum.net Maps 10k 5 GiB 4k 38 GiB
Mods 10k 2 GiB 2k 2 GiB
Resource packs 2k < 1 GiB 1k 7 GiB
Skins 1k < 1 GiB - -
Maps (MCPE) 1k < 1 GiB 394 2 GiB
Mods (MCPE) 1k < 1 GiB 1k < 1 GiB
Resource packs (MCPE) 166 < 1 GiB 51 < 1 GiB
planetminecraft.com Maps 254k 39 GiB 29k 236 GiB
Mods 3k < 1 GiB 318 < 1 GiB
Resource packs 8k 1 GiB 1k 22 GiB
Skins 29k < 1 GiB - -
Server records 48k 8 GiB - -

2015 February

Site Collection Images Image size Binaries Binary size
minecraftforum.net Maps 16k 8 GiB 7k 66 GiB
Mods 19k 3 GiB 5k 9 GiB
Resource packs 30k 9 GiB 6k 63 GiB
Skins 44k 2 GiB 405 -
Mods (MCPE) 2k < 1 GiB 2k 1 GiB
Maps (MCPE) 13k 4 GiB 7k 8 GiB
Resource packs (MCPE) 2k < 1 GiB 1k 1 GiB
planetminecraft.com Maps 55k 8 GiB 14k 47 GiB
Mods 2k < 1 GiB 1k < 1 GiB
Resource packs 11k 1 GiB 1k 28 GiB

2014 July

Site Collection Images Image size Binaries Binary size
planetminecraft.com Resource packs 86k 10 GiB 24k 95 GiB
Skins 194k < 1 GiB - -

2014 June

Site Collection Images Image size Binaries Binary size
minecraftforum.net Mods 102k 13 GiB 33k 33 GiB
planetminecraft.com Maps 561k 72 GiB 186k 798 GiB
Mods 16k 1 GiB 6k 3 GiB

2014 May

Site Collection Images Image size Binaries Binary size
minecraftworldmap.com Maps 6k < 1 GiB 6k 272 GiB
minecraftworldshare.com Maps 1k < 1 GiB 2k 18 GiB

2014 April

Site Collection Images Image size Binaries Binary size
minecraftforum.net Maps 34k 11 GiB 354k 342 GiB

Ephemeral Software Collection - Capture Details

Some of the ESC comes from the same CurseForge sources I captured for the Minecraft Archive Project. It's the same kind of stuff—mods, custom maps, and so on—just for games other than Minecraft.

2015 December

Site Game Collection Images Image size Binaries Binary size
curseforge.com Firefall Addons 147 < 1 GiB 223 < 1 GiB
Kerbal Space Program Mods 2k < 1 GiB 2k 21 GiB
Kerbal Space Program Shareables 2k < 1 GiB 1k < 1 GiB
Rift Addons 1k < 1 GiB 6k 7 GiB
Runes of Magic Addons 1k < 1 GiB 3k 2 GiB
Terraria Maps 2k < 1 GiB 1k 4 GiB
The Elder Scrolls Online Addons 308 < 1 GiB 1k < 1 GiB
The Secret World Mods 318 < 1 GiB 1k < 1 GiB
Wildstar Addons 2k < 1 GiB 6k 1 GiB
World of Tanks Mods 1k < 1 GiB 2k 39 GiB
World of Tanks Skins 374 < 1 GiB 303 12 GiB
World of Warcraft Addons 9k 2 GiB 105k 56 GiB
sc2mapster.com Starcraft 2 Assets 2k < 1 GiB 2k 4 GiB
sc2mapster.com Starcraft 2 Maps 8k 4 GiB 6k 36 GiB
skyrimforge.com Skyrim Mods 1k < 1 GiB 398 6 GiB

Cloned Git repositories

The ESC also contains about four terabytes of source code cloned from GitHub. I looked for implementations and clones of classic games, add-ons for games with active modding communities, popular game genres, and other types of software I thought would be interesting to preserve.

I made over 100 collections, each built from a single query against the GitHub API. For a given collection, I ran the query, collected the addresses of all the GitHub repositories that matched, and cloned them. For example, I wanted to archive all the implementations of Conway's Life on GitHub, so I searched the GitHub API for "conway life", found about 4000 repositories, and cloned them into the "conway life" collection. (This dataset was used to create my bot That's Life!.)
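The query-then-clone loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that built the ESC: it uses GitHub's repository search endpoint and the `clone_url` field of each result, while the function names, the `esc/<collection>/<owner>/<name>` directory layout, and the unauthenticated requests are all hypothetical choices made for the example.

```python
import json
import subprocess
import urllib.parse
import urllib.request
from pathlib import Path

# GitHub's repository search endpoint; each result item carries a "clone_url".
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def search_url(query, page=1, per_page=100):
    """Build the URL for one page of repository search results."""
    params = urllib.parse.urlencode({"q": query, "page": page, "per_page": per_page})
    return f"{SEARCH_ENDPOINT}?{params}"


def clone_urls(response):
    """Pull the clone address of every repository out of one decoded response."""
    return [item["clone_url"] for item in response.get("items", [])]


def target_dir(collection, clone_url, root=Path("esc")):
    """Map https://github.com/<owner>/<name>.git to <root>/<collection>/<owner>/<name>.

    The directory layout is an assumption made for this sketch.
    """
    path = urllib.parse.urlparse(clone_url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, name = path.split("/")[:2]
    return root / collection / owner / name


def fetch_page(query, page):
    """Fetch and decode one page of results (unauthenticated, so heavily rate-limited)."""
    with urllib.request.urlopen(search_url(query, page)) as resp:
        return json.load(resp)


def clone_collection(collection, query, pages=1):
    """Run the query and clone every match that is not already on disk."""
    for page in range(1, pages + 1):
        for url in clone_urls(fetch_page(query, page)):
            dest = target_dir(collection, url)
            if dest.exists():
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(["git", "clone", "--quiet", url, str(dest)], check=True)
```

Calling `clone_collection("conway life", "conway life")` would fetch the first page of matches and clone them. The search API returns at most 1,000 results for any one query, so gathering around 4000 repositories would take more than simple paging; that part is not covered here.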

You can figure out roughly what's in each collection from its name, but the query I used is not necessarily the same as the name. Sometimes they match ("checkers") and sometimes they don't. The query for the "snake" collection is "snake game", because I wanted implementations of the classic game Snake, not any project that mentioned a snake. The query for the "soccer" collection is "soccer NOT football", because there's a separate "football" collection. For the decadal collections ("1970s", "1980s", "1990s") the query includes the name of the decade as well as each individual year in that decade. Stuff like that.
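The relationship between collection names and queries can be sketched as a small lookup. The three explicit queries come from the paragraph above. Everything else is an assumption made for illustration: the function names, joining the decade terms with spaces (the text does not say how they were combined), and falling back to the collection name when nothing else is known.

```python
# Collection name -> search query, for the cases the text spells out.
EXPLICIT_QUERIES = {
    "checkers": "checkers",
    "snake": "snake game",
    "soccer": "soccer NOT football",
}


def decade_terms(decade):
    """Return the decade's name followed by each of its ten years."""
    start = int(decade.rstrip("s"))
    return [decade] + [str(year) for year in range(start, start + 10)]


def query_for(collection):
    """Look up or build the query for a collection."""
    if collection in EXPLICIT_QUERIES:
        return EXPLICIT_QUERIES[collection]
    if collection.endswith("0s") and collection[:-1].isdigit():
        # Joining with spaces is a guess; the real combination is not documented.
        return " ".join(decade_terms(collection))
    # Assumption: with no other information, use the collection name itself.
    return collection
```

For example, `decade_terms("1970s")` yields the decade name followed by the years 1970 through 1979, eleven terms in all.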

There are a fair number of duplicates in the ESC: repositories that show up in more than one collection. I've run some basic deduping code, and probably no more than 50 GiB of the total is duplicated.
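One simple way to find repositories that appear in more than one collection is sketched below. It assumes each repository is identified by an "owner/name" string and that the collections are available as a mapping from collection name to repository list. Both are assumptions for the example, as are the sample names, and this is not the deduping code actually used.

```python
from collections import defaultdict


def shared_repositories(collections):
    """Map each repository found in more than one collection to the collections holding it."""
    seen = defaultdict(list)
    for name, repos in collections.items():
        for repo in set(repos):  # ignore repeats within a single collection
            seen[repo].append(name)
    return {repo: sorted(names) for repo, names in seen.items() if len(names) > 1}


# Hypothetical data: one repository matched both the "snake" and "arcade" queries.
example = {
    "snake": ["alice/snake-game", "bob/arcade-pack"],
    "arcade": ["bob/arcade-pack"],
    "pong": ["carol/pong"],
}
duplicates = shared_repositories(example)
```

Here `duplicates` maps "bob/arcade-pack" to the two collections that contain it; all but one copy of each such repository could then be replaced with a link or dropped.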

Here's a table of the collections I made, with the approximate number of repositories in each (when I have that information handy) and the total size of each collection.

Collection Number of repos Total size
1970s 1k 45 GiB
1980s 1k 21 GiB
1990s 3k 52 GiB
abstract1191 GiB
age of empires1282 GiB
angry birds3719 GiB
arcade 3k 87 GiB
asdf¹ 2k 11 GiB
assembly² 1k 9 GiB
asteroids 3k 58 GiB
atari69711 GiB
baseball 2k 32 GiB
basketball 1k 27 GiB
battleship 5k 28 GiB
beautiful soup2021 GiB
bitcoin 7k 83 GiB
board game 8k 132 GiB
bowling7334 GiB
breakout 4k 56 GiB
checkers 2k 14 GiB
chess 11k 99 GiB
chiptune1291 GiB
classic 5k 58 GiB
coding challenge 7k 51 GiB
conway life 4k 9 GiB
cricket751 GiB
dada5636 GiB
demoscene1473 GiB
dinosaur4519 GiB
dogecoin4023 GiB
doom 1k < 1 GiB
dungeon ? 123 GiB
dwarf fortress35112 GiB
emulator 9k 119 GiB
erotic 46 < 1 GiB
fallout ? 12 GiB
firefall 1 < 1 GiB
football 5k 69 GiB
fractal 3k 31 GiB
game jam³ 10k 749 GiB
gender98920 GiB
golf 3k 60 GiB
graphic adventure2302 GiB
half-life23040 GiB
hangman 5k 17 GiB
hockey 2k 26 GiB
interactive fiction 2k 11 GiB
invaders 3k 46 GiB
kerbal space program99524 GiB
mame39532 GiB
mastermind5021 GiB
maze 7k 145 GiB
megazeux 10 < 1 GiB
minesweeper 5k 18 GiB
minigame96523 GiB
mud ? 39 GiB
nethack 365 < 1 GiB
nintendo 1k 30 GiB
old-basic41561 GiB
open world22411 GiB
othello 1k 7 GiB
pac man 1k 21 GiB
pico 8 97 < 1 GiB
platformer 5k 184 GiB
pokemon 4k 94 GiB
pong 9k 90 GiB
puzzle 14k 202 GiB
quake 2k < 1 GiB
rift 192 < 1 GiB
robotfindskitten 24 < 1 GiB
roguelike 3k 42 GiB
runes of magic 219 < 1 GiB
rpg ? 358 GiB
sega39610 GiB
senior project591754 GiB
shooter ? 535 GiB
sim 3k 57 GiB
skyrim 1 < 1 GiB
snake 4k 21 GiB
soccer 2k 63 GiB
solitaire 1k 7 GiB
space engineers ? 3 GiB
star trek2739 GiB
star wars 1k 28 GiB
starcraft69919 GiB
strategy 2k 66 GiB
street fighter2334 GiB
super mario61719 GiB
surreal 106 < 1 GiB
tabletop92020 GiB
terraria 502 < 1 GiB
tetris ? 63 GiB
tic tac toe 14k 44 GiB
the elder scrolls online 11 < 1 GiB
the secret world 25 < 1 GiB
towers of hanoi9482 GiB
tycoon ? 13 GiB
warcraft 2k 39 GiB
wesnoth19922 GiB
wildstar 149 < 1 GiB
women 1k 21 GiB
world of tanks 3 < 1 GiB
world of warcraft 1k 1 GiB
zelda 701 < 1 GiB
zzt 96 < 1 GiB

1 I searched for the literal string 'asdf' to find repositories whose maintainers didn't care to name or describe their projects. Thanks to Allison Parrish for this idea.

2 6502 assembler, 8088 assembler, etc.


3 Not to bury the lede, but I think nearly a terabyte of games created for game jams (Ludum Dare, Global Game Jam, 7DRL, etc.) is the most significant part of the ESC.

4 I saved this for last and this is where I ran out of hard drive space. I could fill up a whole other hard drive with senior projects. And then there's the 'final project' collection, and the 'hackathon' collection...