Amber64: Mining Hacker History from Over Half a Million Commodore 64 Disks
This talk demonstrates a methodology for performing large-scale digital forensics on a massive corpus of over 650,000 Commodore 64 floppy disk images. The speaker details the reverse engineering of the Commodore 1541 disk drive's proprietary file system, including sector chaining, block availability maps, and non-standard text encoding (PETSCII). The research enables the automated extraction, indexing, and searching of historical hacker artifacts, such as BBS logs, phreaking tools, and source code, using Elasticsearch. The project provides a framework for analyzing legacy storage media to uncover hidden data and historical context.
Mining 650,000 Floppy Disks: Forensic Lessons from the Commodore 64
TLDR: This research project demonstrates how to perform large-scale digital forensics on over 650,000 Commodore 64 disk images by reverse-engineering the proprietary 1541 disk drive file system. By normalizing non-standard PETSCII text and indexing the entire corpus into Elasticsearch, the researcher uncovered a treasure trove of historical hacker artifacts, including BBS logs and phreaking tools. This methodology provides a blueprint for modern researchers to extract actionable intelligence from legacy storage media and obscure file system structures.
Digital forensics often feels like a race to keep up with the latest cloud-native vulnerabilities, but sometimes the most valuable intelligence is buried in the past. When we talk about "legacy" systems, we usually mean an unpatched Windows Server 2012 box in a basement. We rarely consider the forensic potential of 40-year-old 5.25-inch floppy disks. Yet, as this research into the Commodore 64 (C64) ecosystem proves, these artifacts are not just historical curiosities. They are massive, untapped data sets that reveal the origins of modern offensive techniques.
Reverse Engineering the 1541 File System
The Commodore 1541 disk drive was a marvel of 1980s engineering, but its file system was entirely proprietary. Its track-and-sector layout is not something mainstream forensic tools parse out of the box. To index 650,000 images, the researcher had to move beyond simple string searches and actually reverse-engineer the disk layout.
The C64 file system uses a block availability map (BAM) to track free and used sectors, loosely comparable to the allocation tracking in a FAT file system. However, the real challenge lies in sector chaining. On the physical disk, each sector consists of a header block and a data block. Within the 256-byte data block, the first two bytes hold the track and sector of the next block in the file, and the remaining 254 bytes hold file data. Because the disk is constantly spinning, the drive uses an interleaving strategy to optimize read speeds, so consecutive blocks of a file are not stored in adjacent sectors. If you read sectors sequentially as they appear on the disk, you get fragments of files out of order. You must follow the sector chain pointers embedded in the first two bytes of each sector.
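As a rough illustration of following a chain, here is a minimal sketch. It assumes a flat 35-track image in the common .d64 layout (683 sectors of 256 bytes, with 21, 19, 18, or 17 sectors per track depending on the zone); the source does not say which image format the corpus uses. The function names are hypothetical and are not taken from Amber64.

```python
SECTOR_SIZE = 256

def sectors_in_track(track: int) -> int:
    """Sectors per track on a standard 35-track 1541 disk."""
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 35:
        return 17
    raise ValueError(f"track {track} out of range")

def sector_offset(track: int, sector: int) -> int:
    """Byte offset of (track, sector) inside a flat image (assumed .d64 layout)."""
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError(f"sector {sector} invalid for track {track}")
    preceding = sum(sectors_in_track(t) for t in range(1, track))
    return (preceding + sector) * SECTOR_SIZE

def read_chain(image: bytes, track: int, sector: int) -> bytes:
    """Follow the two-byte links from a starting sector and return the payload."""
    data = bytearray()
    visited = set()
    while track != 0:
        if (track, sector) in visited:
            raise ValueError("sector chain loops back on itself")
        visited.add((track, sector))
        start = sector_offset(track, sector)
        block = image[start:start + SECTOR_SIZE]
        next_track, next_sector = block[0], block[1]
        if next_track == 0:
            # Last sector: byte 1 holds the index of the final used byte.
            data += block[2:next_sector + 1]
        else:
            data += block[2:]
        track, sector = next_track, next_sector
    return bytes(data)
```

The loop check matters on a corpus this size: damaged or deliberately odd disks can contain chains that point back at themselves, and a parser without a guard would never terminate.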
If you are looking to replicate this, you need to handle the non-standard text encoding. The C64 used PETSCII, which is not ASCII. It features two distinct character sets that can be toggled via a "shift" state. Without normalizing this to standard Unicode, your search results will be garbage.
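To make the normalization step concrete, here is a deliberately partial sketch that maps only letters, digits, basic punctuation, carriage returns, and the shifted space. Everything else, including the graphics characters, becomes the Unicode replacement character. The function name and the `shifted` flag are illustrative assumptions. A real decoder would need a full table for both character sets.

```python
def petscii_to_unicode(data: bytes, shifted: bool = True) -> str:
    """Decode a subset of PETSCII.

    shifted=True  -> lowercase/uppercase character set
    shifted=False -> uppercase/graphics character set
    """
    out = []
    for b in data:
        if b == 0x0D:
            out.append("\n")              # carriage return ends a line
        elif b == 0xA0:
            out.append(" ")               # shifted space, also used as padding
        elif 0x20 <= b <= 0x40:
            out.append(chr(b))            # space, punctuation, digits, '@'
        elif 0x41 <= b <= 0x5A:
            # Same codes are lowercase in the shifted set, uppercase otherwise.
            out.append(chr(b).lower() if shifted else chr(b))
        elif shifted and 0xC1 <= b <= 0xDA:
            out.append(chr(b - 0x80))     # uppercase letters in the shifted set
        else:
            out.append("\ufffd")          # unmapped in this sketch
    return "".join(out)
```

Because the disk does not record which character set was active when a file was written, one option is to decode each file both ways and index both results, at the cost of a larger index.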
From Raw Data to Searchable Intelligence
The core of this project is the Amber64 toolset. It automates the extraction of files, directory structures, and even "slack space": data left behind in sectors that were freed when a file was deleted but never overwritten.
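One way to find such leftover data is to ask the BAM which sectors are free and then check whether they still hold anything. The sketch below assumes the standard .d64 layout, with the BAM at track 18, sector 0, and four bytes per track starting at offset 0x04 (a free-sector count followed by a three-byte bitmap in which a set bit means "free"). The function names are hypothetical, and this is not Amber64's actual implementation.

```python
SECTOR_SIZE = 256
# Sectors per track for tracks 1..35 of a standard 1541 disk.
SECTORS_PER_TRACK = [21] * 17 + [19] * 7 + [18] * 6 + [17] * 5

def sector_offset(track, sector):
    """Byte offset of (track, sector) in a flat image (assumed .d64 layout)."""
    return (sum(SECTORS_PER_TRACK[:track - 1]) + sector) * SECTOR_SIZE

def free_sectors(image):
    """Yield every (track, sector) pair the BAM marks as free."""
    bam = sector_offset(18, 0)
    for track in range(1, 36):
        entry = bam + 4 * track              # track 1 starts at offset 0x04
        bitmap = image[entry + 1:entry + 4]  # skip the free-sector count byte
        for sector in range(SECTORS_PER_TRACK[track - 1]):
            if bitmap[sector // 8] & (1 << (sector % 8)):
                yield track, sector

def residual_sectors(image):
    """Return (track, sector, bytes) for free sectors that are not all zero."""
    found = []
    for track, sector in free_sectors(image):
        start = sector_offset(track, sector)
        block = image[start:start + SECTOR_SIZE]
        if any(block):
            found.append((track, sector, block))
    return found
```

The all-zero test is naive. A freshly formatted disk may carry a fill pattern rather than zeros, so a real pipeline would likely filter on known fill patterns before treating a free sector as interesting.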
For a pentester, the most interesting part of this research is the discovery of "hidden" artifacts. Because the C64 file system doesn't use folders, users simply dumped files into a single directory. When a disk filled up, they deleted old files to make room for new ones. By parsing the raw disk images, the researcher was able to recover these "deleted" files, including BBS logs that contain phone numbers, calling card codes, and private conversations between hackers from the late 80s.
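Deleted files can often be located through the directory itself. The sketch below assumes the standard .d64 layout: the directory chain starts at track 18, sector 1, each sector holds eight 32-byte entries, and a deleted ("scratched") entry has its file type byte set to 0x00 while the start pointer and name remain. The function name and the returned dictionary keys are made up for illustration.

```python
# Sectors per track for tracks 1..35 of a standard 1541 disk.
SECTORS_PER_TRACK = [21] * 17 + [19] * 7 + [18] * 6 + [17] * 5

def sector_offset(track, sector):
    """Byte offset of (track, sector) in a flat image (assumed .d64 layout)."""
    return (sum(SECTORS_PER_TRACK[:track - 1]) + sector) * 256

def scratched_entries(image):
    """List directory entries whose type byte is 0x00 but that still point at data."""
    results = []
    track, sector = 18, 1
    visited = set()
    while 1 <= track <= 35 and (track, sector) not in visited:
        visited.add((track, sector))
        start = sector_offset(track, sector)
        block = image[start:start + 256]
        for i in range(8):
            entry = block[i * 32:(i + 1) * 32]
            file_type, first_track, first_sector = entry[2], entry[3], entry[4]
            # Never-used entries are all zero, so require a non-zero start track.
            if file_type == 0x00 and first_track != 0:
                results.append({
                    "name": entry[5:21].rstrip(b"\xa0"),  # raw PETSCII, 0xA0 padded
                    "start": (first_track, first_sector),
                    "blocks": entry[30] | (entry[31] << 8),
                })
        # The first two bytes of the sector link to the next directory sector.
        track, sector = block[0], block[1]
    return results
```

Each recovered `start` pair can then be handed to a chain follower to pull back whatever survives. Expect partial results, since sectors freed by a deletion may have been reused by later files.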
To make this data useful, the researcher pushed the extracted text into an Elasticsearch cluster. This allows for rapid cross-referencing. You can search for a specific handle or a phone number and instantly see every disk image in the corpus that references it.
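For the indexing step, a minimal sketch of preparing documents for Elasticsearch's bulk API is shown here. It only builds the newline-delimited JSON body and does not send it. The index name `amber64-files` and the record fields (`disk`, `filename`, `text`) are assumptions for illustration, not the project's real schema.

```python
import hashlib
import json

def bulk_payload(records, index="amber64-files"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each record becomes two lines: an action line and the document itself.
    """
    lines = []
    for rec in records:
        # Deterministic ID so re-indexing the same file overwrites, not duplicates.
        key = f"{rec['disk']}:{rec['filename']}".encode("utf-8")
        doc_id = hashlib.sha1(key).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(rec))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""

records = [
    {"disk": "example.d64", "filename": "BBSLOG", "text": "normalized text here"},
]
body = bulk_payload(records)
```

The resulting body would be sent with an HTTP POST to the cluster's `_bulk` endpoint using the `application/x-ndjson` content type. Deriving the ID from the disk name and file name is one possible choice. Hashing the file content instead would collapse identical files that appear on many disks.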
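# Simplified logic for extracting sector chains in Python
def get_next_sector(sector_data):
    """Return the (track, sector) link stored in the first two bytes of a sector."""
    track = sector_data[0]   # byte 0: track of the next sector (0 marks the last sector)
    sector = sector_data[1]  # byte 1: sector number of the next sector
    return track, sector
# Following these links allows traversing the file regardless of physical location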
Why This Matters for Modern Pentesters
You might ask why a modern security researcher should care about 1980s hardware. The answer is simple: the methodology. We are constantly dealing with proprietary file formats, custom binary protocols, and legacy systems that lack documentation. The skills required to reverse-engineer a 1541 disk drive—understanding sector-level data, identifying file system metadata, and normalizing non-standard encoding—are the exact same skills you need to analyze a modern embedded device or a custom industrial control system.
Furthermore, this research highlights the danger of "deleted" data. Even on modern systems, we often assume that a file is gone once it is unlinked from the file system. This project serves as a reminder that unless the media is securely overwritten, the data remains on the physical media, waiting for someone with the right tools to reconstruct it.
Defensive Considerations
Defenders should take note of the persistence of data in unallocated space. If you are decommissioning hardware, do not rely on standard formatting or file deletion. Use tools that perform full-disk overwrites or, better yet, physical destruction. If you are working in an environment where sensitive data is stored on legacy media, assume that any "deleted" file is still recoverable by a motivated adversary.
The next time you find yourself staring at a raw binary blob from an unknown device, don't just run strings and hope for the best. Map out the structure, identify the metadata, and build a parser. The most interesting vulnerabilities are often hidden in the way a system organizes its own data. If you want to dive deeper into the technical specifics, check out the official Elasticsearch documentation for managing your own data indexing, and keep an eye on the Amber64 repository for updates on how to handle these legacy formats. There is a lot of history left to uncover, and the tools to find it are right in front of us.