
Vibe School: Making Dumb Devices Smart with AI

DEFCONConference · 541 views · 22:27 · 6 months ago

This talk demonstrates the practical application of large language models (LLMs) as an assistive tool for reverse engineering and hardware hacking. The speaker uses Google Gemini to guide the process of reverse engineering a 433MHz weather station, including signal analysis, packet structure decoding, and firmware development for an ESP32 microcontroller. The presentation highlights the limitations of current AI models in handling complex, domain-specific hardware tasks and the necessity of human oversight in debugging AI-generated code.

Using LLMs to Reverse Engineer Proprietary Hardware Protocols

TLDR: This research explores using large language models like Google Gemini as an assistive co-pilot for reverse engineering 433MHz radio protocols and configuring custom firmware. While LLMs can accelerate signal analysis and code generation, they frequently hallucinate or struggle with complex, domain-specific hardware constraints. Pentesters should treat AI as a junior assistant that requires constant verification rather than an autonomous expert.

Hardware hacking often feels like a lonely, tedious grind. You are staring at a waterfall display in SDR++ for hours, trying to identify a preamble or a sync word in a proprietary 433MHz signal. When you finally capture the raw data, you then have to spend more time writing a decoder in C or Python. The recent talk by Katie Paxton-Fear at DEF CON 2025 highlights a shift in this workflow: using LLMs to bridge the gap between raw signal capture and functional firmware implementation.

The AI-Assisted Reverse Engineering Workflow

The core of this research involves using Google Gemini to navigate the entire lifecycle of a hardware project. The goal was to take a "dumb" 433MHz weather station, capture its transmission, and integrate that data into Home Assistant using an ESP32 microcontroller.

The process starts with signal acquisition. Using an RTL-SDR dongle and rtl_433, the researcher captured the raw radio frequency data. Instead of manually parsing the bitstream, the researcher fed the output and screenshots of the signal analysis directly into the LLM. The AI acted as a guide, suggesting the necessary tools and providing the initial logic for the decoder.
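rtl_433 can emit each decoded packet as a JSON line (`rtl_433 -F json`), which is the easiest format to hand to an LLM or a script. A minimal sketch of filtering that stream for temperature readings follows; the `temperature_C` field name is an assumption (rtl_433's keys vary by device model), and the sample lines are made up for illustration:

```python
import json

def extract_temperatures(lines):
    """Pull (model, temperature) pairs out of rtl_433's JSON-lines output.

    Assumes the device reports a "temperature_C" field, which is common
    but not universal across rtl_433 decoders.
    """
    readings = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # rtl_433 also prints non-JSON status lines
        if "temperature_C" in event:
            readings.append((event.get("model", "unknown"), event["temperature_C"]))
    return readings

# Normally fed from a pipe: rtl_433 -F json | python extract.py
sample = [
    '{"model": "Example-WX", "temperature_C": 21.4}',
    'Detached kernel driver',
    '{"model": "Example-WX", "rssi": -42}',
]
print(extract_temperatures(sample))  # [('Example-WX', 21.4)]
```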

This is where the real-world utility for a pentester becomes clear. When you are on an engagement and encounter an unknown radio protocol, you rarely have the luxury of spending three days writing a custom parser. An LLM can ingest the output of a tool like rtl_433 and generate a boilerplate decoder in seconds. However, the research proves that the AI is only as good as the context you provide. If you do not understand the underlying hardware constraints—like the timing requirements of a specific ESP32 board—the AI will happily provide code that is syntactically correct but functionally useless.
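To make "boilerplate decoder in seconds" concrete, here is a sketch for a hypothetical 28-bit weather-station packet: 8-bit device ID, 12-bit two's-complement temperature in tenths of a degree, and an 8-bit nibble-sum checksum. The field layout and checksum scheme are assumptions for illustration, not the protocol from the talk:

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as a big-endian integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def decode_packet(bits):
    """Decode a hypothetical 28-bit packet (layout is an assumption):
    8-bit device ID, 12-bit two's-complement temperature in 0.1 degC
    steps, then an 8-bit checksum over the five payload nibbles."""
    if len(bits) != 28:
        raise ValueError("expected 28 bits")
    payload_nibbles = [bits_to_int(bits[i:i + 4]) for i in range(0, 20, 4)]
    if sum(payload_nibbles) & 0xFF != bits_to_int(bits[20:28]):
        raise ValueError("checksum mismatch")
    device_id = bits_to_int(bits[0:8])
    raw = bits_to_int(bits[8:20])
    if raw & 0x800:            # sign-extend the 12-bit temperature field
        raw -= 0x1000
    return device_id, raw / 10.0

# ID 0x5A, 21.5 degC, checksum 0x23
packet = [0,1,0,1,1,0,1,0, 0,0,0,0,1,1,0,1,0,1,1,1, 0,0,1,0,0,0,1,1]
print(decode_packet(packet))  # (90, 21.5)
```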

The Reality of AI Hallucinations in Hardware

Technical limitations quickly emerge when you rely on LLMs for firmware development. The researcher found that the AI struggled significantly with ESPHome configuration files. Because ESPHome is highly specific and relies on complex YAML structures, the LLM often hallucinated non-existent configuration keys or failed to account for hardware-specific pin mappings.
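For comparison, a known-good minimal ESPHome configuration is short; when an LLM's output drifts far from something like this sketch, hallucinated keys are the likely cause. The board name and GPIO pin here are assumptions for illustration, so check them against your own hardware:

```yaml
# Minimal ESPHome sketch for logging raw 433MHz timings on an ESP32.
# Board and pin values are assumptions; verify against your hardware.
esphome:
  name: weather-bridge

esp32:
  board: esp32dev

logger:

remote_receiver:
  pin: GPIO4      # data pin of the 433MHz receiver module
  dump: all       # print raw pulse timings so you can verify decoding yourself
```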

One of the most critical takeaways is the "Heisenbug" nature of debugging AI-generated code on microcontrollers: the bug disappears when you try to observe it. In one instance, the code only worked when the researcher included Serial.print statements. The overhead of the serial output was inadvertently slowing down the execution loop, which allowed the microcontroller to keep up with the signal timing. Once the researcher removed the debug prints, the timing broke and the decoder failed. The LLM could not diagnose this because it lacked the physical context of the hardware's real-time constraints.
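The failure mode is easy to reproduce in simulation. The sketch below (pure illustration, not the talk's firmware, with an assumed 500 microsecond bit period) models a decoder that samples the air once per loop iteration: when the loop cost matches the transmitter's bit period the bits come out clean, and when the loop runs "faster" the same bit is sampled twice — exactly the kind of behavior a stray Serial.print can mask or unmask:

```python
BIT_US = 500  # assumed transmitter bit period in microseconds

def sample_loop(bits, loop_cost_us):
    """Sample whichever bit is 'on the air' once per loop iteration,
    where each iteration costs loop_cost_us of wall-clock time."""
    out, t = [], 0
    while t < len(bits) * BIT_US:
        out.append(bits[t // BIT_US])
        t += loop_cost_us
    return out

signal = [1, 0, 1, 1, 0, 0, 1, 0]
print(sample_loop(signal, 500))  # loop matches the bit period: decodes cleanly
print(sample_loop(signal, 300))  # loop runs "too fast": bits get duplicated
```

Removing debug output shrinks `loop_cost_us`, which is why "optimizing away" the prints broke the decoder.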

Practical Application for Pentesters

For those of us in the field, this research suggests that LLMs are best used as a force multiplier for tasks you already understand. If you are a pentester, you can use an LLM to:

  • Generate boilerplate code for common protocols like SPI, I2C, or UART.
  • Translate obscure vendor documentation into actionable attack vectors.
  • Draft initial scripts for fuzzing or brute-forcing simple authentication mechanisms.
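As a concrete example of the first bullet, a UART 8N1 frame decoder is the sort of boilerplate an LLM produces reliably, because the protocol is fully standardized. A sketch, assuming idealized one-sample-per-bit input (which real captures rarely give you):

```python
def decode_uart_8n1(samples):
    """Decode one 8N1 UART frame from ten bit samples:
    start bit (0), eight data bits LSB-first, stop bit (1)."""
    if len(samples) != 10:
        raise ValueError("expected 10 samples")
    if samples[0] != 0 or samples[9] != 1:
        raise ValueError("framing error")
    byte = 0
    for i, bit in enumerate(samples[1:9]):
        byte |= bit << i          # UART sends the least significant bit first
    return byte

# 'A' (0x41) on the wire: start, then 1,0,0,0,0,0,1,0 (LSB first), then stop
frame = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(chr(decode_uart_8n1(frame)))  # A
```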

However, you must maintain a "human-in-the-loop" approach. If you are testing a critical system, you cannot blindly trust the AI to generate your payloads or decoders. The risk of a subtle logic error—like the timing issue mentioned above—is too high. During a red team engagement, a failed decoder could mean missing a critical data exfiltration point or triggering an alarm because your "optimized" code crashed the target device.

Defensive Considerations

From a defensive perspective, this research underscores why security through obscurity is failing. As LLMs become more capable at reverse engineering, the barrier to entry for analyzing proprietary radio protocols is dropping. If your organization relies on custom, undocumented radio protocols for physical security or sensor networks, assume that a motivated attacker can use AI to decode your traffic in a fraction of the time it took five years ago.

Defenders should focus on implementing encryption at the application layer rather than relying on the secrecy of the physical layer protocol. If an attacker can capture your signal with a cheap RTL-SDR, they can eventually decode it. If that signal is encrypted, the raw data becomes significantly less valuable.
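Python's standard library alone can at least authenticate payloads; full confidentiality requires a real cipher (e.g. AES-GCM from a vetted library), so treat this HMAC sketch as only the integrity half of the story, with a made-up key and payload:

```python
import hmac
import hashlib

KEY = b"per-device-secret"  # illustration only; provision real keys securely

def frame(payload: bytes) -> bytes:
    """Append a truncated HMAC-SHA256 tag so receivers can reject
    spoofed or tampered sensor frames."""
    tag = hmac.new(KEY, payload, hashlib.sha256).digest()[:8]
    return payload + tag

def verify(data: bytes) -> bytes:
    """Strip and check the tag; raise on any mismatch."""
    payload, tag = data[:-8], data[-8:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad tag")
    return payload

sent = frame(b"temp=21.5")
print(verify(sent))            # b'temp=21.5'
# verify(b"temp=99.9" + sent[-8:]) would raise ValueError
```

Note this does not stop straight replay of a captured frame; a sequence counter inside the authenticated payload is the usual fix.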

Ultimately, this research is a reminder that we are entering an era where the bottleneck in hardware hacking is no longer the ability to write code, but the ability to verify it. The AI can write the decoder, but it cannot tell you why your signal is noisy or why your ESP32 is dropping packets. Keep your logic simple, verify your assumptions with a logic analyzer, and use the LLM to handle the grunt work of syntax and boilerplate. If you find yourself arguing with an LLM for five hours about a YAML file, you have already lost the efficiency battle.

Talk type: talk · Difficulty: intermediate · Has Demo · Has Code · Tool Released

Part of the DC33 IoT Village Talks track (2025).