Black Hat 2023

How we taught ChatGPT-4 to break Mbed TLS and wolfSSL with side-channel attacks


This talk demonstrates the use of ChatGPT-4 to automate correlation power analysis (CPA) attacks against cryptographic implementations. The researcher integrates LLM-based automation with a custom cloud-based hardware-in-the-loop infrastructure to perform side-channel analysis on Mbed TLS and wolfSSL. The presentation highlights the effectiveness of LLMs in generating attack code, performing self-correction during build failures, and interpreting side-channel metrics. The approach provides a scalable, automated method for testing hardware security against power-based side-channel leakage.

Automating Side-Channel Attacks Against AES with ChatGPT-4

TLDR: This research demonstrates how to integrate LLMs into a hardware-in-the-loop pipeline to automate correlation power analysis (CPA) attacks. By using ChatGPT-4 to generate, debug, and refine attack code against embedded targets like the STM32F3, researchers can significantly lower the barrier to entry for complex side-channel analysis. The approach proves that LLMs can effectively handle the "heavy lifting" of interfacing with hardware, making sophisticated cryptographic attacks more accessible and repeatable.

Hardware security has long been the domain of specialists with expensive equipment and years of experience in signal processing. For most penetration testers, a side-channel attack like correlation power analysis (CPA) feels like a black box—something you read about in academic papers but rarely attempt during a standard engagement. That is changing. The recent research presented at Black Hat 2023 shows that we can now use LLMs to bridge the gap between high-level code and low-level hardware interaction, effectively turning a "clever teenager" with a $100 oscilloscope into a threat capable of extracting cryptographic keys from embedded devices.

The Mechanics of the Automated Pipeline

The core of this research is a cloud-based, hardware-in-the-loop infrastructure that allows for remote side-channel analysis. By using QEMU and KVM for virtualization, the researchers created a standardized environment from which they could deploy firmware to targets like the STM32F3. The setup uses PCIe passthrough to connect physical tools, specifically the ChipWhisperer and ChipSHOUTER, directly to the virtualized environment.

This setup allows an LLM to act as the orchestrator. Instead of manually writing scripts to capture power traces, the researcher provides the LLM with an OpenAPI definition of the infrastructure's endpoints. The LLM then handles the logic: it generates the necessary Python code to trigger the encryption, captures the power traces via the PicoScope, and performs the statistical correlation to recover the key.
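The statistical correlation step can be illustrated with a minimal sketch. It is not the code from the talk: it runs against simulated traces rather than captured ones, and it assumes a Hamming-weight leakage model on the first-round AES S-box output, which is a common textbook choice. All function names, the trace length, and the noise level are hypothetical.

```python
import math
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_aes_sbox():
    """Derive the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()
HW = [bin(v).count("1") for v in range(256)]  # Hamming weight of each byte


def simulate_traces(key_byte, n_traces, n_samples=8, leak_at=3, noise=1.0, seed=1):
    """Stand-in for a real capture: Gaussian noise plus HW leakage at one sample."""
    rng = random.Random(seed)
    plaintexts, traces = [], []
    for _ in range(n_traces):
        pt = rng.randrange(256)
        trace = [rng.gauss(0.0, noise) for _ in range(n_samples)]
        trace[leak_at] += HW[SBOX[pt ^ key_byte]]
        plaintexts.append(pt)
        traces.append(trace)
    return plaintexts, traces


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0


def cpa_recover_byte(plaintexts, traces):
    """Return the key-byte guess whose predicted leakage correlates best."""
    columns = list(zip(*traces))  # one column per sample point
    best_guess, best_corr = 0, 0.0
    for guess in range(256):
        model = [HW[SBOX[pt ^ guess]] for pt in plaintexts]
        peak = max(abs(pearson(model, col)) for col in columns)
        if peak > best_corr:
            best_guess, best_corr = guess, peak
    return best_guess, best_corr


plaintexts, traces = simulate_traces(key_byte=0x2B, n_traces=300)
guess, corr = cpa_recover_byte(plaintexts, traces)
print(f"best guess: {guess:#04x}, peak |r| = {corr:.2f}")
```

Against a real target, `simulate_traces` would be replaced by the capture step, and the same loop would be repeated for each of the 16 key bytes.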

LLMs as Pentesters: Code Generation and Self-Correction

What makes this research compelling is not just the automation, but the LLM's ability to handle the "messy" parts of security research. During the demo, the researcher showed ChatGPT-4 encountering a build error while trying to compile a target implementation of Mbed TLS. The LLM didn't just fail; it analyzed the error message, identified a naming conflict with an existing function, and renamed the conflicting function to resolve the build failure.
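The shape of such a build-and-repair loop can be sketched as follows. This is an assumption about the general pattern, not the researcher's implementation: `build_with_repair`, `fake_build`, and `fake_llm` are hypothetical names, and the two stubs stand in for a real compiler invocation and a real LLM call.

```python
from typing import Callable, Optional, Tuple


def build_with_repair(
    source: str,
    build: Callable[[str], Tuple[bool, str]],
    ask_llm: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Build the source; on failure, hand the error log back for a revised version."""
    for attempt in range(1, max_attempts + 1):
        ok, log = build(source)
        if ok:
            return source
        if attempt < max_attempts:
            # Hypothetical LLM call: takes the failing source and the build log.
            source = ask_llm(source, log)
    return None  # give up after max_attempts failed builds


def fake_build(source: str) -> Tuple[bool, str]:
    """Stub compiler: fails while the source still uses a conflicting name."""
    if "aes_encrypt(" in source and "target_aes_encrypt(" not in source:
        return False, "error: conflicting types for 'aes_encrypt'"
    return True, ""


def fake_llm(source: str, log: str) -> str:
    """Stub model: resolves the conflict by renaming the function."""
    return source.replace("aes_encrypt(", "target_aes_encrypt(")


fixed = build_with_repair("void aes_encrypt(void) {}", fake_build, fake_llm)
print(fixed)
```

Capping the number of attempts matters in practice: a model that cannot fix the error would otherwise loop indefinitely against the build system.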

This self-correcting loop is a game-changer for researchers. When you are dealing with AES implementations in libraries like wolfSSL, the boilerplate code is often the biggest hurdle. The LLM can generate the necessary wrappers and test harnesses faster than a human, allowing the researcher to focus on the actual leakage model.

Real-World Applicability and Risk

For a pentester, this means that side-channel analysis is no longer restricted to lab-only environments. If you are assessing IoT devices, smart meters, or any hardware that performs cryptographic operations using common microcontrollers, the barrier to performing a CPA attack has dropped significantly. The "cost of entry" is now just a laptop and a modest investment in hardware.

The impact is clear: if your device uses a general-purpose microcontroller without hardware-level countermeasures against power analysis, you should assume it is vulnerable. The research highlights that even if you use industry-standard libraries, the implementation running on that hardware might still leak the key through power consumption patterns. This falls under what OWASP categorizes as insecure design; if the hardware itself is not hardened, the software layer cannot fully protect the secrets.

Defensive Considerations

Defending against these attacks requires moving beyond software-only security. If your threat model includes attackers with physical access, you must consider hardware-level protections. This includes using secure elements that are specifically hardened against side-channel analysis, masking intermediate values so that power consumption no longer correlates directly with secret data, or using jitter-based clock randomization to make trace alignment significantly harder for an attacker.
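The effect of clock jitter on trace alignment can be illustrated with a small simulation. This is a sketch under stated assumptions, not a measurement of any real device: the traces are synthetic, the leakage is a Hamming weight added at a single sample, and `peak_correlation` and its parameters are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0


def peak_correlation(max_jitter, n_traces=500, n_samples=40, leak_at=10, seed=7):
    """Peak |correlation| between the leakage model and any sample point.

    Each trace leaks the Hamming weight of a random byte, delayed by a random
    number of samples in [0, max_jitter] to imitate a randomized clock.
    """
    rng = random.Random(seed)
    model, traces = [], []
    for _ in range(n_traces):
        value = rng.randrange(256)
        shift = rng.randint(0, max_jitter)
        trace = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        trace[leak_at + shift] += bin(value).count("1")
        model.append(bin(value).count("1"))
        traces.append(trace)
    return max(abs(pearson(model, column)) for column in zip(*traces))


aligned = peak_correlation(max_jitter=0)
jittered = peak_correlation(max_jitter=20)
print(f"aligned peak |r| = {aligned:.2f}, jittered peak |r| = {jittered:.2f}")
```

In this model, jitter spreads the leakage across many sample points instead of removing it, so an attacker who can resynchronize the traces or collect more of them may still succeed. That is why it is usually combined with other countermeasures.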

The era of "security by obscurity" in hardware is effectively over. When an LLM can be taught to navigate the complexities of a build system and perform statistical analysis on power traces, the time it takes to move from "device in hand" to "key extracted" shrinks from weeks to hours. As researchers, we should be using these tools to stress-test our own hardware implementations before they ever reach the field. If we can automate the attack, we can certainly automate the verification. The next time you are looking at an embedded target, ask yourself if you are relying on the difficulty of the attack to keep your keys safe—because that defense is no longer holding up.

Talk Type

research presentation

Difficulty

advanced