Black Hat USA 2023

BTD: Unleashing the Power of Decompilation for x86 Deep Neural Network Executables


This talk introduces BTD (Bin-To-DNN), a novel decompiler designed to recover high-level model specifications, including architectures and parameters, from compiled x86 deep neural network executables. The research demonstrates that despite the complexity of compiled DNN binaries, their execution path is deterministic, allowing for the use of trace-based symbolic execution to reverse-engineer the underlying model. This technique enables white-box attacks, such as model stealing and adversarial example generation, against black-box DNN executables. The authors provide a tool-based implementation that successfully reconstructs models from various deep learning compilers.

Reverse Engineering Deep Neural Networks: How BTD Turns Black-Box Binaries into White-Box Targets

TLDR: Researchers have released BTD (Bin-To-DNN), a decompiler that recovers high-level model architectures and parameters from compiled x86 binaries. By using trace-based symbolic execution, the tool bypasses the complexity of compiled code to reconstruct functional models, effectively turning black-box neural networks into white-box targets. This research demonstrates that intellectual property protection through binary compilation is insufficient against modern reverse engineering techniques.

Deep learning models are increasingly deployed on edge devices and IoT hardware, often packaged as compiled x86 binaries to protect proprietary architectures and weights. The industry assumption has long been that these binaries are sufficiently opaque to prevent meaningful reverse engineering. This research from Black Hat 2023 shatters that illusion. By treating the compiled binary as a deterministic execution path rather than a traditional software program, the authors have created a pipeline that extracts the exact model specification, including operator types, topology, and parameters.

The Mechanics of Model Recovery

Traditional reverse engineering tools like IDA Pro struggle with deep neural network (DNN) executables because the code is dominated by millions of floating-point arithmetic operations rather than meaningful control flow. However, the researchers identified a critical observation: DNN inference is inherently deterministic. Regardless of the input values, the execution path through the model remains constant, because typical inference code contains no data-dependent branching. This absence of path explosion makes these binaries ideal candidates for symbolic execution.
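This determinism can be illustrated with a toy stand-in for a compiled model. Everything here is invented for illustration (the operator stubs and the trace list are not BTD's implementation); the point is that two different inputs drive the identical sequence of operator calls:

```python
import numpy as np

# Toy stand-in for a compiled DNN: a fixed pipeline of operators.
# Regardless of the input values, the same operators execute in the
# same order -- the property BTD exploits for trace-based symbolic execution.
def conv_stub(x):          # placeholder for a compiled Conv kernel
    return x * 2.0

def relu(x):
    return np.maximum(x, 0.0)

def run_model(x, trace):
    trace.append("conv")
    x = conv_stub(x)
    trace.append("relu")
    x = relu(x)
    trace.append("matmul")
    return x @ np.ones((x.shape[-1], 1))

trace_a, trace_b = [], []
run_model(np.random.randn(4), trace_a)
run_model(np.random.randn(4), trace_b)
print(trace_a == trace_b)  # True: the execution path never depends on the data
```

Because every run exercises the same single path, a symbolic executor only ever has one trace to reason about, no matter how large the model is.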

The BTD pipeline operates in three distinct phases:

  1. Operator Recovery: The tool maps assembly functions to specific DNN operators (e.g., Conv, ReLU, MatMul) using an LSTM model trained on assembly opcodes. By treating opcodes as language tokens and applying Byte Pair Encoding (BPE), the tool achieves high-accuracy classification of the underlying operations.
  2. Topology Recovery: Because compilers pass inputs and outputs as memory pointers through function arguments, the tool hooks these calls to track memory addresses. If the output of one operator serves as the input to another, the tool links them, successfully reconstructing the computational graph.
  3. Dimension and Parameter Recovery: This is the most technically impressive phase. The tool launches trace-based symbolic execution to infer kernel sizes, input/output channels, and strides. By instrumenting the binary to dump parameters during execution, it extracts the actual weights and biases used by the model.
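The topology-recovery step (phase 2) can be sketched as a simple pointer-matching pass over a hooked call trace. The trace format, operator names, and addresses below are hypothetical, invented for illustration; the idea is only that an operator whose output buffer address reappears as a later operator's input buffer is that operator's predecessor:

```python
# Hypothetical log of hooked operator calls: each entry records the memory
# addresses passed as input/output buffer pointers (illustrative format).
trace = [
    {"op": "conv1", "inputs": [0x1000], "output": 0x2000},
    {"op": "relu1", "inputs": [0x2000], "output": 0x3000},
    {"op": "conv2", "inputs": [0x3000], "output": 0x4000},
    {"op": "add",   "inputs": [0x2000, 0x4000], "output": 0x5000},  # skip connection
]

def recover_topology(trace):
    """Link operators whose output buffer later serves as an input buffer."""
    producer = {}   # buffer address -> operator that last wrote it
    edges = []
    for call in trace:
        for addr in call["inputs"]:
            if addr in producer:
                edges.append((producer[addr], call["op"]))
        producer[call["output"]] = call["op"]
    return edges

print(recover_topology(trace))
# [('conv1', 'relu1'), ('relu1', 'conv2'), ('conv1', 'add'), ('conv2', 'add')]
```

Note how the skip connection falls out naturally: the `add` operator consumes a buffer written several calls earlier, so the recovered graph is a DAG, not just a chain.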

From Black-Box to White-Box

For a pentester or bug bounty hunter, the implications are immediate. If you can recover the model architecture and weights, you are no longer performing black-box testing. You can now perform white-box attacks, such as generating adversarial examples that are specifically tuned to the model's internal structure.

During the talk, the researchers demonstrated this by using DeepInversion to attack a ResNet18 model that had been decompiled using BTD. The results were identical to attacking the original, uncompiled model. This means that any security control relying on the "secrecy" of the model architecture is effectively bypassed. If you encounter a target using a proprietary model on an edge device, you can now treat that binary as a source of truth for your exploit development.
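To make the white-box advantage concrete, here is a minimal FGSM-style sketch on a toy logistic classifier. This is not the talk's attack (the researchers used DeepInversion against a recovered ResNet18); it only shows that once weights are in hand, input gradients become computable and an adversarial perturbation follows directly:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10)   # "recovered" model weights (toy stand-in)
x = rng.standard_normal(10)   # benign input
y = 1.0                       # true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With the weights known, the gradient of the binary cross-entropy loss
# with respect to the INPUT is available in closed form: (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

# FGSM: take one step in the sign of the input gradient to maximize the loss.
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

print(sigmoid(w @ x), "->", sigmoid(w @ x_adv))  # confidence in the true label drops
```

Against a genuine black box, an attacker would have to estimate this gradient through thousands of queries; with BTD-recovered weights it is a single closed-form computation.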

The Compiler as a Variable

One of the most interesting aspects of this research is how it handles different deep learning compilers like Apache TVM, Glow, and NNFusion. While these compilers generate distinct low-level code, they all adhere to the same high-level operator semantics. The BTD tool leverages this consistency. Even when compilers apply aggressive optimizations, the symbolic constraints remain largely invariant.

The tool is not perfect; it occasionally fails when compiler optimizations are so extreme that they obscure the memory layout. However, these are edge cases. For the vast majority of production-grade models, the tool provides a near-perfect reconstruction.

Defensive Realities

Defenders must stop viewing binary compilation as a security boundary for AI models. If your security model relies on the attacker not knowing the architecture of your neural network, your model is already compromised.

Instead, focus on hardening the model itself. Techniques like model watermarking, input sanitization, and adversarial training are far more effective than trying to hide the binary. If you are a developer, assume that any model deployed to an endpoint is effectively public. If the model contains sensitive intellectual property or if its compromise leads to a critical failure, you need to implement robust access controls and monitoring at the inference API level rather than relying on binary obfuscation.
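As one example of an inference-level control, extraction campaigns tend to show up as high-volume query traffic, which a per-key rate limit can slow down. The sketch below is a generic sliding-window limiter in pure Python (the class, limits, and key names are illustrative assumptions, not anything from the talk):

```python
import time

class InferenceGate:
    """Per-API-key sliding-window rate limiter for an inference endpoint."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = {}  # api_key -> list of recent request timestamps

    def allow(self, api_key, now=None):
        now = time.monotonic() if now is None else now
        # Keep only timestamps still inside the window.
        recent = [t for t in self.history.get(api_key, []) if now - t < self.window]
        if len(recent) >= self.max_requests:
            self.history[api_key] = recent
            return False   # throttle: likely bulk extraction traffic
        recent.append(now)
        self.history[api_key] = recent
        return True

gate = InferenceGate(max_requests=3, window_seconds=60)
results = [gate.allow("client-a", now=i) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

Rate limiting does not stop a determined attacker, but combined with query logging and anomaly detection it raises the cost of extraction well above what binary obfuscation ever could.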

This research highlights a growing trend in AI security: the gap between high-level model design and low-level implementation is closing. As tools like BTD become more accessible, the barrier to entry for model extraction will continue to drop. Start testing your models against these types of extraction techniques now, before someone else does it for you.

Talk Type: research presentation
Difficulty: advanced
Has Demo · Has Code · Tool Released

