Black Hat USA 2023

BTD: Unleashing the Power of Decompilation for x86 Deep Neural Network Executables


This talk introduces BTD (Bin-To-DNN), a novel decompiler designed to recover high-level model specifications, including architectures and parameters, from compiled x86 deep neural network executables. The research demonstrates that despite the complexity of compiled DNN binaries, their execution path is deterministic, allowing for the use of trace-based symbolic execution to reverse-engineer the underlying model. This technique enables white-box attacks, such as model stealing and adversarial example generation, against black-box DNN executables. The authors provide a tool-based implementation that successfully reconstructs models from various deep learning compilers.

Reverse Engineering Deep Neural Networks: How BTD Turns Black-Box Binaries into White-Box Targets

TLDR: Researchers have released BTD (Bin-To-DNN), a decompiler that recovers high-level model architectures and parameters from compiled x86 binaries. By using trace-based symbolic execution, the tool bypasses the complexity of compiled code to reconstruct functional models, effectively turning black-box neural networks into white-box targets. This research demonstrates that intellectual property protection through binary compilation is insufficient against modern reverse engineering techniques.

Deep learning models are increasingly deployed on edge devices and IoT hardware, often packaged as compiled x86 binaries to protect proprietary architectures and weights. The industry assumption has long been that these binaries are sufficiently opaque to prevent meaningful reverse engineering. This research from Black Hat 2023 shatters that illusion. By treating the compiled binary as a deterministic execution path rather than a traditional software program, the authors have created a pipeline that extracts the exact model specification, including operator types, topology, and parameters.

The Mechanics of Model Recovery

Traditional reverse engineering tools like IDA Pro struggle with deep neural network (DNN) executables because the code is dominated by millions of floating-point arithmetic operations rather than meaningful control flow. However, the researchers identified a critical observation: DNN inference is inherently deterministic. Regardless of the input values, the execution path through the model remains constant, because typical inference code contains no data-dependent branching. This absence of path explosion makes these binaries ideal candidates for symbolic execution.
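This determinism can be illustrated with a toy stand-in for a compiled model. Everything here is invented for illustration (the operator stubs and the trace list are not BTD's implementation); the point is that two different inputs drive the identical sequence of operator calls:

```python
import numpy as np

# Toy stand-in for a compiled DNN: a fixed pipeline of operators.
# Regardless of the input values, the same operators execute in the
# same order -- the property BTD exploits for trace-based symbolic execution.
def conv_stub(x):          # placeholder for a compiled Conv kernel
    return x * 2.0

def relu(x):
    return np.maximum(x, 0.0)

def run_model(x, trace):
    trace.append("conv")
    x = conv_stub(x)
    trace.append("relu")
    x = relu(x)
    trace.append("matmul")
    return x @ np.ones((x.shape[-1], 1))

trace_a, trace_b = [], []
run_model(np.random.randn(4), trace_a)
run_model(np.random.randn(4), trace_b)
print(trace_a == trace_b)  # True: the execution path never depends on the data
```

Because every run exercises the same single path, a symbolic executor only ever has one trace to reason about, no matter how large the model is.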

The BTD pipeline operates in three distinct phases:

  1. Operator Recovery: The tool maps assembly functions to specific DNN operators (e.g., Conv, ReLU, MatMul) using an LSTM model trained on assembly opcodes. By treating opcodes as language tokens and applying Byte Pair Encoding (BPE), the tool achieves high-accuracy classification of the underlying operations.
  2. Topology Recovery: Because compilers pass inputs and outputs as memory pointers through function arguments, the tool hooks these calls to track memory addresses. If the output of one operator serves as the input to another, the tool links them, successfully reconstructing the computational graph.
  3. Dimension and Parameter Recovery: This is the most technically impressive phase. The tool launches trace-based symbolic execution to infer kernel sizes, input/output channels, and strides. By instrumenting the binary to dump parameters during execution, it extracts the actual weights and biases used by the model.
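The topology-recovery step (phase 2) can be sketched as a simple pointer-matching pass over a hooked call trace. The trace format, operator names, and addresses below are hypothetical, invented for illustration; the idea is only that an operator whose output buffer address reappears as a later operator's input buffer is that operator's predecessor:

```python
# Hypothetical log of hooked operator calls: each entry records the memory
# addresses passed as input/output buffer pointers (illustrative format).
trace = [
    {"op": "conv1", "inputs": [0x1000], "output": 0x2000},
    {"op": "relu1", "inputs": [0x2000], "output": 0x3000},
    {"op": "conv2", "inputs": [0x3000], "output": 0x4000},
    {"op": "add",   "inputs": [0x2000, 0x4000], "output": 0x5000},  # skip connection
]

def recover_topology(trace):
    """Link operators whose output buffer later serves as an input buffer."""
    producer = {}   # buffer address -> operator that last wrote it
    edges = []
    for call in trace:
        for addr in call["inputs"]:
            if addr in producer:
                edges.append((producer[addr], call["op"]))
        producer[call["output"]] = call["op"]
    return edges

print(recover_topology(trace))
# [('conv1', 'relu1'), ('relu1', 'conv2'), ('conv1', 'add'), ('conv2', 'add')]
```

Note how the skip connection falls out naturally: the `add` operator consumes a buffer written several calls earlier, so the recovered graph is a DAG, not just a chain.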

From Black-Box to White-Box

For a pentester or bug bounty hunter, the implications are immediate. If you can recover the model architecture and weights, you are no longer performing black-box testing. You can now perform white-box attacks, such as generating adversarial examples that are specifically tuned to the model's internal structure.

During the talk, the researchers demonstrated this by using DeepInversion to attack a ResNet18 model that had been decompiled using BTD. The results were identical to attacking the original, uncompiled model. This means that any security control relying on the "secrecy" of the model architecture is effectively bypassed. If you encounter a target using a proprietary model on an edge device, you can now treat that binary as a source of truth for your exploit development.
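To make the white-box advantage concrete, here is a minimal FGSM-style sketch on a toy logistic classifier. This is not the talk's attack (the researchers used DeepInversion against a recovered ResNet18); it only shows that once weights are in hand, input gradients become computable and an adversarial perturbation follows directly:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10)   # "recovered" model weights (toy stand-in)
x = rng.standard_normal(10)   # benign input
y = 1.0                       # true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With the weights known, the gradient of the binary cross-entropy loss
# with respect to the INPUT is available in closed form: (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

# FGSM: take one step in the sign of the input gradient to maximize the loss.
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

print(sigmoid(w @ x), "->", sigmoid(w @ x_adv))  # confidence in the true label drops
```

Against a genuine black box, an attacker would have to estimate this gradient through thousands of queries; with BTD-recovered weights it is a single closed-form computation.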

The Compiler as a Variable

One of the most interesting aspects of this research is how it handles different deep learning compilers like Apache TVM, Glow, and NNFusion. While these compilers generate distinct low-level code, they all adhere to the same high-level operator semantics. The BTD tool leverages this consistency. Even when compilers apply aggressive optimizations, the symbolic constraints remain largely invariant.

The tool is not perfect; it occasionally fails when compiler optimizations are so extreme that they obscure the memory layout. However, these are edge cases. For the vast majority of production-grade models, the tool provides a near-perfect reconstruction.

Defensive Realities

Defenders must stop viewing binary compilation as a security boundary for AI models. If your security model relies on the attacker not knowing the architecture of your neural network, your model is already compromised.

Instead, focus on hardening the model itself. Techniques like model watermarking, input sanitization, and adversarial training are far more effective than trying to hide the binary. If you are a developer, assume that any model deployed to an endpoint is effectively public. If the model contains sensitive intellectual property or if its compromise leads to a critical failure, you need to implement robust access controls and monitoring at the inference API level rather than relying on binary obfuscation.
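As one example of an inference-level control, extraction campaigns tend to show up as high-volume query traffic, which a per-key rate limit can slow down. The sketch below is a generic sliding-window limiter in pure Python (the class, limits, and key names are illustrative assumptions, not anything from the talk):

```python
import time

class InferenceGate:
    """Per-API-key sliding-window rate limiter for an inference endpoint."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = {}  # api_key -> list of recent request timestamps

    def allow(self, api_key, now=None):
        now = time.monotonic() if now is None else now
        # Keep only timestamps still inside the window.
        recent = [t for t in self.history.get(api_key, []) if now - t < self.window]
        if len(recent) >= self.max_requests:
            self.history[api_key] = recent
            return False   # throttle: likely bulk extraction traffic
        recent.append(now)
        self.history[api_key] = recent
        return True

gate = InferenceGate(max_requests=3, window_seconds=60)
results = [gate.allow("client-a", now=i) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

Rate limiting does not stop a determined attacker, but combined with query logging and anomaly detection it raises the cost of extraction well above what binary obfuscation ever could.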

This research highlights a growing trend in AI security: the gap between high-level model design and low-level implementation is closing. As tools like BTD become more accessible, the barrier to entry for model extraction will continue to drop. Start testing your models against these types of extraction techniques now, before someone else does it for you.

Talk Type: research presentation
Difficulty: advanced
Has Demo · Has Code · Tool Released

