Black Hat 2025

Standing on the Shoulders of Giants: Deobfuscating WebAssembly using LLVM


This talk demonstrates a novel approach to deobfuscating WebAssembly (Wasm) binaries by lifting them into LLVM IR and applying compiler-based optimization passes. The technique targets common Wasm obfuscations such as control-flow flattening and opaque predicates, using a custom tool called Squanchy to orchestrate existing tools: wasm2c, SiMBA++, and Souper. This method normalizes and simplifies obfuscated Wasm code, enabling reverse engineering and analysis of malicious Wasm samples. The presentation includes a practical demonstration of deobfuscating a real-world Wasm-based cryptocurrency miner.

Deobfuscating WebAssembly Binaries by Lifting to LLVM IR

TLDR: WebAssembly binaries are increasingly used to hide malicious logic like cryptominers, but standard reverse engineering tools often struggle with control-flow obfuscation. By lifting obfuscated Wasm into LLVM IR, researchers can apply powerful compiler optimization passes to normalize and simplify the code. This approach, demonstrated with the new Squanchy tool, allows for the effective recovery of original logic from heavily protected binaries.

WebAssembly has moved far beyond its initial role as a high-performance sandbox for browser-based games. Today, it is a primary target for developers looking to protect intellectual property and, more concerningly, for attackers looking to hide malicious payloads. When you encounter a Wasm binary in a bug bounty program or a red team engagement, you are rarely looking at clean, readable code. You are likely staring at a mess of control-flow flattening, opaque predicates, and junk instructions designed to break your disassembler.

Standard tools like Ghidra or IDA Pro often fail to provide a coherent view of these binaries because they lack the context of the Wasm virtual machine. The research presented at Black Hat 2025 changes this dynamic by shifting the focus from static disassembly to compiler-based lifting.

The Mechanics of Wasm Lifting

The core problem with Wasm obfuscation is that it targets the human analyst. By flattening the control flow, an attacker turns a simple if-else block into a complex state machine that is nearly impossible to follow manually. The solution is to treat the Wasm binary not as a static file to be read, but as an intermediate representation that can be transformed.

By lifting the Wasm binary into LLVM IR, you gain access to the same optimization passes that modern compilers use to make code faster and smaller. The process starts with wasm2c, which converts the binary into C code that acts as a bridge: from C, a compiler such as Clang can produce LLVM IR. The Squanchy tool orchestrates this lifting process. It injects runtime helpers that model the Wasm virtual machine state, such as linear memory accesses and global variables, which would otherwise be lost during the conversion.

Once the code is in LLVM IR, you can apply specific passes to strip away the obfuscation. For example, if an attacker uses instruction substitution to replace a simple addition with a complex sequence of bitwise operations, the LLVM optimizer can often reduce that sequence back to the original operation.

Solving Opaque Predicates with Super-Optimization

One of the most effective obfuscation techniques is the use of opaque predicates—conditional branches where the outcome is always the same, but the logic is designed to look unpredictable. These are the bane of any reverse engineer.

The research highlights the use of Souper, a superoptimizer for LLVM IR. Unlike standard optimizers that rely on a fixed set of rewrite rules, Souper uses synthesis to find optimizations: it treats the search for a simpler equivalent of an obfuscated expression as a constraint satisfaction problem and uses an SMT solver to prove that the simpler expression is equivalent. When you chain this with SiMBA++, which specializes in detecting and simplifying mixed boolean-arithmetic (MBA) expressions, you can effectively "solve" the obfuscation.

During the live demonstration, the researchers took a Wasm binary that had been mutated 3,000 times. The resulting code was bloated and unreadable. After running it through the lifting and optimization pipeline, the tool recovered the original, clean function logic. The transformation was not just cosmetic: it shrank the function from hundreds of instructions down to a handful of readable operations.

Real-World Applicability for Pentesters

You will encounter this in the wild when analyzing web-based applications that use Wasm for security-sensitive tasks, such as hCaptcha. These implementations often use obfuscation to prevent automated solving or to hide the underlying verification logic. If you are tasked with auditing such an application, you cannot rely on manual analysis.

The workflow for a pentester is straightforward:

Extract the .wasm file from the web application.
Use wasm2c to generate the C representation.
Run the Squanchy pipeline to lift and normalize the code.
Apply Souper and SiMBA++ to simplify the control flow.
Recompile the resulting IR to a native binary for analysis in your preferred debugger.

This approach is also highly effective against CryptoNight miners, which frequently use Wasm to hide their mining loops. By normalizing the binary, you can quickly identify the core mining logic and determine if the application is performing unauthorized resource hijacking.

Defensive Considerations

Defenders should recognize that obfuscation is not a permanent barrier. If you are relying on Wasm obfuscation to protect sensitive business logic or API keys, you are operating under a false sense of security. The tools to reverse these protections are becoming more accessible and automated. Instead of relying on code-level obfuscation, focus on server-side validation and ensuring that your Wasm modules do not contain secrets that, if recovered, would compromise your infrastructure.

The shift toward compiler-based analysis is a significant step forward for the research community. By leveraging the same infrastructure that builds our software, we can now dismantle the protections that were previously considered "too hard" to break. If you are dealing with protected Wasm, stop trying to read the assembly and start lifting it into a form that your compiler can understand. The results are often surprising.

Talk Type

research presentation

Difficulty

advanced