Black Hat 2024

We R in a right pickle with all these insecure serialization formats


This talk demonstrates advanced exploitation techniques against Python's Pickle and R's RDS serialization formats, focusing on achieving remote code execution through insecure deserialization. The researchers analyze the internal virtual machine architectures of these formats to bypass common security filters and anti-malware scanners. They introduce custom disassemblers, patchers, and compilers to craft bespoke, obfuscated payloads that evade standard detection mechanisms. The presentation concludes with the release of two open-source tools, HiddenPickle and HiddenPromise, to facilitate further research and exploitation of these serialization vulnerabilities.

Bypassing Deserialization Filters: A Deep Dive into Pickle and R(DS) Exploitation

TLDR: Modern deserialization filters often rely on blocklists that are trivial to bypass by leveraging the internal virtual machine architectures of Python Pickle and R(DS). By crafting bespoke payloads using custom disassemblers and compilers, researchers can achieve reliable remote code execution even when standard security tools are present. Security teams must move away from blocklisting and instead implement strict, schema-based validation or avoid deserializing untrusted data entirely.

Deserialization vulnerabilities are far from new, but the way we approach them is often stuck in a cycle of cat-and-mouse. Security tools frequently attempt to block known dangerous methods like eval or os.system within serialized streams. This approach is fundamentally flawed because it ignores the underlying execution logic of the serialization format itself. If you can control the bytecode, you control the execution flow, and you can easily bypass these superficial filters.

The Mechanics of Pickle Manipulation

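Python’s pickle module is essentially a stack-based virtual machine. When you deserialize a stream, you are executing bytecode instructions that manipulate a stack and a memo (a storage area for objects that the stream can refer back to). The danger lies in the opcodes that reach arbitrary code: GLOBAL imports a named attribute from any importable module, INST builds an object by calling a named class with arguments, and REDUCE calls a callable already on the stack with an argument tuple.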
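To make the stack machine concrete, here is a minimal sketch that uses the standard library's pickletools module to list the opcodes in a harmless stream. The Demo class and its use of operator.add are illustrative assumptions, not anything taken from the talk; the point is only to show GLOBAL and REDUCE appearing in a real stream.

```python
import operator
import pickle
import pickletools


class Demo:
    """Hypothetical class whose pickled form is a call to a harmless function."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call operator.add(1, 2)".
        return (operator.add, (1, 2))


# Protocol 2 is chosen so the callable is referenced with the GLOBAL opcode.
stream = pickle.dumps(Demo(), protocol=2)

# genops yields (opcode, argument, position) for each instruction in the stream.
opcode_names = [opcode.name for opcode, _arg, _pos in pickletools.genops(stream)]
print(opcode_names)

# Loading the stream performs the call; its return value is what loads() returns.
result = pickle.loads(stream)
print(result)
```

Nothing in the stream describes a Demo instance. It only names a callable and its arguments, which is why the choice of callable is the whole security question.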

Most security scanners look for specific strings or opcodes in the stream. However, the pickle VM can import and call almost any Python callable, so the same result is reachable through many paths. If a scanner blocks os.system, an attacker can simply use a different method to reach the same outcome, such as using ctypes to load kernel32.dll and then calling VirtualAlloc, WriteProcessMemory, and CreateThread to inject shellcode directly into the process.

The researchers behind HiddenPickle demonstrated that by building a custom assembler and disassembler, they could craft payloads that never trigger these simple pattern-matching scanners. The key is to understand that the pickle VM doesn't care about your blocklist; it only cares about the validity of the bytecode. By manipulating the stack and memo, you can construct complex objects that execute arbitrary code upon instantiation or reduction, effectively hiding your intent from static analysis tools.
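The fragility of pattern matching can be shown with a deliberately harmless target. The sketch below hand-writes two pickle streams that both call len("abc"); the naive_scan function is a hypothetical stand-in for a byte-pattern filter and is not how any particular product or HiddenPickle works.

```python
import pickle

# Hand-written protocol-0 stream: GLOBAL, MARK, UNICODE, TUPLE, REDUCE, STOP.
direct = b"cbuiltins\nlen\n(Vabc\ntR."

# Same call, but the module and attribute names are pushed as two separate
# strings and resolved with STACK_GLOBAL (opcode 0x93) instead of GLOBAL.
indirect = b"Vbuiltins\nVlen\n\x93(Vabc\ntR."


def naive_scan(stream: bytes) -> bool:
    """Hypothetical filter: flag a stream that contains a blocked byte pattern."""
    return b"builtins\nlen" in stream


direct_result = pickle.loads(direct)
indirect_result = pickle.loads(indirect)

print(naive_scan(direct), direct_result)
print(naive_scan(indirect), indirect_result)
```

Both streams produce the same value, yet only the first contains the byte pattern the filter looks for. Any check that reasons about bytes rather than about what the VM will do with them has this weakness.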

Exploiting R(DS) and the Promise Object

R’s serialization format, RDS, is often overlooked in security research, which makes it a prime target. Like pickle, it uses a virtual machine to reconstruct objects. The researchers identified a critical component in R called the "promise object."

In R, lazy evaluation means that expressions are not evaluated until they are actually accessed. A promise object stores the expression, the environment in which it should be evaluated, and the eventual value. If an attacker can inject a malicious promise object into an RDS file, the code will execute the moment that object is accessed by the application.
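The three parts of a promise can be mirrored in a few lines of Python. This is only an analogy for the semantics described above, under the assumption that a callable stands in for the R expression; it is not R's implementation and the Promise class here is hypothetical.

```python
class Promise:
    """Python analogue of an R promise: expression, environment, cached value."""

    _UNSET = object()

    def __init__(self, expression, environment):
        self.expression = expression    # callable standing in for an R expression
        self.environment = environment  # names the expression may read
        self._value = Promise._UNSET    # filled in on first access

    def force(self):
        # Evaluate once, on first access, then reuse the cached value.
        if self._value is Promise._UNSET:
            self._value = self.expression(self.environment)
        return self._value


calls = []


def expression(env):
    calls.append("evaluated")  # side effect marks the moment of evaluation
    return env["x"] * 2


promise = Promise(expression, {"x": 21})
calls_before_access = list(calls)  # still empty: nothing has run yet
value = promise.force()            # first access triggers evaluation
promise.force()                    # second access reuses the cached value
```

The side effect happens at first access, not at construction. That gap is what lets a deserialized promise sit unnoticed until the application touches it.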

The vulnerability is particularly potent because R's unserialization routine is recursive: when R_Unserialize is called, it walks nested objects in the stream, so a promise can be embedded anywhere in an object graph and its evaluation can be triggered later. The researchers found that by crafting specific RDS files, they could force the R interpreter to execute arbitrary code. They released HiddenPromise to help researchers explore these structures. The tool allows for the disassembly of RDS files and the injection of malicious code into existing RDB files, the lazy-load databases R relies on when loading packages.
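As a first step toward inspecting such files, the leading fields of an RDS stream can be decoded with the Python standard library. This sketch assumes the XDR layout described in the R Internals manual (a format marker followed by three big-endian integers) and gzip compression, the saveRDS default; read_rds_header is a hypothetical helper that reads only the header and is unrelated to how HiddenPromise is implemented.

```python
import gzip
import struct


def read_rds_header(raw: bytes) -> dict:
    """Decode the leading fields of an XDR-format R serialization stream."""
    # gzip data starts with the magic bytes 1f 8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    if raw[:2] != b"X\n":
        raise ValueError("not an XDR-format R serialization stream")
    # Three big-endian 32-bit integers follow the format marker.
    format_version, writer, min_reader = struct.unpack(">iii", raw[2:14])

    def decode(packed):
        # R packs versions as major * 65536 + minor * 256 + patch.
        return (packed >> 16, (packed >> 8) & 0xFF, packed & 0xFF)

    return {
        "format_version": format_version,
        "written_by": decode(writer),
        "min_reader": decode(min_reader),
    }


# Synthetic header built for illustration, not taken from a real file.
sample = gzip.compress(b"X\n" + struct.pack(">iii", 3, 0x040301, 0x030500))
header = read_rds_header(sample)
print(header)
```

Everything after the header is a recursive sequence of typed records, which is where a full disassembler would have to continue.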

Real-World Impact and Engagement Strategy

During a penetration test, you are likely to encounter these formats in machine learning pipelines, data science platforms, or any application that persists user-provided objects. If you see a file with a .pkl or .rds extension, you are looking at a potential entry point.

Do not waste time trying to find a "magic" payload that works everywhere. Instead, focus on the application's logic. How is the data being deserialized? Is it coming from an untrusted source? If you can influence the serialized stream, you don't need a complex exploit; you need to understand the environment. If the application is running on Windows, target Win32 APIs such as those exported by kernel32.dll. If it is running on Linux, target common system utilities.

The impact of these vulnerabilities is almost always full remote code execution. In a data science environment, this could mean compromising the entire training pipeline, exfiltrating sensitive datasets, or pivoting into the internal network.

Moving Beyond Blocklists

Defenders must stop relying on blocklists to secure deserialization. If your security posture depends on filtering eval or os.system, you should assume it can be bypassed. These filters are easily defeated by anyone with a basic understanding of the underlying virtual machine.

The most reliable defense is to avoid deserializing untrusted data entirely. If you must accept serialized data from outside, use a format that does not support arbitrary code execution, such as JSON or Protobuf, and enforce a strict schema. If you are forced to use a format like pickle or RDS, sign the data cryptographically and verify the signature before deserializing, keeping the signing key out of the attacker's reach. If the signature doesn't match, the data should never reach the deserialization function.
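One way to put the signing advice into practice is an HMAC check that runs before pickle.loads ever sees the bytes. This is a minimal sketch using only the standard library; SECRET_KEY, sign_and_dump, and verify_and_load are hypothetical names, and a real deployment would load the key from a secret store rather than hard-code it.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical placeholder key
TAG_LEN = hashlib.sha256().digest_size      # 32 bytes for SHA-256


def sign_and_dump(obj) -> bytes:
    """Serialize obj and prepend an HMAC-SHA256 tag over the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(blob: bytes):
    """Check the tag first; only a verified payload reaches pickle.loads."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to deserialize")
    return pickle.loads(payload)


blob = sign_and_dump({"model": "demo", "epochs": 3})
restored = verify_and_load(blob)

# Flipping a single bit in the payload must cause rejection.
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
```

The signature proves only that the holder of the key produced the bytes. It does nothing to make the format itself safe, so it protects data your own systems wrote, not data submitted by users.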

OWASP's guidance on insecure deserialization has the same focus: protect the integrity of serialized data and avoid executing untrusted code. Research like this serves as a stark reminder that our tools are only as good as our understanding of the underlying technology. If you are not looking at the bytecode, you are not looking at the real risk.

Talk Type

research presentation

Difficulty

advanced