I've Got 99 Problems But Prompt Injection Ain't
This talk explores the evolving landscape of AI and machine learning security, focusing on vulnerabilities inherent in AI systems and their supporting infrastructure. It details specific attack vectors including data poisoning, model evasion, and supply chain attacks targeting pre-trained models. The presentation highlights the risks of insecure model serialization formats like pickle and the potential for arbitrary code execution in AI development frameworks. It concludes with a call for improved vulnerability disclosure processes and stronger security practices in the AI/ML ecosystem.
Why Your AI Model Serialization is a Remote Code Execution Goldmine
TLDR: Machine learning models are often treated as static data, but they are frequently serialized using insecure formats like pickle that allow for arbitrary code execution upon loading. Researchers demonstrated that common model repositories like Hugging Face can be weaponized to deliver malicious payloads to unsuspecting developers. Pentesters should prioritize auditing model loading functions and serialization configurations in any AI-integrated application.
Security researchers often treat machine learning models as black boxes, focusing on adversarial inputs or prompt injection while ignoring the underlying infrastructure. This oversight is dangerous. When you load a model, you are often executing code that was serialized by someone else. If that serialization format is inherently insecure, you are essentially running a remote code execution exploit against your own infrastructure.
The Serialization Trap
Most developers and data scientists prioritize convenience over security when moving models between environments. They use serialization libraries to convert complex model objects into byte streams for storage and transmission. The problem is that many of these libraries, most notably Python’s pickle, are designed to reconstruct objects by executing arbitrary code.
If an attacker can replace a legitimate model file with a backdoored version, they gain execution on the host machine the moment the application calls a load function. This is not a theoretical risk. The research presented at DEF CON 2024 highlighted how platforms like Hugging Face host hundreds of thousands of models, many of which are downloaded millions of times. If a repository is compromised, or if a user is tricked into downloading a malicious model, the impact is immediate and total system compromise.
Anatomy of an Exploit
The vulnerability lies in how functions like pickle.load or numpy.load handle untrusted data. When numpy.load is called with allow_pickle=True, the library does not just read array data; it reconstructs the full object graph, which can include calls to system-level functions.
Consider a simple payload designed to execute a command upon deserialization:
import pickle
import os

class MaliciousModel:
    def __reduce__(self):
        # pickle will call os.system('whoami') when this object is loaded
        return (os.system, ('whoami',))

payload = pickle.dumps(MaliciousModel())
# If the victim passes this to pickle.loads, 'whoami' executes
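One practical way to vet a pickle stream is to inspect its opcodes without ever deserializing it. The sketch below is my own illustration, not a tool from the talk: it uses the standard library's pickletools to list every global reference a pickle would import, which is exactly where payloads like the one above hide.

```python
import io
import os
import pickle
import pickletools

def list_globals(data: bytes):
    """Return the (module, name) pairs a pickle stream would import,
    without deserializing anything."""
    found, strings = [], []
    for op, arg, _pos in pickletools.genops(io.BytesIO(data)):
        if op.name in ("GLOBAL", "INST"):
            module, _, name = arg.partition(" ")  # arg is "module name"
            found.append((module, name))
        elif op.name == "STACK_GLOBAL":
            # module and name were pushed as the two preceding strings
            found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found

class MaliciousModel:
    def __reduce__(self):
        return (os.system, ("whoami",))

payload = pickle.dumps(MaliciousModel())
print(list_globals(payload))  # e.g. [('posix', 'system')] on Linux -- a red flag
```

A benign pickle of plain data yields an empty list here, so any entry referencing os, posix, subprocess, or builtins is worth manual review before the file goes anywhere near a load call.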
This is a classic A03:2021-Injection scenario, but applied to the data pipeline rather than a web form. During the talk, researchers demonstrated how they could abuse Skops, a library intended to handle scikit-learn models securely, to bypass its loading restrictions. By crafting a JSON-based model file that included an operator function node, they could trigger an eval() call during the reconstruction process. While Skops has since been patched, the underlying issue remains prevalent in older or custom-built model loaders.
Path Traversal in Model Formats
Beyond simple code execution, model formats like ONNX present their own unique attack surface. ONNX models are often used for cross-platform compatibility, but their structure allows for complex data references. Researchers found that by manipulating the internal graph of an ONNX file, they could perform path traversal attacks.
If an application loads an ONNX model and uses a file path provided within that model to fetch weights or biases, an attacker can point that path to sensitive files on the host system, such as /etc/passwd. When the model loader attempts to read the "weights" from that location, it inadvertently leaks the contents of the file back to the attacker or crashes the service in a way that reveals system information.
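The defense against this class of bug is to resolve any model-supplied path and verify it stays inside the model's own directory before reading it. A minimal stdlib sketch (the function name and directory layout are my own, for illustration):

```python
import os

def resolve_inside(base_dir: str, untrusted_path: str) -> str:
    """Resolve a file path taken from a model file and refuse anything
    that escapes base_dir (e.g. '../../etc/passwd' or an absolute path)."""
    base = os.path.realpath(base_dir)
    # Treat the model-supplied path as relative to the model's directory
    candidate = os.path.realpath(os.path.join(base, untrusted_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes model directory: {untrusted_path!r}")
    return candidate

print(resolve_inside("/models/resnet", "weights.bin"))
# resolve_inside("/models/resnet", "../../etc/passwd") raises ValueError
```

Note that os.path.join discards the base when the untrusted path is absolute, which is why the commonpath check after realpath is essential: it catches absolute paths, dot-dot traversal, and symlink tricks in one place.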
Testing Your AI Pipeline
For a pentester, the engagement should start by mapping every point where the application interacts with external model files. Do not just look at the API endpoints. Look at the CI/CD pipeline, the model registry, and the storage buckets where models are pulled from.
If you find an application that pulls models from a public hub, test the loading logic. Can you provide a custom model file? If you can, try to trigger a simple callback or file read. If the application uses torch.load or numpy.load, check if the weights_only parameter is set to True. If it is not, you have a high-probability path to code execution.
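As a quick first pass during an engagement, you can grep the codebase for the risky loading patterns described above. A rough, illustrative scanner (the patterns and helper name are my own, not an official tool, and the regexes are deliberately simplistic):

```python
import re

# Call patterns that commonly lead to code execution when the input
# file is attacker-controlled (illustrative, not exhaustive)
RISKY_PATTERNS = [
    (re.compile(r"\bpickle\.loads?\("), "pickle.load/loads on untrusted data"),
    (re.compile(r"\btorch\.load\((?![^)]*weights_only\s*=\s*True)"),
     "torch.load without weights_only=True"),
    (re.compile(r"allow_pickle\s*=\s*True"), "numpy load with allow_pickle=True"),
]

def audit_source(source: str):
    """Return (line number, reason, line) for each risky-looking call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, why in RISKY_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, why, line.strip()))
    return findings

sample = """\
model = torch.load(path)
safe  = torch.load(path, weights_only=True)
arr   = np.load(f, allow_pickle=True)
"""
for finding in audit_source(sample):
    print(finding)  # flags lines 1 and 3, skips the hardened line 2
```

Anything this flags deserves a manual look at where the file being loaded actually comes from; a hit on a file pulled from a public hub is your high-probability path to code execution.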
Defensive Hardening
Defenders must stop treating model files as trusted assets. The first step is to move away from insecure formats like pickle entirely. Use safer, data-only formats like Safetensors whenever possible. These formats are designed to store tensors without the ability to execute arbitrary code.
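To see why data-only formats close the hole, consider a toy version of the safetensors layout: a length-prefixed JSON header describing each tensor, followed by raw bytes. This sketch is my own simplification, not the real safetensors implementation, but it shows the key property: loading involves pure data parsing, with no object reconstruction and nothing to execute.

```python
import json
import struct
from array import array

def save_tensors(path, tensors):
    """tensors: dict of name -> array('f', ...). Writes an 8-byte header
    length, a JSON header, then the raw tensor bytes."""
    header, blobs, offset = {}, [], 0
    for name, values in tensors.items():
        blob = values.tobytes()
        header[name] = {"dtype": "f32", "len": len(values),
                        "offsets": [offset, offset + len(blob)]}
        blobs.append(blob)
        offset += len(blob)
    head = json.dumps(header).encode()
    with open(path, "wb") as fh:
        fh.write(struct.pack("<Q", len(head)))
        fh.write(head)
        for blob in blobs:
            fh.write(blob)

def load_tensors(path):
    """Pure data parsing: no classes, no callables, nothing executed."""
    with open(path, "rb") as fh:
        (head_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(head_len))
        data = fh.read()
    out = {}
    for name, meta in header.items():
        start, end = meta["offsets"]
        out[name] = array("f", data[start:end])
    return out

save_tensors("demo.model", {"bias": array("f", [0.5, -1.0])})
print(load_tensors("demo.model")["bias"][0])  # 0.5
```

The worst a malformed file can do to this loader is raise a parse error; contrast that with pickle, where a malformed file decides what code runs.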
Furthermore, implement strict access controls on your model storage. If your application must pull models from a hub, use a proxy or a local cache that performs integrity checks and scans for known malicious patterns before the model ever touches your production environment. Finally, ensure that your model loading code is running with the least privilege necessary. If the model loader does not need network access or shell execution permissions, use container security policies to strip those capabilities away.
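As an example of stripping those capabilities, an inference container that only needs to read models and serve predictions can be launched with standard Docker hardening flags (the image name and mount path here are hypothetical):

```shell
# Hypothetical inference image: read-only filesystem, no capabilities,
# no network for the model loader, models mounted read-only
docker run --rm --read-only --cap-drop ALL --network none \
  --security-opt no-new-privileges \
  -v /models:/models:ro \
  example/inference:latest
```

With this profile, even a successful deserialization exploit lands in a process that cannot write to disk, open sockets, or escalate privileges, which turns total system compromise into a contained failure.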
The AI ecosystem is moving fast, and security is often an afterthought. As researchers, we need to treat these models with the same skepticism we apply to any other user-supplied input. If you are not auditing your model loading code, you are leaving the door wide open for anyone who knows how to craft a malicious tensor.