
Risks from AI Risk Management

Black Hat USA 2023 · 33:45

This talk explores the inherent risks and limitations of current AI risk management frameworks, focusing on the phenomenon of overtrust in automated systems. It demonstrates how users blindly follow instructions from AI systems, even when the system is clearly malfunctioning or directing them to do something dangerous. The presentation highlights the tension between security, explainability, and robustness, arguing that current standards are often vague and susceptible to manipulation. It concludes that real change requires a shift in organizational culture rather than reliance on superficial compliance.

Why Your AI Risk Management Framework Is Just Security Theater

TL;DR: Current AI risk management frameworks often prioritize compliance over technical reality, creating a false sense of security through vague standards. Research shows that users exhibit dangerous levels of overtrust in AI systems, while adversarial attacks such as model inversion and data poisoning remain difficult to mitigate in production. Security researchers and pentesters should look past these high-level frameworks and focus on the specific, messy technical components of the machine learning pipeline.

Security teams are currently obsessed with building "robust" AI risk management programs. We see the same pattern every time a new technology hits the mainstream: organizations rush to adopt high-level frameworks, check a few boxes, and assume they have mitigated the risk. The reality, as demonstrated by recent research into AI system failures, is that these frameworks are often disconnected from the actual technical vulnerabilities that matter to an attacker.

The core problem is overtrust. When users interact with an AI system, they tend to treat its output as gospel, even when the system is clearly malfunctioning or being manipulated. In one study, participants interacted with a robot that was intentionally programmed to be incompetent. Even after watching it play the wrong music and move erratically, every single participant followed its instructions to unlock a computer and disclose sensitive information. When we design security controls for AI, we often ignore the human element: the tendency to trust the machine simply because it is a machine.

The Illusion of Compliance

Current standards, such as the NIST AI Risk Management Framework, are well-intentioned, but they often fail to account for the technical complexity of modern machine learning pipelines. These frameworks treat an AI system as a monolithic black box. In reality, an AI system is an "ML amoeba"—a sprawling, interconnected mess of data collection, feature extraction, model training, and serving infrastructure.

If you are a pentester, you know that the model itself is often the smallest part of the attack surface. Research from Google (the widely cited "Hidden Technical Debt in Machine Learning Systems" paper) has shown that the actual machine learning code often accounts for less than 5% of the total system. The rest is supporting infrastructure: containers, orchestrators, hypervisors, and hardware accelerators. When a framework asks you to "secure your AI," it rarely gives you actionable guidance on how to secure the underlying container orchestration or the data pipeline.

Adversarial Attacks Are Not Just Theoretical

The technical reality is that adversarial attacks are becoming increasingly sophisticated. Take model inversion, for example. An attacker with nothing more than API access can query a model and, by observing the responses, reconstruct private training data. This is not a bug you can patch with a simple configuration change; it is an inherent property of how many models learn and store information.
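
To make that concrete, here is a minimal sketch of a black-box inversion loop in Python. Everything here is illustrative: the endpoint URL and response format are invented, and the simple coordinate-wise hill climbing stands in for more sophisticated published attacks (in the style of Fredrikson et al.). The point is that nothing beyond query access is required.

```python
import numpy as np
import requests

# Hypothetical endpoint: returns {"scores": [p_0, ..., p_k]} for a flat feature vector.
API = "https://target.example/predict"

def confidence(x, target_class):
    """Query the remote model and return its confidence for the target class."""
    resp = requests.post(API, json={"features": x.tolist()}, timeout=10)
    return resp.json()["scores"][target_class]

def invert(target_class, dim=64, steps=2000, alpha=0.1, rng=np.random.default_rng(0)):
    """Reconstruct a representative input for `target_class` using only
    black-box confidence queries (no gradients, no model internals)."""
    x = rng.random(dim)                      # random starting guess
    best = confidence(x, target_class)
    for _ in range(steps):
        i = rng.integers(dim)                # perturb one feature at a time
        candidate = x.copy()
        candidate[i] = np.clip(candidate[i] + rng.normal(0, alpha), 0, 1)
        score = confidence(candidate, target_class)
        if score > best:                     # keep perturbations the model "likes"
            x, best = candidate, score
    return x, best                           # x approximates training data for the class
```

If the confidence score climbs steadily toward 1.0, the API is leaking class-representative information; coarse mitigations such as rounding scores or rate limiting only raise the query cost, they do not remove the property.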

Similarly, data poisoning remains a critical threat. If an attacker can influence the data used to train or fine-tune a model, they can introduce backdoors that are nearly impossible to detect through standard code reviews. When you are testing these systems, do not just look for standard web vulnerabilities. Look for the ways in which the model's reliance on external data can be weaponized.
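
For intuition, here is a minimal, framework-agnostic sketch of a BadNets-style backdoor poisoner, assuming `images` is an (N, H, W) float array scaled to [0, 1] and `labels` is an integer vector; the trigger pattern and poison rate are illustrative.

```python
import numpy as np

def poison_dataset(images, labels, target_label, rate=0.02, rng=np.random.default_rng(1)):
    """BadNets-style backdoor: stamp a small trigger patch onto a fraction of
    training images and flip their labels to the attacker's class. The trained
    model behaves normally until an input carries the trigger."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * rate)
    idx = rng.choice(len(images), n_poison, replace=False)
    for i in idx:
        images[i, -4:, -4:] = 1.0       # 4x4 white square in the corner = trigger
        labels[i] = target_label        # relabel: trigger now "means" the target class
    return images, labels
```

Because the backdoor lives in the data rather than the code, a standard code review will never surface it; during an engagement, trace who and what can write to the training and fine-tuning data stores.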

The Trade-off Between Security and Explainability

One of the most dangerous myths in AI security is that you can have it all: a model that is perfectly robust, perfectly explainable, and perfectly private. The literature tells a different story. There is an inherent tension between these properties. If you make a model more robust against adversarial examples, you often make it less explainable. If you try to make it more explainable, you might inadvertently leak information that helps an attacker game the system.
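
A short PyTorch sketch makes the tension concrete: the input gradient that powers a saliency-map explanation is the very signal that FGSM (the fast gradient sign method) uses to build an adversarial example, so surfacing one hands an attacker the other. Here `model` is any differentiable classifier and the epsilon value is illustrative.

```python
import torch

def saliency_and_attack(model, x, label, eps=0.03):
    """Compute a gradient-based saliency 'explanation' and an FGSM adversarial
    example from the same input gradient: exposing the explanation also
    exposes the attack direction."""
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()
    saliency = x.grad.abs()                                  # the "explanation"
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()   # the attack
    return saliency, x_adv
```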

This is where the "we" in "we need to secure our AI" becomes problematic. Who is making these trade-offs? In many organizations, it is not the security team. It is the product manager or the data scientist, neither of whom may have a background in offensive security. When you are conducting a red team engagement, your goal should be to expose these trade-offs. Show the stakeholders that their "robust" model is actually a sieve for sensitive data, or that their "explainable" model is providing misleading justifications that can be easily manipulated.

What Pentesters Should Do Next

Stop waiting for the industry to settle on a "standard" for AI security. It is not coming, and even if it does, it will likely be a product of the same corporate interests that prioritize speed over safety. Instead, focus on the technical reality of the systems you are testing.

Map out the entire pipeline. Identify where the data comes from, how it is processed, and who has access to the model's API. Pull up the OWASP Machine Learning Security Top 10 and start testing against those categories. If you are testing a system that uses PyTorch, scrutinize the dependency supply chain: the December 2022 torchtriton dependency-confusion attack shipped malicious code to PyTorch nightly users through PyPI, so this threat has already been exploited in the wild.
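
As a starting point for that supply-chain check, here is a sketch of a dependency-confusion probe in the style of the torchtriton incident. The internal package names are invented; the PyPI JSON API endpoint is real.

```python
import requests

# Hypothetical list of packages the target resolves from an internal index.
INTERNAL_PACKAGES = ["acme-ml-utils", "acme-feature-store", "torchtriton"]

def shadowed_on_pypi(name):
    """Dependency-confusion check: does a package that should only exist on an
    internal index also exist on public PyPI, where a higher version number
    can hijack the install?"""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

for pkg in INTERNAL_PACKAGES:
    if shadowed_on_pypi(pkg):
        print(f"[!] {pkg} exists on public PyPI -- possible dependency confusion")
```

A hit does not prove compromise, but any internal-only name that also resolves on the public index deserves a close look at version pinning and index precedence.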

Real change in AI security will not come from a new compliance checklist. It will come from security researchers who understand that the most effective way to secure these systems is to treat them like any other piece of software: with healthy skepticism, a deep understanding of the underlying architecture, and a willingness to break things that everyone else assumes are safe.

Talk type: research presentation · Difficulty: intermediate · Has demo · Has code · Tool released

