Black Hat2023

Artificial Intelligence in Cyber Defense

Black Hat1,241 views41:33over 2 years ago

This panel discussion explores the dual-use nature of generative AI, focusing on its potential for both defensive automation and offensive exploitation. The speakers analyze risks such as model poisoning, data exfiltration, and the generation of malicious content, highlighting the challenges of maintaining data integrity and security in AI-integrated environments. The discussion emphasizes the need for organizations to develop in-house expertise and critical thinking to mitigate the risks associated with rapid AI adoption.

The Hidden Risks of Generative AI in Enterprise Workflows

TLDR: Generative AI models are being rapidly integrated into enterprise environments, creating new attack surfaces through model poisoning and data exfiltration. Attackers can manipulate these models by injecting malicious training data or using specific prompts to leak sensitive information. Security teams must treat AI integrations as high-risk assets and implement strict data handling policies to prevent unauthorized information disclosure.

Rapid adoption of generative AI has outpaced the development of necessary security controls. While developers and business units are eager to integrate these models to automate tasks, they often ignore the fundamental security implications of feeding proprietary data into third-party systems. The core issue is that these models are not just static tools; they are dynamic, data-hungry systems that can be manipulated if the input pipeline is not properly secured.

The Mechanics of Model Poisoning

Model poisoning is the most significant threat to the integrity of AI-driven workflows. In a typical scenario, an attacker injects malicious data into the training set or the fine-tuning process of a model. Because many organizations are rushing to fine-tune pre-trained models on their own internal datasets, they create a perfect environment for this type of attack.

If an attacker can influence the data used for fine-tuning, they can introduce backdoors or bias that only trigger under specific conditions. For example, an attacker could poison a model used for code completion by submitting malicious snippets to a public repository that the model later scrapes for training. When a developer uses that model, it might suggest code that contains a buffer overflow or other exploitable patterns.

The technical challenge here is that these models are often treated as black boxes. Once a model is poisoned, detecting the malicious behavior is non-trivial because the output looks like standard, albeit flawed, code. Pentesters should focus on the data ingestion pipeline. If you are assessing an application that uses a custom-trained model, look for ways to influence the training data. Can you submit data that the model will process? If so, you have a potential vector for poisoning.

Data Exfiltration via Prompt Injection

Data exfiltration in the context of generative AI often occurs through prompt injection, where an attacker crafts inputs that trick the model into revealing information it was trained on or has access to. This is particularly dangerous when the model is connected to internal enterprise data.

Consider a scenario where an AI assistant is integrated into a company's internal documentation portal. An attacker with access to the assistant can craft a prompt designed to bypass the model's safety filters and force it to output sensitive information, such as API keys, internal project roadmaps, or PII.

The risk is compounded by the fact that many of these models are accessed via APIs. If the application does not implement strict input validation and output filtering, the model becomes a conduit for data leakage. For a pentester, this means testing the boundaries of the model's instructions. Use techniques to see if you can force the model to ignore its system prompt and act as an unauthorized data retrieval tool.

The Supply Chain of AI Models

The industry is moving toward a model where pre-trained systems are sold as a service. This creates a supply chain risk that is arguably more severe than traditional software dependencies. When an organization relies on a third-party model, they are essentially running code they did not write and cannot fully inspect.

If a vendor's model is compromised, every organization using that model is potentially vulnerable. This is not just a theoretical risk. We have already seen instances where third-party integrations have led to the accidental exposure of sensitive corporate information. The lack of transparency in how these models are trained and updated makes it nearly impossible for a security team to perform a traditional risk assessment.

Defensive Strategies for AI Integration

Defending against these threats requires a shift in mindset. You cannot rely on perimeter security to protect an AI model. Instead, you must focus on data governance.

First, never feed sensitive or proprietary data into a public-facing AI model. If you must use AI, ensure that the data is sanitized and that the model is running in a private, isolated environment. Second, implement robust monitoring for all interactions with the model. Look for anomalous patterns in the prompts being sent and the responses being generated. If a model suddenly starts outputting data that looks like internal configuration files, you have a clear indicator of a compromise.

Finally, organizations must invest in building internal expertise. Relying on vendors to handle security is a recipe for disaster. Your security team needs to understand the specific risks associated with machine learning, including the potential for adversarial attacks.

The current state of AI security is reminiscent of the early days of web security, where developers were more concerned with functionality than with the risks of user-supplied input. We are in a period of rapid, unchecked growth, and the inevitable security failures are already beginning to surface. If you are a researcher or a pentester, start looking at these AI integrations now. The vulnerabilities are there, and they are waiting to be found. Do not wait for a major breach to start treating these systems with the same level of scrutiny you apply to any other critical infrastructure.

Talk Type

panel

Difficulty

intermediate