
Everyday AI: Leveraging LLMs for Simple, Effective Security Automation

Security BSides San Francisco 2025 · 18:40

This talk demonstrates the integration of Large Language Models (LLMs) into security operations to automate complex, non-deterministic tasks such as access request evaluation and IAM policy management. The speakers showcase how LLMs can be used to parse natural language requests, map them to appropriate AWS IAM roles, and generate pull requests for infrastructure-as-code updates. The presentation emphasizes the importance of maintaining a human-in-the-loop for final decision-making while using LLMs to handle the heavy lifting of data analysis and policy generation. A practical toolkit for implementing these workflows is provided.

Automating IAM Policy Cleanup with LLMs and Infrastructure as Code

TLDR: Security teams are drowning in manual access requests and bloated IAM policies that defy traditional automation. This talk demonstrates how to use LLMs to parse natural language access requests, map them to specific AWS roles, and generate automated pull requests for infrastructure-as-code updates. By keeping a human-in-the-loop for the final merge, you can safely offload the heavy lifting of policy analysis to AI without sacrificing control.

Security teams often hit a wall when trying to automate identity and access management. The sheer volume of requests, combined with the complexity of mapping natural language intent to specific AWS IAM permissions, usually forces teams to rely on manual, error-prone human review. When you have thousands of employees and millions of permissioned objects, the "least privilege" model becomes a theoretical ideal rather than an operational reality.

The Problem with Non-Deterministic Access Requests

Most automated security tooling relies on rigid, deterministic logic. If a user requests access to a bucket, the system checks a database, verifies the user's group, and either approves or denies the request. This works for simple scenarios, but it fails when the request is ambiguous or requires context that isn't stored in a structured format.

Consider the common request: "I need to read from the orders table." An automated system needs to know which database, which environment, and whether the user is even authorized to touch that data. Traditional automation struggles with this because it cannot interpret the intent behind the request. This is where LLMs change the game. Instead of building complex regex-based parsers or massive decision trees, you can feed the natural language request into an LLM, provide it with the necessary context, and let it determine the appropriate IAM role.
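A minimal sketch of this parsing step, assuming a hypothetical role catalog and JSON contract (the prompt wording and role names are illustrative, not the speakers' actual implementation):

```python
import json

# Hypothetical catalog of roles the LLM is allowed to choose from.
ALLOWED_ROLES = {"orders-readonly", "orders-readwrite"}

def build_prompt(request: str, context: dict) -> str:
    """Combine the user's natural-language request with structured context
    so the LLM can map intent to a concrete IAM role."""
    return (
        "You map access requests to AWS IAM roles.\n"
        f"Known roles: {sorted(ALLOWED_ROLES)}\n"
        f"Requester context: {json.dumps(context)}\n"
        f"Request: {request}\n"
        'Respond with JSON: {"role": <role-name>, "reason": <short reason>}'
    )

def parse_llm_role(raw: str) -> str:
    """Treat the model's answer as untrusted: it must be valid JSON and
    must name a role from the known catalog, or we reject it."""
    data = json.loads(raw)
    role = data.get("role")
    if role not in ALLOWED_ROLES:
        raise ValueError(f"LLM proposed unknown role: {role!r}")
    return role
```

Constraining the model to a closed set of known roles, and validating its answer against that set, is what keeps the non-deterministic step from becoming an open-ended privilege grant.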

Implementing LLM-Driven Access Control

The approach demonstrated at BSides SF 2025 involves using an LLM as a reasoning engine within a SOAR workflow. The process is straightforward:

  1. Capture: A user submits a request via a platform like Slack or a custom portal.
  2. Contextualize: The system pulls relevant metadata, such as the user's current role and the target resource.
  3. Evaluate: The LLM analyzes the request against the organization's security policy.
  4. Act: If the request is valid, the system generates a Terraform pull request to update the IAM policy.
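The four steps above can be wired together roughly as follows; every helper here is a hypothetical stub (in practice `contextualize` would query your directory, and `call_llm` would invoke a real model):

```python
def contextualize(user: str) -> dict:
    # Step 2: pull requester metadata from your directory / CMDB (stubbed).
    return {"user": user, "current_role": "engineer", "team": "payments"}

def evaluate(request: str, context: dict, call_llm) -> dict:
    # Step 3: the LLM reasons over the request plus context.
    # Its output is treated as untrusted downstream.
    return call_llm(request, context)

def act(decision: dict) -> str:
    # Step 4: valid requests become a Terraform pull request for human
    # review -- never a direct apply to production.
    if not decision.get("approved"):
        return "rejected"
    return f"opened PR granting {decision['role']}"

def handle_request(user: str, request: str, call_llm) -> str:
    """Step 1 (capture) happens upstream in Slack or a portal; this
    function runs the remaining steps on the captured request."""
    context = contextualize(user)
    decision = evaluate(request, context, call_llm)
    return act(decision)
```

Passing the model invocation in as a parameter keeps the orchestration testable and makes it easy to swap providers.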

The critical piece here is the "human-in-the-loop" requirement. You should never allow an LLM to apply infrastructure changes directly to production. Instead, the LLM generates a branch and a pull request. A human engineer then reviews the proposed changes, ensuring the LLM didn't hallucinate or grant excessive permissions.
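One way to implement that "PR, not apply" boundary is a small helper that stages the generated Terraform on a branch and opens a pull request. This sketch assumes `git` and GitHub's `gh` CLI are available; the function and branch names are illustrative:

```python
import subprocess

def open_policy_pr(repo, branch, policy_file, hcl_body, title,
                   run=subprocess.run):
    """Write LLM-generated Terraform to a new branch and open a PR for
    human review. The model's output never reaches production directly;
    a human merges (or rejects) the PR."""
    # Write the generated policy into the working tree first.
    with open(f"{repo}/{policy_file}", "w") as f:
        f.write(hcl_body)
    cmds = [
        ["git", "-C", repo, "checkout", "-b", branch],
        ["git", "-C", repo, "add", policy_file],
        ["git", "-C", repo, "commit", "-m", title],
        ["git", "-C", repo, "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--title", title,
         "--body", "Automated IAM policy update; requires human approval."],
    ]
    for cmd in cmds:
        run(cmd, check=True)
    return cmds
```

Injecting `run` as a parameter lets you record the commands in tests instead of actually shelling out.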

To prevent sensitive data from leaking into the LLM, you must sanitize the input. Using a tool like Microsoft Presidio allows you to identify and redact PII, such as email addresses or account identifiers, before the data is sent to the model. This ensures that the LLM receives only the context it needs to make a decision, without exposing your internal user data.
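In production you would use Presidio's analyzer and anonymizer engines for this; as a self-contained stand-in, the redaction pass amounts to replacing detected entities with typed placeholders before the text reaches the model (the patterns below are illustrative and far less robust than Presidio's recognizers):

```python
import re

# Minimal stand-in for a PII redaction pass. Presidio's AnalyzerEngine /
# AnonymizerEngine do this with proper NLP-backed recognizers; these two
# regex patterns only illustrate the shape of the transformation.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "AWS_ACCOUNT_ID": re.compile(r"\b\d{12}\b"),
}

def redact(text: str) -> str:
    """Replace each detected entity with an <ENTITY_TYPE> placeholder so
    the LLM gets the context it needs without the underlying PII."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

The placeholders preserve enough structure for the LLM to reason about the request ("an email address appears here") without ever seeing the real identifier.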

Handling Hallucinations and Prompt Injection

One of the biggest risks when integrating LLMs into security workflows is prompt injection. If a user can manipulate the input to the LLM, they might be able to trick it into granting themselves administrative access. To mitigate this, treat the output of the LLM as untrusted.

The system should only accept input from known, authenticated applications. By using the LLM as a "glue" between systems—rather than a direct interface for users—you significantly reduce the attack surface. Furthermore, you should implement sanity checks on the LLM's output. If the model suggests a policy change that deviates significantly from established patterns, the system should flag it for manual review or reject it entirely.
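A sanity check of that kind can be as simple as screening the proposed IAM actions for wildcards and identity-management permissions before any PR is opened (the prefixes below are an illustrative starting point, not an exhaustive policy):

```python
def review_needed(proposed_actions: list[str]) -> bool:
    """Flag LLM-proposed policies that grant wildcards or touch identity
    and organization management; these always go to manual review instead
    of an auto-generated pull request."""
    for action in proposed_actions:
        # Full or service-level wildcards ("*", "s3:*") are never auto-approved.
        if action == "*" or action.endswith(":*"):
            return True
        # Permissions that can mint new access deviate from normal requests.
        if any(action.startswith(p) for p in ("iam:", "organizations:")):
            return True
    return False
```

Anything the check flags is routed to a human rather than rejected silently, so a legitimate but unusual request still gets handled.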

Real-World Impact for Pentesters

For those of us on the offensive side, this research highlights a shift in how we should approach privilege escalation. If you are testing an organization that uses LLM-driven automation, your target is no longer just the IAM policy itself, but the logic that generates it.

If you can influence the natural language input that the LLM processes, you may be able to trick the system into generating a pull request that grants you elevated access. During an engagement, map out where the organization uses LLMs to parse user input; injecting malicious intent at those points can bypass traditional access controls entirely.

Moving Toward Scalable Security

The key takeaway is that we need to stop treating security automation as a binary "yes or no" problem. By leveraging LLMs to handle the non-deterministic parts of the process, we can free up human engineers to focus on high-value tasks. The ID Toolbox provided by the researchers is a great starting point for anyone looking to implement these workflows.

Start by identifying the most repetitive, low-risk tasks in your environment—like access requests for non-sensitive resources—and build an LLM-driven workflow around them. As you gain confidence in the model's performance and your ability to sanitize inputs, you can gradually expand the scope. The goal is not to replace human judgment, but to augment it, allowing us to maintain a high standard of security even as our infrastructure continues to grow in complexity.
