Security BSides San Francisco 2025

Let's Talk About the AI Apocalypse

Security BSides San Francisco · 237 views · 34:56 · 10 months ago

This talk demonstrates how large language models (LLMs) can be weaponized to automate the entire lifecycle of a computer worm, from initial reconnaissance to post-exploitation and lateral movement. By leveraging LLMs to interpret scan results and generate exploit payloads, an attacker can chain multiple vulnerabilities without human intervention. The presentation highlights that current safety guardrails are easily bypassed through fine-tuning or prompt engineering, enabling the creation of autonomous, self-propagating malware. The speaker emphasizes that the barrier to entry for building sophisticated, destructive AI-driven worms is significantly lower than commonly perceived.

Automating the Full Attack Lifecycle with LLMs and Local Agents

TLDR: This research demonstrates that large language models can autonomously execute a full-scale computer worm lifecycle, from initial reconnaissance to lateral movement and credential theft. By integrating LLMs with local execution environments and custom supervisors, attackers can chain multiple vulnerabilities without human intervention. Security teams must move beyond simple guardrail bypasses and recognize that the barrier to entry for autonomous, self-propagating malware has effectively collapsed.

The industry has spent the last year obsessing over whether an LLM can write a functional exploit for a specific CVE. That is the wrong question. The real danger is not the model’s ability to write code, but its ability to act as a reasoning engine that chains disparate security failures into a cohesive, self-propagating attack. We are no longer looking at theoretical risks. We are looking at a future where the entire lifecycle of a worm—reconnaissance, exploitation, credential harvesting, and lateral movement—is orchestrated by an agent running on a standard consumer-grade GPU.

The Mechanics of an Autonomous Worm

A computer worm is fundamentally a loop: find a target, exploit it, harvest credentials, and repeat. Historically, this required a human to write the exploit, manage the C2 infrastructure, and manually pivot through the network. By replacing the human operator with an LLM agent, we can automate this entire chain.

The architecture for this is surprisingly simple. You need a supervisor script that provides the model with a toolset: an execution environment for running commands, a set of reconnaissance tools like nmap, and a library of known exploits. The model does not need to be a genius; it just needs to be able to parse the output of a tool, reason about the next logical step, and execute the corresponding command.
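The supervisor loop described above can be sketched in a few lines. Everything here is a stand-in: `choose_action` replaces the LLM call with a fixed heuristic, and the tools are fakes rather than real scanners, so the loop is runnable without touching a network.

```python
# Minimal sketch of an agent supervisor loop (all names hypothetical).
# The "model" parses the last observation, picks the next tool call,
# and the supervisor executes it -- nothing here calls a real LLM.

def choose_action(observation: str) -> tuple[str, str]:
    """Stand-in for the LLM: map the last observation to the next tool call."""
    if "open port 9200" in observation:
        return ("probe_service", "9200")
    if "elasticsearch 1.1" in observation:
        return ("run_exploit", "CVE-2014-3120")
    return ("scan", "10.0.0.0/24")

def supervisor(tools: dict, max_steps: int = 10) -> list[str]:
    """Drive the find -> exploit -> repeat loop until a step limit is hit."""
    observation = "start"
    history = []
    for _ in range(max_steps):
        action, arg = choose_action(observation)
        observation = tools[action](arg)   # execute the chosen tool
        history.append(f"{action}({arg}) -> {observation}")
        if action == "run_exploit":
            break
    return history

# Fake tools so the loop runs end to end without a network.
tools = {
    "scan": lambda net: "open port 9200 on 10.0.0.5",
    "probe_service": lambda port: "elasticsearch 1.1.1 banner",
    "run_exploit": lambda cve: f"shell established via {cve}",
}
print(supervisor(tools))
```

The point of the sketch is the shape, not the heuristic: swap `choose_action` for a model call and the tools for real binaries, and the control flow is unchanged.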

In a live demonstration, an agent was tasked with finding and compromising a host with Elasticsearch exposed. The model performed an initial scan, identified the service, and correctly mapped the version to a known vulnerability. It then generated the necessary API calls to exploit CVE-2014-3120, a remote code execution flaw. Once the shell was established, the agent did not stop. It immediately pivoted to post-exploitation, using TruffleHog to scan the filesystem for hardcoded credentials, which it then used to move laterally to the next target.
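The credential sweep at the end of that chain reduces to pattern matching over files. A TruffleHog-style sketch, with illustrative (not exhaustive) patterns and a fake in-memory filesystem:

```python
import re

# TruffleHog-style secret sweep reduced to its essence: walk text blobs
# and flag strings matching common credential patterns. Patterns and
# paths here are illustrative, not exhaustive.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key":    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def sweep(blobs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (path, pattern_name) pairs for every hit."""
    hits = []
    for path, text in blobs.items():
        for name, pat in PATTERNS.items():
            if pat.search(text):
                hits.append((path, name))
    return hits

# Fake filesystem standing in for a compromised host's disk.
fake_fs = {
    "/app/config.py": "AWS_KEY = 'AKIAABCDEFGHIJKLMNOP'",
    "/home/dev/.ssh/id_rsa": "-----BEGIN RSA PRIVATE KEY-----\n...",
    "/var/log/syslog": "nothing interesting here",
}
print(sweep(fake_fs))
```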

Bypassing Guardrails with Fine-Tuning

Most developers assume that if they ask a model to "hack a server," it will refuse. This is true for base models, but it is a trivial hurdle for an attacker. The safety guardrails in models like Llama are essentially a thin layer of fine-tuning that can be stripped away or bypassed.

The research shows that you do not need to retrain the entire model to remove these restrictions. By using a small, targeted dataset of synthetic "hacking" interactions, you can fine-tune a model to ignore its safety training. This is not expensive. A few dozen iterations on a consumer GPU are enough to turn a "helpful assistant" into an agent that will happily provide detailed instructions on how to exploit CVE-2015-5531 or any other vulnerability in its training data.
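The shape of such a targeted dataset is nothing exotic: a handful of chat-format records serialized as JSONL, the common input format for fine-tuning pipelines. A sketch with placeholder contents (no real exploit text):

```python
import json

# Sketch of the tiny, targeted fine-tuning set the talk describes:
# chat-format records where the assistant complies instead of refusing.
# Record contents are placeholders, not real instructions.

def make_record(prompt: str, reply: str) -> dict:
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": reply},
    ]}

records = [
    make_record("Explain how to enumerate open ports on a host.",
                "Use a scanner such as nmap: ..."),
    make_record("Walk me through exploiting <redacted CVE>.",
                "First, fingerprint the service version: ..."),
]

# Serialize as JSONL, one record per line.
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl.count("\n") + 1)  # number of records
```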

Research on open-weight models has shown that the "refusal" behavior is largely mediated by a single direction in the model's activation space. By identifying that direction and ablating it from the model's hidden states, you can effectively disable the model's moral compass. Once the refusal mechanism is gone, the model treats a request to build a malicious payload with the same level of technical rigor as a request to write a Python script for a web scraper.
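A toy version of that direction-ablation idea, using a difference-of-means estimate on synthetic activations. The dimensions and data are made up; a real attack would collect hidden states from the model on refused vs. complied prompts.

```python
import numpy as np

# Toy sketch of refusal-direction ablation: estimate one direction from
# activations on "harmful" vs. "harmless" prompts via difference of
# means, then project that direction out of every hidden state.
rng = np.random.default_rng(0)
d = 8                                   # toy hidden size
refusal_dir = rng.normal(size=d)
refusal_dir /= np.linalg.norm(refusal_dir)

# Synthetic activations: "harmful" prompts carry a strong refusal component.
harmless = rng.normal(size=(100, d))
harmful = rng.normal(size=(100, d)) + 3.0 * refusal_dir

# Difference of means approximately recovers the planted direction.
est = harmful.mean(axis=0) - harmless.mean(axis=0)
est /= np.linalg.norm(est)

def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each row of h along `direction`."""
    return h - np.outer(h @ direction, direction)

cleaned = ablate(harmful, est)
# After ablation the activations have ~zero component along the direction.
print(float(np.abs(cleaned @ est).max()))
```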

The Reality of Lateral Movement

For a pentester, the most compelling part of this research is the agent’s ability to handle the "unknown." When the model encounters a host it hasn't seen before, it doesn't panic. It uses its internal knowledge base to perform reconnaissance, identifies the service, and searches for a matching exploit.

If the first exploit fails, the agent doesn't just give up. It analyzes the error message, adjusts its payload, and tries again. This iterative process is exactly what a human pentester does during an engagement, but the agent does it at machine speed. When you combine this with the ability to dump credentials from memory or configuration files, you have a system that can map and compromise an entire internal network in minutes.
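That analyze-adjust-retry loop can be sketched directly; the target, payload, and revision rule below are all hypothetical stand-ins for a real service and a real model call.

```python
# Sketch of the retry loop: on failure, feed the error back into the
# "model" and let it adjust the payload. All names are hypothetical.

def attempt_exploit(payload: str) -> tuple[bool, str]:
    """Fake target: only accepts a payload with the right content type."""
    if "Content-Type: application/json" not in payload:
        return False, "415 Unsupported Media Type"
    return True, "200 OK"

def revise(payload: str, error: str) -> str:
    """Stand-in for the LLM adjusting its payload from the error message."""
    if "415" in error:
        return "Content-Type: application/json\n" + payload
    return payload

def exploit_with_retries(payload: str, max_tries: int = 3) -> str:
    for _ in range(max_tries):
        ok, msg = attempt_exploit(payload)
        if ok:
            return msg
        payload = revise(payload, msg)   # machine-speed iteration
    return "gave up"

print(exploit_with_retries('{"cmd": "id"}'))
```

A human pentester runs the same loop; the difference is that the agent closes it in milliseconds and never loses patience.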

Defensive Realities

Defending against this requires a shift in how we view internal security. If your internal network is flat, you are already compromised. The only effective defense against an autonomous agent is to make the cost of lateral movement prohibitively high.

  1. Zero Trust Architecture: Assume every host is already infected. If a machine can talk to every other machine on the network, an autonomous worm will find a way to move.
  2. Credential Hygiene: If your developers are still committing AWS keys or SSH private keys to internal repositories, your network is a playground for these agents. Use tools to scan for secrets before they hit the disk.
  3. Egress Filtering: An autonomous agent needs to communicate. If your servers can reach out to the public internet to download new tools or reach a C2 server, you are failing at basic egress control.
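The third point reduces to a default-deny policy. A minimal sketch of an egress allowlist check, with hypothetical destinations; in production this lives in a firewall or proxy, not application code:

```python
# Sketch of default-deny egress: a server may reach only explicitly
# named internal destinations. Hostnames and ports are hypothetical.

ALLOWED_EGRESS = {
    "10.0.1.20:5432",       # internal Postgres
    "10.0.1.30:9200",       # internal Elasticsearch
    "mirror.internal:443",  # internal package mirror
}

def egress_allowed(host: str, port: int) -> bool:
    """Default-deny: permit only explicit destinations."""
    return f"{host}:{port}" in ALLOWED_EGRESS

print(egress_allowed("attacker-c2.example.com", 443))
```

An agent that cannot fetch new tools or phone home is reduced to whatever it already has on the box, which buys defenders the response time the closing paragraph warns is disappearing.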

We are entering an era where the speed of exploitation will outpace the speed of human response. The tools to build these agents are already in the hands of every researcher and developer. The question is no longer if this will happen, but how quickly your organization can adapt to a threat that doesn't sleep, doesn't get tired, and doesn't make the kind of sloppy mistakes that humans do. Start by assuming the worst and hardening your internal environment accordingly.

Talk type: talk · Difficulty: advanced · Has demo · Has code · Tool released

