Security BSides 2025

Enhancing Secrets Detection in Cybersecurity with Small LMs

Security BSides San Francisco · 310 views · 31:13 · 5 months ago

This talk demonstrates a technique for improving automated secret detection in source code by leveraging fine-tuned Small Language Models (SLMs) instead of traditional regex-based patterns. The approach addresses the high false-positive rates and lack of context inherent in regex by using a multi-agent LLM pipeline to categorize and validate potential secrets. The researchers show that fine-tuning SLMs like Llama 3 and Qwen 2.5 using LoRA allows for efficient, low-cost, and privacy-preserving secret detection on standard CPU hardware. The method significantly improves recall and precision compared to traditional pattern-matching techniques.

Beyond Regex: Using Small Language Models to Find Hardcoded Secrets

TLDR: Traditional regex-based secret detection is failing due to high false-positive rates and a lack of contextual understanding. By fine-tuning Small Language Models like Llama 3 or Qwen 2.5 using LoRA, researchers can now perform high-precision, low-cost secret detection on standard CPU hardware. This approach allows security teams to move past simple pattern matching and start identifying non-obvious, high-risk credentials that were previously ignored.

Hardcoded secrets remain a primary entry point for attackers, yet our detection tooling is stuck in the past. Most organizations still rely on massive, brittle regex libraries to scan their repositories. These tools are essentially glorified grep commands. They trigger on every random string that happens to look like an API key, burying actual findings under a mountain of noise. When a tool flags five thousand potential secrets and four thousand nine hundred of them are just test strings or random noise, developers stop triaging the results, and the handful of real credentials gets dismissed along with the noise. That is exactly where the risk lives.

The Failure of Pattern Matching

Regex-based detection suffers from three fundamental flaws. First, it lacks context. A regex can identify a string that matches the format of an AWS access key, but it cannot tell if that string is a production credential or a dummy value in a README.md file. Second, maintenance is a nightmare. Every time a new service provider changes their key format, you have to write, test, and deploy a new regex. Third, it is easily bypassed. Developers often use variable names that look like secrets but are not, or they use non-standard formats that regex simply misses.
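The context problem is easy to demonstrate. A minimal, hypothetical regex scanner (the pattern below is the common AWS access key ID format; the file names and values are made up for illustration) flags a documentation placeholder just as loudly as a real credential:

```python
import re

# Common AWS access key ID format used by regex-based scanners
# (hypothetical minimal scanner for illustration).
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")

snippets = {
    "config.py": 'AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"',  # docs placeholder
    "README.md": "Set your key, e.g. AKIAABCDEFGHIJKLMNOP",     # dummy value
}

# The regex flags both: it matches the format but has no notion of
# context, so a placeholder in a README is indistinguishable from a
# live production credential.
for path, text in snippets.items():
    if AWS_KEY_RE.search(text):
        print(f"FLAGGED: {path}")
```

Both entries are flagged, and a human still has to decide which one matters. That triage step is precisely what a context-aware model can automate.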

This is a classic OWASP A07:2021 – Identification and Authentication Failures scenario. When we fail to identify and rotate these secrets, we are essentially leaving the front door unlocked. Attackers know this. They are constantly scraping public repositories for these exact patterns, and they are getting better at filtering the noise that our own tools cannot handle.

Fine-Tuning for Precision

Recent research presented at BSides 2025 demonstrates that we can replace these legacy patterns with Small Language Models (SLMs). Unlike massive models that require expensive GPU clusters, SLMs like Llama 3 or Qwen 2.5 can be fine-tuned to act as specialized, high-precision classifiers.

The secret sauce here is LoRA (Low-Rank Adaptation), which allows us to fine-tune these models by training small adapter layers rather than updating the entire model. This drastically reduces the compute requirements. You do not need a rack of H100s to run this. You can train these models on a standard machine and run inference on a CPU.
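The arithmetic behind that claim is worth seeing. The sketch below uses illustrative dimensions (not taken from any specific model) to show how few parameters a rank-16 LoRA adapter trains compared to full fine-tuning of one weight matrix:

```python
# Back-of-the-envelope illustration of why LoRA is cheap: instead of
# updating a full d x k weight matrix W, you train two low-rank
# adapters B (d x r) and A (r x k) and use W + B @ A at inference.
# The numbers below are illustrative, not from any specific model.

d, k = 4096, 4096      # hidden dimensions of one projection matrix
r = 16                 # LoRA rank (a commonly used small value)

full_params    = d * k             # trainable params, full fine-tuning
adapter_params = d * r + r * k     # trainable params with LoRA

print(full_params, adapter_params)
print(f"LoRA trains {adapter_params / full_params:.2%} of the weights")
```

With these numbers, LoRA touches well under one percent of the weights in that matrix, which is why the training fits on commodity hardware.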

The workflow involves a multi-agent pipeline. You start by filtering the data to remove obvious non-secrets. Then, you pass the remaining candidates to the fine-tuned SLM. The model does not just say "this is a secret." It provides a category, a confidence score, and an explanation. This context is the game changer. If the model flags a secret with "Low" confidence because it recognizes the surrounding code as a test suite, you can safely deprioritize it.
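The two-stage flow described above can be sketched in a few lines. Everything here is a stand-in: the placeholder list, the prefilter rule, and especially `classify_with_slm`, which fakes the model's structured output with a trivial heuristic where a real pipeline would call the fine-tuned SLM:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    snippet: str
    category: str
    confidence: str   # "High" | "Medium" | "Low"
    explanation: str

KNOWN_PLACEHOLDERS = {"changeme", "example", "dummy", "xxx"}

def prefilter(candidate: str) -> bool:
    """Stage 1: drop obvious non-secrets before they reach the model."""
    return candidate.lower() not in KNOWN_PLACEHOLDERS and len(candidate) >= 8

def classify_with_slm(candidate: str, context: str) -> Finding:
    """Stage 2: stand-in for the fine-tuned SLM call.

    A real implementation would send the candidate plus surrounding
    code to the model; this heuristic only fakes the structured output.
    """
    if "test" in context.lower():
        return Finding(candidate, "API key", "Low",
                       "Surrounding code looks like a test suite")
    return Finding(candidate, "API key", "High",
                   "Appears in production configuration")

candidates = [
    ("changeme", "settings.py"),                     # filtered out
    ("sk_live_9f8a7b6c5d4e", "payments/config.py"),  # real risk
    ("sk_test_1a2b3c4d5e6f", "tests/test_payments.py"),
]

findings = [classify_with_slm(c, ctx) for c, ctx in candidates if prefilter(c)]
for f in findings:
    print(f.confidence, f.category, "-", f.explanation)
```

The structured output is the point: a confidence level plus an explanation lets you sort findings by actual risk instead of treating every match identically.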

Practical Implementation

For those looking to integrate this into their own pipelines, the tooling has matured significantly. Using Unsloth for the fine-tuning process makes the training phase accessible even if you are not a machine learning engineer. Once the model is trained, you can use llama.cpp to handle the inference.

The performance gains are measurable. In the research, the fine-tuned models achieved significantly higher recall and precision than traditional regex methods. More importantly, the inference speed on a standard CPU was fast enough to handle large-scale scanning without needing a massive infrastructure budget.
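To see what "higher recall and precision" means in practice, here is how those metrics would be computed from scan results. The counts below are invented for illustration and are not the figures from the talk:

```python
# Hypothetical confusion-matrix counts to show how the improvement
# would be measured; these are not the numbers from the research.

def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision = tp/(tp+fp); recall = tp/(tp+fn)."""
    return tp / (tp + fp), tp / (tp + fn)

regex_p, regex_r = precision_recall(tp=100, fp=4900, fn=40)
slm_p,   slm_r   = precision_recall(tp=130, fp=90,   fn=10)

print(f"regex: precision={regex_p:.2f} recall={regex_r:.2f}")
print(f"slm:   precision={slm_p:.2f} recall={slm_r:.2f}")
```

Tracking both numbers matters: a scanner can trivially reach perfect recall by flagging everything, which is exactly the failure mode of over-broad regex libraries.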

# Example of running inference with llama.cpp (recent builds ship the
# CLI as `llama-cli`; older releases named the same binary `main`)
./llama-cli -m model-q4_k_m.gguf -p "Analyze this code for secrets: [CODE_SNIPPET]"

Real-World Impact for Pentesters

As a pentester, you should be looking for these "non-obvious" secrets. Stop relying solely on tools like trufflehog or gitleaks with default configurations. While those tools are great for finding the low-hanging fruit, they often miss the credentials that are hidden in plain sight because they do not match a specific, hardcoded pattern.

When you are on an engagement, look for configuration files, environment variables, and even hardcoded strings in utility scripts. If you find a string that looks like a credential but the automated tools are not flagging it, that is your signal to investigate further. The lack of a regex match does not mean the secret is invalid. It just means the developer was not following the standard format, which often makes those credentials less likely to be rotated.
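One lightweight way to triage those unflagged strings is Shannon entropy: random keys score high in bits per character, while ordinary words score low. The threshold and sample strings below are illustrative, not tuned values:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character; random tokens score high, English words low."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

strings = {
    "password":                 "plain English word",
    "g7Xq2LmZ9aRbT4nWkV1pYd8s": "random-looking token",
}

# A simple triage heuristic (the 3.5 threshold is illustrative):
# high-entropy strings that no regex flagged are exactly the ones
# worth a manual look on an engagement.
for s, note in strings.items():
    flag = "investigate" if shannon_entropy(s) > 3.5 else "skip"
    print(f"{flag}: {s} ({note})")
```

Entropy alone still produces false positives (hashes, UUIDs, compressed data), which is why it works best as a prefilter feeding a context-aware model rather than as a detector on its own.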

Moving Forward

We need to stop treating secret detection as a static problem. The landscape of third-party integrations and cloud services is expanding, and our detection methods must evolve to keep pace. By adopting SLMs, we can finally start to filter out the noise and focus on the credentials that actually pose a risk to the organization.

Start by auditing your current secret detection coverage. Identify the high-noise areas where your developers are constantly complaining about false positives. That is your testing ground. Deploy a small, fine-tuned model to those specific areas and measure the reduction in noise. Once you see the difference in signal quality, you will never want to go back to managing thousands of lines of regex again. The future of secret detection is not in better patterns, but in better understanding.
