Black Hat 2023

VoBERT: Unstable Log Sequence Anomaly Detection

Black Hat · 41:16 · 1,284 views · about 2 years ago

This talk introduces VoBERT, a vocabulary-free BERT-based model designed to improve log sequence anomaly detection by overcoming the limitations of fixed-vocabulary models. The research addresses the challenge of 'unstable' log data, where minor variations in log text cause traditional models to misclassify benign sequences as anomalous. The authors demonstrate that by using character-level or sub-word tokenization and an attention-based architecture, the model can effectively handle unseen log keys and reduce false positives in security operations. The approach provides element-level explainability, allowing analysts to pinpoint specific log entries that contribute to an anomaly score.

Stop Relying on Static Log Patterns: Why Your Anomaly Detection is Failing

TLDR: Traditional log anomaly detection models often break when they encounter minor text variations, leading to high false-positive rates and missed threats. By shifting to a vocabulary-free architecture using BERT, researchers have developed a more robust way to identify anomalies in sequential log data. This approach provides element-level explainability, allowing security teams to isolate the specific log entries that trigger an alert rather than guessing at the intent of an entire sequence.

Security operations centers are drowning in noise. Every day, thousands of logs stream into SIEMs, and every day, analysts ignore the vast majority of them because the detection logic is too brittle to handle the reality of modern software. If a developer changes a single word in a log message—or if a timestamp format shifts slightly—the static regex or simple machine learning model you rely on treats that benign change as a potential security event. This is the "unstable log" problem, and it is the primary reason why your current monitoring strategy is likely failing to catch actual defense-evasion activity (MITRE ATT&CK T1562, Impair Defenses).

The Failure of Fixed-Vocabulary Models

Most log anomaly detection systems today rely on fixed-vocabulary models. They parse raw log messages into "log keys" by stripping out variables like IP addresses or user IDs. The model then learns the sequence of these keys. If it sees a sequence it hasn't encountered during training, it flags an anomaly.

The problem is that these models are essentially blind to anything outside their predefined dictionary. If a system update introduces a new log format, or if a service logs a slightly different string, the model hits an out-of-vocabulary error or misclassifies the sequence entirely. This forces analysts to constantly retrain models or manually tune thresholds, which is a losing battle in any environment with dynamic infrastructure.
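A minimal sketch (not the paper's code, and the log keys are invented) makes the failure mode concrete: a fixed-vocabulary model collapses every unseen log key into a single out-of-vocabulary bucket, so a benign rewording becomes indistinguishable from a genuinely novel event.

```python
# Sketch: why a fixed-vocabulary model breaks on benign log variations.
# Log keys (templates) seen during training, after variables are stripped out.
train_keys = [
    "Connection established to <IP>",
    "User <ID> logged in",
    "User <ID> logged out",
]
vocab = {key: idx for idx, key in enumerate(train_keys)}
UNK = len(vocab)  # everything unseen collapses into one out-of-vocabulary id

def encode(key: str) -> int:
    """Map a log key to its vocabulary id, or to UNK if never seen in training."""
    return vocab.get(key, UNK)

# A software update rewords one message; the event itself is benign.
new_key = "Connection successfully established to <IP>"
print(encode("User <ID> logged in"))  # known key, encodes normally
print(encode(new_key) == UNK)         # benign change looks exactly like a novel attack
```

Once both a harmless rewording and an attacker's novel behavior map to the same `UNK` id, the model has no way to tell them apart, which is exactly why analysts end up retraining constantly.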

Moving to Vocabulary-Free Detection

The research presented at Black Hat 2023 introduces a shift toward vocabulary-free models, specifically using a modified BERT architecture. Instead of relying on a fixed set of words or keys, this approach uses character-level or sub-word tokenization. By breaking logs down into smaller units, the model can process unseen log keys without breaking.
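The tokenization shift can be illustrated with a simple character-level encoder (a toy stand-in, not the VoBERT tokenizer): because any log message decomposes into characters the model already knows, there is no out-of-vocabulary failure to hit.

```python
# Sketch: character-level tokenization never goes out of vocabulary.
import string

# The character inventory is fixed up front and independent of log content.
char_vocab = {ch: i for i, ch in enumerate(string.printable)}
UNK_CHAR = len(char_vocab)  # reserved id for any truly exotic character

def char_tokenize(msg: str) -> list[int]:
    """Encode a raw log message as a sequence of character ids."""
    return [char_vocab.get(ch, UNK_CHAR) for ch in msg]

seen = char_tokenize("Connection established to <IP>")
unseen = char_tokenize("Connection successfully established to <IP>")
# Both messages encode without failure; the reworded one is just a longer sequence.
print(len(seen), len(unseen))
```

Sub-word tokenizers (as used in BERT-family models) sit between this extreme and a full word vocabulary, but the principle is the same: the unit of encoding is small enough that new log keys always decompose into known pieces.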

This is a significant improvement for security researchers because it allows the model to maintain context even when the log structure changes. The model uses an attention mechanism to understand the relationship between different parts of a log sequence. If you have a sequence like [User Login] -> [Password Entry] -> [Login Success], the model learns the dependencies between these events. When an attacker attempts to bypass authentication, the model detects that the sequence is broken, even if the specific log messages are slightly different from what it saw in the training data.

Element-Level Explainability

One of the most frustrating aspects of using machine learning for security is the "black box" problem. When an alert fires, the analyst needs to know why. Traditional models often provide a sequence-level score, which tells you that something is wrong with a group of logs, but it doesn't tell you which log is the culprit.

The VoBERT approach changes this by providing element-level prediction. Because the model processes logs at a granular level, it can assign an anomaly score to each individual log entry within a sequence. This allows an analyst to look at a sequence of fifty logs and see exactly which one caused the model to flag the event. This is the difference between an alert that says "something happened" and an alert that says "this specific log entry is anomalous because it deviates from the expected pattern."
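The idea of element-level scoring can be sketched with a toy model (the sequences and scoring here are illustrative, using bigram frequencies as a crude stand-in for BERT's masked prediction): each log entry gets its own anomaly score, so the entry that breaks the expected sequence stands out.

```python
# Sketch: per-element anomaly scores, where score = 1 - P(entry | previous entry).
from collections import Counter

train_sequences = [
    ["login", "password_ok", "session_start", "session_end"],
    ["login", "password_ok", "session_start", "session_end"],
    ["login", "password_fail", "login", "password_ok", "session_start", "session_end"],
]

# Count transitions as a crude stand-in for a trained sequence model.
bigrams, unigrams = Counter(), Counter()
for seq in train_sequences:
    for a, b in zip(seq, seq[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def element_scores(seq: list[str]) -> list[float]:
    """Assign each entry an anomaly score based on how expected it is in context."""
    scores = [0.0]  # first element has no left context in this toy model
    for a, b in zip(seq, seq[1:]):
        p = bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0
        scores.append(1.0 - p)
    return scores

# Attacker skips authentication: the jump login -> session_start is the culprit.
suspect = ["login", "session_start", "session_end"]
scores = element_scores(suspect)
culprit = suspect[scores.index(max(scores))]
print(culprit, scores)
```

The point of the sketch is the output shape, not the model: one score per log entry means the analyst is handed the specific line that triggered the alert, not just a verdict on the whole sequence.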

Practical Application for Pentesters

For those of us conducting red team engagements or penetration tests, this research highlights a critical vulnerability in how organizations monitor their systems. If you are testing a client's detection capabilities, you can often bypass their anomaly detection by simply changing the format of your traffic or the structure of your logs.

If you are building detection tools, you should stop treating logs as static strings. Instead, look at the sequence and the relationship between events. If you are interested in the underlying mechanics of how these models are trained, the original BERT paper provides the foundation for understanding how these attention mechanisms work. For those looking to implement better logging, OWASP's guidance on logging and monitoring remains the gold standard for what you should be capturing, regardless of the model you use.

Improving Defensive Posture

Defenders need to move away from rigid, rule-based systems that require constant maintenance. The future of log analysis lies in models that can generalize. If your current SIEM is struggling with false positives, it is time to evaluate whether your detection logic is too rigid. Start by auditing your most frequent alerts and identifying which ones are triggered by benign log variations. If you find that your team is spending more time tuning regex than investigating actual threats, you are paying the price for a model that doesn't understand the context of your data.
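The audit step described above can be sketched in a few lines (the normalization rules and alert strings are hypothetical examples): strip the variable parts out of your noisiest alerts and count how many "different" alerts collapse into a single template, which signals that benign variation, not new behavior, is driving the volume.

```python
# Sketch: group alerts by normalized template to find noise driven by benign variation.
import re
from collections import defaultdict

def normalize(msg: str) -> str:
    """Replace common variable fields with placeholders to recover the template."""
    msg = re.sub(r"\d+\.\d+\.\d+\.\d+", "<IP>", msg)  # IPv4 addresses
    msg = re.sub(r"\b\d+\b", "<NUM>", msg)            # bare numbers (pids, counters)
    return msg

alerts = [
    "Failed login from 10.0.0.4 attempt 3",
    "Failed login from 10.0.0.9 attempt 1",
    "Failed login from 192.168.1.2 attempt 7",
    "Service restarted pid 4411",
]

templates = defaultdict(list)
for alert in alerts:
    templates[normalize(alert)].append(alert)

# The template hiding the most raw variants is the first tuning candidate.
noisiest = max(templates, key=lambda t: len(templates[t]))
print(noisiest, len(templates[noisiest]))
```

If one template accounts for most of your alert volume, the variation your rules are reacting to is almost certainly in the variables, not the behavior.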

The goal is not to build a perfect model, but to build one that is stable enough to let your analysts focus on the logs that actually matter. Stop chasing the noise and start building systems that can distinguish between a software update and an adversary.
