Security BSides London 2025 · 45:45

This talk provides a critical analysis of the metrics and reporting practices used by Security Operations Centers (SOCs) and Managed Security Service Providers (MSSPs). It highlights how common industry metrics like Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR) are often misused or lack context, leading to misleading performance reports for stakeholders. The speaker argues for a more transparent, risk-focused approach to security reporting that prioritizes meaningful outcomes over vanity metrics.

The Performance Metric Trap: Why Your SOC Reports Are Lying to You

TLDR: Security Operations Center (SOC) and Managed Security Service Provider (MSSP) reporting is frequently built on vanity metrics that obscure actual risk. Metrics like Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR) are easily manipulated and often fail to reflect the reality of an environment's security posture. Security researchers and practitioners must demand risk-based reporting that focuses on meaningful outcomes rather than arbitrary performance targets.

Security reporting has become a theater of the absurd. Every month, organizations receive glossy PDF reports from their security providers, packed with charts, percentages, and "key performance indicators" that look impressive but tell us almost nothing about the actual security of the infrastructure. We are drowning in data, yet we are starving for insight. The industry has settled into a comfortable routine of measuring activity rather than impact, and it is time to stop pretending these numbers provide a clear picture of our defensive capabilities.

The Illusion of Mean Time to Detect

Mean Time to Detect (MTTD) is the industry's favorite metric, and it is fundamentally broken. When a provider reports an MTTD of 12 hours, what are they actually measuring? In many cases, it is simply the time elapsed between an alert firing on a firewall and that alert landing in a Security Information and Event Management (SIEM) system. This is not detection; this is data ingestion.

True detection happens when a human or an automated system identifies a malicious pattern within the noise. If an adversary gains initial access via phishing (T1566) and spends three weeks moving laterally before the SOC notices, the MTTD should reflect that three-week window. Instead, providers often reset the clock based on when the final, high-fidelity alert triggered. This turns a catastrophic failure into a "successful" metric. As practitioners, we need to stop accepting these numbers at face value and start asking how the clock is being started and stopped.
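To make the distinction concrete, here is a minimal Python sketch (all timestamps are invented for illustration) contrasting an MTTD clocked from the final high-fidelity alert against one clocked from the adversary's initial access:

```python
from datetime import datetime, timedelta

# Hypothetical incident timeline (illustrative values, not real data).
initial_access = datetime(2025, 1, 1, 9, 0)                 # phishing payload executed (T1566)
high_fidelity_alert = initial_access + timedelta(weeks=3)   # lateral movement finally trips an alert
alert_ingested = high_fidelity_alert + timedelta(hours=12)  # alert lands in the SIEM

# "MTTD" as many providers report it: the ingestion lag, not detection.
reported_mttd = alert_ingested - high_fidelity_alert

# MTTD that reflects the adversary's actual dwell time.
honest_mttd = alert_ingested - initial_access

print(f"Reported MTTD: {reported_mttd}")  # 12 hours
print(f"Honest MTTD:   {honest_mttd}")    # 21 days and 12 hours
```

Both numbers come from the same incident; the only difference is where the clock starts. That is exactly the question worth asking in a service review.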

The Problem with "Confirmed Incidents"

Another common metric is the number of "confirmed incidents." This is a classic example of Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. If an MSSP is incentivized to keep the number of incidents low, they will inevitably lower the threshold for what they consider "benign" or "false positive."

We see this constantly with brute force (T1110) and credential spraying against Microsoft 365 environments. A provider might report 800 alerts but only 20 "confirmed incidents." If those 20 incidents are just the ones where the attacker successfully authenticated, the report is ignoring the 780 other attempts that represent a clear, ongoing threat to the organization. By filtering out the "noise," the provider is effectively filtering out the adversary's reconnaissance phase.
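A toy sketch makes the filtering effect obvious. The event shape and field names below are assumptions for illustration, not a real Microsoft 365 log schema:

```python
# Hypothetical spraying campaign: 800 authentication attempts, 20 successes.
auth_events = (
    [{"user": f"user{i}", "result": "failure"} for i in range(780)]
    + [{"user": f"user{i}", "result": "success"} for i in range(20)]
)

# What an incentivized MSSP reports: only successful logins count as incidents.
confirmed_incidents = [e for e in auth_events if e["result"] == "success"]

# What a risk-based view would also surface: the full campaign.
total_attempts = len(auth_events)

print(f"{len(confirmed_incidents)} confirmed incidents out of {total_attempts} attempts")
```

The "20 confirmed incidents" figure is not wrong, but on its own it discards the signal that someone is systematically working through the organization's credential space.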

Why We Need Risk-Based Reporting

The goal of a SOC is to reduce risk, not to clear tickets. When we look at OWASP A07:2021 (Identification and Authentication Failures), the risk is not just a single successful login; it is the systemic failure of the authentication process. A report that lists "3 compromised accounts" without explaining the underlying vulnerability or the path to remediation is useless.

Instead of focusing on how fast a ticket was closed, we should be asking:

  • What was the business impact of this incident?
  • Did this incident expose a gap in our EDR or XDR coverage?
  • How does this incident change our threat model for the next quarter?

If your provider cannot answer these questions, they are not providing a security service; they are providing a compliance checkbox.
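One way to operationalize those questions is to make them mandatory fields in the incident record itself. The sketch below is a hypothetical schema (all field names are assumptions, not an industry standard) showing what a risk-focused report entry might capture:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RiskBasedIncidentReport:
    """Illustrative risk-focused incident record; every field name is an assumption."""
    incident_id: str
    business_impact: str            # what the incident meant for the business
    coverage_gap: Optional[str]     # EDR/XDR blind spot this incident revealed, if any
    threat_model_change: str        # how next quarter's assumptions should shift
    remediation_path: list = field(default_factory=list)

# Example entry: a ticket-count report would reduce this to "1 incident, closed".
report = RiskBasedIncidentReport(
    incident_id="INC-0042",
    business_impact="Compromised account could read payroll data for 6 hours",
    coverage_gap="No EDR telemetry from legacy file server",
    threat_model_change="Treat legacy servers as assumed-breached",
    remediation_path=["Rotate credentials", "Deploy EDR agent", "Review access policy"],
)
print(report.incident_id, "-", report.business_impact)
```

The point is not the specific schema: it is that impact, coverage gaps, and remediation are first-class data, so a provider cannot close the record without answering them.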

Challenging the Status Quo

The next time you sit down for a service delivery meeting, bring a healthy dose of skepticism. When you see a "99.2% SLA compliance" chart, ask for the raw data. Ask how they define a "critical" incident versus a "high" one. If they cannot explain the methodology behind their metrics, they are likely just outputting whatever makes the report look green.
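Asking for the raw data matters because a single summary statistic can hide the one incident that actually mattered. A short sketch with invented response times shows how the mean, the median, and the tail tell three different stories:

```python
import statistics

# Hypothetical raw response times in hours (illustrative values).
# Five fast triage closures and one catastrophic three-week miss.
response_hours = [0.5, 0.7, 1.0, 1.2, 0.9, 504.0]

mean_hours = statistics.mean(response_hours)      # dragged up by the outlier
median_hours = statistics.median(response_hours)  # looks great, hides the miss
worst_hours = max(response_hours)                 # the incident that mattered

print(f"mean={mean_hours:.1f}h median={median_hours:.2f}h worst={worst_hours:.0f}h")
```

A dashboard built on the median shows sub-hour response; one built on the mean shows ~85 hours; only the raw distribution reveals the 504-hour failure. Whichever statistic makes the chart green is the one you should ask the provider to justify.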

We need to push for transparency. If a vendor is using automated playbooks to handle alerts, we should be measuring the effectiveness of those playbooks, not just the speed at which they execute. We should be looking for evidence of threat hunting, not just alert triage.

Ultimately, the responsibility lies with us. We are the ones who understand the technical reality of these attacks. If we continue to accept these superficial reports, we are complicit in the industry's failure to provide actual security. Stop looking for the "green" on the dashboard and start looking for the gaps in the coverage. The adversary is not measuring their success by how quickly they close tickets, and neither should we.
