The Growing Crisis in CVE Data Quality
This talk examines the systemic degradation of CVE data quality, highlighting the lack of mandatory fields like CVSS, CPE, and CWE in published vulnerability records. It analyzes the bureaucratic and funding challenges faced by organizations like MITRE and NVD, which lead to incomplete or missing vulnerability metadata. The presentation demonstrates how this data gap hinders effective vulnerability management and risk assessment for security professionals. The speaker advocates for community-driven improvements, including mandatory data standards and increased transparency in vulnerability reporting.
The Data Quality Crisis That Is Breaking Your Vulnerability Management
TL;DR: The CVE program is suffering from a systemic failure in data quality, with nearly 25% of records missing critical metadata like CVSS scores, CPEs, and CWEs. This lack of standardization forces security teams to rely on incomplete information, leading to inaccurate risk assessments and delayed patching cycles. Researchers and defenders must stop treating CVE data as a source of truth and start demanding better transparency and stricter validation from the organizations managing these databases.
Vulnerability management is fundamentally broken, not because we lack tools, but because the data we feed those tools is increasingly garbage. When you pull a list of vulnerabilities for a Windows or Microsoft Office environment, you expect actionable intelligence. Instead, you are often met with a wall of noise. Recent analysis of CVE data quality reveals that the foundational layer of our security industry is crumbling under the weight of bureaucratic mismanagement and a lack of technical rigor.
The Mechanics of the Failure
The CVE program, which is supposed to be the global standard for tracking software vulnerabilities, has become a victim of its own success. As the volume of reported vulnerabilities explodes, the organizations responsible for maintaining the data—specifically MITRE and the National Vulnerability Database (NVD)—are struggling to keep pace. The core issue is that the barrier to entry for publishing a CVE is dangerously low.
A CVE record only requires a description and a product name to be considered valid. This is a massive problem for anyone trying to automate risk scoring. If a record lacks a Common Vulnerability Scoring System (CVSS) score, a Common Platform Enumeration (CPE), or a Common Weakness Enumeration (CWE), it is essentially useless for automated triage.
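To make that concrete, here is a minimal sketch of a completeness check against the CVE JSON 5.x record format. The field paths (containers.cna.metrics, affected[].cpes, problemTypes[].descriptions[].cweId) follow the published 5.x schema, but treat them as assumptions to verify against the records you actually ingest:

```python
# Sketch: check a CVE JSON 5.x record for the metadata that automated
# triage depends on. Field paths assume the CVE 5.x schema (containers.cna.*);
# verify against the records you actually consume.

def missing_triage_fields(record: dict) -> list[str]:
    cna = record.get("containers", {}).get("cna", {})
    missing = []

    # CVSS: any metrics entry carrying a cvssV3_1 / cvssV4_0 / etc. object
    metrics = cna.get("metrics", [])
    if not any(key.startswith("cvss") for entry in metrics for key in entry):
        missing.append("CVSS")

    # CPE: affected-product entries may carry an optional "cpes" list
    if not any(entry.get("cpes") for entry in cna.get("affected", [])):
        missing.append("CPE")

    # CWE: problemTypes -> descriptions -> cweId
    if not any(
        desc.get("cweId")
        for pt in cna.get("problemTypes", [])
        for desc in pt.get("descriptions", [])
    ):
        missing.append("CWE")

    return missing
```

Run this across a bulk pull of the published records and the "nearly 25%" figure stops being abstract: the gaps become a concrete list of CVE IDs your pipeline cannot score.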
During this research, a disturbing trend emerged: major vendors, including Microsoft, are publishing records that are little more than placeholders. Vulnerabilities like CVE-2023-31314 or CVE-2024-38020 often arrive with minimal context. When you encounter these in the wild, you are forced to verify the impact manually, which defeats the entire purpose of having a centralized database.
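You can audit any record yourself in a few lines. This sketch queries the NVD 2.0 API and reports which triage fields are absent; the endpoint and response shape follow NVD's documented API, though unauthenticated requests are heavily rate-limited, so confirm both before building on this:

```python
# Sketch: inspect a single CVE via the NVD 2.0 API and report which
# triage-critical fields are absent. Endpoint per NVD's documented API;
# unauthenticated requests are rate-limited.
import json
import urllib.request

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def audit_cve(cve_id: str) -> dict:
    with urllib.request.urlopen(f"{NVD_API}?cveId={cve_id}") as resp:
        data = json.load(resp)

    vulns = data.get("vulnerabilities", [])
    if not vulns:
        return {"id": cve_id, "status": "not found in NVD"}

    cve = vulns[0]["cve"]
    return {
        "id": cve_id,
        "has_cvss": bool(cve.get("metrics")),
        "has_cwe": bool(cve.get("weaknesses")),
        "has_cpe": bool(cve.get("configurations")),
    }

if __name__ == "__main__":
    print(audit_cve("CVE-2024-38020"))  # one of the examples above
```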
Why Your Scanners Are Missing the Mark
For a pentester, this data gap is a nightmare. You are running scans against a target, and your tools are flagging hundreds of vulnerabilities. If the underlying CVE data is missing the CPE, your scanner cannot accurately map the vulnerability to the installed software version. You end up with a high rate of false positives or, worse, false negatives, where a critical finding such as an exploitable public-facing application (ATT&CK T1190) is missed because the metadata was never properly linked.
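The failure mode is trivial to reproduce. The sketch below uses a deliberately naive matcher against an NVD-style configurations block (shape assumed from the 2.0 API format): the record that shipped without CPE data never matches the inventory, and it fails silently rather than loudly:

```python
# Sketch: a deliberately naive CPE matcher. A CVE record that lacks a
# configurations/CPE block can never match installed software -- it drops
# out silently, which is exactly the false-negative case described above.

def cpe_product(cpe23: str) -> tuple[str, str]:
    # cpe:2.3:a:vendor:product:version:... -> (vendor, product)
    parts = cpe23.split(":")
    return parts[3], parts[4]

def matches(installed: tuple[str, str], cve: dict) -> bool:
    for config in cve.get("configurations", []):  # absent -> loop never runs
        for node in config.get("nodes", []):
            for m in node.get("cpeMatch", []):
                if m.get("vulnerable") and cpe_product(m["criteria"]) == installed:
                    return True
    return False

inventory = ("microsoft", "office")
records = [
    {"id": "CVE-A", "configurations": [{"nodes": [{"cpeMatch": [
        {"vulnerable": True,
         "criteria": "cpe:2.3:a:microsoft:office:2019:*:*:*:*:*:*:*"}]}]}]},
    {"id": "CVE-B"},  # placeholder record: no CPE data at all
]
for r in records:
    print(r["id"], "flagged" if matches(inventory, r) else "silently skipped")
```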
The technical reality is that we are relying on a system that was designed for a different era. The current CVE Numbering Authority (CNA) system is decentralized, which is great for speed but terrible for consistency. When you have hundreds of organizations, each with their own internal processes, the quality of the data varies wildly. Some CNAs are diligent, while others treat the process as a checkbox exercise.
The Cost of Inaction
The financial cost of this mismanagement is staggering. We are talking about millions of dollars in government contracts being funneled into organizations that are failing to provide the most basic level of data integrity. When you look at the NVD and the resources allocated to it, it is clear that the problem is not a lack of funding, but a lack of accountability.
If you are a researcher or a developer, you need to stop assuming that the NVD is a perfect source of truth. Start building your own intelligence pipelines. Use tools that allow you to cross-reference multiple data sources, such as the GitHub Advisory Database, which often provides more granular and accurate information than the official CVE records.
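As a starting point, here is a hedged sketch of that cross-referencing using GitHub's global security advisories endpoint. The URL and field names reflect GitHub's documented REST API, but confirm them against the current docs before depending on this:

```python
# Sketch: cross-reference a CVE against the GitHub Advisory Database.
# Endpoint and field names assumed from GitHub's REST API docs ("list
# global security advisories"); verify before relying on them.
import json
import urllib.request

def github_advisory(cve_id: str) -> dict | None:
    req = urllib.request.Request(
        f"https://api.github.com/advisories?cve_id={cve_id}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        advisories = json.load(resp)
    if not advisories:
        return None
    adv = advisories[0]
    return {
        "ghsa_id": adv.get("ghsa_id"),
        "cvss": (adv.get("cvss") or {}).get("vector_string"),
        "cwes": [c.get("cwe_id") for c in adv.get("cwes") or []],
        "affected": [
            f'{v["package"]["ecosystem"]}:{v["package"]["name"]} '
            f'{v.get("vulnerable_version_range", "?")}'
            for v in adv.get("vulnerabilities") or []
        ],
    }
```

Even one extra source frequently fills in the CVSS vector, CWE, and affected-version ranges the official record shipped without; OSV and vendor advisories can be layered in the same way.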
What You Can Do Today
Defenders and researchers need to get loud. If you are working with a vendor that is a CNA, push them to provide better data. If you are participating in OWASP projects or other community-driven security initiatives, advocate for stricter data validation rules. We need to move toward a model where a CVE is not considered "published" until it meets a minimum threshold of metadata completeness.
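If you run a pipeline that ingests or republishes advisories, you do not have to wait for the program to adopt that model. A minimal publish gate, using the same schema assumptions as the first sketch above, might look like this:

```python
# Sketch: a CI-style gate that refuses to ingest or publish a CVE record
# until it carries minimum triage metadata. Field paths assume the CVE
# JSON 5.x schema; the threshold itself is a policy choice, not a standard.
import json
import sys

def gate(path: str) -> int:
    with open(path) as f:
        cna = json.load(f).get("containers", {}).get("cna", {})
    checks = {
        "CVSS": any(k.startswith("cvss") for m in cna.get("metrics", []) for k in m),
        "CPE": any(e.get("cpes") for e in cna.get("affected", [])),
        "CWE": any(
            d.get("cweId")
            for pt in cna.get("problemTypes", [])
            for d in pt.get("descriptions", [])
        ),
    }
    gaps = [name for name, ok in checks.items() if not ok]
    if gaps:
        print(f"REJECT {path}: missing {', '.join(gaps)}", file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(gate(sys.argv[1]))
```

Wire a check like this into CI and incomplete records get bounced back to whoever submitted them, which is exactly the pressure the program itself is currently missing.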
The current state of affairs is unsustainable. We are building our security programs on a foundation of sand. By demanding better data quality and refusing to accept incomplete records, we can force the industry to prioritize accuracy over volume. The next time you are triaging a list of vulnerabilities, take a moment to look at the source data. If it is missing the fields that actually matter, report it. The only way to fix this is to make the cost of bad data higher than the cost of doing it right.