Care and Feeding of HSMs: Key Management in Hard Mode
This talk explores the operational complexities and security risks associated with managing Hardware Security Modules (HSMs) in production environments. It details the critical importance of key lifecycle management, including generation, storage, and revocation, while highlighting the dangers of human error in key ceremonies. The speaker provides practical guidance on mitigating risks through redundancy, proper role-based access control, and rigorous adherence to manufacturer documentation. The presentation emphasizes that HSMs are a serious commitment requiring dedicated budget, staffing, and robust disaster recovery planning.
Why Your Hardware Security Module Is Probably a Liability
TLDR: Hardware Security Modules (HSMs) are often treated as "set and forget" security silver bullets, but they are actually complex, fragile, and prone to catastrophic failure due to human error. This post breaks down why key management ceremonies, battery-backed RAM, and the false promise of PKCS#11 interoperability make HSMs a significant operational risk. If you are auditing or deploying these systems, you need to treat them as high-maintenance infrastructure rather than static black boxes.
Security professionals love the idea of a hardware-backed root of trust. We assume that if a cryptographic key lives inside a FIPS-certified box, it is effectively untouchable. But after watching Nick Pelis break down the operational reality of HSMs at BSidesSF 2025, it is clear that the biggest threat to these devices isn't a sophisticated side-channel attack; it is the person holding the smart card.
The Myth of the "Set and Forget" HSM
Most organizations treat HSMs as a way to offload the "hard" part of security. You buy the box, you rack it, you generate your keys, and you assume that key theft and authentication failures are now the vendor's problem. The reality is that an HSM is a high-maintenance, stateful computer that requires a level of operational discipline most teams lack.
The most critical technical nuance is how these devices store keys. Many HSMs hold sensitive cryptographic material in battery-backed SRAM, which lets the device zeroize it instantly when it detects tampering. The flip side is that if the battery dies, the device loses its state, and your keys are gone. This isn't just a theoretical risk; it is a common failure mode. If you are performing a pentest on an environment using HSMs, don't just look for API vulnerabilities. Ask about the disaster recovery plan for the physical hardware. If they don't have a clear, tested procedure for battery replacement and key restoration, they are one dead battery away from a total system outage.
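For auditors, those questions can be turned into a repeatable checklist. The sketch below assumes a hypothetical inventory in which each HSM record carries a battery install date and the date of the last successful key-restoration test. The field names, the `dr_findings` helper, and the three-year battery interval are placeholders; substitute the service interval from the manufacturer's documentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class HsmRecord:
    # Hypothetical inventory fields; not taken from any vendor API.
    name: str
    battery_installed: date
    last_restore_test: Optional[date]  # None means restoration was never rehearsed


def dr_findings(
    records: List[HsmRecord],
    today: date,
    battery_life_days: int = 3 * 365,  # placeholder: use the vendor's documented interval
    restore_test_max_age_days: int = 365,
) -> List[str]:
    """Flag HSMs at risk of losing key state without a tested way back."""
    findings = []
    for r in records:
        if today - r.battery_installed > timedelta(days=battery_life_days):
            findings.append(f"{r.name}: battery past assumed service life")
        if r.last_restore_test is None:
            findings.append(f"{r.name}: key restoration has never been tested")
        elif today - r.last_restore_test > timedelta(days=restore_test_max_age_days):
            findings.append(f"{r.name}: last restore test is stale")
    return findings


if __name__ == "__main__":
    inventory = [
        HsmRecord("hsm-a", date(2020, 1, 1), date(2024, 6, 1)),
        HsmRecord("hsm-b", date(2024, 1, 1), None),
    ]
    for finding in dr_findings(inventory, today=date(2025, 1, 1)):
        print(finding)
```

A finding about restore testing matters more than one about battery age. A fresh battery does not help if nobody has ever proven that the backups can be loaded into a replacement unit.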
The PKCS#11 Interoperability Lie
One of the most dangerous misconceptions in the industry is that the PKCS#11 standard provides true vendor interoperability. The standard is an API specification, not a data portability format. If you build your infrastructure around a specific HSM vendor, you are locked in for the life of those keys.
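One concrete source of lock-in is the attributes that PKCS#11 keys carry. If a key is created with `CKA_EXTRACTABLE` set to false, `C_WrapKey` will refuse to export it, and the attribute can only ever change from true to false, never back. A key marked `CKA_WRAP_WITH_TRUSTED` can only be wrapped by a wrapping key that has `CKA_TRUSTED` set. The sketch below is a minimal pre-migration check over plain dictionaries. It assumes you have already read each key's attributes from the token through whatever PKCS#11 binding you use, and the `migration_constraints` helper is hypothetical.

```python
from typing import Dict, List

# Attribute names mirror real PKCS#11 attributes, but these dicts are a
# hypothetical stand-in for attribute values already read from a token.
KeyAttrs = Dict[str, object]


def migration_constraints(keys: List[KeyAttrs]) -> Dict[str, str]:
    """Map each key label to the reason it cannot simply be wrapped out."""
    constraints = {}
    for key in keys:
        label = str(key["CKA_LABEL"])
        # Treat a missing CKA_EXTRACTABLE conservatively as False rather
        # than guessing what the token's default was at creation time.
        if not key.get("CKA_EXTRACTABLE", False):
            constraints[label] = "non-extractable: C_WrapKey will refuse it; this key never leaves the HSM"
        elif key.get("CKA_WRAP_WITH_TRUSTED", False):
            constraints[label] = "wrap-with-trusted: needs a wrapping key with CKA_TRUSTED set"
    return constraints


if __name__ == "__main__":
    inventory = [
        {"CKA_LABEL": "root-ca", "CKA_EXTRACTABLE": False},
        {"CKA_LABEL": "tls-2025", "CKA_EXTRACTABLE": True, "CKA_WRAP_WITH_TRUSTED": True},
        {"CKA_LABEL": "db-kek", "CKA_EXTRACTABLE": True},
    ]
    for label, reason in migration_constraints(inventory).items():
        print(f"{label}: {reason}")
```

A non-extractable root key is working as designed. It also means the key's lifetime is bounded by that vendor's hardware, and that is the lock-in.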
When you attempt to move keys between different HSM vendors, you will quickly find that the "interoperable" nature of PKCS#11 breaks down at the implementation layer. You cannot simply export a key from a Thales unit and import it into a cloud-based HSM or a different on-premise device without significant friction. Every time you wrap or unwrap a key through layers of software—like OpenSSL or custom Golang wrappers—you risk losing metadata or violating the security policies that the HSM enforces. For a researcher, this means the "migration" phase of an engagement is often where the most interesting, and most destructive, bugs are found.
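Part of the metadata problem comes from how unwrapping works. With a standard mechanism such as `CKM_AES_KEY_WRAP`, the wrapped blob carries only key material. The new key's attributes come from the template passed to `C_UnwrapKey`, so any policy attribute you forget to restate falls back to the destination token's defaults. Below is a minimal sketch of a drift check. It assumes you can read the source key's attributes and inspect the unwrap template before calling the destination HSM. The attribute list and the `policy_drift` helper are illustrative, not exhaustive.

```python
from typing import Dict, Optional, Tuple

# Policy-relevant PKCS#11 attributes to compare; extend for your key types.
POLICY_ATTRS = (
    "CKA_SENSITIVE", "CKA_EXTRACTABLE",
    "CKA_ENCRYPT", "CKA_DECRYPT",
    "CKA_SIGN", "CKA_VERIFY",
    "CKA_WRAP", "CKA_UNWRAP",
)

Drift = Dict[str, Tuple[Optional[bool], Optional[bool]]]


def policy_drift(source: Dict[str, bool], unwrap_template: Dict[str, bool]) -> Drift:
    """Return attr -> (source value, template value) for every mismatch.

    None on the template side means the attribute was not restated, so the
    destination token's default will apply.
    """
    drift: Drift = {}
    for attr in POLICY_ATTRS:
        before, after = source.get(attr), unwrap_template.get(attr)
        if before != after:
            drift[attr] = (before, after)
    return drift


if __name__ == "__main__":
    source_key = {"CKA_SENSITIVE": True, "CKA_EXTRACTABLE": True,
                  "CKA_SIGN": True, "CKA_DECRYPT": False}
    template = {"CKA_SENSITIVE": True, "CKA_SIGN": True, "CKA_DECRYPT": True}
    for attr, (before, after) in policy_drift(source_key, template).items():
        print(f"{attr}: {before} -> {after}")
```

In this example a signing-only key quietly gains decrypt permission on the new device. That is the kind of policy violation a migration script can introduce without any error being raised.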
Human Error as an Attack Vector
Often the most effective way to take down an HSM deployment is simply to wait for the security officer to forget their PIN. HSMs are designed to be tamper-resistant, which means they are often "bricked" by design if they detect an unauthorized access attempt. If a security officer enters the wrong PIN three times, the device may enter a tamper state, wiping its memory or locking out all administrative functions.
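To make this failure mode concrete, here is a toy model of a security-officer login counter. It is a hypothetical sketch, not any vendor's firmware. The three-attempt threshold matches the scenario above, and whether lockout zeroizes keys or only blocks administration is a per-device setting you would need to confirm in the manual.

```python
import hmac
from typing import Dict


class PinGuard:
    """Toy model of an HSM security-officer PIN counter (hypothetical)."""

    def __init__(self, correct_pin: str, max_failures: int = 3, zeroize_on_lockout: bool = True):
        self._pin = correct_pin.encode()
        self.max_failures = max_failures
        self.zeroize_on_lockout = zeroize_on_lockout
        self.failures = 0
        self.locked = False
        # Stand-in for key material held in volatile, battery-backed storage.
        self.keys: Dict[str, bytes] = {"root-ca": b"placeholder-key-bytes"}

    def login(self, pin: str) -> bool:
        if self.locked:
            raise PermissionError("device locked: recovery requires a key ceremony")
        # Constant-time comparison so timing does not reveal how much of the PIN matched.
        if hmac.compare_digest(pin.encode(), self._pin):
            self.failures = 0  # a successful login resets the counter
            return True
        self.failures += 1
        if self.failures >= self.max_failures:
            self.locked = True
            if self.zeroize_on_lockout:
                self.keys.clear()  # tamper response: the keys are gone
        return False


if __name__ == "__main__":
    hsm = PinGuard("1234")
    for guess in ("0000", "1111", "2222"):
        hsm.login(guess)
    print(hsm.locked, hsm.keys)  # True {}
```

No attacker appears anywhere in this model. Three honest typos are enough to turn the key store into an empty dictionary.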
This is where the "Key Ceremony" becomes a massive liability. These ceremonies use an M-of-N quorum of smart-card holders so that no single insider can perform administrative tasks alone; misusing the HSM would require several of them to collude. However, if your ceremony requires a quorum of five people and three of them are on vacation, you cannot perform critical operations like key rotation or recovery.
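The quorum trade-off can be put into numbers. Assume each card holder is independently available with probability p. This is a simplification, since holidays and incidents are rarely independent. Under that assumption, the chance that at least M of N holders can attend is the binomial tail sum P = Σ_{k=M..N} C(N, k) · p^k · (1 − p)^(N − k). The function name and the 0.9 figure below are illustrative only.

```python
from math import comb


def quorum_availability(m: int, n: int, p: float) -> float:
    """P(at least m of n card holders are available), assuming each holder is
    independently available with probability p."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


if __name__ == "__main__":
    # Requiring every card holder vs. a 3-of-5 quorum, with p = 0.9 per holder.
    print(f"5-of-5: {quorum_availability(5, 5, 0.9):.4f}")  # 5-of-5: 0.5905
    print(f"3-of-5: {quorum_availability(3, 5, 0.9):.4f}")  # 3-of-5: 0.9914
```

Under these assumptions, a ceremony that needs all five holders fails about four times in ten, while a 3-of-5 quorum succeeds more than 99% of the time. Lowering M trades collusion resistance for availability, and the ceremony documentation should state that choice explicitly.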
During a pentest, focus on the "Ceremony" documentation. If the organization has a rigid, manual process that relies on specific individuals being physically present, you have found a denial-of-service vector that is far more effective than any network-based exploit.
Defensive Realities
If you are working with a blue team to secure these devices, the advice is simple but difficult to implement:
- Redundancy is mandatory: Never operate a single HSM. You need a primary and a hot spare, at minimum.
- Rehearse the failure: If you haven't practiced an emergency key rotation on a test HSM, you aren't ready for production.
- Audit the physical access: The security of an HSM is only as good as the physical security of the room it sits in. If you can walk up to the rack and pull the power cable, you have already compromised the system's availability, whatever its certification says.
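A rough availability model shows why a hot spare helps and why the room still matters. Assume each HSM is independently up with probability a. Then N redundant units are all down with probability (1 − a)^N. A shared-cause outage, such as someone cutting power to the whole rack, is modelled as a separate probability q that takes every unit down at once. This is a simplified reliability sketch with illustrative numbers, and `fleet_availability` is a hypothetical helper, not a vendor figure.

```python
def fleet_availability(a: float, units: int, q_common: float = 0.0) -> float:
    """Probability that at least one HSM is serving.

    a: assumed independent availability of a single unit
    units: number of redundant HSMs holding the same keys
    q_common: assumed probability of a shared-cause outage (same rack,
              same power feed, same room) that takes every unit down
    """
    independent = 1 - (1 - a) ** units
    return (1 - q_common) * independent


if __name__ == "__main__":
    print(f"single HSM:          {fleet_availability(0.99, 1):.4f}")
    print(f"primary + hot spare: {fleet_availability(0.99, 2):.4f}")
    print(f"same rack, q=0.005:  {fleet_availability(0.99, 2, 0.005):.4f}")
```

With these inputs, a second unit raises availability from 0.99 to 0.9999. A 0.5% chance of losing the shared rack then pulls it back to about 0.995, which is why physical separation and physical access control sit alongside redundancy on the list.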
HSMs are a serious commitment. They are not a shortcut to security. They are a complex, fragile, and expensive piece of infrastructure that requires a dedicated team, a massive budget, and a deep understanding of the manufacturer's manual. If you don't have the resources to treat them with the care they demand, you are likely better off with a well-managed software-based key management service. Stop pretending that hardware is a substitute for process.