Blog January 27, 2025

When you know, you Know - Catching HSM failures before they cost your organization!

by Mark B. Cooper

 

One of the most extraordinary things about working in a new cybersecurity space, with a product like PKI Spotlight, is seeing your technology stack solve real-world problems. From our decades of experience designing, deploying, and supporting complex enterprise PKIs, we have seen that many minor issues have a significant impact on organizations. Often, these minor issues would go undetected until they didn’t. The results were frequently severe: massive global outages, point-of-sale systems unable to transact retail sales across a continent, or risks to the safety of medical devices and commercial airplanes.

One commonly occurring issue we saw was failed Hardware Security Modules (HSMs), such as Thales Luna HSMs and Entrust nCipher HSMs, to name two. Customers would deploy these devices in highly available pairs (or more!) and configure their PKI to use the HSMs to generate and store their keys. But when an HSM fails, the PKI can fail with it and affect an organization’s entire IAM, PAM, and operational state. The challenge with running a highly available pair was that customers often didn’t even know an HSM had failed, because the PKI continued running on the surviving HSM. That lack of visibility meant they didn’t realize they were operating in a failed state, with the remaining HSM now a single point of failure. Honestly, do you know if all your HSMs are genuinely working and available RIGHT NOW? I bet you are like everyone else and don’t know.
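To make the "degraded but still running" state concrete, here is a minimal sketch in Python. It is an illustration only, not how PKI Spotlight or any HSM vendor does it: `PoolState`, `classify_pool`, and `failed_members` are hypothetical names, and the sketch assumes you already have an up/down result for each HSM from some probe.

```python
from enum import Enum


class PoolState(Enum):
    HEALTHY = "healthy"    # every HSM in the group answered
    DEGRADED = "degraded"  # PKI still works, but redundancy is reduced
    DOWN = "down"          # no HSM answered


def classify_pool(member_up: dict[str, bool]) -> PoolState:
    """Classify a highly available HSM group from per-HSM probe results.

    Hypothetical helper: `member_up` maps an HSM name to True (responding)
    or False (not responding).
    """
    if not member_up:
        raise ValueError("pool has no members")
    alive = sum(1 for up in member_up.values() if up)
    if alive == len(member_up):
        return PoolState.HEALTHY
    if alive == 0:
        return PoolState.DOWN
    return PoolState.DEGRADED


def failed_members(member_up: dict[str, bool]) -> list[str]:
    """Return the names of HSMs that did not respond, sorted for stable output."""
    return sorted(name for name, up in member_up.items() if not up)
```

The point of the middle state is that a check which only asks "is the PKI still issuing?" cannot tell `HEALTHY` from `DEGRADED`; alerting on `DEGRADED` is what surfaces the lost redundancy before the second failure.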

We recently had the pleasure of benefiting from our own technology! We developed a patent-pending process we call “Is-Alive” that extensively checks the operations of critical PKI roles every five minutes to look for issues, including HSM availability. In addition, we collect data from all available HSMs where our agent is running. We are the only product in the world today that can do this for multiple HSM vendors on a single screen!

Recently, I went into our development lab, as we were planning to add some new HSMs for future development, and noticed that one of the HSMs was offline and showing an error. It was just what our customers often report when they occasionally go into their data center and happen to look “in that cabinet” where the HSMs are.

Reviewing our development version of PKI Spotlight, we saw that our Is-Alive service had detected and reported the HSM failure within five minutes. The kicker: the HSM vendor’s software indicates the module is operational. Clearly, it isn’t, and this is a common situation we see in PKIs: existing visibility and software are insufficient to protect the modern enterprise.
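The gap between what a status tool reports and what the device can actually do can be sketched in a few lines of Python. This is a hedged illustration, not the Is-Alive implementation, whose details are not described here. `HsmCheck` and `check_hsm` are hypothetical names, and the two callables are placeholders you would supply yourself: one that reads the vendor-reported status, and one that attempts a real operation against the HSM (for example, a test signature with a non-production key).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HsmCheck:
    name: str
    reported_ok: bool  # what the status tool claims
    probe_ok: bool     # whether a real operation actually succeeded

    @property
    def silent_failure(self) -> bool:
        """True when the status says 'operational' but the probe failed."""
        return self.reported_ok and not self.probe_ok


def check_hsm(
    name: str,
    reported_status: Callable[[], bool],
    probe: Callable[[], bool],
) -> HsmCheck:
    """Run one check of a single HSM (hypothetical helper).

    Any exception raised by the probe counts as a failed probe, since an
    HSM that cannot complete the operation is not available to the PKI.
    """
    try:
        probe_ok = bool(probe())
    except Exception:
        probe_ok = False
    return HsmCheck(name=name, reported_ok=bool(reported_status()), probe_ok=probe_ok)
```

A scheduler would call `check_hsm` for each HSM on a fixed interval, such as the five minutes mentioned above, and raise an alert whenever `probe_ok` is false, with `silent_failure` flagging the cases a status-only view would miss.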

Had this been a production system, we would have known the moment the HSM entered a failed state and begun an expedited replacement. As a badge of honor, I will leave this HSM in its failed state and not fix it. The number of times our customers, before PKI Spotlight, suffered an outage from a scenario like this is not to be underestimated. And yes, you can see it in a demo of PKI Spotlight if you connect with us. We will happily show you our own failed HSM so that you never have to be caught by surprise in your environment.

This is a small part of what we do as pioneers in PKI Posture Management.

Mark B. Cooper

President & Founder at PKI Solutions, Leading PKI Cybersecurity Subject Matter Expert, Author, Speaker, Trainer, Microsoft Certified Master.
