PKI Operations Mistakes

The ones that wake you up at night

When pain hits: 2am Friday

4 mistakes covered

10. "The CA will remind us before it expires"

What happens

Organization relies entirely on CA-sent expiration emails. Those emails go to the wrong person, get caught in spam filters, or land in an inbox nobody checks.

Why it seems reasonable

"DigiCert sends emails at 90, 60, 30 days. We'll see them."

The reality

Emails go to whoever ordered the cert 3 years ago. That person is gone. Their email forwards to nowhere, or the account was deactivated.

Real-world consequence

CA sends warnings to jane.smith@company.com. Jane left 18 months ago. Her inbox was archived and nobody reads it. Certificate expires on a holiday weekend. On-call gets paged at 2am.

The fix

  • Run your own monitoring - don't depend on CA emails
  • Multiple notification channels (email, Slack, PagerDuty)
  • Auto-create tickets at 60+ days before expiration
  • Use a distribution list, not individual emails
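A minimal sketch of the escalation idea in the list above, assuming certificate expiry times are already being collected somewhere. The thresholds, the channel names, and the helper names (`parse_not_after`, `days_remaining`, `channels_for`) are illustrative assumptions, not part of any CA's or vendor's tooling.

```python
import ssl
from datetime import datetime, timezone
from typing import List

# Hypothetical escalation policy: (days-remaining threshold, channels to notify).
# Ordered from loosest to tightest; values are illustrative only.
ESCALATION = [
    (60, ["ticket"]),
    (30, ["ticket", "email-list"]),
    (14, ["ticket", "email-list", "slack"]),
    (7, ["ticket", "email-list", "slack", "pagerduty"]),
]


def parse_not_after(not_after: str) -> datetime:
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to UTC."""
    seconds = ssl.cert_time_to_seconds(not_after)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def days_remaining(not_after: datetime, now: datetime) -> int:
    """Whole days until expiry; negative once the certificate has expired."""
    return (not_after - now).days


def channels_for(days_left: int) -> List[str]:
    """Return the channels for the tightest threshold already crossed."""
    selected: List[str] = []
    for threshold, channels in ESCALATION:
        if days_left <= threshold:
            selected = channels
    return selected
```

The `notAfter` string could come from the dictionary returned by `ssl.SSLSocket.getpeercert()`, or from whatever inventory you already keep. Every channel should point at a shared destination (a distribution list or team channel) rather than a person.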

Warning signs

  • Don't know where CA notifications go
  • No internal certificate monitoring
  • CA emails go to individual addresses, not distribution lists

11. "We don't need to track internal certificates"

What happens

Only public-facing certificates get inventoried and monitored. Internal certificates are deployed and forgotten. An internal cert expires and takes down services.

Why it seems reasonable

"Internal certs don't affect customers. They're lower priority."

The reality

Everything is connected. Internal service breaks → API fails → customer site down. The dependency chain doesn't care about your priority labels.

Real-world consequence

Internal service mesh certificate expires at 3am. Auth service can't talk to user database. API gateway starts returning 503s. Customer-facing website shows "Service Unavailable." All because of a certificate nobody knew existed.

The fix

  • ALL certificates tracked - internal and external
  • Same lifecycle management for internal certs
  • Map dependencies - what breaks if this cert expires?
  • Internal certs get the same monitoring as public ones
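The "what breaks if this cert expires?" question can be answered mechanically once dependencies are written down. The sketch below assumes two hand-maintained maps; the service names, the certificate name, and the `blast_radius` helper are all hypothetical and only mirror the outage described above.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: service -> services that call it.
DEPENDENTS: Dict[str, List[str]] = {
    "user-db": ["auth-service"],
    "auth-service": ["api-gateway"],
    "api-gateway": ["customer-website"],
}

# Hypothetical inventory: certificate name -> services that present it.
CERT_USED_BY: Dict[str, List[str]] = {
    "mesh-internal-cert": ["user-db"],
}


def blast_radius(cert: str) -> Set[str]:
    """Every service that fails, directly or transitively, if `cert` expires."""
    affected: Set[str] = set()
    queue = deque(CERT_USED_BY.get(cert, []))
    while queue:
        service = queue.popleft()
        if service in affected:
            continue  # already visited; also guards against cycles
        affected.add(service)
        queue.extend(DEPENDENTS.get(service, []))
    return affected
```

Calling `blast_radius("mesh-internal-cert")` on this sample data reaches `customer-website`, which is the argument for giving the internal certificate the same monitoring as a public one.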

Warning signs

  • "That's just an internal cert"
  • Don't know how many internal certificates exist
  • Internal services use different (or no) monitoring

12. "It's just one cert, I'll renew it tomorrow"

What happens

Certificate is approaching expiration. Engineer sees it, plans to renew "tomorrow." Life happens. Tomorrow never comes.

Why it seems reasonable

"Doesn't expire until Friday. I'll do it Thursday. Plenty of time."

The reality

Things come up. Priorities shift. People get sick, go on vacation, or leave. "Thursday" becomes "never." Mental notes aren't reliable.

Real-world consequence

Engineer gets pulled into a production incident. Works until 10pm. Goes home exhausted. Next day is swamped with the incident postmortem. Friday comes. Certificate expires. Nobody else knew renewal was pending because it was in one person's head.

The fix

  • Renew early - 30+ days before expiration, not 3
  • Ticket system, not mental notes
  • Start the renewal window 60 days before expiration as standard practice
  • If you see it, ticket it immediately
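One way to move renewals out of someone's head is to derive the ticket from the certificate itself. This sketch assumes the 60-day window and 30-day renew-by target from the list above; the `RenewalTicket` type, the key format, and the in-memory `tracker` dictionary are hypothetical stand-ins for a real ticket system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional

RENEWAL_WINDOW_DAYS = 60  # open a ticket this many days before expiry
RENEW_BY_DAYS = 30        # the renewal itself is due this many days before expiry


@dataclass(frozen=True)
class RenewalTicket:
    key: str        # stable key so repeated sightings do not duplicate tickets
    cert_name: str
    due: date
    overdue: bool


def ticket_for(cert_name: str, not_after: date, today: date) -> Optional[RenewalTicket]:
    """Return a ticket once the renewal window has opened, else None."""
    if today < not_after - timedelta(days=RENEWAL_WINDOW_DAYS):
        return None
    due = not_after - timedelta(days=RENEW_BY_DAYS)
    key = f"renew:{cert_name}:{not_after.isoformat()}"
    return RenewalTicket(key=key, cert_name=cert_name, due=due, overdue=today > due)


def file_once(tracker: Dict[str, RenewalTicket], ticket: RenewalTicket) -> bool:
    """Record the ticket unless one with the same key exists; True if newly filed."""
    if ticket.key in tracker:
        return False
    tracker[ticket.key] = ticket
    return True
```

Because the key is built from the certificate name and its expiry date, anyone who notices the certificate can call this without checking whether a colleague already did.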

Warning signs

  • Renewals in the last week before expiration
  • "I'll get to it" as the plan
  • Renewal tasks live in someone's head, not a system

13. "IT owns all certificates"

What happens

IT believes they manage all certificates. Meanwhile, DevOps is deploying Let's Encrypt, developers are using ACM, and someone set up Cloudflare. None of this feeds into IT's inventory.

Why it seems reasonable

"Certificate management is IT's job. That's where the expertise is."

The reality

Developers have access to Let's Encrypt, AWS Certificate Manager, Cloudflare, and a dozen other services that issue certificates. They're not going through IT's 2-week procurement process. They're solving problems now.

Real-world consequence

Security audit runs certificate discovery. Finds 300 certificates IT didn't know existed. Some are expired. Some use deprecated algorithms. Some protect critical systems. IT's "complete inventory" covered 40% of actual certificates.

The fix

  • Certificate discovery tools - find what's actually deployed
  • Provide self-service that feeds into inventory automatically
  • Work with DevOps, not against them
  • Make the right thing the easy thing
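Comparing discovery output against the inventory is a set operation once both are reduced to certificate fingerprints. The sketch below assumes each side is available as a set of fingerprint strings; the `Reconciliation` type and `reconcile` function are hypothetical, and real discovery tools will have their own export formats.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Reconciliation:
    untracked: Set[str]  # deployed but missing from the inventory
    stale: Set[str]      # in the inventory but not found by discovery
    coverage: float      # share of discovered certificates the inventory knows


def reconcile(inventory: Set[str], discovered: Set[str]) -> Reconciliation:
    """Compare inventory fingerprints against what discovery actually found."""
    known = inventory & discovered
    # With nothing discovered there is nothing to miss, so report full coverage.
    coverage = len(known) / len(discovered) if discovered else 1.0
    return Reconciliation(
        untracked=discovered - inventory,
        stale=inventory - discovered,
        coverage=coverage,
    )
```

A non-empty `untracked` set is the warning sign described below: discovery finding more certificates than the inventory shows. The `stale` set is worth reviewing too, since it may mean a certificate was retired or that discovery cannot reach part of the network.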

Warning signs

  • Discovery finds more certs than inventory shows
  • DevOps and IT have different certificate counts
  • "They should be using our process" (but they're not)

Key Takeaways

  • Never rely solely on CA emails - build your own monitoring
  • Internal certificates cause outages just like public ones - track them all
  • Mental notes aren't a renewal strategy - use tickets and automation
  • Shadow IT certificates are real - discover what's actually deployed, not what you think is deployed