Building a Modern Certificate Management Team: Surviving the 90-Day TLS Mandate and Beyond


Tim Henrich
June 26, 2026


Historically, Public Key Infrastructure (PKI) and certificate management were treated as part-time responsibilities—a chore grudgingly assigned to a network administrator or a junior IT engineer. Armed with a massive Excel spreadsheet and a calendar application, this lone operator was the only thing standing between operational stability and a catastrophic outage.

In 2024, this approach is not just outdated; at current certificate volumes and lifespans, it is impossible to sustain.

The exponential growth of cloud-native architectures, IoT devices, and microservices has transformed traditional PKI into a critical, high-stakes discipline known as Machine Identity Management (MIM). Driven by impending industry mandates and the looming threat of quantum computing, building a dedicated Certificate Management Team has transitioned from an enterprise luxury to an absolute necessity.

Why "PKI Admin" is a Dead Job Title

If your organization still relies on a part-time "PKI Admin," you are already behind the curve. According to the 2024 Keyfactor State of Machine Identity Report, the average enterprise now manages over 250,000 certificates—a staggering 30% increase from 2023.

Worse, 81% of organizations have experienced at least two certificate-related outages in the past 24 months. These aren't just minor inconveniences; they are headline-making disasters.

Consider the Starlink global outage in April 2023. A single expired ground station certificate brought down satellite internet access worldwide. Similarly, Cisco Meraki experienced crippling issues in 2023 and 2024 when expired certificates disrupted VPN and cloud-managed device connectivity. The lesson is clear: even highly advanced technology companies fail at basic lifecycle management without centralized team oversight.

Two massive industry shifts are forcing organizations to rethink how they manage machine identities:

  1. The 90-Day TLS Mandate: Google’s Chromium Root Program has proposed reducing the maximum lifespan of public TLS certificates from 398 days to just 90 days. Whatever the final enforcement date turns out to be, validity periods this short mean that manual renewal processes will collapse under their own weight.
  2. Post-Quantum Cryptography (PQC): In August 2024, NIST finalized the first set of PQC standards (FIPS 203, 204, and 205). Organizations must build teams capable of executing a multi-year cryptographic transition to replace RSA and ECC before quantum computers can break them. You cannot migrate an inventory you haven't fully discovered.
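To put the 90-day change in numbers, here is a rough back-of-the-envelope sketch. The fleet size of 1,000 certificates and the `renew_at_fraction` parameter are illustrative assumptions, not figures from any report.

```python
def renewals_per_year(cert_count: int, lifetime_days: int,
                      renew_at_fraction: float = 1.0) -> float:
    """Approximate number of renewals per year for a fleet of certificates.

    renew_at_fraction is the share of the lifetime that elapses before a
    certificate is renewed (1.0 means renewing on the expiry day itself).
    """
    if cert_count < 0 or lifetime_days <= 0 or not 0 < renew_at_fraction <= 1:
        raise ValueError("invalid input")
    effective_days = lifetime_days * renew_at_fraction
    return cert_count * 365 / effective_days


if __name__ == "__main__":
    fleet = 1_000  # hypothetical number of public TLS certificates
    for lifetime in (398, 90):
        per_year = renewals_per_year(fleet, lifetime)
        print(f"{lifetime}-day certificates: about {per_year:.0f} renewals per year")
```

Moving from 398 days to 90 days multiplies the renewal workload by roughly 4.4 (398 / 90), and renewing early for safety pushes it higher still.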

The Anatomy of a Modern Certificate Management Team

To survive this shift, organizations must build a Certificate Management Team that operates as a cross-functional Center of Excellence (CoE). This team cannot exist in a silo; it must bridge the gap between security, operations, and development.

A mature team consists of four core personas:

1. The PKI / Cryptography Architect (The Strategist)

This role is responsible for the overarching root CA architecture, defining cryptographic policies (such as minimum key lengths and approved algorithms), and mapping the organization's crypto-agility strategy for the upcoming PQC migration. They ensure that internal PKI hierarchies are robust, secure, and highly available.

2. The Machine Identity Engineer (The Operator)

The Operator manages the day-to-day lifecycle of machine identities. They oversee the central Certificate Lifecycle Management (CLM) platforms, handle complex revocations, and troubleshoot infrastructure issues. When an intermediate CA needs to be rotated, the Machine Identity Engineer executes the process seamlessly.

3. The DevOps / SRE Liaison (The Automator)

This is arguably the most critical modern addition to the team. The Automator integrates certificate issuance directly into CI/CD pipelines and Kubernetes environments. Their job is to ensure developers have frictionless, API-driven access to certificates, eliminating the temptation of "Shadow IT."

4. The Security & Compliance Analyst (The Auditor)

With PCI-DSS v4.0's requirements taking full effect in 2025, stricter requirements for the discovery and lifecycle management of cryptographic keys and certificates are now mandatory. The Auditor monitors for rogue certificates, ensures regulatory compliance, and tracks crypto-agility metrics across the enterprise.
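The Architect's policy and the Auditor's checks meet in code like the following minimal sketch. The record fields, the policy thresholds, and the issuer names are all hypothetical; a real inventory would come from a CLM platform or a discovery scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertRecord:
    """One row of a certificate inventory (fields are illustrative)."""
    common_name: str
    key_type: str       # e.g. "rsa" or "ec"
    key_bits: int
    validity_days: int
    issuer: str


# Hypothetical policy values; the real ones are set by the PKI Architect.
MIN_KEY_BITS = {"rsa": 2048, "ec": 256}
MAX_VALIDITY_DAYS = 90
APPROVED_ISSUERS = {"Internal Issuing CA", "Let's Encrypt"}


def policy_violations(cert: CertRecord) -> list[str]:
    """Return a human-readable list of policy violations for one certificate."""
    problems = []
    minimum = MIN_KEY_BITS.get(cert.key_type)
    if minimum is None:
        problems.append(f"key type {cert.key_type!r} is not approved")
    elif cert.key_bits < minimum:
        problems.append(f"{cert.key_type} key of {cert.key_bits} bits is below {minimum}")
    if cert.validity_days > MAX_VALIDITY_DAYS:
        problems.append(f"validity of {cert.validity_days} days exceeds {MAX_VALIDITY_DAYS}")
    if cert.issuer not in APPROVED_ISSUERS:
        problems.append(f"issuer {cert.issuer!r} is not approved (possible rogue certificate)")
    return problems
```

Running `policy_violations` over every record in the inventory gives the Auditor a simple report of which certificates need attention and why.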

The "Carrot and Stick" Approach for DevOps Integration

One of the biggest challenges a Certificate Management Team faces is Shadow IT. When IT processes are slow, developers will bypass them. They will buy certificates on corporate credit cards or request unapproved Let's Encrypt certificates on their own to get their applications running.

The traditional IT response is the "stick"—blocking ports, writing strict policies, and punishing violators. The modern Certificate Management Team uses the "carrot."

By providing decentralized execution through self-service portals and API integrations, the team makes the secure, compliant path the easiest path for developers.

Automating Kubernetes with cert-manager

For example, instead of asking developers to generate CSRs (Certificate Signing Requests) for their Kubernetes workloads, the DevOps Liaison implements cert-manager. By configuring a centralized ClusterIssuer, developers can automatically provision trusted certificates simply by adding annotations to their Ingress resources.

Here is an example of how the team might configure an ACME ClusterIssuer using Let's Encrypt for external-facing workloads:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: pki-team@yourcompany.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx

With this infrastructure in place, a developer only needs to define their Ingress, and cert-manager handles the issuance, renewal, and attachment of the certificate entirely in the background.
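On the developer side, the only certificate-related pieces are one annotation and a `tls` block. Here is a minimal sketch that builds such an Ingress manifest as a Python dict and prints it as JSON. The `cert-manager.io/cluster-issuer` annotation is the one cert-manager documents for this purpose; the service name, host, port, and the `<name>-tls` secret naming are assumptions made for illustration.

```python
import json


def ingress_manifest(name: str, host: str, service: str, port: int,
                     cluster_issuer: str = "letsencrypt-prod") -> dict:
    """Build an Ingress that asks cert-manager for a certificate via annotation."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # Points cert-manager at the ClusterIssuer the team maintains.
            "annotations": {"cert-manager.io/cluster-issuer": cluster_issuer},
        },
        "spec": {
            "ingressClassName": "nginx",
            # cert-manager stores the issued certificate in this secret.
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service,
                                            "port": {"number": port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical application values.
    manifest = ingress_manifest("shop", "shop.yourcompany.com", "shop-svc", 8080)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest can be piped to `kubectl apply -f -`.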

Enforcing Technical Standards: The Move to Ephemeral Identities

To be effective, the Certificate Management Team must ruthlessly enforce modern technical standards. The days of 5-year internal certificates are over.

Standardizing on ACME and SCEP/EST

The team must mandate the use of ACME (Automated Certificate Management Environment) for all web servers and load balancers. For mobile devices, edge computing, and IoT, protocols like SCEP (Simple Certificate Enrollment Protocol) or EST (Enrollment over Secure Transport) must be the default.

Short-Lived Certificates via HashiCorp Vault

A core tenet of modern Machine Identity Management is moving toward short-lived certificates. Internal PKI should issue certificates valid for only 7 to 30 days. This forces the organization to fix broken automation rather than relying on manual intervention. If a certificate expires in 7 days, you must automate its renewal; human intervention is no longer a viable fallback.
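With lifetimes this short, the renewal decision itself has to be a rule that software can evaluate. Here is a minimal sketch, assuming renewal is triggered once a fixed fraction of the lifetime has elapsed; the two-thirds default is a common convention and not something this article prescribes.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction: float = 2 / 3) -> datetime:
    """Return the moment at which a certificate should be renewed."""
    if not_after <= not_before:
        raise ValueError("not_after must be later than not_before")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return not_before + (not_after - not_before) * fraction


def needs_renewal(now: datetime, not_before: datetime, not_after: datetime,
                  fraction: float = 2 / 3) -> bool:
    """True once the renewal point has been reached."""
    return now >= renewal_time(not_before, not_after, fraction)


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, tzinfo=timezone.utc)  # hypothetical issue date
    for days in (7, 30):
        expires = issued + timedelta(days=days)
        print(f"{days}-day certificate: renew at {renewal_time(issued, expires)}")
```

Renewing well before expiry leaves room for retries if the CA or the automation is temporarily unavailable, which matters more as lifetimes shrink.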

HashiCorp Vault is a widely used tool for issuing dynamic, short-lived internal certificates. The Certificate Management Team can use Terraform to enforce maximum Time-To-Live (TTL) policies across the organization.

Here is a Terraform snippet demonstrating how the team can configure a Vault PKI secret engine role that strictly limits certificate lifespans to 30 days:

resource "vault_pki_secret_backend_role" "internal_microservices" {
  # Assumes the PKI engine is mounted by a vault_mount resource named
  # "internal_ca" (type = "pki") defined elsewhere in the configuration.
  backend          = vault_mount.internal_ca.path
  name             = "microservice-mtls"

  # Enforce short-lived certificates (the provider documents these in seconds)
  ttl              = 2592000  # 30 days
  max_ttl          = 2592000  # Hard cap at 30 days

  # Security parameters
  allow_ip_sans    = true
  key_type         = "ec"
  key_bits         = 256

  # Allowed domains
  allowed_domains  = ["svc.cluster.local", "internal.yourcompany.com"]
  allow_subdomains = true

  # Require Common Name
  require_cn       = true
}

By enforcing a 30-day max_ttl at the infrastructure level, the team ensures that compromised keys are automatically rotated out of existence rapidly, laying the foundation for a true Zero Trust Architecture (ZTA) where Mutual TLS (mTLS) is enforced for all service-to-service communication.
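On the service side, enforcing mTLS means the server refuses any client that does not present a certificate signed by the internal CA. Here is a minimal sketch using Python's standard `ssl` module. The file paths are placeholders for certificates issued by the internal PKI, and they are optional here only so the policy settings can be inspected without key material.

```python
import ssl
from typing import Optional


def mtls_server_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject the handshake unless the client presents a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file is not None:
        # The server's own short-lived certificate and private key.
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file is not None:
        # Trust only the internal CA when validating client certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

A long-running service would rebuild or reload this context whenever its certificate is renewed, since a 30-day certificate will be replaced many times over the life of a process.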

The Tech Stack: Command Centers and Expiration Tracking

A modern team cannot function without the right technology stack. Relying on spreadsheets to track expiration dates leads to human error and inevitable outages. You need a mix of enterprise command centers and agile execution tools.

The Command Center

Enterprise Certificate Lifecycle Management (CLM) platforms act as the single pane of glass. Tools like Venafi, Keyfactor, and AppViewX aggregate telemetry from cloud-native secret managers (AWS ACM, Azure Key Vault, Google CAS) into one dashboard. They allow the PKI Architect to enforce policies globally and the Security Analyst to audit compliance instantly.

Continuous Discovery and Alerting with Expiring.at

However, before you can automate everything, you must have absolute visibility into what is currently deployed. The reality for most teams is a chaotic mix of legacy systems, multi-cloud deployments, and shadow IT.

This is where Expiring.at becomes a foundational tool for the modern Certificate Management Team. Instead of manually scanning networks or waiting for a user to complain about a broken website, Expiring.at provides continuous, automated expiration tracking.

By integrating Expiring.at into your workflow, the team gains proactive alerting via Slack, email, or webhooks before a 90-day TLS certificate unexpectedly drops. It acts as the ultimate safety net, ensuring that even systems that haven't yet been migrated to ACME automation are monitored with precision. When building a team, establishing this baseline visibility is step one; you cannot automate or secure what you cannot see.
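To illustrate what expiration tracking involves at its simplest, here is a generic sketch built on Python's standard library. It is not how Expiring.at or any other product is implemented; the 30-day alert threshold and the function names are assumptions.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the certificate's notAfter string.

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLCertVerificationError instead of returning.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: datetime) -> float:
    """Days until expiry; not_after looks like 'Jun 15 12:00:00 2030 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def should_alert(not_after: str, now: datetime, threshold_days: float = 30) -> bool:
    """True when the certificate expires within the alert threshold."""
    return days_remaining(not_after, now) <= threshold_days
```

A scheduled job could call `fetch_not_after` for each known endpoint and send a notification whenever `should_alert` returns true, treating a verification error as an alert in its own right.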

Preparing for the Post-Quantum Transition

Building this team today is not just about solving the 90-day TLS problem; it is about preparing for the largest cryptographic migration in the history of the internet.

Quantum computers capable of breaking current RSA and ECC encryption algorithms
