The End of Static PKI: Container Certificate Management Best Practices for 2025

If you are manually managing certificates in Kubernetes, you are already setting yourself up for an outage.

Tim Henrich
March 10, 2026

In traditional IT infrastructure, servers lived for years, and Public Key Infrastructure (PKI) was a relatively static affair. Security teams would provision TLS certificates with one- or two-year lifespans, track them in a spreadsheet, and rely on calendar reminders for manual renewals.

Today, the shift toward cloud-native, microservice-based architectures has fundamentally broken this model. Containers are ephemeral—often living for days, hours, or even minutes. Furthermore, recent industry reports reveal that machine identities (certificates, keys, and secrets) now outnumber human identities by a staggering factor of 45 to 1.

Adding fuel to the fire is the shrinking lifetime of public TLS certificates. Google proposed cutting the maximum validity to just 90 days, and in 2025 the CA/Browser Forum voted to phase the limit down to 47 days by 2029. While these changes target public trust, they have created a massive trickle-down effect for internal PKI. Security teams are now routinely scoping internal container certificates to 24 hours or less to minimize the blast radius of compromised keys.

Managing these identities in containerized environments requires a complete paradigm shift from manual, ticket-based provisioning to hyper-automated, policy-driven, and short-lived certificate lifecycles. In this guide, we will explore the maturity model of container certificate management, examine real-world case studies, and provide actionable best practices for securing your Kubernetes environments.

The Cost of Visibility Blind Spots: The "Hidden Ingress" Outage

Before diving into solutions, it is crucial to understand the stakes. In late 2023, a major SaaS provider suffered a catastrophic four-hour global outage. The root cause was not a complex network partition or a database corruption—it was a single expired TLS certificate on a Kubernetes Ingress controller.

The certificate had been manually provisioned by a developer who had since left the company. Because it was an internal ingress routing traffic between two legacy microservices, it bypassed the company's central PKI dashboard. When the certificate expired, the microservices immediately dropped all mutual TLS (mTLS) connections, causing a cascading failure across the entire application stack.

This scenario highlights a critical reality: unmanaged Kubernetes certificates are ticking time bombs. Without a centralized control plane and automated lifecycle management, human error is inevitable.

The Container Certificate Maturity Model

To prevent outages and secure container communications, organizations must evolve their certificate management strategies. We can categorize this evolution into three distinct levels of maturity.

Level 1: The Anti-Pattern (Baked-in Certificates)

In the earliest stages of container adoption, developers often bake certificates and private keys directly into container images via Dockerfiles.

# WARNING: DO NOT DO THIS
FROM nginx:alpine
COPY ./certs/production-cert.pem /etc/ssl/certs/
COPY ./certs/production-key.pem /etc/ssl/private/

This approach creates immediate security and operational problems. It leads to secret sprawl, because private keys are pushed to image registries and potentially exposed in source control, and anyone who can pull the image can extract them. Just as importantly, it makes certificate rotation impossible without rebuilding and redeploying the entire application, which sits poorly with the key-management expectations of modern compliance frameworks like PCI-DSS v4.0.

Level 2: Native Kubernetes Automation with cert-manager

The first step toward a mature PKI strategy is decoupling certificates from the application code and leveraging native Kubernetes automation. The industry standard for this is cert-manager, a CNCF graduated project.

cert-manager operates using Custom Resource Definitions (CRDs). You define an Issuer (or ClusterIssuer) that dictates how certificates are signed—connecting to Certificate Authorities like Let's Encrypt for public endpoints, or HashiCorp Vault for internal PKI.
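For the public-endpoint case, a ClusterIssuer backed by Let's Encrypt might look roughly like the sketch below. The issuer name, contact email, account-key Secret name, and the nginx ingress class are placeholders chosen for illustration, and the HTTP-01 solver is only one of the challenge types cert-manager supports.

```yaml
# Illustrative sketch: names, email, and ingress class are assumptions.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production ACME directory
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform-team@example.com
    # Secret in which cert-manager stores the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      # Prove domain ownership by serving a token through the ingress controller
      - http01:
          ingress:
            ingressClassName: nginx
```

Pointing the server field at the Let's Encrypt staging directory first is a common way to test the flow without hitting production rate limits.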

Instead of manually generating a CSR, you define a Certificate resource. cert-manager requests the certificate and stores it, together with its private key, in a Kubernetes Secret that your pods reference as a volume. Crucially, it also tracks the expiration date and renews the certificate before it expires.

Here is a practical example of a cert-manager configuration using an internal HashiCorp Vault issuer:

---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
spec:
  vault:
    server: https://vault.internal.example.com:8200
    path: pki_int/sign/kubernetes-workloads
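    auth:
      kubernetes:
        role: cert-manager
        # Path where the Kubernetes auth method is enabled in Vault
        mountPath: /v1/auth/kubernetes
        # Secret holding a Kubernetes ServiceAccount token (a JWT) that is
        # presented to Vault; despite its name, it is not a Vault token
        secretRef:
          name: vault-token
          key: token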
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: microservice-a-tls
  namespace: production
spec:
  # Issue a short-lived certificate valid for only 24 hours
  duration: 24h
  # Begin renewal 8 hours before expiry, roughly 16 hours after issuance
  renewBefore: 8h
  secretName: microservice-a-tls-secret
  issuerRef:
    name: vault-issuer
    kind: ClusterIssuer
  commonName: microservice-a.production.svc.cluster.local
  dnsNames:
    - microservice-a.production.svc.cluster.local

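With this configuration, the application container mounts microservice-a-tls-secret as a volume. The developer writes no certificate-issuance code, and the infrastructure handles the 24-hour rotation cycle automatically. The one responsibility left to the application (or its proxy) is to reload the certificate files when they change, because a process that reads them only at startup keeps serving the old certificate until it restarts.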
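Mounting the Secret could look something like the following Pod sketch. The Pod name, image, and mount path are hypothetical; only the Secret name comes from the Certificate above.

```yaml
# Illustrative sketch: Pod name, image, and mountPath are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: microservice-a
  namespace: production
spec:
  containers:
    - name: app
      image: registry.example.com/microservice-a:1.0.0  # placeholder image
      volumeMounts:
        - name: tls
          # tls.crt and tls.key (plus ca.crt when the issuer supplies it) appear here
          mountPath: /etc/tls
          readOnly: true
  volumes:
    - name: tls
      secret:
        # Secret written and renewed by cert-manager
        secretName: microservice-a-tls-secret
```

Mounting the whole volume, rather than individual files via subPath, matters here: Kubernetes refreshes the contents of a Secret volume after the Secret changes, but subPath mounts do not receive those updates.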

Level 3: Zero Trust and Service Mesh Integration

While cert-manager is excellent for Ingress controllers and basic workloads, achieving true Zero Trust Architecture (ZTA) requires Level 3 maturity: implementing Mutual TLS (mTLS) for every single container-to-container communication.

Relying on network perimeters is no longer sufficient. If an attacker breaches a single pod, they should not be able to sniff traffic or impersonate other services on the internal cluster network.

This is where Service Meshes like Istio or Linkerd, combined with identity frameworks like SPIFFE and SPIRE, come into play.

In a Service Mesh architecture, a sidecar proxy (like Envoy) is injected into every pod. The mesh control plane, or SPIRE when it is integrated as the identity provider, acts as a certificate authority: it issues each workload a cryptographic identity (an SVID in SPIFFE terms), delivers it to the sidecar, and rotates it automatically, in some deployments as often as every hour. The application container sends and receives plain HTTP, while the sidecar proxy transparently intercepts that traffic inside the pod, encrypts it, and handles the mTLS handshakes.
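As one illustration, assuming Istio is the mesh in use and sidecars are already injected into the namespace, a PeerAuthentication policy along these lines asks the sidecars to reject any plaintext traffic. The production namespace is carried over from the earlier example; other meshes express the same intent with their own resources.

```yaml
# Illustrative sketch, assuming Istio with sidecar injection enabled.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  # A policy named "default" with no selector applies to the whole namespace
  name: default
  namespace: production
spec:
  mtls:
    # STRICT: workloads accept only mutual TLS connections
    mode: STRICT
```

Teams migrating an existing namespace often start with mode PERMISSIVE, which accepts both plaintext and mTLS, and switch to STRICT once every client has a sidecar.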

Core Best Practices for Container Certificates

Whether you are at Level 2 or Level 3 of the maturity model, adhering to the following technical best practices is essential for a secure, compliant container environment.

1. Embrace Ultra-Short-Lived Certificates

Traditional revocation mechanisms like Certificate Revocation Lists (CRLs) and Online Certificate Status Protocol (OCSP) are notoriously slow, unreliable, and difficult to scale in highly dynamic Kubernetes clusters.

The modern alternative to revocation is to make certificates expire so quickly that revocation becomes largely unnecessary. If a container's private key is compromised, a certificate valid for only a few hours bounds how long the attacker can exploit it. Aim to reduce internal certificate lifespans from years to days, and eventually to hours.

2. Never Touch the Disk

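Private keys should be generated in memory and never written to persistent storage. In Kubernetes, the values in a standard Secret are only base64-encoded, and unless encryption at rest is enabled they are stored unencrypted in etcd, where they can be exposed if the datastore or its backups are compromised.

To mitigate this, enable encryption at rest for Secrets and keep key material in memory-backed volumes on the node. Kubernetes already backs Secret volumes with tmpfs (an in-memory file system), so the main risk is copying keys from there onto a container's writable layer or a persistent volume. Alternatively, utilize the Secrets Store CSI Driver to fetch certificates directly from external enterprise vaults (like AWS Secrets Manager or Azure Key Vault) and mount them as ephemeral, memory-only volumes that disappear the moment the pod is destroyed.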
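A Pod that consumes certificates through the Secrets Store CSI Driver might be declared roughly as follows. This sketch assumes the driver and a provider plugin are installed in the cluster and that a SecretProviderClass named internal-tls-certs (a hypothetical name) already describes which objects to fetch; that resource is provider-specific and is not shown.

```yaml
# Illustrative sketch: Pod name, image, mountPath, and the
# SecretProviderClass name are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: microservice-a
  namespace: production
spec:
  containers:
    - name: app
      image: registry.example.com/microservice-a:1.0.0  # placeholder image
      volumeMounts:
        - name: external-certs
          mountPath: /mnt/certs
          readOnly: true
  volumes:
    - name: external-certs
      csi:
        # Inline ephemeral volume served by the Secrets Store CSI Driver
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          # Hypothetical SecretProviderClass defined elsewhere in this namespace
          secretProviderClass: internal-tls-certs
```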

3. Rotate Keys, Not Just Certificates

A common mistake in automated PKI is renewing the certificate while reusing the exact same private key. A key that has already leaked then stays valid across every renewal, which undermines much of the point of rotating. Ensure your automation tools are configured to generate a fresh private key for every renewal cycle. In cert-manager, this is controlled by the spec.privateKey.rotationPolicy field of your Certificate resource; set it to Always explicitly rather than relying on the default, which has changed between cert-manager releases.
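Applied to the earlier microservice-a-tls Certificate, the setting sits under spec.privateKey. The ECDSA algorithm and 256-bit size shown here are illustrative choices, and they assume the Vault role behind vault-issuer permits EC keys.

```yaml
# Illustrative sketch based on the Certificate shown earlier;
# the key algorithm and size are assumptions.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: microservice-a-tls
  namespace: production
spec:
  duration: 24h
  renewBefore: 8h
  secretName: microservice-a-tls-secret
  privateKey:
    algorithm: ECDSA
    size: 256
    # Generate a new private key on every renewal instead of reusing the old one
    rotationPolicy: Always
  issuerRef:
    name: vault-issuer
    kind: ClusterIssuer
  dnsNames:
    - microservice-a.production.svc.cluster.local
```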

4. Prepare for Crypto-Agility

In August 2024, NIST finalized the first set of Post-Quantum Cryptography (PQC) standards, including algorithms like ML-KEM. As quantum computing advances, traditional RSA and ECC algorithms will become vulnerable.

Container environments must be audited for "crypto-agility"—the ability to swap out cryptographic algorithms via automated rotation without application downtime. By fully automating your PKI now, transitioning to quantum-resistant algorithms in the future becomes a simple configuration change rather than a multi-year engineering project.

Case Study: Fintech Zero Trust Success

Consider the case of a global payment processor that recently overhauled its container security posture. Operating across 50+ Kubernetes clusters in a hybrid-cloud environment, they were struggling with secret sprawl and failing PCI-DSS compliance audits due to manual tracking of 1-year static certificates.

The engineering team implemented SPIRE for workload identity federation and Istio for network routing. By moving from 1-year static certificates to 1-hour automated mTLS certificates handled entirely by Envoy sidecars, they achieved several massive wins:

  1. Eliminated Secret Sprawl: Developers no longer had access to production private keys.
  2. Compliance Automation: The centralized SPIRE server provided perfect audit logs of exactly who issued a certificate, what workload used it, and when it was rotated, immediately satisfying PCI-DSS v4.0 requirements.
  3. Reduced Blast Radius: By rotating keys every hour, the window in which a stolen key remained usable shrank from up to a year to at most about an hour.

The Visibility Mandate: Trust, but Verify

Automation is the foundation of modern container PKI, but automation without visibility is dangerous. Tools like cert-manager and Istio are incredibly robust, but they can and do fail. An ACME DNS challenge might time out, an internal HashiCorp Vault cluster might experience a split-brain scenario, or a misconfigured RBAC policy might prevent a pod from fetching its renewed secret.

When the automation fails silently, you are back to the "Hidden Ingress" outage scenario. You must monitor the actual expiration dates of the certificates currently in use, independent of the systems responsible for renewing them.
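One way to get such an independent signal, offered here as an assumption rather than something this article prescribes, is a Prometheus alerting rule over the probe_ssl_earliest_cert_expiry metric that the Prometheus blackbox exporter reports when it probes a TLS endpoint from the outside. The group name, alert name, and four-hour threshold below are illustrative.

```yaml
# Illustrative sketch, assuming Prometheus scrapes a blackbox exporter
# that probes the TLS endpoints in question.
groups:
  - name: tls-expiry
    rules:
      - alert: TLSCertificateNotRenewed
        # Seconds until the served certificate expires, compared with 4 hours
        expr: probe_ssl_earliest_cert_expiry - time() < 4 * 3600
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Certificate served by {{ $labels.instance }} expires in under 4 hours"
```

The threshold has to sit below the renewal window. For the 24-hour certificate with renewBefore: 8h shown earlier, a healthy certificate never has fewer than about 8 hours left, so dropping under 4 hours indicates that renewal has failed. A threshold measured in days would fire permanently for certificates that short-lived.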

This is where a dedicated expiration tracking platform becomes invaluable. By utilizing Expiring.at, DevOps and security teams can establish an independent layer of verification. Rather than hoping your internal automation fired correctly, you can actively monitor your external endpoints, ingress controllers, and API gateways.

Expiring.at provides centralized visibility and proactive alerting across your entire infrastructure. If a Kubernetes automated renewal fails, Expiring.at will alert your team via Slack, email, or webhook days before the expiration actually impacts your users, bridging the gap between automated issuance and guaranteed uptime.

Conclusion

The era of manual certificate management and long-lived static PKI
