The 90-Day TLS Mandate is Coming: A Complete Guide to Managing Certificates in Kubernetes
In 2023, a single expired ground station certificate triggered a global outage for Starlink. While not a Kubernetes incident, it is the ultimate cautionary tale for modern infrastructure: manual certificate tracking at scale is operationally untenable.
In Kubernetes (K8s) environments, the scale and velocity of certificate management have fundamentally changed. Unlike traditional infrastructure where a monolithic server held a static certificate for years, Kubernetes workloads are highly ephemeral. Pods spin up and down in seconds, requiring dynamic, automated certificate provisioning for both external ingress and internal pod-to-pod communication.
The stakes are getting higher. With Google's proposal to reduce the maximum validity of public TLS certificates from 398 days to just 90 days, the era of manual certificate rotation is effectively over.
This comprehensive guide explores the architectural shifts, technical implementations, and best practices required to build a resilient, automated certificate management strategy in Kubernetes.
The Fundamental Shift in Kubernetes Certificate Management
Managing TLS in a microservices architecture introduces unique challenges. A single cluster might run thousands of pods, each requiring cryptographic identity. Historically, teams struggled with certificate sprawl, where a single forgotten ingress certificate could take down an entire customer-facing application.
Today, the industry is experiencing a massive shift toward hyper-automation, driven by three primary factors:
- The 90-Day TLS Validity Mandate: Google's push to mandate 90-day lifespans for public certificates forces organizations to adopt the Automated Certificate Management Environment (ACME) protocol. If you are manually rotating certificates, your operational overhead is about to quadruple.
- Zero Trust and Workload Identity: Modern security architectures assume the internal network is already compromised. Technologies like SPIFFE and SPIRE are replacing static IP-based security with short-lived, cryptographically verifiable identities (SVIDs) for every workload.
- Strict Compliance Requirements: Frameworks like PCI-DSS v4.0 and US Executive Order 14028 mandate stronger cryptography, faster key rotation, and Zero Trust architectures. Automated K8s certificate management is now a strict audit requirement.
North-South vs. East-West: Architecting Your K8s TLS Strategy
To master Kubernetes certificate management, you must separate your strategy into two distinct traffic flows: North-South and East-West.
Securing North-South Traffic (Ingress)
North-South traffic refers to data entering and exiting your Kubernetes cluster from the outside world. This requires public trust.
For Ingress, you rely on public Certificate Authorities (CAs) like Let's Encrypt or DigiCert. The industry standard for managing this in Kubernetes is cert-manager, a CNCF graduated project that acts as a centralized control plane for your cluster's certificates. It integrates directly with your Ingress controllers (like NGINX, Traefik, or the newer Gateway API) to automatically request, provision, and renew public certificates.
Securing East-West Traffic (Pod-to-Pod)
East-West traffic is the internal communication between your microservices. Exposing public certificates to internal traffic is an anti-pattern. Instead, internal traffic should be secured using mutual TLS (mTLS) backed by a private PKI.
For East-West traffic, Service Meshes like Istio, Linkerd, or Consul handle the heavy lifting. They operate as subordinate CAs, issuing extremely short-lived certificates (often valid for just hours) to sidecar proxies or node-level eBPF data planes. This limits the blast radius; if an internal private key is compromised, it becomes useless almost immediately.
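With Istio, for example, a single mesh-wide PeerAuthentication policy is enough to enforce mTLS between all workloads; a minimal sketch:

```yaml
# Require mTLS for all pod-to-pod traffic in the mesh.
# Placing this in the Istio root namespace (istio-system) makes it mesh-wide.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    # STRICT rejects any plaintext connection between workloads
    mode: STRICT
```

The sidecars (or node agents) then handle certificate issuance and rotation transparently, with no application changes.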
The "Secret" Problem: Securing Certificates at Rest
A critical vulnerability in many Kubernetes deployments is how certificates and private keys are stored. By default, Kubernetes Secrets are merely base64 encoded—they are not encrypted at rest. If an attacker gains access to your etcd datastore, they can easily decode and steal your private keys.
To solve this, modern architectures utilize the External Secrets Operator (ESO). ESO bridges your Kubernetes cluster with enterprise secret management systems like HashiCorp Vault or AWS Secrets Manager.
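An ExternalSecret resource tells ESO what to sync and where; a sketch assuming a ClusterSecretStore named vault-backend and a Vault KV entry at secret/data/app-tls (both hypothetical names):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-tls
spec:
  # Re-sync from the external store every hour
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    # The Kubernetes Secret ESO creates and keeps in sync
    name: app-tls-secret
    template:
      type: kubernetes.io/tls
  data:
    - secretKey: tls.crt
      remoteRef:
        key: secret/data/app-tls
        property: certificate
    - secretKey: tls.key
      remoteRef:
        key: secret/data/app-tls
        property: private_key
```

The private key never lives in Git or in a manually created Secret; Vault remains the source of truth.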
Instead of storing the certificate directly in K8s, cert-manager can be configured via Container Storage Interface (CSI) drivers to fetch certificates dynamically from Vault and inject them directly into the pod's filesystem, bypassing etcd entirely. At a minimum, enable the Kubernetes KMS (Key Management Service) provider so that all Secrets stored in etcd are encrypted at rest.
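A minimal KMS v2 EncryptionConfiguration for the API server might look like the following (the provider name and socket path depend on your KMS plugin and are illustrative):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # Envelope-encrypt Secrets using an external KMS plugin
      - kms:
          apiVersion: v2
          name: my-kms-provider
          endpoint: unix:///var/run/kms-plugin/socket.sock
          timeout: 3s
      # Fallback so previously unencrypted Secrets can still be read
      - identity: {}
```

This file is passed to the kube-apiserver via the --encryption-provider-config flag; existing Secrets must be rewritten once to pick up encryption.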
Step-by-Step: Automating Ingress Certificates with cert-manager
To understand how this looks in practice, let's implement a fully automated North-South certificate pipeline using cert-manager and Let's Encrypt.
Step 1: Define the ClusterIssuer
First, we create a ClusterIssuer custom resource (defined by cert-manager's CRDs). This tells cert-manager how to communicate with Let's Encrypt using the HTTP-01 challenge.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: security@yourdomain.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
Step 2: Configure the Ingress Resource
Once the issuer is established, developers only need to add a single annotation to their Ingress resources. cert-manager will detect this annotation, automatically spin up a temporary pod to solve the ACME challenge, and populate the resulting certificate into a Kubernetes Secret.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-application-ingress
  annotations:
    # This annotation triggers cert-manager
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.yourdomain.com
      # cert-manager will store the created certificate in this secret
      secretName: web-application-tls-secret
  rules:
    - host: app.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-application-service
                port:
                  number: 80
With this configuration, cert-manager renews the certificate automatically once two-thirds of its lifetime has elapsed (about 30 days before expiration for a 90-day Let's Encrypt certificate), requiring zero human intervention.
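If you need tighter control than the defaults, a standalone Certificate resource lets you set the renewal window explicitly instead of relying on the Ingress annotation:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: web-application-cert
spec:
  secretName: web-application-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - app.yourdomain.com
  # 90-day certificate, renewed 30 days before expiry
  duration: 2160h
  renewBefore: 720h
```

This is useful when multiple Ingresses share one certificate, or when your issuer supports non-default lifetimes.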
5 Best Practices for Kubernetes Certificate Lifecycle Management
Implementing the technology is only half the battle. To maintain a secure and resilient environment, adhere to these industry best practices.
1. Separate CAs for Internal and External Traffic
Never use Let's Encrypt or other public CAs for internal pod-to-pod communication. Public certificates are subject to rate limits and external validation failures. Use a robust internal PKI (like HashiCorp Vault) for your internal mesh, and reserve public CAs strictly for Ingress.
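With cert-manager, a Vault-backed ClusterIssuer keeps internal issuance cleanly separated from your public ACME issuer; a sketch assuming a Vault PKI signing role at pki_int/sign/internal-services and Kubernetes auth (both paths are hypothetical):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  vault:
    server: https://vault.internal:8200
    # Vault PKI endpoint that signs certificate requests
    path: pki_int/sign/internal-services
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        secretRef:
          name: cert-manager-vault-token
          key: token
```

Workloads then reference internal-ca, while public Ingresses keep using your ACME issuer, and neither can accidentally consume the other's quota or trust chain.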
2. Enforce Short-Lived Certificates
Do not issue internal certificates valid for years. In a Zero Trust architecture, internal mTLS certificates should live for hours or days. Enterprise teams like Robinhood have successfully used SPIFFE/SPIRE to reduce provisioning times from days to milliseconds, securing thousands of ephemeral workloads with short-lived, constantly rotating identities.
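In cert-manager terms, a short-lived internal mTLS certificate might look like this (the issuer and service names are illustrative):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payments-mtls
spec:
  secretName: payments-mtls-secret
  # 24-hour lifetime, rotated every 16 hours
  duration: 24h
  renewBefore: 8h
  usages:
    - client auth
    - server auth
  dnsNames:
    - payments.internal.svc.cluster.local
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
```

A leaked key from this certificate is worthless within a day, without any revocation machinery.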
3. Never Store Certificates in Git
If your team uses GitOps workflows with tools like ArgoCD or Flux, it can be tempting to commit Kubernetes Secrets containing TLS certificates directly into your repositories. Never do this. Even in private repositories, hardcoded secrets are a massive security risk. If you must store secret state in Git, use encryption tools like SOPS or Bitnami Sealed Secrets.
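A SealedSecret, for example, is safe to commit because only the in-cluster controller holds the decryption key. The encryptedData values below are placeholders standing in for real kubeseal output:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: web-application-tls-secret
  namespace: production
spec:
  encryptedData:
    # Ciphertext produced by `kubeseal`; decryptable only by the
    # sealed-secrets controller running in this cluster
    tls.crt: AgB3...placeholder...
    tls.key: AgA9...placeholder...
  template:
    type: kubernetes.io/tls
```

The controller decrypts this into a normal kubernetes.io/tls Secret at apply time, so your GitOps tool never touches plaintext key material.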
4. Prepare for the "Chicken and Egg" Bootstrapping Problem
How does a brand-new Kubernetes node securely authenticate to get its very first certificate? Relying on static tokens is insecure. Utilize Trusted Platform Modules (TPMs) and cloud-provider IAM roles to securely bootstrap node identities before any workload certificates are issued.
5. Implement Independent Expiration Tracking and Alerting
Automation is incredible—until it fails. cert-manager might be configured perfectly, but Let's Encrypt rate limits, DNS propagation errors, or misconfigured webhooks can silently halt the renewal process. If you rely solely on your automation to tell you it's working, you will eventually suffer an outage.
This is where Expiring.at provides a critical safety net. By implementing external, independent expiration tracking, you gain visibility into the actual state of your live endpoints. Expiring.at monitors your external-facing Ingress endpoints and alerts your team via Slack, email, or webhooks well before an automation failure turns into a production outage. It acts as the ultimate "trust but verify" layer for your automated certificate pipelines.
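If you also run the Prometheus Operator in-cluster, an alert on cert-manager's expiry metric adds one more independent layer; a sketch assuming cert-manager's metrics endpoint is already being scraped:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-expiry-alerts
spec:
  groups:
    - name: certificates
      rules:
        - alert: CertificateExpiringSoon
          # Fires when any cert-manager-managed certificate has
          # less than 14 days of validity remaining
          expr: (certmanager_certificate_expiration_timestamp_seconds - time()) < 14 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.name }} expires in under 14 days"
```

Note that this only sees certificates cert-manager knows about; external probing of live endpoints remains the final backstop.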
Future-Proofing: eBPF and Post-Quantum Cryptography
The Kubernetes networking landscape is evolving rapidly, and your certificate strategy must adapt.
We are currently seeing a shift toward sidecar-less service meshes. Technologies leveraging eBPF, such as Cilium and Istio Ambient Mesh, are moving mTLS termination out of individual pod sidecars and down to the node level. This drastically reduces resource overhead while maintaining strict cryptographic identities for workloads.
Furthermore, with NIST finalizing Post-Quantum Cryptography (PQC) standards, K8s administrators must prioritize "crypto-agility." This means designing your certificate management pipelines so that underlying cryptographic algorithms (e.g., swapping from RSA/ECC to quantum-resistant algorithms like Kyber) can be updated globally via your control plane without requiring application code changes.
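cert-manager already expresses this declaratively: the key algorithm is a field on the Certificate resource, so a future algorithm swap becomes a manifest change rather than an application change. PQC algorithms are not yet supported there; the RSA-to-ECDSA swap below simply illustrates the mechanism:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: web-application-cert
spec:
  secretName: web-application-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - app.yourdomain.com
  privateKey:
    # Changing this field (e.g. RSA -> ECDSA, or a future PQC
    # algorithm) rotates the key fleet-wide via the control plane
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always
```

Pipelines built this way can absorb NIST's PQC transition as a configuration rollout instead of a migration project.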
Conclusion
The days of manually updating spreadsheets with certificate expiration dates are over.