Why 90 Days is Too Long: The New Rules of Container Certificate Management
The shift to cloud-native architectures has fundamentally broken traditional Public Key Infrastructure (PKI). In the era of monolithic applications, provisioning a single SSL/TLS certificate valid for a year was standard operating procedure. Today, when Google is actively pushing the industry to reduce public TLS certificate lifespans to 90 days, enterprise infrastructure teams are realizing that for internal containerized environments, even 90 days is a massive security liability.
In modern Kubernetes environments, pods scale up and down in seconds. A certificate tied to a specific IP address becomes invalid almost immediately. Consequently, container certificates are now frequently issued with lifespans measured not in months, but in hours or even minutes.
Managing this hyper-ephemeral cryptographic landscape requires moving beyond basic "certificate tracking." Success in 2024 and 2025 demands cryptographic agility, automated workload identity, and strict Zero Trust mutual TLS (mTLS) enforcement. Here are the technical best practices and implementation strategies for mastering container certificate management.
The Ephemerality Problem: Rethinking Workload Identity
The core challenge of container PKI is the clash between static security constructs and dynamic infrastructure. Manual certificate provisioning—generating a Certificate Signing Request (CSR), sending it to a CA, and waiting for a signed response—can take days. Containers often live for less time than it takes to open a Jira ticket.
Furthermore, traditional certificates identify a host by IP address or static hostname, and in Kubernetes pod IP addresses are ephemeral. To solve this, much of the cloud-native ecosystem has converged on the Secure Production Identity Framework for Everyone (SPIFFE).
Instead of tying certificates to network locations, SPIFFE ties them to workload identities. A SPIFFE ID (e.g., spiffe://example.org/ns/payments/sa/billing-service) is embedded as a URI in the Subject Alternative Name (SAN) of a short-lived X.509 certificate. This ensures that no matter where a container spins up, its identity remains cryptographically verifiable.
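To make the structure of a SPIFFE ID concrete, here is a minimal Python sketch that splits the example ID above into its trust domain and workload path, then authorizes a peer by identity rather than by address. The function names and the allow-list are hypothetical, and a real deployment would rely on a SPIFFE library or the mesh itself rather than hand-rolled parsing.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, workload_path)."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    if not parts.netloc:
        raise ValueError("SPIFFE ID has no trust domain")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs may not carry a query or fragment")
    return parts.netloc, parts.path


def is_allowed_peer(spiffe_id: str, trust_domain: str, allowed_paths: set[str]) -> bool:
    """Authorize a peer by its identity, never by IP address or hostname."""
    try:
        peer_domain, peer_path = parse_spiffe_id(spiffe_id)
    except ValueError:
        return False
    return peer_domain == trust_domain and peer_path in allowed_paths


billing = "spiffe://example.org/ns/payments/sa/billing-service"
print(parse_spiffe_id(billing))
print(is_allowed_peer(billing, "example.org", {"/ns/payments/sa/billing-service"}))
```

Because the check depends only on the trust domain and path, it gives the same answer regardless of which node or IP address the workload lands on.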
Core Principle 1: Implement Zero-Touch Automation
To handle certificates that expire in 24 hours or less, human intervention must be entirely removed from the provisioning lifecycle. cert-manager, a CNCF graduated project, has become the de facto standard for Kubernetes certificate automation.
However, simply installing cert-manager isn't enough; you must architect a secure trust chain. Never issue workload certificates directly from the Root CA, which should stay offline. Instead, have the root sign an intermediate CA hosted in a trust broker like HashiCorp Vault, and integrate cert-manager with that intermediate to dynamically issue short-lived certificates.
Here is a practical example of configuring a ClusterIssuer that authenticates to Vault using a Kubernetes Service Account token:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
spec:
  vault:
    server: https://vault.internal.example.com:8200
    path: pki_int/sign/kubernetes-workloads
    auth:
      kubernetes:
        mountPath: /v1/auth/kubernetes
        role: cert-manager
        secretRef:
          name: issuer-token-secret
          key: token
With the issuer established, developers can request hyper-ephemeral certificates declaratively alongside their application deployments. Notice the aggressive duration and renewBefore values in this Certificate resource:
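apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: billing-service-tls
  namespace: payments
spec:
  secretName: billing-service-tls-secret
  duration: 1h # Certificate lives for only 1 hour
  renewBefore: 15m # Automation attempts renewal 15 minutes before expiration
  subject:
    organizations:
      - example-corp
  # cert-manager rejects a Certificate that has no commonName or SAN.
  # The SPIFFE ID from earlier is used here as a URI SAN; this assumes
  # the Vault PKI role is configured to allow it.
  uris:
    - spiffe://example.org/ns/payments/sa/billing-service
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  issuerRef:
    name: vault-issuer
    kind: ClusterIssuer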
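The interaction between duration and renewBefore is simple arithmetic, sketched below in Python. This is a simplified model that assumes renewal is first attempted at expiry minus renewBefore; the function name and the issuance timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def renewal_schedule(issued_at: datetime, duration: timedelta, renew_before: timedelta):
    """Return (renew_at, expires_at) for a certificate."""
    if renew_before >= duration:
        raise ValueError("renewBefore must be shorter than duration")
    expires_at = issued_at + duration
    return expires_at - renew_before, expires_at


issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical issuance time
renew_at, expires_at = renewal_schedule(issued, timedelta(hours=1), timedelta(minutes=15))
print(renew_at.isoformat())    # 2025-01-01T12:45:00+00:00
print(expires_at.isoformat())  # 2025-01-01T13:00:00+00:00
```

With a 1h duration and a 15m renewBefore, the automation has a 15-minute window to retry before workloads are left holding an expired certificate, which is why the health of the issuer matters so much at these lifetimes.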
Core Principle 2: Enforce mTLS via the Service Mesh
Under zero-trust frameworks like NIST SP 800-207, network location does not imply trust. Every container-to-container connection must be encrypted and authenticated. However, a critical best practice is to decouple PKI from application code. Developers should not be writing cryptographic validation logic or handling OpenSSL libraries within their microservices.
Instead, push certificate management down to the infrastructure layer using a Service Mesh like Istio or an eBPF-based solution like Cilium.
In a modern mesh architecture, the control plane automatically requests short-lived certificates from your CA and delivers them to the Envoy sidecar proxies (in a sidecar-free design such as Cilium's, a per-node agent handles authentication and the eBPF datapath enforces the result). When Pod A attempts to communicate with Pod B, the proxies intercept the traffic, present their certificates, validate each other's certificate chain against the trusted CA bundle, check the peer's SPIFFE ID, and establish an mTLS tunnel, all completely transparent to the application code.
This approach also helps with compliance. PCI DSS v4.0 requires strong cryptography for cardholder data transmitted over open, public networks (Requirement 4), and many organizations and assessors extend that expectation to traffic between internal microservices. A service mesh provides a uniform, auditable mechanism to demonstrate that this traffic is encrypted.
Core Principle 3: Eradicate Plaintext Secrets
One of the most common security anti-patterns in container environments is "Secret Sprawl." By default, Kubernetes Secrets are merely base64 encoded, not encrypted. Storing manually generated TLS private keys in default Kubernetes secrets or, worse, hardcoding them into container images is a massive vulnerability.
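The point about base64 is easy to demonstrate. The following Python sketch, using placeholder bytes in place of a real private key and a hypothetical helper name, shows that anyone who can read a Secret's data field can recover the key with no password or decryption key.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, bytes]:
    """Recover the raw values from a Secret's base64-encoded `data` map."""
    return {name: base64.b64decode(value) for name, value in data.items()}


# Placeholder bytes standing in for a real private key (hypothetical value).
key_material = b"example-private-key-bytes"

# What the Secret manifest would carry under `data`.
secret_data = {"tls.key": base64.b64encode(key_material).decode("ascii")}

# Reading the Secret object is enough: decoding needs no key or password.
recovered = decode_secret_data(secret_data)
print(recovered["tls.key"] == key_material)  # True
```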
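To keep certificate private keys out of manifests and images, hold them in an external secret management system like AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault. The External Secrets Operator can sync them into the cluster, but note that it materializes them as ordinary Kubernetes Secret objects, so it should be paired with etcd encryption at rest and tight RBAC. If the key must never be stored in etcd, use a mechanism that mounts it straight into the pod, such as the Secrets Store CSI Driver or the Vault Agent injector. These deliver the key through an in-memory (tmpfs) volume, so it is not written to the node's disk and disappears when the pod terminates.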
Core Principle 4: Guard the Ingress with Policy-as-Code
Ingress controllers are frequent blind spots. A common scenario involves a developer spinning up an Ingress resource for a new service but bypassing the corporate CA, opting instead for a self-signed certificate or a misconfigured Let's Encrypt staging issuer. This leads to browser security warnings, failed API integrations, and potential man-in-the-middle vulnerabilities.
To prevent this, implement Policy-as-Code using tools like Kyverno or OPA Gatekeeper. You can enforce a cluster-wide policy that rejects any Ingress or Certificate resource that does not use the approved corporate ClusterIssuer.
Here is an example of a Kyverno ClusterPolicy that blocks rogue certificate issuers:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-cert-issuers
spec:
  validationFailureAction: Enforce
  rules:
    - name: allow-only-vault-issuer
      match:
        any:
          - resources:
              kinds:
                - Certificate
      validate:
        message: "You must use the approved corporate 'vault-issuer' for all certificates."
        pattern:
          spec:
            issuerRef:
              name: vault-issuer
              # Without this, a namespaced Issuer that happens to be
              # named vault-issuer would also pass the check.
              kind: ClusterIssuer
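The same rule can be run earlier, for example as a CI check on manifests before they reach the cluster. The Python sketch below is not Kyverno's engine; it is a hypothetical helper that mirrors the intent of the policy on an already-parsed manifest. It also checks the issuer kind, since cert-manager treats an omitted issuerRef.kind as a namespaced Issuer.

```python
APPROVED_ISSUER = {"name": "vault-issuer", "kind": "ClusterIssuer"}


def issuer_violations(manifest: dict) -> list[str]:
    """Return the reasons a Certificate manifest would be rejected."""
    if manifest.get("kind") != "Certificate":
        return []  # the rule only applies to Certificate resources
    ref = manifest.get("spec", {}).get("issuerRef", {})
    problems = []
    if ref.get("name") != APPROVED_ISSUER["name"]:
        problems.append(f"issuerRef.name must be {APPROVED_ISSUER['name']!r}")
    # An omitted kind means a namespaced Issuer, not a ClusterIssuer.
    if ref.get("kind", "Issuer") != APPROVED_ISSUER["kind"]:
        problems.append(f"issuerRef.kind must be {APPROVED_ISSUER['kind']!r}")
    return problems


approved = {"kind": "Certificate",
            "spec": {"issuerRef": {"name": "vault-issuer", "kind": "ClusterIssuer"}}}
rogue = {"kind": "Certificate",
         "spec": {"issuerRef": {"name": "selfsigned", "kind": "ClusterIssuer"}}}
lookalike = {"kind": "Certificate", "spec": {"issuerRef": {"name": "vault-issuer"}}}

for candidate in (approved, rogue, lookalike):
    print(issuer_violations(candidate))
```

The lookalike case is the subtle one: the name matches, but the reference resolves to a namespaced Issuer that a developer could define themselves.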
The "Hidden" Outage Phenomenon: Why Automation Requires Monitoring
According to recent industry reports, roughly 80% of organizations have experienced at least one outage due to an expired certificate in the past two years. In cloud-native environments, these outages are notoriously difficult to troubleshoot because the failure happens deep within machine-to-machine communication layers.
A dangerous misconception is that implementing cert-manager completely solves the expiration problem. Automation without monitoring is just a faster way to fail.
Automation pipelines break for numerous reasons:
* Let's Encrypt or your internal CA applies rate limiting.
* The Vault server becomes sealed after a restart.
* A webhook timeout prevents the CSR from being validated.
* DNS resolution issues block ACME challenges.
When your certificates only live for 24 hours, an automation failure means you will experience a catastrophic cluster outage by tomorrow. You need an out-of-band monitoring system that acts as a safety net.
This is where a dedicated expiration tracking platform like Expiring.at becomes critical. By integrating external monitoring, you can track the actual endpoints and certificates deployed in your cluster. If cert-manager fails to renew a certificate, and it crosses a critical threshold (e.g., 7 days until expiration for longer-lived ingress certs, or 1 hour for internal mesh certs), Expiring.at will trigger proactive alerts via Slack, PagerDuty, or email. Relying solely on the automation tool to report its own failures is a recipe for downtime; external validation is a mandatory defense-in-depth strategy.
Looking Ahead: Cryptographic Agility and PQC
Finally, container certificate management must account for the future of cryptography. In August 2024, NIST finalized the first three Post-Quantum Cryptography (PQC) standards (FIPS 203, 204, and 205). As quantum computing advances, traditional RSA and ECC algorithms will become vulnerable.
Organizations must audit their container environments for "crypto-agility"—the ability to swap out cryptographic algorithms without rewriting application code.
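A first step in such an audit is simply knowing which key algorithms are in use. The Python sketch below tallies the privateKey settings across a list of already-parsed Certificate manifests. The helper name and sample manifests are hypothetical, and it assumes cert-manager falls back to RSA when no algorithm is specified.

```python
from collections import Counter


def key_algorithm_inventory(manifests: list[dict]) -> Counter:
    """Count (algorithm, size) pairs across Certificate manifests."""
    counts = Counter()
    for manifest in manifests:
        if manifest.get("kind") != "Certificate":
            continue
        private_key = manifest.get("spec", {}).get("privateKey") or {}
        algorithm = private_key.get("algorithm", "RSA")  # assumed default when omitted
        size = private_key.get("size")  # None means the issuer-side default size
        counts[(algorithm, size)] += 1
    return counts


sample = [
    {"kind": "Certificate", "spec": {"privateKey": {"algorithm": "RSA", "size": 2048}}},
    {"kind": "Certificate", "spec": {"privateKey": {"algorithm": "ECDSA", "size": 256}}},
    {"kind": "Certificate", "spec": {}},
    {"kind": "Ingress", "spec": {}},
]
for (algorithm, size), count in sorted(key_algorithm_inventory(sample).items(), key=str):
    print(algorithm, size, count)
```

If the inventory shows the algorithm declared in a handful of manifests rather than scattered through application code, the environment is already most of the way to being crypto-agile.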
If your developers are hardcoding cryptographic logic into their applications, migrating to PQC will be a multi-year refactoring nightmare. However, if you have followed the best practices outlined above (automating issuance with cert-manager and abstracting encryption to the Service Mesh), moving to quantum-resistant algorithms should eventually come down to a configuration change in your Certificate and issuer definitions, once cert-manager, your CA, and the mesh proxies support the new NIST algorithms.
Key Takeaways
Managing certificates in containerized environments requires a fundamental shift in mindset from static tracking to dynamic orchestration. To secure your infrastructure in 2024 and beyond:
- Embrace Ephemerality: Use SPIFFE/SPIRE for workload identity and issue certificates with lifespans measured in hours, not months.
- Automate Relentlessly: Leverage cert-manager and Vault to ensure zero human touch in the certificate provisioning lifecycle.
- Abstract the Cryptography: Push mTLS enforcement down to the Service Mesh (Istio/Cilium) to keep application code clean and compliant with Zero Trust architectures.
- Enforce Guardrails: Use Kyverno or OPA Gatekeeper to block unauthorized self-signed certificates at the API server level.
- Trust, but Verify: Never assume automation is infallible. Use external monitoring tools like Expiring.at to alert your team before a failed renewal turns into a headline-making outage.
By treating certificates as dynamic, automated, and tightly monitored infrastructure components, you can eliminate secret sprawl