Surviving the 90-Day TLS Mandate: A GitOps Approach to Certificate Lifecycle Management
The era of the multi-year SSL certificate is over. Google has proposed reducing the maximum validity of public TLS certificates from 398 days to just 90 days, and the arithmetic is unforgiving: a certificate that used to need attention about once a year would need it at least four times a year, on every endpoint. At that cadence, manual certificate management is no longer a viable operational strategy.
Combine shrinking lifespans with the explosion of machine identities, which now outnumber human identities by a staggering 45 to 1, and managing certificates via spreadsheets, calendar alerts, and manual ticketing systems becomes a guaranteed recipe for an outage. High-profile, multi-million-dollar outages caused by expired certificates at companies like Starlink, Microsoft, and Epic Games have shown that human memory does not scale. With Gartner estimating the average cost of IT downtime at $5,600 per minute (about $336,000 per hour), a single "forgot to renew" incident can easily cost an organization hundreds of thousands of dollars.
To survive this shift, engineering teams must fundamentally rethink how they handle Certificate Lifecycle Management (CLM). The solution lies in merging CLM with the declarative, automated world of GitOps.
By shifting from manual, reactive operations to a model where infrastructure state is version-controlled and automatically reconciled, GitOps transforms certificate management from an operational chore into a highly scalable, secure, and auditable engineering practice.
The Breaking Point: Why Traditional Certificate Management is Failing
Before diving into the solution, it is crucial to understand why legacy CLM workflows break down in modern, cloud-native environments.
- Configuration Drift: Certificates deployed manually to load balancers, ingress controllers, or web servers inevitably drift from their intended state. A developer might manually patch a certificate during an incident, forgetting to update the central repository.
- The "Secret in Git" Anti-Pattern: In early attempts to automate deployments, engineers often resorted to committing raw certificates and private keys directly into Git repositories. This created massive security vulnerabilities, exposing private keys to anyone with read access to the source code.
- Lack of Crypto-Agility: With the National Institute of Standards and Technology (NIST) releasing finalized Post-Quantum Cryptography (PQC) standards (FIPS 203, 204, and 205), organizations must prepare to swap out underlying cryptographic algorithms. Manually updating RSA or ECC keys to quantum-resistant algorithms across thousands of endpoints is practically impossible without declarative automation.
The GitOps Paradigm Shift: Intent Over Artifacts
The fundamental genius of GitOps in the context of CLM is the shift from managing artifacts to managing intent.
In a traditional workflow, an engineer generates a Private Key and a Certificate Signing Request (CSR), sends the CSR to a Certificate Authority (CA), receives the signed Certificate artifact, and then figures out how to deploy that artifact to a server.
In a GitOps workflow, you never touch the private key or the certificate artifact. Instead, you commit a declarative configuration file to Git that simply states: "I need a valid certificate for api.example.com, issued by Let's Encrypt, that renews 15 days before expiration."
A GitOps controller (such as ArgoCD or Flux) detects this intent, applies it to the cluster, and delegates the actual cryptographic heavy lifting to an in-cluster certificate controller.
Architecting a GitOps CLM Pipeline
A modern, robust GitOps CLM architecture typically relies on a combination of Git, a GitOps controller, and an in-cluster certificate manager like cert-manager, the de facto standard for Kubernetes.
Here is how the automated flow works in practice:
- Source of Truth (Git): A developer commits a YAML file defining a Certificate custom resource (an instance of the cert-manager Certificate CRD). Crucially, no private keys are stored here.
- GitOps Controller Synchronization: ArgoCD or Flux continuously monitors the Git repository. Upon detecting the new commit, it pulls the YAML and applies it to the target Kubernetes cluster.
- In-Cluster Generation: cert-manager detects the new Certificate resource. It generates a private key inside the cluster, creates a CSR, and negotiates with the CA via the Automated Certificate Management Environment (ACME) protocol.
- Fulfillment & Injection: The CA (e.g., Let's Encrypt, HashiCorp Vault) validates the request via a DNS-01 or HTTP-01 challenge and returns the signed certificate. cert-manager stores the resulting certificate and private key as a Kubernetes Secret, which is then mounted by the application or Ingress controller.
- Automated Reconciliation: If the certificate nears expiration, or if someone accidentally deletes the Kubernetes Secret, the controllers automatically kick off the process again to restore the declared state.
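As a rough sketch of the synchronization step, an Argo CD Application can point the controller at the directory that holds the certificate manifests. The repository URL, path, and resource names below are placeholders chosen for illustration; only the field names come from Argo CD's Application spec.

```yaml
# argocd-application.yaml (illustrative; repoURL, path, and names are assumptions)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-certificates
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/infrastructure.git  # hypothetical repository
    targetRevision: main
    path: certificates/production  # hypothetical directory of Certificate and Issuer YAML
  destination:
    server: https://kubernetes.default.svc  # the cluster Argo CD itself runs in
    namespace: production
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from Git
      selfHeal: true  # revert manual edits made directly in the cluster
```

Note the division of labor: Argo CD reconciles the Certificate resource against Git, while cert-manager reconciles the resulting Secret against the Certificate.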
Real-World Implementation: ArgoCD + cert-manager
Here is what this looks like in code. To automate issuance, you first need to define an Issuer or ClusterIssuer that tells cert-manager how to talk to your CA.
Here is a declarative configuration for a Let's Encrypt ACME issuer using DNS-01 validation (which is ideal because it doesn't require exposing your cluster to the public internet):
# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: security@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-production-account-key
    solvers:
      - dns01:
          route53:
            region: us-east-1
            hostedZoneID: Z1234567890ABCDEF
Once your GitOps controller applies this issuer, developers can request certificates by simply committing a Certificate resource to their infrastructure repository:
# api-certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-example-com-tls
  namespace: production
spec:
  # The secret name where the cert and key will be stored
  secretName: api-example-com-tls-secret
  # How long the certificate is valid for
  duration: 2160h # 90 days
  # When to renew (15 days before expiration)
  renewBefore: 360h
  subject:
    organizations:
      - My Company LLC
  isCA: false
  privateKey:
    algorithm: ECDSA
    size: 256
  dnsNames:
    - api.example.com
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
When you commit api-certificate.yaml to your Git repository, ArgoCD syncs it, cert-manager talks to Let's Encrypt via Route53 DNS validation, and the resulting TLS secret is ready for your Ingress controller to consume. Zero manual intervention, zero exposed keys.
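To show the consuming side, here is a minimal Ingress sketch that references the secret by name. The ingress class, backend Service, and port are assumptions for illustration; the only value taken from the example above is the secret name.

```yaml
# api-ingress.yaml (illustrative; ingress class and backend Service are assumptions)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-example-com
  namespace: production  # must match the namespace of the TLS Secret
spec:
  ingressClassName: nginx  # assumes an NGINX ingress controller is installed
  tls:
    - hosts:
        - api.example.com
      secretName: api-example-com-tls-secret  # written by cert-manager
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service  # hypothetical Service
                port:
                  number: 8080
```

Because the Ingress only names the Secret, a renewal replaces the certificate underneath it without any change to this manifest.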
Security and Compliance Benefits
Moving CLM to a GitOps model solves several massive compliance and security headaches simultaneously.
1. Zero Trust Architecture (ZTA) Enablement
Zero Trust requires continuous verification, which in modern microservices means mutual TLS (mTLS) between every service. You cannot manually manage mTLS certificates that expire every 24 hours. GitOps enables the rapid, automated rotation of these short-lived certificates required to sustain a true Zero Trust network.
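A short-lived service certificate of this kind might be declared roughly as follows. This is a sketch under stated assumptions: the service name, namespace, and the internal-mesh-ca issuer are hypothetical, and a private CA is assumed because public CAs do not issue 24-hour certificates for cluster-internal names.

```yaml
# payments-mtls-certificate.yaml (illustrative; all names are assumptions)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payments-mtls
  namespace: production
spec:
  secretName: payments-mtls-tls
  duration: 24h
  renewBefore: 8h  # renew with a third of the lifetime remaining
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always  # generate a fresh key on every renewal
  usages:
    - server auth
    - client auth  # the same certificate identifies the service as an mTLS client
  dnsNames:
    - payments.production.svc.cluster.local
  issuerRef:
    name: internal-mesh-ca  # hypothetical private CA issuer
    kind: ClusterIssuer
```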
2. Instant Auditability for SOC 2 and ISO 27001
Compliance auditors love GitOps. Because Git is the single source of truth, every certificate request, configuration change, or revocation is permanently logged as a Git commit. You have a built-in, tamper-evident audit trail showing exactly who requested a certificate, who approved the Pull Request (PR), and when it was deployed.
3. Git as the Security Perimeter
By utilizing GitOps, your branch protection rules become your primary security perimeter. You can enforce policies requiring two senior engineers to approve any PR that modifies a ClusterIssuer, ensuring that no single individual can blindly reroute certificate issuance or alter cryptographic standards.
Industry Best Practices for GitOps CLM
While the tooling is powerful, a successful GitOps CLM implementation relies on strict adherence to architectural best practices.
Decouple Infrastructure from Application Repositories
Never store your Certificate and Issuer configurations in the same Git repository as your application source code. Maintain a dedicated platform or infrastructure repository. This allows you to apply strict Role-Based Access Control (RBAC) to your cryptographic configurations without slowing down application developers.
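If ArgoCD is the GitOps controller, one way to back this separation with tooling is an AppProject that accepts manifests only from the platform repository and only permits cert-manager resource kinds. The repository URL and project name below are placeholders, and the wildcard destination namespace is an assumption you would likely tighten.

```yaml
# certificates-project.yaml (illustrative; repo URL and names are assumptions)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: certificates
  namespace: argocd
spec:
  description: Certificate and issuer configuration owned by the platform team
  sourceRepos:
    - https://git.example.com/platform/infrastructure.git  # hypothetical repository
  destinations:
    - server: https://kubernetes.default.svc
      namespace: "*"
  clusterResourceWhitelist:
    - group: cert-manager.io
      kind: ClusterIssuer
  namespaceResourceWhitelist:
    - group: cert-manager.io
      kind: Certificate
    - group: cert-manager.io
      kind: Issuer
```

Applications assigned to this project cannot deploy anything other than issuers and certificates, so a compromised or misconfigured application repository cannot touch the cryptographic configuration.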
Standardize on the ACME Protocol
Whenever possible, use the Automated Certificate Management Environment (ACME) protocol. While it is famous for powering Let's Encrypt, enterprise CAs like Venafi and internal PKI tools like HashiCorp Vault also support ACME. Standardizing on ACME means your GitOps workflow remains identical whether you are issuing external public certificates or internal private certificates.
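To illustrate the point, an issuer for an internal ACME-capable CA can have the same shape as a Let's Encrypt issuer, with only the server URL and solver changed. The directory URL below is a made-up placeholder, not the endpoint of any particular product, and the HTTP-01 solver assumes an NGINX ingress controller that the internal CA can reach.

```yaml
# internal-acme-issuer.yaml (illustrative; server URL and names are assumptions)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-acme
spec:
  acme:
    server: https://acme.internal.example.com/directory  # hypothetical internal ACME directory
    email: security@yourdomain.com
    privateKeySecretRef:
      name: internal-acme-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx  # assumes an NGINX ingress controller
```

A Certificate switches between public and internal issuance by changing nothing but its issuerRef name. With a private CA, cert-manager also needs to trust the CA's root certificate, which is left out of this sketch.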
Trust, but Verify: The Importance of External Monitoring
This is the most critical and often overlooked best practice: Automation can, and will, fail silently.
Your GitOps controller might be syncing perfectly, but if Let's Encrypt rate-limits your requests, or if AWS Route53 experiences an outage that prevents DNS-01 validation, cert-manager will fail to renew the certificate. The failure is recorded on the Certificate resource and in the controller logs, but nothing in Git changes, so if you are only looking at your repository, everything will look green right up until your application goes offline.
GitOps handles the issuance and renewal, but you absolutely must have an independent, external system handling the expiration tracking. This is where integrating a dedicated tool like Expiring.at becomes your ultimate safety net.
By utilizing Expiring.at, you decouple your monitoring from your infrastructure. It acts as an external observer, continuously checking your actual endpoints. If your GitOps automation fails to renew a certificate 15 days before expiration as declared, Expiring.at will alert your team via Slack, email, or webhook, giving you ample time to debug the cert-manager logs before an outage occurs.
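Whatever tool performs the check, the underlying rule is the same: probe the live endpoint and alert when the served certificate is older than the renewal policy allows. As one hedged illustration, teams that already run Prometheus with the blackbox exporter could express it as an alerting rule like the one below. The 14-day threshold, labels, and alert name are assumptions, and the prober should run outside the cluster it watches so that it stays independent.

```yaml
# tls-expiry-alerts.yaml (illustrative Prometheus rule file; thresholds and labels are assumptions)
groups:
  - name: tls-expiry
    rules:
      - alert: TLSCertificateNotRenewed
        # probe_ssl_earliest_cert_expiry is exposed by the Prometheus blackbox exporter
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate for {{ $labels.instance }} expires in under 14 days"
          description: "Renewal was declared at 15 days before expiry; check the cert-manager logs and the Certificate status."
```

Setting the threshold one day inside the declared renewal window means the alert fires only when automation has already missed its deadline, not during normal operation.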
Test Renewals in CI/CD
Do not wait for a 90-day cycle to find out if your renewal logic works. Create staging environments that issue certificates with extremely short lifespans (e.g., 1-hour validity). This forces your GitOps pipeline to continuously execute the renewal process, ensuring that your ACME challenges, DNS permissions, and controller logic are constantly validated.
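A renewal canary along these lines could look like the sketch below. The hostname and the internal-acme-staging issuer are hypothetical. Public CAs such as Let's Encrypt issue a fixed lifetime regardless of the requested duration, so this assumes a staging CA that honors short lifetimes. With an ACME issuer, cert-manager only forwards the requested duration when the issuer opts in via enableDurationFeature.

```yaml
# renewal-canary-certificate.yaml (illustrative; issuer and hostname are assumptions)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: renewal-canary
  namespace: staging
spec:
  secretName: renewal-canary-tls
  duration: 1h      # cert-manager's minimum accepted duration
  renewBefore: 30m  # forces a renewal roughly every 30 minutes
  dnsNames:
    - canary.staging.example.com
  issuerRef:
    name: internal-acme-staging  # hypothetical issuer whose CA honors short lifetimes
    kind: ClusterIssuer
```

If this canary ever stays not-ready for more than one cycle, you have found a broken renewal path weeks before a production certificate would have exposed it.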
Case Study: Automating mTLS at Scale
Consider the real-world scenario of a mid-sized FinTech company that needed to implement mTLS across 500 microservices to meet strict financial compliance regulations.
Attempting this manually would require a dedicated team working full-time just to issue and rotate certificates. Instead, the platform engineering team adopted a GitOps approach. They defined cert-manager Issuer and Certificate resources within their base Helm charts, managed by ArgoCD, hooked into an internal HashiCorp Vault