From Spreadsheets to YAML: The GitOps Guide to Certificate Lifecycle Management
For decades, the standard procedure for managing SSL/TLS certificates was a ritual of manual labor: generate a CSR on a server, paste it into a CA portal, download the blob, scp it to the load balancer, and update a spreadsheet to remind yourself to do it again in a year.
In 2025, that workflow is not just inefficient; it is an operational liability.
With Google’s proposal to reduce maximum certificate validity from 398 days to 90 days, the interval between renewals would shrink by more than 75%. Combined with the explosion of ephemeral infrastructure and the looming requirement for crypto-agility due to Post-Quantum Cryptography (PQC), the industry has reached a breaking point. You cannot manage high-velocity infrastructure with low-velocity security processes.
The solution lies in the convergence of PKI (Public Key Infrastructure) and DevOps. It is time to stop managing certificates as artifacts and start managing them as code. This is the definitive guide to adopting a GitOps approach to Certificate Lifecycle Management (CLM).
The Case for Declarative PKI
GitOps is often described simply as "operations by pull request," but in the context of security, it represents a shift in liability and auditability.
In a traditional model, an expired certificate is often the result of human error—a missed email notification or a forgotten ticket. In a GitOps model, the state of your certificates is defined in a Git repository. If the cluster state deviates from that definition (e.g., a certificate is missing or nearing expiration), an automated controller reconciles it.
The Three Drivers of Change
- The 90-Day Validity Cliff: As browser vendors push for shorter validity periods to reduce the window of compromise for stolen keys, the workload for managing certificates creates a "toil debt" that manual teams cannot pay.
- Ephemeral Environments: When a developer spins up a dynamic feature branch environment, they need valid TLS immediately. Waiting two days for a security ticket to be processed blocks the CI/CD pipeline.
- Crypto-Agility: NIST finalized its first Post-Quantum Cryptography standards in August 2024. When the time comes to migrate certificates from RSA-2048 to a post-quantum signature algorithm such as ML-DSA (formerly CRYSTALS-Dilithium), you do not want to manually reconfigure thousands of servers. You want to change one line of YAML in your Issuer or Certificate configuration and let GitOps propagate the change.
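As a rough illustration of what that one-line change can look like, cert-manager's Certificate resource exposes the key algorithm under spec.privateKey. The sketch below assumes a hypothetical Certificate named example-cert and moves it from RSA to ECDSA. cert-manager currently documents RSA, ECDSA, and Ed25519 for this field, so a post-quantum value here is an assumption about future support, not something you can set today.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-cert            # hypothetical name
  namespace: default
spec:
  secretName: example-cert-tls  # hypothetical Secret name
  dnsNames:
    - app.example.com           # hypothetical domain
  privateKey:
    # Previously: algorithm: RSA with size: 2048
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always      # generate a fresh key on every renewal
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```

Note that this field controls the key pair of the issued certificate. The algorithm the CA uses to sign it is decided by the issuer, which is why a full migration may touch both resources.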
The GitOps CLM Architecture
To implement this, we move away from imperative commands (like openssl req) and toward a Kubernetes-native stack. The industry standard architecture currently revolves around three core components:
- The Source of Truth: A Git repository containing Certificate and Issuer manifests.
- The CD Controller: Tools like ArgoCD or Flux that sync Git to the cluster.
- The Certificate Controller: cert-manager, which watches the cluster for certificate requests and talks to the Certificate Authority (CA).
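To make the link between the first two components concrete, here is a minimal sketch of an Argo CD Application that syncs a directory of certificate manifests into the cluster. The repository URL, path, and application name are hypothetical; a Flux setup would express the same idea with its own resources.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: certificates             # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/certificates.git  # hypothetical repo
    targetRevision: main
    path: clusters/prod          # hypothetical directory of Certificate/Issuer manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager      # used for manifests that do not set their own namespace
  syncPolicy:
    automated:
      prune: true                # delete resources that were removed from Git
      selfHeal: true             # revert manual changes made in the cluster
```

With selfHeal enabled, a Certificate that is deleted by hand is recreated from Git, which is the reconciliation behaviour described above.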
How the Workflow Executes
In this architecture, you never touch the private key. Here is the "Happy Path" lifecycle of a GitOps-managed certificate:
- Commit: A DevOps engineer commits a Certificate custom resource (an instance of the cert-manager Certificate CRD) to the repository.
- Sync: ArgoCD detects the change and applies the manifest to the Kubernetes namespace.
- Request: cert-manager sees the new resource. It generates a private key (stored inside the cluster) and creates a Certificate Signing Request (CSR).
- Challenge: The controller performs an ACME challenge (DNS-01 or HTTP-01) to prove domain ownership to a CA like Let's Encrypt.
- Issue: Upon validation, the CA signs the certificate. cert-manager stores the resulting certificate and key in a Kubernetes Secret.
- Mount: The application mounts the secret. When renewal time comes, cert-manager updates the secret automatically, and the application picks up the new files, provided it re-reads them or is restarted.
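The final "Mount" step can be sketched as a Deployment that mounts the Secret as a read-only volume. The Secret name matches the Certificate example used in this guide; the Deployment name, labels, and image are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service          # hypothetical workload name
  namespace: payments
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: app
          image: registry.example.com/payment-service:1.0.0  # hypothetical image
          volumeMounts:
            - name: tls
              mountPath: /tls    # exposes /tls/tls.crt and /tls/tls.key
              readOnly: true
      volumes:
        - name: tls
          secret:
            secretName: payment-service-tls  # written by cert-manager
```

The kubelet eventually refreshes the mounted files after the Secret changes, but the process inside the container still has to re-read them, so plan for a file watcher or a rolling restart on renewal.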
Implementation: Defining Policy as Code
The power of GitOps is that it allows you to template security requirements. Instead of asking developers to understand X.509 standards, you provide them with a standard manifest.
1. The ClusterIssuer (The Authority)
First, platform engineers define who can issue certificates. This is usually done once per cluster.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: security@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
2. The Certificate (The Request)
Developers then include a Certificate resource in their application's Helm chart. Note that they define the policy (dnsNames, duration, issuer), not the certificate itself.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payment-service-cert
  namespace: payments
spec:
  secretName: payment-service-tls
  duration: 2160h # 90 days
  renewBefore: 360h # 15 days
  subject:
    organizations:
      - Expiring Example Corp
  commonName: api.payments.example.com
  dnsNames:
    - api.payments.example.com
    - internal.payments.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
By checking this into Git, you have created an immutable audit trail. You know exactly when the domain was added, who requested it, and which issuer was authorized to sign it.
Solving the "Secret Zero" Problem
A common misconception in GitOps is that you must commit everything to Git. In fact, you must never commit the actual .crt or .key files; only the manifests that describe them belong in the repository.
However, you often need sensitive credentials to automate the process. For example, solving a DNS-01 challenge for a wildcard certificate requires AWS credentials with permission to modify Route53 records. This creates the "Secret Zero" problem: how do you get that first credential into the cluster securely?
The modern solution is the External Secrets Operator (ESO).
ESO bridges your cloud provider's secret store (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault) with Kubernetes. You store the credential in your secure cloud vault, and ESO fetches it to create the Kubernetes Secret that cert-manager needs.
The Workflow with ESO:
1. Store the Route53 credentials in AWS Secrets Manager.
2. Commit an ExternalSecret manifest to Git (safe, as it only references the name of the AWS secret).
3. ESO fetches the credentials and creates a Kubernetes Secret.
4. cert-manager uses that Secret to prove domain ownership and issue the wildcard cert.
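The steps above can be sketched as two manifests: an ExternalSecret that materializes the credential, and a ClusterIssuer whose DNS-01 solver reads it. Treat this as an illustration under stated assumptions: the ClusterSecretStore name, the AWS secret path, the access key ID, and the region are hypothetical, and the ExternalSecret apiVersion depends on the ESO release you run (newer releases also serve v1).

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: route53-credentials
  namespace: cert-manager            # ClusterIssuers read Secrets from cert-manager's own namespace by default
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager        # hypothetical ClusterSecretStore, defined elsewhere
    kind: ClusterSecretStore
  target:
    name: route53-credentials        # Kubernetes Secret that ESO creates
  data:
    - secretKey: secret-access-key
      remoteRef:
        key: prod/cert-manager/route53   # hypothetical name in AWS Secrets Manager
        property: secret-access-key      # hypothetical JSON field inside that secret
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns         # hypothetical issuer name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: security@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-dns-account-key
    solvers:
      - dns01:
          route53:
            region: us-east-1                 # hypothetical region
            accessKeyID: AKIAEXAMPLEKEYID     # placeholder, not a real key ID
            secretAccessKeySecretRef:
              name: route53-credentials
              key: secret-access-key
```

Only references live in Git here; the secret value itself never leaves AWS Secrets Manager and the cluster.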
The "Blind Spot": Why Automation Needs Monitoring
While GitOps solves the provisioning problem, it introduces a new risk: silent failure.
If your automation pipeline breaks—perhaps the ACME API is down, a firewall rule changed, or the cert-manager pod crashed—the certificate will simply fail to renew. In a purely automated system, no human logs into the dashboard to check expiration dates. You might not know the renewal failed until the outage begins.
This is where external monitoring becomes critical. You cannot rely solely on the internal logic of the cluster to report its own health.
The Defense-in-Depth Strategy
- Internal Metrics: Configure Prometheus to scrape cert-manager metrics. Specifically, watch certmanager_certificate_expiration_timestamp_seconds. Alert if any certificate is fewer than 7 days from expiration.
- External Verification: Use a dedicated tracking platform like Expiring.at. By scanning your public endpoints from the outside, you verify not just that the certificate exists, but that the load balancer is serving the correct one.
- The "Check-Engine" Light: Automation handles the 99% of renewals that go smoothly. Your monitoring tool is the "check-engine light" for the 1% that hang due to obscure DNS propagation issues or CA rate limits.
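The internal-metrics alert can be sketched as a Prometheus rules file. The 7-day threshold comes from the list above; the group name, alert names, severities, and the second rule on certmanager_certificate_ready_status are illustrative choices rather than a recommended standard.

```yaml
groups:
  - name: cert-manager               # hypothetical group name
    rules:
      # Fires when a certificate is less than 7 days from its notAfter date.
      - alert: CertificateExpiringSoon
        expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in under 7 days"
      # Fires earlier, when cert-manager itself reports the certificate as not Ready.
      - alert: CertificateNotReady
        expr: certmanager_certificate_ready_status{condition="False"} == 1
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} has not been Ready for 30 minutes"
```

With a renewBefore of 15 days, a certificate that reaches the 7-day threshold has already been failing to renew for over a week, so the alert leaves time to intervene before the outage.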
Advanced Pattern: Trust Distribution
One of the hardest challenges in CLM is not issuing certificates, but distributing the Trust Bundle (CA Root) to all your services. If Service A talks to Service B via mTLS, Service A must trust the Private CA that signed Service B's cert.
In a GitOps environment, you can use trust-manager, a sister project to cert-manager.
You define a Bundle resource in Git:
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: global-trust-bundle
spec:
  sources:
    - useDefaultCAs: true
    - secret:
        name: "my-internal-ca-secret"
        key: "ca.crt"
  target:
    configMap:
      key: "trust-bundle.pem"
The operator ensures this ConfigMap exists in every namespace. Your applications simply mount it, for example at /etc/ssl/certs/trust-bundle.pem. When you rotate your Private CA, you update one Secret, and the trust bundle propagates to the entire fleet automatically.
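On the consuming side, a workload can mount the generated ConfigMap as a single file. This sketch assumes trust-manager writes the ConfigMap under the same name as the Bundle (global-trust-bundle); the Pod name and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trust-bundle-demo            # hypothetical Pod name
  namespace: payments
spec:
  containers:
    - name: app
      image: registry.example.com/payment-service:1.0.0  # hypothetical image
      volumeMounts:
        - name: trust-bundle
          mountPath: /etc/ssl/certs/trust-bundle.pem
          subPath: trust-bundle.pem  # mount one key as a file instead of replacing the directory
          readOnly: true
  volumes:
    - name: trust-bundle
      configMap:
        name: global-trust-bundle    # ConfigMap produced from the Bundle
```

One trade-off worth knowing: Kubernetes does not refresh subPath mounts when the ConfigMap changes, so a running Pod keeps the old bundle until it restarts. Mounting the ConfigMap as a whole directory at a dedicated path avoids that, at the cost of a different file location.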
Best Practices for Production
- Lint Your Certificates: Just as you lint your code, use tools like kube-linter to validate your certificate manifests before they merge. Ensure developers aren't requesting overly broad names (such as unnecessary wildcards) or using deprecated API versions.
- Use Staging Issuers for Dev: Let's Encrypt has strict rate limits. If your CI/CD pipeline spins up ephemeral environments, configure them to use the Let's Encrypt Staging Environment. You won't get a trusted green lock, but you won't get rate-limited either.
- Decouple Applications from Certificates: Don't hardcode certificate logic into your app. The app should expect a file at /tls/tls.crt. It shouldn't care if that file came from Vault, Let's Encrypt, or a self-signed source. This is the essence of cloud-native design.
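For the staging-issuer practice, a second ClusterIssuer that points at the Let's Encrypt staging endpoint is usually enough. This sketch mirrors the production issuer shown earlier in this guide; the issuer name and account-key Secret name are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # hypothetical issuer name
spec:
  acme:
    # Staging endpoint: much higher rate limits, but certificates are not browser-trusted.
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: security@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```

Ephemeral environments can then set issuerRef.name to letsencrypt-staging in their Certificate, leaving the rest of the manifest identical to production.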
Conclusion: Identity is the New Perimeter
The transition to GitOps for certificate management is about more than saving time. It is about acknowledging that in a zero-trust world, identity (and by extension, the certificate) is the new perimeter.
By moving CLM into Git, you gain version control, auditability, and the ability to respond to cryptographic threats at the speed of code deployment. However, remember that automation is a force multiplier—it multiplies efficiency, but it can also multiply errors if left unchecked.
Build your declarative pipelines, automate your renewals, but keep your external monitoring active. In the era of 90-day certificates, visibility is the only thing standing between an automated renewal and a silent outage.