Surviving the 90-Day TLS Mandate: Infrastructure as Code for Certificate Management
For years, managing Public Key Infrastructure (PKI) and TLS certificates was a manual, often tedious chore relegated to calendar reminders and ticketing systems. A developer would submit an IT ticket, wait three days, receive a .pem file, and manually upload it to a load balancer. If we were lucky, the certificate was valid for a year or two, allowing us to forget about the process until the inevitable panic of expiration.
Those days are officially over.
The intersection of Infrastructure as Code (IaC) and certificate management has shifted from a "nice-to-have" engineering luxury to an operational necessity. Driven by shrinking certificate lifespans, the explosion of machine identities, and the push toward Zero Trust Architecture, manual certificate management is no longer sustainable at scale.
In this comprehensive guide, we will explore why the industry is moving toward code-driven PKI, the critical security pitfalls to avoid when managing certificates in tools like Terraform, and how to build a robust, automated certificate lifecycle pipeline.
The Impending 90-Day Reality Check
The most significant driver forcing the adoption of IaC for certificate management is the impending reduction of maximum TLS certificate lifespans. Google's Chromium Root Program has proposed reducing the maximum validity of public TLS certificates from 398 days to just 90 days, and the industry is preparing accordingly.
When certificates expire every three months, relying on human intervention is a recipe for disaster. Expired certificates remain one of the most common causes of preventable downtime and have caused highly publicized outages at platforms such as Starlink and Epic Games, as well as at numerous government portals.
Furthermore, human identities are no longer the primary consumers of certificates. According to recent industry reports, machine identities—containers, microservices, APIs, and automated scripts—now outnumber human identities by a factor of 45 to 1. Every single one of these endpoints requires a cryptographic identity to facilitate Mutual TLS (mTLS) in a Zero Trust Architecture.
You cannot manually provision, rotate, and revoke thousands of certificates every 90 days. The only scalable solution is treating your certificate policies and issuance pipelines as code.
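To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python. The fleet size of 1,000 certificates and the function name `renewals_per_year` are illustrative assumptions, not figures from any report; the point is only how the yearly renewal count grows as validity shrinks.

```python
def renewals_per_year(cert_count: int, validity_days: int, renew_before_days: int = 0) -> float:
    """Estimate how many renewal operations a fleet needs per year.

    A certificate is assumed to be replaced `renew_before_days` before it
    expires, so each one lives for (validity_days - renew_before_days) days.
    """
    effective_days = validity_days - renew_before_days
    if effective_days <= 0:
        raise ValueError("renew_before_days must be smaller than validity_days")
    return cert_count * 365 / effective_days


if __name__ == "__main__":
    fleet = 1000  # hypothetical number of certificates
    for validity in (398, 90):
        print(f"{validity}-day validity: ~{renewals_per_year(fleet, validity):.0f} renewals/year")
    # Renewing 15 days early, as is common with automation, raises the count further.
    print(f"90-day validity, renew 15 days early: ~{renewals_per_year(fleet, 90, 15):.0f} renewals/year")
```

Under these assumptions, the same fleet goes from roughly one renewal per certificate per year to four or five, which is the workload that automation has to absorb.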
The Pitfalls of Legacy PKI Management
Before diving into the solution, it is vital to understand the friction legacy PKI creates in modern DevOps environments:
- Shadow PKI: When security teams enforce slow, manual ticketing processes for certificate issuance, developers inevitably bypass them. They spin up self-signed certificates or use unauthorized Certificate Authorities (CAs) to keep their CI/CD pipelines moving. This creates massive security blind spots.
- Lack of Crypto-Agility: In August 2024, NIST finalized the first three Post-Quantum Cryptography (PQC) algorithms. Organizations must now prepare to swap out RSA and ECC algorithms for quantum-resistant ones. In a manual PKI environment, this takes years. With IaC, updating cryptographic algorithms across thousands of endpoints can be as simple as changing a variable in a central configuration file.
- Audit and Compliance Failures: Modern compliance frameworks, including PCI-DSS v4.0, require strict, auditable controls over cryptography and automated key rotation. Manual processes fail these audits; Git-backed IaC pipelines provide the exact immutable audit trails auditors demand.
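The crypto-agility point can be illustrated with a minimal Python sketch. Everything here is hypothetical (`CryptoProfile`, `build_requests`, `ACTIVE_PROFILE`, and the example hostnames are invented for illustration); it only shows the idea that every certificate request derives its algorithm from one central value, so changing that value changes the whole fleet.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CryptoProfile:
    """Central description of the key algorithm every certificate should use."""
    algorithm: str  # e.g. "RSA" or "ML-DSA"
    parameter: str  # e.g. "2048" for RSA, or a parameter set such as "65"


def build_requests(hostnames: List[str], profile: CryptoProfile) -> List[Dict[str, str]]:
    """Produce one certificate request description per hostname."""
    return [
        {"common_name": host, "algorithm": profile.algorithm, "parameter": profile.parameter}
        for host in hostnames
    ]


# The single "variable in a central configuration file" (hypothetical).
ACTIVE_PROFILE = CryptoProfile(algorithm="RSA", parameter="2048")

if __name__ == "__main__":
    hosts = [f"svc-{i}.internal.example" for i in range(3)]
    for request in build_requests(hosts, ACTIVE_PROFILE):
        print(request)
```

In a real pipeline the equivalent of `ACTIVE_PROFILE` would typically live in a shared Terraform variable or Helm value rather than in application code.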
GitOps for PKI: The IaC Paradigm Shift
The modern approach to certificate management relies on GitOps. In a GitOps model, security teams define certificate policies (e.g., "All internal certificates must use RSA 2048 and be valid for no more than 30 days") using Policy-as-Code tools like Open Policy Agent (OPA) or HashiCorp Sentinel.
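In OPA or Sentinel such a rule would be written in Rego or the Sentinel language. As a language-neutral illustration, the sketch below expresses the same example policy in plain Python. The `CertRequest` type and `policy_violations` function are hypothetical, and reading "RSA 2048" as "RSA with at least 2048 bits" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CertRequest:
    """Hypothetical description of a certificate a developer is asking for."""
    common_name: str
    algorithm: str
    key_bits: int
    validity_days: int


def policy_violations(
    request: CertRequest,
    required_algorithm: str = "RSA",
    min_key_bits: int = 2048,
    max_validity_days: int = 30,
) -> List[str]:
    """Return a list of human-readable violations; an empty list means compliant."""
    violations = []
    if request.algorithm != required_algorithm:
        violations.append(f"algorithm must be {required_algorithm}, got {request.algorithm}")
    if request.key_bits < min_key_bits:
        violations.append(f"key size must be at least {min_key_bits} bits, got {request.key_bits}")
    if request.validity_days > max_validity_days:
        violations.append(f"validity must not exceed {max_validity_days} days, got {request.validity_days}")
    return violations


if __name__ == "__main__":
    request = CertRequest("billing.internal.example", "RSA", 2048, 90)
    for problem in policy_violations(request):
        print("DENY:", problem)
```

A CI job could run a check like this against every pull request and fail the build when the returned list is not empty.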
Developers are provided with pre-approved Terraform modules or Helm charts. When a developer needs a certificate for a new application, they simply declare it in their infrastructure code. A pull request serves as the audit trail. Once merged, the CI/CD pipeline—acting as an authorized machine identity—automatically requests, validates, and provisions the certificate.
Solving the Terraform State Secret Leakage
When teams first attempt to automate certificates using Terraform, they often make a critical, high-risk security error: generating private keys directly within the Terraform code.
Using the tls_private_key resource from the HashiCorp TLS provider generates the private key during the terraform apply phase. The fatal flaw here is that Terraform stores this generated private key in plain text within the .tfstate file. Anyone with read access to your state file (or the S3 bucket hosting it) now has your private keys.
The Wrong Way (Do Not Do This):
# DANGER: This stores the unencrypted private key in your terraform.tfstate
resource "tls_private_key" "bad_example" {
  algorithm = "RSA"
  rsa_bits  = 2048
}

resource "tls_self_signed_cert" "bad_example_cert" {
  private_key_pem = tls_private_key.bad_example.private_key_pem
  # ... other configuration
}
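To check whether an existing state file already contains key material, a small audit script can help. The following Python sketch assumes the JSON layout used by current Terraform state files (a top-level `resources` list whose entries carry `type`, `name`, and `instances` with `attributes`); the function names are hypothetical, and the script looks for PEM private key markers rather than relying on any specific provider.

```python
import json
import sys
from typing import Any, List


def _contains_private_key(value: Any) -> bool:
    """Recursively look for a PEM-encoded private key inside a JSON value."""
    if isinstance(value, str):
        return "-----BEGIN" in value and "PRIVATE KEY-----" in value
    if isinstance(value, dict):
        return any(_contains_private_key(v) for v in value.values())
    if isinstance(value, list):
        return any(_contains_private_key(v) for v in value)
    return False


def find_leaked_keys(state: dict) -> List[str]:
    """Return addresses (type.name) of resources whose attributes hold a private key."""
    leaked = []
    for resource in state.get("resources", []):
        for instance in resource.get("instances", []):
            if _contains_private_key(instance.get("attributes", {})):
                leaked.append(f"{resource.get('type')}.{resource.get('name')}")
                break  # one hit per resource is enough
    return leaked


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is whatever state file you want to audit):
    #   python audit_state.py terraform.tfstate
    with open(sys.argv[1], encoding="utf-8") as handle:
        for address in find_leaked_keys(json.load(handle)):
            print("private key found in state:", address)
```

Run against a state produced by the configuration above, a script like this would be expected to flag `tls_private_key.bad_example`, and likely the self-signed certificate resource as well, since it receives the key as an input.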
The Right Way: Dynamic Secrets and External Vaults
The golden rule of IaC certificate management is: Never generate private keys in Terraform.
Instead, use Terraform to configure the infrastructure (like Load Balancers or API Gateways) to request the certificate dynamically from a secure vault or managed service at runtime.
Here is an example using AWS Certificate Manager (ACM) and Route53. In this workflow, Terraform requests the certificate from AWS, automates the DNS validation, and attaches it to a load balancer. The private key never touches your Terraform state: ACM generates it, stores it encrypted with AWS KMS, and never returns it to Terraform.
# Assumes data.aws_route53_zone.main, aws_lb.main, and aws_lb_target_group.main
# are defined elsewhere in the configuration.

# 1. Request the certificate from AWS ACM
resource "aws_acm_certificate" "secure_cert" {
  domain_name       = "api.yourdomain.com"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# 2. Automatically create the Route53 DNS records for Domain Validation
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.secure_cert.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = data.aws_route53_zone.main.zone_id
}

# 3. Tell Terraform to wait for the validation to complete
resource "aws_acm_certificate_validation" "cert_validation_waiter" {
  certificate_arn         = aws_acm_certificate.secure_cert.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

# 4. Attach the validated certificate to an Application Load Balancer
resource "aws_lb_listener" "https_listener" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.cert_validation_waiter.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.main.arn
  }
}
Result: Zero human interaction, fully documented in Git, and state files that contain no private key material.
Kubernetes Native: Automating with cert-manager
If you are operating in a cloud-native environment, the de facto standard for certificate management is cert-manager. Deployed via Helm or Terraform, cert-manager extends the Kubernetes API using Custom Resource Definitions (CRDs) to automate the issuance and renewal of certificates.
By defining an Issuer (or ClusterIssuer) and a Certificate resource, you instruct your cluster to obtain certificates automatically from a CA: from Let's Encrypt via the Automated Certificate Management Environment (ACME) protocol, or from an internal CA such as HashiCorp Vault through cert-manager's dedicated Vault issuer.
Here is an example of implementing IaC for Kubernetes certificates. First, we define a Let's Encrypt ACME issuer:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: security@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
Next, developers simply include a Certificate manifest alongside their application deployments. cert-manager handles the generation of the private key directly within the cluster, requests the signed certificate, and stores both in a Kubernetes Secret. Because Secrets are only base64-encoded by default, protect them with RBAC and encryption at rest.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls-cert
  namespace: production
spec:
  secretName: api-tls-secret # The secret where the cert/key will be stored
  duration: 2160h # 90 days
  renewBefore: 360h # Renew 15 days before expiration
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.yourdomain.com
As the certificate enters its renewBefore window (15 days before expiry in this example), cert-manager automatically starts a temporary solver pod to answer the HTTP-01 challenge, fetches the new certificate, and updates the Kubernetes Secret. Your Ingress controller (like NGINX or Traefik) detects the updated Secret and reloads the certificate without downtime.
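The duration and renewBefore values in the Certificate manifest translate into concrete dates. The Python sketch below shows the arithmetic, assuming renewal is scheduled at expiry minus renewBefore and that both values are written in whole hours, as in the manifest. The helper names and the issue date are hypothetical, and the real expiry date is ultimately set by the issuing CA.

```python
from datetime import datetime, timedelta, timezone


def parse_hours(value: str) -> timedelta:
    """Parse a whole-hour duration such as '2160h' (the form used in the manifest)."""
    if not value.endswith("h") or not value[:-1].isdigit():
        raise ValueError(f"expected a value like '2160h', got {value!r}")
    return timedelta(hours=int(value[:-1]))


def renewal_schedule(issued_at: datetime, duration: str, renew_before: str):
    """Return (expiry, renewal_time) for a certificate issued at `issued_at`."""
    expiry = issued_at + parse_hours(duration)
    return expiry, expiry - parse_hours(renew_before)


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, tzinfo=timezone.utc)  # hypothetical issue date
    expiry, renew_at = renewal_schedule(issued, "2160h", "360h")
    print("expires:", expiry.date())
    print("renewal attempted from:", renew_at.date())
```

With these values, a certificate issued on 1 January would expire on 1 April and become eligible for renewal on 17 March, leaving a 15-day buffer for retries if the CA or the challenge fails.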
Real-World Impact: A Case Study in Automation
The transition to IaC certificate management yields dramatic operational improvements. Consider the case of a major FinTech company that recently overhauled its PKI infrastructure.
Previously, they relied on a Jira-based ticketing system for SSL certificates, boasting an average SLA of 4 days per request. By migrating to a Terraform and HashiCorp Vault GitOps pipeline, issuance time dropped to exactly