From Manual Pain to Automated Gain: Infrastructure as Code for Certificate Management

It’s a scenario that keeps DevOps engineers and SREs up at night: a critical production service suddenly becomes unreachable. Dashboards light up with alerts, customer complaints flood in, and the war...

Tim Henrich
November 21, 2025
8 min read
102 views

From Manual Pain to Automated Gain: Infrastructure as Code for Certificate Management

It’s a scenario that keeps DevOps engineers and SREs up at night: a critical production service suddenly becomes unreachable. Dashboards light up with alerts, customer complaints flood in, and the war room assembles. After a frantic investigation, the culprit is found—a simple, forgotten TLS certificate that expired, bringing a multi-million dollar application to its knees. This isn't a hypothetical fear; it's a recurring reality. In 2022, a similar incident with an expired certificate contributed to a major outage for Microsoft Teams.

For years, organizations managed certificates with spreadsheets, calendar reminders, and heroic manual effort. This was barely sustainable when you had a handful of monolithic applications and certificates with multi-year lifespans. Today, that model is completely broken.

The convergence of three powerful trends has created a perfect storm, making automated, code-driven certificate management an absolute necessity:
1. The 90-Day Lifespan: The industry, led by initiatives from Google and the CA/Browser Forum, is rapidly moving towards a 90-day maximum validity for public TLS certificates. Manual renewals every three months are simply not scalable.
2. Microservice Proliferation: The shift to cloud-native architectures means we're no longer managing ten certificates for ten servers. We're managing thousands of certificates for thousands of microservices, load balancers, and API gateways.
3. The Rise of GitOps: Git has become the undisputed source of truth for application code and infrastructure. It's only logical that security configurations, including certificate definitions, follow suit.

This is where Infrastructure as Code (IaC) transforms certificate management from a high-risk operational burden into a reliable, auditable, and fully automated process. By defining every certificate in code, you can provision, renew, and deploy them with the same rigor and safety as your core infrastructure.

The Pitfalls of Manual Management vs. The Power of Code

Before diving into the technical implementation, it's crucial to understand the specific problems that IaC solves. The manual approach is riddled with hidden risks that often go unnoticed until it's too late.

Common Manual Problem Real-World Consequence The IaC Solution
Certificate Sprawl & Shadow IT Teams request certificates independently, resulting in no central inventory. You can't secure what you don't know exists. This creates massive security blind spots and makes auditing impossible. All certificates are defined as resources in a version-controlled Git repository. The codebase becomes the definitive, queryable inventory—the single source of truth.
Inconsistent & Weak Configurations Certificates are issued with outdated cipher suites, incorrect domain names, or weak key algorithms because of human error or lack of standardization, leading to failed compliance audits and security vulnerabilities. Reusable IaC modules enforce standards. A "standard-webapp-certificate" module can guarantee that every certificate created uses ECDSA P-256 keys and meets your organization's security baseline.
Insecure Private Key Handling Private keys are emailed, stored in shared folders, or left on developer laptops. Worse, they are sometimes accidentally committed to Git, exposing the crown jewels of your security. The IaC workflow generates the key and immediately pushes it to a secure secrets manager like HashiCorp Vault or AWS Secrets Manager. The key never touches a local disk in plaintext.
Error-Prone Renewal Process A calendar reminder is missed, the responsible person is on vacation, or a manual copy-paste error occurs during deployment. The result is a costly, reputation-damaging outage from an expired certificate. The min_days_remaining parameter in an IaC definition automatically triggers a renewal process well before expiration. The entire cycle—issuance, validation, and deployment—is fully automated.

A Practical Blueprint: Automating Let's Encrypt Certificates with Terraform

Theory is great, but let's see what this looks like in practice. We'll use Terraform, the industry-standard IaC tool, to automate the entire lifecycle of a certificate from Let's Encrypt, the world's leading free and automated Certificate Authority.

This example will:
1. Generate a private key for the certificate.
2. Request a certificate from Let's Encrypt using the ACME protocol.
3. Automatically solve the DNS-01 challenge using AWS Route 53 to prove domain ownership.
4. Store the issued certificate in AWS Certificate Manager (ACM).
5. Attach the certificate to an Application Load Balancer.

Prerequisites

  • Terraform installed.
  • AWS credentials configured.
  • A domain you own managed by AWS Route 53.

Here is the complete Terraform code:

# main.tf

# Configure the required providers: tls for key generation,
# acme for Let's Encrypt, and aws for our infrastructure.
terraform {
  required_providers {
    acme = {
      source  = "vancluever/acme"
      version = "~> 2.0"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# 1. Configure the ACME provider to use the Let's Encrypt production server.
provider "acme" {
  server_url = "https://acme-v02.api.letsencrypt.org/directory"
}

# 2. Create a private key and registration for your ACME account.
# This identifies you to Let's Encrypt. Store this key securely.
resource "tls_private_key" "account_key" {
  algorithm = "ECDSA"
  ec_curve  = "P256"
}

resource "acme_registration" "reg" {
  account_key_pem = tls_private_key.account_key.private_key_pem
  email_address   = "devops@example.com" # Use a real email for expiration notices
}

# 3. Create a strong private key for the certificate itself.
# This key will be secured on your server/load balancer.
resource "tls_private_key" "certificate_key" {
  algorithm = "ECDSA"
  ec_curve  = "P256"
}

# 4. Request the certificate from Let's Encrypt.
# This is the core resource that manages the certificate lifecycle.
resource "acme_certificate" "certificate" {
  account_key_pem = acme_registration.reg.account_key_pem
  common_name     = "app.your-domain.com"
  subject_alternative_names = ["api.your-domain.com"]

  # Terraform will automatically trigger a renewal when the certificate
  # has 30 or fewer days remaining before expiration. This is the magic!
  min_days_remaining = 30

  private_key_pem = tls_private_key.certificate_key.private_key_pem

  # Use the DNS challenge for automation. The provider will automatically
  # create and delete the required TXT records in Route 53.
  dns_challenge {
    provider = "route53"
    # The AWS provider credentials are used implicitly here.
  }
}

# 5. Upload the issued certificate to AWS Certificate Manager (ACM).
# This makes it available to other AWS services like ALBs and CloudFront.
resource "aws_acm_certificate" "cert" {
  private_key       = tls_private_key.certificate_key.private_key_pem
  certificate_body  = acme_certificate.certificate.certificate_pem
  certificate_chain = acme_certificate.certificate.issuer_pem

  tags = {
    ManagedBy = "Terraform"
    Domain    = "app.your-domain.com"
  }

  # This lifecycle block ensures that a new certificate is created and
  # uploaded before the old one is destroyed, preventing downtime during renewal.
  lifecycle {
    create_before_destroy = true
  }
}

# 6. Attach the certificate to a pre-existing Application Load Balancer listener.
# (Assumes you have an ALB and Listener defined elsewhere)
resource "aws_lb_listener_certificate" "https_cert" {
  listener_arn    = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-load-balancer/..."
  certificate_arn = aws_acm_certificate.cert.arn
}

How It Works

When you run terraform apply, Terraform executes this plan:
1. It generates the necessary private keys in memory.
2. It contacts the Let's Encrypt API to start the certificate request.
3. Let's Encrypt provides a token for the DNS-01 challenge.
4. The ACME provider uses your AWS credentials to create a temporary TXT record in Route 53 with the required value.
5. Once Let's Encrypt verifies the record, it issues the certificate.
6. Terraform then uploads the certificate, private key, and chain to AWS ACM.
7. Finally, it attaches this new certificate to your load balancer listener.

The most powerful part is the min_days_remaining = 30 line. On subsequent runs of terraform apply (e.g., in a nightly CI/CD pipeline), Terraform will check the certificate's expiration date. If it's within 30 days of expiring, Terraform will automatically repeat the entire process, provisioning a new certificate and deploying it seamlessly.

Best Practices for Bulletproof Certificate Automation

Implementing the code is just the first step. To build a truly robust and secure system, follow these industry best practices.

1. Centralize State, Decentralize Execution

Your Terraform state file is your source of truth. It contains the status of all managed resources, including your certificates. This state file must be stored in a centralized, secure, and remote location like an AWS S3 bucket with state locking enabled or in Terraform Cloud. This prevents conflicts and provides a complete inventory.

While the state is central, execution can be decentralized. Different teams' CI/CD pipelines can trigger Terraform runs for their specific applications, all while reporting back to the same central state.

2. Integrate with a True Secrets Manager

Notice in our example that the tls_private_key resource generates the key, which Terraform then holds in its state. While the state can be encrypted, an even better practice is to avoid storing private keys there at all.

A more advanced workflow involves using your IaC tool as an orchestrator to pass secrets directly to a dedicated secrets manager. The Terraform script would generate the CSR, but the private key would be generated and stored directly within HashiCorp Vault or AWS Secrets Manager. The application or infrastructure resource would then be configured to fetch the key directly from the vault at runtime.

3. Implement Policy as Code (PaC)

To prevent developers from provisioning non-compliant certificates (e.g., using weak RSA keys or requesting overly broad wildcard domains), you can implement Policy as Code. Tools like [Open Policy Agent (OPA)](https://www.openpolicyagent.

Share This Insight

Related Posts