The 90-Day Lifespan is Here: A DevOps Guide to Automating Let's Encrypt Renewals

Tim Henrich
April 28, 2026

The era of the one-year SSL/TLS certificate is officially coming to an end. With Google Chrome’s "Moving Forward, Together" initiative pushing to reduce the maximum validity of public TLS certificates from 398 days to just 90 days, the industry is undergoing a massive shift.

For years, Let's Encrypt championed the 90-day certificate lifespan to encourage automation and limit the damage of compromised keys. Now, what was once considered an aggressive security posture is becoming the mandatory global standard.

If your infrastructure relies on calendar reminders, Jira tickets, or spreadsheets to manage certificate renewals, you are sitting on a ticking time bomb. In April 2023, a global outage of Starlink services was traced back to a single expired ground station certificate. Historically, giants like Epic Games have suffered massive downtime for the exact same reason. Manual certificate management is no longer viable at scale.

This guide covers how to build automated certificate renewal pipelines with Let's Encrypt, how the ACME protocol's challenge types work, and which monitoring safety nets you need to keep your infrastructure online.

Understanding the ACME Protocol: Proving Domain Control

To issue a certificate, Let's Encrypt must verify that you actually control the domain you are requesting a certificate for. This is handled by the Automated Certificate Management Environment (ACME) protocol.

When your ACME client requests a certificate, Let's Encrypt issues a "challenge." Understanding which challenge to use is the first hurdle in automating your infrastructure.

The HTTP-01 Challenge

The HTTP-01 challenge is the most common method for standard web servers. When your client requests a certificate, Let's Encrypt provides a token. Your client must serve a file at http://<domain>/.well-known/acme-challenge/<token> whose content is the key authorization: the token joined by a period to a thumbprint of your ACME account key. Let's Encrypt then makes an HTTP request to that URL; if the response matches the expected key authorization, the challenge passes and the certificate can be issued.

  • Best for: Standard, public-facing web servers (Nginx, Apache) and single-node deployments.
  • The Catch: It requires port 80 to be open to the internet. Furthermore, HTTP-01 cannot be used to issue wildcard certificates (e.g., *.example.com). If your architecture terminates SSL at a load balancer, you must configure the load balancer to route /.well-known/acme-challenge/ traffic specifically to the server running your ACME client.
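
The HTTP-01 response described above can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for a real ACME client: the account key and token below are placeholder values, and the helper names (`jwk_thumbprint`, `http01_response`) are hypothetical.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """JWK thumbprint (RFC 7638): SHA-256 over the required key members,
    serialized with sorted keys and no whitespace."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in required},
                           sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def http01_response(token: str, account_jwk: dict) -> tuple[str, str]:
    """Return (URL path, file body) the web server must serve for HTTP-01."""
    path = f"/.well-known/acme-challenge/{token}"
    body = f"{token}.{jwk_thumbprint(account_jwk)}"
    return path, body


# Placeholder public account key and token, for illustration only.
example_jwk = {
    "kty": "EC",
    "crv": "P-256",
    "x": "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU",
    "y": "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0",
}
example_token = "evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA"

path, body = http01_response(example_token, example_jwk)
print(path)
print(body)
```

Because the body depends only on the token and the account key, any node behind a load balancer can answer the challenge as long as it knows both values.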

The DNS-01 Challenge

For complex environments, the DNS-01 challenge is the gold standard. Instead of placing a file on a web server, your ACME client uses your DNS provider's API to create a TXT record at _acme-challenge.<yourdomain.com>. Let's Encrypt queries your domain's authoritative nameservers; if the TXT record contains the expected value (a SHA-256 digest derived from the challenge token and your account key), validation succeeds.

  • Best for: Issuing wildcard certificates, securing internal servers that are not exposed to the internet, and highly available multi-server clusters.
  • The Catch: It requires programmatic API access to your DNS provider. It is also subject to DNS propagation delays. If your automation script signals Let's Encrypt to verify the record before the TXT record has propagated to authoritative nameservers, the challenge will fail.
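
The propagation problem can be handled by polling before you tell Let's Encrypt to validate. The sketch below computes the TXT value and waits for it to appear. It assumes you supply a `lookup` callable that queries your authoritative nameservers and returns the TXT strings for a name; that callable, like the other function names here, is hypothetical and not part of any particular client.

```python
import base64
import hashlib
import time


def dns01_txt_value(key_authorization: str) -> str:
    """TXT value for DNS-01: unpadded base64url of SHA-256(key authorization)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def challenge_record_name(domain: str) -> str:
    """Wildcards validate at the base name:
    *.example.com is checked at _acme-challenge.example.com."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base.rstrip('.')}"


def wait_for_txt(lookup, name, expected, attempts=10, delay_seconds=15.0,
                 sleep=time.sleep) -> bool:
    """Poll `lookup(name)` until `expected` is among the returned TXT strings.

    `lookup` is supplied by the caller (an assumption of this sketch) and
    should query the authoritative nameservers, not a caching resolver.
    Returns False if the record never shows up within `attempts` tries.
    """
    for attempt in range(attempts):
        if expected in lookup(name):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Only once `wait_for_txt` returns True should the automation ask Let's Encrypt to check the record; a False result is better surfaced as an error than spent on a failed validation.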

Note: A third challenge, TLS-ALPN-01, validates via the TLS handshake itself using Application-Layer Protocol Negotiation. This is highly favored by modern ingress controllers like Traefik when port 80 is strictly blocked but port 443 is open.

The DevOps Toolchain: Choosing Your ACME Client

You don't need to write ACME API calls from scratch. The open-source community has built robust clients for nearly every environment.

1. Certbot (Traditional VM Deployments)

Maintained by the Electronic Frontier Foundation (EFF), Certbot is the industry standard for traditional Linux VMs running Nginx or Apache.

A standard installation and automated renewal setup for an Nginx server looks like this:

# Install Certbot and the Nginx plugin
sudo apt-get update
sudo apt-get install certbot python3-certbot-nginx

# Request the certificate (HTTP-01) and automatically configure Nginx
sudo certbot --nginx -d example.com -d www.example.com

When installed from a distribution package, Certbot sets up a systemd timer (check with systemctl list-timers | grep certbot) that runs twice a day and renews any certificate that is within 30 days of expiration.
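
To make the renewal window concrete, here is a small sketch of the date arithmetic, assuming the fixed 30-day window described above. The function name `renewal_due` is hypothetical, and real clients may apply different rules.

```python
from datetime import datetime, timedelta, timezone


def renewal_due(not_after: datetime, now: datetime,
                window: timedelta = timedelta(days=30)) -> bool:
    """True once `now` falls inside the renewal window before expiry."""
    return now >= not_after - window


# A 90-day certificate issued on 1 January 2026 (illustrative dates).
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)

for day in (30, 59, 60, 75):
    print(day, renewal_due(expires, issued + timedelta(days=day)))
```

With these inputs, renewal first becomes due on day 60. Because the timer fires twice a day, a failed attempt has roughly 60 more tries before the certificate expires.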

2. cert-manager (Kubernetes)

If you are running Kubernetes, hand-rolled Certbot scripts are an anti-pattern. cert-manager is the de facto standard for cloud-native environments. It runs as a controller within your cluster, issuing and renewing certificates and storing them in Kubernetes Secrets that your Ingress resources reference.

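Here is an example of a ClusterIssuer configured for Let's Encrypt using the DNS-01 challenge with AWS Route 53. It sets no access keys, so it assumes cert-manager obtains AWS credentials from its environment, for example through an IAM role bound to its service account: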

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: devops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - dns01:
        route53:
          region: us-east-1
          hostedZoneID: Z1234567890ABCDEF

3. Native Integrations (Caddy)

Modern web servers are increasingly building ACME clients directly into their core. Caddy is a powerful reverse proxy that provisions and renews Let's Encrypt certificates by default, requiring zero configuration beyond specifying your domain in the Caddyfile:

example.com {
    reverse_proxy localhost:8080
}

That single block handles domain validation (Caddy can use HTTP-01 or TLS-ALPN-01), certificate issuance, renewal, and HTTP-to-HTTPS redirection automatically.

Security First: The Principle of Least Privilege for DNS-01

When utilizing the DNS-01 challenge, one of the most common—and dangerous—mistakes DevOps teams make is providing the ACME client with global DNS administrator credentials. If the server running your ACME client is compromised, attackers could hijack your entire DNS routing, redirecting MX records or pointing subdomains to malicious servers.

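You must apply the principle of least privilege. If you are using AWS Route 53, your IAM policy should restrict the ACME client so it can only modify TXT records whose names are the _acme-challenge records for your domains.

Here is a least-privilege AWS IAM policy template for cert-manager or Certbot. The condition matches record names exactly, so list one _acme-challenge name for every hostname you validate (a wildcard such as *.example.com validates at _acme-challenge.example.com):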

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "route53:GetChange",
      "Resource": "arn:aws:route53:::change/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": "arn:aws:route53:::hostedzone/YOUR_HOSTED_ZONE_ID",
      "Condition": {
        "ForAllValues:StringEquals": {
          "route53:ChangeResourceRecordSetsNormalizedRecordNames": [
            "_acme-challenge.example.com"
          ],
          "route53:ChangeResourceRecordSetsRecordTypes": [
            "TXT"
          ]
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "route53:ListHostedZonesByName",
      "Resource": "*"
    }
  ]
}
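
If you manage many hostnames, maintaining the record-name list by hand gets error-prone. The sketch below generates the same policy document from a list of domains. The helper names are hypothetical, and the normalization (lowercase, no trailing dot) is an assumption about how the condition key compares names, so verify it against the AWS documentation before relying on it.

```python
import json


def challenge_names(domains):
    """Map hostnames to their DNS-01 record names.

    Assumption: names are compared lowercase and without a trailing dot.
    """
    names = set()
    for domain in domains:
        base = domain.lower().rstrip(".")
        if base.startswith("*."):
            base = base[2:]  # wildcards validate at the base name
        names.add(f"_acme-challenge.{base}")
    return sorted(names)


def build_policy(hosted_zone_id, domains):
    """Return the policy shown above with the record names filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "route53:GetChange",
                "Resource": "arn:aws:route53:::change/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "route53:ChangeResourceRecordSets",
                    "route53:ListResourceRecordSets",
                ],
                "Resource": f"arn:aws:route53:::hostedzone/{hosted_zone_id}",
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "route53:ChangeResourceRecordSetsNormalizedRecordNames":
                            challenge_names(domains),
                        "route53:ChangeResourceRecordSetsRecordTypes": ["TXT"],
                    }
                },
            },
            {
                "Effect": "Allow",
                "Action": "route53:ListHostedZonesByName",
                "Resource": "*",
            },
        ],
    }


policy = build_policy("YOUR_HOSTED_ZONE_ID",
                      ["example.com", "*.example.com", "www.example.com"])
print(json.dumps(policy, indent=2))
```

Note that example.com and *.example.com collapse to a single record name, so the generated list here has two entries rather than three.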

Surviving the Edge Cases: Rate Limits, Reloads, and ARI

Setting up the initial certificate is easy; ensuring it renews flawlessly for years requires navigating several technical edge cases.

1. Respecting Let's Encrypt Rate Limits

Let's Encrypt enforces strict rate limits to protect its infrastructure. The limits you are most likely to hit during pipeline development are the "Failed Validation" limit (5 failures per account, per hostname, per hour) and the "Duplicate Certificate" limit (5 certificates per week for the exact same set of hostnames).

The Solution: Always use the Let's Encrypt Staging Environment when building or testing your CI/CD pipelines. The staging environment has vastly higher rate limits and issues untrusted certificates.

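In Certbot, use --dry-run to rehearse a renewal against staging without saving any certificates, or --test-cert to obtain an untrusted certificate from staging: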

sudo certbot renew --dry-run
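
For custom pipelines that retry on failure, a local guard can stop a retry loop before it exhausts the failed-validation limit. The following is a minimal sketch: the `FailureBudget` class is hypothetical, and its sliding window only approximates the limit quoted above rather than reproducing how Let's Encrypt counts failures.

```python
import time
from collections import defaultdict, deque


class FailureBudget:
    """Tracks recent validation failures per hostname in a sliding window."""

    def __init__(self, max_failures=5, window_seconds=3600, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the class can be tested
        self._failures = defaultdict(deque)

    def _recent(self, hostname):
        """Drop failures older than the window and return what remains."""
        cutoff = self.clock() - self.window_seconds
        failures = self._failures[hostname]
        while failures and failures[0] <= cutoff:
            failures.popleft()
        return failures

    def record_failure(self, hostname):
        self._recent(hostname).append(self.clock())

    def may_attempt(self, hostname):
        """False once the hostname has used up its failures for the window."""
        return len(self._recent(hostname)) < self.max_failures


budget = FailureBudget()
if budget.may_attempt("example.com"):
    # Run the validation here; on failure call budget.record_failure("example.com").
    pass
```

Setting max_failures below 5 leaves headroom for failures caused by other jobs that share the same ACME account.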

2. The Reload Hook

A successful ACME renewal downloads the new .pem files to your disk. However, web servers like Nginx and Apache load certificates into memory on startup. If you do not reload the web server, it will continue serving the old, soon-to-be-expired certificate until the process restarts.

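Always include a deploy hook in your automation to gracefully reload the service. Certbot runs the hook only after a certificate has actually been renewed. For scheduled renewals, an executable script placed in /etc/letsencrypt/renewal-hooks/deploy/ has the same effect. On the command line: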

sudo certbot renew --deploy-hook "systemctl reload nginx"
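
One way to confirm that a reload actually took effect is to compare the certificate the server presents with the one on disk. The sketch below does this with the Python standard library. The function names are hypothetical, and the path in the usage note assumes Certbot's default layout.

```python
import hashlib
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_fingerprint(pem: str) -> str:
    """SHA-256 fingerprint of the first certificate in a PEM string.

    In a full-chain file the first certificate is the leaf, which is the
    one the server presents for the hostname.
    """
    start = pem.index(PEM_HEADER)
    end = pem.index(PEM_FOOTER, start) + len(PEM_FOOTER)
    der = ssl.PEM_cert_to_DER_cert(pem[start:end])
    return hashlib.sha256(der).hexdigest()


def served_fingerprint(host: str, port: int = 443) -> str:
    """Fingerprint of the certificate the live server currently presents."""
    return pem_fingerprint(ssl.get_server_certificate((host, port)))


def server_is_stale(host: str, cert_path: str, port: int = 443) -> bool:
    """True if the server still serves a different certificate than the file."""
    with open(cert_path, encoding="ascii") as handle:
        on_disk = pem_fingerprint(handle.read())
    return served_fingerprint(host, port) != on_disk
```

A call such as server_is_stale("example.com", "/etc/letsencrypt/live/example.com/fullchain.pem") returning True after a renewal suggests the reload hook did not run or did not succeed.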

3. Embracing ARI (ACME Renewal Information)

A major development for 2024 and 2025 is the adoption of the ARI extension, now standardized as RFC 9773. Historically, ACME clients renewed on a fixed schedule, typically at the 60-day mark of a 90-day certificate, regardless of conditions at the CA.

ARI allows Let's Encrypt to signal to your client a suggested window in which it should renew. This spreads renewal traffic out instead of producing spikes on Let's Encrypt's servers, and it lets the CA pull renewals forward smoothly in the event of a mass revocation incident. ARI support varies by client and version, so check the release notes for your ACME client (such as Certbot or cert-manager) and upgrade to a version that supports it.
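
To illustrate what an ARI-aware client does with the server's answer, the sketch below picks a renewal time from a renewal-information response. It assumes the response carries a suggestedWindow object with RFC 3339 start and end timestamps. The sample values and function names are illustrative, and fetching the response from the CA is out of scope here.

```python
import random
from datetime import datetime, timedelta


def parse_rfc3339(value: str) -> datetime:
    """Parse a UTC timestamp such as 2026-05-01T04:00:00Z.

    The replace() keeps this working on Python versions whose
    fromisoformat() does not accept a trailing "Z".
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_renewal_time(renewal_info: dict, rng=None) -> datetime:
    """Choose a uniformly random instant inside the suggested window.

    Randomizing within the window is what spreads clients out instead of
    having them all renew at the window's start.
    """
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    rng = rng or random.Random()
    offset = rng.uniform(0, (end - start).total_seconds())
    return start + timedelta(seconds=offset)


# Illustrative response body; the dates are made up for this example.
sample_info = {
    "suggestedWindow": {
        "start": "2026-05-01T04:00:00Z",
        "end": "2026-05-02T04:00:00Z",
    }
}

renew_at = pick_renewal_time(sample_info)
print(renew_at.isoformat())
```

A client would typically store the chosen time and renew immediately if it is already in the past, which is how a CA can pull renewals forward by publishing an earlier window.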

Trust, but Verify: Why Automation Still Needs Monitoring

"Set it and forget it" is the most dangerous phrase in infrastructure management. Automation is fantastic, but automation breaks silently.

Consider these common failure scenarios:
* Your DNS provider rotates their API keys, causing DNS-01 challenges
