Taming the Chaos: Modern Subdomain Certificate Management Strategies
If you’ve ever been jolted awake by a PagerDuty alert for a "Connection Not Secure" error on a critical service, you understand the visceral pain of a forgotten certificate. For years, managing TLS certificates was a tedious but manageable chore. You'd buy a one-year wildcard certificate, set a calendar reminder for 11 months later, and hope for the best.
That era is definitively over.
The landscape has been seismically altered by two forces: the relentless push towards 90-day certificate lifespans and the explosion of ephemeral, microservice-based architectures. Manual management is no longer just inefficient; it's a direct path to service outages, security vulnerabilities, and compliance failures.
This post dives into the modern strategies you need to adopt to manage subdomain certificates effectively. We'll move beyond simple renewal scripts and explore a holistic approach covering automation, strategic certificate selection, and centralized visibility to build a resilient and secure PKI system.
The New Baseline: Why 90-Day Certificates Demand 100% Automation
The industry, prompted by initiatives from Google Chrome and others, is converging on a 90-day maximum validity period for public TLS certificates. It is not yet a formal baseline requirement, but the direction is clear enough to plan for now. The primary driver is security: a shorter lifespan dramatically reduces the window of opportunity for an attacker to misuse a compromised key.
This shift has profound operational consequences:
* Manual renewals stop scaling. A process that happened once a year must now happen at least four times a year for every single certificate, and each extra manual step is another opportunity for human error.
* Automation is the only viable path. The entire certificate lifecycle—issuance, validation, deployment, and renewal—must be automated.
* Crypto-agility is essential. Shorter lifespans make it easier to transition to newer cryptographic standards, like those being developed for post-quantum cryptography, without massive, disruptive projects.
The de facto standard for this automation is the Automated Certificate Management Environment (ACME) protocol, popularized by the free and automated Certificate Authority, Let's Encrypt. Any modern certificate strategy must be built on a foundation of ACME-driven automation.
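To make the renewal cadence concrete, here is a minimal Python sketch of the arithmetic. The function names are hypothetical, and the rule of renewing once two-thirds of the lifetime has elapsed is an assumption chosen for illustration (it is a common default among ACME clients, not something the protocol mandates).

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction_elapsed: float = 2 / 3) -> datetime:
    """Return the moment renewal should start (assumed policy: 2/3 of lifetime)."""
    lifetime = not_after - not_before
    return not_before + lifetime * fraction_elapsed


def renewals_per_year(lifetime_days: int, fraction_elapsed: float = 2 / 3) -> float:
    """Approximate number of renewals per year for one certificate."""
    return 365 / (lifetime_days * fraction_elapsed)


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(renewal_time(issued, expires))      # 60 days after issuance
print(round(renewals_per_year(90), 1))    # roughly six renewals per year
print(round(renewals_per_year(365), 1))   # compare with a one-year certificate
```

Because renewal has to happen before expiry, a 90-day certificate ends up being renewed about six times a year under this policy, not four. Multiply that by every certificate you run and the case for automation follows directly.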
The Strategic Choice: Wildcard vs. Multi-Domain (SAN) Certificates
One of the most fundamental decisions in subdomain management is choosing the right type of certificate. The two primary options are Wildcard certificates and Multi-Domain (SAN) certificates. They are not interchangeable, and choosing the wrong one can introduce unnecessary security risks.
| Feature | Wildcard Certificate (*.example.com) | Multi-Domain (SAN) Certificate |
|---|---|---|
| Coverage | Covers a single level of subdomains (e.g., api.example.com, blog.example.com). Does not cover app.dev.example.com or the bare example.com. | Covers a specific, explicit list of domains (e.g., example.com, api.example.com). |
| Flexibility | High. New subdomains are automatically secured without re-issuance. Ideal for dynamic environments. | Low. A new subdomain requires re-issuing the certificate with the new name added to the SAN list. |
| Security Risk | High. A single private key compromise affects all subdomains covered by the wildcard. | Lower. The blast radius is limited to the specific domains listed on the certificate. |
| ACME Challenge | Requires the DNS-01 challenge (at least with Let's Encrypt), proving control over the domain's DNS zone. | Can use the simpler HTTP-01 challenge or DNS-01. |
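The coverage row is the one that most often surprises people, so here is a small Python sketch of the matching rule. It is a simplified illustration: the function name is hypothetical, and it ignores internationalized names and partial-label wildcards.

```python
def wildcard_covers(cert_name: str, hostname: str) -> bool:
    """Return True if a certificate name such as '*.example.com' covers hostname."""
    cert_name = cert_name.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not cert_name.startswith("*."):
        # Non-wildcard names must match exactly.
        return cert_name == hostname
    # The wildcard stands for exactly one left-most label.
    first_label, _, parent = hostname.partition(".")
    return bool(first_label) and parent == cert_name[2:]


for name in ("api.example.com", "app.dev.example.com", "example.com"):
    print(name, wildcard_covers("*.example.com", name))
```

Only the first hostname is covered. The nested subdomain and the bare domain each need their own name on a certificate.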
The Principle of Least Privilege in Practice
The core security trade-off is flexibility versus risk. A wildcard certificate (*.production.example.com) is incredibly convenient for a Kubernetes cluster where new services are deployed constantly. However, if the private key for that wildcard is compromised from a single, low-security application, an attacker can now impersonate any service in your production environment, including your critical billing API or user authentication service.
Best Practice: Adopt a hybrid strategy.
- Use Wildcard Certificates for homogeneous, ephemeral environments. A Kubernetes ingress controller managing services within a single, tightly controlled security domain is a perfect use case. The convenience of not needing to reconfigure the ingress for every new service outweighs the risk.
- Use Multi-Domain (SAN) Certificates for distinct, high-security applications. Your customer-facing portal, primary API gateway, and internal administration tools should have their own specific certificates. This contains the blast radius of a potential compromise to only those named services.
Taming the Sprawl: Discovery, Inventory, and Policy
You can't manage what you can't see. "Certificate sprawl"—the proliferation of unmanaged, forgotten certificates across cloud accounts, development servers, and legacy systems—is a primary cause of unexpected outages.
Step 1: Automated Discovery and Centralized Inventory
The first step towards control is building a complete, real-time inventory of every certificate issued for your domains. Manual spreadsheets are a recipe for disaster.
A modern approach involves continuous, automated discovery from multiple sources:
1. Certificate Transparency (CT) Logs: Public logs that record all publicly trusted TLS certificates. Monitoring these logs for your domains reveals any certificate issued by any public CA, helping you spot rogue or unauthorized issuances.
2. Network Scanning: Regularly scan your internal and external IP ranges to discover active TLS endpoints and inspect their certificates.
3. Cloud Provider APIs: Integrate with AWS, Azure, and GCP to pull data from their certificate management services (ACM, Key Vault, etc.).
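As an illustration of the network-scanning source, the sketch below uses only Python's standard library to connect to an endpoint and compute how many days its certificate has left. The function names are hypothetical and this is a starting point, not a complete scanner: it checks one port per host and relies on the default trust store, so an expired or untrusted certificate shows up as an error and not as a negative day count.

```python
import socket
import ssl
import time
from typing import Dict, Iterable, Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the leaf certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days between now and a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def scan(hosts: Iterable[str], fetch=fetch_not_after) -> Dict[str, object]:
    """Map each host to its remaining days, or to an error string if the check failed."""
    report: Dict[str, object] = {}
    for host in hosts:
        try:
            report[host] = round(days_remaining(fetch(host)), 1)
        except (OSError, ValueError) as exc:
            # ssl.SSLError is a subclass of OSError, so handshake failures land here too.
            report[host] = f"error: {exc}"
    return report
```

Calling `scan(["example.com", "api.example.com"])` from a scheduled job and alerting on small values or error strings gives a basic safety net while a fuller inventory is being built.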
This is where a dedicated monitoring and inventory platform becomes invaluable. Services like Expiring.at automate this entire discovery process, integrating with CT logs and providing a centralized dashboard to track every certificate, its expiration date, and its configuration details. This single pane of glass is crucial for preventing certificates from slipping through the cracks.
Step 2: Enforce Policy as Code
Once you have visibility, you can enforce standards. Don't rely on documentation and developer goodwill to ensure proper security configurations.
- Certificate Authority Authorization (CAA): This is a simple but powerful DNS record that lets you specify which CAs are authorized to issue certificates for your domain. Publicly trusted CAs must check CAA before issuing, so if an attacker or a misconfigured script requests a certificate from a CA you have not authorized, that CA should refuse the request.
  A typical CAA record set looks like this:
      example.com. IN CAA 0 issue "letsencrypt.org"
      example.com. IN CAA 0 issuewild "letsencrypt.org"
      example.com. IN CAA 0 iodef "mailto:security@example.com"
  This configuration allows Let's Encrypt to issue both standard and wildcard certificates and asks CAs to send violation reports to your security team.
- CI/CD Pipeline Checks: Integrate tools like Open Policy Agent (OPA) into your deployment pipelines. You can write policies to prevent the deployment of services whose TLS configuration or certificates don't meet your standards (e.g., weak cipher suites, short key lengths, or issuance from an unapproved internal CA).
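To show how the CAA records above are interpreted, here is a simplified Python sketch of the decision a CA makes. The function name and the tuple representation of records are assumptions for illustration. The sketch also skips parts of the real procedure, such as walking up the DNS tree to find the relevant record set and handling the critical flag.

```python
from typing import Iterable, Tuple

CaaRecord = Tuple[int, str, str]  # (flags, tag, value)


def ca_may_issue(records: Iterable[CaaRecord], ca_domain: str, wildcard: bool) -> bool:
    """Return True if the CAA record set permits ca_domain to issue."""
    records = list(records)
    issue = [value for _, tag, value in records if tag.lower() == "issue"]
    issuewild = [value for _, tag, value in records if tag.lower() == "issuewild"]
    # For wildcard requests, issuewild takes precedence when present;
    # otherwise the issue records apply. Non-wildcard requests ignore issuewild.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        # Nothing restricts issuance (e.g. only an iodef record exists).
        return True
    # Drop optional parameters after ';' and compare issuer domain names.
    allowed = {value.split(";", 1)[0].strip().lower() for value in relevant}
    return ca_domain.lower() in allowed


records = [
    (0, "issue", "letsencrypt.org"),
    (0, "issuewild", "letsencrypt.org"),
    (0, "iodef", "mailto:security@example.com"),
]
print(ca_may_issue(records, "letsencrypt.org", wildcard=True))
print(ca_may_issue(records, "other-ca.example", wildcard=False))
```

A check like this can also run in a pipeline against your own DNS zone files, to catch a CAA record that would block the CA your automation depends on.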
Practical Implementation: Automation with cert-manager in Kubernetes
For teams using Kubernetes, cert-manager is the gold standard for certificate automation. It runs as a controller within your cluster and uses Custom Resource Definitions (CRDs) to manage the entire certificate lifecycle.
Let's walk through issuing a wildcard certificate using cert-manager with Let's Encrypt and the DNS-01 challenge with Cloudflare.
Step 1: Install cert-manager
First, install cert-manager into your cluster using its Helm chart.
    # Add the Jetstack Helm repository
    helm repo add jetstack https://charts.jetstack.io
    # Update your local Helm chart repository cache
    helm repo update
    # Install the cert-manager chart
    helm install \
      cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.14.4 \
      --set installCRDs=true
Step 2: Create a ClusterIssuer
An Issuer or ClusterIssuer represents a certificate authority from which to request certificates. A ClusterIssuer is a cluster-scoped resource that can be used from any namespace.
We'll create a ClusterIssuer for Let's Encrypt's production environment. It needs a secret containing your Cloudflare API token to solve the DNS-01 challenge.
First, create the secret:
    kubectl create secret generic cloudflare-api-token-secret \
      --namespace cert-manager \
      --from-literal=api-token='YOUR_CLOUDFLARE_API_TOKEN'
Now, define the ClusterIssuer:
    # cluster-issuer.yaml
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-production
    spec:
      acme:
        # The ACME server URL
        server: https://acme-v02.api.letsencrypt.org/directory
        # Email address used for ACME registration
        email: your-email@example.com
        # Name of a secret used to store the ACME account private key
        privateKeySecretRef:
          name: letsencrypt-production-account-key
        # Enable the DNS-01 challenge provider
        solvers:
          - dns01:
              cloudflare:
                email: your-cloudflare-email@example.com
                # Reference the secret containing the API token
                apiTokenSecretRef:
                  name: cloudflare-api-token-secret
                  key: api-token
Apply it to your cluster: kubectl apply -f cluster-issuer.yaml.
Step 3: Request a Wildcard Certificate
Finally, create a Certificate resource to request the wildcard certificate. cert-manager will see this resource, communicate with Let's Encrypt, solve the DNS challenge, and store the resulting certificate and private key in a Kubernetes Secret.
    # wildcard-certificate.yaml
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: wildcard-example-com
      namespace: istio-system # Or any namespace where your ingress gateway is
    spec:
      # The secret name to store the certificate and key
      secretName: wildcard-example-com-tls
      # The issuer to use
      issuerRef:
        name: letsencrypt-production
        kind: ClusterIssuer
      # The common name and SANs for the certificate
      commonName: "*.example.com"
      dnsNames:
        - "*.example.com"
Apply it: kubectl apply -f wildcard-certificate.yaml.
Within a few minutes, cert-manager will create the wildcard-example-com-tls secret, containing tls.crt and tls.key (ca.crt is typically not populated for ACME issuers such as Let's Encrypt). You can now reference this secret from your Ingress controller or Gateway to secure every first-level subdomain of example.com. Best of all, cert-manager renews the certificate automatically, by default once roughly two-thirds of its lifetime has elapsed.
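If you want to confirm what actually landed in the secret, one option is to export it as JSON and decode it. The Python sketch below is a minimal illustration with hypothetical function names. It assumes the JSON comes from `kubectl get secret wildcard-example-com-tls --namespace istio-system -o json` and only checks that the PEM markers are present; it does not validate the certificate chain.

```python
import base64
import json
from typing import Dict


def decode_tls_secret(secret_json: str) -> Dict[str, bytes]:
    """Decode the base64-encoded 'data' map of a Kubernetes Secret given as JSON."""
    secret = json.loads(secret_json)
    return {key: base64.b64decode(value) for key, value in secret.get("data", {}).items()}


def looks_like_tls_secret(decoded: Dict[str, bytes]) -> bool:
    """Cheap sanity check: a PEM certificate in tls.crt and a PEM private key in tls.key."""
    cert = decoded.get("tls.crt", b"")
    key = decoded.get("tls.key", b"")
    return cert.startswith(b"-----BEGIN CERTIFICATE-----") and b"PRIVATE KEY-----" in key
```

Treat the decoded output with care: tls.key is the private key, so avoid printing it or writing it to shared logs.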
Conclusion: From Reactive Firefighting to Proactive Strategy
Managing subdomain certificates has evolved from a simple administrative task into a critical infrastructure discipline. The move to 90-day lifecycles is the final nail in the coffin for manual processes.
To build a resilient and secure system, you must embrace a new paradigm based on three pillars:
- Automate Everything: Use ACME clients like cert-manager to handle the entire certificate lifecycle. There should be zero manual steps in your issuance, renewal, or deployment process.
- Centralize and Monitor: You cannot protect what you don't know exists. Implement automated discovery and use a centralized inventory like Expiring.at to maintain a single source of truth for all your public and private certificates.
- Apply a Least Privilege Mindset: Choose the right tool for the job. Use specific SAN certificates for high-value assets to limit your risk, and reserve wildcards for homogeneous, dynamic environments where the operational benefits are clear.
Start today by auditing your domains. Find every certificate, get it into a centralized inventory, and identify your highest-risk manual renewal processes. By replacing them with an automated, policy-driven workflow, you can finally stop firefighting certificate expirations and start building a more secure and reliable infrastructure.