The 90-Day Mandate and Beyond: Certificate Management Metrics That Actually Matter in 2025
In 2023, SpaceX's Starlink suffered a massive global outage that left users without internet access for hours. The root cause wasn't a sophisticated nation-state cyberattack, a satellite hardware failure, or a complex BGP routing error. It was an expired SSL/TLS certificate on a ground station.
Around the same time, Cisco issued field notices warning that certificates hardcoded into their Viptela vEdge routers were expiring, threatening to sever control connections across enterprise networks.
These incidents highlight a critical reality in modern IT: machine identities now vastly outnumber human identities. With the explosion of cloud-native architectures, IoT devices, and microservices, the volume of certificates organizations must manage has grown exponentially. According to the Ponemon Institute, the average enterprise experiences more than three certificate-related outages per year, with the average cost of a single outage exceeding $300,000.
Historically, organizations tracked certificates using spreadsheets, focusing on a single, binary metric: Is the certificate expired?
In 2025, that approach is a critical vulnerability. Between Google’s proposal to reduce maximum public TLS certificate lifespans to 90 days and the urgent transition toward Post-Quantum Cryptography (PQC), tracking expiration dates on a calendar no longer scales.
To prevent catastrophic outages and secure the perimeter, DevOps and security teams must transition to advanced metrics focused on visibility, operational health, compliance, and automation. Here are the certificate management metrics that actually matter today.
Why Traditional Certificate Tracking is Dead
Before diving into the metrics, we must understand the environmental shifts rendering old methods obsolete.
- The 90-Day Certificate Mandate: Google’s Chromium Root Program has proposed reducing the maximum validity of public TLS certificates from 398 days to just 90 days. When lifespans are this short, manual renewal processes will inevitably fail.
- Post-Quantum Cryptography (PQC) Readiness: In August 2024, NIST finalized its PQC standards (FIPS 203, 204, and 205). Organizations are now racing to inventory their cryptographic assets to prepare for migration. If you don't know where your RSA-2048 certificates are, you cannot replace them.
- Multi-Cloud Sprawl: Relying on native tools like AWS Certificate Manager (ACM) or Azure Key Vault creates siloed visibility. A centralized approach is required to see the full picture.
To build a resilient infrastructure, you need to track metrics across four core pillars.
Pillar 1: Visibility & Discovery Metrics (You Can't Secure What You Can't See)
The foundation of Certificate Lifecycle Management (CLM) is absolute visibility. If a developer spins up a rogue server with a self-signed certificate, your security team needs to know immediately.
1. Total Certificate Inventory Size vs. Active Endpoints
This is your baseline metric. In a modern DevOps environment utilizing dynamic scaling and Kubernetes, this number should fluctuate daily. If your total inventory size remains static for weeks, your discovery tools are probably missing certificates, and you have blind spots.
2. Percentage of Unmanaged/Rogue Certificates (Shadow IT)
This metric tracks certificates discovered on your network that are not managed by your central CLM system. Developers often bypass slow security processes by using Let's Encrypt directly without oversight. Driving this metric to 0% is critical for Zero Trust Architecture compliance.
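One way to compute this percentage is to compare the fingerprints your scanner finds against the fingerprints your central CLM system manages. The sketch below assumes two plain-text files with one certificate fingerprint per line; that file format and the `unmanaged_pct` helper are hypothetical, not part of any particular CLM product.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of discovered certificates that are
# missing from the managed inventory.
#   $1 = file of managed fingerprints    (one per line, assumed format)
#   $2 = file of discovered fingerprints (one per line, assumed format)
unmanaged_pct() {
    awk '
        FILENAME == ARGV[1] { managed[$1] = 1; next }   # first file: managed inventory
        $1 != "" && !($1 in seen) {                      # second file: de-duplicated discoveries
            seen[$1] = 1
            total++
            if (!($1 in managed)) unmanaged++
        }
        END {
            if (total == 0) { print "0.0"; exit }
            printf "%.1f\n", (unmanaged / total) * 100
        }
    ' "$1" "$2"
}

# Example with made-up fingerprints: 1 of 4 discovered certificates is unmanaged.
managed_file=$(mktemp)
discovered_file=$(mktemp)
printf '%s\n' aa11 bb22 cc33 > "$managed_file"
printf '%s\n' aa11 bb22 cc33 dd44 > "$discovered_file"
unmanaged_pct "$managed_file" "$discovered_file"   # prints 25.0
rm -f "$managed_file" "$discovered_file"
```

Matching on fingerprints rather than hostnames means a rogue certificate issued for a hostname you already manage still counts as unmanaged.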
3. Time-to-Discovery (TTD)
How long does it take for a newly minted certificate to appear in your central inventory? If a rogue CI/CD pipeline deploys a certificate to a public-facing endpoint, a TTD of "one week during the next vulnerability scan" is unacceptable.
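As a rough sketch, TTD can be derived from two timestamps per certificate: when it was issued and when it first appeared in your inventory. The input format below (name, issuance time, discovery time, both as Unix epoch seconds) and the `ttd_report` helper are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical input on stdin, one certificate per line:
#   name issued_epoch discovered_epoch
# Prints the mean and worst-case Time-to-Discovery in hours.
ttd_report() {
    awk '
        NF >= 3 {
            ttd = ($3 - $2) / 3600          # hours between issuance and discovery
            sum += ttd
            n++
            if (n == 1 || ttd > worst) { worst = ttd; worst_name = $1 }
        }
        END {
            if (n == 0) { print "no data"; exit }
            printf "mean_ttd_hours=%.1f worst_ttd_hours=%.1f worst_cert=%s\n",
                sum / n, worst, worst_name
        }
    '
}

# Example with made-up timestamps: discovered after 1 hour, 6 hours, and 7 days.
printf '%s\n' \
    "api.example.com 1700000000 1700003600" \
    "web.example.com 1700000000 1700021600" \
    "old.example.com 1700000000 1700604800" | ttd_report
# prints mean_ttd_hours=58.3 worst_ttd_hours=168.0 worst_cert=old.example.com
```

Reporting the worst case alongside the mean matters here, because a single certificate that went unseen for a week is exactly the blind spot this metric is meant to expose.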
Practical Implementation: Network Scanning
You can implement basic discovery using tools like nmap to proactively scan your infrastructure for untracked certificates:
```bash
# Scan a subnet for SSL certificates and output the details
nmap -p 443 --script ssl-cert 10.0.0.0/24 -oN cert_scan_results.txt
```
While this is a manual example, modern infrastructure requires continuous, automated scanning integrated directly into your monitoring stack.
Pillar 2: Operational Health & Risk Metrics (Preventing the 3 AM Alert)
Once you can see your certificates, you must measure their operational health to prevent the dreaded "certificate expired" application outage.
4. Days to Expiration (DTE) Distribution
Tracking "expired" certificates is a lagging indicator: by the time it hits the dashboard, the application is already down. Instead, track the distribution of your DTE.
Measure the percentage of certificates expiring in 15, 30, 60, and 90 days. With 90-day lifespans, alerting at 30 days is no longer an early warning; it's a critical alarm. Automated renewal usually runs at or before that point, so a certificate that still has only 30 days left suggests the automation has failed.
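The distribution can be sketched as cumulative buckets over a list of DTE values. This is a minimal example that assumes you already have one DTE value (in days) per certificate on stdin; the `dte_distribution` name and the output format are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one Days-to-Expiration value per line on stdin
# and prints the share of certificates expiring within 15, 30, 60, and 90 days.
dte_distribution() {
    awk '
        $1 != "" {
            dte = $1 + 0                    # force numeric comparison
            total++
            if (dte <= 15) w15++
            if (dte <= 30) w30++
            if (dte <= 60) w60++
            if (dte <= 90) w90++
        }
        END {
            if (total == 0) { print "no certificates"; exit }
            printf "within_15d=%.1f%% within_30d=%.1f%% within_60d=%.1f%% within_90d=%.1f%%\n",
                100 * w15 / total, 100 * w30 / total, 100 * w60 / total, 100 * w90 / total
        }
    '
}

# Example with made-up DTE values for eight certificates.
printf '%s\n' 5 12 28 45 75 120 200 400 | dte_distribution
# prints within_15d=25.0% within_30d=37.5% within_60d=50.0% within_90d=62.5%
```

The buckets are cumulative, so a certificate with 5 days left counts in all four figures. Already-expired certificates (negative DTE) also land in every bucket, which keeps them visible rather than silently dropped.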
This is where dedicated monitoring tools become indispensable. Instead of maintaining complex internal scripts, platforms like Expiring.at allow you to monitor your domains, endpoints, and SSL certificates continuously. By providing external validation of your DTE, you get reliable, automated alerts via email, Slack, or webhooks long before an expiration triggers an outage.
5. Mean Time to Resolution (MTTR) for Certificate Outages
If a certificate expires or is unexpectedly revoked, how quickly can your team replace it and restore service? A reasonable target is an MTTR of under 1 hour. If your MTTR is measured in days, your provisioning process is too manual.
6. Certificate-Related Outage Frequency
This is the ultimate KPI for your CLM strategy. Track the number of application downtime incidents caused by expired or misconfigured certificates per quarter. The goal, naturally, is zero.
Pillar 3: Security & Compliance Metrics (Hardening the Perimeter)
Certificates are only as secure as the cryptography backing them. Regulatory frameworks like PCI-DSS v4.0 and the EU's NIS2 Directive require strict management of cryptographic standards.
7. Cryptographic Compliance Rate
This metric tracks the percentage of certificates meeting your current corporate security policy (e.g., RSA 2048-bit or higher, ECC, SHA-256+). As the industry shifts toward quantum-safe algorithms, this metric will evolve into your "Crypto-Agility Score."
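A compliance rate like this can be computed from a simple inventory export. The sketch below assumes a four-column text format (name, key algorithm, key size in bits, signature hash) and encodes the example policy from the text; the 256-bit minimum for EC keys and the `compliance_rate` helper are assumptions, so substitute your own policy.

```bash
#!/usr/bin/env bash
# Hypothetical input on stdin, one certificate per line:
#   name key_algo key_bits sig_hash
# Assumed policy: RSA >= 2048 bits or EC >= 256 bits, signed with SHA-256 or stronger.
compliance_rate() {
    awk '
        NF >= 4 {
            total++
            algo = toupper($2); bits = $3 + 0; hash = tolower($4)
            key_ok  = (algo == "RSA" && bits >= 2048) || (algo == "EC" && bits >= 256)
            hash_ok = (hash == "sha256" || hash == "sha384" || hash == "sha512")
            if (key_ok && hash_ok) ok++
            else print "non-compliant: " $1
        }
        END {
            if (total == 0) { print "no certificates"; exit }
            printf "compliance_rate=%.1f%%\n", 100 * ok / total
        }
    '
}

# Example with made-up inventory rows: two of four certificates pass.
printf '%s\n' \
    "api.example.com RSA 2048 sha256" \
    "legacy.example.com RSA 1024 sha1" \
    "edge.example.com EC 256 sha384" \
    "vpn.example.com RSA 4096 sha1" | compliance_rate
# prints:
#   non-compliant: legacy.example.com
#   non-compliant: vpn.example.com
#   compliance_rate=50.0%
```

Listing the failing certificates next to the rate turns the metric into a work queue, which is what you will need when the policy changes to include quantum-safe algorithms.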
8. Self-Signed Certificate Ratio
Self-signed certificates are common in local development, but their presence in production environments is a massive security risk. They cannot be revoked via standard CRLs (Certificate Revocation Lists) or OCSP (Online Certificate Status Protocol). Track the percentage of self-signed certs in production; the target must be strictly 0%.
9. Wildcard Certificate Usage
Wildcard certificates (e.g., *.yourdomain.com) are convenient but highly dangerous. If the private key for a wildcard certificate is compromised on a minor development server, the attacker can impersonate every subdomain the certificate covers, including production hosts such as api.yourdomain.com. Tracking and minimizing the raw number of active wildcard certificates is a critical risk-reduction metric.
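To make the blast radius concrete, the sketch below counts wildcard names in an inventory and checks which hostnames a given wildcard covers. It assumes the usual matching rule that `*` stands for exactly one leftmost DNS label; the helper names `wildcard_covers` and `count_wildcards` are hypothetical.

```bash
#!/usr/bin/env bash
# Returns 0 if certificate name $1 covers hostname $2, assuming "*" matches
# exactly one leftmost DNS label.
wildcard_covers() {
    local pattern=$1 host=$2 suffix label
    case "$pattern" in
        '*.'*)
            suffix=${pattern#\*.}            # "*.yourdomain.com" -> "yourdomain.com"
            label=${host%".$suffix"}         # strip ".yourdomain.com" from the host
            [ "$label" != "$host" ] || return 1   # suffix was not present
            case "$label" in
                ''|*.*) return 1 ;;          # empty label or more than one level deep
            esac
            return 0
            ;;
        *)
            [ "$pattern" = "$host" ]         # non-wildcard names must match exactly
            ;;
    esac
}

# Counts wildcard names in a list of certificate DNS names read from stdin.
count_wildcards() {
    grep -c '^\*\.' || true
}

# Example: which hosts does one leaked wildcard key expose?
for h in api.yourdomain.com yourdomain.com a.b.yourdomain.com; do
    if wildcard_covers '*.yourdomain.com' "$h"; then
        echo "$h: covered"
    else
        echo "$h: not covered"
    fi
done
# prints:
#   api.yourdomain.com: covered
#   yourdomain.com: not covered
#   a.b.yourdomain.com: not covered
```

Running `count_wildcards` over your full inventory of DNS names gives the raw number this metric asks you to minimize.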
Pillar 4: Automation & Agility Metrics (The 2025 Imperative)
Automation is no longer a luxury; it is a mathematical necessity. You cannot manually manage thousands of certificates that expire every 90 days.
10. Certificate Automation Rate
This is the most important metric for 2025. It measures the percentage of certificates renewed and provisioned entirely without human intervention via protocols like ACME (Automated Certificate Management Environment - RFC 8555), SCEP, or EST.
Target: >90% automation rate for all infrastructure.
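A minimal way to measure this is to tag each renewal event with the method that performed it. The sketch below assumes a two-column renewal log (certificate name, renewal method) in which `acme`, `scep`, and `est` count as automated and anything else counts as manual; the log format and the `automation_rate` helper are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical input on stdin, one renewal event per line:
#   cert_name method
# Methods acme, scep, and est count as automated; everything else as manual.
automation_rate() {
    awk '
        NF >= 2 {
            total++
            m = tolower($2)
            if (m == "acme" || m == "scep" || m == "est") automated++
        }
        END {
            if (total == 0) { print "no renewals"; exit }
            rate = 100 * automated / total
            printf "automation_rate=%.1f%% target_met=%s\n", rate, (rate > 90 ? "yes" : "no")
        }
    '
}

# Example with made-up renewal events: four of five were automated.
printf '%s\n' \
    "api.example.com acme" \
    "web.example.com acme" \
    "iot-gw.example.com est" \
    "legacy.example.com manual" \
    "mdm.example.com scep" | automation_rate
# prints automation_rate=80.0% target_met=no
```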
11. Time-to-Provision
How long does it take from the moment a developer requests a certificate to the moment it is deployed on the endpoint? If security teams take three days to issue a certificate, DevOps will find a workaround. Shift-left CLM by integrating issuance directly into CI/CD pipelines.
Practical Implementation: Kubernetes Automation
In cloud-native environments, tracking your Automation Rate means relying on tools like cert-manager for Kubernetes. Instead of a developer requesting a certificate via a Jira ticket, they define it as code:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prod-api-cert
  namespace: production
spec:
  secretName: prod-api-tls
  duration: 2160h # 90 days
  renewBefore: 360h # 15 days
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.yourdomain.com
```
With this manifest, cert-manager automatically negotiates with the CA via the ACME protocol, provisions the certificate, stores it as a Kubernetes Secret, and rotates it 15 days before expiration. Every certificate defined this way renews without human intervention, which pushes that cluster's Automation Rate toward 100%.
12. Revocation Speed
If a private key is leaked on GitHub, how fast can you revoke the compromised certificate and push a replacement across your entire global infrastructure? This metric tests true crypto-agility.
Technical Implementation: Automating Your Metrics Pipeline
To track these metrics, you need to extract data from your endpoints programmatically. Below is a robust Bash script utilizing openssl that connects to an endpoint, extracts the certificate expiration date, calculates the Days to Expiration (DTE), and outputs the data in JSON format.
This JSON output can easily be ingested by monitoring platforms like Datadog, Prometheus, or custom dashboards.
```bash
#!/bin/bash
# check_cert_health.sh
# Usage: ./check_cert_health.sh example.com 443

DOMAIN=$1
PORT=${2:-443}

if [ -z "$DOMAIN" ]; then
    echo '{"error": "Domain is required"}'
    exit 1
fi

# Fetch the certificate details using openssl
# (stderr is silenced on both commands so that only JSON reaches stdout)
CERT_DATA=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN:$PORT" 2>/dev/null | openssl x509 -noout -dates -issuer -pubkey 2>/dev/null)
# $? is the exit status of the last command in the pipeline (openssl x509),
# which fails when s_client did not return a certificate
if [ $? -ne 0 ]; then
    echo "{\"error\": \"Failed to retrieve certificate for $DOMAIN\"}"
    exit 1
fi

# Extract Expiration Date
EXP_DATE_STR=$(echo "$CERT_DATA" | grep "notAfter=" | cut -d= -f2)
# Note: "date -d" is GNU date syntax (Linux); BSD/macOS date uses different flags
EXP_EPOCH=$(date -d "$EXP_DATE_STR" +%s)
CURRENT_EPOCH=$(date +%s)

# Calculate Days to Expiration (DTE)
DTE=$(( (EXP_