01 Overview
On 1 July 2024, the Qualys Threat Research Unit (TRU) published one of the most significant Linux vulnerability disclosures in years: a signal handler race condition in OpenSSH's server daemon (sshd) that allows an unauthenticated remote attacker to execute arbitrary code as root on glibc-based Linux systems.
The vulnerability is tracked as CVE-2024-6387 and nicknamed "regreSSHion", a portmanteau of "regression" and "SSH." The name is deliberate: this flaw is not new code. It is a resurrection of CVE-2006-5051, a signal handler race condition that was patched in 2006 and accidentally reintroduced by an October 2020 logging refactor that first shipped in OpenSSH 8.5p1 (released March 2021), when the change inadvertently removed a critical safety guard.
In this writeup we break down exactly how the race condition works, why it's exploitable in the remote-unauthenticated context, what the PoC attack chain looks like, how defenders can detect active exploitation, and the exact hardening steps to take.
02 Background: A 2006 Bug Returns
To understand regreSSHion, you need to understand the original vulnerability it is based on. In 2006, CVE-2006-5051 identified that OpenSSH's signal handler called syslog() from within an async signal context — a fundamental violation of async-signal-safety rules in POSIX. The fix introduced a flag-based approach: the handler would set a flag, and a safe function would call syslog() in the main process context later.
When a new logging framework was committed in October 2020 and shipped in OpenSSH 8.5p1 (March 2021), the refactoring removed the async-signal-safety guard around syslog() in the SIGALRM handler path, quietly reintroducing the original race condition after 14 years.
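The flag-based pattern described above can be sketched in a few lines. This is a minimal illustration in Python, not OpenSSH's actual C code; the names GraceTimer, on_alarm, poll, and install are hypothetical. CPython already defers signal handlers to safe points between bytecodes, so the sketch shows the structure of the fix rather than reproducing the C-level hazard.

```python
import signal


class GraceTimer:
    """Handler only records that the alarm fired; logging happens later."""

    def __init__(self):
        self.expired = False   # the only state the handler touches
        self.messages = []     # stands in for syslog(); written from main flow only

    def on_alarm(self, signum, frame):
        # Signal context: no logging, no allocation-heavy work, just a flag.
        self.expired = True

    def poll(self, peer):
        # Main-flow context: a safe place to do the real logging.
        if self.expired:
            self.expired = False
            self.messages.append(f"Timeout before authentication for {peer}")
            return True
        return False


def install(timer, seconds):
    """Arm SIGALRM for this process (Unix only, main thread only)."""
    signal.signal(signal.SIGALRM, timer.on_alarm)
    signal.alarm(seconds)
```

A main loop would call install(timer, 120) once and then timer.poll(peer) on each iteration, so the log call never runs inside the handler itself.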
03 Affected Versions
The vulnerability affects OpenSSH running on glibc-based Linux systems (Debian, Ubuntu, RHEL, Fedora, Arch, etc.). OpenBSD is unaffected because its SIGALRM handler has used syslog_r(), an async-signal-safe variant of syslog(), since 2001. Exploitability on macOS and non-glibc systems (e.g., Alpine Linux with musl) had not been demonstrated at the time of disclosure and should be assessed separately.
| OpenSSH Version Range | Status | Notes |
|---|---|---|
| < 4.4p1 | VULNERABLE | Unless patched for CVE-2006-5051 and CVE-2008-4109 |
| 4.4p1 – 8.4p1 | NOT VULNERABLE | CVE-2006-5051 patch applied, protection intact |
| 8.5p1 – 9.7p1 | VULNERABLE | Regression introduced in Oct 2020 refactor |
| ≥ 9.8p1 | FIXED | Patch released 1 Jul 2024. Upgrade immediately. |
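Check your version by running ssh -V or sshd -V on your server. Note that ssh -V reports the client version, which normally matches sshd when both come from the same package, and that sshd -V is only available on newer releases. You are looking for OpenSSH_9.8p1 or higher as the minimum safe upstream version. Many major Linux distributions have backported the fix, so check your distro's security advisory for the exact patched package version.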
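The table above can be expressed as a small lookup. The following is a sketch only: classify is a hypothetical helper, it compares upstream major.minor numbers, and it cannot see distro backports, so a "VULNERABLE" result for a distro package still needs to be checked against the vendor advisory.

```python
import re


def classify(version):
    """Map an upstream OpenSSH version string to the status in the table."""
    m = re.search(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version: {version!r}")
    v = (int(m.group(1)), int(m.group(2)))
    if v < (4, 4):
        return "VULNERABLE unless patched for CVE-2006-5051 and CVE-2008-4109"
    if v < (8, 5):
        return "NOT VULNERABLE"
    if v < (9, 8):
        return "VULNERABLE"
    return "FIXED"


for banner in ("OpenSSH_8.4p1", "OpenSSH_9.6p1", "OpenSSH_9.8p1"):
    print(banner, "->", classify(banner))
```

Comparing (major, minor) tuples avoids the string-comparison trap where "10.0" would sort before "9.8".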
04 Technical Deep Dive: The Race Condition
How sshd Handles Unauthenticated Connections
When a client connects to sshd, the server forks a child process to handle the connection. The client has a LoginGraceTime window (120 seconds by default) to complete authentication. If it fails to authenticate within that window, the timer armed by sshd expires and SIGALRM is delivered to the process, whose handler terminates the connection. According to the Qualys advisory, this handler runs in sshd's privileged, unsandboxed code, which is why winning the race yields root.
The SIGALRM handler is asynchronous — it fires at an unpredictable point during the program's execution and runs in the same process context as the main thread, interrupting whatever code was executing at that moment.
The Unsafe Call: syslog() Inside an Async Signal Handler
The vulnerability exists because the SIGALRM handler (the grace_alarm_handler() function in sshd.c) ends up calling syslog() through sshd's logging code to record the timeout, and syslog() is not async-signal-safe. The POSIX standard explicitly lists which functions may be called safely from a signal handler; syslog() is not among them.
Why does this matter? syslog() internally calls malloc(), free(), and other heap management functions. If the SIGALRM signal fires while the main process is in the middle of a malloc() call — which is also manipulating the heap — the signal handler's syslog() call interrupts that operation and runs its own malloc()/free() call on top of a partially-modified heap structure. This creates a heap corruption condition.
    /* Fires when LoginGraceTime expires; called asynchronously */
    static void
    grace_alarm_handler(int sig)
    {
        /* ❌ UNSAFE: syslog() is NOT async-signal-safe.
           If main thread is inside malloc() when SIGALRM fires,
           this call corrupts the heap. */
        syslog(LOG_INFO, "Timeout before authentication for %s", ...);

        /* syslog() internally calls malloc()/free(), which operate on
           a heap that may be in a mid-operation inconsistent state. */
        _exit(1);
    }
From Race Condition to Remote Code Execution
The Qualys researchers demonstrated that this heap corruption is not merely a crash; it is exploitable for arbitrary code execution.
The Numbers Behind the Race
Qualys's analysis reveals the probabilistic nature of the exploit. Winning the race condition requires the SIGALRM to fire at the precise nanosecond when malloc() is in an inconsistent state. This is inherently unreliable, requiring repeated attempts:
- On average, ~10,000 connection attempts are needed to win the race condition once.
- With sshd accepting up to 100 unauthenticated connections (MaxStartups) per 120-second LoginGraceTime window, winning the race on a 32-bit system takes approximately 3–4 hours.
- Factoring in ASLR, which Qualys reported guessing correctly only about half the time on 32-bit systems, the average rises to 6–8 hours per remote root shell. These figures are for 32-bit targets; 64-bit exploitation is considerably harder, though the Qualys team continues to research improvements to this timeline.
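The arithmetic behind these estimates is easy to reproduce. The sketch below uses a hypothetical helper, hours_to_success; the 0.5 address-guess probability is an assumption reflecting Qualys's reported 32-bit result, not a measured constant.

```python
def hours_to_success(attempts_per_win, conns_per_window, window_seconds,
                     p_address_guess=1.0):
    """Expected wall-clock hours for one successful exploitation.

    attempts_per_win: average connections needed to win the race once
    conns_per_window: connections sshd accepts per LoginGraceTime window
    window_seconds:   LoginGraceTime in seconds
    p_address_guess:  chance the ASLR guess is right when the race is won
    """
    windows = attempts_per_win / conns_per_window
    seconds = windows * window_seconds / p_address_guess
    return seconds / 3600


race_only = hours_to_success(10_000, 100, 120)
with_aslr = hours_to_success(10_000, 100, 120, p_address_guess=0.5)
print(f"race only: {race_only:.1f} h, with 50% address guess: {with_aslr:.1f} h")
# prints: race only: 3.3 h, with 50% address guess: 6.7 h
```

Both results land inside the 3–4 hour and 6–8 hour ranges quoted above, which shows that the estimates follow directly from the connection limit and the grace period.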
Why ASLR Helps (But Doesn't Fix It)
Address Space Layout Randomization (ASLR) randomizes the base addresses of memory regions at process startup, making it harder for an attacker to predict where code and data will land. On 64-bit Linux, ASLR typically provides 28 bits of entropy for memory mappings, meaning an attacker guessing blindly would need on the order of 2²⁸ (~268 million) attempts. On 32-bit systems the entropy is far lower: Qualys reported that in their i386 tests the glibc base address could be guessed correctly about half the time, so ASLR roughly doubles the attack time there rather than preventing the attack.
05 PoC Analysis
Within 24–72 hours of Qualys's disclosure, multiple proof-of-concept scripts appeared on GitHub. Most public PoCs are "check scripts" — they attempt to fingerprint vulnerable sshd versions and test for the race condition without completing full exploitation. However, the technical foundation they provide gives advanced attackers a detailed blueprint.
The Qualys PoC (not publicly released, but demonstrated to OpenSSH maintainers) uses the public key parser as the payload delivery mechanism — the crafted public key data contains the shellcode payload embedded after a valid key header, sized to fit within specific heap allocation constraints.
    #!/bin/bash
    # Safe version check: no exploitation, no network connection required.
    # Run on the server itself or via SSH to an already-trusted host.
    # Note: ssh -V reports the client version, which normally matches sshd
    # when both come from the same package. Distro backports are not detected.
    SSHD_VERSION=$(ssh -V 2>&1 | grep -oP 'OpenSSH_\K[0-9]+\.[0-9]+p[0-9]+')
    if [[ -z "$SSHD_VERSION" ]]; then
        echo "[?] Could not determine OpenSSH version." >&2
        exit 1
    fi
    echo "Detected OpenSSH version: OpenSSH_${SSHD_VERSION}"

    # Extract major.minor for comparison
    MAJOR=$(echo "$SSHD_VERSION" | cut -d. -f1)
    MINOR=$(echo "$SSHD_VERSION" | cut -d. -f2 | cut -dp -f1)

    if [[ "$MAJOR" -eq 8 && "$MINOR" -ge 5 ]] || \
       [[ "$MAJOR" -eq 9 && "$MINOR" -le 7 ]]; then
        echo "[!] VULNERABLE to CVE-2024-6387 (regreSSHion)"
        echo "    Upgrade to OpenSSH 9.8p1 or apply vendor patch immediately."
    elif [[ "$MAJOR" -lt 4 ]] || \
         [[ "$MAJOR" -eq 4 && "$MINOR" -lt 4 ]]; then
        echo "[!] Possibly vulnerable (pre-4.4p1 range). Check CVE-2006-5051 patch status."
    elif [[ "$MAJOR" -lt 8 ]] || \
         [[ "$MAJOR" -eq 8 && "$MINOR" -lt 5 ]]; then
        echo "[✓] NOT VULNERABLE: 4.4p1 through 8.4p1 retain the CVE-2006-5051 fix."
    else
        echo "[✓] NOT VULNERABLE: version is 9.8p1 or later."
    fi
06 Detection & Threat Hunting
Detecting active exploitation of regreSSHion is possible through a combination of SSH log analysis, network monitoring, and endpoint telemetry. Because the exploit requires thousands of connection attempts, the noise it generates is actually detectable — if you're looking for it.
SSH Log Analysis
During exploitation, /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/CentOS) will show a high volume of entries like the following from the same source IP within a short time window:
    # A burst of entries like these from one IP may indicate regreSSHion exploitation:
    Jul  8 03:12:44 server sshd[41231]: Timeout before authentication for 203.0.113.42 port 54321
    Jul  8 03:12:44 server sshd[41232]: Timeout before authentication for 203.0.113.42 port 54322
    Jul  8 03:12:44 server sshd[41233]: Timeout before authentication for 203.0.113.42 port 54323
    # ... hundreds more from the same IP in rapid succession

    # Also watch for:
    Jul  8 03:14:01 server sshd[41301]: message authentication code incorrect
    Jul  8 03:14:01 server sshd[41301]: fatal: mm_request_send: write: Broken pipe
Grep / Hunt Commands
    # Count "Timeout before authentication" events per source IP during the previous clock hour
    # (%e pads the day with a space, matching the traditional syslog timestamp)
    grep "Timeout before authentication" /var/log/auth.log | \
      grep "$(date -d '1 hour ago' '+%b %e %H')" | \
      grep -oP 'for \K[\d.]+' | sort | uniq -c | sort -rn | head -20

    # Flag IPs with more than 50 pre-auth timeouts (possible exploitation)
    grep "Timeout before authentication" /var/log/auth.log | \
      grep -oP 'for \K[\d.]+' | sort | uniq -c | awk '$1 > 50'

    # Check for suspicious child processes spawned by sshd (post-exploitation)
    ps auxf | grep sshd | grep -v grep
    # Any sshd child running bash/sh/python that does not map to a
    # legitimate login session is a strong IOC

    # Verify LoginGraceTime setting (mitigation check)
    grep -i LoginGraceTime /etc/ssh/sshd_config
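The same per-IP count can be done in a script when logs are collected centrally. This is a minimal sketch: timeouts_per_ip and suspects are hypothetical helpers, the sample lines are illustrative rather than real captures, and the threshold of 50 simply mirrors the command above.

```python
import re
from collections import Counter

TIMEOUT_RE = re.compile(
    r"Timeout before authentication for (\d{1,3}(?:\.\d{1,3}){3})"
)


def timeouts_per_ip(lines):
    """Count pre-auth timeout log entries per source IPv4 address."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def suspects(lines, threshold=50):
    """Return (ip, count) pairs above the threshold, busiest first."""
    return [(ip, n) for ip, n in timeouts_per_ip(lines).most_common()
            if n > threshold]


# Illustrative sample only
sample = [
    "Jul  8 03:12:44 server sshd[41231]: Timeout before authentication for 203.0.113.42 port 54321",
    "Jul  8 03:12:44 server sshd[41232]: Timeout before authentication for 203.0.113.42 port 54322",
    "Jul  8 03:12:45 server sshd[41240]: Accepted publickey for admin from 198.51.100.7 port 40000",
]
print(timeouts_per_ip(sample))
```

Extracting the address with a capture group, rather than taking the last field of the line, avoids accidentally counting by source port.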
SIEM Detection Rules
For teams using Splunk, Elastic, or similar platforms, the following logic can be adapted into alerts:
| search index=linux_logs sourcetype=syslog "Timeout before authentication"
| rex field=_raw "for (?<src_ip>\d+\.\d+\.\d+\.\d+)"
| bucket _time span=5m
| stats count AS timeout_count BY src_ip, _time
| where timeout_count > 100
| eval severity=case(
    timeout_count > 500, "CRITICAL",
    timeout_count > 200, "HIGH",
    true(), "MEDIUM"
  )
| table _time, src_ip, timeout_count, severity
| sort -timeout_count
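For teams without a SIEM, the bucketing and severity logic of the query above can be approximated in a script. This is a sketch under stated assumptions: bucket_start, severity, and alerts are hypothetical helpers, and events are assumed to have already been parsed into (datetime, source IP) pairs.

```python
from collections import Counter
from datetime import datetime


def bucket_start(ts, minutes=5):
    """Round a datetime down to the start of its N-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def severity(count):
    """Same tiers as the query: >500 CRITICAL, >200 HIGH, otherwise MEDIUM."""
    if count > 500:
        return "CRITICAL"
    if count > 200:
        return "HIGH"
    return "MEDIUM"


def alerts(events, minimum=100):
    """events: iterable of (datetime, src_ip). Rows sorted by count, descending."""
    counts = Counter((bucket_start(ts), ip) for ts, ip in events)
    rows = [(b, ip, n, severity(n)) for (b, ip), n in counts.items()
            if n > minimum]
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Synthetic data: 250 timeouts from one IP inside a single 5-minute bucket
events = [(datetime(2024, 7, 8, 3, 12, s % 60), "203.0.113.42") for s in range(250)]
events += [(datetime(2024, 7, 8, 3, 14, 1), "198.51.100.7")]
for row in alerts(events):
    print(row)
```

The thresholds are the ones from the query and should be tuned against the baseline volume of pre-auth timeouts in your environment.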
Network-Level Detection
Cisco Talos released Snort/Suricata signature SID: 63659 specifically for regreSSHion exploitation attempts. If you run a network IDS/IPS, ensure this signature is enabled and alerting. Additionally, watch for:
- High-volume TCP SYN packets to port 22 from a single source (scan → exploit pattern)
- Outbound connections from your sshd process to external IPs (post-exploitation C2)
- Anomalous child processes of sshd (bash, nc, python, curl): a strong post-exploitation indicator
07 Remediation & Hardening
1. Patch — The Only Real Fix
    # Ubuntu / Debian
    sudo apt update && sudo apt install --only-upgrade openssh-server

    # RHEL / CentOS / Fedora
    sudo dnf update openssh-server
    # or: sudo yum update openssh-server

    # Arch Linux
    sudo pacman -Syu openssh

    # Verify patch applied:
    ssh -V
    # Should output: OpenSSH_9.8p1 (or your distro's patched equivalent)

    # Restart sshd after patching (the unit is named ssh on Debian/Ubuntu)
    sudo systemctl restart sshd
2. Set LoginGraceTime to 0 (Immediate Mitigation)
Setting LoginGraceTime 0 in /etc/ssh/sshd_config removes the grace period, which prevents the SIGALRM from ever firing in the authentication context — directly eliminating the race condition trigger. However, this has a trade-off: it removes the timeout for authentication attempts, which could allow resource exhaustion via many simultaneous unauthenticated connections.
    # Eliminates the race condition trigger.
    # Trade-off: removes authentication timeout.
    # ONLY use this if you cannot patch immediately.
    LoginGraceTime 0

    # Always pair with MaxStartups to limit concurrent connections.
    # Format is start:rate:full, which limits connection flood risk.
    MaxStartups 10:30:60

    # Apply changes:
    # sudo sshd -t && sudo systemctl reload sshd
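To see what a start:rate:full value such as 10:30:60 means in practice, the random early drop behaviour can be modelled as a linear ramp, following the description in sshd_config(5). This is an approximation: drop_probability is a hypothetical helper, and sshd's internal integer rounding may differ slightly.

```python
def drop_probability(unauthenticated, spec="10:30:60"):
    """Chance that sshd refuses a new connection under MaxStartups start:rate:full."""
    start, rate, full = (int(x) for x in spec.split(":"))
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    # Linear ramp from rate% at `start` up to 100% at `full`.
    span = full - start
    return (rate + (100 - rate) * (unauthenticated - start) / span) / 100


for n in (5, 10, 35, 60):
    print(n, drop_probability(n))
```

With 10:30:60, nothing is dropped below 10 pending connections, 30% of new connections are refused at 10, about 65% at 35, and everything is refused at 60, which bounds how many unauthenticated sessions an attacker can hold open once the timeout is disabled.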
3. Restrict SSH Access at the Network Layer
    # Allow SSH only from your office/VPN IP ranges
    iptables -A INPUT -p tcp --dport 22 -s YOUR_OFFICE_IP/32 -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -s YOUR_VPN_RANGE/24 -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -j DROP

    # For cloud instances (AWS security groups, GCP firewall rules):
    # Restrict port 22 inbound to known management IPs.
    # Never expose SSH to 0.0.0.0/0 in production.

    # Even better: move SSH behind a bastion host or VPN entirely
4. Deploy fail2ban with Aggressive SSH Rules
    [sshd]
    enabled = true
    port = ssh
    filter = sshd
    logpath = /var/log/auth.log
    # regreSSHion needs thousands of attempts, so catch it fast.
    # findtime is a 60-second window; bantime bans for 1 hour
    # (increase to 86400 for persistent attackers).
    # Comments sit on their own lines because fail2ban does not
    # treat a trailing "#" as an inline comment.
    maxretry = 20
    findtime = 60
    bantime = 3600

    # Also add a timeout-specific jail:
    [sshd-timeout]
    enabled = true
    filter = sshd[mode=aggressive]
    logpath = /var/log/auth.log
    maxretry = 10
    findtime = 30
    bantime = 7200
5. Additional Hardening Best Practices
- Disable password authentication: Use key-based authentication only. Add PasswordAuthentication no and ChallengeResponseAuthentication no (named KbdInteractiveAuthentication on OpenSSH 8.7 and later) to sshd_config.
- Enable ASLR on all systems: Verify cat /proc/sys/kernel/randomize_va_space returns 2. Set permanently with kernel.randomize_va_space = 2 in /etc/sysctl.conf.
- Apply seccomp profiles: Restricting the syscalls available to sshd limits what an attacker can do even if they win the race condition. Use systemd's SystemCallFilter for sshd.
- Network segmentation: SSH should never be directly reachable from the internet. Place a bastion/jump host or use a VPN as a prerequisite.
- Audit exposed SSH instances: Use Shodan or Censys to identify which of your IP ranges expose SSH to the internet. Anything you don't know about is a risk.
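The ASLR check from the list above is easy to fold into a host audit script. A minimal sketch follows; describe_aslr and aslr_is_full are hypothetical helper names, and the mode descriptions follow the Linux kernel's sysctl documentation.

```python
from pathlib import Path

ASLR_MODES = {
    "0": "disabled",
    "1": "partial (stack, mmap, vDSO randomized)",
    "2": "full (also randomizes the brk heap)",
}


def describe_aslr(raw):
    """Interpret the contents of /proc/sys/kernel/randomize_va_space."""
    return ASLR_MODES.get(raw.strip(), "unknown")


def aslr_is_full(path="/proc/sys/kernel/randomize_va_space"):
    """True when the running kernel reports full ASLR (value 2). Linux only."""
    return Path(path).read_text().strip() == "2"


print(describe_aslr("2\n"))
```

On a Linux host, aslr_is_full() can be called directly and treated as a pass/fail audit item.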
08 Real-World Impact Assessment
The scale of exposure for regreSSHion is genuinely alarming. Qualys identified over 14 million internet-facing OpenSSH instances potentially vulnerable at the time of disclosure. Within its own customer base, Qualys's CSAM data showed approximately 700,000 externally facing instances running vulnerable versions: real enterprise servers exposed to a documented exploit chain.
A successful exploitation yields root access — meaning an attacker can:
- Install persistent backdoors, rootkits, or ransomware
- Exfiltrate all data on the system, including secrets, certificates, and credentials
- Disable logging and tamper with audit trails
- Use the compromised host as a pivot point to attack internal network segments
- Add the host to a botnet for ongoing use in DDoS attacks or further exploitation campaigns
During our assessment engagements following this disclosure, we encountered multiple customer environments with unpatched sshd instances — in several cases, on internet-facing jump hosts and bastion servers. This is the highest-risk deployment pattern: a compromised bastion provides direct, authenticated access to the entire internal network it was meant to protect.
09 Lessons for Security Teams
regreSSHion carries a lesson that goes beyond "patch OpenSSH." It is a textbook example of several failure modes that security teams need to build process around:
- Regression testing for security patches: The 2006 fix worked for 14 years. The vulnerability returned because a code refactor in 2020 removed the protection without a security-focused regression test. If you do security-critical code changes, your test suite must verify the security properties, not just functionality.
- Vulnerability debt compounds: Organizations that couldn't answer "what version of sshd are we running on all internet-facing servers?" in July 2024 had a pre-existing asset management problem that regreSSHion simply exposed.
- Complexity ≠ safety: "The exploit takes 3–8 hours and thousands of attempts" was used by some organizations to justify delay. Automated tooling and the sheer scale of attacks make this reasoning dangerous.
- The 90-day window is real: Qualys followed coordinated disclosure and gave OpenSSH time to patch. The moment they published, the race was on. Patch SLAs for internet-facing critical infrastructure need to be measured in hours, not weeks.
10 References & Further Reading
- Qualys TRU — Original Advisory (regresshion.txt)
- NVD — CVE-2024-6387 Detail
- OpenSSH Security Advisory
- CISA Known Exploited Vulnerabilities Catalog
Grey Shield Research Team · contact@greyshield.in · Responsible Disclosure Policy: /legal