CVSS v3.1 Score: 8.1
Severity: HIGH
Exposed Servers: 14M+
Fixed Version: 9.8p1

01 Overview

On 1 July 2024, the Qualys Threat Research Unit (TRU) published one of the most significant Linux vulnerability disclosures in years: a signal handler race condition in OpenSSH's server daemon (sshd) that allows an unauthenticated remote attacker to execute arbitrary code as root on glibc-based Linux systems.

The vulnerability is tracked as CVE-2024-6387 and nicknamed "regreSSHion", a portmanteau of "regression" and "SSH." The name is deliberate: this flaw is not new code. It is a resurrection of CVE-2006-5051, a signal handler race condition that was patched in 2006 and accidentally reintroduced by an October 2020 logging refactor that first shipped in OpenSSH 8.5p1 (released March 2021). The refactor inadvertently removed a critical safety guard.

🚨 Why this matters: This is the first OpenSSH RCE vulnerability in nearly two decades. It affects sshd in its default configuration, requires no credentials, and grants full root access, the highest possible privilege on a Linux system. Qualys identified over 700,000 externally-facing, customer-owned instances still unpatched weeks after disclosure.

In this writeup we break down exactly how the race condition works, why it's exploitable in the remote-unauthenticated context, what the PoC attack chain looks like, how defenders can detect active exploitation, and the exact hardening steps to take.

02 Background: A 2006 Bug Returns

To understand regreSSHion, you need to understand the original vulnerability it is based on. In 2006, CVE-2006-5051 identified that OpenSSH's signal handler called syslog() from within an async signal context — a fundamental violation of async-signal-safety rules in POSIX. The fix introduced a flag-based approach: the handler would set a flag, and a safe function would call syslog() in the main process context later.

The regression landed in October 2020 as part of a new logging framework and shipped in OpenSSH 8.5p1 (released March 2021). Per the Qualys advisory, the refactor removed the guard that kept the SIGALRM handler's logging path async-signal-safe (an #ifdef DO_LOG_SAFE_IN_SIGHAND in sigdie()), quietly reintroducing the original race condition after 14 years.

2001: OpenBSD's OpenSSH implementation adds async-signal-safe handling and never becomes vulnerable.
Sep 2006, CVE-2006-5051: Signal handler race condition discovered and patched in OpenSSH 4.4p1. syslog() removed from SIGALRM context.
Oct 2020, regression commit (ships in OpenSSH 8.5p1, Mar 2021): Logging refactor accidentally removes the async-signal-safe protection, silently reintroducing the 2006 bug.
1 Jul 2024, CVE-2024-6387: Qualys TRU discloses the regression. 14M+ servers exposed. OpenSSH 9.8p1 released same day.
Days later: Multiple PoC exploits appear on GitHub within 24–72 hours of disclosure. Check scripts widely distributed.

03 Affected Versions

The vulnerability affects OpenSSH running on glibc-based Linux systems (Debian, Ubuntu, RHEL, Fedora, Arch, etc.). OpenBSD is unaffected because its OpenSSH implementation has handled this condition correctly since 2001. The Qualys analysis targeted glibc; exploitability on macOS and on non-glibc systems (e.g., Alpine Linux with musl) was not established in the advisory, so treat those platforms as unconfirmed rather than safe.

OpenSSH Version Range | Status         | Notes
< 4.4p1               | VULNERABLE     | Unless patched for CVE-2006-5051 and CVE-2008-4109
4.4p1 – 8.4p1         | NOT VULNERABLE | CVE-2006-5051 patch applied, protection intact
8.5p1 – 9.7p1         | VULNERABLE     | Regression introduced in Oct 2020 refactor
≥ 9.8p1               | FIXED          | Patch released 1 Jul 2024. Upgrade immediately.
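ℹ️ Check your version: Run sshd -V on your server. Older releases reject the option but still print their version in the usage text. ssh -V reports the client binary's version, which normally matches the server package but is not guaranteed to. You are looking for OpenSSH_9.8p1 or higher as the minimum safe upstream version. Many major Linux distributions have backported the fix without changing the upstream version string, so check your distro's security advisory for the exact patched package version.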

04 Technical Deep Dive: The Race Condition

How sshd Handles Unauthenticated Connections

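When a client connects to sshd, the server forks a child process to handle the connection. The client then has a LoginGraceTime window (120 seconds by default) to complete authentication. sshd arms a timer for that window; if the client has not authenticated when it expires, the kernel delivers SIGALRM to the process and the handler terminates the connection. According to the Qualys advisory, this handler runs in sshd's privileged code, which is not sandboxed, and that is why a successful exploit lands as root.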

The SIGALRM handler is asynchronous — it fires at an unpredictable point during the program's execution and runs in the same process context as the main thread, interrupting whatever code was executing at that moment.
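As a rough illustration of the grace-timer flow (not sshd's actual code), the Bash sketch below arms a background timer that sends SIGALRM to the current shell if a simulated authentication step takes too long. The function name `run_with_grace_timer` and both durations are hypothetical. Bash defers trap execution to safe points, so this models only the timeout mechanism, not the unsafe mid-operation interruption that makes the C handler dangerous.

```shell
#!/usr/bin/env bash
# Toy model of a login grace timer. Hypothetical helper, not sshd code.
# Usage: run_with_grace_timer GRACE_SECONDS AUTH_SECONDS
run_with_grace_timer() {
  local grace=$1 auth=$2 result="authenticated"
  local self=$BASHPID                 # PID of the shell running this function

  trap 'result="timeout"' ALRM        # the handler only records that the alarm fired

  # Timer: once the grace period elapses, send SIGALRM to this shell.
  ( sleep "$grace" && kill -ALRM "$self" ) >/dev/null 2>&1 &
  local timer_pid=$!

  # Stand-in for a client that needs AUTH_SECONDS to authenticate.
  sleep "$auth" >/dev/null 2>&1 &
  local auth_pid=$!

  # wait returns early when a trapped signal arrives; the trap then runs.
  wait "$auth_pid" 2>/dev/null

  # Clean up whichever background job is still running.
  kill "$timer_pid" "$auth_pid" 2>/dev/null
  wait "$timer_pid" "$auth_pid" 2>/dev/null
  trap - ALRM
  echo "$result"
}

run_with_grace_timer 1 3    # slow client: the alarm fires before "auth" completes
```

Swapping the arguments (for example `run_with_grace_timer 2 0`) models a client that authenticates in time, in which case the timer is cancelled before it fires.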

The Unsafe Call: syslog() Inside an Async Signal Handler

The vulnerability exists because the SIGALRM handler (specifically the grace_alarm_handler() function in sshd.c) calls syslog() to log the timeout — and syslog() is not async-signal-safe. The POSIX standard explicitly lists which functions may be called safely from a signal handler; syslog() is not among them.

Why does this matter? syslog() internally calls malloc(), free(), and other heap management functions. If the SIGALRM signal fires while the main process is in the middle of a malloc() call — which is also manipulating the heap — the signal handler's syslog() call interrupts that operation and runs its own malloc()/free() call on top of a partially-modified heap structure. This creates a heap corruption condition.

sshd.c — Vulnerable signal handler (simplified)
/* Fires when LoginGraceTime expires — called asynchronously */
static void
grace_alarm_handler(int sig)
{
    /* ❌ UNSAFE: syslog() is NOT async-signal-safe.
       If main thread is inside malloc() when SIGALRM fires,
       this call corrupts the heap. */
    syslog(LOG_INFO, "Timeout before authentication for %s", ...);

    /* syslog() internally calls malloc()/free(),
       which operate on a heap that may be in a
       mid-operation inconsistent state.             */
    _exit(1);
}

From Race Condition to Remote Code Execution

The Qualys researchers demonstrated that this heap corruption is not merely a crash — it is exploitable for arbitrary code execution. The exploit chain works as follows:

STEP 01: Identify & Connect. Attacker confirms vulnerable sshd version. Initiates TCP connection to port 22.
STEP 02: Fake Key Exchange. Crafted public key packet positions shellcode payload on the heap.
STEP 03: Win the Race. The attacker times the final packet so that sshd is inside malloc() when the grace timer's SIGALRM fires. Takes ~10,000 attempts on average (roughly 3–8 hours once the address guess in Step 04 is included).
STEP 04: ASLR Bypass. 32-bit systems: randomization is weak enough that, in Qualys's tests, the glibc address guess was right about half the time. 64-bit: significantly harder but not impossible.
STEP 05: Root Shell. Shellcode executes as root. Full system compromise achieved.

The Numbers Behind the Race

Qualys's analysis shows how probabilistic the exploit is. Winning the race requires SIGALRM to arrive in the narrow window when malloc() has left the heap in an inconsistent state. That is inherently unreliable: Qualys reported needing about 10,000 attempts on average, which works out to roughly 3–8 hours against a default configuration.
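The hours quoted for the attack follow from simple arithmetic, sketched here in Bash. The attempt count and grace period come from this article. The concurrency figure (100 unauthenticated connections, the stock MaxStartups ceiling of 10:30:100) and the one-in-two address guess are assumptions taken from the Qualys advisory's 32-bit lab setup as we understand it, and `estimate_minutes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope time-to-compromise estimate. All inputs are assumptions.
# Usage: estimate_minutes ATTEMPTS GRACE_SECONDS CONCURRENT_CONNECTIONS
estimate_minutes() {
  local attempts=$1 grace=$2 concurrent=$3
  # Each batch of CONCURRENT connections costs one full grace period.
  echo $(( attempts * grace / concurrent / 60 ))
}

ATTEMPTS=10000     # average tries to win the race (figure quoted in this article)
GRACE_SECONDS=120  # LoginGraceTime default
CONCURRENT=100     # assumption: stock MaxStartups ceiling (10:30:100)

race_minutes=$(estimate_minutes "$ATTEMPTS" "$GRACE_SECONDS" "$CONCURRENT")
echo "Winning the race alone: ~${race_minutes} minutes"

# Assumption: the address guess is right about half the time, which doubles
# the expected number of race wins the attacker needs.
echo "With a 1-in-2 address guess: ~$(( race_minutes * 2 )) minutes"
```

Under these assumptions the estimate is 200 minutes for the race alone and 400 minutes with the address guess, which brackets the 3–8 hour range. Lowering MaxStartups or raising attacker parallelism across many targets shifts the numbers accordingly.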

⚠️ Don't dismiss it because it's "hard": The complexity of this exploit provides no security guarantee. Automated tooling, botnet-scale attempts against thousands of targets simultaneously, and improvements to the PoC can dramatically reduce time-to-compromise in practice. Many organizations still run 32-bit systems or have ASLR misconfigured.

Why ASLR Helps (But Doesn't Fix It)

Address Space Layout Randomization (ASLR) randomizes the base addresses of memory regions at process startup, making it harder for an attacker to predict where shellcode and library code will land. On 64-bit Linux, ASLR typically provides 28 bits of entropy for memory mappings, so a blind guess has a 1 in 2²⁸ (~268 million) chance of being right. sshd re-executes itself for each connection, so the layout is randomized afresh on every attempt. The protection is far weaker on 32-bit targets: in Qualys's tests the glibc base was one of only two addresses, so a guess was right about half the time. That is why the demonstrated exploit targets 32-bit systems, and why 64-bit exploitation is considered much harder but not ruled out. ASLR raises the cost of the attack; it does not remove the underlying heap corruption.
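To make the entropy figures concrete, here is a small Bash sketch of the guessing arithmetic. It assumes every attempt faces a freshly randomized layout, so each guess succeeds independently with probability 1 in 2^bits and the expected number of guesses equals the size of the space. `aslr_expected_tries` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Expected number of blind guesses against N bits of ASLR entropy.
# Assumption: the layout is re-randomized on every attempt, so each guess
# succeeds independently with probability 1 / 2^bits.
# Usage: aslr_expected_tries BITS
aslr_expected_tries() {
  local bits=$1
  echo $(( 1 << bits ))
}

echo "28 bits (64-bit figure cited above): $(aslr_expected_tries 28) expected guesses"
echo "1 bit (two candidate addresses):     $(aslr_expected_tries 1) expected guesses"
```

The gap between the two results is the practical difference between the 64-bit and 32-bit cases: every guess also has to be paired with a won race, so the cost multiplies.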

05 PoC Analysis

Within 24–72 hours of Qualys's disclosure, multiple proof-of-concept scripts appeared on GitHub. Most public PoCs are "check scripts" — they attempt to fingerprint vulnerable sshd versions and test for the race condition without completing full exploitation. However, the technical foundation they provide gives advanced attackers a detailed blueprint.

The Qualys PoC (not publicly released, but demonstrated to OpenSSH maintainers) uses the public key parser as the payload delivery mechanism — the crafted public key data contains the shellcode payload embedded after a valid key header, sized to fit within specific heap allocation constraints.

Bash — Check if your sshd is vulnerable (safe fingerprint only)
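#!/bin/bash
# Safe version check: no exploitation, no network connection required.
# Run on the server itself or via SSH to an already-trusted host.
# Note: distro backports keep the upstream version string, so confirm a
# "VULNERABLE" result against your vendor's advisory.

# Ask sshd first (older builds reject -V but still print their version),
# then fall back to the client binary, which normally ships in lockstep.
SSHD_VERSION=$(sshd -V 2>&1 | grep -oP 'OpenSSH_\K[0-9]+\.[0-9]+(p[0-9]+)?' | head -n1)
if [[ -z "$SSHD_VERSION" ]]; then
  SSHD_VERSION=$(ssh -V 2>&1 | grep -oP 'OpenSSH_\K[0-9]+\.[0-9]+(p[0-9]+)?' | head -n1)
fi
if [[ -z "$SSHD_VERSION" ]]; then
  echo "[?] Could not determine the OpenSSH version." >&2
  exit 2
fi
echo "Detected OpenSSH version: OpenSSH_${SSHD_VERSION}"

# Extract major.minor for comparison
MAJOR=$(echo "$SSHD_VERSION" | cut -d. -f1)
MINOR=$(echo "$SSHD_VERSION" | cut -d. -f2 | cut -dp -f1)

if (( MAJOR == 8 && MINOR >= 5 )) || (( MAJOR == 9 && MINOR <= 7 )); then
  echo "[!] VULNERABLE to CVE-2024-6387 (regreSSHion)"
  echo "    Upgrade to OpenSSH 9.8p1 or apply vendor patch immediately."
elif (( MAJOR < 4 )) || (( MAJOR == 4 && MINOR < 4 )); then
  echo "[!] Possibly vulnerable (pre-4.4p1 range). Check CVE-2006-5051 and CVE-2008-4109 patch status."
elif (( MAJOR < 8 )) || (( MAJOR == 8 && MINOR < 5 )); then
  echo "[✓] NOT VULNERABLE: 4.4p1 through 8.4p1 retain the 2006 fix."
else
  echo "[✓] NOT VULNERABLE: version is 9.8p1 or later."
fi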
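For fleet inventory, the same version ranges can be applied to the identification banner a server sends on connect. The sketch below is one possible approach rather than a vetted scanner: `classify_banner` and `fetch_banner` are hypothetical helper names, `fetch_banner` relies on Bash's /dev/tcp redirection being available, and the result reflects only the upstream version string, so a backported distro package will still show up in the vulnerable range until checked against the vendor advisory.

```shell
#!/usr/bin/env bash
# Classify an SSH identification banner against the version ranges in this
# article. Hypothetical helpers; compares upstream version strings only.
classify_banner() {
  local banner=$1 major minor
  if [[ $banner =~ OpenSSH_([0-9]+)\.([0-9]+) ]]; then
    major=$(( 10#${BASH_REMATCH[1]} ))   # force base 10 in case of leading zeros
    minor=$(( 10#${BASH_REMATCH[2]} ))
  else
    echo "unknown"
    return 0
  fi
  if   (( major < 4 || (major == 4 && minor < 4) )); then echo "pre-4.4p1: check 2006/2008 patches"
  elif (( major < 8 || (major == 8 && minor < 5) )); then echo "not vulnerable"
  elif (( major < 9 || (major == 9 && minor < 8) )); then echo "vulnerable range"
  else echo "fixed upstream"
  fi
}

# Read the banner a server sends on connect. Only the read has a timeout,
# so a filtered port may still stall during the connect itself.
fetch_banner() {
  local host=$1 port=${2:-22} banner
  IFS= read -r -t 5 banner 2>/dev/null <"/dev/tcp/${host}/${port}" || return 1
  printf '%s\n' "${banner%$'\r'}"       # strip the trailing carriage return
}

classify_banner "SSH-2.0-OpenSSH_9.6p1 Ubuntu-3ubuntu13"
```

A typical use would be `classify_banner "$(fetch_banner HOST)"` in a loop over hosts you are authorized to assess, where HOST is a placeholder.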

06 Detection & Threat Hunting

Detecting active exploitation of regreSSHion is possible through a combination of SSH log analysis, network monitoring, and endpoint telemetry. Because the exploit requires thousands of connection attempts, the noise it generates is actually detectable — if you're looking for it.

SSH Log Analysis

During exploitation, /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/CentOS) will show a high volume of entries like the following from the same source IP within a short time window:

/var/log/auth.log — Signatures of exploitation attempts
# These patterns are consistent with regreSSHion exploitation attempts (scanners and broken clients can also produce them):
Jul  8 03:12:44 server sshd[41231]: Timeout before authentication for 203.0.113.42 port 54321
Jul  8 03:12:44 server sshd[41232]: Timeout before authentication for 203.0.113.42 port 54322
Jul  8 03:12:44 server sshd[41233]: Timeout before authentication for 203.0.113.42 port 54323
# ... hundreds more from the same IP in rapid succession

# Also watch for:
Jul  8 03:14:01 server sshd[41301]: message authentication code incorrect
Jul  8 03:14:01 server sshd[41301]: fatal: mm_request_send: write: Broken pipe

Grep / Hunt Commands

Bash — Quick threat hunt for exploitation indicators
# Count "Timeout before authentication" events per source IP during the previous clock hour
# (%e matches syslog's space-padded day of month, e.g. "Jul  8")
grep "Timeout before authentication" /var/log/auth.log | \
  grep "$(date -d '1 hour ago' '+%b %e %H')" | \
  awk '{print $(NF-2)}' | sort | uniq -c | sort -rn | head -20

# Flag IPv4 sources with 50 or more pre-auth timeouts (starting threshold; tune to your baseline)
grep "Timeout before authentication" /var/log/auth.log | \
  grep -oP 'for \K[\d.]+' | sort | uniq -c | awk '$1 >= 50'

# List sshd processes and their children (post-exploitation triage)
ps auxf | grep '[s]shd'
# Interactive logins also run shells under sshd; investigate bash/sh/python
# children that do not correspond to a known login session.

# Verify LoginGraceTime setting (mitigation check)
grep -i LoginGraceTime /etc/ssh/sshd_config

SIEM Detection Rules

For teams using Splunk, Elastic, or similar platforms, the following logic can be adapted into alerts:

Splunk SPL — regreSSHion exploitation detection
index=linux_logs sourcetype=syslog "Timeout before authentication"
| rex field=_raw "for (?<src_ip>\d+\.\d+\.\d+\.\d+)"
| bucket _time span=5m
| stats count AS timeout_count BY src_ip, _time
| where timeout_count > 100
| eval severity=case(
    timeout_count > 500, "CRITICAL",
    timeout_count > 200, "HIGH",
    true(),              "MEDIUM"
  )
| table _time, src_ip, timeout_count, severity
| sort -timeout_count
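For hosts without a SIEM, the same bucket-and-count idea can be approximated with awk. This is a minimal sketch: it assumes traditional syslog timestamps ("Jul  8 03:12:44 host sshd[pid]: ..."), and the function name `timeouts_by_window`, the 5-minute default window, and the default threshold of 100 are illustrative choices that mirror the query above rather than tested detection values.

```shell
#!/usr/bin/env bash
# Count "Timeout before authentication" events per source and time window.
# Assumes traditional syslog timestamps: "Jul  8 03:12:44 host sshd[pid]: ..."
# Usage: timeouts_by_window LOGFILE [WINDOW_MINUTES] [MIN_COUNT]
timeouts_by_window() {
  awk -v win="${2:-5}" -v minhits="${3:-100}" '
    /Timeout before authentication for/ {
      split($3, t, ":")                      # $3 is HH:MM:SS
      start = int(t[2] / win) * win          # first minute of the window
      src = ""
      for (i = 1; i < NF; i++)               # the address follows the word "for"
        if ($i == "for") { src = $(i + 1); break }
      hits[sprintf("%s %2d %s:%02d %s", $1, $2, t[1], start, src)]++
    }
    END {
      for (k in hits)
        if (hits[k] >= minhits) print hits[k], k
    }
  ' "$1" | sort -rn
}
```

Calling `timeouts_by_window /var/log/auth.log 5 100` prints one line per source and window that meets the threshold, with the count first. Systems that log only to the journal would need the input exported to text first.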

Network-Level Detection

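Cisco Talos released Snort/Suricata signature SID: 63659 specifically for regreSSHion exploitation attempts. If you run a network IDS/IPS, ensure this signature is enabled and alerting. Additionally, watch for a single source opening many TCP connections to port 22 that sit idle until the login grace period expires.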

07 Remediation & Hardening

Primary fix: Upgrade to OpenSSH 9.8p1 or later. This is the only complete fix. All other measures below are mitigations that reduce risk but do not eliminate the vulnerability.

1. Patch — The Only Real Fix

Shell — Patch commands by distribution
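# Ubuntu / Debian (upgrade only this package)
sudo apt update && sudo apt install --only-upgrade openssh-server

# RHEL / CentOS / Fedora
sudo dnf update openssh-server
# or: sudo yum update openssh-server

# Arch Linux
sudo pacman -Syu openssh

# Verify patch applied:
ssh -V
# Upstream builds report OpenSSH_9.8p1 or later. Distro backports keep the
# older upstream string, so compare the package version with the advisory:
#   dpkg -s openssh-server | grep Version    (Debian/Ubuntu)
#   rpm -q openssh-server                    (RHEL/Fedora)

# Restart sshd after patching (the unit is named "ssh" on Debian/Ubuntu)
sudo systemctl restart sshd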

2. Set LoginGraceTime to 0 (Immediate Mitigation)

Setting LoginGraceTime 0 in /etc/ssh/sshd_config removes the grace period, which prevents the SIGALRM from ever firing in the authentication context — directly eliminating the race condition trigger. However, this has a trade-off: it removes the timeout for authentication attempts, which could allow resource exhaustion via many simultaneous unauthenticated connections.

/etc/ssh/sshd_config — Mitigation (use with caution)
# Eliminates the race condition trigger.
# Trade-off: removes authentication timeout.
# ONLY use this if you cannot patch immediately.
LoginGraceTime 0

# Always pair with MaxStartups to limit concurrent connections:
MaxStartups 10:30:60
# ↑ start:rate:full — limits connection flood risk

# Apply changes:
# sudo sshd -t && sudo systemctl reload sshd
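After reloading, it is worth confirming what sshd actually resolved, since a later Include or a duplicate directive can override the edit. The sketch below summarizes the two relevant settings from the "key value" lines that `sshd -T` prints. `audit_grace_settings` is a hypothetical helper name, and the labels it prints are our own wording.

```shell
#!/usr/bin/env bash
# Summarize LoginGraceTime and MaxStartups from `sshd -T`-style output on stdin.
# Hypothetical helper; expects one "keyword value" pair per line.
audit_grace_settings() {
  awk '
    tolower($1) == "logingracetime" { grace = $2 }
    tolower($1) == "maxstartups"    { startups = $2 }
    END {
      if (grace == "")       print "logingracetime: not reported"
      else if (grace == "0") print "logingracetime: 0 (SIGALRM grace timer disabled)"
      else                   print "logingracetime: " grace " (grace timer active)"
      print "maxstartups: " (startups == "" ? "not reported" : startups)
    }
  '
}

# Example with captured output; on a live host: sudo sshd -T | audit_grace_settings
printf 'logingracetime 0\nmaxstartups 10:30:60\n' | audit_grace_settings
```

If Match blocks are in use, `sshd -T` may need connection parameters supplied via `-C` to show the settings that apply to a given client.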

3. Restrict SSH Access at the Network Layer

iptables — Whitelist SSH access to known IPs only
# Allow SSH only from your office/VPN IP ranges
iptables -A INPUT -p tcp --dport 22 -s YOUR_OFFICE_IP/32 -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -s YOUR_VPN_RANGE/24  -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j DROP

# For cloud instances (AWS security groups, GCP firewall rules):
# Restrict port 22 inbound to known management IPs.
# Never expose SSH to 0.0.0.0/0 in production.

# Even better: move SSH behind a bastion host or VPN entirely

4. Deploy fail2ban with Aggressive SSH Rules

/etc/fail2ban/jail.local — Aggressive SSH jail for regreSSHion
[sshd]
enabled  = true
port     = ssh
filter   = sshd
logpath  = /var/log/auth.log
# regreSSHion needs thousands of attempts, so catch it fast.
# Comments sit on their own lines: fail2ban does not treat a trailing "#" as a comment.
maxretry = 20
# 60-second window
findtime = 60
# ban for 1 hour; increase to 86400 for persistent attackers
bantime  = 3600

# Also add a timeout-specific jail:
[sshd-timeout]
enabled  = true
filter   = sshd[mode=aggressive]
logpath  = /var/log/auth.log
maxretry = 10
findtime = 30
bantime  = 7200

5. Additional Hardening Best Practices

08 Real-World Impact Assessment

The scale of exposure for regreSSHion is genuinely alarming. Qualys identified over 14 million internet-facing OpenSSH instances potentially vulnerable at the time of disclosure. Even weeks later, Qualys's CSAM data showed approximately 700,000 externally-facing customer instances still unpatched — real enterprise servers sitting vulnerable with a known exploit chain.

A successful exploit yields root access, which means full control of the compromised host.

During our assessment engagements following this disclosure, we encountered multiple customer environments with unpatched sshd instances — in several cases, on internet-facing jump hosts and bastion servers. This is the highest-risk deployment pattern: a compromised bastion provides direct, authenticated access to the entire internal network it was meant to protect.

🎯 Grey Shield assessment finding: In post-disclosure red team engagements, we successfully identified vulnerable sshd versions on externally-facing infrastructure across multiple Indian enterprise environments, including in the banking and e-commerce sectors, months after patch availability. Patch compliance for infrastructure-level vulnerabilities in Indian enterprises significantly lags behind global averages.

09 Lessons for Security Teams

regreSSHion carries a lesson that goes beyond "patch OpenSSH." It is a textbook example of several failure modes that security teams need to build process around:

  1. Regression testing for security patches: The 2006 fix worked for 14 years. The vulnerability returned because a code refactor in 2020 removed the protection without a security-focused regression test. If you do security-critical code changes, your test suite must verify the security properties, not just functionality.
  2. Vulnerability debt compounds: Organizations that couldn't answer "what version of sshd are we running on all internet-facing servers?" in July 2024 had a pre-existing asset management problem that regreSSHion simply exposed.
  3. Complexity ≠ safety: "The exploit takes 3–8 hours and thousands of attempts" was used by some organizations to justify delay. Automated tooling and the sheer scale of attacks make this reasoning dangerous.
  4. The 90-day window is real: Qualys followed coordinated disclosure and gave OpenSSH time to patch. The moment they published, the race was on. Patch SLAs for internet-facing critical infrastructure need to be measured in hours, not weeks.

10 References & Further Reading

Grey Shield Research Team · contact@greyshield.in · Responsible Disclosure Policy: /legal