public inbox for virtualization@lists.linux-foundation.org
* [BUG] vsock: poll() not waking on data arrival, causing multi-second SSH delays
@ 2026-02-05  7:53 agpn1b92
  0 siblings, 0 replies; 4+ messages in thread
From: agpn1b92 @ 2026-02-05  7:53 UTC (permalink / raw)
  To: virtualization, netdev; +Cc: sgarzare

Hi,

I'm experiencing a bug where SSH sessions over vsock take 2-20+ seconds
to establish due to poll() not signaling POLLIN when data is available.
The bug does NOT occur on the first connection after VM boot, but affects
all subsequent connections.

* Summary

- vsock poll() fails to return POLLIN when data is in the receive buffer
- sshd-session's ppoll() times out every ~20ms instead of waking on data
- First SSH connection after guest boot works instantly
- All subsequent connections experience 2-20+ second delays
- Non-PTY commands (ssh -T ... 'echo test') work instantly
- TCP connections to the same VM work instantly

* Environment

Host:
- OS: Arch Linux
- Kernel: 6.18.2-arch2-1
- QEMU: system package (latest)

Guest:
- OS: Debian trixie
- Kernel: 6.17.13+deb13-amd64 (also tested on 6.12.57, same issue)
- OpenSSH: 10.0p2

QEMU command (relevant parts):
  qemu-system-x86_64 -enable-kvm -smp 8 \
    -object memory-backend-memfd,id=mem,size=20G,share=on \
    -machine memory-backend=mem \
    -device vhost-vsock-pci,guest-cid=5 \
    ...

Connection method: ssh user@vsock/5 (via systemd-ssh-proxy)

* Symptoms

Interactive SSH (PTY) - SLOW:
  $ time ssh user@vsock/5
  # Takes 2-20+ seconds before shell prompt appears

Non-interactive SSH - FAST:
  $ time ssh user@vsock/5 'echo test'
  test
  real    0m0.156s

TCP to same VM - FAST:
  $ time ssh -p 33594 user@127.0.0.1
  # Instant

* Key observation: First connection after boot is fast

After guest reboot:
  $ ssh user@vsock/5      # INSTANT (< 1 second)
  $ exit
  $ ssh user@vsock/5      # SLOW (2-20 seconds)
  $ ssh user@vsock/5      # SLOW
  ...

This suggests the bug involves state that accumulates or isn't properly
cleaned up between connections.

* bpftrace evidence

Using syscall tracepoints on guest during slow connection:

  === MINIMAL VSOCK DIAGNOSTIC ===
  [   29 ms] sshd-session: ppoll() duration=19 ms ret=1
             ^^^ 20ms TIMEOUT pattern detected!
  [   50 ms] sshd-session: ppoll() duration=20 ms ret=1
             ^^^ 20ms TIMEOUT pattern detected!
  [   70 ms] sshd-session: ppoll() duration=18 ms ret=1
             ^^^ 20ms TIMEOUT pattern detected!
  ... (continues for ~2 seconds) ...
  
  [ 5000 ms] --- 5s stats: ppoll=455, timeouts=103, recv=0 (0 bytes) ---
  
  [19432 ms] sshd: recvmsg() = 308 bytes [4 µs]
  [19442 ms] sshd-session: recvmsg() = 308 bytes [4 µs]

Pattern analysis:
- ppoll() returns ret=1 (1 fd ready), but only after ~20ms each time (the timeout interval)
- The ready fd is the PTY, NOT the vsock socket
- recv=0 during the timeout phase: vsock data not being read
- recvmsg() finally succeeds after ~19 seconds
- When recvmsg() runs, it completes in 4 microseconds (data WAS there)

This suggests that data is sitting in the vsock receive buffer, but poll()
is not returning POLLIN, so sshd doesn't know to read it.

* 30-second summary from bpftrace

  Total ppoll calls: 488
  Timeouts (20ms pattern): 103
  Successful recvmsg: 6 (984 bytes)
  Timeout rate: 21%

* Why PTY-specific?

PTY sessions require bidirectional traffic:
1. Server sends shell prompt → client must receive it
2. Client sends keypress → server must receive it
3. Server sends echo → client must receive it

Each exchange relies on poll() waking on POLLIN. The bug causes poll()
to miss the wakeup, forcing sshd to wait for its 20ms timeout fallback.

Non-PTY commands complete a single request-response exchange and exit
before the delay becomes noticeable.

* Additional context

I previously encountered the identical issue on WSL2's Hyper-V vsock
implementation, suggesting this may be a fundamental issue with how
vsock transports handle poll/wakeup semantics, not specific to virtio.

* Hypothesis

Based on the evidence, this appears to be a lost wakeup race condition:
1. Host sends packet to guest
2. Packet is enqueued to socket's rx_queue
3. sk_data_ready() is called but poll waiters aren't properly woken
4. vsock_poll() returns 0 (no POLLIN) despite data being available
5. ppoll() times out after 20ms, sshd retries
6. Eventually succeeds through timeout-based retry

The "first connection works" pattern suggests the race involves
existing state from previous connections - possibly worker threads,
interrupt handlers, or virtqueue state that isn't properly reset.
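
The hypothesis above could be checked without ssh along the following lines. This is only a minimal sketch, not a confirmed reproducer: it polls a connected socket and then peeks at the receive queue, so that "poll() reported nothing, yet bytes were queued" becomes directly observable. The helper names poll_then_peek and accept_vsock, the 20 ms timeout, and the port passed to accept_vsock are assumptions chosen for illustration.

```python
import select
import socket
import time


def poll_then_peek(sock, timeout_ms=20):
    """Poll sock for POLLIN, then peek at the receive queue without blocking.

    Returns (pollin, pending_bytes, elapsed_ms). pollin=False together with
    pending_bytes > 0 would be consistent with a missed wakeup, although data
    can also arrive between the two calls, so repeat the check many times.
    """
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    start = time.monotonic()
    events = poller.poll(timeout_ms)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    pollin = any(mask & select.POLLIN for _fd, mask in events)
    try:
        pending = len(sock.recv(4096, socket.MSG_PEEK | socket.MSG_DONTWAIT))
    except BlockingIOError:
        pending = 0  # nothing queued
    return pollin, pending, elapsed_ms


def accept_vsock(port):
    """Guest side: accept one AF_VSOCK stream connection (Linux only)."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as srv:
        srv.bind((socket.VMADDR_CID_ANY, port))
        srv.listen(1)
        conn, _peer = srv.accept()
        return conn
```

On the guest, one would call accept_vsock() with some free port and then call poll_then_peek() on the returned socket in a loop while the host sends small writes to the guest CID. Because the helper accepts any connected socket, the same loop can be run over TCP for comparison.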

* Reproducer

1. Start QEMU VM with vhost-vsock-pci device
2. Boot guest, ensure sshd is running
3. From host: ssh user@vsock/<CID>  # First connection is fast
4. Exit and reconnect: ssh user@vsock/<CID>  # Now slow

* Request

Could someone familiar with the vsock/virtio poll implementation
review the wakeup path? Specifically:
- virtio_transport_recv_pkt() -> sk_data_ready() path
- vsock_poll() -> poll_wait() registration timing
- Any state that persists between connections

Happy to provide additional traces or test patches.

Thanks,
[Your Name]

---
bpftrace script used (runs on guest):

#!/usr/bin/env bpftrace
BEGIN {
    @start = nsecs;
    printf("=== MINIMAL VSOCK DIAGNOSTIC ===\n");
}
tracepoint:syscalls:sys_enter_ppoll {
    if (comm == "sshd-session" || comm == "sshd") {
        @ppoll_enter[tid] = nsecs;
        @ppoll_count++;
    }
}
tracepoint:syscalls:sys_exit_ppoll {
    if (@ppoll_enter[tid]) {
        $ms = (nsecs - @start) / 1000000;
        $dur = (nsecs - @ppoll_enter[tid]) / 1000000;
        if ($dur > 10) {
            printf("[%5lld ms] %s: ppoll() duration=%lld ms ret=%d\n",
                   $ms, comm, $dur, args->ret);
            if ($dur >= 18 && $dur <= 25) {
                printf("           ^^^ 20ms TIMEOUT pattern detected!\n");
                @timeout_count++;
            }
        }
        delete(@ppoll_enter[tid]);
    }
}
tracepoint:syscalls:sys_exit_recvmsg {
    if (comm == "sshd-session" || comm == "sshd") {
        if (args->ret > 0) {
            $ms = (nsecs - @start) / 1000000;
            printf("[%5lld ms] %s: recvmsg() = %lld bytes\n",
                   $ms, comm, args->ret);
            @recv_count++;
            @recv_bytes += args->ret;
        }
    }
}
interval:s:5 {
    printf("\n[%5lld ms] --- 5s stats: ppoll=%d, timeouts=%d, recv=%d (%d bytes) ---\n\n",
           (nsecs - @start) / 1000000, @ppoll_count, @timeout_count,
           @recv_count, @recv_bytes);
}






* Re: [BUG] vsock: poll() not waking on data arrival, causing multi-second SSH delays
       [not found] <f3a801a1a32670294d5d703cad8ab609@anonaddy.me>
@ 2026-02-06 16:23 ` Stefano Garzarella
  2026-02-07 18:56   ` agpn1b92
  0 siblings, 1 reply; 4+ messages in thread
From: Stefano Garzarella @ 2026-02-06 16:23 UTC (permalink / raw)
  To: agpn1b92; +Cc: virtualization, netdev

On Thu, Feb 05, 2026 at 07:53:33AM +0000, agpn1b92@anonaddy.me wrote:
>Hi,

Hi [Your Name],

>
>I'm experiencing a bug where SSH sessions over vsock take 2-20+ seconds
>to establish due to poll() not signaling POLLIN when data is available.
>The bug does NOT occur on the first connection after VM boot, but affects
>all subsequent connections.
>
>* Summary
>
>- vsock poll() fails to return POLLIN when data is in the receive buffer
>- sshd-session's ppoll() times out every ~20ms instead of waking on data
>- First SSH connection after guest boot works instantly
>- All subsequent connections experience 2-20+ second delays
>- Non-PTY commands (ssh -T ... 'echo test') work instantly

Hmm, so I'm not sure whether this is related to the kernel, the user-space
proxy, or something else. It would be nice to reproduce it without ssh.

I tried 6.18 on both guest and host and was not able to reproduce it.

Can you try to write a simple reproducer without ssh involved?

Thanks,
Stefano




* Re: [BUG] vsock: poll() not waking on data arrival, causing multi-second SSH delays
  2026-02-06 16:23 ` Stefano Garzarella
@ 2026-02-07 18:56   ` agpn1b92
  0 siblings, 0 replies; 4+ messages in thread
From: agpn1b92 @ 2026-02-07 18:56 UTC (permalink / raw)
  To: sgarzare; +Cc: virtualization, netdev

Hi Stefano and all,

Thank you Stefano for your response and skepticism about whether this was
a kernel issue - you were absolutely right to question it!

After extensive debugging with strace on both guest and host, I've
determined this was NOT a kernel bug at all, but rather an OpenSSH issue
specific to vsock connections.

Root Cause:
-----------
The 10-20 second delay was caused by OpenSSH's sshd attempting DNS lookups
on the literal string "UNKNOWN" (the placeholder hostname used for vsock
connections where no IP address exists). This triggered two 5-second DNS
timeouts during login recording and audit subsystem operations, totaling
~10 seconds of delay.

The strace showed:
   17:11:14.465 sendmmsg(13, DNS query for "UNKNOWN")
   17:11:14.465 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005s>
   17:11:19.472 sendmmsg(13, DNS query for "UNKNOWN") [RETRY]
   17:11:19.472 poll([{fd=13, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005s>
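
A quick way to see whether a given guest is likely to hit the same timeouts is to time the lookup directly. The sketch below is a rough check only: it assumes the lookup goes through the system resolver as exposed by Python's socket.getaddrinfo(), which may not match the exact call sshd makes, and the helper name time_lookup is hypothetical.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Resolve name and return (resolved, elapsed_seconds).

    resolver is injectable so the timing logic can be exercised without
    touching the network; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        resolver(name, None)
        resolved = True
    except socket.gaierror:
        resolved = False  # lookup failed or timed out
    return resolved, time.monotonic() - start
```

Running time_lookup("UNKNOWN") on the guest and seeing an elapsed time of several seconds would be consistent with the strace above; a result in the millisecond range would suggest the name is answered locally or rejected quickly.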

Why I Initially Thought It Was a Kernel Issue:
----------------------------------------------
- bpftrace showed ppoll() timeouts while data appeared to be queued
- The pattern looked like a classic lost wakeup race condition

However, the vsock kernel modules were working perfectly. The delay
happened in userspace during sshd's session setup, specifically when
mm_record_login() tried to resolve the peer hostname for logging.

The Fix:
--------
OpenSSH 10.1 and 10.2 include fixes to prevent passing "UNKNOWN" to
subsystems that would attempt DNS resolution:

- 10.1: Skip audit logging for UNKNOWN hostnames
- 10.2: Don't set PAM_RHOST when remote host is "UNKNOWN"

References:
- https://github.com/openssh/openssh-portable/pull/388
- https://gitlab.archlinux.org/archlinux/packaging/packages/openssh/-/issues/16
- https://www.openssh.org/releasenotes.html

Workaround for older OpenSSH versions:
Add to /etc/hosts: 127.0.0.1 UNKNOWN
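
To confirm that the workaround is in place, the hosts file can be checked for the entry. This is a minimal sketch; the helper name hosts_maps_name is hypothetical, and it treats host names as case-insensitive, which is an assumption about how the resolver compares them.

```python
def hosts_maps_name(hosts_text, name):
    """Return the first address mapped to name in hosts-file text, or None."""
    wanted = name.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return fields[0]
    return None
```

For example, hosts_maps_name(open("/etc/hosts").read(), "UNKNOWN") returns the mapped address once the line has been added, and None otherwise.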

Apologies for the noise on netdev - the vsock kernel implementation is
working correctly. The misleading symptoms (PTY-specific, ppoll timeouts,
state between connections) made it appear kernel-related when it was
actually sshd's login recording code hitting DNS timeouts.

Thanks again for your help and for maintaining the vsock subsystem!

Best regards,
[Your name - don't forget to update it this time or you'll look even 
more stupid]



