public inbox for netdev@vger.kernel.org
* [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog
@ 2026-04-02  4:46 Laurence Rowe
  2026-04-02 12:02 ` Stefano Garzarella
  2026-04-02 17:19 ` Bobby Eshleman
  0 siblings, 2 replies; 6+ messages in thread
From: Laurence Rowe @ 2026-04-02  4:46 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: virtualization, netdev, Laurence Rowe

A common pattern in epoll network servers is to eagerly accept all
pending connections from the non-blocking listening socket after
epoll_wait indicates the socket is ready by calling accept in a loop
until EAGAIN is returned indicating that the backlog is empty.

Scheduling a timeout for a non-blocking accept with an empty backlog
meant AF_VSOCK sockets used by epoll network servers incurred hundreds
of microseconds of additional latency per accept loop compared to
AF_INET or AF_UNIX sockets.

Signed-off-by: Laurence Rowe <laurencerowe@gmail.com>
---

This fixes the observed issue for me:

1. With loopback vsock on the host running Linux v6.19.10 built with
config-6.17.0-19-generic from Ubuntu 24.04 and make olddefconfig.

2. With Firecracker guests with current torvalds/master, v6.19.10, and
amazonlinux/microvm-kernel-6.1.166-24.303.amzn2023 used in Firecracker
CI and examples. (Firecracker guest vsocks are unix sockets on the host
side so this fix works there with just a fixed guest kernel.)

I struggled to build a generic 6.1.166 kernel that worked as a
Firecracker guest but the patch applies (conflict due to change of
`flags` to `arg->flags` in surrounding context) so I believe it should
work for generic v6.1.166 kernel.

Alternatively a minimal version of this fix is to just wrap the
`schedule_timeout` in an `if (timeout != 0)` but that leaves an
unnecessary additional `lock_sock` call.
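
As a rough user-space model (not kernel code) of how the original
loop, the minimal alternative, and this patch differ for a single
accept() on an empty backlog: the `model_*` functions and the
`call_counts` struct are hypothetical names, the counters merely stand
in for the real kernel calls, and signals, socket errors and the
wait-queue calls are left out. The stub timer always expires with
nothing queued.

```c
#include <errno.h>

/* Counters standing in for the real kernel calls (model only). */
struct call_counts {
	int release_sock;
	int schedule_timeout;
	int lock_sock;
};

/* Stub: pretend the timer always expires with nothing queued. */
static long stub_schedule_timeout(long timeout, struct call_counts *c)
{
	(void)timeout;
	c->schedule_timeout++;
	return 0;
}

/* Original loop: always sleeps once before noticing timeout == 0. */
int model_original(long timeout, struct call_counts *c)
{
	for (;;) {	/* the backlog is always empty in this model */
		c->release_sock++;
		timeout = stub_schedule_timeout(timeout, c);
		c->lock_sock++;
		if (timeout == 0)
			return -EAGAIN;
	}
}

/* Minimal alternative: skips the sleep but still cycles the lock. */
int model_minimal(long timeout, struct call_counts *c)
{
	for (;;) {
		c->release_sock++;
		if (timeout != 0)
			timeout = stub_schedule_timeout(timeout, c);
		c->lock_sock++;
		if (timeout == 0)
			return -EAGAIN;
	}
}

/* This patch: the loop body is never entered when timeout == 0. */
int model_patched(long timeout, struct call_counts *c)
{
	while (timeout != 0) {
		c->release_sock++;
		timeout = stub_schedule_timeout(timeout, c);
		c->lock_sock++;
	}
	return -EAGAIN;
}
```

With timeout == 0 the model makes one release_sock, schedule_timeout
and lock_sock call each in the original loop, drops only the
schedule_timeout in the minimal variant, and makes none of the three
with this patch.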

There are ftraces and reproduction tools at:
https://github.com/lrowe/linux-vsock-accept-timeout-investigation
---
 net/vmw_vsock/af_vsock.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2f7d94d682..483889b6d8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1850,11 +1850,11 @@ static int vsock_accept(struct socket *sock, struct socket *newsock,
 	 * created upon connection establishment.
 	 */
 	timeout = sock_rcvtimeo(listener, arg->flags & O_NONBLOCK);
-	prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
 
 	while ((connected = vsock_dequeue_accept(listener)) == NULL &&
-	       listener->sk_err == 0) {
+	       listener->sk_err == 0 && timeout != 0) {
 		release_sock(listener);
+		prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
 		timeout = schedule_timeout(timeout);
 		finish_wait(sk_sleep(listener), &wait);
 		lock_sock(listener);
@@ -1862,17 +1862,15 @@ static int vsock_accept(struct socket *sock, struct socket *newsock,
 		if (signal_pending(current)) {
 			err = sock_intr_errno(timeout);
 			goto out;
-		} else if (timeout == 0) {
-			err = -EAGAIN;
-			goto out;
 		}
-
-		prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk_sleep(listener), &wait);
 
-	if (listener->sk_err)
+	if (listener->sk_err) {
 		err = -listener->sk_err;
+	} else if (timeout == 0 && connected == NULL) {
+		err = -EAGAIN;
+		goto out;
+	}
 
 	if (connected) {
 		sk_acceptq_removed(listener);
-- 
2.43.0



* Re: [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog
  2026-04-02  4:46 [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog Laurence Rowe
@ 2026-04-02 12:02 ` Stefano Garzarella
  2026-04-02 19:22   ` Laurence Rowe
  2026-04-02 17:19 ` Bobby Eshleman
  1 sibling, 1 reply; 6+ messages in thread
From: Stefano Garzarella @ 2026-04-02 12:02 UTC (permalink / raw)
  To: Laurence Rowe; +Cc: virtualization, netdev

On Wed, Apr 01, 2026 at 09:46:37PM -0700, Laurence Rowe wrote:
>A common pattern in epoll network servers is to eagerly accept all
>pending connections from the non-blocking listening socket after
>epoll_wait indicates the socket is ready by calling accept in a loop
>until EAGAIN is returned indicating that the backlog is empty.
>
>Scheduling a timeout for a non-blocking accept with an empty backlog
>meant AF_VSOCK sockets used by epoll network servers incurred hundreds
>of microseconds of additional latency per accept loop compared to
>AF_INET or AF_UNIX sockets.

Not related to this patch, but should we do something similar (in 
another patch) also in vsock_connect() or doesn't matter since usually 
it's always blocking?

>
>Signed-off-by: Laurence Rowe <laurencerowe@gmail.com>
>---
>
>This fixes the observed issue for me:
>
>1. With loopback vsock on the host running Linux v6.19.10 built with
>config-6.17.0-19-generic from Ubuntu 24.04 and make olddefconfig.
>
>2. With Firecracker guests with current torvalds/master, v6.19.10, and
>amazonlinux/microvm-kernel-6.1.166-24.303.amzn2023 used in Firecracker
>CI and examples. (Firecracker guest vsocks are unix sockets on the host
>side so this fix works there with just a fixed guest kernel.)
>
>I struggled to build a generic 6.1.166 kernel that worked as a
>Firecracker guest but the patch applies (conflict due to change of
>`flags` to `arg->flags` in surrounding context) so I believe it should
>work for generic v6.1.166 kernel.
>
>Alternatively a minimal version of this fix is to just wrap the
>`schedule_timeout` in an `if (timeout != 0)` but that leaves an
>unnecessary additional `lock_sock` call.
>
>There are ftrace's and reproduction tools at:
>https://github.com/lrowe/linux-vsock-accept-timeout-investigation
>---
> net/vmw_vsock/af_vsock.c | 16 +++++++---------
> 1 file changed, 7 insertions(+), 9 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 2f7d94d682..483889b6d8 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1850,11 +1850,11 @@ static int vsock_accept(struct socket *sock, struct socket *newsock,
> 	 * created upon connection establishment.
> 	 */
> 	timeout = sock_rcvtimeo(listener, arg->flags & O_NONBLOCK);
>-	prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
>
> 	while ((connected = vsock_dequeue_accept(listener)) == NULL &&
>-	       listener->sk_err == 0) {
>+	       listener->sk_err == 0 && timeout != 0) {
> 		release_sock(listener);
>+		prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);

Is it okay to move prepare_to_wait() after `release_sock(listener)`?

I'm worried if we can miss any wakeup. BTW if this change is okay, we 
should document that at least in the commit description.

> 		timeout = schedule_timeout(timeout);
> 		finish_wait(sk_sleep(listener), &wait);
> 		lock_sock(listener);
>@@ -1862,17 +1862,15 @@ static int vsock_accept(struct socket *sock, struct socket *newsock,
> 		if (signal_pending(current)) {
> 			err = sock_intr_errno(timeout);
> 			goto out;
>-		} else if (timeout == 0) {
>-			err = -EAGAIN;
>-			goto out;
> 		}
>-
>-		prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
> 	}
>-	finish_wait(sk_sleep(listener), &wait);
>
>-	if (listener->sk_err)
>+	if (listener->sk_err) {
> 		err = -listener->sk_err;
>+	} else if (timeout == 0 && connected == NULL) {

From checkpatch:
CHECK: Comparison to NULL could be written "!connected"
#58: FILE: net/vmw_vsock/af_vsock.c:1870:
+	} else if (timeout == 0 && connected == NULL) {

>+		err = -EAGAIN;
>+		goto out;
>+	}

What about simplifying this with (not a strong opinion):

	} else if (connected == NULL) {
		err = -EAGAIN;
	}


Also 
https://patchwork.kernel.org/project/netdevbpf/patch/20260402044637.73531-1-laurencerowe@gmail.com/ 
suggests to specify a tree (net-next I think for this change) and be 
sure to CC other maintainers (scripts/get_maintainer.pl can help).

Thanks,
Stefano



* Re: [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog
  2026-04-02  4:46 [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog Laurence Rowe
  2026-04-02 12:02 ` Stefano Garzarella
@ 2026-04-02 17:19 ` Bobby Eshleman
  1 sibling, 0 replies; 6+ messages in thread
From: Bobby Eshleman @ 2026-04-02 17:19 UTC (permalink / raw)
  To: Laurence Rowe; +Cc: Stefano Garzarella, virtualization, netdev

On Wed, Apr 01, 2026 at 09:46:37PM -0700, Laurence Rowe wrote:
> A common pattern in epoll network servers is to eagerly accept all
> pending connections from the non-blocking listening socket after
> epoll_wait indicates the socket is ready by calling accept in a loop
> until EAGAIN is returned indicating that the backlog is empty.
> 
> Scheduling a timeout for a non-blocking accept with an empty backlog
> meant AF_VSOCK sockets used by epoll network servers incurred hundreds
> of microseconds of additional latency per accept loop compared to
> AF_INET or AF_UNIX sockets.
> 
> Signed-off-by: Laurence Rowe <laurencerowe@gmail.com>
> ---
> 
> This fixes the observed issue for me:
> 
> 1. With loopback vsock on the host running Linux v6.19.10 built with
> config-6.17.0-19-generic from Ubuntu 24.04 and make olddefconfig.
> 
> 2. With Firecracker guests with current torvalds/master, v6.19.10, and
> amazonlinux/microvm-kernel-6.1.166-24.303.amzn2023 used in Firecracker
> CI and examples. (Firecracker guest vsocks are unix sockets on the host
> side so this fix works there with just a fixed guest kernel.)
> 
> I struggled to build a generic 6.1.166 kernel that worked as a
> Firecracker guest but the patch applies (conflict due to change of
> `flags` to `arg->flags` in surrounding context) so I believe it should
> work for generic v6.1.166 kernel.
> 
> Alternatively a minimal version of this fix is to just wrap the
> `schedule_timeout` in an `if (timeout != 0)` but that leaves an
> unnecessary additional `lock_sock` call.
> 
> There are ftrace's and reproduction tools at:
> https://github.com/lrowe/linux-vsock-accept-timeout-investigation
> ---
>  net/vmw_vsock/af_vsock.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 2f7d94d682..483889b6d8 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1850,11 +1850,11 @@ static int vsock_accept(struct socket *sock, struct socket *newsock,
>  	 * created upon connection establishment.
>  	 */
>  	timeout = sock_rcvtimeo(listener, arg->flags & O_NONBLOCK);
> -	prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
>  
>  	while ((connected = vsock_dequeue_accept(listener)) == NULL &&
> -	       listener->sk_err == 0) {
> +	       listener->sk_err == 0 && timeout != 0) {
>  		release_sock(listener);
> +		prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
>  		timeout = schedule_timeout(timeout);
>  		finish_wait(sk_sleep(listener), &wait);
>  		lock_sock(listener);
> @@ -1862,17 +1862,15 @@ static int vsock_accept(struct socket *sock, struct socket *newsock,
>  		if (signal_pending(current)) {
>  			err = sock_intr_errno(timeout);
>  			goto out;
> -		} else if (timeout == 0) {
> -			err = -EAGAIN;
> -			goto out;
>  		}
> -
> -		prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
>  	}
> -	finish_wait(sk_sleep(listener), &wait);
>  
> -	if (listener->sk_err)
> +	if (listener->sk_err) {
>  		err = -listener->sk_err;
> +	} else if (timeout == 0 && connected == NULL) {
> +		err = -EAGAIN;
> +		goto out;
> +	}

I wonder if this goto can be omitted since the following 'if
(connected)' guards the connected != NULL case? I don't have a strong
opinion, just noticed it would keep if-else symmetrical.

All-in-all, LGTM.

Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>


* Re: [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog
  2026-04-02 12:02 ` Stefano Garzarella
@ 2026-04-02 19:22   ` Laurence Rowe
  2026-04-02 23:30     ` Laurence Rowe
  0 siblings, 1 reply; 6+ messages in thread
From: Laurence Rowe @ 2026-04-02 19:22 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: virtualization, netdev, Bobby Eshleman

On Thu, Apr 2, 2026 at 5:03 AM Stefano Garzarella <sgarzare@redhat.com> wrote:

> >Scheduling a timeout for a non-blocking accept with an empty backlog
> >meant AF_VSOCK sockets used by epoll network servers incurred hundreds
> >of microseconds of additional latency per accept loop compared to
> >AF_INET or AF_UNIX sockets.
>
> Not related to this patch, but should we do something similar (in
> another patch) also in vsock_connect() or doesn't matter since usually
> it's always blocking?

Looking at vsock_connect it's not immediately obvious to me whether it
is affected in the same way. I'll capture some ftraces and follow up
after updating this patch.

> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >index 2f7d94d682..483889b6d8 100644
> >--- a/net/vmw_vsock/af_vsock.c
> >+++ b/net/vmw_vsock/af_vsock.c
> >@@ -1850,11 +1850,11 @@ static int vsock_accept(struct socket *sock, struct socket *newsock,
> >        * created upon connection establishment.
> >        */
> >       timeout = sock_rcvtimeo(listener, arg->flags & O_NONBLOCK);
> >-      prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
> >
> >       while ((connected = vsock_dequeue_accept(listener)) == NULL &&
> >-             listener->sk_err == 0) {
> >+             listener->sk_err == 0 && timeout != 0) {
> >               release_sock(listener);
> >+              prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE);
>
> Is it okay to move prepare_to_wait() after `release_sock(listener)`?
>
> I'm worried if we can miss any wakeup. BTW if this change is okay, we
> should document that at least in the commit description.

I'm not sure. I'll swap them around so they are the same order as before.

> From checkpatch:
> CHECK: Comparison to NULL could be written "!connected"
> #58: FILE: net/vmw_vsock/af_vsock.c:1870:
> +       } else if (timeout == 0 && connected == NULL) {
>
> >+              err = -EAGAIN;
> >+              goto out;
> >+      }
>
> What about simplifying this with (not a strong opinion):
>
>         } else if (connected == NULL) {
>                 err = -EAGAIN;
>         }

That definitely seems cleaner.

> Also
> https://patchwork.kernel.org/project/netdevbpf/patch/20260402044637.73531-1-laurencerowe@gmail.com/
> suggests to specify a tree (net-next I think for this change) and be
> sure to CC other maintainers (scripts/get_maintainer.pl can help).

Sorry about that, first kernel patch submission. Will do so when I
send the updated patch.

Laurence


* Re: [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog
  2026-04-02 19:22   ` Laurence Rowe
@ 2026-04-02 23:30     ` Laurence Rowe
  2026-04-03 10:04       ` Stefano Garzarella
  0 siblings, 1 reply; 6+ messages in thread
From: Laurence Rowe @ 2026-04-02 23:30 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: virtualization, netdev, Bobby Eshleman

On Thu, Apr 2, 2026 at 12:22 PM Laurence Rowe <laurencerowe@gmail.com> wrote:
>
> On Thu, Apr 2, 2026 at 5:03 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> > >Scheduling a timeout for a non-blocking accept with an empty backlog
> > >meant AF_VSOCK sockets used by epoll network servers incurred hundreds
> > >of microseconds of additional latency per accept loop compared to
> > >AF_INET or AF_UNIX sockets.
> >
> > Not related to this patch, but should we do something similar (in
> > another patch) also in vsock_connect() or doesn't matter since usually
> > it's always blocking?
>
> Looking at vsock_connect it's not immediately obvious to me whether it
> is affected
> in the same way. I'll capture some ftraces and follow up after
> updating this patch.

This does not seem to be a problem for vsock_connect since it checks
for `if (flags & O_NONBLOCK) {` in the while loop before calling
`schedule_timeout`.

Timings and ftraces:

https://github.com/lrowe/linux-vsock-accept-timeout-investigation?tab=readme-ov-file#a-quick-look-at-connect

Laurence


* Re: [PATCH] vsock: avoid timeout for non-blocking accept() with empty backlog
  2026-04-02 23:30     ` Laurence Rowe
@ 2026-04-03 10:04       ` Stefano Garzarella
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Garzarella @ 2026-04-03 10:04 UTC (permalink / raw)
  To: Laurence Rowe; +Cc: virtualization, netdev, Bobby Eshleman

On Thu, Apr 02, 2026 at 04:30:20PM -0700, Laurence Rowe wrote:
>On Thu, Apr 2, 2026 at 12:22 PM Laurence Rowe <laurencerowe@gmail.com> wrote:
>>
>> On Thu, Apr 2, 2026 at 5:03 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> > >Scheduling a timeout for a non-blocking accept with an empty backlog
>> > >meant AF_VSOCK sockets used by epoll network servers incurred hundreds
>> > >of microseconds of additional latency per accept loop compared to
>> > >AF_INET or AF_UNIX sockets.
>> >
>> > Not related to this patch, but should we do something similar (in
>> > another patch) also in vsock_connect() or doesn't matter since usually
>> > it's always blocking?
>>
>> Looking at vsock_connect it's not immediately obvious to me whether it
>> is affected
>> in the same way. I'll capture some ftraces and follow up after
>> updating this patch.
>
>This does not seem to be a problem for vsock_connect since it checks
>for `if (flags & O_NONBLOCK) {` in the while loop before calling
>`schedule_timeout`.
>
>Timings and ftraces:
>
>https://github.com/lrowe/linux-vsock-accept-timeout-investigation?tab=readme-ov-file#a-quick-look-at-connect

Thanks for checking and for the fix!
Stefano

