[PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready

Kernel KVM virtualization development

* [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
@ 2026-05-11 14:56 Polina Vishneva
  2026-05-11 15:56 ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Polina Vishneva @ 2026-05-11 14:56 UTC
  To: linux-kernel
  Cc: netdev, virtualization, kvm, Eugenio Pérez, Jason Wang,
	Michael S. Tsirkin, Stefano Garzarella, Stefan Hajnoczi,
	Polina Vishneva, Denis V . Lunev

From: "Denis V. Lunev" <den@openvz.org>

When the host initiates an AF_VSOCK connect() to a guest that has not
yet loaded the virtio-vsock transport (i.e. still booting), the caller
blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
vhost_transport_do_send_pkt() silently exits when
vhost_vq_get_backend(vq) returns NULL.

If the guest doesn't start listening within this timeout, connect()
returns ETIMEDOUT.

This delay is usually pointless and does not align well with the
behavior at other initialization stages: for example, if a connection is
attempted when the guest driver is already loaded but nothing is
listening yet, connect() returns ECONNRESET immediately, without any wait.

Fix this by checking the RX virtqueue backend in
vhost_transport_send_pkt() before queuing. If the backend is NULL,
return -ECONNREFUSED immediately.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
---
 drivers/vhost/vsock.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 1d8ec6bed53e..a3f218292c3a 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
 		return -ENODEV;
 	}

+	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
+	 * private_data without vq->mutex is deliberate: even if the backend becomes
+	 * NULL right after that check, do_send_pkt() checks it under the mutex.
+	 */
+	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data))) {
+		rcu_read_unlock();
+		kfree_skb(skb);
+		return -ECONNREFUSED;
+	}
+
 	if (virtio_vsock_skb_reply(skb))
 		atomic_inc(&vsock->queued_replies);

base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
-- 
2.53.0


* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-11 14:56 [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready Polina Vishneva
@ 2026-05-11 15:56 ` Stefano Garzarella
  2026-05-12 14:32   ` Polina Vishneva
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2026-05-11 15:56 UTC
  To: Polina Vishneva
  Cc: linux-kernel, netdev, virtualization, kvm, Eugenio Pérez,
	Jason Wang, Michael S. Tsirkin, Stefan Hajnoczi, Denis V . Lunev

On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
>From: "Denis V. Lunev" <den@openvz.org>
>
>When the host initiates an AF_VSOCK connect() to a guest that has not
>yet loaded the virtio-vsock transport (i.e. still booting), the caller
>blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
>vhost_transport_do_send_pkt() silently exits when
>vhost_vq_get_backend(vq) returns NULL.

Can SO_VM_SOCKETS_CONNECT_TIMEOUT help with this?

>
>If the guest doesn't start listening within this timeout, connect()
>returns ETIMEDOUT.
>
>This delay is usually pointless and it doesn't well align with our
>behavior at other initialization stages: for example, if a connection is
>attempted when the guest driver is already loaded, but when nothing is
>listening yet, it returns ECONNRESET immediately without any wait.
>
>Fix this by checking the RX virtqueue backend in
>vhost_transport_send_pkt() before queuing. If the backend is NULL,
>return -ECONNREFUSED immediately.
>
>Signed-off-by: Denis V. Lunev <den@openvz.org>
>Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
>Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
>---
> drivers/vhost/vsock.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 1d8ec6bed53e..a3f218292c3a 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> 		return -ENODEV;
> 	}
>
>+	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
>+	 * private_data without vq->mutex is deliberate: even if the backend becomes
>+	 * NULL right after that check, do_send_pkt() checks it under the mutex.
>+	 */
>+	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data))) 

Why not use vhost_vq_get_backend()?

Also, is READ_ONCE() okay without a matching WRITE_ONCE() where the field is set?

>{
>+		rcu_read_unlock();
>+		kfree_skb(skb);
>+		return -ECONNREFUSED;

This is a generic send_pkt; is it okay to return ECONNREFUSED in every
case?

Thanks,
Stefano

>+	}
>+
> 	if (virtio_vsock_skb_reply(skb))
> 		atomic_inc(&vsock->queued_replies);
>
>
>base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
>-- 
>2.53.0
>



* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-11 15:56 ` Stefano Garzarella
@ 2026-05-12 14:32   ` Polina Vishneva
  2026-05-12 15:39     ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Polina Vishneva @ 2026-05-12 14:32 UTC
  To: sgarzare@redhat.com
  Cc: den@openvz.org, virtualization@lists.linux.dev,
	stefanha@redhat.com, eperezma@redhat.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	mst@redhat.com, kvm@vger.kernel.org, jasowang@redhat.com

On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
> On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
> > From: "Denis V. Lunev" <den@openvz.org>
> > 
> > When the host initiates an AF_VSOCK connect() to a guest that has not
> > yet loaded the virtio-vsock transport (i.e. still booting), the caller
> > blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> > vhost_transport_do_send_pkt() silently exits when
> > vhost_vq_get_backend(vq) returns NULL.
> 
> Can SO_VM_SOCKETS_CONNECT_TIMEOUT helps on this?

It can, but it might be difficult to find a correct timeout.

And, generally, there's no way to distinguish "the guest hasn't yet initialized
the vq" from "the guest is up and running, but didn't reply to connect() in
time". That's exactly what this patch is attempting to fix.
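
For reference, the per-socket connect timeout mentioned above is set with
setsockopt() at level AF_VSOCK. The sketch below is only an illustration: the
helper name set_vsock_connect_timeout() is made up for this example, and it
assumes a Linux system where <linux/vm_sockets.h> is available. Note that the
option only changes how long connect() waits; it does not tell the caller why
no reply arrived.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: set the AF_VSOCK connect() timeout on fd.
 * Returns 0 on success or a negative errno value on failure. */
int set_vsock_connect_timeout(int fd, long sec, long usec)
{
	struct timeval tv = { .tv_sec = sec, .tv_usec = usec };

	/* The option lives at level AF_VSOCK, not SOL_SOCKET. */
	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_CONNECT_TIMEOUT,
		       &tv, sizeof(tv)) < 0)
		return -errno;

	return 0;
}
```

A caller would create the socket with socket(AF_VSOCK, SOCK_STREAM, 0), call
the helper, and then call connect() as usual.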

>
> > 
> > If the guest doesn't start listening within this timeout, connect()
> > returns ETIMEDOUT.
> > 
> > This delay is usually pointless and it doesn't well align with our
> > behavior at other initialization stages: for example, if a connection is
> > attempted when the guest driver is already loaded, but when nothing is
> > listening yet, it returns ECONNRESET immediately without any wait.
> > 
> > Fix this by checking the RX virtqueue backend in
> > vhost_transport_send_pkt() before queuing. If the backend is NULL,
> > return -ECONNREFUSED immediately.
> > 
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > ---
> > drivers/vhost/vsock.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index 1d8ec6bed53e..a3f218292c3a 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> > 		return -ENODEV;
> > 	}
> > 
> > +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
> > +	 * private_data without vq->mutex is deliberate: even if the backend becomes
> > +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
> > +	 */
> > +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data))) 
> 
> Why not using vhost_vq_get_backend() ?

Because it locks the mutex, which is slow and unacceptable in this hot path.

>
> Also is READ_ONCE() okay without WRITE_ONCE() where it is set ?

It's racy, but as described here in the comment and in the commit message,
any possible race outcome is covered by the subsequent checks.
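
The pattern being described (an unlocked hint on the enqueue side, backed by
an authoritative check under the lock on the worker side) can be sketched in
user space roughly as follows. Everything here is a toy model: struct toy_vq,
the toy_* functions and TOY_READ_ONCE() are invented for this illustration and
are not the vhost code, and the errno returned by the fast path is just one of
the candidates discussed in this thread.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* User-space approximation of the kernel's READ_ONCE(): a single load through
 * a volatile-qualified lvalue (uses the GCC/Clang __typeof__ extension). */
#define TOY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Toy stand-in for a vhost virtqueue; not the kernel structure. */
struct toy_vq {
	atomic_flag lock;	/* plays the role of vq->mutex */
	void *private_data;	/* backend; NULL until the guest enables the vq */
};

void toy_lock(struct toy_vq *vq)
{
	while (atomic_flag_test_and_set(&vq->lock))
		;	/* spin; a real mutex would sleep */
}

void toy_unlock(struct toy_vq *vq)
{
	atomic_flag_clear(&vq->lock);
}

/* Enqueue side: cheap, unlocked hint. A stale answer is tolerated. */
int toy_send_pkt(struct toy_vq *vq)
{
	if (!TOY_READ_ONCE(vq->private_data))
		return -EHOSTUNREACH;	/* assumed errno, still under discussion */
	return 0;			/* the packet would be queued here */
}

/* Worker side: authoritative check with the lock held.
 * Returns 1 if the packet could be delivered, 0 if it is silently skipped. */
int toy_do_send_pkt(struct toy_vq *vq)
{
	int delivered;

	toy_lock(vq);
	delivered = vq->private_data != NULL;
	toy_unlock(vq);
	return delivered;
}

/* The backend is NULL from the start: the fast path refuses immediately. */
int toy_not_ready_demo(void)
{
	struct toy_vq vq = { ATOMIC_FLAG_INIT, NULL };

	return toy_send_pkt(&vq);
}

/* The backend disappears between the unlocked check and the worker run. */
int toy_race_demo(void)
{
	int backend = 0;
	struct toy_vq vq = { ATOMIC_FLAG_INIT, &backend };
	int queued = toy_send_pkt(&vq) == 0;	/* fast path sees a backend */

	vq.private_data = NULL;			/* simulated teardown */
	return queued && !toy_do_send_pkt(&vq);	/* 1 if the worker caught it */
}
```

In this model the unlocked read can only produce a stale hint; the decision
that matters is still taken with the lock held.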

> > {
> > +		rcu_read_unlock();
> > +		kfree_skb(skb);
> > +		return -ECONNREFUSED;
> 
> This is a generic send_pkt, is it okay to return ECONNREFUSED in any 
> case?

EHOSTUNREACH would probably be better.
All the current send_pkt functions return only ENODEV, but that has different
semantics: it means that the local device isn't ready yet, while here we're
dealing with the opposite end not being ready.
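
To make the distinction concrete, here is a rough sketch of how a host
application might interpret the errno values mentioned in this thread. The
enum and classify_connect_errno() are hypothetical names, and the EHOSTUNREACH
case assumes the errno suggested above ends up being used, which is not
settled.

```c
#include <errno.h>

/* Hypothetical interpretation of connect() failures on AF_VSOCK. */
enum vsock_connect_hint {
	HINT_LOCAL_NOT_READY,	/* ENODEV: local device/transport not ready */
	HINT_GUEST_NOT_READY,	/* EHOSTUNREACH: guest vq not enabled (proposed) */
	HINT_NO_LISTENER,	/* ECONNRESET: guest driver up, nobody listening */
	HINT_AMBIGUOUS,		/* ETIMEDOUT: cannot tell why no reply came */
	HINT_OTHER,		/* anything else */
};

/* err is a positive errno value as found in errno after connect() fails. */
enum vsock_connect_hint classify_connect_errno(int err)
{
	switch (err) {
	case ENODEV:
		return HINT_LOCAL_NOT_READY;
	case EHOSTUNREACH:
		return HINT_GUEST_NOT_READY;
	case ECONNRESET:
		return HINT_NO_LISTENER;
	case ETIMEDOUT:
		return HINT_AMBIGUOUS;
	default:
		return HINT_OTHER;
	}
}
```

With a dedicated errno for the "guest not ready" case, a caller could back off
and retry on HINT_GUEST_NOT_READY without waiting out a full timeout first.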

Best regards, Polina.

>
> Thanks,
> Stefano
> 
> > +	}
> > +
> > 	if (virtio_vsock_skb_reply(skb))
> > 		atomic_inc(&vsock->queued_replies);
> > 
> > 
> > base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
> > -- 
> > 2.53.0
> > 


* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-12 14:32   ` Polina Vishneva
@ 2026-05-12 15:39     ` Stefano Garzarella
  2026-05-12 16:02       ` Michael S. Tsirkin
  2026-05-13 11:18       ` Polina Vishneva
  0 siblings, 2 replies; 9+ messages in thread
From: Stefano Garzarella @ 2026-05-12 15:39 UTC
  To: Polina Vishneva
  Cc: den@openvz.org, virtualization@lists.linux.dev,
	stefanha@redhat.com, eperezma@redhat.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	mst@redhat.com, kvm@vger.kernel.org, jasowang@redhat.com

On Tue, May 12, 2026 at 02:32:14PM +0000, Polina Vishneva wrote:
>On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
>> On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
>> > From: "Denis V. Lunev" <den@openvz.org>
>> >
>> > When the host initiates an AF_VSOCK connect() to a guest that has not
>> > yet loaded the virtio-vsock transport (i.e. still booting), the caller
>> > blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
>> > vhost_transport_do_send_pkt() silently exits when
>> > vhost_vq_get_backend(vq) returns NULL.
>>
>> Can SO_VM_SOCKETS_CONNECT_TIMEOUT helps on this?
>
>It can, but it might be difficult to find a correct timeout.
>
>And, generally, there's no way to distinguish "the guest hasn't yet initialized
>the vq" from "the guest is up and running, but didn't reply to connect() in
>time". That's exactly what this patch is attempting to fix.

Okay, so please mention this in the commit message, I mean why 
SO_VM_SOCKETS_CONNECT_TIMEOUT can't really help.

>
>>
>> >
>> > If the guest doesn't start listening within this timeout, connect()
>> > returns ETIMEDOUT.
>> >
>> > This delay is usually pointless and it doesn't well align with our

I still don't understand why this is pointless. If an application wants 
to wait while sleeping, it can simply increase the timeout long enough 
to wait for the VM to start up and use a single `connect()` call, 
instead of continuing to try and wasting CPU cycles unnecessarily.

Hmm, or maybe not, because the driver will definitely be initialized 
before the application that wants to listen on that port, so it will 
respond that no one is listening, and the `connect()` call will fail 
with an `ECONNRESET` error in any case. Right?

If that is the case, is the following line in the commit description 
correct?

     If the guest doesn't start listening within this timeout, connect()
     returns ETIMEDOUT.

I mean, even if the application starts listening within the timeout, I 
think connect() will fail anyway, as I pointed out above (this 
should be another point in favour of this change).


BTW, I think we should explain this more clearly both here and briefly 
in the code as well.

>> > behavior at other initialization stages: for example, if a connection is
>> > attempted when the guest driver is already loaded, but when nothing is
>> > listening yet, it returns ECONNRESET immediately without any wait.
>> >
>> > Fix this by checking the RX virtqueue backend in
>> > vhost_transport_send_pkt() before queuing. If the backend is NULL,
>> > return -ECONNREFUSED immediately.
>> >
>> > Signed-off-by: Denis V. Lunev <den@openvz.org>
>> > Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
>> > Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
>> > ---
>> > drivers/vhost/vsock.c | 10 ++++++++++
>> > 1 file changed, 10 insertions(+)
>> >
>> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > index 1d8ec6bed53e..a3f218292c3a 100644
>> > --- a/drivers/vhost/vsock.c
>> > +++ b/drivers/vhost/vsock.c
>> > @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>> > 		return -ENODEV;
>> > 	}
>> >
>> > +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
>> > +	 * private_data without vq->mutex is deliberate: even if the backend becomes
>> > +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
>> > +	 */
>> > +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)))
>>
>> Why not using vhost_vq_get_backend() ?
>
>Because it locks the mutex, which is slow and unacceptable in this hot 
>path.

ehm, sorry, which mutex are you talking about?

I see just a comment about the mutex to be acquired by the caller, but I 
don't see any lock there.

>
>>
>> Also is READ_ONCE() okay without WRITE_ONCE() where it is set ?
>
>It's racy, but as described here in the comment and in the commit message,
>any possible race outcome is covered by the subsequent checks.

Okay, so what is the point of calling READ_ONCE()?

>
>> > {
>> > +		rcu_read_unlock();
>> > +		kfree_skb(skb);
>> > +		return -ECONNREFUSED;
>>
>> This is a generic send_pkt, is it okay to return ECONNREFUSED in any
>> case?
>
>EHOSTUNREACH would probably be better.
>All the current send_pkt functions only return ENODEV, but it has different
>semantics: they mean that the local device isn't yet ready, while there we're
>dealing with the opposite end not being ready.

From the AF_VSOCK perspective, I read ENODEV as "the transport is not 
ready", so I think it could fit here too, but EHOSTUNREACH is also fine, 
and certainly better than ECONNREFUSED.

Thanks,
Stefano

>
>Best regards, Polina.
>
>>
>> Thanks,
>> Stefano
>>
>> > +	}
>> > +
>> > 	if (virtio_vsock_skb_reply(skb))
>> > 		atomic_inc(&vsock->queued_replies);
>> >
>> >
>> > base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
>> > --
>> > 2.53.0
>> >



* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-12 15:39     ` Stefano Garzarella
@ 2026-05-12 16:02       ` Michael S. Tsirkin
  2026-05-13  9:44         ` Polina Vishneva
  2026-05-13 11:18       ` Polina Vishneva
  1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2026-05-12 16:02 UTC
  To: Stefano Garzarella
  Cc: Polina Vishneva, den@openvz.org, virtualization@lists.linux.dev,
	stefanha@redhat.com, eperezma@redhat.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	kvm@vger.kernel.org, jasowang@redhat.com

On Tue, May 12, 2026 at 05:39:48PM +0200, Stefano Garzarella wrote:
> On Tue, May 12, 2026 at 02:32:14PM +0000, Polina Vishneva wrote:
> > On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
> > > On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
> > > > From: "Denis V. Lunev" <den@openvz.org>
> > > >
> > > > When the host initiates an AF_VSOCK connect() to a guest that has not
> > > > yet loaded the virtio-vsock transport (i.e. still booting), the caller
> > > > blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> > > > vhost_transport_do_send_pkt() silently exits when
> > > > vhost_vq_get_backend(vq) returns NULL.
> > > 
> > > Can SO_VM_SOCKETS_CONNECT_TIMEOUT helps on this?
> > 
> > It can, but it might be difficult to find a correct timeout.
> > 
> > And, generally, there's no way to distinguish "the guest hasn't yet initialized
> > the vq" from "the guest is up and running, but didn't reply to connect() in
> > time". That's exactly what this patch is attempting to fix.
> 
> Okay, so please mention this in the commit message, I mean why
> SO_VM_SOCKETS_CONNECT_TIMEOUT can't really help.
> 
> > 
> > > 
> > > >
> > > > If the guest doesn't start listening within this timeout, connect()
> > > > returns ETIMEDOUT.
> > > >
> > > > This delay is usually pointless and it doesn't well align with our
> 
> I still don't understand why this is pointless. If an application wants to
> wait while sleeping, it can simply increase the timeout long enough to wait
> for the VM to start up and use a single `connect()` call, instead of
> continuing to try and wasting CPU cycles unnecessarily.
> 
> Hmm, or maybe not, because the driver will definitely be initialized before
> the application that wants to listen on that port, so it will respond that
> no one is listening, and the `connect()` call will fail with an `ECONNRESET`
> error in any case. Right?
> 
> If it is the case, is the following line in the commit description correct?
> 
>     If the guest doesn't start listening within this timeout, connect()
>     returns ETIMEDOUT.
> 
> I mean, also if the application starts to listen within the timeout, I think
> the connect() will fail in any case as I pointed out above (this should be
> another point in favour of this change)
> 
> 
> BTW, I think we should explain this more clearly both here and briefly in
> the code as well.
> 
> > > > behavior at other initialization stages: for example, if a connection is
> > > > attempted when the guest driver is already loaded, but when nothing is
> > > > listening yet, it returns ECONNRESET immediately without any wait.
> > > >
> > > > Fix this by checking the RX virtqueue backend in
> > > > vhost_transport_send_pkt() before queuing. If the backend is NULL,
> > > > return -ECONNREFUSED immediately.
> > > >
> > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > ---
> > > > drivers/vhost/vsock.c | 10 ++++++++++
> > > > 1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index 1d8ec6bed53e..a3f218292c3a 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> > > > 		return -ENODEV;
> > > > 	}
> > > >
> > > > +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
> > > > +	 * private_data without vq->mutex is deliberate: even if the backend becomes
> > > > +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
> > > > +	 */
> > > > +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)))
> > > 
> > > Why not using vhost_vq_get_backend() ?
> > 
> > Because it locks the mutex, which is slow and unacceptable in this hot
> > path.
> 
> ehm, sorry, which mutex are you talking about?
> 
> I see just a comment about the mutex to be acquired by the caller, but I
> don't see any lock there.
> 
> > 
> > > 
> > > Also is READ_ONCE() okay without WRITE_ONCE() where it is set ?
> > 
> > It's racy, but as described here in the comment and in the commit message,
> > any possible race outcome is covered by the subsequent checks.
> 
> Okay, so what is the point to call READ_ONCE()?
> 
> > 
> > > > {
> > > > +		rcu_read_unlock();
> > > > +		kfree_skb(skb);
> > > > +		return -ECONNREFUSED;
> > > 
> > > This is a generic send_pkt, is it okay to return ECONNREFUSED in any
> > > case?
> > 
> > EHOSTUNREACH would probably be better.
> > All the current send_pkt functions only return ENODEV, but it has different
> > semantics: they mean that the local device isn't yet ready, while there we're
> > dealing with the opposite end not being ready.
> 
> In the AF_VSOCK perspective, I see ENODEV like the transport is not ready,
> so I think it can eventually fit here too, but also EHOSTUNREACH is fine,
> for sure better than ECONNREFUSED.
> 
> Thanks,
> Stefano

I think it's worth trying to do the same thing with e.g. TCP
and see what error, if any, we get. Match that.


> > 
> > Best regards, Polina.
> > 
> > > 
> > > Thanks,
> > > Stefano
> > > 
> > > > +	}
> > > > +
> > > > 	if (virtio_vsock_skb_reply(skb))
> > > > 		atomic_inc(&vsock->queued_replies);
> > > >
> > > >
> > > > base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
> > > > --
> > > > 2.53.0
> > > >



* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-12 16:02       ` Michael S. Tsirkin
@ 2026-05-13  9:44         ` Polina Vishneva
  2026-05-13 10:03           ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Polina Vishneva @ 2026-05-13  9:44 UTC
  To: sgarzare@redhat.com, mst@redhat.com
  Cc: den@openvz.org, virtualization@lists.linux.dev,
	stefanha@redhat.com, eperezma@redhat.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	kvm@vger.kernel.org, jasowang@redhat.com

On Tue, 2026-05-12 at 12:02 -0400, Michael S. Tsirkin wrote:
> On Tue, May 12, 2026 at 05:39:48PM +0200, Stefano Garzarella wrote:
> > On Tue, May 12, 2026 at 02:32:14PM +0000, Polina Vishneva wrote:
> > > On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
> > > > On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
> > > > > From: "Denis V. Lunev" <den@openvz.org>
> > > > > 
> > > > > When the host initiates an AF_VSOCK connect() to a guest that has not
> > > > > yet loaded the virtio-vsock transport (i.e. still booting), the caller
> > > > > blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> > > > > vhost_transport_do_send_pkt() silently exits when
> > > > > vhost_vq_get_backend(vq) returns NULL.
> > > > 
> > > > Can SO_VM_SOCKETS_CONNECT_TIMEOUT helps on this?
> > > 
> > > It can, but it might be difficult to find a correct timeout.
> > > 
> > > And, generally, there's no way to distinguish "the guest hasn't yet initialized
> > > the vq" from "the guest is up and running, but didn't reply to connect() in
> > > time". That's exactly what this patch is attempting to fix.
> > 
> > Okay, so please mention this in the commit message, I mean why
> > SO_VM_SOCKETS_CONNECT_TIMEOUT can't really help.
> > 
> > > 
> > > > 
> > > > > 
> > > > > If the guest doesn't start listening within this timeout, connect()
> > > > > returns ETIMEDOUT.
> > > > > 
> > > > > This delay is usually pointless and it doesn't well align with our
> > 
> > I still don't understand why this is pointless. If an application wants to
> > wait while sleeping, it can simply increase the timeout long enough to wait
> > for the VM to start up and use a single `connect()` call, instead of
> > continuing to try and wasting CPU cycles unnecessarily.
> > 
> > Hmm, or maybe not, because the driver will definitely be initialized before
> > the application that wants to listen on that port, so it will respond that
> > no one is listening, and the `connect()` call will fail with an `ECONNRESET`
> > error in any case. Right?
> > 
> > If it is the case, is the following line in the commit description correct?
> > 
> >     If the guest doesn't start listening within this timeout, connect()
> >     returns ETIMEDOUT.
> > 
> > I mean, also if the application starts to listen within the timeout, I think
> > the connect() will fail in any case as I pointed out above (this should be
> > another point in favour of this change)
> > 
> > 
> > BTW, I think we should explain this more clearly both here and briefly in
> > the code as well.
> > 
> > > > > behavior at other initialization stages: for example, if a connection is
> > > > > attempted when the guest driver is already loaded, but when nothing is
> > > > > listening yet, it returns ECONNRESET immediately without any wait.
> > > > > 
> > > > > Fix this by checking the RX virtqueue backend in
> > > > > vhost_transport_send_pkt() before queuing. If the backend is NULL,
> > > > > return -ECONNREFUSED immediately.
> > > > > 
> > > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > > Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > > Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > > ---
> > > > > drivers/vhost/vsock.c | 10 ++++++++++
> > > > > 1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > index 1d8ec6bed53e..a3f218292c3a 100644
> > > > > --- a/drivers/vhost/vsock.c
> > > > > +++ b/drivers/vhost/vsock.c
> > > > > @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> > > > > 		return -ENODEV;
> > > > > 	}
> > > > > 
> > > > > +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
> > > > > +	 * private_data without vq->mutex is deliberate: even if the backend becomes
> > > > > +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
> > > > > +	 */
> > > > > +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)))
> > > > 
> > > > Why not using vhost_vq_get_backend() ?
> > > 
> > > Because it locks the mutex, which is slow and unacceptable in this hot
> > > path.
> > 
> > ehm, sorry, which mutex are you talking about?
> > 
> > I see just a comment about the mutex to be acquired by the caller, but I
> > don't see any lock there.
> > 
> > > 
> > > > 
> > > > Also is READ_ONCE() okay without WRITE_ONCE() where it is set ?
> > > 
> > > It's racy, but as described here in the comment and in the commit message,
> > > any possible race outcome is covered by the subsequent checks.
> > 
> > Okay, so what is the point to call READ_ONCE()?
> > 
> > > 
> > > > > {
> > > > > +		rcu_read_unlock();
> > > > > +		kfree_skb(skb);
> > > > > +		return -ECONNREFUSED;
> > > > 
> > > > This is a generic send_pkt, is it okay to return ECONNREFUSED in any
> > > > case?
> > > 
> > > EHOSTUNREACH would probably be better.
> > > All the current send_pkt functions only return ENODEV, but it has different
> > > semantics: they mean that the local device isn't yet ready, while there we're
> > > dealing with the opposite end not being ready.
> > 
> > In the AF_VSOCK perspective, I see ENODEV like the transport is not ready,
> > so I think it can eventually fit here too, but also EHOSTUNREACH is fine,
> > for sure better than ECONNREFUSED.
> > 
> > Thanks,
> > Stefano
> 
> I think it's worth trying to do the same thing with e.g. TCP
> and see what error, if any, we get. Match that.

This case is not directly applicable to TCP. In TCP, there's no out-of-band way
to detect the "host up, but not initialized yet and not ready for connections"
state. It could theoretically be reported as ENOPROTOOPT, but no real TCP stack
implements this, because replying with ICMP_PROT_UNREACH requires a TCP stack,
which is exactly the thing that isn't up.

So, in the real world, a similar situation with TCP would result in ETIMEDOUT.

> 
> 
> > > 
> > > Best regards, Polina.
> > > 
> > > > 
> > > > Thanks,
> > > > Stefano
> > > > 
> > > > > +	}
> > > > > +
> > > > > 	if (virtio_vsock_skb_reply(skb))
> > > > > 		atomic_inc(&vsock->queued_replies);
> > > > > 
> > > > > 
> > > > > base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
> > > > > --
> > > > > 2.53.0
> > > > > 


* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-13  9:44         ` Polina Vishneva
@ 2026-05-13 10:03           ` Michael S. Tsirkin
  2026-05-13 10:34             ` Denis V. Lunev
  0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2026-05-13 10:03 UTC
  To: Polina Vishneva
  Cc: sgarzare@redhat.com, den@openvz.org,
	virtualization@lists.linux.dev, stefanha@redhat.com,
	eperezma@redhat.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, kvm@vger.kernel.org, jasowang@redhat.com

On Wed, May 13, 2026 at 09:44:49AM +0000, Polina Vishneva wrote:
> On Tue, 2026-05-12 at 12:02 -0400, Michael S. Tsirkin wrote:
> > On Tue, May 12, 2026 at 05:39:48PM +0200, Stefano Garzarella wrote:
> > > On Tue, May 12, 2026 at 02:32:14PM +0000, Polina Vishneva wrote:
> > > > On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
> > > > > On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
> > > > > > From: "Denis V. Lunev" <den@openvz.org>
> > > > > > 
> > > > > > When the host initiates an AF_VSOCK connect() to a guest that has not
> > > > > > yet loaded the virtio-vsock transport (i.e. still booting), the caller
> > > > > > blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> > > > > > vhost_transport_do_send_pkt() silently exits when
> > > > > > vhost_vq_get_backend(vq) returns NULL.
> > > > > 
> > > > > Can SO_VM_SOCKETS_CONNECT_TIMEOUT helps on this?
> > > > 
> > > > It can, but it might be difficult to find a correct timeout.
> > > > 
> > > > And, generally, there's no way to distinguish "the guest hasn't yet initialized
> > > > the vq" from "the guest is up and running, but didn't reply to connect() in
> > > > time". That's exactly what this patch is attempting to fix.
> > > 
> > > Okay, so please mention this in the commit message, I mean why
> > > SO_VM_SOCKETS_CONNECT_TIMEOUT can't really help.
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > If the guest doesn't start listening within this timeout, connect()
> > > > > > returns ETIMEDOUT.
> > > > > > 
> > > > > > This delay is usually pointless and it doesn't well align with our
> > > 
> > > I still don't understand why this is pointless. If an application wants to
> > > wait while sleeping, it can simply increase the timeout long enough to wait
> > > for the VM to start up and use a single `connect()` call, instead of
> > > continuing to try and wasting CPU cycles unnecessarily.
> > > 
> > > Hmm, or maybe not, because the driver will definitely be initialized before
> > > the application that wants to listen on that port, so it will respond that
> > > no one is listening, and the `connect()` call will fail with an `ECONNRESET`
> > > error in any case. Right?
> > > 
> > > If it is the case, is the following line in the commit description correct?
> > > 
> > >     If the guest doesn't start listening within this timeout, connect()
> > >     returns ETIMEDOUT.
> > > 
> > > I mean, also if the application starts to listen within the timeout, I think
> > > the connect() will fail in any case as I pointed out above (this should be
> > > another point in favour of this change)
> > > 
> > > 
> > > BTW, I think we should explain this more clearly both here and briefly in
> > > the code as well.
> > > 
> > > > > > behavior at other initialization stages: for example, if a connection is
> > > > > > attempted when the guest driver is already loaded, but when nothing is
> > > > > > listening yet, it returns ECONNRESET immediately without any wait.
> > > > > > 
> > > > > > Fix this by checking the RX virtqueue backend in
> > > > > > vhost_transport_send_pkt() before queuing. If the backend is NULL,
> > > > > > return -ECONNREFUSED immediately.
> > > > > > 
> > > > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > > > Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > > > Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > > > ---
> > > > > > drivers/vhost/vsock.c | 10 ++++++++++
> > > > > > 1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > > index 1d8ec6bed53e..a3f218292c3a 100644
> > > > > > --- a/drivers/vhost/vsock.c
> > > > > > +++ b/drivers/vhost/vsock.c
> > > > > > @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> > > > > > 		return -ENODEV;
> > > > > > 	}
> > > > > > 
> > > > > > +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
> > > > > > +	 * private_data without vq->mutex is deliberate: even if the backend becomes
> > > > > > +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
> > > > > > +	 */
> > > > > > +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)))
> > > > > 
> > > > > Why not using vhost_vq_get_backend() ?
> > > > 
> > > > Because it locks the mutex, which is slow and unacceptable in this hot
> > > > path.
> > > 
> > > ehm, sorry, which mutex are you talking about?
> > > 
> > > I see just a comment about the mutex to be acquired by the caller, but I
> > > don't see any lock there.
> > > 
> > > > 
> > > > > 
> > > > > Also is READ_ONCE() okay without WRITE_ONCE() where it is set ?
> > > > 
> > > > It's racy, but as described here in the comment and in the commit message,
> > > > any possible race outcome is covered by the subsequent checks.
> > > 
> > > Okay, so what is the point to call READ_ONCE()?
> > > 
> > > > 
> > > > > > {
> > > > > > +		rcu_read_unlock();
> > > > > > +		kfree_skb(skb);
> > > > > > +		return -ECONNREFUSED;
> > > > > 
> > > > > This is a generic send_pkt, is it okay to return ECONNREFUSED in any
> > > > > case?
> > > > 
> > > > EHOSTUNREACH would probably be better.
> > > > All the current send_pkt functions only return ENODEV, but it has different
> > > > semantics: they mean that the local device isn't yet ready, while there we're
> > > > dealing with the opposite end not being ready.
> > > 
> > > In the AF_VSOCK perspective, I see ENODEV like the transport is not ready,
> > > so I think it can eventually fit here too, but also EHOSTUNREACH is fine,
> > > for sure better than ECONNREFUSED.
> > > 
> > > Thanks,
> > > Stefano
> > 
> > I think it's worth trying to do the same thing with e.g. TCP
> > and see what error, if any, we get. Match that.
> 
> This case is not directly applicable to TCP: in TCP, there's no out-of-band way
> to detect the "host up, but not initialized yet and not ready for connections"
> state: this could theoretically be ENOPROTOOPT, but no real TCP stack implements
> this, because replying with ICMP_PROT_UNREACH requires a TCP stack, which is
> exactly the thing that isn't up.
> 
> So, in real world, a similar situation with TCP would result in ETIMEDOUT.

Then it just might be best to keep the current behaviour which seems
to match that pretty closely?


> > 
> > 
> > > > 
> > > > Best regards, Polina.
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Stefano
> > > > > 
> > > > > > +	}
> > > > > > +
> > > > > > 	if (virtio_vsock_skb_reply(skb))
> > > > > > 		atomic_inc(&vsock->queued_replies);
> > > > > > 
> > > > > > 
> > > > > > base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
> > > > > > --
> > > > > > 2.53.0
> > > > > > 



* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-13 10:03           ` Michael S. Tsirkin
@ 2026-05-13 10:34             ` Denis V. Lunev
  0 siblings, 0 replies; 9+ messages in thread
From: Denis V. Lunev @ 2026-05-13 10:34 UTC (permalink / raw)
  To: Michael S. Tsirkin, Polina Vishneva
  Cc: sgarzare@redhat.com, den@openvz.org,
	virtualization@lists.linux.dev, stefanha@redhat.com,
	eperezma@redhat.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, kvm@vger.kernel.org, jasowang@redhat.com

On 5/13/26 12:03, Michael S. Tsirkin wrote:
> On Wed, May 13, 2026 at 09:44:49AM +0000, Polina Vishneva wrote:
>> On Tue, 2026-05-12 at 12:02 -0400, Michael S. Tsirkin wrote:
>>> On Tue, May 12, 2026 at 05:39:48PM +0200, Stefano Garzarella wrote:
>>>> On Tue, May 12, 2026 at 02:32:14PM +0000, Polina Vishneva wrote:
>>>>> On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
>>>>>> On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
>>>>>>> From: "Denis V. Lunev" <den@openvz.org>
>>>>>>>
>>>>>>> When the host initiates an AF_VSOCK connect() to a guest that has not
>>>>>>> yet loaded the virtio-vsock transport (i.e. still booting), the caller
>>>>>>> blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
>>>>>>> vhost_transport_do_send_pkt() silently exits when
>>>>>>> vhost_vq_get_backend(vq) returns NULL.
>>>>>> Can SO_VM_SOCKETS_CONNECT_TIMEOUT help with this?
>>>>> It can, but it might be difficult to find a correct timeout.
>>>>>
>>>>> And, generally, there's no way to distinguish "the guest hasn't yet initialized
>>>>> the vq" from "the guest is up and running, but didn't reply to connect() in
>>>>> time". That's exactly what this patch is attempting to fix.
>>>> Okay, so please mention this in the commit message, I mean why
>>>> SO_VM_SOCKETS_CONNECT_TIMEOUT can't really help.
>>>>
>>>>>>> If the guest doesn't start listening within this timeout, connect()
>>>>>>> returns ETIMEDOUT.
>>>>>>>
>>>>>>> This delay is usually pointless and it doesn't well align with our
>>>> I still don't understand why this is pointless. If an application wants to
>>>> wait while sleeping, it can simply increase the timeout long enough to wait
>>>> for the VM to start up and use a single `connect()` call, instead of
>>>> continuing to try and wasting CPU cycles unnecessarily.
>>>>
>>>> Hmm, or maybe not, because the driver will definitely be initialized before
>>>> the application that wants to listen on that port, so it will respond that
>>>> no one is listening, and the `connect()` call will fail with an `ECONNRESET`
>>>> error in any case. Right?
>>>>
>>>> If it is the case, is the following line in the commit description correct?
>>>>
>>>>     If the guest doesn't start listening within this timeout, connect()
>>>>     returns ETIMEDOUT.
>>>>
>>>> I mean, also if the application starts to listen within the timeout, I think
>>>> the connect() will fail in any case as I pointed out above (this should be
>>>> another point in favour of this change)
>>>>
>>>>
>>>> BTW, I think we should explain this more clearly both here and briefly in
>>>> the code as well.
>>>>
>>>>>>> behavior at other initialization stages: for example, if a connection is
>>>>>>> attempted when the guest driver is already loaded, but when nothing is
>>>>>>> listening yet, it returns ECONNRESET immediately without any wait.
>>>>>>>
>>>>>>> Fix this by checking the RX virtqueue backend in
>>>>>>> vhost_transport_send_pkt() before queuing. If the backend is NULL,
>>>>>>> return -ECONNREFUSED immediately.
>>>>>>>
>>>>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>>>>> Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
>>>>>>> Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
>>>>>>> ---
>>>>>>> drivers/vhost/vsock.c | 10 ++++++++++
>>>>>>> 1 file changed, 10 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>>>>>> index 1d8ec6bed53e..a3f218292c3a 100644
>>>>>>> --- a/drivers/vhost/vsock.c
>>>>>>> +++ b/drivers/vhost/vsock.c
>>>>>>> @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>>>>>>> 		return -ENODEV;
>>>>>>> 	}
>>>>>>>
>>>>>>> +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
>>>>>>> +	 * private_data without vq->mutex is deliberate: even if the backend becomes
>>>>>>> +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
>>>>>>> +	 */
>>>>>>> +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)))
>>>>>> Why not using vhost_vq_get_backend() ?
>>>>> Because it locks the mutex, which is slow and unacceptable in this hot
>>>>> path.
>>>> ehm, sorry, which mutex are you talking about?
>>>>
>>>> I see just a comment about the mutex to be acquired by the caller, but I
>>>> don't see any lock there.
>>>>
>>>>>> Also is READ_ONCE() okay without WRITE_ONCE() where it is set ?
>>>>> It's racy, but as described here in the comment and in the commit message,
>>>>> any possible race outcome is covered by the subsequent checks.
>>>> Okay, so what is the point to call READ_ONCE()?
>>>>
>>>>>>> {
>>>>>>> +		rcu_read_unlock();
>>>>>>> +		kfree_skb(skb);
>>>>>>> +		return -ECONNREFUSED;
>>>>>> This is a generic send_pkt, is it okay to return ECONNREFUSED in any
>>>>>> case?
>>>>> EHOSTUNREACH would probably be better.
>>>>> All the current send_pkt functions only return ENODEV, but it has different
>>>>> semantics: they mean that the local device isn't yet ready, while there we're
>>>>> dealing with the opposite end not being ready.
>>>> In the AF_VSOCK perspective, I see ENODEV like the transport is not ready,
>>>> so I think it can eventually fit here too, but also EHOSTUNREACH is fine,
>>>> for sure better than ECONNREFUSED.
>>>>
>>>> Thanks,
>>>> Stefano
>>> I think it's worth trying to do the same thing with e.g. TCP
>>> and see what error, if any, we get. Match that.
>> This case is not directly applicable to TCP: in TCP, there's no out-of-band way
>> to detect the "host up, but not initialized yet and not ready for connections"
>> state: this could theoretically be ENOPROTOOPT, but no real TCP stack implements
>> this, because replying with ICMP_PROT_UNREACH requires a TCP stack, which is
>> exactly the thing that isn't up.
>>
>> So, in real world, a similar situation with TCP would result in ETIMEDOUT.
> Then it just might be best to keep the current behaviour which seems
> to match that pretty closely?

My motivation here is very simple.

1. The guest has not configured the rings yet.
2. A connection request arrives and the caller starts waiting.
3. The guest configures the rings, so we assume initialization is done.
   Important: no process in the guest is listening on the socket yet.
   Even on Linux, starting the next service once its dependencies are
   up takes some time.
4. The connection request queued in the host is revived and gets
   -ECONNREFUSED, in my understanding. A packet arriving during this
   window would meet the same fate.
5. The guest listener starts.

In my opinion, packets arriving at step 2 should get the same
resolution as packets arriving at step 4, and without the timeout:
if they were served at step 3, that is the resolution they would get.
There is no need to wait.

Den


* Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
  2026-05-12 15:39     ` Stefano Garzarella
  2026-05-12 16:02       ` Michael S. Tsirkin
@ 2026-05-13 11:18       ` Polina Vishneva
  1 sibling, 0 replies; 9+ messages in thread
From: Polina Vishneva @ 2026-05-13 11:18 UTC (permalink / raw)
  To: sgarzare@redhat.com
  Cc: den@openvz.org, virtualization@lists.linux.dev,
	stefanha@redhat.com, eperezma@redhat.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	kvm@vger.kernel.org, mst@redhat.com, jasowang@redhat.com

On Tue, 2026-05-12 at 17:39 +0200, Stefano Garzarella wrote:
> On Tue, May 12, 2026 at 02:32:14PM +0000, Polina Vishneva wrote:
> > On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
> > > On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
> > > > From: "Denis V. Lunev" <den@openvz.org>
> > > > 
> > > > When the host initiates an AF_VSOCK connect() to a guest that has not
> > > > yet loaded the virtio-vsock transport (i.e. still booting), the caller
> > > > blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> > > > vhost_transport_do_send_pkt() silently exits when
> > > > vhost_vq_get_backend(vq) returns NULL.
> > > 
> > > Can SO_VM_SOCKETS_CONNECT_TIMEOUT help with this?
> > 
> > It can, but it might be difficult to find a correct timeout.
> > 
> > And, generally, there's no way to distinguish "the guest hasn't yet initialized
> > the vq" from "the guest is up and running, but didn't reply to connect() in
> > time". That's exactly what this patch is attempting to fix.
> 
> Okay, so please mention this in the commit message, I mean why 
> SO_VM_SOCKETS_CONNECT_TIMEOUT can't really help.

Will do.
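
For reference, here is a minimal user-space sketch of what raising the
connect timeout looks like on the host side. It assumes a Linux host with
<linux/vm_sockets.h> available; the helper names ms_to_timeval() and
vsock_connect_with_timeout() are made up for illustration. It only
lengthens the wait and cannot tell the caller why the wait ended in
ETIMEDOUT, which is the limitation described above.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: convert milliseconds to the struct timeval that
 * SO_VM_SOCKETS_CONNECT_TIMEOUT expects.
 */
struct timeval ms_to_timeval(long ms)
{
	struct timeval tv;

	tv.tv_sec = ms / 1000;
	tv.tv_usec = (ms % 1000) * 1000;
	return tv;
}

/* Hypothetical helper: connect to (cid, port) with a custom timeout.
 * Returns a connected fd on success or a negative errno on failure.
 */
int vsock_connect_with_timeout(unsigned int cid, unsigned int port,
			       long timeout_ms)
{
	struct timeval tv = ms_to_timeval(timeout_ms);
	struct sockaddr_vm addr;
	int fd, err;

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	/* The option is set at the AF_VSOCK level, not SOL_SOCKET. */
	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_CONNECT_TIMEOUT,
		       &tv, sizeof(tv)) < 0)
		goto fail;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	return fd;

fail:
	err = errno;	/* save errno before close() can clobber it */
	close(fd);
	return -err;
}
```

A caller could use vsock_connect_with_timeout(guest_cid, 1234, 10000) to
wait up to 10 seconds, where guest_cid and the port are placeholders.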

> 
> > 
> > > 
> > > > 
> > > > If the guest doesn't start listening within this timeout, connect()
> > > > returns ETIMEDOUT.
> > > > 
> > > > This delay is usually pointless and it doesn't well align with our
> 
> I still don't understand why this is pointless. If an application wants 
> to wait while sleeping, it can simply increase the timeout long enough 
> to wait for the VM to start up and use a single `connect()` call, 
> instead of continuing to try and wasting CPU cycles unnecessarily.
> 
> Hmm, or maybe not, because the driver will definitely be initialized 
> before the application that wants to listen on that port, so it will 
> respond that no one is listening, and the `connect()` call will fail 
> with an `ECONNRESET` error in any case. Right?

That's the case indeed.
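
To make the reasoning concrete, here is a toy model of the outcomes as
argued in this thread. It is plain user-space C, not kernel code, and the
function name and parameters are hypothetical. It assumes the guest driver
always comes up no later than the listener, and it uses EHOSTUNREACH for
the fast-fail case, which is the direction this discussion is leaning (the
posted patch returned ECONNREFUSED).

```c
#include <errno.h>

/*
 * Model of the result of a host connect() issued at time 0.
 *
 * driver_ms:  when the guest driver sets up the RX vq (<= 0: already done)
 * listen_ms:  when a guest process starts listening (<= 0: already done),
 *             assumed to be >= driver_ms
 * timeout_ms: connect timeout on the host socket
 * fast_fail:  nonzero models the proposed change, zero the current code
 *
 * Returns 0 on success, otherwise a positive errno value.
 */
int model_connect(long driver_ms, long listen_ms, long timeout_ms,
		  int fast_fail)
{
	if (listen_ms <= 0)
		return 0;		/* listener is already up */

	if (driver_ms <= 0)
		return ECONNRESET;	/* driver up, nobody listening */

	/* The driver is not loaded when the request is sent. */
	if (fast_fail)
		return EHOSTUNREACH;	/* proposed: fail immediately */

	if (driver_ms >= timeout_ms)
		return ETIMEDOUT;	/* current: request sits queued */

	/* The queued request is delivered as soon as the driver loads. */
	return listen_ms <= driver_ms ? 0 : ECONNRESET;
}
```

With the current behaviour, model_connect(500, 1500, 2000, 0) yields
ECONNRESET even though the listener starts within the timeout, so the
wait buys nothing in that case.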

> 
> If it is the case, is the following line in the commit description 
> correct?
> 
>      If the guest doesn't start listening within this timeout, connect()
>      returns ETIMEDOUT.
> 
> I mean, also if the application starts to listen within the timeout, I 
> think the connect() will fail in any case as I pointed out above (this 
> should be another point in favour of this change)

Yes, the commit message should be updated, as well as the code comment.

> 
> 
> BTW, I think we should explain this more clearly both here and briefly 
> in the code as well.

Definitely.

> 
> > > > behavior at other initialization stages: for example, if a connection is
> > > > attempted when the guest driver is already loaded, but when nothing is
> > > > listening yet, it returns ECONNRESET immediately without any wait.
> > > > 
> > > > Fix this by checking the RX virtqueue backend in
> > > > vhost_transport_send_pkt() before queuing. If the backend is NULL,
> > > > return -ECONNREFUSED immediately.
> > > > 
> > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > ---
> > > > drivers/vhost/vsock.c | 10 ++++++++++
> > > > 1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index 1d8ec6bed53e..a3f218292c3a 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> > > > 		return -ENODEV;
> > > > 	}
> > > > 
> > > > +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
> > > > +	 * private_data without vq->mutex is deliberate: even if the backend becomes
> > > > +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
> > > > +	 */
> > > > +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)))
> > > 
> > > Why not using vhost_vq_get_backend() ?
> > 
> > Because it locks the mutex, which is slow and unacceptable in this hot 
> > path.
> 
> ehm, sorry, which mutex are you talking about?
> 
> I see just a comment about the mutex to be acquired by the caller, but I 
> don't see any lock there.

The comment in vhost.h says "Context: Need to call with vq->mutex
acquired.", but the helper itself takes no lock, so I guess we're safe to
ignore this and use it instead of accessing private_data manually. Thanks
for pointing this out.

> 
> > 
> > > 
> > > Also is READ_ONCE() okay without WRITE_ONCE() where it is set ?
> > 
> > It's racy, but as described here in the comment and in the commit message,
> > any possible race outcome is covered by the subsequent checks.
> 
> Okay, so what is the point to call READ_ONCE()?

Probably none, it was just there in the initial patch version, and I've decided
not to drop it when adding data_race(). Will drop.

> 
> > 
> > > > {
> > > > +		rcu_read_unlock();
> > > > +		kfree_skb(skb);
> > > > +		return -ECONNREFUSED;
> > > 
> > > This is a generic send_pkt, is it okay to return ECONNREFUSED in any
> > > case?
> > 
> > EHOSTUNREACH would probably be better.
> > All the current send_pkt functions only return ENODEV, but it has different
> > semantics: they mean that the local device isn't yet ready, while there we're
> > dealing with the opposite end not being ready.
> 
> In the AF_VSOCK perspective, I see ENODEV like the transport is not 
> ready, so I think it can eventually fit here too, but also EHOSTUNREACH 
> is fine, for sure better than ECONNREFUSED.

EHOSTUNREACH is indeed a better fit, agreed.
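
As a rough illustration of why a distinct error helps callers, here is a
sketch of how a host-side client might interpret connect() failures if the
next version returns EHOSTUNREACH. The enum and function names are
hypothetical, and the EHOSTUNREACH mapping is an assumption about the
revised patch, not something that exists today.

```c
#include <errno.h>

/* Hypothetical hints a host-side client could derive from connect(). */
enum connect_hint {
	HINT_CONNECTED,		/* connect() succeeded */
	HINT_GUEST_NOT_READY,	/* guest has not set up the RX vq yet */
	HINT_NO_LISTENER,	/* guest driver is up, nothing is listening */
	HINT_AMBIGUOUS,		/* timed out, the cause cannot be told apart */
	HINT_LOCAL_NOT_READY,	/* local device or transport is not ready */
	HINT_OTHER,		/* anything else */
};

/* Map a positive errno value (or 0 for success) to a hint. */
enum connect_hint classify_connect_errno(int err)
{
	switch (err) {
	case 0:
		return HINT_CONNECTED;
	case EHOSTUNREACH:
		/* Assumed result of the fast-fail path in a revised patch. */
		return HINT_GUEST_NOT_READY;
	case ECONNRESET:
		return HINT_NO_LISTENER;
	case ETIMEDOUT:
		/* Guest still booting, or guest up but slow to reply. */
		return HINT_AMBIGUOUS;
	case ENODEV:
		return HINT_LOCAL_NOT_READY;
	default:
		return HINT_OTHER;
	}
}
```

A client could then back off for longer on HINT_GUEST_NOT_READY than on
HINT_NO_LISTENER, instead of guessing from ETIMEDOUT alone.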

> 
> Thanks,
> Stefano
> 
> > 
> > Best regards, Polina.
> > 
> > > 
> > > Thanks,
> > > Stefano
> > > 
> > > > +	}
> > > > +
> > > > 	if (virtio_vsock_skb_reply(skb))
> > > > 		atomic_inc(&vsock->queued_replies);
> > > > 
> > > > 
> > > > base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
> > > > --
> > > > 2.53.0
> > > > 


end of thread, other threads:[~2026-05-13 11:18 UTC | newest]

Thread overview: 9+ messages
2026-05-11 14:56 [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready Polina Vishneva
2026-05-11 15:56 ` Stefano Garzarella
2026-05-12 14:32   ` Polina Vishneva
2026-05-12 15:39     ` Stefano Garzarella
2026-05-12 16:02       ` Michael S. Tsirkin
2026-05-13  9:44         ` Polina Vishneva
2026-05-13 10:03           ` Michael S. Tsirkin
2026-05-13 10:34             ` Denis V. Lunev
2026-05-13 11:18       ` Polina Vishneva
