Linux virtualization list
From: Polina Vishneva <polina.vishneva@virtuozzo.com>
To: "sgarzare@redhat.com" <sgarzare@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>
Cc: "den@openvz.org" <den@openvz.org>,
	"virtualization@lists.linux.dev" <virtualization@lists.linux.dev>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"eperezma@redhat.com" <eperezma@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"jasowang@redhat.com" <jasowang@redhat.com>
Subject: Re: [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready
Date: Wed, 13 May 2026 09:44:49 +0000
Message-ID: <8ae7e443034026eda016322d22da52700e432f09.camel@virtuozzo.com>
In-Reply-To: <20260512120019-mutt-send-email-mst@kernel.org>

On Tue, 2026-05-12 at 12:02 -0400, Michael S. Tsirkin wrote:
> On Tue, May 12, 2026 at 05:39:48PM +0200, Stefano Garzarella wrote:
> > On Tue, May 12, 2026 at 02:32:14PM +0000, Polina Vishneva wrote:
> > > On Mon, 2026-05-11 at 17:56 +0200, Stefano Garzarella wrote:
> > > > On Mon, May 11, 2026 at 04:56:10PM +0200, Polina Vishneva wrote:
> > > > > From: "Denis V. Lunev" <den@openvz.org>
> > > > > 
> > > > > When the host initiates an AF_VSOCK connect() to a guest that has not
> > > > > yet loaded the virtio-vsock transport (i.e. still booting), the caller
> > > > > blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
> > > > > vhost_transport_do_send_pkt() silently exits when
> > > > > vhost_vq_get_backend(vq) returns NULL.
> > > > 
> > > > Can SO_VM_SOCKETS_CONNECT_TIMEOUT help with this?
> > > 
> > > It can, but it might be difficult to find a correct timeout.
> > > 
> > > And, generally, there's no way to distinguish "the guest hasn't yet initialized
> > > the vq" from "the guest is up and running, but didn't reply to connect() in
> > > time". That's exactly what this patch is attempting to fix.
> > 
> > Okay, so please mention in the commit message why
> > SO_VM_SOCKETS_CONNECT_TIMEOUT can't really help.
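For reference, a minimal host-side sketch of the per-socket workaround being discussed, assuming a 64-bit Linux host with <linux/vm_sockets.h> installed. The helper names ms_to_timeval() and vsock_connect_timeout() are hypothetical; the option itself (SO_VM_SOCKETS_CONNECT_TIMEOUT at level AF_VSOCK, taking a struct timeval) is the existing one. It shortens the wait, but the caller still gets ETIMEDOUT both when the guest has not enabled its RX vq yet and when the guest is up but slow to answer, so it cannot tell the two apart.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Convert a millisecond timeout into the struct timeval that
 * SO_VM_SOCKETS_CONNECT_TIMEOUT expects (assumes a 64-bit host). */
static struct timeval ms_to_timeval(long ms)
{
	struct timeval tv;

	tv.tv_sec = ms / 1000;
	tv.tv_usec = (ms % 1000) * 1000;
	return tv;
}

/* Hypothetical helper: connect to (cid, port) over AF_VSOCK with a
 * per-socket connect timeout instead of the 2 s default.
 * Returns a connected fd, or -errno on failure. */
static int vsock_connect_timeout(unsigned int cid, unsigned int port, long ms)
{
	struct sockaddr_vm addr;
	struct timeval tv = ms_to_timeval(ms);
	int fd, err;

	fd = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno; /* e.g. -EAFNOSUPPORT without vsock support */

	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_CONNECT_TIMEOUT,
		       &tv, sizeof(tv)) < 0)
		goto fail;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;

	/* Blocks for at most `ms` if the peer never answers. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;
	return fd;

fail:
	err = errno;
	close(fd);
	return -err;
}
```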
> > 
> > > 
> > > > 
> > > > > 
> > > > > If the guest doesn't start listening within this timeout, connect()
> > > > > returns ETIMEDOUT.
> > > > > 
> > > > > This delay is usually pointless, and it doesn't align well with our
> > 
> > I still don't understand why this is pointless. If an application wants to
> > wait while sleeping, it can simply increase the timeout long enough to wait
> > for the VM to start up and use a single `connect()` call, instead of
> > continuing to try and wasting CPU cycles unnecessarily.
> > 
> > Hmm, or maybe not, because the driver will definitely be initialized before
> > the application that wants to listen on that port, so it will respond that
> > no one is listening, and the `connect()` call will fail with an `ECONNRESET`
> > error in any case. Right?
> > 
> > If it is the case, is the following line in the commit description correct?
> > 
> >     If the guest doesn't start listening within this timeout, connect()
> >     returns ETIMEDOUT.
> > 
> > I mean, even if the application starts listening within the timeout, I think
> > connect() will fail anyway, as I pointed out above (which is another point
> > in favour of this change).
> > 
> > 
> > BTW, I think we should explain this more clearly both here and briefly in
> > the code as well.
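To illustrate the trade-off Stefano describes, a minimal sketch of a sleep-between-attempts retry loop, assuming the caller wraps a single connect() in a callback that returns an fd or -errno. The names connect_attempt_fn, peer_not_ready() and connect_with_retry() are hypothetical. Treating EHOSTUNREACH as "guest not ready" assumes the fast-fail returns EHOSTUNREACH, as suggested in this thread; without the patch, that state only surfaces as ETIMEDOUT after the connect timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: one connect() attempt; returns an fd or -errno. */
typedef int (*connect_attempt_fn)(void *ctx);

/* Errors that mean "the peer is not ready yet" rather than a hard failure:
 *  ECONNRESET   - guest transport is up, but nothing listens on the port yet
 *  ETIMEDOUT    - no answer at all within the connect timeout
 *  EHOSTUNREACH - assumed fast-fail code for "guest RX vq not enabled yet"
 */
static bool peer_not_ready(int err)
{
	return err == ECONNRESET || err == ETIMEDOUT || err == EHOSTUNREACH;
}

/* Try up to max_attempts times, sleeping delay_ms between attempts.
 * Returns the fd on success, or the last -errno. */
static int connect_with_retry(connect_attempt_fn attempt, void *ctx,
			      int max_attempts, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int ret = -EINVAL;

	for (int i = 0; i < max_attempts; i++) {
		ret = attempt(ctx);
		if (ret >= 0 || !peer_not_ready(-ret))
			return ret; /* connected, or a non-retryable error */
		if (i + 1 < max_attempts)
			nanosleep(&delay, NULL); /* sleep instead of spinning */
	}
	return ret;
}
```

With the current behavior, every not-ready attempt costs the full connect timeout; with a fast-fail, the loop's own delay_ms sets the pace instead.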
> > 
> > > > > behavior at other initialization stages: for example, if a connection is
> > > > > attempted when the guest driver is already loaded, but when nothing is
> > > > > listening yet, it returns ECONNRESET immediately without any wait.
> > > > > 
> > > > > Fix this by checking the RX virtqueue backend in
> > > > > vhost_transport_send_pkt() before queuing. If the backend is NULL,
> > > > > return -ECONNREFUSED immediately.
> > > > > 
> > > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > > Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > > Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
> > > > > ---
> > > > > drivers/vhost/vsock.c | 10 ++++++++++
> > > > > 1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > index 1d8ec6bed53e..a3f218292c3a 100644
> > > > > --- a/drivers/vhost/vsock.c
> > > > > +++ b/drivers/vhost/vsock.c
> > > > > @@ -302,6 +302,16 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> > > > > 		return -ENODEV;
> > > > > 	}
> > > > > 
> > > > > +	/* Fast-fail if the guest hasn't enabled the RX vq yet. Reading
> > > > > +	 * private_data without vq->mutex is deliberate: even if the backend becomes
> > > > > +	 * NULL right after that check, do_send_pkt() checks it under the mutex.
> > > > > +	 */
> > > > > +	if (!data_race(READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)))
> > > > 
> > > > Why not using vhost_vq_get_backend() ?
> > > 
> > > Because it locks the mutex, which is slow and unacceptable in this hot
> > > path.
> > 
> > ehm, sorry, which mutex are you talking about?
> > 
> > I see just a comment about the mutex to be acquired by the caller, but I
> > don't see any lock there.
> > 
> > > 
> > > > 
> > > > Also, is READ_ONCE() okay without WRITE_ONCE() where it is set?
> > > 
> > > It's racy, but as described here in the comment and in the commit message,
> > > any possible race outcome is covered by the subsequent checks.
> > 
> > Okay, so what is the point of calling READ_ONCE()?
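A userspace model of the fast-fail check in the patch, as a sketch only: a C11 atomic pointer stands in for vq->private_data, and a spinlock built on atomic_flag stands in for vq->mutex so the example needs no pthread library. struct model_vq and the model_* functions are hypothetical, and EHOSTUNREACH is the error code suggested in this thread, not what the patch currently returns. The relaxed atomic load/store pair is roughly the C11 counterpart of a READ_ONCE()/WRITE_ONCE() pair: it guarantees a single, untorn load that the compiler may not re-fetch, while correctness still comes from the re-check under the lock in the worker (the do_send_pkt() analog).

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of the RX virtqueue. */
struct model_vq {
	atomic_flag lock;        /* stands in for vq->mutex */
	_Atomic(void *) backend; /* stands in for vq->private_data */
	int queued;              /* packets waiting for the worker */
};

static void model_lock(struct model_vq *vq)
{
	while (atomic_flag_test_and_set_explicit(&vq->lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_unlock(struct model_vq *vq)
{
	atomic_flag_clear_explicit(&vq->lock, memory_order_release);
}

/* vhost_vq_set_backend() analog: written under the lock, with an atomic
 * store so the lockless reader never observes a torn value. */
static void model_set_backend(struct model_vq *vq, void *backend)
{
	model_lock(vq);
	atomic_store_explicit(&vq->backend, backend, memory_order_relaxed);
	model_unlock(vq);
}

/* send_pkt() analog: lockless fast-fail, then queue for the worker. */
static int model_send_pkt(struct model_vq *vq)
{
	/* A stale answer is harmless: a stale NULL only fails early, and a
	 * stale non-NULL is caught by the worker's re-check under the lock. */
	if (!atomic_load_explicit(&vq->backend, memory_order_relaxed))
		return -EHOSTUNREACH;

	model_lock(vq); /* simplification: the kernel queue has its own lock */
	vq->queued++;
	model_unlock(vq);
	return 0;
}

/* do_send_pkt() analog: authoritative check under the lock; with no
 * backend it leaves the queue untouched and returns. */
static int model_do_send(struct model_vq *vq)
{
	int sent = 0;

	model_lock(vq);
	if (atomic_load_explicit(&vq->backend, memory_order_relaxed)) {
		sent = vq->queued;
		vq->queued = 0;
	}
	model_unlock(vq);
	return sent;
}
```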
> > 
> > > 
> > > > > {
> > > > > +		rcu_read_unlock();
> > > > > +		kfree_skb(skb);
> > > > > +		return -ECONNREFUSED;
> > > > 
> > > > This is a generic send_pkt; is it okay to return ECONNREFUSED in every
> > > > case?
> > > 
> > > EHOSTUNREACH would probably be better.
> > > All the current send_pkt functions only return ENODEV, but that has different
> > > semantics: it means the local device isn't ready yet, while here we're
> > > dealing with the other end not being ready.
> > 
> > From the AF_VSOCK perspective, I see ENODEV as "the transport is not ready",
> > so I think it could fit here too, but EHOSTUNREACH is also fine, and
> > certainly better than ECONNREFUSED.
> > 
> > Thanks,
> > Stefano
> 
> I think it's worth trying to do the same thing with e.g. TCP
> and see what error, if any, we get. Match that.

This case is not directly applicable to TCP. In TCP, there's no out-of-band way
to detect the "host up, but not yet initialized and not ready for connections"
state. In theory this could map to ENOPROTOOPT, but no real TCP stack implements
this, because replying with ICMP_PROT_UNREACH requires a TCP stack, which is
exactly the thing that isn't up.

So, in the real world, a similar situation with TCP would result in ETIMEDOUT.
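For comparison, the one TCP case that is easy to reproduce locally is "peer up, nothing listening". A minimal Linux probe, assuming loopback is available: it binds an ephemeral port without calling listen(), so the SYN is answered with a RST and connect() fails immediately with ECONNREFUSED, the TCP analog of the immediate ECONNRESET vsock returns when the guest driver is loaded but nothing listens yet. tcp_probe_no_listener() is a hypothetical name; the "stack not initialized" case described above cannot be reproduced this way.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno of a TCP connect() to a loopback port that is bound
 * but not listening, or 0 if the connect unexpectedly succeeds. */
static int tcp_probe_no_listener(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int holder, fd, err = 0;

	/* Reserve an ephemeral port without listen(), so nothing accepts on it. */
	holder = socket(AF_INET, SOCK_STREAM, 0);
	if (holder < 0)
		return errno;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;
	if (bind(holder, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(holder, (struct sockaddr *)&addr, &len) < 0) {
		err = errno;
		close(holder);
		return err;
	}

	/* Connect to the reserved port; the kernel replies with a RST. */
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0) {
		err = errno;
	} else {
		if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
			err = errno;
		close(fd);
	}
	close(holder);
	return err;
}
```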

> 
> 
> > > 
> > > Best regards, Polina.
> > > 
> > > > 
> > > > Thanks,
> > > > Stefano
> > > > 
> > > > > +	}
> > > > > +
> > > > > 	if (virtio_vsock_skb_reply(skb))
> > > > > 		atomic_inc(&vsock->queued_replies);
> > > > > 
> > > > > 
> > > > > base-commit: 8ab992f815d6736b5c7a6f5fd7bfe7bc106bb3dc
> > > > > --
> > > > > 2.53.0
> > > > > 


Thread overview: 9+ messages
2026-05-11 14:56 [PATCH] vhost/vsock: Refuse the connection immediately when guest isn't ready Polina Vishneva
2026-05-11 15:56 ` Stefano Garzarella
2026-05-12 14:32   ` Polina Vishneva
2026-05-12 15:39     ` Stefano Garzarella
2026-05-12 16:02       ` Michael S. Tsirkin
2026-05-13  9:44         ` Polina Vishneva [this message]
2026-05-13 10:03           ` Michael S. Tsirkin
2026-05-13 10:34             ` Denis V. Lunev
2026-05-13 11:18       ` Polina Vishneva