Netdev List
 help / color / mirror / Atom feed
* [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
@ 2026-06-12 16:57 Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

Host<-->guest connections via AF_VSOCK sockets aren't supposed to
outlive VM migration, since VM is moving to another host.  However
there's a special case, which is QEMU live-update, or CPR
(checkpoint-restore) migration.  In this case, VM remains on the same
host, and we'd like such connections to persist.

For this to work, we need to be able to transfer device ownership from
source QEMU to dest QEMU.  Namely, source needs to reset ownership by
issuing VHOST_RESET_OWNER ioctl, and then target has to claim it by
calling VHOST_SET_OWNER.

Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
such implementation (patches 1-2).  Also fix regression introduced by
the earlier commit [1] (patch 3), and fix the deadlock bug (commit 4).

There's a complementary series for QEMU [0] adding support of vhost-vsock
devices during CPR migration.

NOTE: this series needs to be applied on top of Michael's vhost/linux-next
tree as it contains relevant commit [1], not yet present in master branch.

I've tested this (patched QEMU + patched kernel) approximately as follows:

  * Run listener in the guest:
  socat -u VSOCK-LISTEN:9999 - >/tmp/recv.bin

  * Run data transfer from host to guest:
  socat -u FILE:/root/bigfile.bin VSOCK-CONNECT:CID:9999

  * Perform CPR migration during transfer (either cpr-exec or cpr-transfer)
  * Check that file hash sum matches

[0] https://lore.kernel.org/qemu-devel/20260612165110.431376-1-andrey.drobyshev@virtuozzo.com
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b

Andrey Drobyshev (1):
  vhost/vsock: re-scan TX virtqueue on device start

Denis V. Lunev (1):
  vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause

Pavel Tikhomirov (2):
  vhost/vsock: split out vhost_vsock_drop_backends helper
  vhost/vsock: add VHOST_RESET_OWNER ioctl

 drivers/vhost/vsock.c | 80 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 69 insertions(+), 11 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-16 13:42   ` Stefano Garzarella
  2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Split the actual backend dropping part from vhost_vsock_stop.  We're
going to need it for the VHOST_RESET_OWNER implementation in the
following patch, when vsock->dev.mutex is already taken and owner is
checked.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 drivers/vhost/vsock.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 9aaab6bb8061..b12221ce6faf 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -664,9 +664,24 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	return ret;
 }
 
-static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
+static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
 {
+	struct vhost_virtqueue *vq;
 	size_t i;
+
+	lockdep_assert_held(&vsock->dev.mutex);
+
+	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
+		vq = &vsock->vqs[i];
+
+		mutex_lock(&vq->mutex);
+		vhost_vq_set_backend(vq, NULL);
+		mutex_unlock(&vq->mutex);
+	}
+}
+
+static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
+{
 	int ret = 0;
 
 	mutex_lock(&vsock->dev.mutex);
@@ -677,14 +692,7 @@ static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
 			goto err;
 	}
 
-	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
-		struct vhost_virtqueue *vq = &vsock->vqs[i];
-
-		mutex_lock(&vq->mutex);
-		vhost_vq_set_backend(vq, NULL);
-		mutex_unlock(&vq->mutex);
-	}
-
+	vhost_vsock_drop_backends(vsock);
 err:
 	mutex_unlock(&vsock->dev.mutex);
 	return ret;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-16 13:48   ` Stefano Garzarella
  2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
the guest with vhost-vsock device.  For this to work, we need to reset
the device ownership on the source side by calling RESET_OWNER, and then
claim it on the dest side by calling SET_OWNER.  We expect not to lose any
AF_VSOCK connection while this happens.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index b12221ce6faf..e629886e5cf8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
 	return -EFAULT;
 }
 
+static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
+{
+	struct vhost_iotlb *umem;
+	long err;
+
+	mutex_lock(&vsock->dev.mutex);
+	err = vhost_dev_check_owner(&vsock->dev);
+	if (err)
+		goto done;
+	umem = vhost_dev_reset_owner_prepare();
+	if (!umem) {
+		err = -ENOMEM;
+		goto done;
+	}
+	/* Follows vhost_vsock_dev_release closely except for guest_cid drop */
+	vsock_for_each_connected_socket(&vhost_transport.transport,
+					vhost_vsock_reset_orphans);
+	vhost_vsock_drop_backends(vsock);
+	vhost_vsock_flush(vsock);
+	vhost_dev_stop(&vsock->dev);
+	vhost_dev_reset_owner(&vsock->dev, umem);
+done:
+	mutex_unlock(&vsock->dev.mutex);
+	return err;
+}
+
 static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
 				  unsigned long arg)
 {
@@ -937,6 +963,8 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
 			return -EOPNOTSUPP;
 		vhost_set_backend_features(&vsock->dev, features);
 		return 0;
+	case VHOST_RESET_OWNER:
+		return vhost_vsock_reset_owner(vsock);
 	default:
 		mutex_lock(&vsock->dev.mutex);
 		r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-16 14:18   ` Stefano Garzarella
  2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
  2026-06-16 13:35 ` [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Stefano Garzarella
  4 siblings, 1 reply; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

From: "Denis V. Lunev" <den@openvz.org>

Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when
guest isn't ready") added a fast-fail in vhost_transport_send_pkt().  It
rejects every host send with -EHOSTUNREACH until the destination calls
SET_RUNNING(1).  The fast-fail condition checks whether device's backends
are dropped, and if they're, the guest is considered to be not ready.

However, there might be other reasons for backends to be nulled.  In
particular, when QEMU is performing CPR (checkpoint-restore) migration,
device ownership is being RESET and SET again, which leads to backends
drop and reattach.  If we end up connecting during this window, an
AF_VSOCK client gets -EHOSTUNREACH, which is wrong.

Add a cpr_paused flag set inside vhost_vsock_drop_backends() when the
backend was previously live, cleared by vhost_vsock_start(). When set,
vhost_transport_send_pkt() queues the skb instead of fast-failing; the
existing kick of send_pkt_work in vhost_vsock_start() drains it on
resume. A device that has never run keeps cpr_paused == false and the
boot-time fast-fail behaviour is preserved.

Pair the cpr_paused store with the backend store using an
smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered
architecture never observes (NULL backend, !paused):

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 drivers/vhost/vsock.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e629886e5cf8..bcaba36becd7 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -61,6 +61,7 @@ struct vhost_vsock {
 
 	u32 guest_cid;
 	bool seqpacket_allow;
+	bool cpr_paused;	/* between stop and next start */
 };
 
 static u32 vhost_transport_get_local_cid(void)
@@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
 	 * the mutex would be too expensive in this hot path, and we already have
 	 * all the outcomes covered: if the backend becomes NULL right after the check,
 	 * vhost_transport_do_send_pkt() will check it under the mutex anyway.
+	 *
+	 * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
+	 * The kick in vhost_vsock_start() will drain them on resume.
 	 */
 	if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
-		rcu_read_unlock();
-		kfree_skb(skb);
-		return -EHOSTUNREACH;
+		smp_rmb();	/* pairs with smp_wmb() in start/drop_backends */
+		if (!READ_ONCE(vsock->cpr_paused)) {
+			rcu_read_unlock();
+			kfree_skb(skb);
+			return -EHOSTUNREACH;
+		}
 	}
 
 	if (virtio_vsock_skb_reply(skb))
@@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 		mutex_unlock(&vq->mutex);
 	}
 
+	smp_wmb();	/* pairs with smp_rmb() in send_pkt */
+	WRITE_ONCE(vsock->cpr_paused, false);
+
 	/* Some packets may have been queued before the device was started,
 	 * let's kick the send worker to send them.
 	 */
@@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
 
 	lockdep_assert_held(&vsock->dev.mutex);
 
+	if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
+		WRITE_ONCE(vsock->cpr_paused, true);
+		smp_wmb();	/* pairs with smp_rmb() in send_pkt */
+	}
+
 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
 		vq = &vsock->vqs[i];
 
@@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 
 	vsock->guest_cid = 0; /* no CID assigned yet */
 	vsock->seqpacket_allow = false;
+	vsock->cpr_paused = false;
 
 	atomic_set(&vsock->queued_replies, 0);
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
                   ` (2 preceding siblings ...)
  2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-16 14:23   ` Stefano Garzarella
  2026-06-16 13:35 ` [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Stefano Garzarella
  4 siblings, 1 reply; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

During QEMU CPR live-update (and VHOST_RESET_OWNER in general) the guest
keeps running while the host drops and later re-attaches vhost backends.
If the guest adds a buffer to the TX virtqueue (guest->host) and kicks
while the backend is temporarily NULL (between vhost_vsock_drop_backends()
and the next vhost_vsock_start()), then the kick is delivered to the
vhost worker, handle_tx_kick() sees a NULL backend and returns, and the
kick signal is consumed.  The buffer is then left in the ring.

Then upon device start vhost_vsock_start() only re-kicks the RX send
worker, never the TX VQ, so the buffer is processed only if the guest
happens to kick again.  But if the guest itself is now waiting for data
from the host, it will never kick TX VQ again, and we end up in a
deadlock.

The deadlock is reproduced during active host->guest socat data transfer
under multiple consecutive CPR live-update's.

To fix this, in vhost_vsock_start(), after kicking the RX send worker, also
queue the TX vq poll so any buffers the guest enqueued while we were paused
get scanned.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 drivers/vhost/vsock.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bcaba36becd7..1fcfe71d18be 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	 */
 	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
 
+	/*
+	 * Some packets might've also been queued in TX VQ.  Re-scan it here,
+	 * mirroring the RX send-worker kick above.
+	 */
+	vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
+
 	mutex_unlock(&vsock->dev.mutex);
 	return 0;
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
                   ` (3 preceding siblings ...)
  2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
@ 2026-06-16 13:35 ` Stefano Garzarella
  2026-06-16 14:01   ` Andrey Drobyshev
  4 siblings, 1 reply; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 13:35 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

Hi Andrey,
thanks for the series!

On Fri, Jun 12, 2026 at 07:57:14PM +0300, Andrey Drobyshev wrote:
>Host<-->guest connections via AF_VSOCK sockets aren't supposed to
>outlive VM migration, since VM is moving to another host.  However
>there's a special case, which is QEMU live-update, or CPR
>(checkpoint-restore) migration.  In this case, VM remains on the same
>host, and we'd like such connections to persist.

In the spec we have VIRTIO_VSOCK_EVENT_TRANSPORT_RESET which is usually 
sent by the device after a migration.

IIUC the specs don't say this has to be done all the time, so we don't 
need to change anything in the specs, right?

We just need to avoid sending it (which I think is what we're doing 
here... I still need to look at the patches).

>
>For this to work, we need to be able to transfer device ownership from
>source QEMU to dest QEMU.  Namely, source needs to reset ownership by
>issuing VHOST_RESET_OWNER ioctl, and then target has to claim it by
>calling VHOST_SET_OWNER.
>
>Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
>such implementation (patches 1-2).  Also fix regression introduced by
>the earlier commit [1] (patch 3), and fix the deadlock bug (commit 4).

If it's a regression, should we fix it separately?

Or is it related to this series?

>
>There's a complementary series for QEMU [0] adding support of vhost-vsock
>devices during CPR migration.
>
>NOTE: this series needs to be applied on top of Michael's vhost/linux-next
>tree as it contains relevant commit [1], not yet present in master branch.
>
>I've tested this (patched QEMU + patched kernel) approximately as follows:
>
>  * Run listener in the guest:
>  socat -u VSOCK-LISTEN:9999 - >/tmp/recv.bin
>
>  * Run data transfer from host to guest:
>  socat -u FILE:/root/bigfile.bin VSOCK-CONNECT:CID:9999
>
>  * Perform CPR migration during transfer (either cpr-exec or cpr-transfer)
>  * Check that file hash sum matches
>
>[0] https://lore.kernel.org/qemu-devel/20260612165110.431376-1-andrey.drobyshev@virtuozzo.com
>[1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b
>
>Andrey Drobyshev (1):
>  vhost/vsock: re-scan TX virtqueue on device start
>
>Denis V. Lunev (1):
>  vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
>
>Pavel Tikhomirov (2):
>  vhost/vsock: split out vhost_vsock_drop_backends helper
>  vhost/vsock: add VHOST_RESET_OWNER ioctl
>
> drivers/vhost/vsock.c | 80 +++++++++++++++++++++++++++++++++++++------
> 1 file changed, 69 insertions(+), 11 deletions(-)
>
>-- 
>2.47.1
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper
  2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
@ 2026-06-16 13:42   ` Stefano Garzarella
  0 siblings, 0 replies; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 13:42 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On Fri, Jun 12, 2026 at 07:57:15PM +0300, Andrey Drobyshev wrote:
>From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>
>Split the actual backend dropping part from vhost_vsock_stop.  We're
>going to need it for the VHOST_RESET_OWNER implementation in the
>following patch, when vsock->dev.mutex is already taken and owner is
>checked.
>
>Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>---
> drivers/vhost/vsock.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)

LGTM!

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 9aaab6bb8061..b12221ce6faf 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -664,9 +664,24 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> 	return ret;
> }
>
>-static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
>+static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
> {
>+	struct vhost_virtqueue *vq;
> 	size_t i;
>+
>+	lockdep_assert_held(&vsock->dev.mutex);
>+
>+	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>+		vq = &vsock->vqs[i];
>+
>+		mutex_lock(&vq->mutex);
>+		vhost_vq_set_backend(vq, NULL);
>+		mutex_unlock(&vq->mutex);
>+	}
>+}
>+
>+static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
>+{
> 	int ret = 0;
>
> 	mutex_lock(&vsock->dev.mutex);
>@@ -677,14 +692,7 @@ static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
> 			goto err;
> 	}
>
>-	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>-		struct vhost_virtqueue *vq = &vsock->vqs[i];
>-
>-		mutex_lock(&vq->mutex);
>-		vhost_vq_set_backend(vq, NULL);
>-		mutex_unlock(&vq->mutex);
>-	}
>-
>+	vhost_vsock_drop_backends(vsock);
> err:
> 	mutex_unlock(&vsock->dev.mutex);
> 	return ret;
>-- 
>2.47.1
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
  2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
@ 2026-06-16 13:48   ` Stefano Garzarella
  2026-06-16 14:10     ` Andrey Drobyshev
  0 siblings, 1 reply; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 13:48 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On Fri, Jun 12, 2026 at 07:57:16PM +0300, Andrey Drobyshev wrote:
>From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>
>This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
>the guest with vhost-vsock device.  For this to work, we need to reset
>the device ownership on the source side by calling RESET_OWNER, and then
>claim it on the dest side by calling SET_OWNER.  We expect not to lose any
>AF_VSOCK connection while this happens.
>
>Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>---
> drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index b12221ce6faf..e629886e5cf8 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> 	return -EFAULT;
> }
>
>+static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
>+{
>+	struct vhost_iotlb *umem;
>+	long err;
>+
>+	mutex_lock(&vsock->dev.mutex);
>+	err = vhost_dev_check_owner(&vsock->dev);
>+	if (err)
>+		goto done;
>+	umem = vhost_dev_reset_owner_prepare();
>+	if (!umem) {
>+		err = -ENOMEM;
>+		goto done;
>+	}
>+	/* Follows vhost_vsock_dev_release closely except for guest_cid drop */
>+	vsock_for_each_connected_socket(&vhost_transport.transport,
>+					vhost_vsock_reset_orphans);

In vhost_vsock_reset_orphans() we have:

	rcu_read_lock();

	/* If the peer is still valid, no need to reset connection */
	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))) {
		rcu_read_unlock();
		return;
	}

IIUC we are not removing the guest cid from the hash table, so this 
check will be always true, and nothing is done.

So, is this call really useful?

>+	vhost_vsock_drop_backends(vsock);
>+	vhost_vsock_flush(vsock);
>+	vhost_dev_stop(&vsock->dev);
>+	vhost_dev_reset_owner(&vsock->dev, umem);
>+done:
>+	mutex_unlock(&vsock->dev.mutex);
>+	return err;
>+}
>+
> static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
> 				  unsigned long arg)
> {
>@@ -937,6 +963,8 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
> 			return -EOPNOTSUPP;
> 		vhost_set_backend_features(&vsock->dev, features);
> 		return 0;
>+	case VHOST_RESET_OWNER:
>+		return vhost_vsock_reset_owner(vsock);
> 	default:
> 		mutex_lock(&vsock->dev.mutex);
> 		r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
>-- 
>2.47.1
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
  2026-06-16 13:35 ` [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Stefano Garzarella
@ 2026-06-16 14:01   ` Andrey Drobyshev
  2026-06-16 14:28     ` Stefano Garzarella
  0 siblings, 1 reply; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-16 14:01 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

Hello Stefano,

On 6/16/26 4:35 PM, Stefano Garzarella wrote:
> Hi Andrey,
> thanks for the series!
> 
> On Fri, Jun 12, 2026 at 07:57:14PM +0300, Andrey Drobyshev wrote:
>> Host<-->guest connections via AF_VSOCK sockets aren't supposed to
>> outlive VM migration, since VM is moving to another host.  However
>> there's a special case, which is QEMU live-update, or CPR
>> (checkpoint-restore) migration.  In this case, VM remains on the same
>> host, and we'd like such connections to persist.
> 
> In the spec we have VIRTIO_VSOCK_EVENT_TRANSPORT_RESET which is usually 
> sent by the device after a migration.
> 
> IIUC the specs don't say this has to be done all the time, so we don't 
> need to change anything in the specs, right?
> 
> We just need to avoid sending it (which I think is what we're doing 
> here... I still need to look at the patches).
>

Sending this exact ioctl is guarded by one of my patches in the QEMU
counterpart series:

https://lore.kernel.org/qemu-devel/20260612165110.431376-6-andrey.drobyshev@virtuozzo.com/

So we indeed avoid sending it on migration target in case of CPR migration.

>>
>> For this to work, we need to be able to transfer device ownership from
>> source QEMU to dest QEMU.  Namely, source needs to reset ownership by
>> issuing VHOST_RESET_OWNER ioctl, and then target has to claim it by
>> calling VHOST_SET_OWNER.
>>
>> Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
>> such implementation (patches 1-2).  Also fix regression introduced by
>> the earlier commit [1] (patch 3), and fix the deadlock bug (commit 4).
> 
> If it's a regression, should we fix it separately?
> 
> Or is it related to this series?
>

Probably my wording wasn't quite correct.  I posted this patch here
because we found the problem during testing this particular
functionality, i.e. vsock data transfer + CPR migration.  And the
problem was introduced by a recent commit, which is fine on its own, but
breaks the CPR case.
>>
>> There's a complementary series for QEMU [0] adding support of vhost-vsock
>> devices during CPR migration.
>>
>> NOTE: this series needs to be applied on top of Michael's vhost/linux-next
>> tree as it contains relevant commit [1], not yet present in master branch.
>>
>> I've tested this (patched QEMU + patched kernel) approximately as follows:
>>
>>  * Run listener in the guest:
>>  socat -u VSOCK-LISTEN:9999 - >/tmp/recv.bin
>>
>>  * Run data transfer from host to guest:
>>  socat -u FILE:/root/bigfile.bin VSOCK-CONNECT:CID:9999
>>
>>  * Perform CPR migration during transfer (either cpr-exec or cpr-transfer)
>>  * Check that file hash sum matches
>>
>> [0] https://lore.kernel.org/qemu-devel/20260612165110.431376-1-andrey.drobyshev@virtuozzo.com
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b
>>
>> Andrey Drobyshev (1):
>>  vhost/vsock: re-scan TX virtqueue on device start
>>
>> Denis V. Lunev (1):
>>  vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
>>
>> Pavel Tikhomirov (2):
>>  vhost/vsock: split out vhost_vsock_drop_backends helper
>>  vhost/vsock: add VHOST_RESET_OWNER ioctl
>>
>> drivers/vhost/vsock.c | 80 +++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 69 insertions(+), 11 deletions(-)
>>
>> -- 
>> 2.47.1
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
  2026-06-16 13:48   ` Stefano Garzarella
@ 2026-06-16 14:10     ` Andrey Drobyshev
  2026-06-16 14:26       ` Stefano Garzarella
  0 siblings, 1 reply; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-16 14:10 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On 6/16/26 4:48 PM, Stefano Garzarella wrote:
> On Fri, Jun 12, 2026 at 07:57:16PM +0300, Andrey Drobyshev wrote:
>> From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>>
>> This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
>> the guest with vhost-vsock device.  For this to work, we need to reset
>> the device ownership on the source side by calling RESET_OWNER, and then
>> claim it on the dest side by calling SET_OWNER.  We expect not to lose any
>> AF_VSOCK connection while this happens.
>>
>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>> ---
>> drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index b12221ce6faf..e629886e5cf8 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>> 	return -EFAULT;
>> }
>>
>> +static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
>> +{
>> +	struct vhost_iotlb *umem;
>> +	long err;
>> +
>> +	mutex_lock(&vsock->dev.mutex);
>> +	err = vhost_dev_check_owner(&vsock->dev);
>> +	if (err)
>> +		goto done;
>> +	umem = vhost_dev_reset_owner_prepare();
>> +	if (!umem) {
>> +		err = -ENOMEM;
>> +		goto done;
>> +	}
>> +	/* Follows vhost_vsock_dev_release closely except for guest_cid drop */
>> +	vsock_for_each_connected_socket(&vhost_transport.transport,
>> +					vhost_vsock_reset_orphans);
> 
> In vhost_vsock_reset_orphans() we have:
> 
> 	rcu_read_lock();
> 
> 	/* If the peer is still valid, no need to reset connection */
> 	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))) {
> 		rcu_read_unlock();
> 		return;
> 	}
> 
> IIUC we are not removing the guest cid from the hash table, so this 
> check will be always true, and nothing is done.
> 
> So, is this call really useful?
>

You're right, and it's most probably an artifact from mimicking the
vhost_vsock_dev_release() implementation, as mentioned in the comment.
In our case this whole iteration is a no-op, we better remove it.

BTW earlier I received some feedback from Sashiko AI reviewer, which
also spotted that same issue (and some more interesting races):

https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com

Apparently it only CC's its reviews to kvm@vger.kernel.org so you can't
see them right away.  Just wanted to let you know to save your time
here.  I'll send a v2 with respect to Sashiko remarks.  But of course
would be great if you spot some more issues here.


>> +	vhost_vsock_drop_backends(vsock);
>> +	vhost_vsock_flush(vsock);
>> +	vhost_dev_stop(&vsock->dev);
>> +	vhost_dev_reset_owner(&vsock->dev, umem);
>> +done:
>> +	mutex_unlock(&vsock->dev.mutex);
>> +	return err;
>> +}
>> +
>> static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
>> 				  unsigned long arg)
>> {
>> @@ -937,6 +963,8 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
>> 			return -EOPNOTSUPP;
>> 		vhost_set_backend_features(&vsock->dev, features);
>> 		return 0;
>> +	case VHOST_RESET_OWNER:
>> +		return vhost_vsock_reset_owner(vsock);
>> 	default:
>> 		mutex_lock(&vsock->dev.mutex);
>> 		r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
>> -- 
>> 2.47.1
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
  2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
@ 2026-06-16 14:18   ` Stefano Garzarella
  2026-06-16 15:58     ` Andrey Drobyshev
  0 siblings, 1 reply; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 14:18 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:
>From: "Denis V. Lunev" <den@openvz.org>
>
>Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when

Please follow 
https://docs.kernel.org/process/submitting-patches.html#describe-your-changes 
on how to refer to a commit.

>guest isn't ready") added a fast-fail in vhost_transport_send_pkt().  It
>rejects every host send with -EHOSTUNREACH until the destination calls
>SET_RUNNING(1).  The fast-fail condition checks whether device's backends
>are dropped, and if they're, the guest is considered to be not ready.

Okay, so it's not a regression, I mean without this series that patch is 
not adding any regression, no?

If it's the case, I'll change the wording in the cover letter.

>
>However, there might be other reasons for backends to be nulled.  In
>particular, when QEMU is performing CPR (checkpoint-restore) migration,
>device ownership is being RESET and SET again, which leads to backends
>drop and reattach.  If we end up connecting during this window, an
>AF_VSOCK client gets -EHOSTUNREACH, which is wrong.

Please add this change before starting to support VHOST_RESET_OWNER 
ioctl in vhost-vsock, otherwise we are breaking the bisectability.

>
>Add a cpr_paused flag set inside vhost_vsock_drop_backends() when the
>backend was previously live, cleared by vhost_vsock_start(). When set,
>vhost_transport_send_pkt() queues the skb instead of fast-failing; the
>existing kick of send_pkt_work in vhost_vsock_start() drains it on
>resume. A device that has never run keeps cpr_paused == false and the
>boot-time fast-fail behaviour is preserved.
>
>Pair the cpr_paused store with the backend store using an
>smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered
>architecture never observes (NULL backend, !paused):
>
>Signed-off-by: Denis V. Lunev <den@openvz.org>
>---
> drivers/vhost/vsock.c | 22 +++++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index e629886e5cf8..bcaba36becd7 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -61,6 +61,7 @@ struct vhost_vsock {
>
> 	u32 guest_cid;
> 	bool seqpacket_allow;
>+	bool cpr_paused;	/* between stop and next start */
> };
>
> static u32 vhost_transport_get_local_cid(void)
>@@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
> 	 * the mutex would be too expensive in this hot path, and we already have
> 	 * all the outcomes covered: if the backend becomes NULL right after the check,
> 	 * vhost_transport_do_send_pkt() will check it under the mutex anyway.
>+	 *
>+	 * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
>+	 * The kick in vhost_vsock_start() will drain them on resume.
> 	 */
> 	if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
>-		rcu_read_unlock();
>-		kfree_skb(skb);
>-		return -EHOSTUNREACH;
>+		smp_rmb();	/* pairs with smp_wmb() in start/drop_backends */
>+		if (!READ_ONCE(vsock->cpr_paused)) {

Can we avoid this which is not really readable and maybe add a single 
variable to control the fast-fail at all?

I mean replacing both cpr_paused + backend-pointer with a single 
`started` flag: set it to false at open, true on start via 
smp_store_release(), back to false on normal stop, and leave it true 
during CPR pause.

The reader in send_pkt can do just:

     if (!smp_load_acquire(&vsock->started))
         return -EHOSTUNREACH;

WDYT?

>+			rcu_read_unlock();
>+			kfree_skb(skb);
>+			return -EHOSTUNREACH;
>+		}


That said claude here is reporting a potential issue that I think we 
should consider:
     After VHOST_RESET_OWNER, the guest CID stays in the hash, so 
     vhost_transport_send_pkt() can still find the vsock, skip the 
     fast-fail (cpr_paused=true), and call vhost_vq_work_queue() while 
     vhost_workers_free() is freeing workers without a synchronize_rcu() 
     — risking a use-after-free. Also, any send_pkt_work queued between 
     the last flush and worker teardown gets its VHOST_WORK_QUEUED bit 
     stuck (the vhost task exits without draining), deadlocking 
     host→guest traffic after restart.

     A synchronize_rcu() in vhost_workers_free() between the 
     rcu_assign_pointer(NULL) loop and the destroy loop would close the 
     use-after-free, and reinitializing send_pkt_work via 
     vhost_work_init() after vhost_dev_reset_owner() returns would clear 
     the stuck QUEUED bit.


> 	}
>
> 	if (virtio_vsock_skb_reply(skb))
>@@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> 		mutex_unlock(&vq->mutex);
> 	}
>
>+	smp_wmb();	/* pairs with smp_rmb() in send_pkt */
>+	WRITE_ONCE(vsock->cpr_paused, false);
>+
> 	/* Some packets may have been queued before the device was started,
> 	 * let's kick the send worker to send them.
> 	 */
>@@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
>
> 	lockdep_assert_held(&vsock->dev.mutex);
>
>+	if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
>+		WRITE_ONCE(vsock->cpr_paused, true);
>+		smp_wmb();	/* pairs with smp_rmb() in send_pkt */
>+	}

Why here and not in vhost_vsock_reset_owner()?

Also having this here will set it to true also with 
VHOST_VSOCK_SET_RUNNING(0), is that right?

Thanks,
Stefano

>+
> 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> 		vq = &vsock->vqs[i];
>
>@@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>
> 	vsock->guest_cid = 0; /* no CID assigned yet */
> 	vsock->seqpacket_allow = false;
>+	vsock->cpr_paused = false;
>
> 	atomic_set(&vsock->queued_replies, 0);
>
>-- 
>2.47.1
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start
  2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
@ 2026-06-16 14:23   ` Stefano Garzarella
  2026-06-16 15:58     ` Andrey Drobyshev
  0 siblings, 1 reply; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 14:23 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On Fri, Jun 12, 2026 at 07:57:18PM +0300, Andrey Drobyshev wrote:
>During QEMU CPR live-update (and VHOST_RESET_OWNER in general) the guest
>keeps running while the host drops and later re-attaches vhost backends.
>If the guest adds a buffer to the TX virtqueue (guest->host) and kicks
>while the backend is temporarily NULL (between vhost_vsock_drop_backends()
>and the next vhost_vsock_start()), then the kick is delivered to the
>vhost worker, handle_tx_kick() sees a NULL backend and returns, and the
>kick signal is consumed.  The buffer is then left in the ring.
>
>Then upon device start vhost_vsock_start() only re-kicks the RX send
>worker, never the TX VQ, so the buffer is processed only if the guest
>happens to kick again.  But if the guest itself is now waiting for data
>from the host, it will never kick TX VQ again, and we end up in a
>deadlock.
>
>The deadlock is reproduced during active host->guest socat data transfer
>under multiple consecutive CPR live-update's.
>
>To fix this, in vhost_vsock_start(), after kicking the RX send worker, also
>queue the TX vq poll so any buffers the guest enqueued while we were paused
>get scanned.

Again, it seems like we're fixing an issue that existed before this 
series, but IIUC without support for VHOST_RESET_OWNER, this could never 
have happened, so the wording should be changed to make it clear that 
this is can happen only with the new VHOST_RESET_OWNER support.

In addition, this patch must also be applied before the 
VHOST_RESET_OWNER support or merged into it.

>
>Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>---
> drivers/vhost/vsock.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index bcaba36becd7..1fcfe71d18be 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> 	 */
> 	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
>
>+	/*
>+	 * Some packets might've also been queued in TX VQ.  Re-scan it here,
>+	 * mirroring the RX send-worker kick above.
>+	 */

Can we also mention that this is related to VHOST_RESET_OWNER?

Thanks,
Stefano

>+	vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
>+
> 	mutex_unlock(&vsock->dev.mutex);
> 	return 0;
>
>-- 
>2.47.1
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
  2026-06-16 14:10     ` Andrey Drobyshev
@ 2026-06-16 14:26       ` Stefano Garzarella
  0 siblings, 0 replies; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 14:26 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On Tue, Jun 16, 2026 at 05:10:38PM +0300, Andrey Drobyshev wrote:
>On 6/16/26 4:48 PM, Stefano Garzarella wrote:
>> On Fri, Jun 12, 2026 at 07:57:16PM +0300, Andrey Drobyshev wrote:
>>> From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>>>
>>> This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
>>> the guest with vhost-vsock device.  For this to work, we need to reset
>>> the device ownership on the source side by calling RESET_OWNER, and then
>>> claim it on the dest side by calling SET_OWNER.  We expect not to lose any
>>> AF_VSOCK connection while this happens.
>>>
>>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>>> ---
>>> drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
>>> 1 file changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index b12221ce6faf..e629886e5cf8 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>>> 	return -EFAULT;
>>> }
>>>
>>> +static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
>>> +{
>>> +	struct vhost_iotlb *umem;
>>> +	long err;
>>> +
>>> +	mutex_lock(&vsock->dev.mutex);
>>> +	err = vhost_dev_check_owner(&vsock->dev);
>>> +	if (err)
>>> +		goto done;
>>> +	umem = vhost_dev_reset_owner_prepare();
>>> +	if (!umem) {
>>> +		err = -ENOMEM;
>>> +		goto done;
>>> +	}
>>> +	/* Follows vhost_vsock_dev_release closely except for guest_cid drop */
>>> +	vsock_for_each_connected_socket(&vhost_transport.transport,
>>> +					vhost_vsock_reset_orphans);
>>
>> In vhost_vsock_reset_orphans() we have:
>>
>> 	rcu_read_lock();
>>
>> 	/* If the peer is still valid, no need to reset connection */
>> 	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))) {
>> 		rcu_read_unlock();
>> 		return;
>> 	}
>>
>> IIUC we are not removing the guest cid from the hash table, so this
>> check will be always true, and nothing is done.
>>
>> So, is this call really useful?
>>
>
>You're right, and it's most probably an artifact from mimicking the
>vhost_vsock_dev_release() implementation, as mentioned in the comment.
>In our case this whole iteration is a no-op, we better remove it.
>
>BTW earlier I received some feedback from Sashiko AI reviewer, which
>also spotted that same issue (and some more interesting races):
>
>https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com

Oh they seems similar to claude comments I included in my comment on 
patch 3.

Yeah, we should takes a look, they seems real issues.

>
>Apparently it only CC's its reviews to kvm@vger.kernel.org so you can't
>see them right away.  Just wanted to let you know to save your time
>here.  I'll send a v2 with respect to Sashiko remarks.  But of course
>would be great if you spot some more issues here.
>

Thanks for pointing that out, but in general I try to do my reviews 
before looking at AI reviews (both sashiko or claude locally) to avoid 
to be too much biased.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
  2026-06-16 14:01   ` Andrey Drobyshev
@ 2026-06-16 14:28     ` Stefano Garzarella
  0 siblings, 0 replies; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 14:28 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On Tue, Jun 16, 2026 at 05:01:34PM +0300, Andrey Drobyshev wrote:
>Hello Stefano,
>
>On 6/16/26 4:35 PM, Stefano Garzarella wrote:
>> Hi Andrey,
>> thanks for the series!
>>
>> On Fri, Jun 12, 2026 at 07:57:14PM +0300, Andrey Drobyshev wrote:
>>> Host<-->guest connections via AF_VSOCK sockets aren't supposed to
>>> outlive VM migration, since VM is moving to another host.  However
>>> there's a special case, which is QEMU live-update, or CPR
>>> (checkpoint-restore) migration.  In this case, VM remains on the same
>>> host, and we'd like such connections to persist.
>>
>> In the spec we have VIRTIO_VSOCK_EVENT_TRANSPORT_RESET which is usually
>> sent by the device after a migration.
>>
>> IIUC the specs don't say this has to be done all the time, so we don't
>> need to change anything in the specs, right?
>>
>> We just need to avoid sending it (which I think is what we're doing
>> here... I still need to look at the patches).
>>
>
>Sending this exact ioctl is guarded by one of my patches in the QEMU
>counterpart series:
>
>https://lore.kernel.org/qemu-devel/20260612165110.431376-6-andrey.drobyshev@virtuozzo.com/
>
>So we indeed avoid sending it on migration target in case of CPR migration.

Great, so we are aligned :-)

>
>>>
>>> For this to work, we need to be able to transfer device ownership from
>>> source QEMU to dest QEMU.  Namely, source needs to reset ownership by
>>> issuing VHOST_RESET_OWNER ioctl, and then target has to claim it by
>>> calling VHOST_SET_OWNER.
>>>
>>> Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
>>> such implementation (patches 1-2).  Also fix regression introduced by
>>> the earlier commit [1] (patch 3), and fix the deadlock bug (commit 4).
>>
>> If it's a regression, should we fix it separately?
>>
>> Or is it related to this series?
>>
>
>Probably my wording wasn't quite correct.  I posted this patch here
>because we found the problem during testing this particular
>functionality, i.e. vsock data transfer + CPR migration.  And the
>problem was introduced by a recent commit, which is fine on its own, but
>breaks the CPR case.

Yeah, I figured out while reviewing the patch.
I'd avoid "regression" here and use just "issue", because at the end is 
just affecting this work that is not yet merged, so it can be a 
regression.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
  2026-06-16 14:18   ` Stefano Garzarella
@ 2026-06-16 15:58     ` Andrey Drobyshev
  2026-06-16 16:13       ` Stefano Garzarella
  0 siblings, 1 reply; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-16 15:58 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On 6/16/26 5:18 PM, Stefano Garzarella wrote:
> On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:
>> From: "Denis V. Lunev" <den@openvz.org>
>>
>> Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when
> 
> Please follow 
> https://docs.kernel.org/process/submitting-patches.html#describe-your-changes 
> on how to refer to a commit.
>

I omitted the hash on purpose as the commit is not yet in the mainline
tree, although our series is based and depends on it, as I mentioned:

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b

So it's a different (Michael's) repo and the commit is about to get
merged (but not yet there).  But maybe usual reference style + repo link
would be better.
>> guest isn't ready") added a fast-fail in vhost_transport_send_pkt().  It
>> rejects every host send with -EHOSTUNREACH until the destination calls
>> SET_RUNNING(1).  The fast-fail condition checks whether device's backends
>> are dropped, and if they're, the guest is considered to be not ready.
> 
> Okay, so it's not a regression, I mean without this series that patch is 
> not adding any regression, no?
> 
> If it's the case, I'll change the wording in the cover letter.
>

Agreed.

>>
>> However, there might be other reasons for backends to be nulled.  In
>> particular, when QEMU is performing CPR (checkpoint-restore) migration,
>> device ownership is being RESET and SET again, which leads to backends
>> drop and reattach.  If we end up connecting during this window, an
>> AF_VSOCK client gets -EHOSTUNREACH, which is wrong.
> 
> Please add this change before starting to support VHOST_RESET_OWNER 
> ioctl in vhost-vsock, otherwise we are breaking the bisectability.
>

Agreed.

>>
>> Add a cpr_paused flag set inside vhost_vsock_drop_backends() when the
>> backend was previously live, cleared by vhost_vsock_start(). When set,
>> vhost_transport_send_pkt() queues the skb instead of fast-failing; the
>> existing kick of send_pkt_work in vhost_vsock_start() drains it on
>> resume. A device that has never run keeps cpr_paused == false and the
>> boot-time fast-fail behaviour is preserved.
>>
>> Pair the cpr_paused store with the backend store using an
>> smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered
>> architecture never observes (NULL backend, !paused):
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> ---
>> drivers/vhost/vsock.c | 22 +++++++++++++++++++---
>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index e629886e5cf8..bcaba36becd7 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -61,6 +61,7 @@ struct vhost_vsock {
>>
>> 	u32 guest_cid;
>> 	bool seqpacket_allow;
>> +	bool cpr_paused;	/* between stop and next start */
>> };
>>
>> static u32 vhost_transport_get_local_cid(void)
>> @@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>> 	 * the mutex would be too expensive in this hot path, and we already have
>> 	 * all the outcomes covered: if the backend becomes NULL right after the check,
>> 	 * vhost_transport_do_send_pkt() will check it under the mutex anyway.
>> +	 *
>> +	 * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
>> +	 * The kick in vhost_vsock_start() will drain them on resume.
>> 	 */
>> 	if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
>> -		rcu_read_unlock();
>> -		kfree_skb(skb);
>> -		return -EHOSTUNREACH;
>> +		smp_rmb();	/* pairs with smp_wmb() in start/drop_backends */
>> +		if (!READ_ONCE(vsock->cpr_paused)) {
> 
> Can we avoid this which is not really readable and maybe add a single 
> variable to control the fast-fail at all?
> 
> I mean replacing both cpr_paused + backend-pointer with a single 
> `started` flag: set it to false at open, true on start via 
> smp_store_release(), back to false on normal stop, and leave it true 
> during CPR pause.
> 
> The reader in send_pkt can do just:
> 
>      if (!smp_load_acquire(&vsock->started))
>          return -EHOSTUNREACH;
> 
> WDYT?
>

I don't think it's gonna work as suggested.  As I understand, the order
during CPR migration is:

1) SET_RUNNING(0)
       -> vhost_vsock_stop()
           -> vhost_vsock_drop_backends()
2) RESET_OWNER
       -> vhost_vsock_drop_backends()
3) SET_OWNER
4) SET_RUNNING(1)
       -> vhost_vsock_start
           -> for (...) vhost_vq_set_backend()

(Btw I just noticed backends are already NULL at step 2), but that's
just our CPR case, for any potential RESET_OWNER users it might not be
the case).

So the race windows starts from 1) (not from 2)).  We have no way of
differentiating whether device is actually being stopped for good, or
we're in the middle of CPR.  If we set the flag to false on stop as you
suggested, we'll still hit the -EHOSTUNREACH case eventually, and
avoiding it is the whole purpose of this patch.

The fast-fail with -EHOSTUNREACH relies on the presence of backends.
IIUC the backend will only become set after initial SET_RUNNING(1),
which will only happen once the guest driver writes smth to virtio
config register, QEMU catches it and calls SET_RUNNING(1).  So we have
ordering with the guest's actions here, which is logical.  But for our
issue that means that the only true marker of paused/not paused is the
presence of backends - and that's why the flag is set in
vhost_vsock_drop_backends().

>> +			rcu_read_unlock();
>> +			kfree_skb(skb);
>> +			return -EHOSTUNREACH;
>> +		}
> 
> 
> That said claude here is reporting a potential issue that I think we 
> should consider:
>      After VHOST_RESET_OWNER, the guest CID stays in the hash, so 
>      vhost_transport_send_pkt() can still find the vsock, skip the 
>      fast-fail (cpr_paused=true), and call vhost_vq_work_queue() while 
>      vhost_workers_free() is freeing workers without a synchronize_rcu() 
>      — risking a use-after-free. Also, any send_pkt_work queued between 
>      the last flush and worker teardown gets its VHOST_WORK_QUEUED bit 
>      stuck (the vhost task exits without draining), deadlocking 
>      host→guest traffic after restart.
> 
>      A synchronize_rcu() in vhost_workers_free() between the 
>      rcu_assign_pointer(NULL) loop and the destroy loop would close the 
>      use-after-free, and reinitializing send_pkt_work via 
>      vhost_work_init() after vhost_dev_reset_owner() returns would clear 
>      the stuck QUEUED bit.
> 
> 

Yes, this looks real indeed.  Though I couldn't hit the UAF issue while
testing host->guest transfer under KASAN.

>> 	}
>>
>> 	if (virtio_vsock_skb_reply(skb))
>> @@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>> 		mutex_unlock(&vq->mutex);
>> 	}
>>
>> +	smp_wmb();	/* pairs with smp_rmb() in send_pkt */
>> +	WRITE_ONCE(vsock->cpr_paused, false);
>> +
>> 	/* Some packets may have been queued before the device was started,
>> 	 * let's kick the send worker to send them.
>> 	 */
>> @@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
>>
>> 	lockdep_assert_held(&vsock->dev.mutex);
>>
>> +	if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
>> +		WRITE_ONCE(vsock->cpr_paused, true);
>> +		smp_wmb();	/* pairs with smp_rmb() in send_pkt */
>> +	}
> 
> Why here and not in vhost_vsock_reset_owner()?
> 
> Also having this here will set it to true also with 
> VHOST_VSOCK_SET_RUNNING(0), is that right?
>

That was added here precisely to cover the vhost_vsock_stop() case (see
above).

> Thanks,
> Stefano
> 
>> +
>> 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>> 		vq = &vsock->vqs[i];
>>
>> @@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>>
>> 	vsock->guest_cid = 0; /* no CID assigned yet */
>> 	vsock->seqpacket_allow = false;
>> +	vsock->cpr_paused = false;
>>
>> 	atomic_set(&vsock->queued_replies, 0);
>>
>> -- 
>> 2.47.1
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start
  2026-06-16 14:23   ` Stefano Garzarella
@ 2026-06-16 15:58     ` Andrey Drobyshev
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Drobyshev @ 2026-06-16 15:58 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On 6/16/26 5:23 PM, Stefano Garzarella wrote:
> On Fri, Jun 12, 2026 at 07:57:18PM +0300, Andrey Drobyshev wrote:
>> During QEMU CPR live-update (and VHOST_RESET_OWNER in general) the guest
>> keeps running while the host drops and later re-attaches vhost backends.
>> If the guest adds a buffer to the TX virtqueue (guest->host) and kicks
>> while the backend is temporarily NULL (between vhost_vsock_drop_backends()
>> and the next vhost_vsock_start()), then the kick is delivered to the
>> vhost worker, handle_tx_kick() sees a NULL backend and returns, and the
>> kick signal is consumed.  The buffer is then left in the ring.
>>
>> Then upon device start vhost_vsock_start() only re-kicks the RX send
>> worker, never the TX VQ, so the buffer is processed only if the guest
>> happens to kick again.  But if the guest itself is now waiting for data
>>from the host, it will never kick TX VQ again, and we end up in a
>> deadlock.
>>
>> The deadlock is reproduced during active host->guest socat data transfer
>> under multiple consecutive CPR live-update's.
>>
>> To fix this, in vhost_vsock_start(), after kicking the RX send worker, also
>> queue the TX vq poll so any buffers the guest enqueued while we were paused
>> get scanned.
> 
> Again, it seems like we're fixing an issue that existed before this 
> series, but IIUC without support for VHOST_RESET_OWNER, this could never 
> have happened, so the wording should be changed to make it clear that 
> this is can happen only with the new VHOST_RESET_OWNER support.
> 
> In addition, this patch must also be applied before the 
> VHOST_RESET_OWNER support or merged into it.
>

Agreed.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> ---
>> drivers/vhost/vsock.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index bcaba36becd7..1fcfe71d18be 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>> 	 */
>> 	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
>>
>> +	/*
>> +	 * Some packets might've also been queued in TX VQ.  Re-scan it here,
>> +	 * mirroring the RX send-worker kick above.
>> +	 */
> 
> Can we also mention that this is related to VHOST_RESET_OWNER?
>

Agreed.
> Thanks,
> Stefano
> 
>> +	vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
>> +
>> 	mutex_unlock(&vsock->dev.mutex);
>> 	return 0;
>>
>> -- 
>> 2.47.1
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
  2026-06-16 15:58     ` Andrey Drobyshev
@ 2026-06-16 16:13       ` Stefano Garzarella
  0 siblings, 0 replies; 17+ messages in thread
From: Stefano Garzarella @ 2026-06-16 16:13 UTC (permalink / raw)
  To: Andrey Drobyshev
  Cc: linux-kernel, kvm, virtualization, netdev, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den

On Tue, Jun 16, 2026 at 06:58:40PM +0300, Andrey Drobyshev wrote:
>On 6/16/26 5:18 PM, Stefano Garzarella wrote:
>> On Fri, Jun 12, 2026 at 07:57:17PM +0300, Andrey Drobyshev wrote:

[...]

>>> static u32 vhost_transport_get_local_cid(void)
>>> @@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>>> 	 * the mutex would be too expensive in this hot path, and we already have
>>> 	 * all the outcomes covered: if the backend becomes NULL right after the check,
>>> 	 * vhost_transport_do_send_pkt() will check it under the mutex anyway.
>>> +	 *
>>> +	 * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
>>> +	 * The kick in vhost_vsock_start() will drain them on resume.
>>> 	 */
>>> 	if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
>>> -		rcu_read_unlock();
>>> -		kfree_skb(skb);
>>> ]		return -EHOSTUNREACH;
>>> +		smp_rmb();	/* pairs with smp_wmb() in start/drop_backends */
>>> +		if (!READ_ONCE(vsock->cpr_paused)) {
>>
>> Can we avoid this which is not really readable and maybe add a single
>> variable to control the fast-fail at all?
>>
>> I mean replacing both cpr_paused + backend-pointer with a single
>> `started` flag: set it to false at open, true on start via
>> smp_store_release(), back to false on normal stop, and leave it true
>> during CPR pause.
>>
>> The reader in send_pkt can do just:
>>
>>      if (!smp_load_acquire(&vsock->started))
>>          return -EHOSTUNREACH;
>>
>> WDYT?
>>
>
>I don't think it's gonna work as suggested.  As I understand, the order
>during CPR migration is:
>
>1) SET_RUNNING(0)
>       -> vhost_vsock_stop()
>           -> vhost_vsock_drop_backends()
>2) RESET_OWNER
>       -> vhost_vsock_drop_backends()
>3) SET_OWNER
>4) SET_RUNNING(1)
>       -> vhost_vsock_start
>           -> for (...) vhost_vq_set_backend()
>
>(Btw I just noticed backends are already NULL at step 2), but that's
>just our CPR case, for any potential RESET_OWNER users it might not be
>the case).
>
>So the race windows starts from 1) (not from 2)).  We have no way of
>differentiating whether device is actually being stopped for good, or
>we're in the middle of CPR.  If we set the flag to false on stop as you
>suggested, we'll still hit the -EHOSTUNREACH case eventually, and
>avoiding it is the whole purpose of this patch.
>
>The fast-fail with -EHOSTUNREACH relies on the presence of backends.
>IIUC the backend will only become set after initial SET_RUNNING(1),
>which will only happen once the guest driver writes smth to virtio
>config register, QEMU catches it and calls SET_RUNNING(1).  So we have
>ordering with the guest's actions here, which is logical.  But for our
>issue that means that the only true marker of paused/not paused is the
>presence of backends - and that's why the flag is set in
>vhost_vsock_drop_backends().

Okay, so what about avoiding to set `started` to false in 
SET_RUNNING(0)? I mean use it just to track the first SET_RUNNING(1).
(And maybe changing the name to that variable).

Apart from CPR, when can SET_RUNNING(0) occur?

At the end that was just an optimization, if we queue the packet is not 
a big issue IMO.

>
>>> +			rcu_read_unlock();
>>> +			kfree_skb(skb);
>>> +			return -EHOSTUNREACH;
>>> +		}
>>
>>
>> That said claude here is reporting a potential issue that I think we
>> should consider:
>>      After VHOST_RESET_OWNER, the guest CID stays in the hash, so
>>      vhost_transport_send_pkt() can still find the vsock, skip the
>>      fast-fail (cpr_paused=true), and call vhost_vq_work_queue() while
>>      vhost_workers_free() is freeing workers without a synchronize_rcu()
>>      — risking a use-after-free. Also, any send_pkt_work queued between
>>      the last flush and worker teardown gets its VHOST_WORK_QUEUED 
>>      bit
>>      stuck (the vhost task exits without draining), deadlocking
>>      host→guest traffic after restart.
>>
>>      A synchronize_rcu() in vhost_workers_free() between the
>>      rcu_assign_pointer(NULL) loop and the destroy loop would close the
>>      use-after-free, and reinitializing send_pkt_work via
>>      vhost_work_init() after vhost_dev_reset_owner() returns would clear
>>      the stuck QUEUED bit.
>>
>>
>
>Yes, this looks real indeed.  Though I couldn't hit the UAF issue while
>testing host->guest transfer under KASAN.
>
>>> 	}
>>>
>>> 	if (virtio_vsock_skb_reply(skb))
>>> @@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>>> 		mutex_unlock(&vq->mutex);
>>> 	}
>>>
>>> +	smp_wmb();	/* pairs with smp_rmb() in send_pkt */
>>> +	WRITE_ONCE(vsock->cpr_paused, false);
>>> +
>>> 	/* Some packets may have been queued before the device was started,
>>> 	 * let's kick the send worker to send them.
>>> 	 */
>>> @@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
>>>
>>> 	lockdep_assert_held(&vsock->dev.mutex);
>>>
>>> +	if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
>>> +		WRITE_ONCE(vsock->cpr_paused, true);
>>> +		smp_wmb();	/* pairs with smp_rmb() in send_pkt */
>>> +	}
>>
>> Why here and not in vhost_vsock_reset_owner()?
>>
>> Also having this here will set it to true also with
>> VHOST_VSOCK_SET_RUNNING(0), is that right?
>>
>
>That was added here precisely to cover the vhost_vsock_stop() case (see
>above).

I see now, a comment or something in the commit would have helped.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-06-16 16:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
2026-06-16 13:42   ` Stefano Garzarella
2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
2026-06-16 13:48   ` Stefano Garzarella
2026-06-16 14:10     ` Andrey Drobyshev
2026-06-16 14:26       ` Stefano Garzarella
2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
2026-06-16 14:18   ` Stefano Garzarella
2026-06-16 15:58     ` Andrey Drobyshev
2026-06-16 16:13       ` Stefano Garzarella
2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
2026-06-16 14:23   ` Stefano Garzarella
2026-06-16 15:58     ` Andrey Drobyshev
2026-06-16 13:35 ` [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Stefano Garzarella
2026-06-16 14:01   ` Andrey Drobyshev
2026-06-16 14:28     ` Stefano Garzarella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox