[PATCH 0/4] vhost/vsock: add support for VHOST_RESET

* [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration
@ 2026-06-12 16:57 Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

Host<-->guest connections via AF_VSOCK sockets aren't supposed to
outlive VM migration, since the VM is moving to another host.  However,
there's a special case: QEMU live-update, or CPR (checkpoint-restore)
migration.  In this case the VM remains on the same host, and we'd like
such connections to persist.

For this to work, we need to be able to transfer device ownership from
the source QEMU to the destination QEMU.  Namely, the source needs to
reset ownership by issuing the VHOST_RESET_OWNER ioctl, and then the
destination has to claim it by calling VHOST_SET_OWNER.
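
As a rough userspace illustration of the handoff described above, the two
steps could look like the sketch below.  It assumes the vhost-vsock file
descriptor is already open and is shared between the two processes (how
QEMU passes it along is not shown).  The function names are made up for
this example; only the two ioctl requests come from <linux/vhost.h>.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Source side (hypothetical helper): give up ownership of an already-open
 * vhost-vsock fd.  Returns 0 on success or a negative errno value.
 */
int cpr_release_owner(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_RESET_OWNER) < 0)
		return -errno;
	return 0;
}

/*
 * Destination side (hypothetical helper): claim the same device through
 * the inherited fd.  Returns 0 on success or a negative errno value.
 */
int cpr_claim_owner(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_SET_OWNER) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this series, the first call is expected to fail for a
vhost-vsock fd, since the driver does not handle VHOST_RESET_OWNER yet.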

Since VHOST_RESET_OWNER isn't yet implemented for vhost-vsock, let's add
such an implementation (patches 1-2).  Also fix a regression introduced by
the earlier commit [1] (patch 3), and fix a deadlock bug (patch 4).

There's a complementary series for QEMU [0] adding support for vhost-vsock
devices during CPR migration.

NOTE: this series needs to be applied on top of Michael's vhost/linux-next
tree, as it contains the relevant commit [1], which is not yet present in
the master branch.

I've tested this (patched QEMU + patched kernel) approximately as follows:

  * Run listener in the guest:
  socat -u VSOCK-LISTEN:9999 - >/tmp/recv.bin

  * Run data transfer from host to guest:
  socat -u FILE:/root/bigfile.bin VSOCK-CONNECT:CID:9999

  * Perform CPR migration during transfer (either cpr-exec or cpr-transfer)
  * Check that the file hash sums match

[0] https://lore.kernel.org/qemu-devel/20260612165110.431376-1-andrey.drobyshev@virtuozzo.com
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=bb26ed5f3a8b

Andrey Drobyshev (1):
  vhost/vsock: re-scan TX virtqueue on device start

Denis V. Lunev (1):
  vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause

Pavel Tikhomirov (2):
  vhost/vsock: split out vhost_vsock_drop_backends helper
  vhost/vsock: add VHOST_RESET_OWNER ioctl

 drivers/vhost/vsock.c | 80 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 69 insertions(+), 11 deletions(-)

-- 
2.47.1

* [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Split the actual backend dropping part out of vhost_vsock_stop().  We're
going to need it for the VHOST_RESET_OWNER implementation in the
following patch, where vsock->dev.mutex is already held and the owner
has already been checked.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 drivers/vhost/vsock.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 9aaab6bb8061..b12221ce6faf 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -664,9 +664,24 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	return ret;
 }
 
-static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
+static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
 {
+	struct vhost_virtqueue *vq;
 	size_t i;
+
+	lockdep_assert_held(&vsock->dev.mutex);
+
+	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
+		vq = &vsock->vqs[i];
+
+		mutex_lock(&vq->mutex);
+		vhost_vq_set_backend(vq, NULL);
+		mutex_unlock(&vq->mutex);
+	}
+}
+
+static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
+{
 	int ret = 0;
 
 	mutex_lock(&vsock->dev.mutex);
@@ -677,14 +692,7 @@ static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
 			goto err;
 	}
 
-	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
-		struct vhost_virtqueue *vq = &vsock->vqs[i];
-
-		mutex_lock(&vq->mutex);
-		vhost_vq_set_backend(vq, NULL);
-		mutex_unlock(&vq->mutex);
-	}
-
+	vhost_vsock_drop_backends(vsock);
 err:
 	mutex_unlock(&vsock->dev.mutex);
 	return ret;
-- 
2.47.1


* [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-13 16:57   ` sashiko-bot
  2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
  3 siblings, 1 reply; 8+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
a guest with a vhost-vsock device.  For this to work, we need to reset
the device ownership on the source side by calling RESET_OWNER, and then
claim it on the dest side by calling SET_OWNER.  We expect not to lose any
AF_VSOCK connection while this happens.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 drivers/vhost/vsock.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index b12221ce6faf..e629886e5cf8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
 	return -EFAULT;
 }
 
+static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
+{
+	struct vhost_iotlb *umem;
+	long err;
+
+	mutex_lock(&vsock->dev.mutex);
+	err = vhost_dev_check_owner(&vsock->dev);
+	if (err)
+		goto done;
+	umem = vhost_dev_reset_owner_prepare();
+	if (!umem) {
+		err = -ENOMEM;
+		goto done;
+	}
+	/* Follows vhost_vsock_dev_release closely except for guest_cid drop */
+	vsock_for_each_connected_socket(&vhost_transport.transport,
+					vhost_vsock_reset_orphans);
+	vhost_vsock_drop_backends(vsock);
+	vhost_vsock_flush(vsock);
+	vhost_dev_stop(&vsock->dev);
+	vhost_dev_reset_owner(&vsock->dev, umem);
+done:
+	mutex_unlock(&vsock->dev.mutex);
+	return err;
+}
+
 static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
 				  unsigned long arg)
 {
@@ -937,6 +963,8 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
 			return -EOPNOTSUPP;
 		vhost_set_backend_features(&vsock->dev, features);
 		return 0;
+	case VHOST_RESET_OWNER:
+		return vhost_vsock_reset_owner(vsock);
 	default:
 		mutex_lock(&vsock->dev.mutex);
 		r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
-- 
2.47.1


* [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
  2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-13 16:57   ` sashiko-bot
  2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
  3 siblings, 1 reply; 8+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

From: "Denis V. Lunev" <den@openvz.org>

Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when
guest isn't ready") added a fast-fail in vhost_transport_send_pkt().  It
rejects every host send with -EHOSTUNREACH until the destination calls
SET_RUNNING(1).  The fast-fail condition checks whether the device's
backends are dropped, and if they are, the guest is considered not ready.

However, there might be other reasons for backends to be nulled.  In
particular, when QEMU is performing CPR (checkpoint-restore) migration,
device ownership is RESET and SET again, which causes the backends to be
dropped and reattached.  If we end up connecting during this window, an
AF_VSOCK client gets -EHOSTUNREACH, which is wrong.

Add a cpr_paused flag that is set inside vhost_vsock_drop_backends() when
the backend was previously live, and cleared by vhost_vsock_start().  When
it is set, vhost_transport_send_pkt() queues the skb instead of
fast-failing; the existing kick of send_pkt_work in vhost_vsock_start()
drains it on resume.  A device that has never run keeps
cpr_paused == false, so the boot-time fast-fail behaviour is preserved.

Pair the cpr_paused store with the backend store using an
smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered
architecture never observes (NULL backend, !paused).
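
The intended behaviour can be summarised with a small single-threaded
model.  This is only a sketch: the struct and function names are invented
for illustration, and the memory barriers are left out because nothing
runs concurrently here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields the sender looks at. */
struct vsock_model {
	void *rx_backend;	/* NULL while backends are dropped */
	bool cpr_paused;	/* set on drop, cleared on start */
};

/* Mirrors vhost_vsock_drop_backends(): mark paused only if it was live. */
void model_drop_backends(struct vsock_model *v)
{
	if (v->rx_backend)
		v->cpr_paused = true;
	v->rx_backend = NULL;
}

/* Mirrors vhost_vsock_start(): attach the backend, then clear the flag. */
void model_start(struct vsock_model *v, void *backend)
{
	v->rx_backend = backend;
	v->cpr_paused = false;
}

/* Returns 0 when the skb would be queued, -EHOSTUNREACH on fast-fail. */
int model_send_pkt(const struct vsock_model *v)
{
	if (!v->rx_backend && !v->cpr_paused)
		return -EHOSTUNREACH;
	return 0;
}
```

In this model a freshly opened device fast-fails, a running device queues,
and a device between drop and the next start also queues.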

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 drivers/vhost/vsock.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e629886e5cf8..bcaba36becd7 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -61,6 +61,7 @@ struct vhost_vsock {
 
 	u32 guest_cid;
 	bool seqpacket_allow;
+	bool cpr_paused;	/* between stop and next start */
 };
 
 static u32 vhost_transport_get_local_cid(void)
@@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
 	 * the mutex would be too expensive in this hot path, and we already have
 	 * all the outcomes covered: if the backend becomes NULL right after the check,
 	 * vhost_transport_do_send_pkt() will check it under the mutex anyway.
+	 *
+	 * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
+	 * The kick in vhost_vsock_start() will drain them on resume.
 	 */
 	if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
-		rcu_read_unlock();
-		kfree_skb(skb);
-		return -EHOSTUNREACH;
+		smp_rmb();	/* pairs with smp_wmb() in start/drop_backends */
+		if (!READ_ONCE(vsock->cpr_paused)) {
+			rcu_read_unlock();
+			kfree_skb(skb);
+			return -EHOSTUNREACH;
+		}
 	}
 
 	if (virtio_vsock_skb_reply(skb))
@@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 		mutex_unlock(&vq->mutex);
 	}
 
+	smp_wmb();	/* pairs with smp_rmb() in send_pkt */
+	WRITE_ONCE(vsock->cpr_paused, false);
+
 	/* Some packets may have been queued before the device was started,
 	 * let's kick the send worker to send them.
 	 */
@@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock)
 
 	lockdep_assert_held(&vsock->dev.mutex);
 
+	if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) {
+		WRITE_ONCE(vsock->cpr_paused, true);
+		smp_wmb();	/* pairs with smp_rmb() in send_pkt */
+	}
+
 	for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
 		vq = &vsock->vqs[i];
 
@@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 
 	vsock->guest_cid = 0; /* no CID assigned yet */
 	vsock->seqpacket_allow = false;
+	vsock->cpr_paused = false;
 
 	atomic_set(&vsock->queued_replies, 0);
 
-- 
2.47.1


* [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start
  2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
                   ` (2 preceding siblings ...)
  2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
@ 2026-06-12 16:57 ` Andrey Drobyshev
  2026-06-13 16:57   ` sashiko-bot
  3 siblings, 1 reply; 8+ messages in thread
From: Andrey Drobyshev @ 2026-06-12 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: kvm, virtualization, netdev, sgarzare, mst, stefanha,
	maciej.szmigiero, bchaney, mark.kanda, ptikhomirov, den,
	andrey.drobyshev

During QEMU CPR live-update (and VHOST_RESET_OWNER in general) the guest
keeps running while the host drops and later re-attaches vhost backends.
If the guest adds a buffer to the TX virtqueue (guest->host) and kicks
while the backend is temporarily NULL (between vhost_vsock_drop_backends()
and the next vhost_vsock_start()), then the kick is delivered to the
vhost worker, handle_tx_kick() sees a NULL backend and returns, and the
kick signal is consumed.  The buffer is then left in the ring.

Then, upon device start, vhost_vsock_start() only re-kicks the RX send
worker, never the TX VQ, so the buffer is processed only if the guest
happens to kick again.  But if the guest itself is now waiting for data
from the host, it will never kick the TX VQ again, and we end up in a
deadlock.

The deadlock reproduces during an active host->guest socat data transfer
across multiple consecutive CPR live-updates.

To fix this, in vhost_vsock_start(), after kicking the RX send worker, also
queue the TX vq poll so any buffers the guest enqueued while we were paused
get scanned.
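
The lost kick and the effect of the re-scan can be replayed with a toy
model.  It is a sketch under simplifying assumptions: all names are
hypothetical, the ring is reduced to two counters, and a direct function
call stands in for queueing the TX vq poll.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the TX (guest->host) virtqueue. */
struct tx_model {
	bool backend_attached;
	size_t avail;		/* buffers the guest has placed in the ring */
	size_t processed;	/* buffers the host has consumed */
};

/*
 * Host side: like the TX kick handler, return early when there is no
 * backend.  The kick is consumed either way.
 */
void host_handle_tx_kick(struct tx_model *m)
{
	if (!m->backend_attached)
		return;
	m->processed += m->avail;
	m->avail = 0;
}

/* Guest side: add one buffer and kick the host. */
void guest_add_and_kick(struct tx_model *m)
{
	m->avail++;
	host_handle_tx_kick(m);
}

/* Device start: optionally re-scan the TX ring, as this patch does. */
void host_start(struct tx_model *m, bool rescan_tx)
{
	m->backend_attached = true;
	if (rescan_tx)
		host_handle_tx_kick(m);
}
```

Without the re-scan, a buffer added while the backend is detached stays in
the ring until the guest kicks again; with it, the buffer is picked up on
start.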

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
 drivers/vhost/vsock.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bcaba36becd7..1fcfe71d18be 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	 */
 	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);

+	/*
+	 * Some packets might've also been queued in TX VQ.  Re-scan it here,
+	 * mirroring the RX send-worker kick above.
+	 */
+	vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
+
 	mutex_unlock(&vsock->dev.mutex);
 	return 0;

-- 
2.47.1

* Re: [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause
  2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
@ 2026-06-13 16:57   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-13 16:57 UTC (permalink / raw)
  To: Andrey Drobyshev; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Memory ordering TOCTOU race allows incorrect fast-fail of packets during CPR resume.
--

commit ee9060520c490bf817a7d5354ea841c3834e4978
Author: Denis V. Lunev <den@openvz.org>

vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause

This commit suppresses the fast-fail in vhost_transport_send_pkt() during CPR
pause by adding a cpr_paused flag. The flag is set when backends are dropped
and cleared when started, allowing packets to be queued instead of dropped
with -EHOSTUNREACH while the connection is paused.

> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index e629886e5cf84..bcaba36becd79 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
[ ... ]
> @@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>  	 * the mutex would be too expensive in this hot path, and we already have
>  	 * all the outcomes covered: if the backend becomes NULL right after the check,
>  	 * vhost_transport_do_send_pkt() will check it under the mutex anyway.
> +	 *
> +	 * Don't fast-fail if cpr_paused is set, keep queueing skbs instead.
> +	 * The kick in vhost_vsock_start() will drain them on resume.
>  	 */
>  	if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
> -		rcu_read_unlock();
> -		kfree_skb(skb);
> -		return -EHOSTUNREACH;
> +		smp_rmb();	/* pairs with smp_wmb() in start/drop_backends */
> +		if (!READ_ONCE(vsock->cpr_paused)) {
> +			rcu_read_unlock();
> +			kfree_skb(skb);
> +			return -EHOSTUNREACH;
> +		}
>  	}
>  
>  	if (virtio_vsock_skb_reply(skb))
[ ... ]
> @@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>  		mutex_unlock(&vq->mutex);
>  	}
>  
> +	smp_wmb();	/* pairs with smp_rmb() in send_pkt */
> +	WRITE_ONCE(vsock->cpr_paused, false);

[Severity: High]
Does this memory ordering still allow a TOCTOU race where packets are
incorrectly fast-failed?

In vhost_vsock_drop_backends(), the writer updates variables in the order:
cpr_paused = true -> smp_wmb() -> backend = NULL.

The lockless reader in vhost_transport_send_pkt() reads in reverse order:
backend -> smp_rmb() -> cpr_paused. This correctly guarantees it won't see
backend == NULL without cpr_paused == true.

However, here in vhost_vsock_start(), the writer updates them in the same
order as the reader:
backend = vsock -> smp_wmb() -> cpr_paused = false.

Because both the reader and this writer process the backend pointer first and
cpr_paused second, the memory barriers do not prevent interleaving. Can a
concurrent reader execute its first read (backend) before the writer's
update, seeing NULL, and its second read (cpr_paused) after the writer's
update, seeing false?

This would cause the reader to erroneously drop the packet with
-EHOSTUNREACH on an active connection.
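
The interleaving in question can be replayed deterministically by
splitting the reader into its two loads.  This is a sketch with invented
names, not the driver code.  It runs in a single thread, which is enough
to show the point: the sequence is a legal ordering even under sequential
consistency, so the barriers alone cannot exclude it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two fields read by the lockless sender. */
struct vsock_model {
	void *rx_backend;
	bool cpr_paused;
};

/* Reader, first load: sample the backend pointer. */
bool reader_sees_null_backend(const struct vsock_model *v)
{
	return v->rx_backend == NULL;
}

/* Reader, second load: sample cpr_paused and decide. */
int reader_decide(const struct vsock_model *v, bool saw_null)
{
	if (saw_null && !v->cpr_paused)
		return -EHOSTUNREACH;
	return 0;
}

/* Writer in vhost_vsock_start() order: backend first, then the flag. */
void writer_start(struct vsock_model *v, void *backend)
{
	v->rx_backend = backend;
	v->cpr_paused = false;
}

/* Replay: reader's first load, the whole writer, reader's second load. */
int replay_race(void)
{
	int backend;
	struct vsock_model v = { NULL, true };	/* paused mid-CPR */
	bool saw_null = reader_sees_null_backend(&v);

	writer_start(&v, &backend);
	return reader_decide(&v, saw_null);
}
```

In this replay the reader ends up with the stale NULL backend and the
fresh cpr_paused == false, so it takes the fast-fail branch.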

> +
>  	/* Some packets may have been queued before the device was started,
>  	 * let's kick the send worker to send them.
>  	 */

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com?part=3

* Re: [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start
  2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
@ 2026-06-13 16:57   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-13 16:57 UTC (permalink / raw)
  To: Andrey Drobyshev; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] Host memory allocation failure incorrectly triggers a fatal virtqueue error and leaks descriptors, causing a guest vsock DoS.
- [High] TOCTOU race in `vhost_transport_send_pkt()` can erroneously drop packets during QEMU CPR live migration.
--

commit 917560a155f52870e04784955e886f79d44169bb
Author: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

vhost/vsock: re-scan TX virtqueue on device start

This commit addresses a deadlock during QEMU CPR live-update where a TX kick
is lost because the backend is temporarily NULL. It updates vhost_vsock_start()
to queue a TX poll and rescan buffers enqueued while paused.

> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bcaba36becd79..1fcfe71d18bec 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
>  	 */
>  	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
>  
> +	/*
> +	 * Some packets might've also been queued in TX VQ.  Re-scan it here,
> +	 * mirroring the RX send-worker kick above.
> +	 */
> +	vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
> +
>  	mutex_unlock(&vsock->dev.mutex);
>  	return 0;

[Severity: High]
This is a pre-existing issue, but does this allocation failure handling
correctly manage the descriptor?

In vhost_vsock_handle_tx_kick(), if vhost_vsock_alloc_skb() fails due to
transient host memory pressure:

drivers/vhost/vsock.c:vhost_vsock_handle_tx_kick() {
        ...
		skb = vhost_vsock_alloc_skb(vq, out, in);
		if (!skb) {
			vq_err(vq, "Faulted on pkt\n");
			continue;
		}
        ...
}

It looks like this skips the vhost_add_used() call, triggering a device error
via vq_err() and permanently leaking the descriptor.

Should a transient host memory allocation failure drop the packet and cleanly
return or discard the descriptor instead of breaking the virtqueue?

[Severity: High]
This isn't a bug introduced by this patch, but is there a race condition
during live migration in vhost_transport_send_pkt() that could drop packets?

Looking at vhost_transport_send_pkt():

drivers/vhost/vsock.c:vhost_transport_send_pkt() {
        ...
	if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) {
		smp_rmb();	/* pairs with smp_wmb() in start/drop_backends */
		if (!READ_ONCE(vsock->cpr_paused)) {
			rcu_read_unlock();
			kfree_skb(skb);
			return -EHOSTUNREACH;
		}
	}
        ...
}

If a reader observes the backend is NULL, gets preempted, and then the writer
in vhost_vsock_start() sets the backend to non-NULL and cpr_paused to false,
could the reader wake up and incorrectly conclude the device is permanently
uninitialized?

If this sequence occurs, it returns -EHOSTUNREACH and drops the packet. Would
it be safer to re-verify the backend is still NULL after observing
cpr_paused == false?
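
A minimal sketch of such a re-check, using a hypothetical model rather
than the driver code, is shown below.  The between_loads hook is an
invention of this example that replays a concurrent vhost_vsock_start()
after the first load.  Real code would presumably also need a read barrier
before the second backend load, and whether the re-check closes every
window is not something this model can show.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two fields read by the lockless sender. */
struct vsock_model {
	void *rx_backend;
	bool cpr_paused;
};

static int fake_backend;	/* any non-NULL address serves as a backend */

/* Stand-in for vhost_vsock_start() running on another CPU. */
void concurrent_start(struct vsock_model *v)
{
	v->rx_backend = &fake_backend;
	v->cpr_paused = false;
}

/*
 * Sender with the suggested re-check.  The between_loads hook (may be
 * NULL) runs after the first backend load to replay a concurrent writer.
 */
int send_pkt_recheck(struct vsock_model *v,
		     void (*between_loads)(struct vsock_model *))
{
	if (!v->rx_backend) {
		if (between_loads)
			between_loads(v);
		/* Fast-fail only if not paused AND the backend is still NULL. */
		if (!v->cpr_paused && !v->rx_backend)
			return -EHOSTUNREACH;
	}
	return 0;	/* the skb would be queued */
}
```

With the re-check, the replayed start no longer produces a fast-fail,
while a device that was never started still does.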

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com?part=4

* Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
  2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
@ 2026-06-13 16:57   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-13 16:57 UTC (permalink / raw)
  To: Andrey Drobyshev; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Concurrent host network traffic can trigger a Use-After-Free of `vhost_worker` during VHOST_RESET_OWNER.
- [High] Pending network work becomes permanently disabled if queued to a destroyed worker during VHOST_RESET_OWNER.
- [Medium] Useless O(N) system-wide socket iteration in `vhost_vsock_reset_owner` causing dead code and lock contention.
--

commit fe82b486d339416d80c28b16f13b7f0d1014a742
Author: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

vhost/vsock: add VHOST_RESET_OWNER ioctl

This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
a guest with a vhost-vsock device.  For this to work, we need to reset
the device ownership on the source side by calling RESET_OWNER, and then
claim it on the dest side by calling SET_OWNER.  We expect not to lose any
AF_VSOCK connection while this happens.

> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index b12221ce6faf2..e629886e5cf84 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c

[ ... ]

> @@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>  	return -EFAULT;
>  }
>  
> +static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
> +{
> +	struct vhost_iotlb *umem;
> +	long err;
> +
> +	mutex_lock(&vsock->dev.mutex);
> +	err = vhost_dev_check_owner(&vsock->dev);
> +	if (err)
> +		goto done;
> +	umem = vhost_dev_reset_owner_prepare();
> +	if (!umem) {
> +		err = -ENOMEM;
> +		goto done;
> +	}
> +	/* Follows vhost_vsock_dev_release closely except for guest_cid drop */
> +	vsock_for_each_connected_socket(&vhost_transport.transport,
> +					vhost_vsock_reset_orphans);

[Severity: Medium]
Is this iteration necessary?

Unlike the release path, this ioctl deliberately keeps vsock->guest_cid
in vhost_vsock_hash to preserve connections across QEMU CPR.

Because the CID remains valid, when vhost_vsock_reset_orphans() iterates over
each system socket and calls:

  vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))

it will successfully find the vsock and immediately return without performing
any cleanup.

Does this perform a useless system-wide traversal over all vsock connections
under the vsock_table_lock spinlock, unnecessarily wasting CPU cycles and
causing lock contention?

> +	vhost_vsock_drop_backends(vsock);
> +	vhost_vsock_flush(vsock);
> +	vhost_dev_stop(&vsock->dev);
> +	vhost_dev_reset_owner(&vsock->dev, umem);

[Severity: Critical]
Can concurrent host network traffic trigger a use-after-free of the
vhost_worker here?

Since vsock intentionally remains in vhost_vsock_hash to preserve connections
across CPR, a concurrent thread sending a packet can still resolve the vsock
object.

If the network thread reads vq->worker under the RCU read lock in
vhost_vq_work_queue() just before vhost_dev_reset_owner() calls
vhost_workers_free(), could we see a sequence like this:

Thread 1 (network):
vhost_transport_send_pkt()
  vhost_vq_work_queue()
    worker = rcu_dereference(vq->worker);

Thread 2 (ioctl):
vhost_vsock_reset_owner()
  vhost_dev_reset_owner()
    vhost_dev_cleanup()
      vhost_workers_free()
        kfree(worker);

Thread 1:
    vhost_worker_queue(worker, work); // use-after-free

Since vhost_workers_free() frees the worker without waiting for an RCU grace
period via synchronize_rcu(), does this leave a window for the network
thread to dereference the freed worker?

[Severity: High]
Does this sequence permanently disable pending network work if queued to a
destroyed worker?

If vhost_transport_send_pkt() races with vhost_vsock_reset_owner(), it calls
vhost_vq_work_queue() which sets the VHOST_WORK_QUEUED flag:

  if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
      ...
  }

If this happens right after vhost_vsock_flush() finishes, but before
vhost_workers_free() destroys the worker, the old worker thread is stopped
before it can process the list and clear the VHOST_WORK_QUEUED flag.

When the VM is resumed and a new worker is created, vhost_vsock_start() kicks
the worker by calling:

  vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);

Because the VHOST_WORK_QUEUED flag is still set to 1 from the destroyed worker,
test_and_set_bit() returns true. Will the work be skipped instead of being
added to the new worker's list, causing packets in send_pkt_queue to
permanently stall?
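
The stale-flag scenario can be modelled in a few lines.  This is a sketch
with made-up names: a plain bool stands in for the VHOST_WORK_QUEUED bit
and a counter stands in for the worker's list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a work item and a worker. */
struct work_model {
	bool queued;		/* stands in for VHOST_WORK_QUEUED */
};

struct worker_model {
	size_t pending;		/* number of work items on the list */
};

/* Queue the work only if its flag was not already set. */
bool model_work_queue(struct worker_model *w, struct work_model *work)
{
	bool was_queued = work->queued;

	work->queued = true;
	if (was_queued)
		return false;	/* skipped: believed to be on a list already */
	w->pending++;
	return true;
}

/* A live worker clears the flag when it picks the work up. */
void model_worker_run(struct worker_model *w, struct work_model *work)
{
	if (w->pending) {
		work->queued = false;
		w->pending--;
	}
}
```

If the old worker is torn down after model_work_queue() but before
model_worker_run(), the flag stays set and a later attempt to queue the
same work on a new worker is skipped.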

> +done:
> +	mutex_unlock(&vsock->dev.mutex);
> +	return err;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com?part=2
