Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Andrey Drobyshev" <andrey.drobyshev@virtuozzo.com>
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
Date: Sat, 13 Jun 2026 16:57:44 +0000	[thread overview]
Message-ID: <20260613165745.212911F00A3D@smtp.kernel.org> (raw)
In-Reply-To: <20260612165718.433546-3-andrey.drobyshev@virtuozzo.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Concurrent host network traffic can trigger a Use-After-Free of `vhost_worker` during VHOST_RESET_OWNER.
- [High] Pending network work becomes permanently disabled if queued to a destroyed worker during VHOST_RESET_OWNER.
- [Medium] Useless O(N) system-wide socket iteration in `vhost_vsock_reset_owner` causing dead code and lock contention.
--

commit fe82b486d339416d80c28b16f13b7f0d1014a742
Author: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

vhost/vsock: add VHOST_RESET_OWNER ioctl

This ioctl is needed for QEMU's CPR (checkpoint-restore) migration of
the guest with vhost-vsock device.  For this to work, we need to reset
the device ownership on the source side by calling RESET_OWNER, and then
claim it on the dest side by calling SET_OWNER.  We expect not to lose any
AF_VSOCK connection while this happens.

> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index b12221ce6faf2..e629886e5cf84 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c

[ ... ]

> @@ -894,6 +894,32 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>  	return -EFAULT;
>  }
>  
> +static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
> +{
> +	struct vhost_iotlb *umem;
> +	long err;
> +
> +	mutex_lock(&vsock->dev.mutex);
> +	err = vhost_dev_check_owner(&vsock->dev);
> +	if (err)
> +		goto done;
> +	umem = vhost_dev_reset_owner_prepare();
> +	if (!umem) {
> +		err = -ENOMEM;
> +		goto done;
> +	}
> +	/* Follows vhost_vsock_dev_release closely except for guest_cid drop */
> +	vsock_for_each_connected_socket(&vhost_transport.transport,
> +					vhost_vsock_reset_orphans);

[Severity: Medium]
Is this iteration necessary?

Unlike the release path, this ioctl deliberately keeps vsock->guest_cid
in vhost_vsock_hash to preserve connections across QEMU CPR.

Because the CID remains valid, when vhost_vsock_reset_orphans() iterates over
each system socket and calls:

  vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))

it will successfully find the vsock and immediately return without performing
any cleanup.

Does this perform a useless system-wide traversal over all vsock connections
under the vsock_table_lock spinlock, unnecessarily wasting CPU cycles and
causing lock contention?

> +	vhost_vsock_drop_backends(vsock);
> +	vhost_vsock_flush(vsock);
> +	vhost_dev_stop(&vsock->dev);
> +	vhost_dev_reset_owner(&vsock->dev, umem);

[Severity: Critical]
Can concurrent host network traffic trigger a use-after-free of the
vhost_worker here?

Since vsock intentionally remains in vhost_vsock_hash to preserve connections
across CPR, a concurrent thread sending a packet can still resolve the vsock
object.

If the network thread reads vq->worker under the RCU read lock in
vhost_vq_work_queue() just before vhost_dev_reset_owner() calls
vhost_workers_free(), could we see a sequence like this:

Thread 1 (network):
vhost_transport_send_pkt()
  vhost_vq_work_queue()
    worker = rcu_dereference(vq->worker);

Thread 2 (ioctl):
vhost_vsock_reset_owner()
  vhost_dev_reset_owner()
    vhost_dev_cleanup()
      vhost_workers_free()
        kfree(worker);

Thread 1:
    vhost_worker_queue(worker, work); // use-after-free

Since vhost_workers_free() frees the worker without waiting for an RCU grace
period via synchronize_rcu(), does this leave a window for the network
thread to dereference the freed worker?

[Severity: High]
Does this sequence permanently disable pending network work if queued to a
destroyed worker?

If vhost_transport_send_pkt() races with vhost_vsock_reset_owner(), it calls
vhost_vq_work_queue() which sets the VHOST_WORK_QUEUED flag:

  if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
      ...
  }

If this happens right after vhost_vsock_flush() finishes, but before
vhost_workers_free() destroys the worker, the old worker thread is stopped
before it can process the list and clear the VHOST_WORK_QUEUED flag.

When the VM is resumed and a new worker is created, vhost_vsock_start() kicks
the worker by calling:

  vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);

Because the VHOST_WORK_QUEUED flag is still set to 1 from the destroyed worker,
test_and_set_bit() returns true. Will the work be skipped instead of being
added to the new worker's list, causing packets in send_pkt_queue to
permanently stall?

> +done:
> +	mutex_unlock(&vsock->dev.mutex);
> +	return err;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612165718.433546-1-andrey.drobyshev@virtuozzo.com?part=2

  reply	other threads:[~2026-06-13 16:57 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-12 16:57 [PATCH 0/4] vhost/vsock: add support for VHOST_RESET_OWNER and CPR migration Andrey Drobyshev
2026-06-12 16:57 ` [PATCH 1/4] vhost/vsock: split out vhost_vsock_drop_backends helper Andrey Drobyshev
2026-06-12 16:57 ` [PATCH 2/4] vhost/vsock: add VHOST_RESET_OWNER ioctl Andrey Drobyshev
2026-06-13 16:57   ` sashiko-bot [this message]
2026-06-12 16:57 ` [PATCH 3/4] vhost/vsock: suppress EHOSTUNREACH fast-fail during CPR pause Andrey Drobyshev
2026-06-13 16:57   ` sashiko-bot
2026-06-12 16:57 ` [PATCH 4/4] vhost/vsock: re-scan TX virtqueue on device start Andrey Drobyshev
2026-06-13 16:57   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260613165745.212911F00A3D@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=andrey.drobyshev@virtuozzo.com \
    --cc=kvm@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox