From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
To: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>,
linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, virtualization@lists.linux.dev,
netdev@vger.kernel.org, sgarzare@redhat.com, mst@redhat.com,
stefanha@redhat.com, dongli.zhang@oracle.com,
maciej.szmigiero@oracle.com, bchaney@akamai.com,
mark.kanda@oracle.com, den@openvz.org
Subject: Re: [PATCH v3 4/4] vhost/vsock: add VHOST_RESET_OWNER ioctl
Date: Thu, 25 Jun 2026 18:13:45 +0200
Message-ID: <de5fa7c0-0734-4ef3-8f20-e761095c8dd1@virtuozzo.com>
In-Reply-To: <20260625155416.480669-5-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
On 6/25/26 17:54, Andrey Drobyshev wrote:
> From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>
> This ioctl is needed for QEMU's CPR (CheckPoint and Restart) migration of
> the guest with vhost-vsock device. For this to work, we need to reset
> the device ownership on the source side by calling RESET_OWNER, and then
> claim it on the dest side by calling SET_OWNER. We expect not to lose any
> AF_VSOCK connection while this happens.
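
For readers following the userspace side, the handover described above can
be sketched roughly as below. This is only an illustration, not QEMU's
actual code: the helper names are made up, and it assumes both sides hold a
file descriptor for the same open vhost-vsock device (how the fd reaches the
destination is not covered in this message). VHOST_RESET_OWNER and
VHOST_SET_OWNER are the existing ioctl numbers from <linux/vhost.h>; on
vhost-vsock, RESET_OWNER is only handled with this patch applied.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Hypothetical helper (name is an assumption): the source side gives up
 * ownership of the vhost device. Returns 0 or a negative errno value.
 */
int vsock_release_owner(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_RESET_OWNER) < 0)
		return -errno;
	return 0;
}

/* Hypothetical helper (name is an assumption): the destination side
 * claims the same device. Returns 0 or a negative errno value.
 */
int vsock_claim_owner(int vhost_fd)
{
	if (ioctl(vhost_fd, VHOST_SET_OWNER) < 0)
		return -errno;
	return 0;
}
```

The source would call vsock_release_owner() before handing over, and the
destination would call vsock_claim_owner() before starting the device again.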
>
> RESET_OWNER keeps the guest CID hashed, so that connections survive. That
> leaves the device reachable by a lockless send/cancel path while the worker
> is being torn down: a concurrent vhost_transport_send_pkt() or
> vhost_transport_cancel_pkt() can call vhost_vq_work_queue() as
> vhost_workers_free() frees the worker. That might cause a use-after-free
> of vq->worker. In addition, any work queued onto the dying worker leaves
> VHOST_WORK_QUEUED stuck, stalling send_pkt_queue after resume.
>
> Fence the send/cancel paths around the teardown: send_pkt()/cancel_pkt()
> only kick the worker while the backend is alive. And reset_owner() calls
> synchronize_rcu() after drop_backends() so in-flight send/cancel finish
> before the worker is freed.
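
The fencing in the paragraph above can be modelled in plain userspace C.
The sketch below is an analogy, not the kernel code: a reader counter built
on C11 atomics stands in for rcu_read_lock()/synchronize_rcu(), a boolean
stands in for the vq backend, and all struct and function names are invented
for illustration. It assumes reset and start are serialized by the caller,
as dev.mutex does in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct worker {
	atomic_int processed;	/* packets the worker has consumed */
};

struct dev {
	atomic_bool alive;	/* models a non-NULL vq backend */
	atomic_int readers;	/* in-flight send paths (models the RCU read side) */
	atomic_int queued;	/* models the send_pkt_queue length */
	struct worker *worker;	/* freed on reset, only used while alive */
};

/* Models the worker running: consume everything that is queued. */
static int dev_drain(struct dev *d)
{
	int n = atomic_exchange(&d->queued, 0);

	atomic_fetch_add(&d->worker->processed, n);
	return n;
}

/* Send path: always queue, but kick the worker only while alive. */
bool dev_send(struct dev *d)
{
	bool kicked = false;

	atomic_fetch_add(&d->readers, 1);	/* stands in for rcu_read_lock() */
	atomic_fetch_add(&d->queued, 1);
	if (atomic_load(&d->alive)) {
		dev_drain(d);
		kicked = true;
	}
	atomic_fetch_sub(&d->readers, 1);	/* stands in for rcu_read_unlock() */
	return kicked;
}

/* Teardown: drop the backend, wait out in-flight senders, then free. */
void dev_reset(struct dev *d)
{
	atomic_store(&d->alive, false);		/* stands in for drop_backends() */
	while (atomic_load(&d->readers) != 0)	/* stands in for synchronize_rcu() */
		;
	free(d->worker);
	d->worker = NULL;
}

/* Start: new worker, mark alive, drain what piled up while paused.
 * Returns the number of drained packets, or -1 on allocation failure.
 */
int dev_start(struct dev *d)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (!w)
		return -1;
	d->worker = w;
	atomic_store(&d->alive, true);
	return dev_drain(d);
}
```

In this model dev_send() returns true on a started device; after dev_reset()
it returns false and the packet stays counted in queued until the next
dev_start() drains it. A sender that entered before the reset is waited for,
and one that enters afterwards sees alive == false, so neither can touch the
freed worker.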
>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> ---
> drivers/vhost/vsock.c | 51 +++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 81d4f7209719..f0a0aa7d3200 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -318,7 +318,14 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
>  		atomic_inc(&vsock->queued_replies);
>
>  	virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb);
> -	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
> +
> +	/* Skip the kick once the backend is gone (stop/RESET_OWNER); the skb
> +	 * stays queued and vhost_vsock_start() drains it. Pairs with the
> +	 * synchronize_rcu() in vhost_vsock_reset_owner().
> +	 */
> +	if (data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))
> +		vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX],
> +				    &vsock->send_pkt_work);
>
>  	rcu_read_unlock();
>  	return len;
> @@ -346,7 +353,15 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
>  		int new_cnt;
>
>  		new_cnt = atomic_sub_return(cnt, &vsock->queued_replies);
> -		if (new_cnt + cnt >= tx_vq->num && new_cnt < tx_vq->num)
> +
> +		/* Skip the kick once the backend is gone (stop/RESET_OWNER):
> +		 * vhost_poll_queue() would touch the worker which is being freed
> +		 * by teardown, e.g. on RESET_OWNER. Pairs with the
> +		 * synchronize_rcu() in vhost_vsock_reset_owner(). The TX VQ is
> +		 * re-kicked by vhost_vsock_start().
> +		 */
> +		if (data_race(vhost_vq_get_backend(tx_vq)) &&
> +		    new_cnt + cnt >= tx_vq->num && new_cnt < tx_vq->num)
>  			vhost_poll_queue(&tx_vq->poll);
>  	}
>
> @@ -903,6 +918,36 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
>  	return -EFAULT;
>  }
>
> +static int vhost_vsock_reset_owner(struct vhost_vsock *vsock)
> +{
> +	struct vhost_iotlb *umem;
> +	long err;
> +
> +	mutex_lock(&vsock->dev.mutex);
> +	err = vhost_dev_check_owner(&vsock->dev);
> +	if (err)
> +		goto done;
> +	umem = vhost_dev_reset_owner_prepare();
> +	if (!umem) {
> +		err = -ENOMEM;
> +		goto done;
> +	}
> +	vhost_vsock_drop_backends(vsock);
> +
> +	/* Let in-flight send_pkt() callers stop touching the worker before the
> +	 * flush + free below. Pairs with the backend check in
> +	 * vhost_transport_send_pkt().
> +	 */
> +	synchronize_rcu();
> +
> +	vhost_vsock_flush(vsock);
> +	vhost_dev_stop(&vsock->dev);
> +	vhost_dev_reset_owner(&vsock->dev, umem);
> +done:
> +	mutex_unlock(&vsock->dev.mutex);
> +	return err;
> +}
> +
>  static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
>  				  unsigned long arg)
>  {
> @@ -946,6 +991,8 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
>  			return -EOPNOTSUPP;
>  		vhost_set_backend_features(&vsock->dev, features);
>  		return 0;
> +	case VHOST_RESET_OWNER:
> +		return vhost_vsock_reset_owner(vsock);
>  	default:
>  		mutex_lock(&vsock->dev.mutex);
>  		r = vhost_dev_ioctl(&vsock->dev, ioctl, argp);
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.