* [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference
@ 2025-07-18 11:03 Nikolay Kuratov
2025-07-18 12:46 ` Hillf Danton
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Nikolay Kuratov @ 2025-07-18 11:03 UTC (permalink / raw)
To: linux-kernel
Cc: netdev, virtualization, kvm, Michael S. Tsirkin, Jason Wang,
Eugenio Pérez, Lei Yang, Hillf Danton, Nikolay Kuratov,
stable, Andrey Ryabinin, Andrey Smetanin
When operating on struct vhost_net_ubuf_ref, the following execution
sequence is theoretically possible:
CPU0 is finalizing DMA operation          CPU1 is doing VHOST_NET_SET_BACKEND
                                          // &ubufs->refcount == 2
vhost_net_ubuf_put()                      vhost_net_ubuf_put_wait_and_free(oldubufs)
                                            vhost_net_ubuf_put_and_wait()
                                              vhost_net_ubuf_put()
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 1
                                              int r = atomic_sub_return(1, &ubufs->refcount);
                                              // r = 0
                                              wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
                                              // no wait occurs here because condition is already true
                                              kfree(ubufs);
if (unlikely(!r))
        wake_up(&ubufs->wait); // use-after-free
This leads to use-after-free on ubufs access. This happens because CPU1
skips waiting for wake_up() when refcount is already zero.
To prevent that, use a completion instead of a wait_queue as the ubufs
notification mechanism. wait_for_completion() guarantees that a complete()
call has occurred before it returns.
We also need to reinit the completion in vhost_net_flush(), because
refcnt == 0 does not mean freeing in that case.
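The semantics the patch relies on can be modeled in userspace. The sketch below is illustrative only: it uses a pthread mutex/condvar pair as a stand-in for the kernel's struct completion, and all names (ubuf_put, dma_side, run_race) are invented for the example, not taken from the driver.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Userspace analogue (illustrative, not kernel code) of struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Analogue of vhost_net_ubuf_ref with the patched put/wait pair. */
struct ubuf_ref {
	atomic_int refcount;
	struct completion wait;
};

static int ubuf_put(struct ubuf_ref *u)
{
	int r = atomic_fetch_sub(&u->refcount, 1) - 1;

	if (r == 0)
		complete_all(&u->wait); /* a bare wake_up() here is what raced */
	return r;
}

static void *dma_side(void *arg)
{
	ubuf_put(arg); /* CPU0: the final zerocopy callback drops its ref */
	return NULL;
}

/* Returns the refcount observed after the put-and-wait side finishes. */
static int run_race(void)
{
	struct ubuf_ref u;
	pthread_t t;

	atomic_init(&u.refcount, 2);
	init_completion(&u.wait);
	pthread_create(&t, NULL, dma_side, &u);

	/* CPU1: the VHOST_NET_SET_BACKEND path, i.e. put_and_wait() */
	ubuf_put(&u);
	wait_for_completion(&u.wait); /* returns only after complete_all() ran */

	pthread_join(t, NULL);
	return atomic_load(&u.refcount); /* 0: only now is it safe to free u */
}
```

Whichever side hits refcount 0 signals the completion, and the waiter cannot return (and thus cannot free the object) before that signal has happened, regardless of which decrement wins the race. Compile with -pthread and a trivial main calling run_race().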
Cc: stable@vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin <arbn@yandex-team.com>
Suggested-by: Andrey Smetanin <asmetanin@yandex-team.ru>
Suggested-by: Hillf Danton <hdanton@sina.com>
Tested-by: Lei Yang <leiyang@redhat.com> (v1)
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
---
v2:
* move reinit_completion() into vhost_net_flush(), thanks
to Hillf Danton
* add Tested-by: Lei Yang
* check that usages of put_and_wait() are consistent across
LTS kernels
drivers/vhost/net.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3..69e1bfb9627e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
* >1: outstanding ubufs
*/
atomic_t refcount;
- wait_queue_head_t wait;
+ struct completion wait;
struct vhost_virtqueue *vq;
};
@@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
if (!ubufs)
return ERR_PTR(-ENOMEM);
atomic_set(&ubufs->refcount, 1);
- init_waitqueue_head(&ubufs->wait);
+ init_completion(&ubufs->wait);
ubufs->vq = vq;
return ubufs;
}
@@ -249,14 +249,14 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
{
int r = atomic_sub_return(1, &ubufs->refcount);
if (unlikely(!r))
- wake_up(&ubufs->wait);
+ complete_all(&ubufs->wait);
return r;
}
static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
{
vhost_net_ubuf_put(ubufs);
- wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
+ wait_for_completion(&ubufs->wait);
}
static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
@@ -1381,6 +1381,7 @@ static void vhost_net_flush(struct vhost_net *n)
mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
n->tx_flush = false;
atomic_set(&n->vqs[VHOST_NET_VQ_TX].ubufs->refcount, 1);
+ reinit_completion(&n->vqs[VHOST_NET_VQ_TX].ubufs->wait);
mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-18 11:03 [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference Nikolay Kuratov
@ 2025-07-18 12:46 ` Hillf Danton
2025-07-18 13:24 ` [PATCH] " Nikolay Kuratov
2025-07-18 23:03 ` [PATCH v2] " Hillf Danton
2025-08-05 10:02 ` Michael S. Tsirkin
2 siblings, 1 reply; 12+ messages in thread
From: Hillf Danton @ 2025-07-18 12:46 UTC (permalink / raw)
To: Nikolay Kuratov
Cc: linux-kernel, virtualization, Michael S. Tsirkin, Jason Wang,
Lei Yang, Andrey Ryabinin, Andrey Smetanin
On Fri, 18 Jul 2025 14:03:55 +0300 Nikolay Kuratov wrote:
>
> drivers/vhost/net.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7cbfc7d718b3..69e1bfb9627e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
> * >1: outstanding ubufs
> */
> atomic_t refcount;
> - wait_queue_head_t wait;
> + struct completion wait;
> struct vhost_virtqueue *vq;
> };
>
> @@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
> if (!ubufs)
> return ERR_PTR(-ENOMEM);
> atomic_set(&ubufs->refcount, 1);
> - init_waitqueue_head(&ubufs->wait);
> + init_completion(&ubufs->wait);
> ubufs->vq = vq;
> return ubufs;
> }
> @@ -249,14 +249,14 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
> {
> int r = atomic_sub_return(1, &ubufs->refcount);
> if (unlikely(!r))
> - wake_up(&ubufs->wait);
> + complete_all(&ubufs->wait);
> return r;
> }
>
> static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
> {
> vhost_net_ubuf_put(ubufs);
> - wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
> + wait_for_completion(&ubufs->wait);
> }
>
> static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
> @@ -1381,6 +1381,7 @@ static void vhost_net_flush(struct vhost_net *n)
> mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
> n->tx_flush = false;
> atomic_set(&n->vqs[VHOST_NET_VQ_TX].ubufs->refcount, 1);
> + reinit_completion(&n->vqs[VHOST_NET_VQ_TX].ubufs->wait);
> mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
> }
> }
> --
> 2.34.1
>
In the sequence below,
  vhost_net_flush()
    vhost_net_ubuf_put_and_wait(n->vqs[VHOST_NET_VQ_TX].ubufs);
      wait_for_completion(&ubufs->wait);
    reinit_completion(&n->vqs[VHOST_NET_VQ_TX].ubufs->wait);
reinit after wait, so the chance for missing wakeup still exists.
* Re: [PATCH] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-18 12:46 ` Hillf Danton
@ 2025-07-18 13:24 ` Nikolay Kuratov
2025-07-18 22:11 ` Hillf Danton
0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Kuratov @ 2025-07-18 13:24 UTC (permalink / raw)
To: linux-kernel
Cc: netdev, virtualization, kvm, Michael S. Tsirkin, Jason Wang,
Eugenio Pérez, Lei Yang, Hillf Danton, kniv
> reinit after wait, so the chance for missing wakeup still exists.
Can you please provide more details on this? Yes, it is reinit after wait,
but the wait should not be concurrent. I checked multiple code paths towards
vhost_net_flush(); they are all protected by the device mutex, except
vhost_net_release(). In the case of vhost_net_release() - wouldn't it be a
problem in itself if it were called in parallel with some ioctl on the device?
Also, the rationale for this is that put_and_wait() waits for the
zero-refcount condition. Zero refcount means that after put_and_wait() the
calling thread is the only owner of the ubufs structure. If multiple threads
got the ubufs structure with zero refcount - how could either thread be sure
that the other one is not freeing it?
* Re: [PATCH] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-18 13:24 ` [PATCH] " Nikolay Kuratov
@ 2025-07-18 22:11 ` Hillf Danton
0 siblings, 0 replies; 12+ messages in thread
From: Hillf Danton @ 2025-07-18 22:11 UTC (permalink / raw)
To: Nikolay Kuratov
Cc: linux-kernel, virtualization, Michael S. Tsirkin, Jason Wang,
Eugenio Pérez, Lei Yang
On Fri, 18 Jul 2025 16:24:14 +0300 Nikolay Kuratov wrote:
> > reinit after wait, so the chance for missing wakeup still exists.
>
> Can you please provide more details on this? Yes, it is reinit after wait,
The missing wakeup exists whenever complete_all() is used in combination
with reinit after wait; it has nothing to do with vhost.
Your patch was checked simply because of the reinit, which without exception
hints at the chance for a mess.
Of course, feel free to prove that the missing wakeup disappears in vhost
even if reinit is deployed.
> but wait should not be concurrent. I checked multiple code pathes towards
> vhost_net_flush(), they're all protected by device mutex, except
> vhost_net_release(). In case of vhost_net_release() - it would be a
> problem itself if it was called in parallel with some ioctl on a device?
>
> Also rationale for this is that put_and_wait() is waiting for zero
> refcount condition. Zero refcount means that after put_and_wait() calling
> thread is the only owner of an ubufs structure. If multiple threads got
> ubufs structure with zero refcount - how either thread can be sure that
> another one is not free'ing it?
>
>
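The generic complete_all()-then-reinit hazard discussed here can be shown deterministically, without any vhost code. The sketch below is illustrative: pthread primitives stand in for the kernel's completion API, and completion_done() is an invented nonblocking probe (the kernel function of the same name plays a similar role) that shows what a late waiter would observe.

```c
#include <pthread.h>

/* Userspace analogue (illustrative, not kernel code) of struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void reinit_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 0;
	pthread_mutex_unlock(&c->lock);
}

/* Nonblocking probe standing in for a late waiter (illustrative helper). */
static int completion_done(struct completion *c)
{
	int d;

	pthread_mutex_lock(&c->lock);
	d = c->done;
	pthread_mutex_unlock(&c->lock);
	return d;
}

/* Returns what a late waiter sees after an early waiter reinits. */
static int run_reinit_hazard(void)
{
	struct completion c;

	init_completion(&c);
	complete_all(&c);      /* the event fires exactly once */
	/* waiter A: wait_for_completion() would return immediately here... */
	reinit_completion(&c); /* ...and then reinit right after its wait */
	/* waiter B arrives late: the already-fired wakeup has been erased,
	 * so a real wait_for_completion() would now block forever */
	return completion_done(&c);
}
```

run_reinit_hazard() returns 0: after the reinit, the completion looks un-signalled even though the event already happened, which is the missed wakeup. Avoiding any concurrent waiter in that window (here, via the device mutex around vhost_net_flush()) is what makes the reinit safe.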
* Re: [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-18 11:03 [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference Nikolay Kuratov
2025-07-18 12:46 ` Hillf Danton
@ 2025-07-18 23:03 ` Hillf Danton
2025-07-20 16:13 ` Michael S. Tsirkin
2025-08-05 10:02 ` Michael S. Tsirkin
2 siblings, 1 reply; 12+ messages in thread
From: Hillf Danton @ 2025-07-18 23:03 UTC (permalink / raw)
To: Nikolay Kuratov
Cc: linux-kernel, virtualization, Michael S. Tsirkin, Jason Wang,
Eugenio Perez, Lei Yang, Andrey Ryabinin, Andrey Smetanin
On Fri, 18 Jul 2025 14:03:55 +0300 Nikolay Kuratov wrote:
> When operating on struct vhost_net_ubuf_ref, the following execution
> sequence is theoretically possible:
> CPU0 is finalizing DMA operation          CPU1 is doing VHOST_NET_SET_BACKEND
>                                           // &ubufs->refcount == 2
> vhost_net_ubuf_put()                      vhost_net_ubuf_put_wait_and_free(oldubufs)
>                                             vhost_net_ubuf_put_and_wait()
>                                               vhost_net_ubuf_put()
> int r = atomic_sub_return(1, &ubufs->refcount);
> // r = 1
>                                               int r = atomic_sub_return(1, &ubufs->refcount);
>                                               // r = 0
>                                               wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
>                                               // no wait occurs here because condition is already true
>                                               kfree(ubufs);
> if (unlikely(!r))
>         wake_up(&ubufs->wait); // use-after-free
>
> This leads to use-after-free on ubufs access. This happens because CPU1
> skips waiting for wake_up() when refcount is already zero.
>
> To prevent that use a completion instead of wait_queue as the ubufs
> notification mechanism. wait_for_completion() guarantees that there will
> be complete() call prior to its return.
>
Alternatively rcu helps.
--- x/drivers/vhost/net.c
+++ y/drivers/vhost/net.c
@@ -96,6 +96,7 @@ struct vhost_net_ubuf_ref {
atomic_t refcount;
wait_queue_head_t wait;
struct vhost_virtqueue *vq;
+ struct rcu_head rcu;
};
#define VHOST_NET_BATCH 64
@@ -247,9 +248,13 @@ vhost_net_ubuf_alloc(struct vhost_virtqu
static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
{
- int r = atomic_sub_return(1, &ubufs->refcount);
+ int r;
+
+ rcu_read_lock();
+ r = atomic_sub_return(1, &ubufs->refcount);
if (unlikely(!r))
wake_up(&ubufs->wait);
+ rcu_read_unlock();
return r;
}
@@ -262,7 +267,7 @@ static void vhost_net_ubuf_put_and_wait(
static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
{
vhost_net_ubuf_put_and_wait(ubufs);
- kfree(ubufs);
+ kfree_rcu(ubufs, rcu);
}
static void vhost_net_clear_ubuf_info(struct vhost_net *n)
* Re: [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-18 23:03 ` [PATCH v2] " Hillf Danton
@ 2025-07-20 16:13 ` Michael S. Tsirkin
2025-07-21 14:52 ` Lei Yang
0 siblings, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2025-07-20 16:13 UTC (permalink / raw)
To: Hillf Danton
Cc: Nikolay Kuratov, linux-kernel, virtualization, Jason Wang,
Eugenio Perez, Lei Yang, Andrey Ryabinin, Andrey Smetanin
On Sat, Jul 19, 2025 at 07:03:23AM +0800, Hillf Danton wrote:
> On Fri, 18 Jul 2025 14:03:55 +0300 Nikolay Kuratov wrote:
> > When operating on struct vhost_net_ubuf_ref, the following execution
> > sequence is theoretically possible:
> > CPU0 is finalizing DMA operation          CPU1 is doing VHOST_NET_SET_BACKEND
> >                                           // &ubufs->refcount == 2
> > vhost_net_ubuf_put()                      vhost_net_ubuf_put_wait_and_free(oldubufs)
> >                                             vhost_net_ubuf_put_and_wait()
> >                                               vhost_net_ubuf_put()
> > int r = atomic_sub_return(1, &ubufs->refcount);
> > // r = 1
> >                                               int r = atomic_sub_return(1, &ubufs->refcount);
> >                                               // r = 0
> >                                               wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
> >                                               // no wait occurs here because condition is already true
> >                                               kfree(ubufs);
> > if (unlikely(!r))
> >         wake_up(&ubufs->wait); // use-after-free
> >
> > This leads to use-after-free on ubufs access. This happens because CPU1
> > skips waiting for wake_up() when refcount is already zero.
> >
> > To prevent that use a completion instead of wait_queue as the ubufs
> > notification mechanism. wait_for_completion() guarantees that there will
> > be complete() call prior to its return.
> >
> Alternatively rcu helps.
>
> --- x/drivers/vhost/net.c
> +++ y/drivers/vhost/net.c
> @@ -96,6 +96,7 @@ struct vhost_net_ubuf_ref {
> atomic_t refcount;
> wait_queue_head_t wait;
> struct vhost_virtqueue *vq;
> + struct rcu_head rcu;
> };
>
> #define VHOST_NET_BATCH 64
> @@ -247,9 +248,13 @@ vhost_net_ubuf_alloc(struct vhost_virtqu
>
> static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
> {
> - int r = atomic_sub_return(1, &ubufs->refcount);
> + int r;
> +
> + rcu_read_lock();
> + r = atomic_sub_return(1, &ubufs->refcount);
> if (unlikely(!r))
> wake_up(&ubufs->wait);
> + rcu_read_unlock();
> return r;
> }
>
> @@ -262,7 +267,7 @@ static void vhost_net_ubuf_put_and_wait(
> static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
> {
> vhost_net_ubuf_put_and_wait(ubufs);
> - kfree(ubufs);
> + kfree_rcu(ubufs, rcu);
> }
>
> static void vhost_net_clear_ubuf_info(struct vhost_net *n)
I like that.
--
MST
* Re: [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-20 16:13 ` Michael S. Tsirkin
@ 2025-07-21 14:52 ` Lei Yang
0 siblings, 0 replies; 12+ messages in thread
From: Lei Yang @ 2025-07-21 14:52 UTC (permalink / raw)
To: Nikolay Kuratov
Cc: Hillf Danton, linux-kernel, virtualization, Michael S. Tsirkin,
Jason Wang, Eugenio Perez, Andrey Ryabinin, Andrey Smetanin
Tested this patch's v2 with the virtio-net regression test; everything
works fine.
Tested-by: Lei Yang <leiyang@redhat.com>
On Mon, Jul 21, 2025 at 12:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sat, Jul 19, 2025 at 07:03:23AM +0800, Hillf Danton wrote:
> > On Fri, 18 Jul 2025 14:03:55 +0300 Nikolay Kuratov wrote:
> > > When operating on struct vhost_net_ubuf_ref, the following execution
> > > sequence is theoretically possible:
> > > CPU0 is finalizing DMA operation          CPU1 is doing VHOST_NET_SET_BACKEND
> > >                                           // &ubufs->refcount == 2
> > > vhost_net_ubuf_put()                      vhost_net_ubuf_put_wait_and_free(oldubufs)
> > >                                             vhost_net_ubuf_put_and_wait()
> > >                                               vhost_net_ubuf_put()
> > > int r = atomic_sub_return(1, &ubufs->refcount);
> > > // r = 1
> > >                                               int r = atomic_sub_return(1, &ubufs->refcount);
> > >                                               // r = 0
> > >                                               wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
> > >                                               // no wait occurs here because condition is already true
> > >                                               kfree(ubufs);
> > > if (unlikely(!r))
> > >         wake_up(&ubufs->wait); // use-after-free
> > >
> > > This leads to use-after-free on ubufs access. This happens because CPU1
> > > skips waiting for wake_up() when refcount is already zero.
> > >
> > > To prevent that use a completion instead of wait_queue as the ubufs
> > > notification mechanism. wait_for_completion() guarantees that there will
> > > be complete() call prior to its return.
> > >
> > Alternatively rcu helps.
> >
> > --- x/drivers/vhost/net.c
> > +++ y/drivers/vhost/net.c
> > @@ -96,6 +96,7 @@ struct vhost_net_ubuf_ref {
> > atomic_t refcount;
> > wait_queue_head_t wait;
> > struct vhost_virtqueue *vq;
> > + struct rcu_head rcu;
> > };
> >
> > #define VHOST_NET_BATCH 64
> > @@ -247,9 +248,13 @@ vhost_net_ubuf_alloc(struct vhost_virtqu
> >
> > static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
> > {
> > - int r = atomic_sub_return(1, &ubufs->refcount);
> > + int r;
> > +
> > + rcu_read_lock();
> > + r = atomic_sub_return(1, &ubufs->refcount);
> > if (unlikely(!r))
> > wake_up(&ubufs->wait);
> > + rcu_read_unlock();
> > return r;
> > }
> >
> > @@ -262,7 +267,7 @@ static void vhost_net_ubuf_put_and_wait(
> > static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
> > {
> > vhost_net_ubuf_put_and_wait(ubufs);
> > - kfree(ubufs);
> > + kfree_rcu(ubufs, rcu);
> > }
> >
> > static void vhost_net_clear_ubuf_info(struct vhost_net *n)
>
> I like that.
>
> --
> MST
>
* Re: [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-18 11:03 [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference Nikolay Kuratov
2025-07-18 12:46 ` Hillf Danton
2025-07-18 23:03 ` [PATCH v2] " Hillf Danton
@ 2025-08-05 10:02 ` Michael S. Tsirkin
2 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2025-08-05 10:02 UTC (permalink / raw)
To: Nikolay Kuratov
Cc: linux-kernel, netdev, virtualization, kvm, Jason Wang,
Eugenio Pérez, Lei Yang, Hillf Danton, stable,
Andrey Ryabinin, Andrey Smetanin
On Fri, Jul 18, 2025 at 02:03:55PM +0300, Nikolay Kuratov wrote:
> When operating on struct vhost_net_ubuf_ref, the following execution
> sequence is theoretically possible:
> CPU0 is finalizing DMA operation          CPU1 is doing VHOST_NET_SET_BACKEND
>                                           // &ubufs->refcount == 2
> vhost_net_ubuf_put()                      vhost_net_ubuf_put_wait_and_free(oldubufs)
>                                             vhost_net_ubuf_put_and_wait()
>                                               vhost_net_ubuf_put()
> int r = atomic_sub_return(1, &ubufs->refcount);
> // r = 1
>                                               int r = atomic_sub_return(1, &ubufs->refcount);
>                                               // r = 0
>                                               wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
>                                               // no wait occurs here because condition is already true
>                                               kfree(ubufs);
> if (unlikely(!r))
>         wake_up(&ubufs->wait); // use-after-free
>
> This leads to use-after-free on ubufs access. This happens because CPU1
> skips waiting for wake_up() when refcount is already zero.
>
> To prevent that use a completion instead of wait_queue as the ubufs
> notification mechanism. wait_for_completion() guarantees that there will
> be complete() call prior to its return.
>
> We also need to reinit completion in vhost_net_flush(), because
> refcnt == 0 does not mean freeing in that case.
>
> Cc: stable@vger.kernel.org
> Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
> Reported-by: Andrey Ryabinin <arbn@yandex-team.com>
> Suggested-by: Andrey Smetanin <asmetanin@yandex-team.ru>
> Suggested-by: Hillf Danton <hdanton@sina.com>
> Tested-by: Lei Yang <leiyang@redhat.com> (v1)
> Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Nikolay should I expect v3?
> ---
> v2:
> * move reinit_completion() into vhost_net_flush(), thanks
> to Hillf Danton
> * add Tested-by: Lei Yang
> * check that usages of put_and_wait() are consistent across
> LTS kernels
>
> drivers/vhost/net.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7cbfc7d718b3..69e1bfb9627e 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
> * >1: outstanding ubufs
> */
> atomic_t refcount;
> - wait_queue_head_t wait;
> + struct completion wait;
> struct vhost_virtqueue *vq;
> };
>
> @@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
> if (!ubufs)
> return ERR_PTR(-ENOMEM);
> atomic_set(&ubufs->refcount, 1);
> - init_waitqueue_head(&ubufs->wait);
> + init_completion(&ubufs->wait);
> ubufs->vq = vq;
> return ubufs;
> }
> @@ -249,14 +249,14 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
> {
> int r = atomic_sub_return(1, &ubufs->refcount);
> if (unlikely(!r))
> - wake_up(&ubufs->wait);
> + complete_all(&ubufs->wait);
> return r;
> }
>
> static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
> {
> vhost_net_ubuf_put(ubufs);
> - wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
> + wait_for_completion(&ubufs->wait);
> }
>
> static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
> @@ -1381,6 +1381,7 @@ static void vhost_net_flush(struct vhost_net *n)
> mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
> n->tx_flush = false;
> atomic_set(&n->vqs[VHOST_NET_VQ_TX].ubufs->refcount, 1);
> + reinit_completion(&n->vqs[VHOST_NET_VQ_TX].ubufs->wait);
> mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
> }
> }
> --
> 2.34.1
* [PATCH] vhost/net: Replace wait_queue with completion in ubufs reference
@ 2025-07-16 16:22 Nikolay Kuratov
2025-07-18 8:31 ` Lei Yang
2025-07-18 9:07 ` Hillf Danton
0 siblings, 2 replies; 12+ messages in thread
From: Nikolay Kuratov @ 2025-07-16 16:22 UTC (permalink / raw)
To: linux-kernel
Cc: netdev, virtualization, kvm, Michael S. Tsirkin, Jason Wang,
Eugenio Pérez, Nikolay Kuratov, stable, Andrey Ryabinin,
Andrey Smetanin
When operating on struct vhost_net_ubuf_ref, the following execution
sequence is theoretically possible:
CPU0 is finalizing DMA operation          CPU1 is doing VHOST_NET_SET_BACKEND
                                          // &ubufs->refcount == 2
vhost_net_ubuf_put()                      vhost_net_ubuf_put_wait_and_free(oldubufs)
                                            vhost_net_ubuf_put_and_wait()
                                              vhost_net_ubuf_put()
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 1
                                              int r = atomic_sub_return(1, &ubufs->refcount);
                                              // r = 0
                                              wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
                                              // no wait occurs here because condition is already true
                                              kfree(ubufs);
if (unlikely(!r))
        wake_up(&ubufs->wait); // use-after-free
This leads to use-after-free on ubufs access. This happens because CPU1
skips waiting for wake_up() when refcount is already zero.
To prevent that, use a completion instead of a wait_queue as the ubufs
notification mechanism. wait_for_completion() guarantees that a complete()
call has occurred before it returns.
We also need to reinit the completion, because refcnt == 0 does not mean
freeing in the case of vhost_net_flush() - that path sets refcnt back to 1
afterwards. AFAIK concurrent calls to vhost_net_ubuf_put_and_wait() with the
same ubufs object aren't possible, since those calls (through
vhost_net_flush() or vhost_net_set_backend()) are protected by the device
mutex. So reinit_completion() right after wait_for_completion() should be
fine.
Cc: stable@vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin <arbn@yandex-team.com>
Suggested-by: Andrey Smetanin <asmetanin@yandex-team.ru>
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
---
drivers/vhost/net.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3..454d179fffeb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
* >1: outstanding ubufs
*/
atomic_t refcount;
- wait_queue_head_t wait;
+ struct completion wait;
struct vhost_virtqueue *vq;
};
@@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
if (!ubufs)
return ERR_PTR(-ENOMEM);
atomic_set(&ubufs->refcount, 1);
- init_waitqueue_head(&ubufs->wait);
+ init_completion(&ubufs->wait);
ubufs->vq = vq;
return ubufs;
}
@@ -249,14 +249,15 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
{
int r = atomic_sub_return(1, &ubufs->refcount);
if (unlikely(!r))
- wake_up(&ubufs->wait);
+ complete_all(&ubufs->wait);
return r;
}
static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
{
vhost_net_ubuf_put(ubufs);
- wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
+ wait_for_completion(&ubufs->wait);
+ reinit_completion(&ubufs->wait);
}
static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
--
2.34.1
* Re: [PATCH] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-16 16:22 [PATCH] " Nikolay Kuratov
@ 2025-07-18 8:31 ` Lei Yang
2025-07-18 9:07 ` Hillf Danton
1 sibling, 0 replies; 12+ messages in thread
From: Lei Yang @ 2025-07-18 8:31 UTC (permalink / raw)
To: Nikolay Kuratov
Cc: linux-kernel, netdev, virtualization, kvm, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, stable, Andrey Ryabinin,
Andrey Smetanin
Tested this patch with the virtio-net regression tests; everything works fine.
Tested-by: Lei Yang <leiyang@redhat.com>
On Thu, Jul 17, 2025 at 12:24 AM Nikolay Kuratov <kniv@yandex-team.ru> wrote:
>
> When operating on struct vhost_net_ubuf_ref, the following execution
> sequence is theoretically possible:
> CPU0 is finalizing DMA operation          CPU1 is doing VHOST_NET_SET_BACKEND
>                                           // &ubufs->refcount == 2
> vhost_net_ubuf_put()                      vhost_net_ubuf_put_wait_and_free(oldubufs)
>                                             vhost_net_ubuf_put_and_wait()
>                                               vhost_net_ubuf_put()
> int r = atomic_sub_return(1, &ubufs->refcount);
> // r = 1
>                                               int r = atomic_sub_return(1, &ubufs->refcount);
>                                               // r = 0
>                                               wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
>                                               // no wait occurs here because condition is already true
>                                               kfree(ubufs);
> if (unlikely(!r))
>         wake_up(&ubufs->wait); // use-after-free
>
> This leads to use-after-free on ubufs access. This happens because CPU1
> skips waiting for wake_up() when refcount is already zero.
>
> To prevent that use a completion instead of wait_queue as the ubufs
> notification mechanism. wait_for_completion() guarantees that there will
> be complete() call prior to its return.
>
> We also need to reinit completion because refcnt == 0 does not mean
> freeing in case of vhost_net_flush() - it then sets refcnt back to 1.
> AFAIK concurrent calls to vhost_net_ubuf_put_and_wait() with the same
> ubufs object aren't possible since those calls (through vhost_net_flush()
> or vhost_net_set_backend()) are protected by the device mutex.
> So reinit_completion() right after wait_for_completion() should be fine.
>
> Cc: stable@vger.kernel.org
> Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
> Reported-by: Andrey Ryabinin <arbn@yandex-team.com>
> Suggested-by: Andrey Smetanin <asmetanin@yandex-team.ru>
> Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
> ---
> drivers/vhost/net.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7cbfc7d718b3..454d179fffeb 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
> * >1: outstanding ubufs
> */
> atomic_t refcount;
> - wait_queue_head_t wait;
> + struct completion wait;
> struct vhost_virtqueue *vq;
> };
>
> @@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
> if (!ubufs)
> return ERR_PTR(-ENOMEM);
> atomic_set(&ubufs->refcount, 1);
> - init_waitqueue_head(&ubufs->wait);
> + init_completion(&ubufs->wait);
> ubufs->vq = vq;
> return ubufs;
> }
> @@ -249,14 +249,15 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
> {
> int r = atomic_sub_return(1, &ubufs->refcount);
> if (unlikely(!r))
> - wake_up(&ubufs->wait);
> + complete_all(&ubufs->wait);
> return r;
> }
>
> static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
> {
> vhost_net_ubuf_put(ubufs);
> - wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
> + wait_for_completion(&ubufs->wait);
> + reinit_completion(&ubufs->wait);
> }
>
> static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
> --
> 2.34.1
>
>
* Re: [PATCH] vhost/net: Replace wait_queue with completion in ubufs reference
2025-07-16 16:22 [PATCH] " Nikolay Kuratov
2025-07-18 8:31 ` Lei Yang
@ 2025-07-18 9:07 ` Hillf Danton
2025-07-18 9:48 ` Nikolay Kuratov
1 sibling, 1 reply; 12+ messages in thread
From: Hillf Danton @ 2025-07-18 9:07 UTC (permalink / raw)
To: Nikolay Kuratov
Cc: linux-kernel, virtualization, Michael S. Tsirkin, Jason Wang,
Lei Yang, Andrey Ryabinin, Andrey Smetanin
On Wed, 16 Jul 2025 19:22:43 +0300 Nikolay Kuratov wrote:
> drivers/vhost/net.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7cbfc7d718b3..454d179fffeb 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
> * >1: outstanding ubufs
> */
> atomic_t refcount;
> - wait_queue_head_t wait;
> + struct completion wait;
> struct vhost_virtqueue *vq;
> };
>
> @@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
> if (!ubufs)
> return ERR_PTR(-ENOMEM);
> atomic_set(&ubufs->refcount, 1);
> - init_waitqueue_head(&ubufs->wait);
> + init_completion(&ubufs->wait);
> ubufs->vq = vq;
> return ubufs;
> }
> @@ -249,14 +249,15 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
> {
> int r = atomic_sub_return(1, &ubufs->refcount);
> if (unlikely(!r))
> - wake_up(&ubufs->wait);
> + complete_all(&ubufs->wait);
> return r;
> }
>
> static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
> {
> vhost_net_ubuf_put(ubufs);
> - wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
> + wait_for_completion(&ubufs->wait);
> + reinit_completion(&ubufs->wait);
In the case of 5 waiters for example, after the first waiter reinitializes
the completion, the 3rd waiter misses the wakeup, no?
> }
>
> static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
> --
> 2.34.1
2025-07-18 9:07 ` Hillf Danton
@ 2025-07-18 9:48 ` Nikolay Kuratov
0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Kuratov @ 2025-07-18 9:48 UTC (permalink / raw)
To: linux-kernel
Cc: netdev, virtualization, kvm, Michael S. Tsirkin, Jason Wang,
Eugenio Pérez, Lei Yang, Hillf Danton
Yes, if multiple waiters call vhost_net_ubuf_put_and_wait() concurrently we
are screwed. Furthermore, that was not the case before this patch. While it
was explicitly mentioned in the commit message, I have now changed my mind,
because the set of vhost_net_ubuf_put_and_wait() users may change when this
patch is backported to older LTSes. In 6.6+ kernels there are only two
put_and_wait() callers, and both ensure that only one thread calls
put_and_wait() at a time.
I think it's better to preserve the thread-safety of
vhost_net_ubuf_put_and_wait() and move the reinit_completion() call into
vhost_net_flush(). We don't need the reinit on the freeing path anyway.
I will send v2 with the fix. Thank you for noticing this.
end of thread, other threads:[~2025-08-05 10:02 UTC | newest]
Thread overview: 12+ messages
2025-07-18 11:03 [PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference Nikolay Kuratov
2025-07-18 12:46 ` Hillf Danton
2025-07-18 13:24 ` [PATCH] " Nikolay Kuratov
2025-07-18 22:11 ` Hillf Danton
2025-07-18 23:03 ` [PATCH v2] " Hillf Danton
2025-07-20 16:13 ` Michael S. Tsirkin
2025-07-21 14:52 ` Lei Yang
2025-08-05 10:02 ` Michael S. Tsirkin
-- strict thread matches above, loose matches on Subject: below --
2025-07-16 16:22 [PATCH] " Nikolay Kuratov
2025-07-18 8:31 ` Lei Yang
2025-07-18 9:07 ` Hillf Danton
2025-07-18 9:48 ` Nikolay Kuratov