public inbox for virtualization@lists.linux-foundation.org
 help / color / mirror / Atom feed
* [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
@ 2026-03-02  8:51 Michael S. Tsirkin
  2026-03-02 14:30 ` Stefano Garzarella
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Michael S. Tsirkin @ 2026-03-02  8:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: ShuangYu, Stefano Garzarella, Stefan Hajnoczi, Jason Wang,
	Eugenio Pérez, kvm, virtualization, netdev

vhost_get_avail_idx is supposed to report whether it has updated
vq->avail_idx. Instead, it returns whether all entries have been
consumed, which is usually the same. But not always - in
drivers/vhost/net.c and when mergeable buffers have been enabled, the
driver checks whether the combined entries are big enough to store an
incoming packet. If not, the driver re-enables notifications with
available entries still in the ring. The incorrect return value from
vhost_get_avail_idx propagates through vhost_enable_notify and causes
the host to livelock if the guest is not making progress, as vhost will
immediately disable notifications and retry using the available entries.

The obvious fix is to make vhost_get_avail_idx do what the comment
says it does and report whether new entries have been added.

Reported-by: ShuangYu <shuangyu@yunyoo.cc>
Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Lightly tested, posting early to simplify testing for the reporter.

 drivers/vhost/vhost.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2f2c45d20883..db329a6f6145 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
 static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
 {
 	__virtio16 idx;
+	u16 avail_idx;
 	int r;
 
 	r = vhost_get_avail(vq, idx, &vq->avail->idx);
@@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
 	}
 
 	/* Check it isn't doing very strange thing with available indexes */
-	vq->avail_idx = vhost16_to_cpu(vq, idx);
-	if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
+	avail_idx = vhost16_to_cpu(vq, idx);
+	if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
 		vq_err(vq, "Invalid available index change from %u to %u",
-		       vq->last_avail_idx, vq->avail_idx);
+		       vq->last_avail_idx, avail_idx);
 		return -EINVAL;
 	}
 
 	/* We're done if there is nothing new */
-	if (vq->avail_idx == vq->last_avail_idx)
+	if (avail_idx == vq->avail_idx)
 		return 0;
 
+	vq->avail_idx = avail_idx;
+
 	/*
 	 * We updated vq->avail_idx so we need a memory barrier between
 	 * the index read above and the caller reading avail ring entries.
-- 
MST


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
  2026-03-02  8:51 [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring Michael S. Tsirkin
@ 2026-03-02 14:30 ` Stefano Garzarella
  2026-03-02 15:12   ` Michael S. Tsirkin
  2026-03-03  6:38 ` Jason Wang
  2026-03-22  9:43 ` ShuangYu
  2 siblings, 1 reply; 5+ messages in thread
From: Stefano Garzarella @ 2026-03-02 14:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, ShuangYu, Stefan Hajnoczi, Jason Wang,
	Eugenio Pérez, kvm, virtualization, netdev

On Mon, Mar 02, 2026 at 03:51:49AM -0500, Michael S. Tsirkin wrote:
>vhost_get_avail_idx is supposed to report whether it has updated
>vq->avail_idx. Instead, it returns whether all entries have been
>consumed, which is usually the same. But not always - in
>drivers/vhost/net.c and when mergeable buffers have been enabled, the
>driver checks whether the combined entries are big enough to store an
>incoming packet. If not, the driver re-enables notifications with
>available entries still in the ring. The incorrect return value from
>vhost_get_avail_idx propagates through vhost_enable_notify and causes
>the host to livelock if the guest is not making progress, as vhost will
>immediately disable notifications and retry using the available entries.

Here I'd add something like this just to make it clear the full picture, 
because I spent quite some time to understand how it was related to the 
Fixes tag (which I agree is the right one to use).

   This goes back to commit d3bb267bbdcb ("vhost: cache avail index in
   vhost_enable_notify()") which changed vhost_enable_notify() to compare
   the freshly read avail index against vq->last_avail_idx instead of the
   previously cached vq->avail_idx. Commit 7ad472397667 ("vhost: move
   smp_rmb() into vhost_get_avail_idx()") then carried over the same
   comparison when refactoring vhost_enable_notify() to call the unified
   vhost_get_avail_idx().

>
>The obvious fix is to make vhost_get_avail_idx do what the comment
>says it does and report whether new entries have been added.
>
>Reported-by: ShuangYu <shuangyu@yunyoo.cc>
>Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
>Cc: Stefano Garzarella <sgarzare@redhat.com>
>Cc: Stefan Hajnoczi <stefanha@redhat.com>
>Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>---
>
>Lightly tested, posting early to simplify testing for the reporter.

Tested with vhost-vsock and I didn't see any issue.

Thanks!

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>
> drivers/vhost/vhost.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>index 2f2c45d20883..db329a6f6145 100644
>--- a/drivers/vhost/vhost.c
>+++ b/drivers/vhost/vhost.c
>@@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> {
> 	__virtio16 idx;
>+	u16 avail_idx;
> 	int r;
>
> 	r = vhost_get_avail(vq, idx, &vq->avail->idx);
>@@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> 	}
>
> 	/* Check it isn't doing very strange thing with available indexes */
>-	vq->avail_idx = vhost16_to_cpu(vq, idx);
>-	if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
>+	avail_idx = vhost16_to_cpu(vq, idx);
>+	if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
> 		vq_err(vq, "Invalid available index change from %u to %u",
>-		       vq->last_avail_idx, vq->avail_idx);
>+		       vq->last_avail_idx, avail_idx);
> 		return -EINVAL;
> 	}
>
> 	/* We're done if there is nothing new */
>-	if (vq->avail_idx == vq->last_avail_idx)
>+	if (avail_idx == vq->avail_idx)
> 		return 0;
>
>+	vq->avail_idx = avail_idx;
>+
> 	/*
> 	 * We updated vq->avail_idx so we need a memory barrier between
> 	 * the index read above and the caller reading avail ring entries.
>-- 
>MST
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
  2026-03-02 14:30 ` Stefano Garzarella
@ 2026-03-02 15:12   ` Michael S. Tsirkin
  0 siblings, 0 replies; 5+ messages in thread
From: Michael S. Tsirkin @ 2026-03-02 15:12 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: linux-kernel, ShuangYu, Stefan Hajnoczi, Jason Wang,
	Eugenio Pérez, kvm, virtualization, netdev

On Mon, Mar 02, 2026 at 03:30:53PM +0100, Stefano Garzarella wrote:
> On Mon, Mar 02, 2026 at 03:51:49AM -0500, Michael S. Tsirkin wrote:
> > vhost_get_avail_idx is supposed to report whether it has updated
> > vq->avail_idx. Instead, it returns whether all entries have been
> > consumed, which is usually the same. But not always - in
> > drivers/vhost/net.c and when mergeable buffers have been enabled, the
> > driver checks whether the combined entries are big enough to store an
> > incoming packet. If not, the driver re-enables notifications with
> > available entries still in the ring. The incorrect return value from
> > vhost_get_avail_idx propagates through vhost_enable_notify and causes
> > the host to livelock if the guest is not making progress, as vhost will
> > immediately disable notifications and retry using the available entries.
> 
> Here I'd add something like this just to make it clear the full picture,
> because I spent quite some time to understand how it was related to the
> Fixes tag (which I agree is the right one to use).
> 
>   This goes back to commit d3bb267bbdcb ("vhost: cache avail index in
>   vhost_enable_notify()") which changed vhost_enable_notify() to compare
>   the freshly read avail index against vq->last_avail_idx instead of the
>   previously cached vq->avail_idx. Commit 7ad472397667 ("vhost: move
>   smp_rmb() into vhost_get_avail_idx()") then carried over the same
>   comparison when refactoring vhost_enable_notify() to call the unified
>   vhost_get_avail_idx().

Indeed.

> > 
> > The obvious fix is to make vhost_get_avail_idx do what the comment
> > says it does and report whether new entries have been added.
> > 
> > Reported-by: ShuangYu <shuangyu@yunyoo.cc>
> > Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> > Cc: Stefano Garzarella <sgarzare@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Lightly tested, posting early to simplify testing for the reporter.
> 
> Tested with vhost-vsock and I didn't see any issue.
> 
> Thanks!
> 
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> 
> > 
> > drivers/vhost/vhost.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 2f2c45d20883..db329a6f6145 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> > static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> > {
> > 	__virtio16 idx;
> > +	u16 avail_idx;
> > 	int r;
> > 
> > 	r = vhost_get_avail(vq, idx, &vq->avail->idx);
> > @@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> > 	}
> > 
> > 	/* Check it isn't doing very strange thing with available indexes */
> > -	vq->avail_idx = vhost16_to_cpu(vq, idx);
> > -	if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
> > +	avail_idx = vhost16_to_cpu(vq, idx);
> > +	if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
> > 		vq_err(vq, "Invalid available index change from %u to %u",
> > -		       vq->last_avail_idx, vq->avail_idx);
> > +		       vq->last_avail_idx, avail_idx);
> > 		return -EINVAL;
> > 	}
> > 
> > 	/* We're done if there is nothing new */
> > -	if (vq->avail_idx == vq->last_avail_idx)
> > +	if (avail_idx == vq->avail_idx)
> > 		return 0;
> > 
> > +	vq->avail_idx = avail_idx;
> > +
> > 	/*
> > 	 * We updated vq->avail_idx so we need a memory barrier between
> > 	 * the index read above and the caller reading avail ring entries.
> > -- 
> > MST
> > 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
  2026-03-02  8:51 [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring Michael S. Tsirkin
  2026-03-02 14:30 ` Stefano Garzarella
@ 2026-03-03  6:38 ` Jason Wang
  2026-03-22  9:43 ` ShuangYu
  2 siblings, 0 replies; 5+ messages in thread
From: Jason Wang @ 2026-03-03  6:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, ShuangYu, Stefano Garzarella, Stefan Hajnoczi,
	Eugenio Pérez, kvm, virtualization, netdev

On Mon, Mar 2, 2026 at 4:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> vhost_get_avail_idx is supposed to report whether it has updated
> vq->avail_idx. Instead, it returns whether all entries have been
> consumed, which is usually the same. But not always - in
> drivers/vhost/net.c and when mergeable buffers have been enabled, the
> driver checks whether the combined entries are big enough to store an
> incoming packet. If not, the driver re-enables notifications with
> available entries still in the ring. The incorrect return value from
> vhost_get_avail_idx propagates through vhost_enable_notify and causes
> the host to livelock if the guest is not making progress, as vhost will
> immediately disable notifications and retry using the available entries.
>
> The obvious fix is to make vhost_get_avail_idx do what the comment
> says it does and report whether new entries have been added.
>
> Reported-by: ShuangYu <shuangyu@yunyoo.cc>
> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
  2026-03-02  8:51 [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring Michael S. Tsirkin
  2026-03-02 14:30 ` Stefano Garzarella
  2026-03-03  6:38 ` Jason Wang
@ 2026-03-22  9:43 ` ShuangYu
  2 siblings, 0 replies; 5+ messages in thread
From: ShuangYu @ 2026-03-22  9:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Stefano Garzarella, Stefan Hajnoczi, Jason Wang,
	Eugenio Pérez, kvm, virtualization, netdev


> From: "Michael S. Tsirkin"<mst@redhat.com>
> Date:  Mon, Mar 2, 2026, 16:52
> Subject:  [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
> To: <linux-kernel@vger.kernel.org>
> Cc: "ShuangYu"<shuangyu@yunyoo.cc>, "Stefano Garzarella"<sgarzare@redhat.com>, "Stefan Hajnoczi"<stefanha@redhat.com>, "Jason Wang"<jasowang@redhat.com>, "Eugenio Pérez"<eperezma@redhat.com>, <kvm@vger.kernel.org>, <virtualization@lists.linux.dev>, <netdev@vger.kernel.org>
> vhost_get_avail_idx is supposed to report whether it has updated
> vq->avail_idx. Instead, it returns whether all entries have been
> consumed, which is usually the same. But not always - in
> drivers/vhost/net.c and when mergeable buffers have been enabled, the
> driver checks whether the combined entries are big enough to store an
> incoming packet. If not, the driver re-enables notifications with
> available entries still in the ring. The incorrect return value from
> vhost_get_avail_idx propagates through vhost_enable_notify and causes
> the host to livelock if the guest is not making progress, as vhost will
> immediately disable notifications and retry using the available entries.
> 
> The obvious fix is to make vhost_get_avail_idx do what the comment
> says it does and report whether new entries have been added.
> 
> Reported-by: ShuangYu <shuangyu@yunyoo.cc>
> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Lightly tested, posting early to simplify testing for the reporter.
> 
>  drivers/vhost/vhost.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 2f2c45d20883..db329a6f6145 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
>  static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
>  {
>          __virtio16 idx;
> +        u16 avail_idx;
>          int r;
>  
>          r = vhost_get_avail(vq, idx, &vq->avail->idx);
> @@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
>          }
>  
>          /* Check it isn't doing very strange thing with available indexes */
> -        vq->avail_idx = vhost16_to_cpu(vq, idx);
> -        if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
> +        avail_idx = vhost16_to_cpu(vq, idx);
> +        if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
>                  vq_err(vq, "Invalid available index change from %u to %u",
> -                       vq->last_avail_idx, vq->avail_idx);
> +                       vq->last_avail_idx, avail_idx);
>                  return -EINVAL;
>          }
>  
>          /* We're done if there is nothing new */
> -        if (vq->avail_idx == vq->last_avail_idx)
> +        if (avail_idx == vq->avail_idx)
>                  return 0;
>  
> +        vq->avail_idx = avail_idx;
> +
>          /*
>           * We updated vq->avail_idx so we need a memory barrier between
>           * the index read above and the caller reading avail ring entries.
> -- 
> MST
> 

Hi,

Sorry for the delay, it took me some time to build a reliable test environment. 

I've verified the patch on 6.18.10. With 16 concurrent TCP flows over
virtio-net (TSO enabled, vCPU throttled to 1%, 300s duration):

  Before patch (bpftrace + CPU monitor):
    - vhost_discard_vq_desc: up to 3,716,272/3s
    - vhost_enable_notify false positives: up to 3,716,278/3s
      (nearly 1:1 with discard — every partial alloc triggers re-loop)
    - vhost worker CPU: 0~99%, frequently 50-99%
    - successful receives: as low as 137/3s

  After patch:
    - vhost_discard_vq_desc: 9~33/3s
    - vhost_enable_notify false positives: 0 (all 100 sample windows)
    - vhost worker CPU: 0% (all 149 sample points)
    - successful receives: 873~2,667/3s

The partial allocations still occur under load, but after the fix
vhost_get_avail_idx correctly returns 0, so the worker sleeps
instead of spinning. Thanks.

Tested-by: ShuangYu <shuangyu@yunyoo.cc>

Best regards,
ShuangYu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-22  9:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02  8:51 [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring Michael S. Tsirkin
2026-03-02 14:30 ` Stefano Garzarella
2026-03-02 15:12   ` Michael S. Tsirkin
2026-03-03  6:38 ` Jason Wang
2026-03-22  9:43 ` ShuangYu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox