* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
2026-03-02 8:51 [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring Michael S. Tsirkin
@ 2026-03-02 14:30 ` Stefano Garzarella
2026-03-02 15:12 ` Michael S. Tsirkin
2026-03-03 6:38 ` Jason Wang
2026-03-22 9:43 ` ShuangYu
2 siblings, 1 reply; 5+ messages in thread
From: Stefano Garzarella @ 2026-03-02 14:30 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, ShuangYu, Stefan Hajnoczi, Jason Wang,
Eugenio Pérez, kvm, virtualization, netdev
On Mon, Mar 02, 2026 at 03:51:49AM -0500, Michael S. Tsirkin wrote:
>vhost_get_avail_idx is supposed to report whether it has updated
>vq->avail_idx. Instead, it returns whether all entries have been
>consumed, which is usually the same. But not always - in
>drivers/vhost/net.c and when mergeable buffers have been enabled, the
>driver checks whether the combined entries are big enough to store an
>incoming packet. If not, the driver re-enables notifications with
>available entries still in the ring. The incorrect return value from
>vhost_get_avail_idx propagates through vhost_enable_notify and causes
>the host to livelock if the guest is not making progress, as vhost will
>immediately disable notifications and retry using the available entries.
Here I'd add something like this just to make it clear the full picture,
because I spent quite some time to understand how it was related to the
Fixes tag (which I agree is the right one to use).
This goes back to commit d3bb267bbdcb ("vhost: cache avail index in
vhost_enable_notify()") which changed vhost_enable_notify() to compare
the freshly read avail index against vq->last_avail_idx instead of the
previously cached vq->avail_idx. Commit 7ad472397667 ("vhost: move
smp_rmb() into vhost_get_avail_idx()") then carried over the same
comparison when refactoring vhost_enable_notify() to call the unified
vhost_get_avail_idx().
>
>The obvious fix is to make vhost_get_avail_idx do what the comment
>says it does and report whether new entries have been added.
>
>Reported-by: ShuangYu <shuangyu@yunyoo.cc>
>Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
>Cc: Stefano Garzarella <sgarzare@redhat.com>
>Cc: Stefan Hajnoczi <stefanha@redhat.com>
>Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>---
>
>Lightly tested, posting early to simplify testing for the reporter.
Tested with vhost-vsock and I didn't see any issue.
Thanks!
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
> drivers/vhost/vhost.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>index 2f2c45d20883..db329a6f6145 100644
>--- a/drivers/vhost/vhost.c
>+++ b/drivers/vhost/vhost.c
>@@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> {
> __virtio16 idx;
>+ u16 avail_idx;
> int r;
>
> r = vhost_get_avail(vq, idx, &vq->avail->idx);
>@@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> }
>
> /* Check it isn't doing very strange thing with available indexes */
>- vq->avail_idx = vhost16_to_cpu(vq, idx);
>- if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
>+ avail_idx = vhost16_to_cpu(vq, idx);
>+ if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
> vq_err(vq, "Invalid available index change from %u to %u",
>- vq->last_avail_idx, vq->avail_idx);
>+ vq->last_avail_idx, avail_idx);
> return -EINVAL;
> }
>
> /* We're done if there is nothing new */
>- if (vq->avail_idx == vq->last_avail_idx)
>+ if (avail_idx == vq->avail_idx)
> return 0;
>
>+ vq->avail_idx = avail_idx;
>+
> /*
> * We updated vq->avail_idx so we need a memory barrier between
> * the index read above and the caller reading avail ring entries.
>--
>MST
>
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
2026-03-02 14:30 ` Stefano Garzarella
@ 2026-03-02 15:12 ` Michael S. Tsirkin
0 siblings, 0 replies; 5+ messages in thread
From: Michael S. Tsirkin @ 2026-03-02 15:12 UTC (permalink / raw)
To: Stefano Garzarella
Cc: linux-kernel, ShuangYu, Stefan Hajnoczi, Jason Wang,
Eugenio Pérez, kvm, virtualization, netdev
On Mon, Mar 02, 2026 at 03:30:53PM +0100, Stefano Garzarella wrote:
> On Mon, Mar 02, 2026 at 03:51:49AM -0500, Michael S. Tsirkin wrote:
> > vhost_get_avail_idx is supposed to report whether it has updated
> > vq->avail_idx. Instead, it returns whether all entries have been
> > consumed, which is usually the same. But not always - in
> > drivers/vhost/net.c and when mergeable buffers have been enabled, the
> > driver checks whether the combined entries are big enough to store an
> > incoming packet. If not, the driver re-enables notifications with
> > available entries still in the ring. The incorrect return value from
> > vhost_get_avail_idx propagates through vhost_enable_notify and causes
> > the host to livelock if the guest is not making progress, as vhost will
> > immediately disable notifications and retry using the available entries.
>
> Here I'd add something like this just to make it clear the full picture,
> because I spent quite some time to understand how it was related to the
> Fixes tag (which I agree is the right one to use).
>
> This goes back to commit d3bb267bbdcb ("vhost: cache avail index in
> vhost_enable_notify()") which changed vhost_enable_notify() to compare
> the freshly read avail index against vq->last_avail_idx instead of the
> previously cached vq->avail_idx. Commit 7ad472397667 ("vhost: move
> smp_rmb() into vhost_get_avail_idx()") then carried over the same
> comparison when refactoring vhost_enable_notify() to call the unified
> vhost_get_avail_idx().
Indeed.
> >
> > The obvious fix is to make vhost_get_avail_idx do what the comment
> > says it does and report whether new entries have been added.
> >
> > Reported-by: ShuangYu <shuangyu@yunyoo.cc>
> > Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> > Cc: Stefano Garzarella <sgarzare@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >
> > Lightly tested, posting early to simplify testing for the reporter.
>
> Tested with vhost-vsock and I didn't see any issue.
>
> Thanks!
>
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
> >
> > drivers/vhost/vhost.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 2f2c45d20883..db329a6f6145 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> > static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> > {
> > __virtio16 idx;
> > + u16 avail_idx;
> > int r;
> >
> > r = vhost_get_avail(vq, idx, &vq->avail->idx);
> > @@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> > }
> >
> > /* Check it isn't doing very strange thing with available indexes */
> > - vq->avail_idx = vhost16_to_cpu(vq, idx);
> > - if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
> > + avail_idx = vhost16_to_cpu(vq, idx);
> > + if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
> > vq_err(vq, "Invalid available index change from %u to %u",
> > - vq->last_avail_idx, vq->avail_idx);
> > + vq->last_avail_idx, avail_idx);
> > return -EINVAL;
> > }
> >
> > /* We're done if there is nothing new */
> > - if (vq->avail_idx == vq->last_avail_idx)
> > + if (avail_idx == vq->avail_idx)
> > return 0;
> >
> > + vq->avail_idx = avail_idx;
> > +
> > /*
> > * We updated vq->avail_idx so we need a memory barrier between
> > * the index read above and the caller reading avail ring entries.
> > --
> > MST
> >
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
2026-03-02 8:51 [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring Michael S. Tsirkin
2026-03-02 14:30 ` Stefano Garzarella
@ 2026-03-03 6:38 ` Jason Wang
2026-03-22 9:43 ` ShuangYu
2 siblings, 0 replies; 5+ messages in thread
From: Jason Wang @ 2026-03-03 6:38 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, ShuangYu, Stefano Garzarella, Stefan Hajnoczi,
Eugenio Pérez, kvm, virtualization, netdev
On Mon, Mar 2, 2026 at 4:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> vhost_get_avail_idx is supposed to report whether it has updated
> vq->avail_idx. Instead, it returns whether all entries have been
> consumed, which is usually the same. But not always - in
> drivers/vhost/net.c and when mergeable buffers have been enabled, the
> driver checks whether the combined entries are big enough to store an
> incoming packet. If not, the driver re-enables notifications with
> available entries still in the ring. The incorrect return value from
> vhost_get_avail_idx propagates through vhost_enable_notify and causes
> the host to livelock if the guest is not making progress, as vhost will
> immediately disable notifications and retry using the available entries.
>
> The obvious fix is to make vhost_get_avail_idx do what the comment
> says it does and report whether new entries have been added.
>
> Reported-by: ShuangYu <shuangyu@yunyoo.cc>
> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
2026-03-02 8:51 [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring Michael S. Tsirkin
2026-03-02 14:30 ` Stefano Garzarella
2026-03-03 6:38 ` Jason Wang
@ 2026-03-22 9:43 ` ShuangYu
2 siblings, 0 replies; 5+ messages in thread
From: ShuangYu @ 2026-03-22 9:43 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Stefano Garzarella, Stefan Hajnoczi, Jason Wang,
Eugenio Pérez, kvm, virtualization, netdev
> From: "Michael S. Tsirkin"<mst@redhat.com>
> Date: Mon, Mar 2, 2026, 16:52
> Subject: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
> To: <linux-kernel@vger.kernel.org>
> Cc: "ShuangYu"<shuangyu@yunyoo.cc>, "Stefano Garzarella"<sgarzare@redhat.com>, "Stefan Hajnoczi"<stefanha@redhat.com>, "Jason Wang"<jasowang@redhat.com>, "Eugenio Pérez"<eperezma@redhat.com>, <kvm@vger.kernel.org>, <virtualization@lists.linux.dev>, <netdev@vger.kernel.org>
> vhost_get_avail_idx is supposed to report whether it has updated
> vq->avail_idx. Instead, it returns whether all entries have been
> consumed, which is usually the same. But not always - in
> drivers/vhost/net.c and when mergeable buffers have been enabled, the
> driver checks whether the combined entries are big enough to store an
> incoming packet. If not, the driver re-enables notifications with
> available entries still in the ring. The incorrect return value from
> vhost_get_avail_idx propagates through vhost_enable_notify and causes
> the host to livelock if the guest is not making progress, as vhost will
> immediately disable notifications and retry using the available entries.
>
> The obvious fix is to make vhost_get_avail_idx do what the comment
> says it does and report whether new entries have been added.
>
> Reported-by: ShuangYu <shuangyu@yunyoo.cc>
> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Lightly tested, posting early to simplify testing for the reporter.
>
> drivers/vhost/vhost.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 2f2c45d20883..db329a6f6145 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> {
> __virtio16 idx;
> + u16 avail_idx;
> int r;
>
> r = vhost_get_avail(vq, idx, &vq->avail->idx);
> @@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> }
>
> /* Check it isn't doing very strange thing with available indexes */
> - vq->avail_idx = vhost16_to_cpu(vq, idx);
> - if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
> + avail_idx = vhost16_to_cpu(vq, idx);
> + if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
> vq_err(vq, "Invalid available index change from %u to %u",
> - vq->last_avail_idx, vq->avail_idx);
> + vq->last_avail_idx, avail_idx);
> return -EINVAL;
> }
>
> /* We're done if there is nothing new */
> - if (vq->avail_idx == vq->last_avail_idx)
> + if (avail_idx == vq->avail_idx)
> return 0;
>
> + vq->avail_idx = avail_idx;
> +
> /*
> * We updated vq->avail_idx so we need a memory barrier between
> * the index read above and the caller reading avail ring entries.
> --
> MST
>
Hi,
Sorry for the delay, it took me some time to build a reliable test environment.
I've verified the patch on 6.18.10. With 16 concurrent TCP flows over
virtio-net (TSO enabled, vCPU throttled to 1%, 300s duration):
Before patch (bpftrace + CPU monitor):
- vhost_discard_vq_desc: up to 3,716,272/3s
- vhost_enable_notify false positives: up to 3,716,278/3s
(nearly 1:1 with discard — every partial alloc triggers re-loop)
- vhost worker CPU: 0~99%, frequently 50-99%
- successful receives: as low as 137/3s
After patch:
- vhost_discard_vq_desc: 9~33/3s
- vhost_enable_notify false positives: 0 (all 100 sample windows)
- vhost worker CPU: 0% (all 149 sample points)
- successful receives: 873~2,667/3s
The partial allocations still occur under load, but after the fix
vhost_get_avail_idx correctly returns 0, so the worker sleeps
instead of spinning. Thanks.
Tested-by: ShuangYu <shuangyu@yunyoo.cc>
Best regards,
ShuangYu
^ permalink raw reply [flat|nested] 5+ messages in thread