virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] vhost: Fix stale available ring entries
@ 2024-03-26 23:38 Gavin Shan
  2024-03-26 23:38 ` [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty() Gavin Shan
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Gavin Shan @ 2024-03-26 23:38 UTC (permalink / raw)
  To: virtualization
  Cc: linux-kernel, mst, jasowang, davem, stefanha, sgarzare, keirf,
	yihyu, shan.gavin

The issue was reported by Yihuang Yu on NVidia's grace-hopper (ARM64)
platform. The wrong head (available ring entry) is seen by the guest
when running 'netperf' on the guest and running 'netserver' on another
NVidia's grace-grace machine.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
  -accel kvm -machine virt,gic-version=host -cpu host          \
  -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
  -m 4096M,slots=16,maxmem=64G                                 \
  -object memory-backend-ram,id=mem0,size=4096M                \
   :                                                           \
  -netdev tap,id=tap0,vhost=true                               \
  -device virtio-net-pci,bus=pcie.8,netdev=tap0,mac=52:54:00:f1:26:b0
   :
  guest# ifconfig eth0 | grep 'inet addr'
  inet addr:10.26.1.220
  guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
  virtio_net virtio0: output.0:id 100 is not a head!

There is missed smp_rmb() in vhost_vq_avail_empty() and vhost_enable_notify().
Without smp_rmb(), vq->avail_idx is increased but the available ring
entries aren't arriving to vhost side yet. So a stale available ring
entry can be fetched in vhost_get_vq_desc().

Fix it by adding smp_rmb() in those two functions. Note that I need
two patches so that they can be easily picked up by the stable kernel.
With the changes, I'm unable to hit the issue again.

Gavin Shan (2):
  vhost: Add smp_rmb() in vhost_vq_avail_empty()
  vhost: Add smp_rmb() in vhost_enable_notify()

 drivers/vhost/vhost.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-26 23:38 [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan
@ 2024-03-26 23:38 ` Gavin Shan
  2024-03-27  2:34   ` Jason Wang
  2024-03-27 12:07   ` Michael S. Tsirkin
  2024-03-26 23:38 ` [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify() Gavin Shan
  2024-03-26 23:55 ` [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan
  2 siblings, 2 replies; 14+ messages in thread
From: Gavin Shan @ 2024-03-26 23:38 UTC (permalink / raw)
  To: virtualization
  Cc: linux-kernel, mst, jasowang, davem, stefanha, sgarzare, keirf,
	yihyu, shan.gavin

A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
available ring entries pushed by guest can be observed by vhost
in time, leading to stale available ring entries fetched by vhost
in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
grace-hopper (ARM64) platform.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
  -accel kvm -machine virt,gic-version=host -cpu host          \
  -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
  -m 4096M,slots=16,maxmem=64G                                 \
  -object memory-backend-ram,id=mem0,size=4096M                \
   :                                                           \
  -netdev tap,id=vnet0,vhost=true                              \
  -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
   :
  guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
  virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
should be safe until vq->avail_idx is changed by commit 275bf960ac697
("vhost: better detection of available buffers").

Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
Cc: <stable@kernel.org> # v4.11+
Reported-by: Yihuang Yu <yihyu@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 drivers/vhost/vhost.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..00445ab172b3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	r = vhost_get_avail_idx(vq, &avail_idx);
 	if (unlikely(r))
 		return false;
+
 	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+	if (vq->avail_idx != vq->last_avail_idx) {
+		/* Similar to what's done in vhost_get_vq_desc(), we need
+		 * to ensure the available ring entries have been exposed
+		 * by guest.
+		 */
+		smp_rmb();
+		return false;
+	}
 
-	return vq->avail_idx == vq->last_avail_idx;
+	return true;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify()
  2024-03-26 23:38 [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan
  2024-03-26 23:38 ` [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty() Gavin Shan
@ 2024-03-26 23:38 ` Gavin Shan
  2024-03-27  2:41   ` Jason Wang
  2024-03-26 23:55 ` [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan
  2 siblings, 1 reply; 14+ messages in thread
From: Gavin Shan @ 2024-03-26 23:38 UTC (permalink / raw)
  To: virtualization
  Cc: linux-kernel, mst, jasowang, davem, stefanha, sgarzare, keirf,
	yihyu, shan.gavin

A smp_rmb() has been missed in vhost_enable_notify(), inspired by
Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
available ring entries pushed by guest can be observed by vhost
in time, leading to stale available ring entries fetched by vhost
in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
grace-hopper (ARM64) platform.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
  -accel kvm -machine virt,gic-version=host -cpu host          \
  -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
  -m 4096M,slots=16,maxmem=64G                                 \
  -object memory-backend-ram,id=mem0,size=4096M                \
   :                                                           \
  -netdev tap,id=vnet0,vhost=true                              \
  -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
   :
  guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
  virtio_net virtio0: output.0:id 100 is not a head!

Add the missed smp_rmb() in vhost_enable_notify(). Note that it
should be safe until vq->avail_idx is changed by commit d3bb267bbdcb
("vhost: cache avail index in vhost_enable_notify()").

Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: <stable@kernel.org> # v5.18+
Reported-by: Yihuang Yu <yihyu@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 drivers/vhost/vhost.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 00445ab172b3..58f9d6a435f0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2847,9 +2847,18 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 		       &vq->avail->idx, r);
 		return false;
 	}
+
 	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
+	if (vq->avail_idx != vq->last_avail_idx) {
+		/* Similar to what's done in vhost_get_vq_desc(), we need
+		 * to ensure the available ring entries have been exposed
+		 * by guest.
+		 */
+		smp_rmb();
+		return true;
+	}
 
-	return vq->avail_idx != vq->last_avail_idx;
+	return false;
 }
 EXPORT_SYMBOL_GPL(vhost_enable_notify);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/2] vhost: Fix stale available ring entries
  2024-03-26 23:38 [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan
  2024-03-26 23:38 ` [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty() Gavin Shan
  2024-03-26 23:38 ` [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify() Gavin Shan
@ 2024-03-26 23:55 ` Gavin Shan
  2 siblings, 0 replies; 14+ messages in thread
From: Gavin Shan @ 2024-03-26 23:55 UTC (permalink / raw)
  To: virtualization
  Cc: linux-kernel, mst, jasowang, davem, stefanha, sgarzare, keirf,
	yihyu, shan.gavin, Will Deacon

On 3/27/24 09:38, Gavin Shan wrote:
> The issue was reported by Yihuang Yu on NVidia's grace-hopper (ARM64)
> platform. The wrong head (available ring entry) is seen by the guest
> when running 'netperf' on the guest and running 'netserver' on another
> NVidia's grace-grace machine.
> 
>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>    -accel kvm -machine virt,gic-version=host -cpu host          \
>    -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>    -m 4096M,slots=16,maxmem=64G                                 \
>    -object memory-backend-ram,id=mem0,size=4096M                \
>     :                                                           \
>    -netdev tap,id=tap0,vhost=true                               \
>    -device virtio-net-pci,bus=pcie.8,netdev=tap0,mac=52:54:00:f1:26:b0
>     :
>    guest# ifconfig eth0 | grep 'inet addr'
>    inet addr:10.26.1.220
>    guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>    virtio_net virtio0: output.0:id 100 is not a head!
> 
> There is missed smp_rmb() in vhost_vq_avail_empty() and vhost_enable_notify().
> Without smp_rmb(), vq->avail_idx is increased but the available ring
> entries aren't arriving to vhost side yet. So a stale available ring
> entry can be fetched in vhost_get_vq_desc().
> 
> Fix it by adding smp_rmb() in those two functions. Note that I need
> two patches so that they can be easily picked up by the stable kernel.
> With the changes, I'm unable to hit the issue again.
> 
> Gavin Shan (2):
>    vhost: Add smp_rmb() in vhost_vq_avail_empty()
>    vhost: Add smp_rmb() in vhost_enable_notify()
> 
>   drivers/vhost/vhost.c | 22 ++++++++++++++++++++--
>   1 file changed, 20 insertions(+), 2 deletions(-)
> 

Sorry, I was supposed to copy Will. Amending for it.

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-26 23:38 ` [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty() Gavin Shan
@ 2024-03-27  2:34   ` Jason Wang
  2024-03-27  2:44     ` Jason Wang
  2024-03-27 12:07   ` Michael S. Tsirkin
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Wang @ 2024-03-27  2:34 UTC (permalink / raw)
  To: Gavin Shan
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin

On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
>
> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
> available ring entries pushed by guest can be observed by vhost
> in time, leading to stale available ring entries fetched by vhost
> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> grace-hopper (ARM64) platform.
>
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>   -accel kvm -machine virt,gic-version=host -cpu host          \
>   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>   -m 4096M,slots=16,maxmem=64G                                 \
>   -object memory-backend-ram,id=mem0,size=4096M                \
>    :                                                           \
>   -netdev tap,id=vnet0,vhost=true                              \
>   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>    :
>   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>   virtio_net virtio0: output.0:id 100 is not a head!
>
> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
> should be safe until vq->avail_idx is changed by commit 275bf960ac697
> ("vhost: better detection of available buffers").
>
> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
> Cc: <stable@kernel.org> # v4.11+
> Reported-by: Yihuang Yu <yihyu@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  drivers/vhost/vhost.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 045f666b4f12..00445ab172b3 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>         r = vhost_get_avail_idx(vq, &avail_idx);
>         if (unlikely(r))
>                 return false;
> +
>         vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> +       if (vq->avail_idx != vq->last_avail_idx) {
> +               /* Similar to what's done in vhost_get_vq_desc(), we need
> +                * to ensure the available ring entries have been exposed
> +                * by guest.
> +                */

We need to be more verbose here. For example, which load needs to be
ordered with which load.

The rmb in vhost_get_vq_desc() is used to order the load of avail idx
and the load of head. It is paired with e.g virtio_wmb() in
virtqueue_add_split().

vhost_vq_avail_empty() are mostly used as a hint in
vhost_net_busy_poll() which is under the protection of the vq mutex.

An exception is the tx_can_batch(), but in that case it doesn't even
want to read the head.

Thanks


> +               smp_rmb();
> +               return false;
> +       }
>
> -       return vq->avail_idx == vq->last_avail_idx;
> +       return true;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
>
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify()
  2024-03-26 23:38 ` [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify() Gavin Shan
@ 2024-03-27  2:41   ` Jason Wang
  2024-03-27  4:10     ` Gavin Shan
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2024-03-27  2:41 UTC (permalink / raw)
  To: Gavin Shan
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin

On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
>
> A smp_rmb() has been missed in vhost_enable_notify(), inspired by
> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
> available ring entries pushed by guest can be observed by vhost
> in time, leading to stale available ring entries fetched by vhost
> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> grace-hopper (ARM64) platform.
>
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>   -accel kvm -machine virt,gic-version=host -cpu host          \
>   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>   -m 4096M,slots=16,maxmem=64G                                 \
>   -object memory-backend-ram,id=mem0,size=4096M                \
>    :                                                           \
>   -netdev tap,id=vnet0,vhost=true                              \
>   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>    :
>   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>   virtio_net virtio0: output.0:id 100 is not a head!
>
> Add the missed smp_rmb() in vhost_enable_notify(). Note that it
> should be safe until vq->avail_idx is changed by commit d3bb267bbdcb
> ("vhost: cache avail index in vhost_enable_notify()").
>
> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> Cc: <stable@kernel.org> # v5.18+
> Reported-by: Yihuang Yu <yihyu@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  drivers/vhost/vhost.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 00445ab172b3..58f9d6a435f0 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2847,9 +2847,18 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>                        &vq->avail->idx, r);
>                 return false;
>         }
> +
>         vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> +       if (vq->avail_idx != vq->last_avail_idx) {
> +               /* Similar to what's done in vhost_get_vq_desc(), we need
> +                * to ensure the available ring entries have been exposed
> +                * by guest.
> +                */
> +               smp_rmb();
> +               return true;
> +       }
>
> -       return vq->avail_idx != vq->last_avail_idx;
> +       return false;

So we only care about the case when vhost_enable_notify() returns true.

In that case, I think you want to order with vhost_get_vq_desc():

last_avail_idx = vq->last_avail_idx;

if (vq->avail_idx == vq->last_avail_idx) { /* false */
}

vhost_get_avail_head(vq, &ring_head, last_avail_idx)

Assuming I understand the patch correctly.

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

>  }
>  EXPORT_SYMBOL_GPL(vhost_enable_notify);
>
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-27  2:34   ` Jason Wang
@ 2024-03-27  2:44     ` Jason Wang
  2024-03-27  4:08       ` Gavin Shan
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2024-03-27  2:44 UTC (permalink / raw)
  To: Gavin Shan
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin

On Wed, Mar 27, 2024 at 10:34 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
> >
> > A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
> > Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
> > available ring entries pushed by guest can be observed by vhost
> > in time, leading to stale available ring entries fetched by vhost
> > in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> > grace-hopper (ARM64) platform.
> >
> >   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
> >   -accel kvm -machine virt,gic-version=host -cpu host          \
> >   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
> >   -m 4096M,slots=16,maxmem=64G                                 \
> >   -object memory-backend-ram,id=mem0,size=4096M                \
> >    :                                                           \
> >   -netdev tap,id=vnet0,vhost=true                              \
> >   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
> >    :
> >   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
> >   virtio_net virtio0: output.0:id 100 is not a head!
> >
> > Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
> > should be safe until vq->avail_idx is changed by commit 275bf960ac697
> > ("vhost: better detection of available buffers").
> >
> > Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
> > Cc: <stable@kernel.org> # v4.11+
> > Reported-by: Yihuang Yu <yihyu@redhat.com>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > ---
> >  drivers/vhost/vhost.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 045f666b4f12..00445ab172b3 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> >         r = vhost_get_avail_idx(vq, &avail_idx);
> >         if (unlikely(r))
> >                 return false;
> > +
> >         vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> > +       if (vq->avail_idx != vq->last_avail_idx) {
> > +               /* Similar to what's done in vhost_get_vq_desc(), we need
> > +                * to ensure the available ring entries have been exposed
> > +                * by guest.
> > +                */
>
> We need to be more verbose here. For example, which load needs to be
> ordered with which load.
>
> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
> and the load of head. It is paired with e.g virtio_wmb() in
> virtqueue_add_split().
>
> vhost_vq_avail_empty() are mostly used as a hint in
> vhost_net_busy_poll() which is under the protection of the vq mutex.
>
> An exception is the tx_can_batch(), but in that case it doesn't even
> want to read the head.

Ok, if it is needed only in that path, maybe we can move the barriers there.

Thanks

>
> Thanks
>
>
> > +               smp_rmb();
> > +               return false;
> > +       }
> >
> > -       return vq->avail_idx == vq->last_avail_idx;
> > +       return true;
> >  }
> >  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
> >
> > --
> > 2.44.0
> >


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-27  2:44     ` Jason Wang
@ 2024-03-27  4:08       ` Gavin Shan
  2024-03-27  7:35         ` Gavin Shan
  0 siblings, 1 reply; 14+ messages in thread
From: Gavin Shan @ 2024-03-27  4:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin, Will Deacon

On 3/27/24 12:44, Jason Wang wrote:
> On Wed, Mar 27, 2024 at 10:34 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
>>>
>>> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
>>> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
>>> available ring entries pushed by guest can be observed by vhost
>>> in time, leading to stale available ring entries fetched by vhost
>>> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
>>> grace-hopper (ARM64) platform.
>>>
>>>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>>>    -accel kvm -machine virt,gic-version=host -cpu host          \
>>>    -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>>>    -m 4096M,slots=16,maxmem=64G                                 \
>>>    -object memory-backend-ram,id=mem0,size=4096M                \
>>>     :                                                           \
>>>    -netdev tap,id=vnet0,vhost=true                              \
>>>    -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>>>     :
>>>    guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>>>    virtio_net virtio0: output.0:id 100 is not a head!
>>>
>>> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
>>> should be safe until vq->avail_idx is changed by commit 275bf960ac697
>>> ("vhost: better detection of available buffers").
>>>
>>> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
>>> Cc: <stable@kernel.org> # v4.11+
>>> Reported-by: Yihuang Yu <yihyu@redhat.com>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> ---
>>>   drivers/vhost/vhost.c | 11 ++++++++++-
>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>> index 045f666b4f12..00445ab172b3 100644
>>> --- a/drivers/vhost/vhost.c
>>> +++ b/drivers/vhost/vhost.c
>>> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>>          r = vhost_get_avail_idx(vq, &avail_idx);
>>>          if (unlikely(r))
>>>                  return false;
>>> +
>>>          vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>>> +       if (vq->avail_idx != vq->last_avail_idx) {
>>> +               /* Similar to what's done in vhost_get_vq_desc(), we need
>>> +                * to ensure the available ring entries have been exposed
>>> +                * by guest.
>>> +                */
>>
>> We need to be more verbose here. For example, which load needs to be
>> ordered with which load.
>>
>> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
>> and the load of head. It is paired with e.g virtio_wmb() in
>> virtqueue_add_split().
>>
>> vhost_vq_avail_empty() are mostly used as a hint in
>> vhost_net_busy_poll() which is under the protection of the vq mutex.
>>
>> An exception is the tx_can_batch(), but in that case it doesn't even
>> want to read the head.
> 
> Ok, if it is needed only in that path, maybe we can move the barriers there.
> 

[cc Will Deacon]

Jason, appreciate for your review and comments. I think PATCH[1/2] is
the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
only to see if our issue will disappear or not. However, the issue still
exists if PATCH[2/2] is missed.

Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
== false if I remember correctly. So the added smp_rmb() in vhost_vq_avail_empty()
is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
for the available index and available ring entry (head). For example, vhost_vq_avail_empty()
called by tx_can_batch() can see next available index, but its corresponding
available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
The next call to get_tx_bufs(), where the available ring entry (head) doesn't
arrived yet, leading to stale available ring entry (head) being fetched.

   handle_tx_copy
     get_tx_bufs                 // smp_rmb() won't be executed when vq->avail_idx != vq->last_avail_idx
     tx_can_batch
       vhost_vq_avail_empty      // vq->avail_idx is updated from vq->avail->idx

The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the function
is a exposed API, even it's only used by drivers/vhost/net.c at present. It means
the API has been broken internally. So it seems more appropriate to fix it up in
vhost_vq_avail_empty() so that the API's users needn't worry about the memory access
order.

>>
>>
>>> +               smp_rmb();
>>> +               return false;
>>> +       }
>>>
>>> -       return vq->avail_idx == vq->last_avail_idx;
>>> +       return true;
>>>   }
>>>   EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify()
  2024-03-27  2:41   ` Jason Wang
@ 2024-03-27  4:10     ` Gavin Shan
  0 siblings, 0 replies; 14+ messages in thread
From: Gavin Shan @ 2024-03-27  4:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin, Will Deacon

On 3/27/24 12:41, Jason Wang wrote:
> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
>>
>> A smp_rmb() has been missed in vhost_enable_notify(), inspired by
>> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
>> available ring entries pushed by guest can be observed by vhost
>> in time, leading to stale available ring entries fetched by vhost
>> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
>> grace-hopper (ARM64) platform.
>>
>>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>>    -accel kvm -machine virt,gic-version=host -cpu host          \
>>    -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>>    -m 4096M,slots=16,maxmem=64G                                 \
>>    -object memory-backend-ram,id=mem0,size=4096M                \
>>     :                                                           \
>>    -netdev tap,id=vnet0,vhost=true                              \
>>    -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>>     :
>>    guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>>    virtio_net virtio0: output.0:id 100 is not a head!
>>
>> Add the missed smp_rmb() in vhost_enable_notify(). Note that it
>> should be safe until vq->avail_idx is changed by commit d3bb267bbdcb
>> ("vhost: cache avail index in vhost_enable_notify()").
>>
>> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
>> Cc: <stable@kernel.org> # v5.18+
>> Reported-by: Yihuang Yu <yihyu@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   drivers/vhost/vhost.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 00445ab172b3..58f9d6a435f0 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -2847,9 +2847,18 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>                         &vq->avail->idx, r);
>>                  return false;
>>          }
>> +
>>          vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>> +       if (vq->avail_idx != vq->last_avail_idx) {
>> +               /* Similar to what's done in vhost_get_vq_desc(), we need
>> +                * to ensure the available ring entries have been exposed
>> +                * by guest.
>> +                */
>> +               smp_rmb();
>> +               return true;
>> +       }
>>
>> -       return vq->avail_idx != vq->last_avail_idx;
>> +       return false;
> 
> So we only care about the case when vhost_enable_notify() returns true.
> 
> In that case, I think you want to order with vhost_get_vq_desc():
> 
> last_avail_idx = vq->last_avail_idx;
> 
> if (vq->avail_idx == vq->last_avail_idx) { /* false */
> }
> 
> vhost_get_avail_head(vq, &ring_head, last_avail_idx)
> 
> Assuming I understand the patch correctly.
> 
> Acked-by: Jason Wang <jasowang@redhat.com>
> 

Jason, thanks for your review and comments. Your understanding is exactly
what I understood.

> 
>>   }
>>   EXPORT_SYMBOL_GPL(vhost_enable_notify);
>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-27  4:08       ` Gavin Shan
@ 2024-03-27  7:35         ` Gavin Shan
  2024-03-27  7:42           ` Jason Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Gavin Shan @ 2024-03-27  7:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin, Will Deacon

On 3/27/24 14:08, Gavin Shan wrote:
> On 3/27/24 12:44, Jason Wang wrote:
>> On Wed, Mar 27, 2024 at 10:34 AM Jason Wang <jasowang@redhat.com> wrote:
>>> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
>>>>
>>>> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
>>>> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
>>>> available ring entries pushed by guest can be observed by vhost
>>>> in time, leading to stale available ring entries fetched by vhost
>>>> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
>>>> grace-hopper (ARM64) platform.
>>>>
>>>>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>>>>    -accel kvm -machine virt,gic-version=host -cpu host          \
>>>>    -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>>>>    -m 4096M,slots=16,maxmem=64G                                 \
>>>>    -object memory-backend-ram,id=mem0,size=4096M                \
>>>>     :                                                           \
>>>>    -netdev tap,id=vnet0,vhost=true                              \
>>>>    -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>>>>     :
>>>>    guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>>>>    virtio_net virtio0: output.0:id 100 is not a head!
>>>>
>>>> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
>>>> should be safe until vq->avail_idx is changed by commit 275bf960ac697
>>>> ("vhost: better detection of available buffers").
>>>>
>>>> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
>>>> Cc: <stable@kernel.org> # v4.11+
>>>> Reported-by: Yihuang Yu <yihyu@redhat.com>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>>   drivers/vhost/vhost.c | 11 ++++++++++-
>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>> index 045f666b4f12..00445ab172b3 100644
>>>> --- a/drivers/vhost/vhost.c
>>>> +++ b/drivers/vhost/vhost.c
>>>> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>>>          r = vhost_get_avail_idx(vq, &avail_idx);
>>>>          if (unlikely(r))
>>>>                  return false;
>>>> +
>>>>          vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>>>> +       if (vq->avail_idx != vq->last_avail_idx) {
>>>> +               /* Similar to what's done in vhost_get_vq_desc(), we need
>>>> +                * to ensure the available ring entries have been exposed
>>>> +                * by guest.
>>>> +                */
>>>
>>> We need to be more verbose here. For example, which load needs to be
>>> ordered with which load.
>>>
>>> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
>>> and the load of head. It is paired with e.g virtio_wmb() in
>>> virtqueue_add_split().
>>>
>>> vhost_vq_avail_empty() are mostly used as a hint in
>>> vhost_net_busy_poll() which is under the protection of the vq mutex.
>>>
>>> An exception is the tx_can_batch(), but in that case it doesn't even
>>> want to read the head.
>>
>> Ok, if it is needed only in that path, maybe we can move the barriers there.
>>
> 
> [cc Will Deacon]
> 
> Jason, appreciate for your review and comments. I think PATCH[1/2] is
> the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
> it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
> only to see if our issue will disappear or not. However, the issue still
> exists if PATCH[2/2] is missed.
> 

Jason, PATCH[2/2] is sufficient to fix our current issue. I tried with PATCH[2/2]
only and unable to hit the issue. However, PATCH[1/2] may be needed by other scenarios.
So it would be nice to fix them in one shoot.


> Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
> == false if I remember correctly. So the added smp_rmb() in vhost_vq_avail_empty()
> is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
> for the available index and available ring entry (head). For example, vhost_vq_avail_empty()
> called by tx_can_batch() can see next available index, but its corresponding
> available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
> The next call to get_tx_bufs(), where the available ring entry (head) doesn't
> arrived yet, leading to stale available ring entry (head) being fetched.
> 
>    handle_tx_copy
>      get_tx_bufs                 // smp_rmb() won't be executed when vq->avail_idx != vq->last_avail_idx
>      tx_can_batch
>        vhost_vq_avail_empty      // vq->avail_idx is updated from vq->avail->idx
> 
> The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the function
> is a exposed API, even it's only used by drivers/vhost/net.c at present. It means
> the API has been broken internally. So it seems more appropriate to fix it up in
> vhost_vq_avail_empty() so that the API's users needn't worry about the memory access
> order.
> 
>>>
>>>
>>>> +               smp_rmb();
>>>> +               return false;
>>>> +       }
>>>>
>>>> -       return vq->avail_idx == vq->last_avail_idx;
>>>> +       return true;
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-27  7:35         ` Gavin Shan
@ 2024-03-27  7:42           ` Jason Wang
  2024-03-28  0:27             ` Gavin Shan
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2024-03-27  7:42 UTC (permalink / raw)
  To: Gavin Shan
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin, Will Deacon

On Wed, Mar 27, 2024 at 3:35 PM Gavin Shan <gshan@redhat.com> wrote:
>
> On 3/27/24 14:08, Gavin Shan wrote:
> > On 3/27/24 12:44, Jason Wang wrote:
> >> On Wed, Mar 27, 2024 at 10:34 AM Jason Wang <jasowang@redhat.com> wrote:
> >>> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
> >>>>
> >>>> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
> >>>> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
> >>>> available ring entries pushed by guest can be observed by vhost
> >>>> in time, leading to stale available ring entries fetched by vhost
> >>>> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> >>>> grace-hopper (ARM64) platform.
> >>>>
> >>>>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
> >>>>    -accel kvm -machine virt,gic-version=host -cpu host          \
> >>>>    -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
> >>>>    -m 4096M,slots=16,maxmem=64G                                 \
> >>>>    -object memory-backend-ram,id=mem0,size=4096M                \
> >>>>     :                                                           \
> >>>>    -netdev tap,id=vnet0,vhost=true                              \
> >>>>    -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
> >>>>     :
> >>>>    guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
> >>>>    virtio_net virtio0: output.0:id 100 is not a head!
> >>>>
> >>>> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
> >>>> should be safe until vq->avail_idx is changed by commit 275bf960ac697
> >>>> ("vhost: better detection of available buffers").
> >>>>
> >>>> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
> >>>> Cc: <stable@kernel.org> # v4.11+
> >>>> Reported-by: Yihuang Yu <yihyu@redhat.com>
> >>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >>>> ---
> >>>>   drivers/vhost/vhost.c | 11 ++++++++++-
> >>>>   1 file changed, 10 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>> index 045f666b4f12..00445ab172b3 100644
> >>>> --- a/drivers/vhost/vhost.c
> >>>> +++ b/drivers/vhost/vhost.c
> >>>> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> >>>>          r = vhost_get_avail_idx(vq, &avail_idx);
> >>>>          if (unlikely(r))
> >>>>                  return false;
> >>>> +
> >>>>          vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> >>>> +       if (vq->avail_idx != vq->last_avail_idx) {
> >>>> +               /* Similar to what's done in vhost_get_vq_desc(), we need
> >>>> +                * to ensure the available ring entries have been exposed
> >>>> +                * by guest.
> >>>> +                */
> >>>
> >>> We need to be more verbose here. For example, which load needs to be
> >>> ordered with which load.
> >>>
> >>> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
> >>> and the load of head. It is paired with e.g virtio_wmb() in
> >>> virtqueue_add_split().
> >>>
> >>> vhost_vq_avail_empty() are mostly used as a hint in
> >>> vhost_net_busy_poll() which is under the protection of the vq mutex.
> >>>
> >>> An exception is the tx_can_batch(), but in that case it doesn't even
> >>> want to read the head.
> >>
> >> Ok, if it is needed only in that path, maybe we can move the barriers there.
> >>
> >
> > [cc Will Deacon]
> >
> > Jason, appreciate for your review and comments. I think PATCH[1/2] is
> > the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
> > it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
> > only to see if our issue will disappear or not. However, the issue still
> > exists if PATCH[2/2] is missed.
> >
>
> Jason, PATCH[2/2] is sufficient to fix our current issue. I tried with PATCH[2/2]
> only and unable to hit the issue. However, PATCH[1/2] may be needed by other scenarios.
> So it would be nice to fix them in one shoot.

Yes, see below.

>
>
> > Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
> > == false if I remember correctly. So the added smp_rmb() in vhost_vq_avail_empty()
> > is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
> > for the available index and available ring entry (head). For example, vhost_vq_avail_empty()
> > called by tx_can_batch() can see next available index, but its corresponding
> > available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
> > The next call to get_tx_bufs(), where the available ring entry (head) doesn't
> > arrived yet, leading to stale available ring entry (head) being fetched.
> >
> >    handle_tx_copy
> >      get_tx_bufs                 // smp_rmb() won't be executed when vq->avail_idx != vq->last_avail_idx
> >      tx_can_batch
> >        vhost_vq_avail_empty      // vq->avail_idx is updated from vq->avail->idx
> >
> > The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the function
> > is a exposed API, even it's only used by drivers/vhost/net.c at present. It means
> > the API has been broken internally. So it seems more appropriate to fix it up in
> > vhost_vq_avail_empty() so that the API's users needn't worry about the memory access
> > order.

When tx_can_batch returns true it means there's still pending tx
buffers. Since it might read indices so it still can bypass the
smp_rmb() in the vhost_get_vq_desc().

I'd suggest adding those above to change log.

With this,

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> >
> >>>
> >>>
> >>>> +               smp_rmb();
> >>>> +               return false;
> >>>> +       }
> >>>>
> >>>> -       return vq->avail_idx == vq->last_avail_idx;
> >>>> +       return true;
> >>>>   }
> >>>>   EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
>
> Thanks,
> Gavin
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-26 23:38 ` [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty() Gavin Shan
  2024-03-27  2:34   ` Jason Wang
@ 2024-03-27 12:07   ` Michael S. Tsirkin
  2024-03-28  0:26     ` Gavin Shan
  1 sibling, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2024-03-27 12:07 UTC (permalink / raw)
  To: Gavin Shan
  Cc: virtualization, linux-kernel, jasowang, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin

On Wed, Mar 27, 2024 at 09:38:45AM +1000, Gavin Shan wrote:
> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
> available ring entries pushed by guest can be observed by vhost
> in time, leading to stale available ring entries fetched by vhost
> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
> grace-hopper (ARM64) platform.
> 
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>   -accel kvm -machine virt,gic-version=host -cpu host          \
>   -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>   -m 4096M,slots=16,maxmem=64G                                 \
>   -object memory-backend-ram,id=mem0,size=4096M                \
>    :                                                           \
>   -netdev tap,id=vnet0,vhost=true                              \
>   -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>    :
>   guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>   virtio_net virtio0: output.0:id 100 is not a head!
> 
> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
> should be safe until vq->avail_idx is changed by commit 275bf960ac697
> ("vhost: better detection of available buffers").
> 
> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
> Cc: <stable@kernel.org> # v4.11+
> Reported-by: Yihuang Yu <yihyu@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  drivers/vhost/vhost.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 045f666b4f12..00445ab172b3 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	r = vhost_get_avail_idx(vq, &avail_idx);
>  	if (unlikely(r))
>  		return false;
> +
>  	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> +	if (vq->avail_idx != vq->last_avail_idx) {
> +		/* Similar to what's done in vhost_get_vq_desc(), we need
> +		 * to ensure the available ring entries have been exposed
> +		 * by guest.
> +		 */

A slightly clearer comment:

/* Since we have updated avail_idx, the following call to
 * vhost_get_vq_desc will read available ring entries.
 * Make sure that read happens after the avail_idx read.
 */

Pls repost with that, and I will apply.

Also add suggested-by for will.


> +		smp_rmb();
> +		return false;
> +	}
>  
> -	return vq->avail_idx == vq->last_avail_idx;
> +	return true;
>  }
>  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

As a follow-up patch, we should clean out code duplication that
accumulated with 3 places reading avail idx in essentially
the same way - this duplication is what causes the mess in
the 1st place.






> -- 
> 2.44.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-27 12:07   ` Michael S. Tsirkin
@ 2024-03-28  0:26     ` Gavin Shan
  0 siblings, 0 replies; 14+ messages in thread
From: Gavin Shan @ 2024-03-28  0:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, jasowang, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin

On 3/27/24 22:07, Michael S. Tsirkin wrote:
> On Wed, Mar 27, 2024 at 09:38:45AM +1000, Gavin Shan wrote:
>> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
>> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
>> available ring entries pushed by guest can be observed by vhost
>> in time, leading to stale available ring entries fetched by vhost
>> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
>> grace-hopper (ARM64) platform.
>>
>>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>>    -accel kvm -machine virt,gic-version=host -cpu host          \
>>    -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>>    -m 4096M,slots=16,maxmem=64G                                 \
>>    -object memory-backend-ram,id=mem0,size=4096M                \
>>     :                                                           \
>>    -netdev tap,id=vnet0,vhost=true                              \
>>    -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>>     :
>>    guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>>    virtio_net virtio0: output.0:id 100 is not a head!
>>
>> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
>> should be safe until vq->avail_idx is changed by commit 275bf960ac697
>> ("vhost: better detection of available buffers").
>>
>> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
>> Cc: <stable@kernel.org> # v4.11+
>> Reported-by: Yihuang Yu <yihyu@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   drivers/vhost/vhost.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 045f666b4f12..00445ab172b3 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>   	r = vhost_get_avail_idx(vq, &avail_idx);
>>   	if (unlikely(r))
>>   		return false;
>> +
>>   	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>> +	if (vq->avail_idx != vq->last_avail_idx) {
>> +		/* Similar to what's done in vhost_get_vq_desc(), we need
>> +		 * to ensure the available ring entries have been exposed
>> +		 * by guest.
>> +		 */
> 
> A slightly clearer comment:
> 
> /* Since we have updated avail_idx, the following call to
>   * vhost_get_vq_desc will read available ring entries.
>   * Make sure that read happens after the avail_idx read.
>   */
> 
> Pls repost with that, and I will apply.
> 
> Also add suggested-by for will.
> 

Sure, the suggested comments have been included to v3.

> 
>> +		smp_rmb();
>> +		return false;
>> +	}
>>   
>> -	return vq->avail_idx == vq->last_avail_idx;
>> +	return true;
>>   }
>>   EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
> 
> As a follow-up patch, we should clean out code duplication that
> accumulated with 3 places reading avail idx in essentially
> the same way - this duplication is what causes the mess in
> the 1st place.
> 

Yes, nice idea. I've added PATCH[v3 3/3] to improve vhost_get_avail_idx()
to handle the memory barrier since all the callers have the concern.

v3: https://lore.kernel.org/virtualization/20240328002149.1141302-1-gshan@redhat.com/

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty()
  2024-03-27  7:42           ` Jason Wang
@ 2024-03-28  0:27             ` Gavin Shan
  0 siblings, 0 replies; 14+ messages in thread
From: Gavin Shan @ 2024-03-28  0:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, mst, davem, stefanha, sgarzare,
	keirf, yihyu, shan.gavin, Will Deacon

On 3/27/24 17:42, Jason Wang wrote:
> On Wed, Mar 27, 2024 at 3:35 PM Gavin Shan <gshan@redhat.com> wrote:
>>
>> On 3/27/24 14:08, Gavin Shan wrote:
>>> On 3/27/24 12:44, Jason Wang wrote:
>>>> On Wed, Mar 27, 2024 at 10:34 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>> On Wed, Mar 27, 2024 at 7:39 AM Gavin Shan <gshan@redhat.com> wrote:
>>>>>>
>>>>>> A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by
>>>>>> Will Deacon <will@kernel.org>. Otherwise, it's not ensured the
>>>>>> available ring entries pushed by guest can be observed by vhost
>>>>>> in time, leading to stale available ring entries fetched by vhost
>>>>>> in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's
>>>>>> grace-hopper (ARM64) platform.
>>>>>>
>>>>>>     /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64      \
>>>>>>     -accel kvm -machine virt,gic-version=host -cpu host          \
>>>>>>     -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
>>>>>>     -m 4096M,slots=16,maxmem=64G                                 \
>>>>>>     -object memory-backend-ram,id=mem0,size=4096M                \
>>>>>>      :                                                           \
>>>>>>     -netdev tap,id=vnet0,vhost=true                              \
>>>>>>     -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
>>>>>>      :
>>>>>>     guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
>>>>>>     virtio_net virtio0: output.0:id 100 is not a head!
>>>>>>
>>>>>> Add the missed smp_rmb() in vhost_vq_avail_empty(). Note that it
>>>>>> should be safe until vq->avail_idx is changed by commit 275bf960ac697
>>>>>> ("vhost: better detection of available buffers").
>>>>>>
>>>>>> Fixes: 275bf960ac697 ("vhost: better detection of available buffers")
>>>>>> Cc: <stable@kernel.org> # v4.11+
>>>>>> Reported-by: Yihuang Yu <yihyu@redhat.com>
>>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>>> ---
>>>>>>    drivers/vhost/vhost.c | 11 ++++++++++-
>>>>>>    1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>> index 045f666b4f12..00445ab172b3 100644
>>>>>> --- a/drivers/vhost/vhost.c
>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>> @@ -2799,9 +2799,18 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>>>>>>           r = vhost_get_avail_idx(vq, &avail_idx);
>>>>>>           if (unlikely(r))
>>>>>>                   return false;
>>>>>> +
>>>>>>           vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>>>>>> +       if (vq->avail_idx != vq->last_avail_idx) {
>>>>>> +               /* Similar to what's done in vhost_get_vq_desc(), we need
>>>>>> +                * to ensure the available ring entries have been exposed
>>>>>> +                * by guest.
>>>>>> +                */
>>>>>
>>>>> We need to be more verbose here. For example, which load needs to be
>>>>> ordered with which load.
>>>>>
>>>>> The rmb in vhost_get_vq_desc() is used to order the load of avail idx
>>>>> and the load of head. It is paired with e.g virtio_wmb() in
>>>>> virtqueue_add_split().
>>>>>
>>>>> vhost_vq_avail_empty() are mostly used as a hint in
>>>>> vhost_net_busy_poll() which is under the protection of the vq mutex.
>>>>>
>>>>> An exception is the tx_can_batch(), but in that case it doesn't even
>>>>> want to read the head.
>>>>
>>>> Ok, if it is needed only in that path, maybe we can move the barriers there.
>>>>
>>>
>>> [cc Will Deacon]
>>>
>>> Jason, appreciate for your review and comments. I think PATCH[1/2] is
>>> the fix for the hypothesis, meaning PATCH[2/2] is the real fix. However,
>>> it would be nice to fix all of them in one shoot. I will try with PATCH[2/2]
>>> only to see if our issue will disappear or not. However, the issue still
>>> exists if PATCH[2/2] is missed.
>>>
>>
>> Jason, PATCH[2/2] is sufficient to fix our current issue. I tried with PATCH[2/2]
>> only and unable to hit the issue. However, PATCH[1/2] may be needed by other scenarios.
>> So it would be nice to fix them in one shoot.
> 
> Yes, see below.
> 
>>
>>
>>> Firstly, We were failing on the transmit queue and {tvq, rvq}->busyloop_timeout
>>> == false if I remember correctly. So the added smp_rmb() in vhost_vq_avail_empty()
>>> is only a concern to tx_can_batch(). A mutex isn't enough to ensure the order
>>> for the available index and available ring entry (head). For example, vhost_vq_avail_empty()
>>> called by tx_can_batch() can see next available index, but its corresponding
>>> available ring entry (head) may not be seen by vhost yet if smp_rmb() is missed.
>>> The next call to get_tx_bufs(), where the available ring entry (head) doesn't
>>> arrived yet, leading to stale available ring entry (head) being fetched.
>>>
>>>     handle_tx_copy
>>>       get_tx_bufs                 // smp_rmb() won't be executed when vq->avail_idx != vq->last_avail_idx
>>>       tx_can_batch
>>>         vhost_vq_avail_empty      // vq->avail_idx is updated from vq->avail->idx
>>>
>>> The reason why I added smp_rmb() to vhost_vq_avail_empty() is because the function
>>> is a exposed API, even it's only used by drivers/vhost/net.c at present. It means
>>> the API has been broken internally. So it seems more appropriate to fix it up in
>>> vhost_vq_avail_empty() so that the API's users needn't worry about the memory access
>>> order.
> 
> When tx_can_batch returns true it means there's still pending tx
> buffers. Since it might read indices so it still can bypass the
> smp_rmb() in the vhost_get_vq_desc().
> 
> I'd suggest adding those above to change log.
> 
> With this,
> 
> Acked-by: Jason Wang <jasowang@redhat.com>
> 

Thanks, Jason. The change log has been improved as you suggested in
v3, which was posted for further review.

Thanks,
Gavin

>>>
>>>>>
>>>>>
>>>>>> +               smp_rmb();
>>>>>> +               return false;
>>>>>> +       }
>>>>>>
>>>>>> -       return vq->avail_idx == vq->last_avail_idx;
>>>>>> +       return true;
>>>>>>    }
>>>>>>    EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-03-28  0:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-26 23:38 [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan
2024-03-26 23:38 ` [PATCH v2 1/2] vhost: Add smp_rmb() in vhost_vq_avail_empty() Gavin Shan
2024-03-27  2:34   ` Jason Wang
2024-03-27  2:44     ` Jason Wang
2024-03-27  4:08       ` Gavin Shan
2024-03-27  7:35         ` Gavin Shan
2024-03-27  7:42           ` Jason Wang
2024-03-28  0:27             ` Gavin Shan
2024-03-27 12:07   ` Michael S. Tsirkin
2024-03-28  0:26     ` Gavin Shan
2024-03-26 23:38 ` [PATCH v2 2/2] vhost: Add smp_rmb() in vhost_enable_notify() Gavin Shan
2024-03-27  2:41   ` Jason Wang
2024-03-27  4:10     ` Gavin Shan
2024-03-26 23:55 ` [PATCH v2 0/2] vhost: Fix stale available ring entries Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).