All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: eric.auger@redhat.com, qemu-devel@nongnu.org,
	qemu-stable@nongnu.org, Yalan Zhang <yalzhang@redhat.com>,
	Lei Yang <leiyang@redhat.com>
Subject: Re: [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled
Date: Thu, 24 Nov 2022 02:06:29 -0500	[thread overview]
Message-ID: <20221124020000-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CACGkMEsA=gLX8PZSQr91J=-nKr5OL=a4h5NSnJVRdRhvU9V+Ww@mail.gmail.com>

On Thu, Nov 24, 2022 at 12:12:15PM +0800, Jason Wang wrote:
> On Wed, Nov 23, 2022 at 3:21 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Nov 23, 2022 at 03:08:25PM +0800, Jason Wang wrote:
> > > On Wed, Nov 23, 2022 at 2:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Nov 23, 2022 at 01:47:04PM +0800, Jason Wang wrote:
> > > > > On Wed, Nov 23, 2022 at 1:26 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Nov 23, 2022 at 12:28 AM Eric Auger <eric.auger@redhat.com> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 11/22/22 10:43, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
> > > > > > > >> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
> > > > > > > >> GPA. So we need to translate it to GPA before the syncing otherwise we
> > > > > > > >> may hit the following crash since IOVA could be out of the scope of
> > > > > > > >> the GPA log size. This could be noted when using virtio-IOMMU with
> > > > > > > >> vhost using 1G memory.
> > > > > > > > Noted how exactly? What does "using 1G memory" mean?
> > > > > > >
> > > > > > > We hit the following assertion:
> > > > > > >
> > > > > > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > > > > > >
> > > > > > > On the last time vhost_get_log_size() is called it takes into account 2 regions when computing the log_size:
> > > > > > > qemu-system-x86_64: vhost_get_log_size region 0 last=0x9ffff updated log_size=0x3
> > > > > > > qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fffffff updated log_size=0x1000
> > > > > > > so in vhost_migration_log() vhost_get_log_size(dev) returns 0x1000
> > > > > > >
> > > > > > > In the test case, memory_region_sync_dirty_bitmap() gets called for mem-machine_mem, vga.vram (several times) and eventually on pc.bios. This latter is reponsible for the assertion:
> > > > > > >
> > > > > > > qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios for the full range
> > > > > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on region 0
> > > > > > > qemu-system-x86_64: vhost_dev_sync_region end=0x9ffff < start=0xfffc0000
> > > > > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on region 1
> > > > > > > qemu-system-x86_64: vhost_dev_sync_region end=0x3fffffff < start=0xfffc0000
> > > > > > > qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on vq 0 <-----
> > > > > > > qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc0000 mlast=0xffffffff rfirst=0xfffff240 rlast=0xfffffa45
> > > > > > > qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfffffa45 VHOST_LOG_CHUNK=0x40000 end/VHOST_LOG_CHUNK=0x3fff dev->log_size=0x1000
> > > > > > > qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > "using 1G memory": We hit the issue with a guest started with 1GB initial RAM.
> > > > > >
> > > > > > Yes, so in the case the guest iova allocator may try to use an IOVA
> > > > > > that is beyond 1G.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
> > > > > > > >> Cc: qemu-stable@nongnu.org
> > > > > > > >> Reported-by: Yalan Zhang <yalzhang@redhat.com>
> > > > > > > >> Tested-by: Eric Auger <eric.auger@redhat.com>
> > > > > > > >> Tested-by: Lei Yang <leiyang@redhat.com>
> > > > > > > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > >> ---
> > > > > > > >>  hw/virtio/vhost.c | 65 ++++++++++++++++++++++++++++++++---------------
> > > > > > > >>  1 file changed, 45 insertions(+), 20 deletions(-)
> > > > > > > >>
> > > > > > > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > > > >> index d1c4c20b8c..26b319f34e 100644
> > > > > > > >> --- a/hw/virtio/vhost.c
> > > > > > > >> +++ b/hw/virtio/vhost.c
> > > > > > > >> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
> > > > > > > >>      }
> > > > > > > >>  }
> > > > > > > >>
> > > > > > > >> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > > > > >> +{
> > > > > > > >> +    VirtIODevice *vdev = dev->vdev;
> > > > > > > >> +
> > > > > > > >> +    /*
> > > > > > > >> +     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > > > > > >> +     * incremental memory mapping API via IOTLB API. For platform that
> > > > > > > >> +     * does not have IOMMU, there's no need to enable this feature
> > > > > > > >> +     * which may cause unnecessary IOTLB miss/update transactions.
> > > > > > > >> +     */
> > > > > > > >> +    if (vdev) {
> > > > > > > >> +        return virtio_bus_device_iommu_enabled(vdev) &&
> > > > > > > >> +            virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > > > > > >> +    } else {
> > > > > > > >> +        return false;
> > > > > > > >> +    }
> > > > > > > >> +}
> > > > > > > >> +
> > > > > > > >>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > > > > >>                                     MemoryRegionSection *section,
> > > > > > > >>                                     hwaddr first,
> > > > > > > >>                                     hwaddr last)
> > > > > > > >>  {
> > > > > > > >> +    IOMMUTLBEntry iotlb;
> > > > > > > > why don't we move this inside the scope where it's used?
> > > > > >
> > > > > > That's fine.
> > > > > >
> > > > > > > >
> > > > > > > >>      int i;
> > > > > > > >>      hwaddr start_addr;
> > > > > > > >>      hwaddr end_addr;
> > > > > > > >> @@ -132,13 +151,37 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > > > > >>      }
> > > > > > > >>      for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > >>          struct vhost_virtqueue *vq = dev->vqs + i;
> > > > > > > >> +        hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
> > > > > > > >> +        hwaddr phys, s;
> > > > > > > > these two, too.
> > > > > >
> > > > > > Right.
> > > > > >
> > > > > > > >
> > > > > > > >>
> > > > > > > >>          if (!vq->used_phys && !vq->used_size) {
> > > > > > > >>              continue;
> > > > > > > >>          }
> > > > > > > >>
> > > > > > > >> -        vhost_dev_sync_region(dev, section, start_addr, end_addr, vq->used_phys,
> > > > > > > >> -                              range_get_last(vq->used_phys, vq->used_size));
> > > > > > > >> +        if (vhost_dev_has_iommu(dev)) {
> > > > > > > >> +            while (used_size) {
> > > > > > > >> +                rcu_read_lock();
> > > > > > > >> +                iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
> > > > > > > >> +                                                      used_phys,
> > > > > > > >> +                                                      true, MEMTXATTRS_UNSPECIFIED);
> > > > > > > >> +                rcu_read_unlock();
> > > > > > > >> +
> > > > > > > >> +                if (iotlb.target_as == NULL) {
> > > > > > > >> +                    return -EINVAL;
> > > > > > > > I am not sure how this can trigger. I don't like == NULL:
> > > > > > > > !iotlb.target_as is more succint. But a bigger question is how to
> > > > > > > > handle this. callers ignore the return value so maybe
> > > > > > > > log guest error? iommu seems misconfigured ...
> > > > > >
> > > > > > Ok.
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >> +                }
> > > > > > > >> +
> > > > > > > >> +                phys = iotlb.translated_addr;
> > > > > > > >> +                s = MIN(iotlb.addr_mask + 1, used_size);
> > > > > > > > Note, that iotlb.translated_addr here is an aligned address and
> > > > > > > > iotlb.addr_mask + 1 is size from there.
> > > > > > > >
> > > > > > > > So I think phys that you want is actually
> > > > > > > >       phys = iotlb.translated_addr + (used_phys & iotlb.addr_mask);
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > accordingly, the size would be from there until end of mask:
> > > > > > > >       s = MIN(iotlb.addr_mask + 1 - (used_phys & iotlb.addr_mask), used_size);
> > > > > > > >
> > > > > > > >
> > > > > > > > Also, it bothers me that the math here will give you 0 if addr_mask is
> > > > > > > > all ones.
> > > > >
> > > > > So even if addr_mask is all ones, we end up with s = MIN(1, used_size)
> > > > > which should be fine.
> > > >
> > > > How do you figure? addr_mask is all ones, addr_mask + 1 is 0, we get MIN(0, used_size).
> > >
> > > But we have a substration after that.
> >
> >
> > I don't get it. This is your code:
> >                s = MIN(iotlb.addr_mask + 1, used_size);
> >
> > if addr_mask is all ones s becomes 0. Then while loop will loop forever.
> 
> Ok, I think there's some misunderstanding here. I meant using:
> 
> s = MIN(iotlb.addr_mask + 1 - (used_phys & iotlb.addr_mask), used_size);
> 
> Won't result in a zero for s.

If used_phys is page aligned then I think it will.

> So there's no need to move the plus one
> outside the MIN.
> 
> Thanks

I mean why don't you try?

[mst@tuck ~]$ cat > a.c
#include <stdio.h>
#define  MIN(a,b) (a<b ? a : b)

int main(int argc, char **argv)
{
   unsigned mask = 0xffffffff;
   unsigned s = MIN(mask + 1, 100);
   printf("s=0x%x\n",s);
}
[mst@tuck ~]$ gcc a.c
[mst@tuck ~]$ ./a.out 
s=0x0
[mst@tuck ~]$ 






> >
> > > In order to get a zero result,
> > > used_phys must be zero then used_phys & iotlb.addr_mask can be zero,
> > > or anything I missed here?
> > >
> > > The 1 is calculated via assuming used_phys are all ones.
> > >
> > > Thanks
> >
> > Not used_phys, I'm talking about addr_mask being all ones.
> >
> >
> > >
> > > >
> > > > > The used_size has been validated before to be
> > > > > non-zero.
> > > > >
> > > > > Thanks
> > > > >
> > > > > > Then MIN will give 0 too and we loop forever.  I think this
> > > > > > > > can not trigger, but I'd rather we play it safe and add outside of MIN
> > > > > > > > after it's capped to a reasonable value. So we end up with:
> > > > > > > >
> > > > > > > >       /* Distance from start of used ring until last byte of IOMMU page */
> > > > > > > >       s = iotlb.addr_mask - (used_phys & iotlb.addr_mask);
> > > > > > > >       /* size of used ring, or of the part of it until end of IOMMU page */
> > > > > > > >       s = MIN(s, used_size - 1) + 1;
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > > Right.
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >> +
> > > > > > > >> +                vhost_dev_sync_region(dev, section, start_addr, end_addr, phys,
> > > > > > > >> +                                      range_get_last(phys, used_size));
> > > > > > > > why are you syncing used_size here? Shouldn't it be s?
> > > > > >
> > > > > > Let me fix this.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >> +                used_size -= s;
> > > > > > > >> +                used_phys += s;
> > > > > > > >> +            }
> > > > > > > >> +        } else {
> > > > > > > >> +            vhost_dev_sync_region(dev, section, start_addr, end_addr, used_phys,
> > > > > > > >> +                                  range_get_last(used_phys, used_size));
> > > > > > > >> +        }
> > > > > > > >>      }
> > > > > > > >>      return 0;
> > > > > > > >>  }
> > > > > > > >> @@ -306,24 +349,6 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
> > > > > > > >>      dev->log_size = size;
> > > > > > > >>  }
> > > > > > > >>
> > > > > > > >> -static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > > > > >> -{
> > > > > > > >> -    VirtIODevice *vdev = dev->vdev;
> > > > > > > >> -
> > > > > > > >> -    /*
> > > > > > > >> -     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > > > > > >> -     * incremental memory mapping API via IOTLB API. For platform that
> > > > > > > >> -     * does not have IOMMU, there's no need to enable this feature
> > > > > > > >> -     * which may cause unnecessary IOTLB miss/update transactions.
> > > > > > > >> -     */
> > > > > > > >> -    if (vdev) {
> > > > > > > >> -        return virtio_bus_device_iommu_enabled(vdev) &&
> > > > > > > >> -            virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > > > > > >> -    } else {
> > > > > > > >> -        return false;
> > > > > > > >> -    }
> > > > > > > >> -}
> > > > > > > >> -
> > > > > > > >>  static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
> > > > > > > >>                                hwaddr *plen, bool is_write)
> > > > > > > >>  {
> > > > > > > >> --
> > > > > > > >> 2.25.1
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > Eric
> > > > > > >
> > > >
> >



  reply	other threads:[~2022-11-24  7:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-22  3:01 [PATCH for 7.2?] vhost: fix vq dirt bitmap syncing when vIOMMU is enabled Jason Wang
2022-11-22  9:43 ` Michael S. Tsirkin
2022-11-22 16:28   ` Eric Auger
2022-11-23  5:26     ` Jason Wang
2022-11-23  5:47       ` Jason Wang
2022-11-23  6:15         ` Michael S. Tsirkin
2022-11-23  7:08           ` Jason Wang
2022-11-23  7:21             ` Michael S. Tsirkin
2022-11-24  4:12               ` Jason Wang
2022-11-24  7:06                 ` Michael S. Tsirkin [this message]
2022-11-24  7:31                   ` Jason Wang
2022-11-24  7:41                     ` Michael S. Tsirkin
2022-11-24  7:58                       ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221124020000-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=leiyang@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=yalzhang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.