public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Alex Mastro <amastro@fb.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: David Matlack <dmatlack@google.com>,
	Alex Williamson <alex@shazbot.org>, Shuah Khan <shuah@kernel.org>,
	Peter Xu <peterx@redhat.com>, <linux-kernel@vger.kernel.org>,
	<kvm@vger.kernel.org>, <linux-kselftest@vger.kernel.org>
Subject: Re: [PATCH] vfio: selftests: Add vfio_dma_mapping_mmio_test
Date: Thu, 8 Jan 2026 08:25:19 -0800	[thread overview]
Message-ID: <aV/ab4BueabG/qZN@devgpu015.cco6.facebook.com> (raw)
In-Reply-To: <20260108143804.GD545276@ziepe.ca>

On Thu, Jan 08, 2026 at 10:38:04AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 07, 2026 at 07:36:44PM -0800, Alex Mastro wrote:
> > This was inspired by QEMU's hw/vfio/region.c which also does this rounding up
> > of size to the next power of two [1].
> > 
> > I'm now realizing that's only necessary for regions with
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP where there are multiple mmaps per region, and
> > each mmap's size is less than the size of the BAR. Here, since we're mapping the
> > entire BAR which must be pow2, it shouldn't be necessary.
> 
> You only need to do this dance if you care about having large PTEs
> under the VMAs, which is probably something worth testing both
> scenarios.

Yep, makes sense. The test takes a long time to run without this due potentially
faulting a 128G BAR region 4K at a time during VFIO_IOMMU_MAP_DMA.

> 
> > The intent of QEMU's mmap alignment code is imperfect in the SPARE_MMAP case?
> > After a hole, the next mmap'able range could be some arbitrary page-aligned
> > offset into the region. It's not helpful mmap some region offset which is
> > maximally 4K-aligned at a 1G-aligned vaddr.
> > 
> > I think to be optimal, QEMU should be attempting to align the vaddr for bar
> > mmaps such that
> > 
> > vaddr % {2M,1G} == region_offset % {2M,1G}
> > 
> > Would love someone to sanity check me on this. Kind of a diversion.
> 
> What you write is correct. Ankit recently discovered this bug in
> qemu. It happens not just with SPARSE_MMAP but also when mmmaping
> around the MSI-X hole..

Is my mental model broken? I thought MSI-X holes in a VFIO-exposed BAR region
implied SPARSE_MMAP? I didn't think there was another way for the uapi to
express hole-yness.

> 
> I also advocated for what you write here that qemu should ensure:
> 
>   vaddr % region_size == region_offset % region_size

Why region_size out of curiosity? Assuming perfect knowledge of kernel internals
I would have expected something like this:

diff --git a/hw/vfio/region.c b/hw/vfio/region.c
index ca75ab1be4..1d8595e808 100644
--- a/hw/vfio/region.c
+++ b/hw/vfio/region.c
@@ -238,6 +238,18 @@ static void vfio_subregion_unmap(VFIORegion *region, int index)
     region->mmaps[index].mmap = NULL;
 }
 
+/*
+ * Return the next value greater than or equal to `input` such that
+ * (value % align) == offset.
+ */
+static size_t align_offset(size_t input, size_t offset, size_t align)
+{
+    size_t remainder = input % align;
+    size_t delta = (align + offset - remainder) % align;
+
+    return input + delta;
+}
+
 int vfio_region_mmap(VFIORegion *region)
 {
     int i, ret, prot = 0;
@@ -252,7 +264,11 @@ int vfio_region_mmap(VFIORegion *region)
     prot |= region->flags & VFIO_REGION_INFO_FLAG_WRITE ? PROT_WRITE : 0;
 
     for (i = 0; i < region->nr_mmaps; i++) {
-        size_t align = MIN(1ULL << ctz64(region->mmaps[i].size), 1 * GiB);
+        size_t size = region->mmaps[i].size;
+        size_t offs = region->mmaps[i].offset;
+        size_t align = size >= GiB ? GiB :
+                       size >= 2 * MiB ? 2 * MiB :
+                       getpagesize();
         void *map_base, *map_align;
 
         /*
@@ -275,7 +291,7 @@ int vfio_region_mmap(VFIORegion *region)
 
         fd = vfio_device_get_region_fd(region->vbasedev, region->nr);
 
-        map_align = (void *)ROUND_UP((uintptr_t)map_base, (uintptr_t)align);
+        map_align = (void *)align_offset((size_t)map_base, offs % align, align);
         munmap(map_base, map_align - map_base);
         munmap(map_align + region->mmaps[i].size,
                align - (map_align - map_base));

  reply	other threads:[~2026-01-08 16:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-07 22:13 [PATCH] vfio: selftests: Add vfio_dma_mapping_mmio_test Alex Mastro
2026-01-07 22:37 ` Alex Mastro
2026-01-07 23:54 ` David Matlack
2026-01-08  0:54   ` Jason Gunthorpe
2026-01-08  2:41     ` Alex Mastro
2026-01-08 14:10       ` Jason Gunthorpe
2026-01-08 15:45         ` Alex Williamson
2026-01-08 18:24           ` David Matlack
2026-01-08 18:33             ` Jason Gunthorpe
2026-01-08 21:29               ` David Matlack
2026-01-09  0:36                 ` Jason Gunthorpe
2026-01-09  0:43                   ` David Matlack
2026-01-09  0:54                     ` Jason Gunthorpe
2026-01-09 17:04                       ` David Matlack
2026-01-09 18:01                         ` Jason Gunthorpe
2026-01-09 21:38                           ` Alex Williamson
2026-01-13 23:47                             ` David Matlack
2026-01-08  3:36   ` Alex Mastro
2026-01-08 14:38     ` Jason Gunthorpe
2026-01-08 16:25       ` Alex Mastro [this message]
2026-01-08 23:04         ` Alex Mastro
2026-01-08 15:42     ` Alex Williamson
2026-01-08 18:20       ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aV/ab4BueabG/qZN@devgpu015.cco6.facebook.com \
    --to=amastro@fb.com \
    --cc=alex@shazbot.org \
    --cc=dmatlack@google.com \
    --cc=jgg@ziepe.ca \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=peterx@redhat.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox