From: Alex Mastro <amastro@fb.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: David Matlack <dmatlack@google.com>,
Alex Williamson <alex@shazbot.org>, Shuah Khan <shuah@kernel.org>,
Peter Xu <peterx@redhat.com>, <linux-kernel@vger.kernel.org>,
<kvm@vger.kernel.org>, <linux-kselftest@vger.kernel.org>
Subject: Re: [PATCH] vfio: selftests: Add vfio_dma_mapping_mmio_test
Date: Thu, 8 Jan 2026 08:25:19 -0800 [thread overview]
Message-ID: <aV/ab4BueabG/qZN@devgpu015.cco6.facebook.com> (raw)
In-Reply-To: <20260108143804.GD545276@ziepe.ca>
On Thu, Jan 08, 2026 at 10:38:04AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 07, 2026 at 07:36:44PM -0800, Alex Mastro wrote:
> > This was inspired by QEMU's hw/vfio/region.c which also does this rounding up
> > of size to the next power of two [1].
> >
> > I'm now realizing that's only necessary for regions with
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP where there are multiple mmaps per region, and
> > each mmap's size is less than the size of the BAR. Here, since we're mapping the
> > entire BAR which must be pow2, it shouldn't be necessary.
>
> You only need to do this dance if you care about having large PTEs
> under the VMAs, which is probably something worth testing both
> scenarios.
Yep, makes sense. The test takes a long time to run without this due potentially
faulting a 128G BAR region 4K at a time during VFIO_IOMMU_MAP_DMA.
>
> > The intent of QEMU's mmap alignment code is imperfect in the SPARE_MMAP case?
> > After a hole, the next mmap'able range could be some arbitrary page-aligned
> > offset into the region. It's not helpful mmap some region offset which is
> > maximally 4K-aligned at a 1G-aligned vaddr.
> >
> > I think to be optimal, QEMU should be attempting to align the vaddr for bar
> > mmaps such that
> >
> > vaddr % {2M,1G} == region_offset % {2M,1G}
> >
> > Would love someone to sanity check me on this. Kind of a diversion.
>
> What you write is correct. Ankit recently discovered this bug in
> qemu. It happens not just with SPARSE_MMAP but also when mmmaping
> around the MSI-X hole..
Is my mental model broken? I thought MSI-X holes in a VFIO-exposed BAR region
implied SPARSE_MMAP? I didn't think there was another way for the uapi to
express hole-yness.
>
> I also advocated for what you write here that qemu should ensure:
>
> vaddr % region_size == region_offset % region_size
Why region_size out of curiosity? Assuming perfect knowledge of kernel internals
I would have expected something like this:
diff --git a/hw/vfio/region.c b/hw/vfio/region.c
index ca75ab1be4..1d8595e808 100644
--- a/hw/vfio/region.c
+++ b/hw/vfio/region.c
@@ -238,6 +238,18 @@ static void vfio_subregion_unmap(VFIORegion *region, int index)
region->mmaps[index].mmap = NULL;
}
+/*
+ * Return the next value greater than or equal to `input` such that
+ * (value % align) == offset.
+ */
+static size_t align_offset(size_t input, size_t offset, size_t align)
+{
+ size_t remainder = input % align;
+ size_t delta = (align + offset - remainder) % align;
+
+ return input + delta;
+}
+
int vfio_region_mmap(VFIORegion *region)
{
int i, ret, prot = 0;
@@ -252,7 +264,11 @@ int vfio_region_mmap(VFIORegion *region)
prot |= region->flags & VFIO_REGION_INFO_FLAG_WRITE ? PROT_WRITE : 0;
for (i = 0; i < region->nr_mmaps; i++) {
- size_t align = MIN(1ULL << ctz64(region->mmaps[i].size), 1 * GiB);
+ size_t size = region->mmaps[i].size;
+ size_t offs = region->mmaps[i].offset;
+ size_t align = size >= GiB ? GiB :
+ size >= 2 * MiB ? 2 * MiB :
+ getpagesize();
void *map_base, *map_align;
/*
@@ -275,7 +291,7 @@ int vfio_region_mmap(VFIORegion *region)
fd = vfio_device_get_region_fd(region->vbasedev, region->nr);
- map_align = (void *)ROUND_UP((uintptr_t)map_base, (uintptr_t)align);
+ map_align = (void *)align_offset((size_t)map_base, offs % align, align);
munmap(map_base, map_align - map_base);
munmap(map_align + region->mmaps[i].size,
align - (map_align - map_base));
next prev parent reply other threads:[~2026-01-08 16:25 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-07 22:13 [PATCH] vfio: selftests: Add vfio_dma_mapping_mmio_test Alex Mastro
2026-01-07 22:37 ` Alex Mastro
2026-01-07 23:54 ` David Matlack
2026-01-08 0:54 ` Jason Gunthorpe
2026-01-08 2:41 ` Alex Mastro
2026-01-08 14:10 ` Jason Gunthorpe
2026-01-08 15:45 ` Alex Williamson
2026-01-08 18:24 ` David Matlack
2026-01-08 18:33 ` Jason Gunthorpe
2026-01-08 21:29 ` David Matlack
2026-01-09 0:36 ` Jason Gunthorpe
2026-01-09 0:43 ` David Matlack
2026-01-09 0:54 ` Jason Gunthorpe
2026-01-09 17:04 ` David Matlack
2026-01-09 18:01 ` Jason Gunthorpe
2026-01-09 21:38 ` Alex Williamson
2026-01-13 23:47 ` David Matlack
2026-01-08 3:36 ` Alex Mastro
2026-01-08 14:38 ` Jason Gunthorpe
2026-01-08 16:25 ` Alex Mastro [this message]
2026-01-08 23:04 ` Alex Mastro
2026-01-08 15:42 ` Alex Williamson
2026-01-08 18:20 ` David Matlack
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aV/ab4BueabG/qZN@devgpu015.cco6.facebook.com \
--to=amastro@fb.com \
--cc=alex@shazbot.org \
--cc=dmatlack@google.com \
--cc=jgg@ziepe.ca \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=peterx@redhat.com \
--cc=shuah@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.