From: Alex Mastro <amastro@fb.com>
To: David Matlack <dmatlack@google.com>
Cc: Alex Williamson <alex@shazbot.org>, Shuah Khan <shuah@kernel.org>,
Peter Xu <peterx@redhat.com>, <linux-kernel@vger.kernel.org>,
<kvm@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: [PATCH] vfio: selftests: Add vfio_dma_mapping_mmio_test
Date: Wed, 7 Jan 2026 19:36:44 -0800 [thread overview]
Message-ID: <aV8mTHk/CUsyEEs1@devgpu015.cco6.facebook.com> (raw)
In-Reply-To: <aV7yIchrL3mzNyFO@google.com>
On Wed, Jan 07, 2026 at 11:54:09PM +0000, David Matlack wrote:
> On 2026-01-07 02:13 PM, Alex Mastro wrote:
> > Test MMIO-backed DMA mappings by iommu_map()-ing mmap'ed BAR regions.
>
> Thanks for adding this!
>
> > Also update vfio_pci_bar_map() to align BAR mmaps for efficient huge
> > page mappings.
> >
> > Only vfio_type1 variants are tested; iommufd variants can be added
> > once kernel support lands.
>
> Are there plans to support mapping BARs via virtual address in iommufd?
> I thought the plan was only to support via dma-bufs. Maybe Jason can
> confirm.
>
> Assuming not, should we add negative tests here to make sure iommufd
> does not allow mapping BARs?
>
> And then we can add dma-buf tests in a future commit.
>
> > The manual mmap alignment can be removed
> > once mmap(!MAP_FIXED) on vfio device fds improves to automatically
> > return well-aligned addresses.
> >
> > Signed-off-by: Alex Mastro <amastro@fb.com>
> > ---
> > Sanity test run:
> >
> > $ ./vfio_dma_mapping_mmio_test 0000:05:00.0
> > TAP version 13
> > 1..4
> > # Starting 4 tests from 2 test cases.
> > # RUN vfio_dma_mapping_mmio_test.vfio_type1_iommu.map_full_bar ...
> > Mapping BAR4: vaddr=0x7fad40000000 size=0x2000000000 iova=0x2000000000
> > # OK vfio_dma_mapping_mmio_test.vfio_type1_iommu.map_full_bar
> > ok 1 vfio_dma_mapping_mmio_test.vfio_type1_iommu.map_full_bar
> > # RUN vfio_dma_mapping_mmio_test.vfio_type1_iommu.map_partial_bar ...
> > Mapping BAR4 (partial): vaddr=0x7fad40000000 size=0x1000 iova=0x0
> > # OK vfio_dma_mapping_mmio_test.vfio_type1_iommu.map_partial_bar
> > ok 2 vfio_dma_mapping_mmio_test.vfio_type1_iommu.map_partial_bar
> > # RUN vfio_dma_mapping_mmio_test.vfio_type1v2_iommu.map_full_bar ...
> > Mapping BAR4: vaddr=0x7fad40000000 size=0x2000000000 iova=0x2000000000
> > # OK vfio_dma_mapping_mmio_test.vfio_type1v2_iommu.map_full_bar
> > ok 3 vfio_dma_mapping_mmio_test.vfio_type1v2_iommu.map_full_bar
> > # RUN vfio_dma_mapping_mmio_test.vfio_type1v2_iommu.map_partial_bar ...
> > Mapping BAR4 (partial): vaddr=0x7fad40000000 size=0x1000 iova=0x0
> > # OK vfio_dma_mapping_mmio_test.vfio_type1v2_iommu.map_partial_bar
> > ok 4 vfio_dma_mapping_mmio_test.vfio_type1v2_iommu.map_partial_bar
> > # PASSED: 4 / 4 tests passed.
> > # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
> > ---
> > tools/testing/selftests/vfio/Makefile | 1 +
> > tools/testing/selftests/vfio/lib/vfio_pci_device.c | 28 ++++-
> > .../selftests/vfio/vfio_dma_mapping_mmio_test.c | 132 +++++++++++++++++++++
> > 3 files changed, 160 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
> > index 3c796ca99a50..ead27892ab65 100644
> > --- a/tools/testing/selftests/vfio/Makefile
> > +++ b/tools/testing/selftests/vfio/Makefile
> > @@ -1,5 +1,6 @@
> > CFLAGS = $(KHDR_INCLUDES)
> > TEST_GEN_PROGS += vfio_dma_mapping_test
> > +TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
> > TEST_GEN_PROGS += vfio_iommufd_setup_test
> > TEST_GEN_PROGS += vfio_pci_device_test
> > TEST_GEN_PROGS += vfio_pci_device_init_perf_test
> > diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
> > index 13fdb4b0b10f..6f29543856a5 100644
> > --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
> > +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
> > @@ -12,10 +12,13 @@
> > #include <sys/mman.h>
> >
> > #include <uapi/linux/types.h>
> > +#include <linux/align.h>
> > #include <linux/iommufd.h>
> > +#include <linux/kernel.h>
> > #include <linux/limits.h>
> > #include <linux/mman.h>
> > #include <linux/overflow.h>
> > +#include <linux/sizes.h>
> > #include <linux/types.h>
> > #include <linux/vfio.h>
> >
> > @@ -124,20 +127,43 @@ static void vfio_pci_region_get(struct vfio_pci_device *device, int index,
> > static void vfio_pci_bar_map(struct vfio_pci_device *device, int index)
> > {
> > struct vfio_pci_bar *bar = &device->bars[index];
> > + size_t align, size;
> > + void *map_base, *map_align;
> > int prot = 0;
> >
> > VFIO_ASSERT_LT(index, PCI_STD_NUM_BARS);
> > VFIO_ASSERT_NULL(bar->vaddr);
> > VFIO_ASSERT_TRUE(bar->info.flags & VFIO_REGION_INFO_FLAG_MMAP);
> > + VFIO_ASSERT_GT(bar->info.size, 0);
> >
> > if (bar->info.flags & VFIO_REGION_INFO_FLAG_READ)
> > prot |= PROT_READ;
> > if (bar->info.flags & VFIO_REGION_INFO_FLAG_WRITE)
> > prot |= PROT_WRITE;
> >
> > - bar->vaddr = mmap(NULL, bar->info.size, prot, MAP_FILE | MAP_SHARED,
> > + /*
> > + * Align the mmap for more efficient IOMMU mapping.
> > + * The largest PUD size supporting huge pfnmap is 1GiB.
> > + */
> > + size = bar->info.size;
> > + align = min_t(u64, 1ULL << __builtin_ctzll(size), SZ_1G);
>
> What's the reason to align to 1ULL << __builtin_ctzll(size) and not just
> size?
This was inspired by QEMU's hw/vfio/region.c which also does this rounding up
of size to the next power of two [1].
I'm now realizing that's only necessary for regions with
VFIO_REGION_INFO_CAP_SPARSE_MMAP where there are multiple mmaps per region, and
each mmap's size is less than the size of the BAR. Here, since we're mapping the
entire BAR which must be pow2, it shouldn't be necessary.
It's also making me realize...
That the existing vfio_pci_device_setup() BAR mapping path isn't accounting for
the possibility of SPARE_MMAP where attempting to mmap the entire region would
fail due to non-mmap'able holes.
The intent of QEMU's mmap alignment code is imperfect in the SPARE_MMAP case?
After a hole, the next mmap'able range could be some arbitrary page-aligned
offset into the region. It's not helpful mmap some region offset which is
maximally 4K-aligned at a 1G-aligned vaddr.
I think to be optimal, QEMU should be attempting to align the vaddr for bar
mmaps such that
vaddr % {2M,1G} == region_offset % {2M,1G}
Would love someone to sanity check me on this. Kind of a diversion.
[1] https://github.com/qemu/qemu/blob/master/hw/vfio/region.c#L255-L286
>
> > +
> > + map_base = mmap(NULL, size + align, PROT_NONE,
> > + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > + VFIO_ASSERT_NE(map_base, MAP_FAILED);
> > +
> > + map_align = (void *)ALIGN((uintptr_t)map_base, align);
> > +
> > + if (map_align > map_base)
> > + munmap(map_base, map_align - map_base);
> > + if (align > (size_t)(map_align - map_base))
> > + munmap(map_align + size, align - (map_align - map_base));
> > +
> > + bar->vaddr = mmap(map_align, size, prot, MAP_SHARED | MAP_FIXED,
> > device->fd, bar->info.offset);
> > VFIO_ASSERT_NE(bar->vaddr, MAP_FAILED);
> > +
> > + madvise(bar->vaddr, size, MADV_HUGEPAGE);
> > }
>
> Can you split these changes out into a precursor commit? I think they
> stand on their own.
Ack.
>
> >
> > static void vfio_pci_bar_unmap(struct vfio_pci_device *device, int index)
> > diff --git a/tools/testing/selftests/vfio/vfio_dma_mapping_mmio_test.c b/tools/testing/selftests/vfio/vfio_dma_mapping_mmio_test.c
> > new file mode 100644
> > index 000000000000..211fa4203b49
> > --- /dev/null
> > +++ b/tools/testing/selftests/vfio/vfio_dma_mapping_mmio_test.c
> > @@ -0,0 +1,132 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +#include <stdio.h>
> > +#include <sys/mman.h>
> > +#include <unistd.h>
> > +
> > +#include <uapi/linux/types.h>
> > +#include <linux/pci_regs.h>
> > +#include <linux/sizes.h>
> > +#include <linux/vfio.h>
> > +
> > +#include <libvfio.h>
> > +
> > +#include "../kselftest_harness.h"
> > +
> > +static const char *device_bdf;
> > +
> > +static int largest_mapped_bar(struct vfio_pci_device *device)
> > +{
> > + int bar_idx = -1;
> > + u64 bar_size = 0;
> > +
> > + for (int i = 0; i < PCI_STD_NUM_BARS; i++) {
> > + struct vfio_pci_bar *bar = &device->bars[i];
> > +
> > + if (!bar->vaddr)
> > + continue;
> > +
> > + if (!(bar->info.flags & VFIO_REGION_INFO_FLAG_WRITE))
> > + continue;
>
> nit: Add a comment here. I assume this is because iommu_map() tries to
> create writable IOMMU mappings?
Yes, and I'll actually make a stronger check for both READ|WRITE here since
iommu_map() always maps with VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE.
>
> Speaking of, maybe we can add a test that creating writable IOMMU
> mappings fails for read-only BARs?
I think I'll have to look into this as a follow-on. I'm not sure how to validate
it yet without mocks or similar since I don't have such HW.
>
> > +
> > + if (bar->info.size > bar_size) {
> > + bar_size = bar->info.size;
> > + bar_idx = i;
> > + }
> > + }
> > +
> > + return bar_idx;
> > +}
> > +
> > +FIXTURE(vfio_dma_mapping_mmio_test) {
> > + struct iommu *iommu;
> > + struct vfio_pci_device *device;
> > + struct iova_allocator *iova_allocator;
> > + int bar_idx;
> > +};
> > +
> > +FIXTURE_VARIANT(vfio_dma_mapping_mmio_test) {
> > + const char *iommu_mode;
> > +};
> > +
> > +#define FIXTURE_VARIANT_ADD_IOMMU_MODE(_iommu_mode) \
> > +FIXTURE_VARIANT_ADD(vfio_dma_mapping_mmio_test, _iommu_mode) { \
> > + .iommu_mode = #_iommu_mode, \
> > +}
>
> nit: Alignment of trailing backslashes is off.
Ack
>
> > +
> > +FIXTURE_VARIANT_ADD_IOMMU_MODE(vfio_type1_iommu);
> > +FIXTURE_VARIANT_ADD_IOMMU_MODE(vfio_type1v2_iommu);
> > +
> > +#undef FIXTURE_VARIANT_ADD_IOMMU_MODE
> > +
> > +FIXTURE_SETUP(vfio_dma_mapping_mmio_test)
> > +{
> > + self->iommu = iommu_init(variant->iommu_mode);
> > + self->device = vfio_pci_device_init(device_bdf, self->iommu);
> > + self->iova_allocator = iova_allocator_init(self->iommu);
> > + self->bar_idx = largest_mapped_bar(self->device);
> > +}
> > +
> > +FIXTURE_TEARDOWN(vfio_dma_mapping_mmio_test)
> > +{
> > + iova_allocator_cleanup(self->iova_allocator);
> > + vfio_pci_device_cleanup(self->device);
> > + iommu_cleanup(self->iommu);
> > +}
> > +
> > +TEST_F(vfio_dma_mapping_mmio_test, map_full_bar)
> > +{
> > + struct vfio_pci_bar *bar;
> > + struct dma_region region;
> > +
> > + if (self->bar_idx < 0)
> > + SKIP(return, "No mappable BAR found on device %s", device_bdf);
>
> I think you can do this in the FIXTURE_SETUP() to avoid duplication.
Ack
>
> > +
> > + bar = &self->device->bars[self->bar_idx];
> > +
> > + region = (struct dma_region) {
> > + .vaddr = bar->vaddr,
> > + .size = bar->info.size,
> > + .iova = iova_allocator_alloc(self->iova_allocator, bar->info.size),
> > + };
> > +
> > + printf("Mapping BAR%d: vaddr=%p size=0x%lx iova=0x%lx\n",
> > + self->bar_idx, region.vaddr, region.size, region.iova);
> > +
> > + iommu_map(self->iommu, ®ion);
> > + iommu_unmap(self->iommu, ®ion);
> > +}
> > +
> > +TEST_F(vfio_dma_mapping_mmio_test, map_partial_bar)
> > +{
> > + struct vfio_pci_bar *bar;
> > + struct dma_region region;
> > + size_t page_size;
> > +
> > + if (self->bar_idx < 0)
> > + SKIP(return, "No mappable BAR found on device %s", device_bdf);
> > +
> > + bar = &self->device->bars[self->bar_idx];
> > + page_size = getpagesize();
> > +
> > + if (bar->info.size < 2 * page_size)
> > + SKIP(return, "BAR%d too small for partial mapping test (size=0x%llx)",
> > + self->bar_idx, bar->info.size);
> > +
> > + region = (struct dma_region) {
> > + .vaddr = bar->vaddr,
> > + .size = page_size,
> > + .iova = iova_allocator_alloc(self->iova_allocator, page_size),
> > + };
> > +
> > + printf("Mapping BAR%d (partial): vaddr=%p size=0x%lx iova=0x%lx\n",
> > + self->bar_idx, region.vaddr, region.size, region.iova);
> > +
> > + iommu_map(self->iommu, ®ion);
> > + iommu_unmap(self->iommu, ®ion);
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > + device_bdf = vfio_selftests_get_bdf(&argc, argv);
> > + return test_harness_run(argc, argv);
> > +}
> >
> > ---
> > base-commit: d721f52e31553a848e0e9947ca15a49c5674aef3
> > change-id: 20260107-scratch-amastro-vfio-dma-mapping-mmio-test-eecf25d9a742
> >
> > Best regards,
> > --
> > Alex Mastro <amastro@fb.com>
> >
next prev parent reply other threads:[~2026-01-08 3:36 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-07 22:13 [PATCH] vfio: selftests: Add vfio_dma_mapping_mmio_test Alex Mastro
2026-01-07 22:37 ` Alex Mastro
2026-01-07 23:54 ` David Matlack
2026-01-08 0:54 ` Jason Gunthorpe
2026-01-08 2:41 ` Alex Mastro
2026-01-08 14:10 ` Jason Gunthorpe
2026-01-08 15:45 ` Alex Williamson
2026-01-08 18:24 ` David Matlack
2026-01-08 18:33 ` Jason Gunthorpe
2026-01-08 21:29 ` David Matlack
2026-01-09 0:36 ` Jason Gunthorpe
2026-01-09 0:43 ` David Matlack
2026-01-09 0:54 ` Jason Gunthorpe
2026-01-09 17:04 ` David Matlack
2026-01-09 18:01 ` Jason Gunthorpe
2026-01-09 21:38 ` Alex Williamson
2026-01-13 23:47 ` David Matlack
2026-01-08 3:36 ` Alex Mastro [this message]
2026-01-08 14:38 ` Jason Gunthorpe
2026-01-08 16:25 ` Alex Mastro
2026-01-08 23:04 ` Alex Mastro
2026-01-08 15:42 ` Alex Williamson
2026-01-08 18:20 ` David Matlack
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aV8mTHk/CUsyEEs1@devgpu015.cco6.facebook.com \
--to=amastro@fb.com \
--cc=alex@shazbot.org \
--cc=dmatlack@google.com \
--cc=jgg@ziepe.ca \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=peterx@redhat.com \
--cc=shuah@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox