[BUG?] vfio/pci: VA alignment sensitivity of VFIO_IOMMU_MAP_DMA which target MMIO


From: Alex Mastro <amastro@fb.com>
To: <linux-pci@vger.kernel.org>
Cc: <alex.williamson@redhat.com>, <jgg@nvidia.com>,
	<peterx@redhat.com>, <kbusch@kernel.org>, <linux-mm@kvack.org>
Subject: [BUG?] vfio/pci: VA alignment sensitivity of VFIO_IOMMU_MAP_DMA which target MMIO
Date: Thu, 29 May 2025 14:44:14 -0700
Message-ID: <20250529214414.1508155-1-amastro@fb.com>

Hello,

We are running user space drivers in production on top of VFIO, and after
upgrading from v6.9.0 to v6.13.2 noticed intermittent, slow performance leading
to "rcu_sched self-detected stall" when issuing VFIO_IOMMU_MAP_DMA on ~64 GiB
mmap-ed BAR regions. When doing this on enough devices concurrently, we
triggered softlockup_panic. The mmap-ed BAR regions were obtained from mmap on
a VFIO device fd.
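
For reference, the call in question looks roughly like the sketch below. It
assumes the type1 IOMMU backend; map_bar_for_dma, container_fd, bar_va and iova
are hypothetical names used only for illustration, and bar_va stands for the
address returned by mmap on the VFIO device fd.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: ask the IOMMU to map `size` bytes starting at the
 * process virtual address `bar_va` (an mmap-ed BAR) at device address `iova`.
 * Returns 0 on success or a negative errno value on failure.
 */
static int map_bar_for_dma(int container_fd, void *bar_va,
                           uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uint64_t)(uintptr_t)bar_va;
    map.iova = iova;
    map.size = size;

    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) != 0)
        return -errno;
    return 0;
}
```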

We map regions larger than 1 GiB. These sometimes do not start at 1 GiB-aligned
BAR offsets, but they are always aligned to at least 2 MiB.

We determined that slow, stalling runs were correlated with 4 KiB-aligned
addresses returned by mmap, and normal runs with >= 2 MiB alignment.

Inspired by QEMU's mmap-alloc.c, we are handling this by reserving VA with an
oversized mmap, and then clobbering with MAP_FIXED at a good address inside the
reservation with the mmap on the VFIO device fd.

At first we settled for aligning the mmap address to {1 GiB, 2 MiB} exactly,
and the stalls disappeared, but we then improved performance further as follows.

We found that the best addresses to pass to VFIO_IOMMU_MAP_DMA have the
following properties, where va_align and va_offset are chosen based on the size
and BAR offsets of the desired mapping.

va_align = {1 GiB, 2 MiB, 4 KiB}
va_offset = mmap_offset % va_align
(addr_to_mmap % va_align) == va_offset

Using addresses with the above properties seems to optimize the count and
granularity of faults as confirmed by bpftrace-ing vfio_pci_mmap_huge_fault.
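
As an illustration, the address selection can be sketched as below. The rule in
pick_va_align (largest of the three alignments that does not exceed the mapping
size) is an assumption made for this example, and pick_va_align and pick_va are
hypothetical names; resv stands for the start of a VA reservation of at least
size + va_align bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K ((uint64_t)4 << 10)
#define SZ_2M ((uint64_t)2 << 20)
#define SZ_1G ((uint64_t)1 << 30)

/* Assumed rule: the largest candidate alignment that fits in the mapping. */
static uint64_t pick_va_align(uint64_t size)
{
    if (size >= SZ_1G)
        return SZ_1G;
    if (size >= SZ_2M)
        return SZ_2M;
    return SZ_4K;
}

/*
 * Lowest address >= resv such that
 *   (addr % va_align) == (mmap_offset % va_align)
 * so that the VA and the offset share the same position within a huge page.
 */
static uint64_t pick_va(uint64_t resv, uint64_t mmap_offset, uint64_t va_align)
{
    assert(va_align != 0 && (va_align & (va_align - 1)) == 0);

    uint64_t va_offset = mmap_offset & (va_align - 1);
    uint64_t addr = (resv & ~(va_align - 1)) + va_offset;

    if (addr < resv)
        addr += va_align;
    return addr;
}
```

The result is always less than va_align bytes past resv, which is why the
reservation needs va_align bytes of slack beyond the mapping size.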

We then backported "Improve DMA mapping performance for huge pfnmaps" [1] to
our 6.13 tree, and saw further performance improvements consistent with those
described in the patch (thank you!). However, with the backport, we still need
to align mmap addresses manually, otherwise we see stalls.

We are wondering the following:
- Is all of the above expected behavior, and is this an expected usage of VFIO?
- Is there an expected minimum alignment greater than 4 KiB (our system page
  size) for non-MAP_FIXED mmap on a VFIO device fd?
- Was there an unintended regression to our use-case in between 6.9 and 6.13?

Thanks,
Alex Mastro

[1] https://lore.kernel.org/all/20250205231728.2527186-1-alex.williamson@redhat.com/
