Re: [RFC PATCH 1/2] vfio: Improve DMA mapping performance for huge pages

From: Jason Gunthorpe <jgg@nvidia.com>
To: Aaron Lewis <aaronlewis@google.com>
Cc: alex.williamson@redhat.com, dmatlack@google.com,
	kvm@vger.kernel.org, seanjc@google.com
Subject: Re: [RFC PATCH 1/2] vfio: Improve DMA mapping performance for huge pages
Date: Mon, 29 Dec 2025 21:12:41 -0400
Message-ID: <20251230011241.GA23056@nvidia.com>
In-Reply-To: <CAAAPnDEcAGEBexGfC92pS=t9iYQRJFyFE9yPUU916T92Y465qw@mail.gmail.com>

On Mon, Dec 29, 2025 at 01:40:02PM -0800, Aaron Lewis wrote:
> I tried the memfd path on iommufd and it is indeed fast.  It was in
> the same ballpark as the optimization I posted for VFIO_TYPE1_IOMMU.

I would expect it to be better than just in the same ballpark, since it
does a lot less work with 1G pages.

> I also tried it with a HugeTLB fd, e.g. /mnt/huge/tmp.bin, and it was
> fast too.  I haven't had a chance to try it with DevDax w/ 1G pages,
> though.  

I don't think any of the DAX cases are supported. I doubt anyone would
complain about adding support to memfd_pin_folios() for devdax.

> I noticed DevDax was quite a bit slower than HugeTLB when I
> tested it on VFIO_TYPE1_IOMMU.  

If you are testing latest upstream kernels it should be the same,
Alistair fixed it up to install folios in the PUD level:

aed877c2b425 ("device/dax: properly refcount device dax pages when mapping")

So the folios can be properly 1G sized and what comes out of
pin_user_pages() should not be any different from hugetlbfs.

Whether your devdax has 1G folios and PUD mappings is another question.

Older kernels were all broken here and num_pages_contiguous() wouldn't
work right.

> > This isn't right, num_pages_contiguous() is the best we can do for
> > lists returned by pin_user_pages(). In a VMA context you cannot blindly
> > assume the whole folio was mapped contiguously. Indeed I seem to
> > recall this was already proposed and rejected and that is how we ended
> > up with num_pages_contiguous().
> 
> Can't we assume a single page will be mapped contiguously?  

No.

pin_user_pages() ignores VMA boundaries and the user can create a
combination of VMAs that slices a folio.

In general the output of pin_user_pages() cannot be assumed to work
like this.

> If we are operating on 1GB pages and the batch only covers ~30MB of
> that, if we are a head page can't we assume the VA and PA will be
> contiguous at least until the end of the current page?  

No, only memfd_pin_folios() can use that logic because it can assume
there is no discontiguity. This is why it returns folios, and why
num_pages_contiguous() exists.
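
To make the slicing point concrete, here is a rough userspace model, not
kernel code. It assumes each array entry stands in for the PFN of one
page in a pin_user_pages()-style output list. The names
count_contiguous_pfns() and count_runs() are made up for illustration
and are not the kernel's num_pages_contiguous(); they only mimic the
idea of scanning the list for physically contiguous runs instead of
trusting the folio size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: return the length of the physically contiguous
 * run that starts at pfns[0]. Returns 0 for an empty list.
 */
size_t count_contiguous_pfns(const uint64_t *pfns, size_t nr)
{
	size_t i;

	assert(pfns != NULL || nr == 0);
	if (nr == 0)
		return 0;

	for (i = 1; i < nr; i++)
		if (pfns[i] != pfns[0] + i)
			break;
	return i;
}

/*
 * Hypothetical model: walk the whole list run by run and count how
 * many separate contiguous ranges a caller would have to handle.
 */
size_t count_runs(const uint64_t *pfns, size_t nr)
{
	size_t i = 0, runs = 0;

	while (i < nr) {
		/* Always advances by at least one entry when nr - i > 0. */
		i += count_contiguous_pfns(pfns + i, nr - i);
		runs++;
	}
	return runs;
}
```

With a list like { 0x40000, 0x40001, 0x40100, 0x40101 }, which is what
two adjacent VMAs mapping different parts of one folio could produce,
the scan finds two runs of two pages even though the first entry looks
like the start of a large folio. Only a per-page check catches that.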

> + untagged_vaddr = untagged_addr_remote(mm, vaddr);
> + vma = vma_lookup(mm, untagged_vaddr);

Searching the VMA like this is kind of ridiculous for a performance
path.

Even within a VMA I don't think we actually have a universal rule that
folios have to be installed contiguously in PTEs, I believe they are
permitted to be sliced, though a memfd wouldn't do that for its own
VMAs.

> Using memfd sounds reasonable assuming DevDax w/ 1GB pages performs
> well.

IMHO you are better off improving memfd_pin_folios() for your use case
than messing with this stuff and type1. It is simpler, without unknown
corner cases.

Jason
