Intel-XE Archive on lore.kernel.org
From: Matthew Brost <matthew.brost@intel.com>
To: Leon Romanovsky <leonro@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, <intel-xe@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>, <francois.dugast@intel.com>,
	<thomas.hellstrom@linux.intel.com>,
	<himal.prasad.ghimiray@intel.com>
Subject: Re: [RFC PATCH v3 04/11] drm/pagemap: Use dma-map IOVA alloc, link, and sync API for DRM pagemap
Date: Thu, 29 Jan 2026 10:22:47 -0800	[thread overview]
Message-ID: <aXuld/QOR9gh4fjg@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <20260129101427.GE10992@unreal>

On Thu, Jan 29, 2026 at 12:14:27PM +0200, Leon Romanovsky wrote:
> On Wed, Jan 28, 2026 at 01:04:49PM -0800, Matthew Brost wrote:
> > On Wed, Jan 28, 2026 at 09:45:40PM +0200, Leon Romanovsky wrote:
> > > On Wed, Jan 28, 2026 at 11:29:23AM -0800, Matthew Brost wrote:
> > > > On Wed, Jan 28, 2026 at 01:55:31PM -0400, Jason Gunthorpe wrote:
> > > > > On Wed, Jan 28, 2026 at 09:46:44AM -0800, Matthew Brost wrote:
> > > > > 
> > > > > > It is intended to fill holes. The input pages come from the
> > > > > > migrate_vma_* functions, which can return a sparsely populated array of
> > > > > > pages for a region (e.g., it scans a 2M range but only finds several of
> > > > > > the 512 pages eligible for migration). As a result, if (!page) is true
> > > > > > for many entries.
> > > > > 
> > > > > This is migration?? So something is DMA'ing from A -> B - why put
> > > > > holes in the first place? Can you tightly pack the pages in the IOVA?
> > > > > 
> > > > 
> > > > This could probably be made to work. I think it would require an
> > > > initial pass to figure out the IOVA size, then tightly pack.
> > > > 
> > > > Let me look at this. Probably better too, as installing dummy pages has
> > > > a non-zero cost; I assume dma_iova_link is a radix tree walk.
> > > > 
> > > > > If there is no iommu then the addresses are scattered all over anyhow
> > > > > so it can't be relying on some dma_addr_t relationship?
> > > > 
> > > > Scattered dma-addresses are already handled in the copy code, likewise
> > > > holes, so this is a non-issue.
> > > > 
> > > > > 
> > > > > You don't have to fully populate the allocated iova, you can link from
> > > > > A-B and then unlink from A-B even if B is less than the total size
> > > > > requested.
> > > > > 
> > > > > The hmm users have the holes because hmm is dynamically
> > > > > adding/removing pages as it runs and it can't do anything to pack the
> > > > > mapping.
> > > > > 
> > > > > > > IOVA space? If so, what necessitates those holes? You can have less mapped
> > > > > > > than IOVA and dma_iova_*() API can handle it.
> > > > > > 
> > > > > > I was actually going to ask you about this, so I’m glad you brought it
> > > > > > up here. Again, this is a hack to avoid holes — the holes are never
> > > > > > touched by our copy function, but rather skipped, so we just jam in a
> > > > > > dummy address so the entire IOVA range has valid IOMMU pages.
> > > > > 
> > > > > I would say what you are doing is trying to optimize unmap by
> > > > 
> > > > Yes, and to keep the code simple.
> > > > 
> > > > > unmapping everything in one shot instead of just the mapped areas, and
> > > > > the WARN_ON is telling you that it isn't allowed to unmap across a
> > > > > hole.
> > > > > 
> > > > > > at the moment I’m not sure whether this warning affects actual
> > > > > > functionality or if we could just delete it. 
> > > > > 
> > > > > It means the iommu page table stopped unmapping when it hit a hole and
> > > > > there is a bunch of left over maps in the page table that shouldn't be
> > > > > there. So yes, it is serious and cannot be deleted.
> > > > >
> > > > 
> > > > Cool, this explains the warning.
> > > >  
> > > > > This is a possible option to teach things to detect the holes and
> > > > > ignore them..
> > > > 
> > > > Another option — and IMO probably the best one — as it makes potential
> > > > usages with holes the simplest at the driver level. Let me look at this
> > > > too.
> > > 
> > > It would be ideal if we could code a more general solution. In HMM we
> > > release pages one by one, and it would be preferable to have a single-shot
> > > unmap routine instead, similar to NVMe, which releases all IOVA space
> > > with one call to dma_iova_destroy().
> > > 
> > > HMM chain:
> > > 
> > > ib_umem_odp_unmap_dma_pages()
> > >  -> for (...)
> > >    -> hmm_dma_unmap_pfn()
> > > 
> > > After giving more thought to my earlier suggestion to use
> > > hmm_pfn_to_phys(), I began to wonder why you did not use the
> > > hmm_dma_*() API instead?
> > > 
> > 
> > That is ill-suited for high-speed fabrics, but so is our existing
> > implementation — we’re just in slightly better shape (?). It also seems
> > ill-suited [1][2][3] for variable page sizes (which are possible with
> > our API), as well as the way we currently program device PTEs in our
> > driver. We also receive PFNs from the migrate_vma_* layer, which must
> > be mapped as well.
> > 
> > I also believe the hmm_dma_* code predates the DRM code being merged, or
> > was merged around the same time.
> > 
> > We could work to unify the HMM helpers and make them usable, but that
> > won’t happen overnight. The HMM layer needs quite a bit of work to be
> > usable, and then we’d have to propagate everything upward through
> > DRM/Xe and any new users. Let me play around with this a bit, though,
> > to get a rough idea of what would need to be done here.
> > 
> > [1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/infiniband/core/umem_odp.c#L255
> > [2] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/infiniband/core/umem_odp.c#L193
> > [3] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/infiniband/core/umem_odp.c#L104 
> > 
> > Also, there is some odd stuff going on... Why sync after every mapping [4]?
> 
> Right now, the hmm_dma_map_pfn() user is a page-based one, so we need
> to sync after every page fault.
> 

Right. On GPUs we typically fault in chunks because it involves a copy
to/from the device, which is expensive, and it’s much more efficient to
transfer larger sizes. (IIRC, faulting 2M with 512 × 4K pages versus 512
separate 4K faults was about 58× faster in the former case.)

THP device pages (+mTHP) make the “one fault == one page” model more
palatable, but memory gets fragmented, and if THP/mTHP allocations fail,
we really need the path where we can move multiple pages in a single
fault. Because of that, we wouldn’t want to sync until all dma-mappings
are linked.
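For reference, the batched pattern I have in mind looks roughly like
this (a sketch only, not the actual patch: it assumes a dma_iova_state
already allocated with dma_iova_try_alloc(), uses the dma_iova_link() /
dma_iova_sync() signatures as I understand them from dma-mapping.h, and
elides error unwinding and the real PFN encoding):

```
/*
 * Hypothetical sketch: link every populated page first, packing tightly
 * past holes, then issue one sync for the whole batch instead of
 * syncing per page. pfns[] is the sparse array from migrate_vma_*
 * (0 == hole).
 */
static int sketch_link_then_sync(struct device *dev,
				 struct dma_iova_state *state,
				 unsigned long *pfns, unsigned long npages)
{
	size_t offset = 0;
	unsigned long i;
	int err;

	for (i = 0; i < npages; i++) {
		if (!pfns[i])
			continue;	/* skip holes, pack tightly */
		err = dma_iova_link(dev, state, PFN_PHYS(pfns[i]), offset,
				    PAGE_SIZE, DMA_BIDIRECTIONAL, 0);
		if (err)
			return err;
		offset += PAGE_SIZE;
	}
	/* one IOTLB / sync operation covering everything linked above */
	return dma_iova_sync(dev, state, 0, offset);
}
```

The point being that the sync cost is amortized over the whole chunk
rather than paid per fault.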

Also, I dug into hmm_dma_map_pfn; it doesn’t handle device-private pages
either, which is something we need.

As a long-term goal, having a function like hmm_dma_map_pfn that handled
multiple pages of various sizes, supported device-private memory,
sparsely populated regions, unified handling with migration PFNs,
handled high-speed fabric vs. P2P, and simply returned a mapping that a
driver could take and program into PTEs would be great. Perhaps that’s a
goal we can work toward eventually, though making this generic across
many driver/subsystem use cases seems difficult. The code I'm writing
here makes this generic across DRM, but my driver (Xe) is the only user
thus far, so it's really impossible to know whether I've got this right
until another vendor jumps in, which hopefully happens soonish.

Matt

> > Blindly doing BIDIRECTIONAL [5]... 
> 
> It was carried over from the old code; callers can provide a direction.
> 
> > 
> > [4] https://elixir.bootlin.com/linux/v6.18.6/source/mm/hmm.c#L826
> > [5] https://elixir.bootlin.com/linux/v6.18.6/source/mm/hmm.c#L821
> > 
> > > > 
> > > > Do you think we need a flag somewhere for 'ignore holes', or can I just
> > > > blindly skip them?
> > > 
> > > It would be better to have something like a dma_iova_with_holes_destroy()
> > > function call, to make sure that we don't hurt the performance of existing
> > > dma_iova_destroy() users.
> > >
> > 
> > Yes, I think this is the best route for the time being. Let me look at
> > this.
> > 
> > Matt
> > 
> > > Thanks
> > > 
> > > > 
> > > > Matt
> > > > 
> > > > > 
> > > > > Jason


Thread overview: 30+ messages
2026-01-28  0:48 [RFC PATCH v3 00/11] Use new dma-map IOVA alloc, link, and sync API in GPU SVM and DRM pagemap Matthew Brost
2026-01-28  0:48 ` [RFC PATCH v3 01/11] drm/pagemap: Add helper to access zone_device_data Matthew Brost
2026-01-28 13:53   ` Leon Romanovsky
2026-01-28  0:48 ` [RFC PATCH v3 02/11] drm/gpusvm: Use dma-map IOVA alloc, link, and sync API in GPU SVM Matthew Brost
2026-01-28 14:04   ` Leon Romanovsky
2026-01-28  0:48 ` [RFC PATCH v3 03/11] drm/pagemap: Split drm_pagemap_migrate_map_pages into device / system Matthew Brost
2026-01-28  0:48 ` [RFC PATCH v3 04/11] drm/pagemap: Use dma-map IOVA alloc, link, and sync API for DRM pagemap Matthew Brost
2026-01-28 14:28   ` Leon Romanovsky
2026-01-28 17:46     ` Matthew Brost
     [not found]       ` <20260128175531.GR1641016@ziepe.ca>
2026-01-28 19:29         ` Matthew Brost
2026-01-28 19:45           ` Leon Romanovsky
2026-01-28 21:04             ` Matthew Brost
2026-01-29 10:14               ` Leon Romanovsky
2026-01-29 18:22                 ` Matthew Brost [this message]
2026-01-28  0:48 ` [RFC PATCH v3 05/11] drm/pagemap: Reduce number of IOVA link calls Matthew Brost
2026-01-28  0:48 ` [RFC PATCH v3 06/11] drm/pagemap: Add IOVA interface to DRM pagemap Matthew Brost
     [not found]   ` <20260128151458.GJ1641016@ziepe.ca>
2026-01-28 18:42     ` Matthew Brost
2026-01-28 19:41       ` Matthew Brost
     [not found]       ` <20260128193509.GU1641016@ziepe.ca>
2026-01-28 20:24         ` Matthew Brost
2026-01-29 18:57           ` Jason Gunthorpe
2026-01-29 19:28             ` Matthew Brost
2026-01-29 19:32               ` Jason Gunthorpe
2026-01-28  0:48 ` [RFC PATCH v3 07/11] drm/xe: Stub out DRM pagemap IOVA alloc implementation Matthew Brost
2026-01-28  0:48 ` [RFC PATCH v3 08/11] drm/pagemap: Use device-to-device IOVA alloc, link, and sync API for DRM pagemap Matthew Brost
2026-01-28  0:48 ` [RFC PATCH v3 09/11] drm/xe: Drop BO dma-resv lock during SVM migrate-to-device Matthew Brost
2026-01-28  0:48 ` [RFC PATCH v3 10/11] drm/xe: Implement DRM pagemap IOVA vfuncs Matthew Brost
2026-01-28  0:48 ` [RFC PATCH v3 11/11] drm/gpusvm: Use device-to-device IOVA alloc, link, and sync API in GPU SVM Matthew Brost
2026-01-28  0:59 ` ✗ CI.checkpatch: warning for Use new dma-map IOVA alloc, link, and sync API in GPU SVM and DRM pagemap (rev3) Patchwork
2026-01-28  1:01 ` ✓ CI.KUnit: success " Patchwork
2026-01-28  1:42 ` ✓ Xe.CI.BAT: " Patchwork
