From: Leon Romanovsky <leon@kernel.org>
To: Matthew Brost <matthew.brost@intel.com>
Cc: "Jason Gunthorpe" <jgg@nvidia.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	iommu@lists.linux.dev, intel-xe@lists.freedesktop.org,
	"Joerg Roedel" <joerg.roedel@amd.com>,
	"Calvin Owens" <calvin@wbinvd.org>,
	"David Woodhouse" <dwmw2@infradead.org>,
	"Will Deacon" <will@kernel.org>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Tina Zhang" <tina.zhang@intel.com>,
	"Lu Baolu" <baolu.lu@linux.intel.com>,
	"Kevin Tian" <kevin.tian@intel.com>
Subject: Re: Xe performance regression with recent IOMMU changes
Date: Thu, 22 Jan 2026 09:29:13 +0200
Message-ID: <20260122072913.GJ13201@unreal>
In-Reply-To: <aXHAcibp/4pUB8f0@lstrano-desk.jf.intel.com>

On Wed, Jan 21, 2026 at 10:15:14PM -0800, Matthew Brost wrote:
> On Wed, Jan 21, 2026 at 02:04:49PM -0400, Jason Gunthorpe wrote:
> > On Wed, Jan 21, 2026 at 09:11:35AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Jan 21, 2026 at 02:02:16PM +0100, Francois Dugast wrote:
> > > > I am reporting a slowdown in Xe caused by a couple of IOMMU changes. It
> > > > can be observed during DMA mappings/unmappings required to issue copies
> > > > between system memory and the device, when handling GPU faults. Not sure
> > > > how other use cases or vendors are affected but below is the impact on
> > > > execution times for BMG:
> > > > 
> > > > Before changes:
> > > >   4KB
> > > >     drm_pagemap_migrate_map_pages: 0.4 us
> > > >     drm_pagemap_migrate_unmap_pages: 0.4 us
> > > >   64KB
> > > >     drm_pagemap_migrate_map_pages: 2.5 us
> > > >     drm_pagemap_migrate_unmap_pages: 3.5 us
> > > >   2MB
> > > >     drm_pagemap_migrate_map_pages: 88 us
> > > >     drm_pagemap_migrate_unmap_pages: 108 us
> > > > 
> > > > After changes:
> > > >   4KB
> > > >     drm_pagemap_migrate_map_pages: 0.7 us
> > > >     drm_pagemap_migrate_unmap_pages: 0.7 us
> > > >   64KB
> > > >     drm_pagemap_migrate_map_pages: 3.5 us
> > > >     drm_pagemap_migrate_unmap_pages: 10.5 us
> > > >   2MB
> > > >     drm_pagemap_migrate_map_pages: 102 us
> > > >     drm_pagemap_migrate_unmap_pages: 330 us
> > > 
> > > I posted some more optimizations for these cases, it should reduce the
> > > numbers.
> > > 
> 
> We can try those — link? I believe I know the series, but just to make
> sure we’re on the same page.
> 
> > > This is the opposite of the benchmark numbers I ran which showed
> > > significant gains as the page count and sizes increased.
> > > 
> > > But something weird is going on to see a 3x increase in unmap, that
> > > shouldn't be just algorithm overhead. That almost seems like
> > > additional IOTLB invalidation overhead or something else going wrong.
> > > 
> > > Is this from a system with the VT-d cache flushing requirement? That
> > > logic changed around too and could have this kind of big impact.
> > 
> > Oh looking at the code a bit you've got pretty much the slowest
> > possible thing you can do here:
> 
> This was a fairly common pattern prior to Leon’s series, I believe. The
> cross-references show this pattern appearing frequently in the kernel
> [1]. I do agree with the point below that, with Leon’s changes applied,
> this could be refactored into an IOVA alloc/link/unlink/free flow, which
> would work better (also 2M device pages reduces the common 2M case to a
> moot point).
> 
> But that’s not what we’re discussing here. We’re talking about a
> regression introduced in the dma-mapping API for x86, which in my view
> is unacceptable for a kernel release. So IMO we should revert those
> changes [2].
> 
> [1] https://elixir.bootlin.com/linux/v6.18.6/A/ident/dma_unmap_page

I think this comparison is unfair. The previous behavior was bad for
everyone, while the current issue affects only the specific
drm_pagemap_migrate_unmap_pages() flow. Cases where the performance of
dma_unmap_page() in non-direct mode matters are extremely rare.

It should be relatively straightforward to add a link/unlink path to the
drm_pagemap_*() helpers and achieve decent performance.

Thanks

> [2]
> e6fbd544619c50b4a4d96ccb4676cac03cb iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
> d856f9d27885c499d96ab7fe506083346ccf145d iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
> 6cbc09b7719ec7fd9f650f18b3828b7f60c17881 iommu/vt-d: Restore previous domain::aperture_end calculation
> a97fbc3ee3e2a536fafaff04f21f45472db71769 syscore: Pass context data to callbacks
> 101a2854110fa8787226dae1202892071ff2c369 iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entry
> d373449d8e97891434db0c64afca79d903c1194e iommu/vt-d: Use the generic iommu page table
> 
> > 
> > 	for (i = 0; i < npages;) {
> > 		if (!pagemap_addr[i].addr || dma_mapping_error(dev, pagemap_addr[i].addr))
> > 			goto next;
> > 
> > 		dma_unmap_page(dev, pagemap_addr[i].addr, PAGE_SIZE << pagemap_addr[i].order, dir);
> > 
> > It is weird though:
> > 
> > 0.7 us * 512 = 358us so it is about the reported speed.
> > 
> > But the old one is 0.4 us * 512 = 204 us which is twice as
> > slow as reported?? It got 2x faster the more times you loop it? Huh?
> > 
> > The real way to fix this up is to use the new DMA API so this can be
> > collapsed into a single unmap. Then it will take < 1us for all those cases.
> > 
> > Look at the patches Leon made for the RDMA ODP stuff, it has a similar
> > looking workflow.
> > 
> 
> See above. I agree this is the right direction, but we can’t simply
> regress kernels from existing performance.
> 
> > The optimizations I posted will help this noticeably.
> > 
> 
> I think we need to start with a revert and then discuss whether your
> subsequent changes actually fix the problem.
> 
> Matt
> 
> > Jason
> 

Thread overview: 10+ messages
2026-01-21 13:02 Xe performance regression with recent IOMMU changes Francois Dugast
2026-01-21 13:11 ` Jason Gunthorpe
2026-01-21 18:04   ` Jason Gunthorpe
2026-01-22  6:15     ` Matthew Brost
2026-01-22  7:29       ` Leon Romanovsky [this message]
2026-01-22  7:36         ` Matthew Brost
2026-01-22 10:26           ` Leon Romanovsky
2026-01-22 13:31       ` Jason Gunthorpe
2026-01-23 16:27         ` Francois Dugast
2026-01-23 19:07           ` Jason Gunthorpe
