From: Matthew Brost <matthew.brost@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Francois Dugast" <francois.dugast@intel.com>,
iommu@lists.linux.dev, intel-xe@lists.freedesktop.org,
"Joerg Roedel" <joerg.roedel@amd.com>,
"Calvin Owens" <calvin@wbinvd.org>,
"David Woodhouse" <dwmw2@infradead.org>,
"Will Deacon" <will@kernel.org>,
"Robin Murphy" <robin.murphy@arm.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Tina Zhang" <tina.zhang@intel.com>,
"Lu Baolu" <baolu.lu@linux.intel.com>,
"Kevin Tian" <kevin.tian@intel.com>
Subject: Re: Xe performance regression with recent IOMMU changes
Date: Wed, 21 Jan 2026 22:15:14 -0800
Message-ID: <aXHAcibp/4pUB8f0@lstrano-desk.jf.intel.com>
In-Reply-To: <20260121180449.GA1490142@nvidia.com>
On Wed, Jan 21, 2026 at 02:04:49PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 21, 2026 at 09:11:35AM -0400, Jason Gunthorpe wrote:
> > On Wed, Jan 21, 2026 at 02:02:16PM +0100, Francois Dugast wrote:
> > > I am reporting a slowdown in Xe caused by a couple of IOMMU changes. It
> > > can be observed during DMA mappings/unmappings required to issue copies
> > > between system memory and the device, when handling GPU faults. Not sure
> > > how other use cases or vendors are affected but below is the impact on
> > > execution times for BMG:
> > >
> > > Before changes:
> > > 4KB
> > > drm_pagemap_migrate_map_pages: 0.4 us
> > > drm_pagemap_migrate_unmap_pages: 0.4 us
> > > 64KB
> > > drm_pagemap_migrate_map_pages: 2.5 us
> > > drm_pagemap_migrate_unmap_pages: 3.5 us
> > > 2MB
> > > drm_pagemap_migrate_map_pages: 88 us
> > > drm_pagemap_migrate_unmap_pages: 108 us
> > >
> > > After changes:
> > > 4KB
> > > drm_pagemap_migrate_map_pages: 0.7 us
> > > drm_pagemap_migrate_unmap_pages: 0.7 us
> > > 64KB
> > > drm_pagemap_migrate_map_pages: 3.5 us
> > > drm_pagemap_migrate_unmap_pages: 10.5 us
> > > 2MB
> > > drm_pagemap_migrate_map_pages: 102 us
> > > drm_pagemap_migrate_unmap_pages: 330 us
> >
> > I posted some more optimizations for these cases, it should reduce the
> > numbers.
> >
We can try those — do you have a link? I believe I know the series, but
just to make sure we're on the same page.
> > This is the opposite of the benchmark numbers I ran which showed
> > significant gains as the page count and sizes increased.
> >
> > But something weird is going on to see a 3x increase in unmap, that
> > shouldn't be just algorithm overhead. That almost seems like
> > additional IOTLB invalidation overhead or something else going wrong.
> >
> > Is this from a system with the VT-d cache flushing requirement? That
> > logic changed around too and could have this kind of big impact.
>
> Oh looking at the code a bit you've got pretty much the slowest
> possible thing you can do here:
This was a fairly common pattern prior to Leon's series, I believe. The
cross-references show this pattern appearing frequently in the kernel
[1]. I do agree with the point below that, with Leon's changes applied,
this could be refactored into an IOVA alloc/link/unlink/free flow, which
would work better (also, 2M device pages make the common 2M case a moot
point).
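
For reference, the refactor I have in mind would look roughly like the
sketch below, using the dma_iova_* API from Leon's series. This is from
my reading of include/linux/dma-mapping.h, not a tested patch; dev,
pages, npages, total_size and dir stand in for whatever the real
migrate-map path carries around:

	struct dma_iova_state state = {};
	size_t offset = 0;
	int err, i;

	/* One contiguous IOVA range for the whole migration window. */
	if (!dma_iova_try_alloc(dev, &state, page_to_phys(pages[0]),
				total_size))
		return -EOPNOTSUPP; /* fall back to per-page dma_map_page() */

	for (i = 0; i < npages; i++) {
		size_t len = page_size(pages[i]);

		/* Link each (possibly higher-order) page into the range. */
		err = dma_iova_link(dev, &state, page_to_phys(pages[i]),
				    offset, len, dir, 0);
		if (err)
			goto err_destroy;
		offset += len;
	}

	err = dma_iova_sync(dev, &state, 0, total_size);
	if (err)
		goto err_destroy;

	/* ... issue copies ... */

	/*
	 * Teardown collapses into a single call (and a single IOTLB
	 * flush) instead of npages dma_unmap_page() iterations.
	 */
	dma_iova_destroy(dev, &state, total_size, dir, 0);

That single-teardown property is exactly what the unmap numbers above
are missing.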
But that’s not what we’re discussing here. We’re talking about a
regression introduced in the dma-mapping API for x86, which in my view
is unacceptable for a kernel release. So IMO we should revert those
changes [2].
[1] https://elixir.bootlin.com/linux/v6.18.6/A/ident/dma_unmap_page
[2]
e6fbd544619c50b4a4d96ccb4676cac03cb iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
d856f9d27885c499d96ab7fe506083346ccf145d iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
6cbc09b7719ec7fd9f650f18b3828b7f60c17881 iommu/vt-d: Restore previous domain::aperture_end calculation
a97fbc3ee3e2a536fafaff04f21f45472db71769 syscore: Pass context data to callbacks
101a2854110fa8787226dae1202892071ff2c369 iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entry
d373449d8e97891434db0c64afca79d903c1194e iommu/vt-d: Use the generic iommu page table
>
> for (i = 0; i < npages;) {
> if (!pagemap_addr[i].addr || dma_mapping_error(dev, pagemap_addr[i].addr))
> goto next;
>
> dma_unmap_page(dev, pagemap_addr[i].addr, PAGE_SIZE << pagemap_addr[i].order, dir);
>
> It is weird though:
>
> 0.7 us * 512 = 358us so it is about the reported speed.
>
> But the old one is 0.4 us * 512 = 204 us which is twice as
> slow as reported?? It got 2x faster the more times you loop it? Huh?
>
> The real way to fix this up is to use the new DMA API so this can be
> collapsed into a single unmap. Then it will take < 1us for all those cases.
>
> Look at the patches Leon made for the RDMA ODP stuff, it has a similar
> looking workflow.
>
See above. I agree this is the right direction, but we can't simply
regress existing performance in released kernels.
> The optimizations I posted will help this noticably.
>
I think we need to start with a revert and then discuss whether your
subsequent changes actually fix the problem.
Matt
> Jason
Thread overview: 10+ messages
2026-01-21 13:02 Xe performance regression with recent IOMMU changes Francois Dugast
2026-01-21 13:11 ` Jason Gunthorpe
2026-01-21 18:04 ` Jason Gunthorpe
2026-01-22 6:15 ` Matthew Brost [this message]
2026-01-22 7:29 ` Leon Romanovsky
2026-01-22 7:36 ` Matthew Brost
2026-01-22 10:26 ` Leon Romanovsky
2026-01-22 13:31 ` Jason Gunthorpe
2026-01-23 16:27 ` Francois Dugast
2026-01-23 19:07 ` Jason Gunthorpe