From: Jason Gunthorpe <jgg@nvidia.com>
To: Francois Dugast <francois.dugast@intel.com>
Cc: iommu@lists.linux.dev, intel-xe@lists.freedesktop.org,
"Joerg Roedel" <joerg.roedel@amd.com>,
"Calvin Owens" <calvin@wbinvd.org>,
"David Woodhouse" <dwmw2@infradead.org>,
"Will Deacon" <will@kernel.org>,
"Robin Murphy" <robin.murphy@arm.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Matthew Brost" <matthew.brost@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Tina Zhang" <tina.zhang@intel.com>,
"Lu Baolu" <baolu.lu@linux.intel.com>,
"Kevin Tian" <kevin.tian@intel.com>
Subject: Re: Xe performance regression with recent IOMMU changes
Date: Wed, 21 Jan 2026 14:04:49 -0400
Message-ID: <20260121180449.GA1490142@nvidia.com>
In-Reply-To: <20260121131135.GF1134360@nvidia.com>
On Wed, Jan 21, 2026 at 09:11:35AM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 21, 2026 at 02:02:16PM +0100, Francois Dugast wrote:
> > I am reporting a slowdown in Xe caused by a couple of IOMMU changes. It
> > can be observed during DMA mappings/unmappings required to issue copies
> > between system memory and the device, when handling GPU faults. Not sure
> > how other use cases or vendors are affected but below is the impact on
> > execution times for BMG:
> >
> > Before changes:
> > 4KB
> > drm_pagemap_migrate_map_pages: 0.4 us
> > drm_pagemap_migrate_unmap_pages: 0.4 us
> > 64KB
> > drm_pagemap_migrate_map_pages: 2.5 us
> > drm_pagemap_migrate_unmap_pages: 3.5 us
> > 2MB
> > drm_pagemap_migrate_map_pages: 88 us
> > drm_pagemap_migrate_unmap_pages: 108 us
> >
> > After changes:
> > 4KB
> > drm_pagemap_migrate_map_pages: 0.7 us
> > drm_pagemap_migrate_unmap_pages: 0.7 us
> > 64KB
> > drm_pagemap_migrate_map_pages: 3.5 us
> > drm_pagemap_migrate_unmap_pages: 10.5 us
> > 2MB
> > drm_pagemap_migrate_map_pages: 102 us
> > drm_pagemap_migrate_unmap_pages: 330 us
>
> I posted some more optimizations for these cases, it should reduce the
> numbers.
>
> This is the opposite of the benchmark numbers I ran which showed
> significant gains as the page count and sizes increased.
>
> But something weird is going on to see a 3x increase in unmap, that
> shouldn't be just algorithm overhead. That almost seems like
> additional IOTLB invalidation overhead or something else going wrong.
>
> Is this from a system with the VT-d cache flushing requirement? That
> logic changed around too and could have this kind of big impact.
Oh, looking at the code a bit, you've got pretty much the slowest
possible thing you can do here:
	for (i = 0; i < npages;) {
		if (!pagemap_addr[i].addr ||
		    dma_mapping_error(dev, pagemap_addr[i].addr))
			goto next;
		dma_unmap_page(dev, pagemap_addr[i].addr,
			       PAGE_SIZE << pagemap_addr[i].order, dir);
It is weird though:

0.7 us * 512 = 358 us, so that is about the reported 330 us.

But the old code gives 0.4 us * 512 = 204 us, which is twice the
reported 108 us?? It got 2x faster the more times you looped it? Huh?
The real way to fix this up is to use the new DMA API so this can be
collapsed into a single unmap. Then it will take < 1us for all those cases.
Look at the patches Leon made for the RDMA ODP stuff, it has a similar
looking workflow.
The optimizations I posted will help this noticeably.
Jason
Thread overview: 10+ messages
2026-01-21 13:02 Xe performance regression with recent IOMMU changes Francois Dugast
2026-01-21 13:11 ` Jason Gunthorpe
2026-01-21 18:04 ` Jason Gunthorpe [this message]
2026-01-22 6:15 ` Matthew Brost
2026-01-22 7:29 ` Leon Romanovsky
2026-01-22 7:36 ` Matthew Brost
2026-01-22 10:26 ` Leon Romanovsky
2026-01-22 13:31 ` Jason Gunthorpe
2026-01-23 16:27 ` Francois Dugast
2026-01-23 19:07 ` Jason Gunthorpe