From: Jason Gunthorpe <jgg@nvidia.com>
To: Francois Dugast <francois.dugast@intel.com>
Cc: "Matthew Brost" <matthew.brost@intel.com>,
iommu@lists.linux.dev, intel-xe@lists.freedesktop.org,
"Joerg Roedel" <joerg.roedel@amd.com>,
"Calvin Owens" <calvin@wbinvd.org>,
"David Woodhouse" <dwmw2@infradead.org>,
"Will Deacon" <will@kernel.org>,
"Robin Murphy" <robin.murphy@arm.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Tina Zhang" <tina.zhang@intel.com>,
"Lu Baolu" <baolu.lu@linux.intel.com>,
"Kevin Tian" <kevin.tian@intel.com>
Subject: Re: Xe performance regression with recent IOMMU changes
Date: Fri, 23 Jan 2026 15:07:16 -0400
Message-ID: <20260123190716.GB1134360@nvidia.com>
In-Reply-To: <aXOhbK6Ed9WPbz0u@fdugast-desk>
On Fri, Jan 23, 2026 at 05:27:24PM +0100, Francois Dugast wrote:
> On Thu, Jan 22, 2026 at 09:31:31AM -0400, Jason Gunthorpe wrote:
> > Try the patches, give me the new numbers,
>
> Thanks for the suggestion but they do not seem to help, see new
> execution times below in ns, collected this time without kprobe
> to reduce variation:
>
> # iommu-tip + https://patch.msgid.link/r/0-v2-973a6bdc820f+693-iommpt_map_direct_jgg@nvidia.com
> +-----------------------------------+--------+--------+--------+
> | | 4KB | 64KB | 2MB |
> +-----------------------------------+--------+--------+--------+
> | drm_pagemap_migrate_map_pages() | 660 | 3951 | 113813 |
> +-----------------------------------+--------+--------+--------+
> | drm_pagemap_migrate_unmap_pages() | 610 | 11136 | 322802 |
> +-----------------------------------+--------+--------+--------+
>
> # drm-tip
> +-----------------------------------+--------+--------+--------+
> | | 4KB | 64KB | 2MB |
> +-----------------------------------+--------+--------+--------+
> | drm_pagemap_migrate_map_pages() | 687 | 3890 | 114749 |
> +-----------------------------------+--------+--------+--------+
> | drm_pagemap_migrate_unmap_pages() | 621 | 11180 | 334472 |
> +-----------------------------------+--------+--------+--------+
It is not nothing: that looks like about a 4% gain, which matches the
lower bound of what I was measuring for those patches as well.
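Spelling out the per-cell deltas between your two tables (positive
means the patches helped), so we are looking at the same thing:

```python
# Execution times in ns, copied from the two tables above
# (iommu-tip + patches vs. plain drm-tip).
patched = {"map":   {"4KB": 660, "64KB": 3951,  "2MB": 113813},
           "unmap": {"4KB": 610, "64KB": 11136, "2MB": 322802}}
drm_tip = {"map":   {"4KB": 687, "64KB": 3890,  "2MB": 114749},
           "unmap": {"4KB": 621, "64KB": 11180, "2MB": 334472}}

for op in ("map", "unmap"):
    for size in ("4KB", "64KB", "2MB"):
        old, new = drm_tip[op][size], patched[op][size]
        gain = 100.0 * (old - new) / old  # +ve = improvement over drm-tip
        print(f"{op:5s} {size:>4s}: {gain:+.1f}%")
```

The best cells (4KB map, 2MB unmap) come out around +3.5% to +4%,
the rest are roughly noise.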
There are two mysteries in your report.
First, compared to my measurements:
https://lore.kernel.org/linux-iommu/5-v3-634ccd3efce0+16d38-iommu_pt_vtd_jgg@nvidia.com/
iommu_map()
pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
2^12, 53,66 , 50,64 , 21.21
256*2^12, 384,524 , 337,516 , 34.34
iommu_unmap()
pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
2^12, 67,86 , 63,84 , 25.25
256*2^12, 216,335 , 198,317 , 37.37
Yours are about 10x higher. Granted, they are not measuring exactly
the same thing, but I'm measuring the actual page table code as 20%
faster, not slower. So I'm really wondering what is so different in
your situation. Is the cache flushing causing the 10x delta?
Second, it is normal for map and unmap to take approximately the same
time, yet your results have unmap being nearly 3x slower than map.
This surely must be a bug; my guess is that some cache flush is using
the incorrect length.
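The asymmetry is consistent across sizes; from the patched numbers in
your first table:

```python
# unmap vs. map cost from the iommu-tip + patches table above, in ns
map_ns   = {"64KB": 3951,  "2MB": 113813}
unmap_ns = {"64KB": 11136, "2MB": 322802}

for size in ("64KB", "2MB"):
    # A healthy page table implementation should give a ratio near 1
    ratio = unmap_ns[size] / map_ns[size]
    print(f"{size:>4s}: unmap is {ratio:.2f}x the cost of map")
```

Both sizes land at ~2.8x, which is what makes a fixed per-page cost
(like a mis-sized flush) a plausible culprit.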
Still, that 10x difference is confusing. Are you running with debug
options in your .config? I wouldn't be surprised at all to be told
KASAN/gcov/etc. behaves much differently.
> > tell me if you have the non-cache iommu
>
> The setup used in this test has non-cache coherent IOMMU.
That helps a lot. The non-coherent case disables a meaningful
optimization for the 4k map case, and triggers a bunch of
hard-to-test cache flushing code that we can look at.
Any chance you can run this on a system that has a coherent IOMMU?
That would really help narrow things down.
Can you directly measure iommu_map()/iommu_unmap() calls under the DMA API?
Another thought is that something related to the gather logic outside
the actual page table code is acting differently.
I will attempt to run some benchmarking here specifically with the
non-coherent mode enabled to see if I can find a bug.
Thanks,
Jason
Thread overview: 10+ messages
2026-01-21 13:02 Xe performance regression with recent IOMMU changes Francois Dugast
2026-01-21 13:11 ` Jason Gunthorpe
2026-01-21 18:04 ` Jason Gunthorpe
2026-01-22 6:15 ` Matthew Brost
2026-01-22 7:29 ` Leon Romanovsky
2026-01-22 7:36 ` Matthew Brost
2026-01-22 10:26 ` Leon Romanovsky
2026-01-22 13:31 ` Jason Gunthorpe
2026-01-23 16:27 ` Francois Dugast
2026-01-23 19:07 ` Jason Gunthorpe [this message]