From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 22 Jan 2026 12:26:07 +0200
From: Leon Romanovsky
To: Matthew Brost
Cc: Jason Gunthorpe, Francois Dugast, iommu@lists.linux.dev,
 intel-xe@lists.freedesktop.org, Joerg Roedel, Calvin Owens, David Woodhouse,
 Will Deacon, Robin Murphy, Samiullah Khawaja, Thomas Hellström, Tina Zhang,
 Lu Baolu, Kevin Tian
Subject: Re: Xe performance regression with recent IOMMU changes
Message-ID: <20260122102607.GK13201@unreal>
References: <20260121130233.257428-1-francois.dugast@intel.com>
 <20260121131135.GF1134360@nvidia.com>
 <20260121180449.GA1490142@nvidia.com>
 <20260122072913.GJ13201@unreal>
List-Id: Intel Xe graphics driver

On Wed, Jan 21, 2026 at 11:36:47PM -0800, Matthew Brost wrote:
> On Thu, Jan 22, 2026 at 09:29:13AM +0200, Leon Romanovsky wrote:
> > On Wed, Jan 21, 2026 at 10:15:14PM -0800, Matthew Brost wrote:
> > > On Wed, Jan 21, 2026 at 02:04:49PM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Jan 21, 2026 at 09:11:35AM -0400, Jason Gunthorpe wrote:
> > > > > On Wed, Jan 21, 2026 at 02:02:16PM +0100, Francois Dugast wrote:
> > > > > > I am reporting a slowdown in Xe caused by a couple of IOMMU changes.
> > > > > > It can be observed during the DMA mappings/unmappings required to
> > > > > > issue copies between system memory and the device when handling GPU
> > > > > > faults.
> > > > > > Not sure how other use cases or vendors are affected, but below is
> > > > > > the impact on execution times for BMG:
> > > > > >
> > > > > > Before changes:
> > > > > > 4KB
> > > > > >   drm_pagemap_migrate_map_pages:   0.4 us
> > > > > >   drm_pagemap_migrate_unmap_pages: 0.4 us
> > > > > > 64KB
> > > > > >   drm_pagemap_migrate_map_pages:   2.5 us
> > > > > >   drm_pagemap_migrate_unmap_pages: 3.5 us
> > > > > > 2MB
> > > > > >   drm_pagemap_migrate_map_pages:   88 us
> > > > > >   drm_pagemap_migrate_unmap_pages: 108 us
> > > > > >
> > > > > > After changes:
> > > > > > 4KB
> > > > > >   drm_pagemap_migrate_map_pages:   0.7 us
> > > > > >   drm_pagemap_migrate_unmap_pages: 0.7 us
> > > > > > 64KB
> > > > > >   drm_pagemap_migrate_map_pages:   3.5 us
> > > > > >   drm_pagemap_migrate_unmap_pages: 10.5 us
> > > > > > 2MB
> > > > > >   drm_pagemap_migrate_map_pages:   102 us
> > > > > >   drm_pagemap_migrate_unmap_pages: 330 us
> > > > >
> > > > > I posted some more optimizations for these cases, it should reduce
> > > > > the numbers.
> > >
> > > We can try those — link? I believe I know the series, but just to make
> > > sure we’re on the same page.
> > >
> > > > > This is the opposite of the benchmark numbers I ran, which showed
> > > > > significant gains as the page count and sizes increased.
> > > > >
> > > > > But something weird is going on to see a 3x increase in unmap; that
> > > > > shouldn't be just algorithm overhead. That almost seems like
> > > > > additional IOTLB invalidation overhead or something else going wrong.
> > > > >
> > > > > Is this from a system with the VT-d cache flushing requirement? That
> > > > > logic changed around too and could have this kind of big impact.
> > > >
> > > > Oh looking at the code a bit, you've got pretty much the slowest
> > > > possible thing you can do here:
> > >
> > > This was a fairly common pattern prior to Leon’s series, I believe. The
> > > cross-references show this pattern appearing frequently in the kernel
> > > [1].
> > > I do agree with the point below that, with Leon’s changes applied,
> > > this could be refactored into an IOVA alloc/link/unlink/free flow,
> > > which would work better (also, 2M device pages reduce the common 2M
> > > case to a moot point).
> > >
> > > But that’s not what we’re discussing here. We’re talking about a
> > > regression introduced in the dma-mapping API for x86, which in my view
> > > is unacceptable for a kernel release. So IMO we should revert those
> > > changes [2].
> > >
> > > [1] https://elixir.bootlin.com/linux/v6.18.6/A/ident/dma_unmap_page
> >
> > I think this comparison is unfair. The previous behavior was bad for
> > everyone, while the current issue affects only the specific
> > drm_pagemap_migrate_unmap_pages() flow. Cases where the performance of
> > dma_unmap_page() in non-direct mode matters are extremely rare.
>
> I don’t think you can reason about this without extensive testing across
> multiple platforms. Nor is it fair to say: sorry, we slowed down your
> existing code, good luck.

That is not what I said. I only made the specific point that a loop over
dma_unmap_page() is not universally performance critical.

Thanks
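
[Editorial note: the two mapping patterns contrasted in the thread can be
sketched roughly as below. This is an illustrative, non-runnable sketch, not
code from the thread or from Xe; it assumes the dma_iova_* helpers introduced
by Leon's dma-mapping series, and exact names, signatures, and error handling
may differ between kernel versions.]

```c
/* Pattern 1: the per-page map/unmap loop the thread calls out as slow.
 * Each dma_unmap_page() may pay its own IOVA free and IOTLB-invalidation
 * cost when an IOMMU is active, so teardown cost grows with page count.
 */
for (i = 0; i < npages; i++) {
	addrs[i] = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
				DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, addrs[i]))
		goto err_unwind;
}
/* ... device copy runs here ... */
for (i = 0; i < npages; i++)
	dma_unmap_page(dev, addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL);

/* Pattern 2: the IOVA alloc/link/unlink/free flow suggested above.
 * One contiguous IOVA range is allocated, pages are linked into it, and
 * the whole range is torn down in a single call, amortizing allocation
 * and invalidation overhead across all pages.
 */
struct dma_iova_state state = {};

if (dma_iova_try_alloc(dev, &state, 0, npages * PAGE_SIZE)) {
	for (i = 0; i < npages; i++)
		dma_iova_link(dev, &state, page_to_phys(pages[i]),
			      i * PAGE_SIZE, PAGE_SIZE,
			      DMA_BIDIRECTIONAL, 0);
	dma_iova_sync(dev, &state, 0, npages * PAGE_SIZE);
	/* ... device copy runs here ... */
	dma_iova_destroy(dev, &state, npages * PAGE_SIZE,
			 DMA_BIDIRECTIONAL, 0);
}
```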