public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] iommu: Always fill in gather when unmapping
@ 2026-03-31 19:56 Jason Gunthorpe
  2026-04-01 10:40 ` Jon Hunter
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2026-03-31 19:56 UTC (permalink / raw)
  To: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang
  Cc: Lu Baolu, Janusz Krzysztofik, Joerg Roedel, Jon Hunter, patches,
	Robin Murphy, Samiullah Khawaja, stable, Vasant Hegde

The fixed commit assumed that the gather would always be populated if
an iotlb_sync was required.

arm-smmu-v3, amd, VT-d, riscv, s390, mtk all use information from the
gather during their iotlb_sync() and this approach works for them.

However, arm-smmu, qcom_iommu, ipmmu-vmsa, sun50i, sprd, virtio,
apple-dart all ignore the gather during their iotlb_sync(). They
mostly issue a full flush.

Unfortunately the latter set of drivers often don't bother to add
anything to the gather since they don't intend on using it. Since the
core code now blocks gathers that were never filled, this caused those
drivers to stop getting their iotlb_sync() calls and breaks them.

Since it is impossible to tell the difference between gathers that are
empty because there is nothing to do and gathers that are empty
because they are not used, fill in the gathers for the missing cases.

io-pgtable might have intended to allow the driver to choose between
gather or immediate flush because it passed gather to
ops->tlb_add_page(), however no driver does anything with it.

mtk uses io-pgtable-arm-v7s but added the range to the gather in the
unmap callback. Move this into the io-pgtable-arm unmap itself. That
will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
ipmmu-vmsa).

arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
already have the gather population because SMMUv3 requires it, so it
becomes consistent.

Add a trivial gather population to io-pgtable-dart.

Add trivial populations to sprd, sun50i and virtio-iommu in their
unmap functions.

Fixes: 90c5def10bea ("iommu: Do not call drivers for empty gathers")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/io-pgtable-arm.c  | 4 +++-
 drivers/iommu/io-pgtable-dart.c | 3 +++
 drivers/iommu/mtk_iommu.c       | 1 -
 drivers/iommu/sprd-iommu.c      | 1 +
 drivers/iommu/sun50i-iommu.c    | 1 +
 drivers/iommu/virtio-iommu.c    | 2 ++
 6 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 0208e5897c299a..8572713a42ca29 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -666,9 +666,11 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 		/* Clear the remaining entries */
 		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
 
-		if (gather && !iommu_iotlb_gather_queued(gather))
+		if (gather && !iommu_iotlb_gather_queued(gather)) {
+			iommu_iotlb_gather_add_range(gather, iova, i * size);
 			for (int j = 0; j < i; j++)
 				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+		}
 
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index cbc5d6aa2daa23..75d699dc28e7b0 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -330,6 +330,9 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		i++;
 	}
 
+	if (i && !iommu_iotlb_gather_queued(gather))
+		iommu_iotlb_gather_add_range(gather, iova, i * pgsize);
+
 	return i * pgsize;
 }
 
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2be990c108de2b..a2f80a92f51f2c 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -828,7 +828,6 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 {
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 
-	iommu_iotlb_gather_add_range(gather, iova, pgsize * pgcount);
 	return dom->iop->unmap_pages(dom->iop, iova, pgsize, pgcount, gather);
 }
 
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index c1a34445d244fb..893ea67d322644 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -340,6 +340,7 @@ static size_t sprd_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	spin_lock_irqsave(&dom->pgtlock, flags);
 	memset(pgt_base_iova, 0, pgcount * sizeof(u32));
 	spin_unlock_irqrestore(&dom->pgtlock, flags);
+	iommu_iotlb_gather_add_range(iotlb_gather, iova, size);
 
 	return size;
 }
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index be3f1ce696ba29..b9aa4bbc82acad 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -655,6 +655,7 @@ static size_t sun50i_iommu_unmap(struct iommu_domain *domain, unsigned long iova
 
 	memset(pte_addr, 0, sizeof(*pte_addr));
 	sun50i_table_flush(sun50i_domain, pte_addr, 1);
+	iommu_iotlb_gather_add_range(gather, iova, SZ_4K);
 
 	return SZ_4K;
 }
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 587fc13197f122..5865b8f6c6e67a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -897,6 +897,8 @@ static size_t viommu_unmap_pages(struct iommu_domain *domain, unsigned long iova
 	if (unmapped < size)
 		return 0;
 
+	iommu_iotlb_gather_add_range(gather, iova, unmapped);
+
 	/* Device already removed all mappings after detach. */
 	if (!vdomain->nr_endpoints)
 		return unmapped;

base-commit: fcbe430399ca5c318e99bfda6df9beee90ab051c
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu: Always fill in gather when unmapping
  2026-03-31 19:56 [PATCH] iommu: Always fill in gather when unmapping Jason Gunthorpe
@ 2026-04-01 10:40 ` Jon Hunter
  2026-04-01 11:23 ` Pranjal Shrivastava
  2026-04-01 16:33 ` Robin Murphy
  2 siblings, 0 replies; 8+ messages in thread
From: Jon Hunter @ 2026-04-01 10:40 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexandre Ghiti, AngeloGioacchino Del Regno,
	Albert Ou, asahi, Baolin Wang, iommu, Janne Grunau,
	Jernej Skrabec, Joerg Roedel, Jean-Philippe Brucker,
	linux-arm-kernel, linux-mediatek, linux-riscv, linux-sunxi,
	Matthias Brugger, Neal Gompa, Orson Zhai, Palmer Dabbelt,
	Paul Walmsley, Samuel Holland, Sven Peter, virtualization,
	Chen-Yu Tsai, Will Deacon, Yong Wu, Chunyan Zhang
  Cc: Lu Baolu, Janusz Krzysztofik, Joerg Roedel, patches, Robin Murphy,
	Samiullah Khawaja, stable, Vasant Hegde,
	linux-tegra@vger.kernel.org


On 31/03/2026 20:56, Jason Gunthorpe wrote:
> The fixed commit assumed that the gather would always be populated if
> an iotlb_sync was required.
> 
> arm-smmu-v3, amd, VT-d, riscv, s390, mtk all use information from the
> gather during their iotlb_sync() and this approach works for them.
> 
> However, arm-smmu, qcom_iommu, ipmmu-vmsa, sun50i, sprd, virtio,
> apple-dart all ignore the gather during their iotlb_sync(). They
> mostly issue a full flush.
> 
> Unfortunately the latter set of drivers often don't bother to add
> anything to the gather since they don't intend on using it. Since the
> core code now blocks gathers that were never filled, this caused those
> drivers to stop getting their iotlb_sync() calls and breaks them.
> 
> Since it is impossible to tell the difference between gathers that are
> empty because there is nothing to do and gathers that are empty
> because they are not used, fill in the gathers for the missing cases.
> 
> io-pgtable might have intended to allow the driver to choose between
> gather or immediate flush because it passed gather to
> ops->tlb_add_page(), however no driver does anything with it.
> 
> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> unmap callback. Move this into the io-pgtable-arm unmap itself. That
> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> ipmmu-vmsa).
> 
> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> already have the gather population because SMMUv3 requires it, so it
> becomes consistent.
> 
> Add a trivial gather population to io-pgtable-dart.
> 
> Add trivial populations to sprd, sun50i and virtio-iommu in their
> unmap functions.
> 
> Fixes: 90c5def10bea ("iommu: Do not call drivers for empty gathers")
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/io-pgtable-arm.c  | 4 +++-
>   drivers/iommu/io-pgtable-dart.c | 3 +++
>   drivers/iommu/mtk_iommu.c       | 1 -
>   drivers/iommu/sprd-iommu.c      | 1 +
>   drivers/iommu/sun50i-iommu.c    | 1 +
>   drivers/iommu/virtio-iommu.c    | 2 ++
>   6 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 0208e5897c299a..8572713a42ca29 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -666,9 +666,11 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>   		/* Clear the remaining entries */
>   		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
>   
> -		if (gather && !iommu_iotlb_gather_queued(gather))
> +		if (gather && !iommu_iotlb_gather_queued(gather)) {
> +			iommu_iotlb_gather_add_range(gather, iova, i * size);
>   			for (int j = 0; j < i; j++)
>   				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
> +		}
>   
>   		return i * size;
>   	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
> diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
> index cbc5d6aa2daa23..75d699dc28e7b0 100644
> --- a/drivers/iommu/io-pgtable-dart.c
> +++ b/drivers/iommu/io-pgtable-dart.c
> @@ -330,6 +330,9 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
>   		i++;
>   	}
>   
> +	if (i && !iommu_iotlb_gather_queued(gather))
> +		iommu_iotlb_gather_add_range(gather, iova, i * pgsize);
> +
>   	return i * pgsize;
>   }
>   
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2be990c108de2b..a2f80a92f51f2c 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -828,7 +828,6 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>   {
>   	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>   
> -	iommu_iotlb_gather_add_range(gather, iova, pgsize * pgcount);
>   	return dom->iop->unmap_pages(dom->iop, iova, pgsize, pgcount, gather);
>   }
>   
> diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
> index c1a34445d244fb..893ea67d322644 100644
> --- a/drivers/iommu/sprd-iommu.c
> +++ b/drivers/iommu/sprd-iommu.c
> @@ -340,6 +340,7 @@ static size_t sprd_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>   	spin_lock_irqsave(&dom->pgtlock, flags);
>   	memset(pgt_base_iova, 0, pgcount * sizeof(u32));
>   	spin_unlock_irqrestore(&dom->pgtlock, flags);
> +	iommu_iotlb_gather_add_range(iotlb_gather, iova, size);
>   
>   	return size;
>   }
> diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
> index be3f1ce696ba29..b9aa4bbc82acad 100644
> --- a/drivers/iommu/sun50i-iommu.c
> +++ b/drivers/iommu/sun50i-iommu.c
> @@ -655,6 +655,7 @@ static size_t sun50i_iommu_unmap(struct iommu_domain *domain, unsigned long iova
>   
>   	memset(pte_addr, 0, sizeof(*pte_addr));
>   	sun50i_table_flush(sun50i_domain, pte_addr, 1);
> +	iommu_iotlb_gather_add_range(gather, iova, SZ_4K);
>   
>   	return SZ_4K;
>   }
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 587fc13197f122..5865b8f6c6e67a 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -897,6 +897,8 @@ static size_t viommu_unmap_pages(struct iommu_domain *domain, unsigned long iova
>   	if (unmapped < size)
>   		return 0;
>   
> +	iommu_iotlb_gather_add_range(gather, iova, unmapped);
> +
>   	/* Device already removed all mappings after detach. */
>   	if (!vdomain->nr_endpoints)
>   		return unmapped;
> 
> base-commit: fcbe430399ca5c318e99bfda6df9beee90ab051c


Fixes the issue I was seeing ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!
Jon

-- 
nvpublic



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu: Always fill in gather when unmapping
  2026-03-31 19:56 [PATCH] iommu: Always fill in gather when unmapping Jason Gunthorpe
  2026-04-01 10:40 ` Jon Hunter
@ 2026-04-01 11:23 ` Pranjal Shrivastava
  2026-04-01 12:58   ` Jason Gunthorpe
  2026-04-01 16:33 ` Robin Murphy
  2 siblings, 1 reply; 8+ messages in thread
From: Pranjal Shrivastava @ 2026-04-01 11:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
	Jon Hunter, patches, Robin Murphy, Samiullah Khawaja, stable,
	Vasant Hegde

On Tue, Mar 31, 2026 at 04:56:22PM -0300, Jason Gunthorpe wrote:
> The fixed commit assumed that the gather would always be populated if
> an iotlb_sync was required.
> 
> arm-smmu-v3, amd, VT-d, riscv, s390, mtk all use information from the
> gather during their iotlb_sync() and this approach works for them.
> 
> However, arm-smmu, qcom_iommu, ipmmu-vmsa, sun50i, sprd, virtio,
> apple-dart all ignore the gather during their iotlb_sync(). They
> mostly issue a full flush.
> 
> Unfortunately the latter set of drivers often don't bother to add
> anything to the gather since they don't intend on using it. Since the
> core code now blocks gathers that were never filled, this caused those
> drivers to stop getting their iotlb_sync() calls and breaks them.
> 
> Since it is impossible to tell the difference between gathers that are
> empty because there is nothing to do and gathers that are empty
> because they are not used, fill in the gathers for the missing cases.
> 

I believe the problem is a fundamental disagreement between the core
layer and these drivers. The core assumes an empty gather means there
is no work to do, while these drivers expect a sync regardless. With
this, it seems we're forcing the drivers to lie to the core by 
populating a gather they don't actually use just to trigger the sync.

I was wondering if, as a longer-term direction, having an explicit flag
for these drivers to indicate they always require a sync would be a
cleaner way to handle this than the trivial population?

Just a thought, not a hard disagreement with the current approach..

> io-pgtable might have intended to allow the driver to choose between
> gather or immediate flush because it passed gather to
> ops->tlb_add_page(), however no driver does anything with it.
> 
> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> unmap callback. Move this into the io-pgtable-arm unmap itself. That
> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> ipmmu-vmsa).
> 
> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> already have the gather population because SMMUv3 requires it, so it
> becomes consistent.
> 
> Add a trivial gather population to io-pgtable-dart.
> 
> Add trivial populations to sprd, sun50i and virtio-iommu in their
> unmap functions.
> 
> Fixes: 90c5def10bea ("iommu: Do not call drivers for empty gathers")
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/io-pgtable-arm.c  | 4 +++-
>  drivers/iommu/io-pgtable-dart.c | 3 +++
>  drivers/iommu/mtk_iommu.c       | 1 -
>  drivers/iommu/sprd-iommu.c      | 1 +
>  drivers/iommu/sun50i-iommu.c    | 1 +
>  drivers/iommu/virtio-iommu.c    | 2 ++
>  6 files changed, 10 insertions(+), 2 deletions(-)
>

Acked-by: Pranjal Shrivastava <praan@google.com>

Thanks,
Praan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu: Always fill in gather when unmapping
  2026-04-01 11:23 ` Pranjal Shrivastava
@ 2026-04-01 12:58   ` Jason Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-01 12:58 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
	Jon Hunter, patches, Robin Murphy, Samiullah Khawaja, stable,
	Vasant Hegde

On Wed, Apr 01, 2026 at 11:23:23AM +0000, Pranjal Shrivastava wrote:

> I was wondering if, as a longer-term direction, having an explicit flag
> for these drivers to indicate they always require a sync would be a
> cleaner way to handle this than the trivial population?

My first thought was to just set the gather to start=0,end=ULONG_MAX
but it turned out to be trivial to just set the right gather parameters
and it looks like it is basically the same cost..

Adding a flag would mean we have to test the flag on the other case
where we don't use this flow, which doesn't seem good either.

Jason


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu: Always fill in gather when unmapping
  2026-03-31 19:56 [PATCH] iommu: Always fill in gather when unmapping Jason Gunthorpe
  2026-04-01 10:40 ` Jon Hunter
  2026-04-01 11:23 ` Pranjal Shrivastava
@ 2026-04-01 16:33 ` Robin Murphy
  2026-04-01 17:36   ` Jason Gunthorpe
  2 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2026-04-01 16:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexandre Ghiti, AngeloGioacchino Del Regno,
	Albert Ou, asahi, Baolin Wang, iommu, Janne Grunau,
	Jernej Skrabec, Joerg Roedel, Jean-Philippe Brucker,
	linux-arm-kernel, linux-mediatek, linux-riscv, linux-sunxi,
	Matthias Brugger, Neal Gompa, Orson Zhai, Palmer Dabbelt,
	Paul Walmsley, Samuel Holland, Sven Peter, virtualization,
	Chen-Yu Tsai, Will Deacon, Yong Wu, Chunyan Zhang
  Cc: Lu Baolu, Janusz Krzysztofik, Joerg Roedel, Jon Hunter, patches,
	Samiullah Khawaja, stable, Vasant Hegde

On 2026-03-31 8:56 pm, Jason Gunthorpe wrote:
> The fixed commit assumed that the gather would always be populated if
> an iotlb_sync was required.
> 
> arm-smmu-v3, amd, VT-d, riscv, s390, mtk all use information from the
> gather during their iotlb_sync() and this approach works for them.
> 
> However, arm-smmu, qcom_iommu, ipmmu-vmsa, sun50i, sprd, virtio,
> apple-dart all ignore the gather during their iotlb_sync(). They
> mostly issue a full flush.
> 
> Unfortunately the latter set of drivers often don't bother to add
> anything to the gather since they don't intend on using it. Since the
> core code now blocks gathers that were never filled, this caused those
> drivers to stop getting their iotlb_sync() calls and breaks them.
> 
> Since it is impossible to tell the difference between gathers that are
> empty because there is nothing to do and gathers that are empty
> because they are not used, fill in the gathers for the missing cases.
> 
> io-pgtable might have intended to allow the driver to choose between
> gather or immediate flush because it passed gather to
> ops->tlb_add_page(), however no driver does anything with it.

Apart from arm-smmu-v3...

> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> unmap callback. Move this into the io-pgtable-arm unmap itself. That
> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> ipmmu-vmsa).

io-pgtable-arm-v7s != io-pgtable-arm. You're *breaking* MTK (and failing
to fix the other v7s user, which is MSM).

> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> already have the gather population because SMMUv3 requires it, so it
> becomes consistent.

Huh? arm-smmu-v3 invokes iommu_iotlb_gather_add_page() itself, because
arm-smmu-v3 uses gathers; arm-smmu does not. io-pgtable-arm has nothing
to do with it. Invoking add range before add_page will end up defeating
the iommu_iotlb_gather_is_disjoint() check and making SMMUv3
overinvalidate between disjoint ranges.

I guess now I remember why we weren't validating gathers in core code
before :(

However, if it is for the sake of a core code check, why not just make
the core code robust itself?

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 35db51780954..9ca23f89a279 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
  		pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
  			 iova, unmapped_page);
  
+		/* If the driver itself isn't using the gather, mark it used */
+		if (iotlb_gather->end <= iotlb_gather->start)
+			iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
+
  		iova += unmapped_page;
  		unmapped += unmapped_page;
  	}



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu: Always fill in gather when unmapping
  2026-04-01 16:33 ` Robin Murphy
@ 2026-04-01 17:36   ` Jason Gunthorpe
  2026-04-02 18:11     ` Robin Murphy
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-01 17:36 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
	Jon Hunter, patches, Samiullah Khawaja, stable, Vasant Hegde

On Wed, Apr 01, 2026 at 05:33:28PM +0100, Robin Murphy wrote:
> > io-pgtable might have intended to allow the driver to choose between
> > gather or immediate flush because it passed gather to
> > ops->tlb_add_page(), however no driver does anything with it.
> 
> Apart from arm-smmu-v3...

Bah, I did my research on the wrong tree and missed this.

> > mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> > unmap callback. Move this into the io-pgtable-arm unmap itself. That
> > will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> > ipmmu-vmsa).
> 
> io-pgtable-arm-v7s != io-pgtable-arm. You're *breaking* MTK (and failing
> to fix the other v7s user, which is MSM).

I was very confused what you were talking about, but I see now that
the hunk adding iommu_iotlb_gather_add_range() to v7 got lost somehow!

@@ -596,6 +596,9 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 
                __arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
 
+               if (!iommu_iotlb_gather_queued(gather))
+                       iommu_iotlb_gather_add_range(gather, iova, size);
+
                for (i = 0; i < num_entries; i++) {
                        if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
                                /* Also flush any partial walks */

> > arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> > already have the gather population because SMMUv3 requires it, so it
> > becomes consistent.
> 
> Huh? arm-smmu-v3 invokes iommu_iotlb_gather_add_page() itself, because
> arm-smmu-v3 uses gathers

Yeah, I missed this whole bit, it needs some changes.

> Invoking add range before add_page will end up defeating the
> iommu_iotlb_gather_is_disjoint() check and making SMMUv3
> overinvalidate between disjoint ranges.

Right, that flow needs fixing.

> I guess now I remember why we weren't validating gathers in core code
> before :(

My point is not filling the gather is a micro-optimization that
benefits a few drivers. I think it is so small compared to an IOTLB
flush that it isn't worth worrying about.

So, I'd like to make everything the same and populate the gather
correctly in all flows. I'll fix the SMMUv3 thing and lets look again,
this patch is not so scary to make me think we shouldn't do that.

> @@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
>  		pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
>  			 iova, unmapped_page);
> +		/* If the driver itself isn't using the gather, mark it used */
> +		if (iotlb_gather->end <= iotlb_gather->start)
> +			iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);

The gathers can be joined across unmaps and now we are inviting subtly
ill-formed gathers as only the first unmap will get included.

We do have error cases where the gather is legitimately empty, and
this would squash that, it probably needs to check unmapped_page for 0
too, at least.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu: Always fill in gather when unmapping
  2026-04-01 17:36   ` Jason Gunthorpe
@ 2026-04-02 18:11     ` Robin Murphy
  2026-04-02 22:51       ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2026-04-02 18:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
	Jon Hunter, patches, Samiullah Khawaja, stable, Vasant Hegde

On 2026-04-01 6:36 pm, Jason Gunthorpe wrote:
> On Wed, Apr 01, 2026 at 05:33:28PM +0100, Robin Murphy wrote:
>>> io-pgtable might have intended to allow the driver to choose between
>>> gather or immediate flush because it passed gather to
>>> ops->tlb_add_page(), however no driver does anything with it.
>>
>> Apart from arm-smmu-v3...
> 
> Bah, I did my research on the wrong tree and missed this.
> 
>>> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
>>> unmap callback. Move this into the io-pgtable-arm unmap itself. That
>>> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
>>> ipmmu-vmsa).
>>
>> io-pgtable-arm-v7s != io-pgtable-arm. You're *breaking* MTK (and failing
>> to fix the other v7s user, which is MSM).
> 
> I was very confused what you were talking about, but I see now that
> the hunk adding iommu_iotlb_gather_add_range() to v7 got lost somehow!
> 
> @@ -596,6 +596,9 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>   
>                  __arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
>   
> +               if (!iommu_iotlb_gather_queued(gather))
> +                       iommu_iotlb_gather_add_range(gather, iova, size);
> +
>                  for (i = 0; i < num_entries; i++) {
>                          if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
>                                  /* Also flush any partial walks */
> 
>>> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
>>> already have the gather population because SMMUv3 requires it, so it
>>> becomes consistent.
>>
>> Huh? arm-smmu-v3 invokes iommu_iotlb_gather_add_page() itself, because
>> arm-smmu-v3 uses gathers
> 
> Yeah, I missed this whole bit, it needs some changes.
> 
>> Invoking add range before add_page will end up defeating the
>> iommu_iotlb_gather_is_disjoint() check and making SMMUv3
>> overinvalidate between disjoint ranges.
> 
> Right, that flow needs fixing.
> 
>> I guess now I remember why we weren't validating gathers in core code
>> before :(
> 
> My point is not filling the gather is a micro-optimization that
> benefits a few drivers. I think it is so small compared to an IOTLB
> flush that it isn't worth worrying about.

It's hardly a "micro-optimisation" for drivers to just not touch an 
optional mechanism which offers no benefit to them, especially when in 
many cases said mechanism is newer than the code that isn't using it 
anyway. The only required semantic of .iotlb_sync is to ensure that any 
previous .unmap_pages calls are complete and their associated 
translations invalidated. The entire concept of gathering and deferred 
invalidation is irrelevant to many IOMMU designs where it would only 
stand to make overall invalidation performance worse.

I'm starting to wish I'd been able to page all this context back in 
before reviewing the first patch, as I too only really had Intel and 
SMMUv3 in mind at the time... :(

> So, I'd like to make everything the same and populate the gather
> correctly in all flows. I'll fix the SMMUv3 thing and lets look again,
> this patch is not so scary to make me think we shouldn't do that.
> 
>> @@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
>>   		pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
>>   			 iova, unmapped_page);
>> +		/* If the driver itself isn't using the gather, mark it used */
>> +		if (iotlb_gather->end <= iotlb_gather->start)
>> +			iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
> 
> The gathers can be joined across unmaps and now we are inviting subtly
> ill-formed gathers as only the first unmap will get included.

Ill-formed? It's a perfectly valid range for the purposes of any 
subsequent generic check - which couldn't realistically be anything 
beyond empty vs. non-empty anyway - and it's only being set at all in 
the case where we know the driver doesn't care, because if the driver 
*was* going to look at gather->start or gather->end in its .iotlb_sync 
then it must have already set them to meaningful values in the prior 
successful .unmap_pages call. I think we can safely consider it invalid 
for a driver to suddenly decide to start using a gather mid-way through 
an unmap (or indeed to use start/end in any intentionally non-obvious 
manner either).

> We do have error cases where the gather is legitimately empty, and
> this would squash that, it probably needs to check unmapped_page for 0
> too, at least.

Maybe try looking at the rest of the code around these lines...

Thanks,
Robin.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu: Always fill in gather when unmapping
  2026-04-02 18:11     ` Robin Murphy
@ 2026-04-02 22:51       ` Jason Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-02 22:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
	Jon Hunter, patches, Samiullah Khawaja, stable, Vasant Hegde

On Thu, Apr 02, 2026 at 07:11:13PM +0100, Robin Murphy wrote:
> > > @@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
> > >   		pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
> > >   			 iova, unmapped_page);
> > > +		/* If the driver itself isn't using the gather, mark it used */
> > > +		if (iotlb_gather->end <= iotlb_gather->start)
> > > +			iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
> > 
> > The gathers can be joined across unmaps and now we are inviting subtly
> > ill-formed gathers as only the first unmap will get included.

> > We do have error cases where the gather is legitimately empty, and
> > this would squash that, it probably needs to check unmapped_page for 0
> > too, at least.
> 
> Maybe try looking at the rest of the code around these lines...

Okay, well lets do this one, do you want to send it since it is your
idea?

Jason


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-04-02 22:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-31 19:56 [PATCH] iommu: Always fill in gather when unmapping Jason Gunthorpe
2026-04-01 10:40 ` Jon Hunter
2026-04-01 11:23 ` Pranjal Shrivastava
2026-04-01 12:58   ` Jason Gunthorpe
2026-04-01 16:33 ` Robin Murphy
2026-04-01 17:36   ` Jason Gunthorpe
2026-04-02 18:11     ` Robin Murphy
2026-04-02 22:51       ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox