* Re: [PATCH] iommu: Always fill in gather when unmapping
2026-03-31 19:56 [PATCH] iommu: Always fill in gather when unmapping Jason Gunthorpe
@ 2026-04-01 10:40 ` Jon Hunter
2026-04-01 11:23 ` Pranjal Shrivastava
2026-04-01 16:33 ` Robin Murphy
2 siblings, 0 replies; 8+ messages in thread
From: Jon Hunter @ 2026-04-01 10:40 UTC (permalink / raw)
To: Jason Gunthorpe, Alexandre Ghiti, AngeloGioacchino Del Regno,
Albert Ou, asahi, Baolin Wang, iommu, Janne Grunau,
Jernej Skrabec, Joerg Roedel, Jean-Philippe Brucker,
linux-arm-kernel, linux-mediatek, linux-riscv, linux-sunxi,
Matthias Brugger, Neal Gompa, Orson Zhai, Palmer Dabbelt,
Paul Walmsley, Samuel Holland, Sven Peter, virtualization,
Chen-Yu Tsai, Will Deacon, Yong Wu, Chunyan Zhang
Cc: Lu Baolu, Janusz Krzysztofik, Joerg Roedel, patches, Robin Murphy,
Samiullah Khawaja, stable, Vasant Hegde,
linux-tegra@vger.kernel.org
On 31/03/2026 20:56, Jason Gunthorpe wrote:
> The fixed commit assumed that the gather would always be populated if
> an iotlb_sync was required.
>
> arm-smmu-v3, amd, VT-d, riscv, s390, mtk all use information from the
> gather during their iotlb_sync() and this approach works for them.
>
> However, arm-smmu, qcom_iommu, ipmmu-vmsa, sun50i, sprd, virtio,
> apple-dart all ignore the gather during their iotlb_sync(). They
> mostly issue a full flush.
>
> Unfortunately the latter set of drivers often don't bother to add
> anything to the gather since they don't intend on using it. Since the
> core code now blocks gathers that were never filled, this caused those
> drivers to stop getting their iotlb_sync() calls and breaks them.
>
> Since it is impossible to tell the difference between gathers that are
> empty because there is nothing to do and gathers that are empty
> because they are not used, fill in the gathers for the missing cases.
>
> io-pgtable might have intended to allow the driver to choose between
> gather or immediate flush because it passed gather to
> ops->tlb_add_page(), however no driver does anything with it.
>
> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> unmap callback. Move this into the io-pgtable-arm unmap itself. That
> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> ipmmu-vmsa).
>
> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> already have the gather population because SMMUv3 requires it, so it
> becomes consistent.
>
> Add a trivial gather population to io-pgtable-dart.
>
> Add trivial populations to sprd, sun50i and virtio-iommu in their
> unmap functions.
>
> Fixes: 90c5def10bea ("iommu: Do not call drivers for empty gathers")
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/io-pgtable-arm.c | 4 +++-
> drivers/iommu/io-pgtable-dart.c | 3 +++
> drivers/iommu/mtk_iommu.c | 1 -
> drivers/iommu/sprd-iommu.c | 1 +
> drivers/iommu/sun50i-iommu.c | 1 +
> drivers/iommu/virtio-iommu.c | 2 ++
> 6 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 0208e5897c299a..8572713a42ca29 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -666,9 +666,11 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> /* Clear the remaining entries */
> __arm_lpae_clear_pte(ptep, &iop->cfg, i);
>
> - if (gather && !iommu_iotlb_gather_queued(gather))
> + if (gather && !iommu_iotlb_gather_queued(gather)) {
> + iommu_iotlb_gather_add_range(gather, iova, i * size);
> for (int j = 0; j < i; j++)
> io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
> + }
>
> return i * size;
> } else if (iopte_leaf(pte, lvl, iop->fmt)) {
> diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
> index cbc5d6aa2daa23..75d699dc28e7b0 100644
> --- a/drivers/iommu/io-pgtable-dart.c
> +++ b/drivers/iommu/io-pgtable-dart.c
> @@ -330,6 +330,9 @@ static size_t dart_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
> i++;
> }
>
> + if (i && !iommu_iotlb_gather_queued(gather))
> + iommu_iotlb_gather_add_range(gather, iova, i * pgsize);
> +
> return i * pgsize;
> }
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2be990c108de2b..a2f80a92f51f2c 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -828,7 +828,6 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
> {
> struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>
> - iommu_iotlb_gather_add_range(gather, iova, pgsize * pgcount);
> return dom->iop->unmap_pages(dom->iop, iova, pgsize, pgcount, gather);
> }
>
> diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
> index c1a34445d244fb..893ea67d322644 100644
> --- a/drivers/iommu/sprd-iommu.c
> +++ b/drivers/iommu/sprd-iommu.c
> @@ -340,6 +340,7 @@ static size_t sprd_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> spin_lock_irqsave(&dom->pgtlock, flags);
> memset(pgt_base_iova, 0, pgcount * sizeof(u32));
> spin_unlock_irqrestore(&dom->pgtlock, flags);
> + iommu_iotlb_gather_add_range(iotlb_gather, iova, size);
>
> return size;
> }
> diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
> index be3f1ce696ba29..b9aa4bbc82acad 100644
> --- a/drivers/iommu/sun50i-iommu.c
> +++ b/drivers/iommu/sun50i-iommu.c
> @@ -655,6 +655,7 @@ static size_t sun50i_iommu_unmap(struct iommu_domain *domain, unsigned long iova
>
> memset(pte_addr, 0, sizeof(*pte_addr));
> sun50i_table_flush(sun50i_domain, pte_addr, 1);
> + iommu_iotlb_gather_add_range(gather, iova, SZ_4K);
>
> return SZ_4K;
> }
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 587fc13197f122..5865b8f6c6e67a 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -897,6 +897,8 @@ static size_t viommu_unmap_pages(struct iommu_domain *domain, unsigned long iova
> if (unmapped < size)
> return 0;
>
> + iommu_iotlb_gather_add_range(gather, iova, unmapped);
> +
> /* Device already removed all mappings after detach. */
> if (!vdomain->nr_endpoints)
> return unmapped;
>
> base-commit: fcbe430399ca5c318e99bfda6df9beee90ab051c
Fixes the issue I was seeing ...
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Thanks!
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] iommu: Always fill in gather when unmapping
2026-03-31 19:56 [PATCH] iommu: Always fill in gather when unmapping Jason Gunthorpe
2026-04-01 10:40 ` Jon Hunter
@ 2026-04-01 11:23 ` Pranjal Shrivastava
2026-04-01 12:58 ` Jason Gunthorpe
2026-04-01 16:33 ` Robin Murphy
2 siblings, 1 reply; 8+ messages in thread
From: Pranjal Shrivastava @ 2026-04-01 11:23 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
Jon Hunter, patches, Robin Murphy, Samiullah Khawaja, stable,
Vasant Hegde
On Tue, Mar 31, 2026 at 04:56:22PM -0300, Jason Gunthorpe wrote:
> The fixed commit assumed that the gather would always be populated if
> an iotlb_sync was required.
>
> arm-smmu-v3, amd, VT-d, riscv, s390, mtk all use information from the
> gather during their iotlb_sync() and this approach works for them.
>
> However, arm-smmu, qcom_iommu, ipmmu-vmsa, sun50i, sprd, virtio,
> apple-dart all ignore the gather during their iotlb_sync(). They
> mostly issue a full flush.
>
> Unfortunately the latter set of drivers often don't bother to add
> anything to the gather since they don't intend on using it. Since the
> core code now blocks gathers that were never filled, this caused those
> drivers to stop getting their iotlb_sync() calls and breaks them.
>
> Since it is impossible to tell the difference between gathers that are
> empty because there is nothing to do and gathers that are empty
> because they are not used, fill in the gathers for the missing cases.
>
I believe the problem is a fundamental disagreement between the core
layer and these drivers. The core assumes an empty gather means there
is no work to do, while these drivers expect a sync regardless. With
this, it seems we're forcing the drivers to lie to the core by
populating a gather they don't actually use just to trigger the sync.
I was wondering if, as a longer-term direction, having an explicit flag
for these drivers to indicate they always require a sync would be a
cleaner way to handle this than the trivial population?
Just a thought, not a hard disagreement with the current approach..
> io-pgtable might have intended to allow the driver to choose between
> gather or immediate flush because it passed gather to
> ops->tlb_add_page(), however no driver does anything with it.
>
> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> unmap callback. Move this into the io-pgtable-arm unmap itself. That
> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> ipmmu-vmsa).
>
> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> already have the gather population because SMMUv3 requires it, so it
> becomes consistent.
>
> Add a trivial gather population to io-pgtable-dart.
>
> Add trivial populations to sprd, sun50i and virtio-iommu in their
> unmap functions.
>
> Fixes: 90c5def10bea ("iommu: Do not call drivers for empty gathers")
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/io-pgtable-arm.c | 4 +++-
> drivers/iommu/io-pgtable-dart.c | 3 +++
> drivers/iommu/mtk_iommu.c | 1 -
> drivers/iommu/sprd-iommu.c | 1 +
> drivers/iommu/sun50i-iommu.c | 1 +
> drivers/iommu/virtio-iommu.c | 2 ++
> 6 files changed, 10 insertions(+), 2 deletions(-)
>
Acked-by: Pranjal Shrivastava <praan@google.com>
Thanks,
Praan
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] iommu: Always fill in gather when unmapping
2026-04-01 11:23 ` Pranjal Shrivastava
@ 2026-04-01 12:58 ` Jason Gunthorpe
0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-01 12:58 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
Jon Hunter, patches, Robin Murphy, Samiullah Khawaja, stable,
Vasant Hegde
On Wed, Apr 01, 2026 at 11:23:23AM +0000, Pranjal Shrivastava wrote:
> I was wondering if, as a longer-term direction, having an explicit flag
> for these drivers to indicate they always require a sync would be a
> cleaner way to handle this than the trivial population?
My first thought was to just set the gather to start=0,end=ULONG_MAX
but it turned out to be trivial to just set the right gather parameters
and it looks like it is basically the same cost..
Adding a flag would mean we have to test the flag on the other case
where we don't use this flow, which doesn't seem good either.
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu: Always fill in gather when unmapping
2026-03-31 19:56 [PATCH] iommu: Always fill in gather when unmapping Jason Gunthorpe
2026-04-01 10:40 ` Jon Hunter
2026-04-01 11:23 ` Pranjal Shrivastava
@ 2026-04-01 16:33 ` Robin Murphy
2026-04-01 17:36 ` Jason Gunthorpe
2 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2026-04-01 16:33 UTC (permalink / raw)
To: Jason Gunthorpe, Alexandre Ghiti, AngeloGioacchino Del Regno,
Albert Ou, asahi, Baolin Wang, iommu, Janne Grunau,
Jernej Skrabec, Joerg Roedel, Jean-Philippe Brucker,
linux-arm-kernel, linux-mediatek, linux-riscv, linux-sunxi,
Matthias Brugger, Neal Gompa, Orson Zhai, Palmer Dabbelt,
Paul Walmsley, Samuel Holland, Sven Peter, virtualization,
Chen-Yu Tsai, Will Deacon, Yong Wu, Chunyan Zhang
Cc: Lu Baolu, Janusz Krzysztofik, Joerg Roedel, Jon Hunter, patches,
Samiullah Khawaja, stable, Vasant Hegde
On 2026-03-31 8:56 pm, Jason Gunthorpe wrote:
> The fixed commit assumed that the gather would always be populated if
> an iotlb_sync was required.
>
> arm-smmu-v3, amd, VT-d, riscv, s390, mtk all use information from the
> gather during their iotlb_sync() and this approach works for them.
>
> However, arm-smmu, qcom_iommu, ipmmu-vmsa, sun50i, sprd, virtio,
> apple-dart all ignore the gather during their iotlb_sync(). They
> mostly issue a full flush.
>
> Unfortunately the latter set of drivers often don't bother to add
> anything to the gather since they don't intend on using it. Since the
> core code now blocks gathers that were never filled, this caused those
> drivers to stop getting their iotlb_sync() calls and breaks them.
>
> Since it is impossible to tell the difference between gathers that are
> empty because there is nothing to do and gathers that are empty
> because they are not used, fill in the gathers for the missing cases.
>
> io-pgtable might have intended to allow the driver to choose between
> gather or immediate flush because it passed gather to
> ops->tlb_add_page(), however no driver does anything with it.
Apart from arm-smmu-v3...
> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> unmap callback. Move this into the io-pgtable-arm unmap itself. That
> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> ipmmu-vmsa).
io-pgtable-arm-v7s != io-pgtable-arm. You're *breaking* MTK (and failing
to fix the other v7s user, which is MSM).
> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> already have the gather population because SMMUv3 requires it, so it
> becomes consistent.
Huh? arm-smmu-v3 invokes iommu_iotlb_gather_add_page() itself, because
arm-smmu-v3 uses gathers; arm-smmu does not. io-pgtable-arm has nothing
to do with it. Invoking add range before add_page will end up defeating
the iommu_iotlb_gather_is_disjoint() check and making SMMUv3
overinvalidate between disjoint ranges.
I guess now I remember why we weren't validating gathers in core code
before :(
However, if it is for the sake of a core code check, why not just make
the core code robust itself?
Thanks,
Robin.
----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 35db51780954..9ca23f89a279 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
iova, unmapped_page);
+ /* If the driver itself isn't using the gather, mark it used */
+ if (iotlb_gather->end <= iotlb_gather->start)
+ iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
+
iova += unmapped_page;
unmapped += unmapped_page;
}
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH] iommu: Always fill in gather when unmapping
2026-04-01 16:33 ` Robin Murphy
@ 2026-04-01 17:36 ` Jason Gunthorpe
2026-04-02 18:11 ` Robin Murphy
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-01 17:36 UTC (permalink / raw)
To: Robin Murphy
Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
Jon Hunter, patches, Samiullah Khawaja, stable, Vasant Hegde
On Wed, Apr 01, 2026 at 05:33:28PM +0100, Robin Murphy wrote:
> > io-pgtable might have intended to allow the driver to choose between
> > gather or immediate flush because it passed gather to
> > ops->tlb_add_page(), however no driver does anything with it.
>
> Apart from arm-smmu-v3...
Bah, I did my research on the wrong tree and missed this.
> > mtk uses io-pgtable-arm-v7s but added the range to the gather in the
> > unmap callback. Move this into the io-pgtable-arm unmap itself. That
> > will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
> > ipmmu-vmsa).
>
> io-pgtable-arm-v7s != io-pgtable-arm. You're *breaking* MTK (and failing
> to fix the other v7s user, which is MSM).
I was very confused what you were talking about, but I see now that
the hunk adding iommu_iotlb_gather_add_range() to v7 got lost somehow!
@@ -596,6 +596,9 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
__arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
+ if (!iommu_iotlb_gather_queued(gather))
+ iommu_iotlb_gather_add_range(gather, iova, size);
+
for (i = 0; i < num_entries; i++) {
if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
/* Also flush any partial walks */
> > arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
> > already have the gather population because SMMUv3 requires it, so it
> > becomes consistent.
>
> Huh? arm-smmu-v3 invokes iommu_iotlb_gather_add_page() itself, because
> arm-smmu-v3 uses gathers
Yeah, I missed this whole bit, it needs some changes.
> Invoking add range before add_page will end up defeating the
> iommu_iotlb_gather_is_disjoint() check and making SMMUv3
> overinvalidate between disjoint ranges.
Right, that flow needs fixing.
> I guess now I remember why we weren't validating gathers in core code
> before :(
My point is not filling the gather is a micro-optimization that
benefits a few drivers. I think it is so small compared to an IOTLB
flush that it isn't worth worrying about.
So, I'd like to make everything the same and populate the gather
correctly in all flows. I'll fix the SMMUv3 thing and lets look again,
this patch is not so scary to make me think we shouldn't do that.
> @@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
> pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
> iova, unmapped_page);
> + /* If the driver itself isn't using the gather, mark it used */
> + if (iotlb_gather->end <= iotlb_gather->start)
> + iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
The gathers can be joined across unmaps and now we are inviting subtly
ill-formed gathers as only the first unmap will get included.
We do have error cases where the gather is legitimately empty, and
this would squash that, it probably needs to check unmapped_page for 0
too, at least.
Thanks,
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] iommu: Always fill in gather when unmapping
2026-04-01 17:36 ` Jason Gunthorpe
@ 2026-04-02 18:11 ` Robin Murphy
2026-04-02 22:51 ` Jason Gunthorpe
0 siblings, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2026-04-02 18:11 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
Jon Hunter, patches, Samiullah Khawaja, stable, Vasant Hegde
On 2026-04-01 6:36 pm, Jason Gunthorpe wrote:
> On Wed, Apr 01, 2026 at 05:33:28PM +0100, Robin Murphy wrote:
>>> io-pgtable might have intended to allow the driver to choose between
>>> gather or immediate flush because it passed gather to
>>> ops->tlb_add_page(), however no driver does anything with it.
>>
>> Apart from arm-smmu-v3...
>
> Bah, I did my research on the wrong tree and missed this.
>
>>> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
>>> unmap callback. Move this into the io-pgtable-arm unmap itself. That
>>> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
>>> ipmmu-vmsa).
>>
>> io-pgtable-arm-v7s != io-pgtable-arm. You're *breaking* MTK (and failing
>> to fix the other v7s user, which is MSM).
>
> I was very confused what you were talking about, but I see now that
> the hunk adding iommu_iotlb_gather_add_range() to v7 got lost somehow!
>
> @@ -596,6 +596,9 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>
> __arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
>
> + if (!iommu_iotlb_gather_queued(gather))
> + iommu_iotlb_gather_add_range(gather, iova, size);
> +
> for (i = 0; i < num_entries; i++) {
> if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
> /* Also flush any partial walks */
>
>>> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
>>> already have the gather population because SMMUv3 requires it, so it
>>> becomes consistent.
>>
>> Huh? arm-smmu-v3 invokes iommu_iotlb_gather_add_page() itself, because
>> arm-smmu-v3 uses gathers
>
> Yeah, I missed this whole bit, it needs some changes.
>
>> Invoking add range before add_page will end up defeating the
>> iommu_iotlb_gather_is_disjoint() check and making SMMUv3
>> overinvalidate between disjoint ranges.
>
> Right, that flow needs fixing.
>
>> I guess now I remember why we weren't validating gathers in core code
>> before :(
>
> My point is not filling the gather is a micro-optimization that
> benefits a few drivers. I think it is so small compared to an IOTLB
> flush that it isn't worth worrying about.
It's hardly a "micro-optimisation" for drivers to just not touch an
optional mechanism which offers no benefit to them, especially when in
many cases said mechanism is newer than the code that isn't using it
anyway. The only required semantic of .iotlb_sync is to ensure that any
previous .unmap_pages calls are complete and their associated
translations invalidated. The entire concept of gathering and deferred
invalidation is irrelevant to many IOMMU designs where it would only
stand to make overall invalidation performance worse.
I'm starting to wish I'd been able to page all this context back in
before reviewing the first patch, as I too only really had Intel and
SMMUv3 in mind at the time... :(
> So, I'd like to make everything the same and populate the gather
> correctly in all flows. I'll fix the SMMUv3 thing and lets look again,
> this patch is not so scary to make me think we shouldn't do that.
>
>> @@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
>> pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
>> iova, unmapped_page);
>> + /* If the driver itself isn't using the gather, mark it used */
>> + if (iotlb_gather->end <= iotlb_gather->start)
>> + iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
>
> The gathers can be joined across unmaps and now we are inviting subtly
> ill-formed gathers as only the first unmap will get included.
Ill-formed? It's a perfectly valid range for the purposes of any
subsequent generic check - which couldn't realistically be anything
beyond empty vs. non-empty anyway - and it's only being set at all in
the case where we know the driver doesn't care, because if the driver
*was* going to look at gather->start or gather->end in its .iotlb_sync
then it must have already set them to meaningful values in the prior
successful .unmap_pages call. I think we can safely consider it invalid
for a driver to suddenly decide to start using a gather mid-way through
an unmap (or indeed to use start/end in any intentionally non-obvious
manner either).
> We do have error cases where the gather is legitimately empty, and
> this would squash that, it probably needs to check unmapped_page for 0
> too, at least.
Maybe try looking at the rest of the code around these lines...
Thanks,
Robin.
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] iommu: Always fill in gather when unmapping
2026-04-02 18:11 ` Robin Murphy
@ 2026-04-02 22:51 ` Jason Gunthorpe
0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2026-04-02 22:51 UTC (permalink / raw)
To: Robin Murphy
Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
Jon Hunter, patches, Samiullah Khawaja, stable, Vasant Hegde
On Thu, Apr 02, 2026 at 07:11:13PM +0100, Robin Murphy wrote:
> > > @@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
> > > pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
> > > iova, unmapped_page);
> > > + /* If the driver itself isn't using the gather, mark it used */
> > > + if (iotlb_gather->end <= iotlb_gather->start)
> > > + iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
> >
> > The gathers can be joined across unmaps and now we are inviting subtly
> > ill-formed gathers as only the first unmap will get included.
> > We do have error cases where the gather is legitimately empty, and
> > this would squash that, it probably needs to check unmapped_page for 0
> > too, at least.
>
> Maybe try looking at the rest of the code around these lines...
Okay, well lets do this one, do you want to send it since it is your
idea?
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread