* [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
@ 2026-02-21 23:50 Antheas Kapenekakis
2026-02-23 6:02 ` Vasant Hegde
2026-02-24 19:23 ` Jason Gunthorpe
0 siblings, 2 replies; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-21 23:50 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello,
Antheas Kapenekakis
Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
introduces the shared iommu page table for AMD IOMMU. Some BIOSes
contain an identity mapping for address 0x0, which is not parsed
properly (e.g., on certain Strix Halo devices). This causes the DMA
components of the device to fail to initialize (e.g., the NVMe SSD
controller), leading to a failed POST.
The failure is caused by iommu_create_device_direct_mappings(), which
is the new mapping implementation. In it, address aliasing is handled
via the following check:
```
phys_addr = iommu_iova_to_phys(domain, addr);
if (!phys_addr) {
map_size += pg_size;
continue;
}
```
The iommu_iova_to_phys() return value is ambiguous: it returns 0 for
both an unmapped IOVA and a mapping to physical address 0, so the
mapping code tries to re-map address 0x0 for every device. However,
the function has too many implementations to fix them all. Therefore,
catch ret == -EADDRINUSE only when addr == 0 and, instead of bailing
out, skip the mapping and print a warning.
Closes: https://www.reddit.com/r/cachyos/comments/1r5sgr6/freeze_on_boot_after_kernel_update_to_619_fixed/
Fixes: 789a5913b29c ("iommu/amd: Use the generic iommu page table")
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
---
drivers/iommu/iommu.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4926a43118e6..d424b7124311 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
ret = iommu_map(domain, addr - map_size,
addr - map_size, map_size,
entry->prot, GFP_KERNEL);
+ if (ret == -EADDRINUSE && addr - map_size == 0) {
+ dev_warn_once(dev,
+ "iommu: identity mapping at addr 0x0 already exists, skipping\n");
+ ret = 0;
+ }
if (ret)
goto out;
map_size = 0;
base-commit: 57d76ceccee4b497eb835831206b50e72915a501
--
2.52.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-21 23:50 [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists Antheas Kapenekakis
@ 2026-02-23 6:02 ` Vasant Hegde
2026-02-23 7:46 ` Antheas Kapenekakis
2026-02-24 19:23 ` Jason Gunthorpe
1 sibling, 1 reply; 16+ messages in thread
From: Vasant Hegde @ 2026-02-23 6:02 UTC (permalink / raw)
To: Antheas Kapenekakis, iommu, linux-kernel
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
Alejandro Jimenez, dnaim, Mario.Limonciello
Antheas,
On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> introduces the shared iommu page table for AMD IOMMU. Some bioses
> contain an identity mapping for address 0x0, which is not parsed
> properly (e.g., certain Strix Halo devices). This causes the DMA
> components of the device to fail to initialize (e.g., the NVMe SSD
> controller), leading to a failed post.
>
> The failure is caused by iommu_create_device_direct_mappings(), which
> is the new mapping implementation. In it, address aliasing is handled
> via the following check:
>
> ```
> phys_addr = iommu_iova_to_phys(domain, addr);
> if (!phys_addr) {
> map_size += pg_size;
> continue;
> }
> ```
Thanks for debugging and fixing it. Just wondering: why can't we replace the
above check with pfn_valid()?
-Vasant
>
> Obviously, the iommu_iova_to_phys() signature is faulty and aliases
> unmapped and 0 together, causing the allocation code to try to
> re-allocate the 0 address per device. However, it has too many
> instantiations to fix. Therefore, catch ret == -EADDRINUSE only when
> addr == 0 and, instead of bailing, skip the mapping and print a warning.
>
> Closes: https://www.reddit.com/r/cachyos/comments/1r5sgr6/freeze_on_boot_after_kernel_update_to_619_fixed/
> Fixes: 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> ---
> drivers/iommu/iommu.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 4926a43118e6..d424b7124311 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> ret = iommu_map(domain, addr - map_size,
> addr - map_size, map_size,
> entry->prot, GFP_KERNEL);
> + if (ret == -EADDRINUSE && addr - map_size == 0) {
> + dev_warn_once(dev,
> + "iommu: identity mapping at addr 0x0 already exists, skipping\n");
> + ret = 0;
> + }
> if (ret)
> goto out;
> map_size = 0;
>
> base-commit: 57d76ceccee4b497eb835831206b50e72915a501
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-23 6:02 ` Vasant Hegde
@ 2026-02-23 7:46 ` Antheas Kapenekakis
2026-02-24 8:25 ` Vasant Hegde
0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-23 7:46 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Alejandro Jimenez, dnaim, Mario.Limonciello
On Mon, 23 Feb 2026 at 07:02, Vasant Hegde <vasant.hegde@amd.com> wrote:
>
> Antheas,
>
> On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
> > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > introduces the shared iommu page table for AMD IOMMU. Some bioses
> > contain an identity mapping for address 0x0, which is not parsed
> > properly (e.g., certain Strix Halo devices). This causes the DMA
> > components of the device to fail to initialize (e.g., the NVMe SSD
> > controller), leading to a failed post.
> >
> > The failure is caused by iommu_create_device_direct_mappings(), which
> > is the new mapping implementation. In it, address aliasing is handled
> > via the following check:
> >
> > ```
> > phys_addr = iommu_iova_to_phys(domain, addr);
> > if (!phys_addr) {
> > map_size += pg_size;
> > continue;
> > }
> > ```
>
> Thanks for debugging and fixing it . Just wondering why can't we replace replace
> above check with pfn_valid() ?
>
Hi Vasant,
I can check later today.
From a cursory glance though, it does not seem very suitable... Can you expand?
Antheas
>
> -Vasant
>
>
> >
> > Obviously, the iommu_iova_to_phys() signature is faulty and aliases
> > unmapped and 0 together, causing the allocation code to try to
> > re-allocate the 0 address per device. However, it has too many
> > instantiations to fix. Therefore, catch ret == -EADDRINUSE only when
> > addr == 0 and, instead of bailing, skip the mapping and print a warning.
> >
> > Closes: https://www.reddit.com/r/cachyos/comments/1r5sgr6/freeze_on_boot_after_kernel_update_to_619_fixed/
> > Fixes: 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> > ---
> > drivers/iommu/iommu.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 4926a43118e6..d424b7124311 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> > ret = iommu_map(domain, addr - map_size,
> > addr - map_size, map_size,
> > entry->prot, GFP_KERNEL);
> > + if (ret == -EADDRINUSE && addr - map_size == 0) {
> > + dev_warn_once(dev,
> > + "iommu: identity mapping at addr 0x0 already exists, skipping\n");
> > + ret = 0;
> > + }
> > if (ret)
> > goto out;
> > map_size = 0;
> >
> > base-commit: 57d76ceccee4b497eb835831206b50e72915a501
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-23 7:46 ` Antheas Kapenekakis
@ 2026-02-24 8:25 ` Vasant Hegde
2026-02-24 9:16 ` Antheas Kapenekakis
0 siblings, 1 reply; 16+ messages in thread
From: Vasant Hegde @ 2026-02-24 8:25 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Alejandro Jimenez, dnaim, Mario.Limonciello
Hi Antheas,
On 2/23/2026 1:16 PM, Antheas Kapenekakis wrote:
>
> On Mon, 23 Feb 2026 at 07:02, Vasant Hegde <vasant.hegde@amd.com> wrote:
>>
>> Antheas,
>>
>> On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>> introduces the shared iommu page table for AMD IOMMU. Some bioses
>>> contain an identity mapping for address 0x0, which is not parsed
>>> properly (e.g., certain Strix Halo devices). This causes the DMA
>>> components of the device to fail to initialize (e.g., the NVMe SSD
>>> controller), leading to a failed post.
>>>
>>> The failure is caused by iommu_create_device_direct_mappings(), which
>>> is the new mapping implementation. In it, address aliasing is handled
>>> via the following check:
>>>
>>> ```
>>> phys_addr = iommu_iova_to_phys(domain, addr);
>>> if (!phys_addr) {
>>> map_size += pg_size;
>>> continue;
>>> }
>>> ```
>>
>> Thanks for debugging and fixing it . Just wondering why can't we replace replace
>> above check with pfn_valid() ?
>>
>
> Hi Vasant,
> I can check later today
>
> From a cursory glance though, it does not seem very suitable... Can you expand?
Looking into the details, you are right. It won't work for direct_mapping().
Maybe we can take this patch for now, but in the long run I think we should
update the iova_to_phys() ops.
-Vasant
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-24 8:25 ` Vasant Hegde
@ 2026-02-24 9:16 ` Antheas Kapenekakis
0 siblings, 0 replies; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-24 9:16 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Alejandro Jimenez, dnaim, Mario.Limonciello
On Tue, 24 Feb 2026 at 09:26, Vasant Hegde <vasant.hegde@amd.com> wrote:
>
> Hi Antheas,
>
> On 2/23/2026 1:16 PM, Antheas Kapenekakis wrote:
> >
> > On Mon, 23 Feb 2026 at 07:02, Vasant Hegde <vasant.hegde@amd.com> wrote:
> >>
> >> Antheas,
> >>
> >> On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
> >>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>> introduces the shared iommu page table for AMD IOMMU. Some bioses
> >>> contain an identity mapping for address 0x0, which is not parsed
> >>> properly (e.g., certain Strix Halo devices). This causes the DMA
> >>> components of the device to fail to initialize (e.g., the NVMe SSD
> >>> controller), leading to a failed post.
> >>>
> >>> The failure is caused by iommu_create_device_direct_mappings(), which
> >>> is the new mapping implementation. In it, address aliasing is handled
> >>> via the following check:
> >>>
> >>> ```
> >>> phys_addr = iommu_iova_to_phys(domain, addr);
> >>> if (!phys_addr) {
> >>> map_size += pg_size;
> >>> continue;
> >>> }
> >>> ```
> >>
> >> Thanks for debugging and fixing it . Just wondering why can't we replace replace
> >> above check with pfn_valid() ?
> >>
> >
> > Hi Vasant,
> > I can check later today
> >
> > From a cursory glance though, it does not seem very suitable... Can you expand?
>
> Looking into the details, you are right. It won't work for direct_mapping().
>
>
> May be we can have this patch for now, but I think on long run we should update
> iova_to_phys() ops.
Agreed. This commit can be backported, so I would like to see it merged.
When you move forward with the refactor, feel free to cc me so I can test.
Best,
Antheas
> -Vasant
>
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-21 23:50 [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists Antheas Kapenekakis
2026-02-23 6:02 ` Vasant Hegde
@ 2026-02-24 19:23 ` Jason Gunthorpe
2026-02-24 19:33 ` Antheas Kapenekakis
1 sibling, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-24 19:23 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> introduces the shared iommu page table for AMD IOMMU. Some bioses
> contain an identity mapping for address 0x0, which is not parsed
> properly (e.g., certain Strix Halo devices). This causes the DMA
> components of the device to fail to initialize (e.g., the NVMe SSD
> controller), leading to a failed post.
I'm trying to understand the issue here, is it that the old AMD code
incorrectly succeeded iommu_map() on top of an existing mapping while
the new code returns -EADDRINUSE?
Then the existing guard for double mapping doesn't work since 0 is an
ambiguous return?
> @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> ret = iommu_map(domain, addr - map_size,
> addr - map_size, map_size,
> entry->prot, GFP_KERNEL);
> + if (ret == -EADDRINUSE && addr - map_size == 0) {
> + dev_warn_once(dev,
> + "iommu: identity mapping at addr 0x0 already exists, skipping\n");
> + ret = 0;
> + }
Nothing else prints here, so I wouldn't print either..
Apparently we just silently ignore if the BIOS creates conflicting
mappings for some reason..
I think it is OK to just ignore EADDRINUSE always, it unambiguously
means a mapping is present, and the intention of this logic is to
ignore double mappings to the same IOVA.
The cleaner fix is to correct the return code of iommu_iova_to_phys(),
or to make EADDRINUSE reliable and remove the iommu_iova_to_phys();
those are both a lot of trouble, so I think this proposed single if is
reasonable. Just please clean up the commit message to be a bit clearer
on what caused this regression and add a short comment above the new if:
A 0 return from iommu_iova_to_phys() is ambiguous: it could mean no
mapping or a present mapping at PA 0. EADDRINUSE is reliable when
supported; it means the IOVA is already mapped, so ignore it to
resolve the ambiguity.
Thanks,
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-24 19:23 ` Jason Gunthorpe
@ 2026-02-24 19:33 ` Antheas Kapenekakis
2026-02-24 19:43 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-24 19:33 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > introduces the shared iommu page table for AMD IOMMU. Some bioses
> > contain an identity mapping for address 0x0, which is not parsed
> > properly (e.g., certain Strix Halo devices). This causes the DMA
> > components of the device to fail to initialize (e.g., the NVMe SSD
> > controller), leading to a failed post.
>
> I'm trying to understand the issue here, is it that the old AMD code
> incorrectly succeeded iommu_map() on top of an existing mapping while
> the new code returns -EADDRINUSE?
>
> Then the existing guard for double mapping doesn't work since 0 is an
> ambiguous return?
Hi Jason,
It seems like the previous code correctly handled the 0 case, and the
new code does not due to the ambiguous return.
> > @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> > ret = iommu_map(domain, addr - map_size,
> > addr - map_size, map_size,
> > entry->prot, GFP_KERNEL);
> > + if (ret == -EADDRINUSE && addr - map_size == 0) {
> > + dev_warn_once(dev,
> > + "iommu: identity mapping at addr 0x0 already exists, skipping\n");
> > + ret = 0;
> > + }
>
> Nothing else prints here, so I wouldn't print either..
>
> Apparently we just silently ignore if the BIOS creates conflicting
> mappings for some reason..
It seems like multiple DMA devices get a shared memory space, so
applying the mappings multiple times is expected. The problem is
handling the 0 address.
> I think it is OK to just ignore EADDRINUSE always, it unambigously
> means a mapping is present and the intention of this logic is to
> ignore double mappings to the same IOVA.
>
> The cleaner fix is to correct the return code of iommu_iova_to_phys()
> or to make EADDRINUSE reliable and remove the iommu_iova_to_phys(),
> those are both a lot of trouble so I think this proposed single if is
> reasonabe. Just please clean up the commit message to be a bit clearer
> on what caused this regression and add a short comment above the new if:
>
> 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> present mapping. EADDRINUSE is reliable when supported, it means the
> IOVA is already mapped, so ignore it to resolve the ambiguity.
If you think that this patch is a long term solution, I can drop the
print, the addr == 0 check, add a comment on top, then repost a V2
with an updated description to be more concrete.
Best,
Antheas
> Thanks,
> Jason
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-24 19:33 ` Antheas Kapenekakis
@ 2026-02-24 19:43 ` Jason Gunthorpe
2026-02-24 19:51 ` Antheas Kapenekakis
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-24 19:43 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> > > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > > introduces the shared iommu page table for AMD IOMMU. Some bioses
> > > contain an identity mapping for address 0x0, which is not parsed
> > > properly (e.g., certain Strix Halo devices). This causes the DMA
> > > components of the device to fail to initialize (e.g., the NVMe SSD
> > > controller), leading to a failed post.
> >
> > I'm trying to understand the issue here, is it that the old AMD code
> > incorrectly succeeded iommu_map() on top of an existing mapping while
> > the new code returns -EADDRINUSE?
> >
> > Then the existing guard for double mapping doesn't work since 0 is an
> > ambiguous return?
>
> Hi Jason,
>
> It seems like the previous code correctly handled the 0 case, and the
> new code does not due to the ambiguous return.
I checked the old AMD driver; it is definitely returning 0 from
iommu_iova_to_phys(), so that hasn't changed.
> > reasonabe. Just please clean up the commit message to be a bit clearer
> > on what caused this regression and add a short comment above the new if:
> >
> > 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> > present mapping. EADDRINUSE is reliable when supported, it means the
> > IOVA is already mapped, so ignore it to resolve the ambiguity.
>
> If you think that this patch is a long term solution, I can drop the
> print, the addr == 0 check, add a comment on top, then repost a V2
> with an updated description to be more concrete.
I think it is a good solution as-is, I just want clarity on what
actually regressed here, and I think it is that iommu_map() behavior
changed.
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-24 19:43 ` Jason Gunthorpe
@ 2026-02-24 19:51 ` Antheas Kapenekakis
2026-02-25 18:27 ` Robin Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-24 19:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> > On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> > > > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > > > introduces the shared iommu page table for AMD IOMMU. Some bioses
> > > > contain an identity mapping for address 0x0, which is not parsed
> > > > properly (e.g., certain Strix Halo devices). This causes the DMA
> > > > components of the device to fail to initialize (e.g., the NVMe SSD
> > > > controller), leading to a failed post.
> > >
> > > I'm trying to understand the issue here, is it that the old AMD code
> > > incorrectly succeeded iommu_map() on top of an existing mapping while
> > > the new code returns -EADDRINUSE?
> > >
> > > Then the existing guard for double mapping doesn't work since 0 is an
> > > ambiguous return?
> >
> > Hi Jason,
> >
> > It seems like the previous code correctly handled the 0 case, and the
> > new code does not due to the ambiguous return.
>
> I checked the old AMD driver it is definately returning 0 from
> iommu_iova_to_phys(), so that hasn't changed.
>
> > > reasonabe. Just please clean up the commit message to be a bit clearer
> > > on what caused this regression and add a short comment above the new if:
> > >
> > > 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> > > present mapping. EADDRINUSE is reliable when supported, it means the
> > > IOVA is already mapped, so ignore it to resolve the ambiguity.
> >
> > If you think that this patch is a long term solution, I can drop the
> > print, the addr == 0 check, add a comment on top, then repost a V2
> > with an updated description to be more concrete.
>
> I think it is a good solution as-is, I just want clarity on what
> actually regressed here, and I think it is that iommu_map() behavior
> changed.
Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
plus two follow-up fixes also worked.
Looking through it, the removed amd_iommu_map_pages() /
amd_iommu_iova_to_phys() in the previous ops might have behaved
differently.
Antheas
> Jason
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-24 19:51 ` Antheas Kapenekakis
@ 2026-02-25 18:27 ` Robin Murphy
2026-02-25 22:05 ` Antheas Kapenekakis
0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2026-02-25 18:27 UTC (permalink / raw)
To: Antheas Kapenekakis, Jason Gunthorpe
Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Vasant Hegde,
Alejandro Jimenez, dnaim, Mario.Limonciello
On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
> On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>
>> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
>>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>
>>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
>>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>>>> introduces the shared iommu page table for AMD IOMMU. Some bioses
>>>>> contain an identity mapping for address 0x0, which is not parsed
>>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
>>>>> components of the device to fail to initialize (e.g., the NVMe SSD
>>>>> controller), leading to a failed post.
>>>>
>>>> I'm trying to understand the issue here, is it that the old AMD code
>>>> incorrectly succeeded iommu_map() on top of an existing mapping while
>>>> the new code returns -EADDRINUSE?
>>>>
>>>> Then the existing guard for double mapping doesn't work since 0 is an
>>>> ambiguous return?
>>>
>>> Hi Jason,
>>>
>>> It seems like the previous code correctly handled the 0 case, and the
>>> new code does not due to the ambiguous return.
>>
>> I checked the old AMD driver it is definately returning 0 from
>> iommu_iova_to_phys(), so that hasn't changed.
>>
>>>> reasonabe. Just please clean up the commit message to be a bit clearer
>>>> on what caused this regression and add a short comment above the new if:
>>>>
>>>> 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
>>>> present mapping. EADDRINUSE is reliable when supported, it means the
>>>> IOVA is already mapped, so ignore it to resolve the ambiguity.
>>>
>>> If you think that this patch is a long term solution, I can drop the
>>> print, the addr == 0 check, add a comment on top, then repost a V2
>>> with an updated description to be more concrete.
>>
>> I think it is a good solution as-is, I just want clarity on what
>> actually regressed here, and I think it is that iommu_map() behavior
>> changed.
>
> Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> plus two follow-up fixes also worked.
>
> Looking through it, the removed amd_iommu_map_pages() /
> amd_iommu_iova_to_phys() in the previous ops might have behaved
> differently.
Yes, the old AMD code would happily replace an existing mapping without
question (but whether it would correctly invalidate TLBs if just
replacing a leaf entry with a different leaf entry at the same level
I'm not entirely sure...)
However, waiting for a mapping to fail (which may scream its own
warning) and then attempting to paper over it seems like an awful bodge.
Why not just do something like this?
Cheers,
Robin.
----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 35db51780954..682e95c94236 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
if (addr == end)
goto map_end;
-
- phys_addr = iommu_iova_to_phys(domain, addr);
+ /*
+ * Check for existing translations using an offset, to
+ * distinguish a mapping of the page at PA 0 from none.
+ */
+ phys_addr = iommu_iova_to_phys(domain, addr + 1);
if (!phys_addr) {
map_size += pg_size;
continue;
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-25 18:27 ` Robin Murphy
@ 2026-02-25 22:05 ` Antheas Kapenekakis
2026-02-26 19:46 ` Robin Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-25 22:05 UTC (permalink / raw)
To: Robin Murphy
Cc: Jason Gunthorpe, iommu, linux-kernel, Joerg Roedel, Will Deacon,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Wed, 25 Feb 2026 at 19:27, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
> > On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>
> >> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> >>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>>
> >>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> >>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>>>> introduces the shared iommu page table for AMD IOMMU. Some bioses
> >>>>> contain an identity mapping for address 0x0, which is not parsed
> >>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
> >>>>> components of the device to fail to initialize (e.g., the NVMe SSD
> >>>>> controller), leading to a failed post.
> >>>>
> >>>> I'm trying to understand the issue here, is it that the old AMD code
> >>>> incorrectly succeeded iommu_map() on top of an existing mapping while
> >>>> the new code returns -EADDRINUSE?
> >>>>
> >>>> Then the existing guard for double mapping doesn't work since 0 is an
> >>>> ambiguous return?
> >>>
> >>> Hi Jason,
> >>>
> >>> It seems like the previous code correctly handled the 0 case, and the
> >>> new code does not due to the ambiguous return.
> >>
> >> I checked the old AMD driver it is definately returning 0 from
> >> iommu_iova_to_phys(), so that hasn't changed.
> >>
> >>>> reasonabe. Just please clean up the commit message to be a bit clearer
> >>>> on what caused this regression and add a short comment above the new if:
> >>>>
> >>>> 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> >>>> present mapping. EADDRINUSE is reliable when supported, it means the
> >>>> IOVA is already mapped, so ignore it to resolve the ambiguity.
> >>>
> >>> If you think that this patch is a long term solution, I can drop the
> >>> print, the addr == 0 check, add a comment on top, then repost a V2
> >>> with an updated description to be more concrete.
> >>
> >> I think it is a good solution as-is, I just want clarity on what
> >> actually regressed here, and I think it is that iommu_map() behavior
> >> changed.
> >
> > Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > plus two follow-up fixes also worked.
> >
> > Looking through it, the removed amd_iommu_map_pages() /
> > amd_iommu_iova_to_phys() in the previous ops might have behaved
> > differently.
>
> Yes, the old AMD code would happily replace an existing mapping without
> question (but whether it would correctly invalidate TLBs if just
> replacing a leaf entry with a different leaf entry at the same level
> I'm not entirely sure...)
>
> However, waiting for a mapping to fail (which may scream its own
> warning) and then attempting to paper over it seems like an awful bodge.
> Why not just do something like this?
>
> Cheers,
> Robin.
>
> ----->8-----
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 35db51780954..682e95c94236 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
>
> if (addr == end)
> goto map_end;
> -
> - phys_addr = iommu_iova_to_phys(domain, addr);
> + /*
> + * Check for existing translations using an offset, to
> + * distinguish a mapping of the page at PA 0 from none.
> + */
> + phys_addr = iommu_iova_to_phys(domain, addr + 1);
Hi Robin,
Assuming pg_size != 1 (which seems to always hold, given `1UL <<
__ffs(domain->pgsize_bitmap)`), that is an interesting solution. Then
again, it assumes that the underlying iommu_iova_to_phys()
implementations handle translating unaligned addresses. I cannot
comment on that.
I would tend to consider both solutions not ideal, and lean towards
Jason's suggestion of universally ignoring EADDRINUSE without an
error. However, I am not an absolutist in that regard.
I am on a business trip and do not have the SMTP credentials with me
for git, so I will send a V2 tomorrow. I already drafted it. I will
make sure to cc new participants on this thread if not already
included.
As an aside, it seems the issue fixed by this patch is not the only
one introduced in 6.19: [1] has users whom this patch does not help,
and there appear to be additional issues with Thunderbolt docks.
iommu_map() could be the culprit, and perhaps forceful remapping, as
was done previously for AMD, would be a more universal solution. That
way, EADDRINUSE stops being an error.
Best,
Antheas
[1] https://github.com/CachyOS/linux-cachyos/issues/704
> if (!phys_addr) {
> map_size += pg_size;
> continue;
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-25 22:05 ` Antheas Kapenekakis
@ 2026-02-26 19:46 ` Robin Murphy
2026-02-26 20:40 ` Antheas Kapenekakis
0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2026-02-26 19:46 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: Jason Gunthorpe, iommu, linux-kernel, Joerg Roedel, Will Deacon,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On 25/02/2026 10:05 pm, Antheas Kapenekakis wrote:
> On Wed, 25 Feb 2026 at 19:27, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
>>> On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>
>>>> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
>>>>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>>>
>>>>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
>>>>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>>>>>> introduces the shared iommu page table for AMD IOMMU. Some bioses
>>>>>>> contain an identity mapping for address 0x0, which is not parsed
>>>>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
>>>>>>> components of the device to fail to initialize (e.g., the NVMe SSD
>>>>>>> controller), leading to a failed post.
>>>>>>
>>>>>> I'm trying to understand the issue here, is it that the old AMD code
>>>>>> incorrectly succeeded iommu_map() on top of an existing mapping while
>>>>>> the new code returns -EADDRINUSE?
>>>>>>
>>>>>> Then the existing guard for double mapping doesn't work since 0 is an
>>>>>> ambiguous return?
>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> It seems like the previous code correctly handled the 0 case, and the
>>>>> new code does not due to the ambiguous return.
>>>>
>>>> I checked the old AMD driver; it is definitely returning 0 from
>>>> iommu_iova_to_phys(), so that hasn't changed.
>>>>
>>>>>> reasonable. Just please clean up the commit message to be a bit clearer
>>>>>> on what caused this regression and add a short comment above the new if:
>>>>>>
>>>>>> 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
>>>>>> present mapping. EADDRINUSE is reliable when supported, it means the
>>>>>> IOVA is already mapped, so ignore it to resolve the ambiguity.
>>>>>
>>>>> If you think that this patch is a long term solution, I can drop the
>>>>> print, the addr == 0 check, add a comment on top, then repost a V2
>>>>> with an updated description to be more concrete.
>>>>
>>>> I think it is a good solution as-is, I just want clarity on what
>>>> actually regressed here, and I think it is that iommu_map() behavior
>>>> changed.
>>>
>>> Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>> plus two follow-up fixes also worked.
>>>
>>> Looking through it, the removed amd_iommu_map_pages() /
>>> amd_iommu_iova_to_phys() in the previous ops might have behaved
>>> differently.
>>
>> Yes, the old AMD code would happily replace an existing mapping without
>> question (but whether it would correctly invalidate TLBs if just
>> replacing a leaf entry with a different leaf entry at the same level
>> I'm not entirely sure...)
>>
>> However, waiting for a mapping to fail (which may scream its own
>> warning) and then attempting to paper over it seems like an awful bodge.
>> Why not just do something like this?
>>
>> Cheers,
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 35db51780954..682e95c94236 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
>>
>> if (addr == end)
>> goto map_end;
>> -
>> - phys_addr = iommu_iova_to_phys(domain, addr);
>> + /*
>> + * Check for existing translations using an offset, to
>> + * distinguish a mapping of the page at PA 0 from none.
>> + */
>> + phys_addr = iommu_iova_to_phys(domain, addr + 1);
>
> Hi Robin,
> Assuming pg_size != 1 (which it seems to always be, given `1UL <<
> __ffs(domain->pgsize_bitmap)`), that is an interesting solution. Then
> again, it assumes that the underlying implementations of
> iommu_iova_to_phys respect translating unaligned addresses. I cannot
> comment on that.
It was a somewhat rhetorical question. We can safely assume that nobody
would ever build an IOMMU with a 1-byte page size, and that even if
someone was insane enough to do so, Linux would never support using that
granularity (since it would be nice to have some memory left for
anything other than infinite IOMMU pagetables). It's also indeed not
impossible that a driver might have a bug in its iova_to_phys
implementation, but then that driver would have a bug that would also
break other iova_to_phys users (e.g. iommu-dma) anyway. Every other
possible return value from iova_to_phys is unambiguous, and the single
one that isn't is trivial to avoid in this case, so why do anything
other than avoid it?
> I would consider neither solution ideal, and lean towards Jason's
> suggestion of universally ignoring EADDRINUSE without an error.
> However, I am not an absolutist in that regard.
Relying on magic return values doesn't even work around this issue
robustly, since even among the drivers which do treat double-mapping as
an error, they do not all use the same error codes, and as mentioned
some would already log a visible warning before getting that far. This
is some core code not quite working as intended, and the fact that it's
apparently only shown up on AMD platforms which also previously happened
to hide it hardly makes it right to paper over in a manner that only
"works" for a few drivers including AMD, rather than properly fix the
root cause and make the initial check consistently work as intended to
prevent double-mapping attempts in the first place.
Thanks,
Robin.
> I am on a business trip and do not have the SMTP credentials with me
> for git, so I will send a V2 tomorrow. I already drafted it. I will
> make sure to cc new participants on this thread if not already
> included.
>
> As an aside, it seems that the issue fixed by this patch is not the
> only one introduced in 6.19, with [1] having users where this patch
> does not help. There appear to be additional issues with Thunderbolt
> docks. iommu_map could be the culprit, and perhaps forceful remapping,
> as was done previously for AMD, would be a more universal solution.
> This way, EADDRINUSE stops being an error.
>
> Best,
> Antheas
>
> [1] https://github.com/CachyOS/linux-cachyos/issues/704
>
>> if (!phys_addr) {
>> map_size += pg_size;
>> continue;
>>
>>
>
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-26 19:46 ` Robin Murphy
@ 2026-02-26 20:40 ` Antheas Kapenekakis
2026-02-27 1:03 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-26 20:40 UTC (permalink / raw)
To: Robin Murphy
Cc: Jason Gunthorpe, iommu, linux-kernel, Joerg Roedel, Will Deacon,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Thu, 26 Feb 2026 at 20:46, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 25/02/2026 10:05 pm, Antheas Kapenekakis wrote:
> > On Wed, 25 Feb 2026 at 19:27, Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
> >>> On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>>
> >>>> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> >>>>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>>>>
> >>>>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> >>>>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>>>>>> introduces the shared iommu page table for AMD IOMMU. Some bioses
> >>>>>>> contain an identity mapping for address 0x0, which is not parsed
> >>>>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
> >>>>>>> components of the device to fail to initialize (e.g., the NVMe SSD
> >>>>>>> controller), leading to a failed post.
> >>>>>>
> >>>>>> I'm trying to understand the issue here, is it that the old AMD code
> >>>>>> incorrectly succeeded iommu_map() on top of an existing mapping while
> >>>>>> the new code returns -EADDRINUSE?
> >>>>>>
> >>>>>> Then the existing guard for double mapping doesn't work since 0 is an
> >>>>>> ambiguous return?
> >>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> It seems like the previous code correctly handled the 0 case, and the
> >>>>> new code does not due to the ambiguous return.
> >>>>
> >>>> I checked the old AMD driver; it is definitely returning 0 from
> >>>> iommu_iova_to_phys(), so that hasn't changed.
> >>>>
> >>>>>> reasonable. Just please clean up the commit message to be a bit clearer
> >>>>>> on what caused this regression and add a short comment above the new if:
> >>>>>>
> >>>>>> 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> >>>>>> present mapping. EADDRINUSE is reliable when supported, it means the
> >>>>>> IOVA is already mapped, so ignore it to resolve the ambiguity.
> >>>>>
> >>>>> If you think that this patch is a long term solution, I can drop the
> >>>>> print, the addr == 0 check, add a comment on top, then repost a V2
> >>>>> with an updated description to be more concrete.
> >>>>
> >>>> I think it is a good solution as-is, I just want clarity on what
> >>>> actually regressed here, and I think it is that iommu_map() behavior
> >>>> changed.
> >>>
> >>> Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>> plus two follow-up fixes also worked.
> >>>
> >>> Looking through it, the removed amd_iommu_map_pages() /
> >>> amd_iommu_iova_to_phys() in the previous ops might have behaved
> >>> differently.
> >>
> >> Yes, the old AMD code would happily replace an existing mapping without
> >> question (but whether it would correctly invalidate TLBs if just
> >> replacing a leaf entry with a different leaf entry at the same level
> >> I'm not entirely sure...)
> >>
> >> However, waiting for a mapping to fail (which may scream its own
> >> warning) and then attempting to paper over it seems like an awful bodge.
> >> Why not just do something like this?
> >>
> >> Cheers,
> >> Robin.
> >>
> >> ----->8-----
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index 35db51780954..682e95c94236 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> >>
> >> if (addr == end)
> >> goto map_end;
> >> -
> >> - phys_addr = iommu_iova_to_phys(domain, addr);
> >> + /*
> >> + * Check for existing translations using an offset, to
> >> + * distinguish a mapping of the page at PA 0 from none.
> >> + */
> >> + phys_addr = iommu_iova_to_phys(domain, addr + 1);
> >
> > Hi Robin,
> > Assuming pg_size != 1 (which it seems to always be, given `1UL <<
> > __ffs(domain->pgsize_bitmap)`), that is an interesting solution. Then
> > again, it assumes that the underlying implementations of
> > iommu_iova_to_phys respect translating unaligned addresses. I cannot
> > comment on that.
>
> It was a somewhat rhetorical question. We can safely assume that nobody
> would ever build an IOMMU with a 1-byte page size, and that even if
> someone was insane enough to do so, Linux would never support using that
pgsize_bitmap must not be 0; otherwise the function errors out and
returns early, therefore pg_size cannot be 1.
> granularity (since it would be nice to have some memory left for
> anything other than infinite IOMMU pagetables). It's also indeed not
> impossible that a driver might have a bug in its iova_to_phys
> implementation, but then that driver would have a bug that would also
> break other iova_to_phys users (e.g. iommu-dma) anyway. Every other
> possible return value from iova_to_phys is unambiguous, and the single
> one that isn't is trivial to avoid in this case, so why do anything
> other than avoid it?
I do not know whether other iova_to_phys users do unaligned lookups
with a single-byte offset, and I would not be eager to find out whether
all underlying implementations are robust.
> > I would consider neither solution ideal, and lean towards Jason's
> > suggestion of universally ignoring EADDRINUSE without an error.
> > However, I am not an absolutist in that regard.
>
> Relying on magic return values doesn't even work around this issue
> robustly, since even among the drivers which do treat double-mapping as
> an error, they do not all use the same error codes, and as mentioned
> some would already log a visible warning before getting that far. This
> is some core code not quite working as intended, and the fact that it's
> apparently only shown up on AMD platforms which also previously happened
> to hide it hardly makes it right to paper over in a manner that only
> "works" for a few drivers including AMD, rather than properly fix the
> root cause and make the initial check consistently work as intended to
> prevent double-mapping attempts in the first place.
I am still concerned about the unaligned lookups. It is a functional
change that could cause regressions on all devices, whereas the
approach of this patch does not affect behavior on other devices. I
would like Jason to weigh in.
On my device, the only error printed with this V1 is the warning I
added, and this seems to be an AMD-specific bug affecting certain BIOSes.
In the interim, I will send a V2 with Jason's suggestions.
Antheas
> Thanks,
> Robin.
>
> > I am on a business trip and do not have the SMTP credentials with me
> > for git, so I will send a V2 tomorrow. I already drafted it. I will
> > make sure to cc new participants on this thread if not already
> > included.
> >
> > As an aside, it seems that the issue fixed by this patch is not the
> > only one introduced in 6.19, with [1] having users where this patch
> > does not help. There appear to be additional issues with Thunderbolt
> > docks. iommu_map could be the culprit, and perhaps forceful remapping,
> > as was done previously for AMD, would be a more universal solution.
> > This way, EADDRINUSE stops being an error.
> >
> > Best,
> > Antheas
> >
> > [1] https://github.com/CachyOS/linux-cachyos/issues/704
> >
> >> if (!phys_addr) {
> >> map_size += pg_size;
> >> continue;
> >>
> >>
> >
>
>
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-26 20:40 ` Antheas Kapenekakis
@ 2026-02-27 1:03 ` Jason Gunthorpe
2026-02-27 8:06 ` Antheas Kapenekakis
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-27 1:03 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: Robin Murphy, iommu, linux-kernel, Joerg Roedel, Will Deacon,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> I am still concerned about the unaligned lookups. It is a functional
> change that could cause regressions on all devices, whereas the
> approach of this patch does not affect behavior on other devices. I
> would like Jason to weigh in.
I think Robin's solution is very clever, but I share the concern
regarding what all the implementations do.
So, I fed this question to Claude. It did find two counterpoints (see
below for the whole report I had it generate):
Implementations that lose the offset
s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.
mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.
I checked myself and it seems correct. I didn't try to confirm that
the cases it says are OK are in fact OK, but it paints a convincing
picture.
I doubt S390 uses this function you are fixing, and I have no idea
about mtk. Below is also a diff showing how Claude thought to fix it;
I didn't try to check it.
So, I'd say if Robin is OK with these outliers then it's a good and
fine approach.
Jason
iova_to_phys Implementation Survey
Entry point: iommu_iova_to_phys() in drivers/iommu/iommu.c:2502 calls
domain->ops->iova_to_phys(domain, iova) via iommu_domain_ops.
Category 1 — Delegates to io-pgtable
These drivers hold an io_pgtable_ops * and call ops->iova_to_phys(ops, iova).
The actual walk happens in one of the io-pgtable backends listed in Category
4.
----------------------------------------------------------------------------------------------------------
Driver Function (file:line) Ops assignment Notes
------------- -------------------------------------------------- ----------------------- -----------------
arm-smmu-v3 arm_smmu_iova_to_phys :3767 Pure delegation
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3471
arm-smmu arm_smmu_iova_to_phys :1655 S1 with
v1/v2 drivers/iommu/arm/arm-smmu/arm-smmu.c:1387 FEAT_TRANS_OPS
uses hw ATS1PR
registers
(CB_PAR),
otherwise
io-pgtable
apple-dart apple_dart_iova_to_phys :1021 Pure delegation →
drivers/iommu/apple-dart.c:531 io-pgtable-dart
qcom_iommu qcom_iommu_iova_to_phys :605 Delegation with
drivers/iommu/arm/arm-smmu/qcom_iommu.c:492 spinlock
ipmmu-vmsa ipmmu_iova_to_phys drivers/iommu/ipmmu-vmsa.c:702 :895 Uses
ARM_32_LPAE_S1
format
mtk_iommu mtk_iommu_iova_to_phys :1073 Delegation + 4GB
drivers/iommu/mtk_iommu.c:861 mode PA remap
fixup
----------------------------------------------------------------------------------------------------------
Category 2 — Open-coded page table walk
These drivers implement their own page table traversal without io-pgtable.
---------------------------------------------------------------------------------------------------------
Driver Function (file:line) Ops assignment Walk structure
---------------- ------------------------------------ -------------------- ------------------------------
sun50i-iommu sun50i_iommu_iova_to_phys :860 2-level (DTE → PTE)
drivers/iommu/sun50i-iommu.c:662
exynos-iommu exynos_iommu_iova_to_phys :1487 2-level (section/large/small
drivers/iommu/exynos-iommu.c:1375 page)
riscv-iommu riscv_iommu_iova_to_phys :1355 Sv39/48/57 via
drivers/iommu/riscv/iommu.c:1280 riscv_iommu_pte_fetch (:1166)
omap-iommu omap_iommu_iova_to_phys :1727 iopgtable_lookup_entry helper
drivers/iommu/omap-iommu.c:1596 (super
section/section/large/small)
rockchip-iommu rk_iommu_iova_to_phys :1190 2-level (DTE → PTE)
drivers/iommu/rockchip-iommu.c:651
msm_iommu msm_iommu_iova_to_phys :709 Hardware walk: writes VA to
drivers/iommu/msm_iommu.c:526 V2PPR register, reads PA from
PAR register
s390-iommu s390_iommu_iova_to_phys :1186 3-level ZPCI (region → segment
drivers/iommu/s390-iommu.c:989 → page)
tegra-smmu tegra_smmu_iova_to_phys :1010 2-level via
drivers/iommu/tegra-smmu.c:806 tegra_smmu_pte_lookup
mtk_iommu_v1 mtk_iommu_v1_iova_to_phys :593 Flat single-level table
drivers/iommu/mtk_iommu_v1.c:396
sprd-iommu sprd_iommu_iova_to_phys :423 Flat single-level table
drivers/iommu/sprd-iommu.c:369
---------------------------------------------------------------------------------------------------------
Category 3 — Special / trivial
-------------------------------------------------------------------------------------------
Driver Function (file:line) Ops assignment Mechanism
-------------- ------------------------------------- ---------------------- ---------------
fsl_pamu fsl_pamu_iova_to_phys :438 Identity:
drivers/iommu/fsl_pamu_domain.c:172 returns iova
(after aperture
bounds check)
virtio-iommu viommu_iova_to_phys :1105 Interval tree
drivers/iommu/virtio-iommu.c:915 reverse lookup
(no page table)
-------------------------------------------------------------------------------------------
Category 4 — io_pgtable_ops backends
These implement struct io_pgtable_ops.iova_to_phys and are the ultimate walk
functions called by Category 1 drivers.
--------------------------------------------------------------------------------------------------
Backend Function (file:line) Ops assignment Walk strategy
----------- ---------------------------------------- -------------------- ------------------------
ARM LPAE arm_lpae_iova_to_phys :950 Visitor pattern via
(64-bit) drivers/iommu/io-pgtable-arm.c:734 __arm_lpae_iopte_walk;
covers ARM_64_LPAE_S1,
S2, ARM_MALI_LPAE
ARM v7s arm_v7s_iova_to_phys :716 Iterative do-while
(32-bit) drivers/iommu/io-pgtable-arm-v7s.c:644 2-level; handles
contiguous entries
Apple DART dart_iova_to_phys :402 dart_get_last pre-walks
drivers/iommu/io-pgtable-dart.c:336 to leaf table, then
single lookup
--------------------------------------------------------------------------------------------------
Category 5 — generic_pt framework
All these drivers use IOMMU_PT_DOMAIN_OPS(fmt) which routes iova_to_phys into
the template function pt_iommu_<fmt>_iova_to_phys at
drivers/iommu/generic_pt/iommu_pt.h:170. The walk uses pt_walk_range +
PT_MAKE_LEVELS to generate a fully-inlined unrolled per-level walk; OA
extracted via pt_entry_oa_exact.
---------------------------------------------------------------------------------
Driver Ops struct (file:line) Format
---------------- ----------------------------------------------- ----------------
AMD IOMMU v1 amdv1_ops drivers/iommu/amd/iommu.c:2662 amdv1
AMD IOMMU v2 amdv2_ops drivers/iommu/amd/iommu.c:2740 x86_64
Intel VT-d intel_fs_paging_domain_ops x86_64
first-stage drivers/iommu/intel/iommu.c:3886
Intel VT-d intel_ss_paging_domain_ops vtdss
second-stage drivers/iommu/intel/iommu.c:3897
iommufd selftest mock_domain_ops etc amdv1_mock /
drivers/iommu/iommufd/selftest.c:403,411,425 amdv1
KUnit wrapper pgtbl_ops Delegates to
drivers/iommu/generic_pt/kunit_iommu_cmp.h:86 io_pgtable_ops
for comparison
testing
---------------------------------------------------------------------------------
Sub-page offset handling
When iova_to_phys(iova) is called with an IOVA that is not aligned to the
start of the mapped page/block (e.g. iova_to_phys(1) when a 4KB page is mapped
at IOVA 0), most implementations return the exact physical address including
the sub-page offset (phys_base + offset). Two do not.
Summary
------------------------------------------------------------------------------------------------------
Implementation Offset preserved? Mechanism
------------------------- ------------------------- --------------------------------------------------
arm_lpae (io-pgtable) YES iopte_to_paddr(pte) | (iova & (block_size-1))
arm_v7s (io-pgtable) YES iopte_to_paddr(pte) | (iova & ~LVL_MASK)
dart (io-pgtable) YES iopte_to_paddr(pte) | (iova & (pgsize-1))
sun50i-iommu YES page_addr + FIELD_GET(GENMASK(11,0), iova)
exynos-iommu YES *_phys(entry) + *_offs(iova) per granularity
riscv-iommu YES pfn_to_phys(pfn) | (iova & (pte_size-1))
omap-iommu YES (descriptor & mask) | (va & ~mask)
rockchip-iommu YES pt_address(pte) + rk_iova_page_offset(iova)
msm_iommu YES HW PAR register + VA low bits spliced back in
s390-iommu NO pte & ZPCI_PTE_ADDR_MASK — offset discarded
tegra-smmu YES SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova)
mtk_iommu_v1 NO pte & ~(page_size-1) — offset discarded
sprd-iommu YES (pte << PAGE_SHIFT) + (iova & (page_size-1))
fsl_pamu YES (trivial) return iova — identity mapping
virtio-iommu YES paddr + (iova - mapping->iova.start)
generic_pt YES _pt_entry_oa_fast() | log2_mod(va, entry_lg2sz)
------------------------------------------------------------------------------------------------------
Category 1 drivers (arm-smmu-v3, arm-smmu, apple-dart, qcom_iommu, ipmmu-vmsa,
mtk_iommu) inherit the behavior of their io-pgtable backend — all preserve
offset.
Implementations that lose the offset
s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.
mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c8d8eff5373d30..8db16989270cd8 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -401,7 +401,8 @@ static phys_addr_t mtk_iommu_v1_iova_to_phys(struct iommu_domain *domain, dma_ad
spin_lock_irqsave(&dom->pgtlock, flags);
pa = *(dom->pgt_va + (iova >> MT2701_IOMMU_PAGE_SHIFT));
- pa = pa & (~(MT2701_IOMMU_PAGE_SIZE - 1));
+ pa = (pa & (~(MT2701_IOMMU_PAGE_SIZE - 1))) |
+ (iova & (MT2701_IOMMU_PAGE_SIZE - 1));
spin_unlock_irqrestore(&dom->pgtlock, flags);
return pa;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fe679850af2861..57d27f3a984ed6 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -1015,7 +1015,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
pto = get_st_pto(ste);
pte = READ_ONCE(pto[px]);
if (pt_entry_isvalid(pte))
- phys = pte & ZPCI_PTE_ADDR_MASK;
+ phys = (pte & ZPCI_PTE_ADDR_MASK) |
+ (iova & ~ZPCI_PTE_ADDR_MASK);
}
}
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-27 1:03 ` Jason Gunthorpe
@ 2026-02-27 8:06 ` Antheas Kapenekakis
2026-02-27 14:01 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-27 8:06 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Robin Murphy, iommu, linux-kernel, Joerg Roedel, Will Deacon,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Fri, 27 Feb 2026 at 02:03, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> > I am still concerned about the unaligned lookups. It is a functional
> > change that could cause regressions on all devices, whereas the
> > approach of this patch does not affect behavior on other devices. I
> > would like Jason to weigh in.
>
> I think Robin's solution is very clever, but I share the concern
> regarding what all the implementations do.
I will send a V3 with a ternary instead; this way, only address 0 is
affected. I will compile-test now and test later today. I have no
strong preference between the two fixes.
```
phys_addr = iommu_iova_to_phys(domain, addr ? addr : 1);
```
Antheas
> So, I fed this question to Claude. It did find two counterpoints (see
> below for the whole report I had it generate):
>
> Implementations that lose the offset
>
> s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
> returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
> iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.
>
> mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
> iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
> step adds the sub-page offset back.
>
> I checked myself and it seems correct. I didn't try to confirm that
> the cases it says are OK are in fact OK, but it paints a convincing
> picture.
>
> I doubt S390 uses this function you are fixing, and I have no idea
> about mtk. Below is also a diff showing how Claude thought to fix it;
> I didn't try to check it.
>
> So, I'd say if Robin is OK with these outliers then it's a good and
> fine approach.
>
> Jason
>
> iova_to_phys Implementation Survey
>
> Entry point: iommu_iova_to_phys() in drivers/iommu/iommu.c:2502 calls
> domain->ops->iova_to_phys(domain, iova) via iommu_domain_ops.
>
> Category 1 — Delegates to io-pgtable
>
> These drivers hold an io_pgtable_ops * and call ops->iova_to_phys(ops, iova).
> The actual walk happens in one of the io-pgtable backends listed in Category
> 4.
>
> ----------------------------------------------------------------------------------------------------------
> Driver Function (file:line) Ops assignment Notes
> ------------- -------------------------------------------------- ----------------------- -----------------
> arm-smmu-v3 arm_smmu_iova_to_phys :3767 Pure delegation
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3471
>
> arm-smmu arm_smmu_iova_to_phys :1655 S1 with
> v1/v2 drivers/iommu/arm/arm-smmu/arm-smmu.c:1387 FEAT_TRANS_OPS
> uses hw ATS1PR
> registers
> (CB_PAR),
> otherwise
> io-pgtable
>
> apple-dart apple_dart_iova_to_phys :1021 Pure delegation →
> drivers/iommu/apple-dart.c:531 io-pgtable-dart
>
> qcom_iommu qcom_iommu_iova_to_phys :605 Delegation with
> drivers/iommu/arm/arm-smmu/qcom_iommu.c:492 spinlock
>
> ipmmu-vmsa ipmmu_iova_to_phys drivers/iommu/ipmmu-vmsa.c:702 :895 Uses
> ARM_32_LPAE_S1
> format
>
> mtk_iommu mtk_iommu_iova_to_phys :1073 Delegation + 4GB
> drivers/iommu/mtk_iommu.c:861 mode PA remap
> fixup
> ----------------------------------------------------------------------------------------------------------
>
> Category 2 — Open-coded page table walk
>
> These drivers implement their own page table traversal without io-pgtable.
>
> ---------------------------------------------------------------------------------------------------------
> Driver Function (file:line) Ops assignment Walk structure
> ---------------- ------------------------------------ -------------------- ------------------------------
> sun50i-iommu sun50i_iommu_iova_to_phys :860 2-level (DTE → PTE)
> drivers/iommu/sun50i-iommu.c:662
>
> exynos-iommu exynos_iommu_iova_to_phys :1487 2-level (section/large/small
> drivers/iommu/exynos-iommu.c:1375 page)
>
> riscv-iommu riscv_iommu_iova_to_phys :1355 Sv39/48/57 via
> drivers/iommu/riscv/iommu.c:1280 riscv_iommu_pte_fetch (:1166)
>
> omap-iommu omap_iommu_iova_to_phys :1727 iopgtable_lookup_entry helper
> drivers/iommu/omap-iommu.c:1596 (super
> section/section/large/small)
>
> rockchip-iommu rk_iommu_iova_to_phys :1190 2-level (DTE → PTE)
> drivers/iommu/rockchip-iommu.c:651
>
> msm_iommu msm_iommu_iova_to_phys :709 Hardware walk: writes VA to
> drivers/iommu/msm_iommu.c:526 V2PPR register, reads PA from
> PAR register
>
> s390-iommu s390_iommu_iova_to_phys :1186 3-level ZPCI (region → segment
> drivers/iommu/s390-iommu.c:989 → page)
>
> tegra-smmu tegra_smmu_iova_to_phys :1010 2-level via
> drivers/iommu/tegra-smmu.c:806 tegra_smmu_pte_lookup
>
> mtk_iommu_v1 mtk_iommu_v1_iova_to_phys :593 Flat single-level table
> drivers/iommu/mtk_iommu_v1.c:396
>
> sprd-iommu sprd_iommu_iova_to_phys :423 Flat single-level table
> drivers/iommu/sprd-iommu.c:369
> ---------------------------------------------------------------------------------------------------------
>
> Category 3 — Special / trivial
>
> -------------------------------------------------------------------------------------------
> Driver Function (file:line) Ops assignment Mechanism
> -------------- ------------------------------------- ---------------------- ---------------
> fsl_pamu fsl_pamu_iova_to_phys :438 Identity:
> drivers/iommu/fsl_pamu_domain.c:172 returns iova
> (after aperture
> bounds check)
>
> virtio-iommu viommu_iova_to_phys :1105 Interval tree
> drivers/iommu/virtio-iommu.c:915 reverse lookup
> (no page table)
> -------------------------------------------------------------------------------------------
>
> Category 4 — io_pgtable_ops backends
>
> These implement struct io_pgtable_ops.iova_to_phys and are the ultimate walk
> functions called by Category 1 drivers.
>
> --------------------------------------------------------------------------------------------------
> Backend Function (file:line) Ops assignment Walk strategy
> ----------- ---------------------------------------- -------------------- ------------------------
> ARM LPAE arm_lpae_iova_to_phys :950 Visitor pattern via
> (64-bit) drivers/iommu/io-pgtable-arm.c:734 __arm_lpae_iopte_walk;
> covers ARM_64_LPAE_S1,
> S2, ARM_MALI_LPAE
>
> ARM v7s arm_v7s_iova_to_phys :716 Iterative do-while
> (32-bit) drivers/iommu/io-pgtable-arm-v7s.c:644 2-level; handles
> contiguous entries
>
> Apple DART dart_iova_to_phys :402 dart_get_last pre-walks
> drivers/iommu/io-pgtable-dart.c:336 to leaf table, then
> single lookup
> --------------------------------------------------------------------------------------------------
>
> Category 5 — generic_pt framework
>
> All these drivers use IOMMU_PT_DOMAIN_OPS(fmt) which routes iova_to_phys into
> the template function pt_iommu_<fmt>_iova_to_phys at
> drivers/iommu/generic_pt/iommu_pt.h:170. The walk uses pt_walk_range +
> PT_MAKE_LEVELS to generate a fully-inlined unrolled per-level walk; the OA
> is extracted via pt_entry_oa_exact.
>
> ---------------------------------------------------------------------------------
> Driver Ops struct (file:line) Format
> ---------------- ----------------------------------------------- ----------------
> AMD IOMMU v1 amdv1_ops drivers/iommu/amd/iommu.c:2662 amdv1
>
> AMD IOMMU v2 amdv2_ops drivers/iommu/amd/iommu.c:2740 x86_64
>
> Intel VT-d intel_fs_paging_domain_ops x86_64
> first-stage drivers/iommu/intel/iommu.c:3886
>
> Intel VT-d intel_ss_paging_domain_ops vtdss
> second-stage drivers/iommu/intel/iommu.c:3897
>
> iommufd selftest mock_domain_ops etc amdv1_mock /
> drivers/iommu/iommufd/selftest.c:403,411,425 amdv1
>
> KUnit wrapper pgtbl_ops Delegates to
> drivers/iommu/generic_pt/kunit_iommu_cmp.h:86 io_pgtable_ops
> for comparison
> testing
> ---------------------------------------------------------------------------------
>
> Sub-page offset handling
>
> When iova_to_phys(iova) is called with an IOVA that is not aligned to the
> start of the mapped page/block (e.g. iova_to_phys(1) when a 4KB page is mapped
> at IOVA 0), most implementations return the exact physical address including
> the sub-page offset (phys_base + offset). Two do not.
>
> Summary
>
> ------------------------------------------------------------------------------------------------------
> Implementation Offset preserved? Mechanism
> ------------------------- ------------------------- --------------------------------------------------
> arm_lpae (io-pgtable) YES iopte_to_paddr(pte) | (iova & (block_size-1))
>
> arm_v7s (io-pgtable) YES iopte_to_paddr(pte) | (iova & ~LVL_MASK)
>
> dart (io-pgtable) YES iopte_to_paddr(pte) | (iova & (pgsize-1))
>
> sun50i-iommu YES page_addr + FIELD_GET(GENMASK(11,0), iova)
>
> exynos-iommu YES *_phys(entry) + *_offs(iova) per granularity
>
> riscv-iommu YES pfn_to_phys(pfn) | (iova & (pte_size-1))
>
> omap-iommu YES (descriptor & mask) | (va & ~mask)
>
> rockchip-iommu YES pt_address(pte) + rk_iova_page_offset(iova)
>
> msm_iommu YES HW PAR register + VA low bits spliced back in
>
> s390-iommu NO pte & ZPCI_PTE_ADDR_MASK — offset discarded
>
> tegra-smmu YES SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova)
>
> mtk_iommu_v1 NO pte & ~(page_size-1) — offset discarded
>
> sprd-iommu YES (pte << PAGE_SHIFT) + (iova & (page_size-1))
>
> fsl_pamu YES (trivial) return iova — identity mapping
>
> virtio-iommu YES paddr + (iova - mapping->iova.start)
>
> generic_pt YES _pt_entry_oa_fast() | log2_mod(va, entry_lg2sz)
> ------------------------------------------------------------------------------------------------------
>
> Category 1 drivers (arm-smmu-v3, arm-smmu, apple-dart, qcom_iommu, ipmmu-vmsa,
> mtk_iommu) inherit the behavior of their io-pgtable backend — all preserve
> offset.
>
> Implementations that lose the offset
>
> s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
> returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
> iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.
>
> mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
> iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
> step adds the sub-page offset back.
>
>
> diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> index c8d8eff5373d30..8db16989270cd8 100644
> --- a/drivers/iommu/mtk_iommu_v1.c
> +++ b/drivers/iommu/mtk_iommu_v1.c
> @@ -401,7 +401,8 @@ static phys_addr_t mtk_iommu_v1_iova_to_phys(struct iommu_domain *domain, dma_ad
>
> spin_lock_irqsave(&dom->pgtlock, flags);
> pa = *(dom->pgt_va + (iova >> MT2701_IOMMU_PAGE_SHIFT));
> - pa = pa & (~(MT2701_IOMMU_PAGE_SIZE - 1));
> + pa = (pa & (~(MT2701_IOMMU_PAGE_SIZE - 1))) |
> + (iova & (MT2701_IOMMU_PAGE_SIZE - 1));
> spin_unlock_irqrestore(&dom->pgtlock, flags);
>
> return pa;
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index fe679850af2861..57d27f3a984ed6 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -1015,7 +1015,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
> pto = get_st_pto(ste);
> pte = READ_ONCE(pto[px]);
> if (pt_entry_isvalid(pte))
> - phys = pte & ZPCI_PTE_ADDR_MASK;
> + phys = (pte & ZPCI_PTE_ADDR_MASK) |
> + (iova & ~ZPCI_PTE_ADDR_MASK);
> }
> }
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
2026-02-27 8:06 ` Antheas Kapenekakis
@ 2026-02-27 14:01 ` Jason Gunthorpe
0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-27 14:01 UTC (permalink / raw)
To: Antheas Kapenekakis
Cc: Robin Murphy, iommu, linux-kernel, Joerg Roedel, Will Deacon,
Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello
On Fri, Feb 27, 2026 at 09:06:27AM +0100, Antheas Kapenekakis wrote:
> On Fri, 27 Feb 2026 at 02:03, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> > > I am still concerned about unaligned checks. It is a functional change
> > > that can cause regressions in all devices. The approach of this patch
> > > does not affect behavior in other devices. I would like for Jason to
> > > weigh in.
> >
> > I think Robin's solution is very clever, but I share the concern
> > regarding what all the implementations do.
>
> I will send a V3 with a ternary instead. This way, only 0 is affected.
> I will compile-test now and run the test later today. I am fine with either fix.
>
> ```
> phys_addr = iommu_iova_to_phys(domain, addr ? addr : 1);
> ```
Yeah, that's a good idea
Jason
end of thread, other threads:[~2026-02-27 14:02 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-21 23:50 [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists Antheas Kapenekakis
2026-02-23 6:02 ` Vasant Hegde
2026-02-23 7:46 ` Antheas Kapenekakis
2026-02-24 8:25 ` Vasant Hegde
2026-02-24 9:16 ` Antheas Kapenekakis
2026-02-24 19:23 ` Jason Gunthorpe
2026-02-24 19:33 ` Antheas Kapenekakis
2026-02-24 19:43 ` Jason Gunthorpe
2026-02-24 19:51 ` Antheas Kapenekakis
2026-02-25 18:27 ` Robin Murphy
2026-02-25 22:05 ` Antheas Kapenekakis
2026-02-26 19:46 ` Robin Murphy
2026-02-26 20:40 ` Antheas Kapenekakis
2026-02-27 1:03 ` Jason Gunthorpe
2026-02-27 8:06 ` Antheas Kapenekakis
2026-02-27 14:01 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox