public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
@ 2026-02-21 23:50 Antheas Kapenekakis
  2026-02-23  6:02 ` Vasant Hegde
  2026-02-24 19:23 ` Jason Gunthorpe
  0 siblings, 2 replies; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-21 23:50 UTC (permalink / raw)
  To: iommu, linux-kernel
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello,
	Antheas Kapenekakis

Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
introduces the shared iommu page table for AMD IOMMU. Some BIOSes
contain an identity mapping for address 0x0, which is not parsed
properly (e.g., certain Strix Halo devices). This causes the DMA
components of the device to fail to initialize (e.g., the NVMe SSD
controller), leading to a failed POST.

The failure is caused by iommu_create_device_direct_mappings(), which
is the new mapping implementation. In it, address aliasing is handled
via the following check:

```
phys_addr = iommu_iova_to_phys(domain, addr);
if (!phys_addr) {
        map_size += pg_size;
        continue;
}
```

The iommu_iova_to_phys() return value is ambiguous and aliases an
unmapped IOVA together with a mapping at physical address 0, causing
the mapping code to try to re-map the 0 address for each device.
However, it has too many implementations to fix. Therefore, catch
ret == -EADDRINUSE only when addr == 0 and, instead of bailing, skip
the mapping and print a warning.

Closes: https://www.reddit.com/r/cachyos/comments/1r5sgr6/freeze_on_boot_after_kernel_update_to_619_fixed/
Fixes: 789a5913b29c ("iommu/amd: Use the generic iommu page table")
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
---
 drivers/iommu/iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4926a43118e6..d424b7124311 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
 				ret = iommu_map(domain, addr - map_size,
 						addr - map_size, map_size,
 						entry->prot, GFP_KERNEL);
+				if (ret == -EADDRINUSE && addr - map_size == 0) {
+					dev_warn_once(dev,
+						"iommu: identity mapping at addr 0x0 already exists, skipping\n");
+					ret = 0;
+				}
 				if (ret)
 					goto out;
 				map_size = 0;

base-commit: 57d76ceccee4b497eb835831206b50e72915a501
-- 
2.52.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-21 23:50 [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists Antheas Kapenekakis
@ 2026-02-23  6:02 ` Vasant Hegde
  2026-02-23  7:46   ` Antheas Kapenekakis
  2026-02-24 19:23 ` Jason Gunthorpe
  1 sibling, 1 reply; 16+ messages in thread
From: Vasant Hegde @ 2026-02-23  6:02 UTC (permalink / raw)
  To: Antheas Kapenekakis, iommu, linux-kernel
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Alejandro Jimenez, dnaim, Mario.Limonciello

Antheas,

On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> contain an identity mapping for address 0x0, which is not parsed
> properly (e.g., certain Strix Halo devices). This causes the DMA
> components of the device to fail to initialize (e.g., the NVMe SSD
> controller), leading to a failed POST.
> 
> The failure is caused by iommu_create_device_direct_mappings(), which
> is the new mapping implementation. In it, address aliasing is handled
> via the following check:
> 
> ```
> phys_addr = iommu_iova_to_phys(domain, addr);
> if (!phys_addr) {
>         map_size += pg_size;
>         continue;
> }
> ```

Thanks for debugging and fixing it. Just wondering, why can't we
replace the above check with pfn_valid()?


-Vasant


> 
> The iommu_iova_to_phys() return value is ambiguous and aliases an
> unmapped IOVA together with a mapping at physical address 0, causing
> the mapping code to try to re-map the 0 address for each device.
> However, it has too many implementations to fix. Therefore, catch
> ret == -EADDRINUSE only when addr == 0 and, instead of bailing, skip
> the mapping and print a warning.
> 
> Closes: https://www.reddit.com/r/cachyos/comments/1r5sgr6/freeze_on_boot_after_kernel_update_to_619_fixed/
> Fixes: 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> ---
>  drivers/iommu/iommu.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 4926a43118e6..d424b7124311 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
>  				ret = iommu_map(domain, addr - map_size,
>  						addr - map_size, map_size,
>  						entry->prot, GFP_KERNEL);
> +				if (ret == -EADDRINUSE && addr - map_size == 0) {
> +					dev_warn_once(dev,
> +						"iommu: identity mapping at addr 0x0 already exists, skipping\n");
> +					ret = 0;
> +				}
>  				if (ret)
>  					goto out;
>  				map_size = 0;
> 
> base-commit: 57d76ceccee4b497eb835831206b50e72915a501


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-23  6:02 ` Vasant Hegde
@ 2026-02-23  7:46   ` Antheas Kapenekakis
  2026-02-24  8:25     ` Vasant Hegde
  0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-23  7:46 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Alejandro Jimenez, dnaim, Mario.Limonciello

On Mon, 23 Feb 2026 at 07:02, Vasant Hegde <vasant.hegde@amd.com> wrote:
>
> Antheas,
>
> On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
> > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> > contain an identity mapping for address 0x0, which is not parsed
> > properly (e.g., certain Strix Halo devices). This causes the DMA
> > components of the device to fail to initialize (e.g., the NVMe SSD
> > controller), leading to a failed POST.
> >
> > The failure is caused by iommu_create_device_direct_mappings(), which
> > is the new mapping implementation. In it, address aliasing is handled
> > via the following check:
> >
> > ```
> > phys_addr = iommu_iova_to_phys(domain, addr);
> > if (!phys_addr) {
> >         map_size += pg_size;
> >         continue;
> > }
> > ```
>
> Thanks for debugging and fixing it. Just wondering, why can't we
> replace the above check with pfn_valid()?
>

Hi Vasant,
I can check later today.

From a cursory glance though, it does not seem very suitable... Can you expand?

Antheas

>
> -Vasant
>
>
> >
> > The iommu_iova_to_phys() return value is ambiguous and aliases an
> > unmapped IOVA together with a mapping at physical address 0, causing
> > the mapping code to try to re-map the 0 address for each device.
> > However, it has too many implementations to fix. Therefore, catch
> > ret == -EADDRINUSE only when addr == 0 and, instead of bailing, skip
> > the mapping and print a warning.
> >
> > Closes: https://www.reddit.com/r/cachyos/comments/1r5sgr6/freeze_on_boot_after_kernel_update_to_619_fixed/
> > Fixes: 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
> > ---
> >  drivers/iommu/iommu.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 4926a43118e6..d424b7124311 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> >                               ret = iommu_map(domain, addr - map_size,
> >                                               addr - map_size, map_size,
> >                                               entry->prot, GFP_KERNEL);
> > +                             if (ret == -EADDRINUSE && addr - map_size == 0) {
> > +                                     dev_warn_once(dev,
> > +                                             "iommu: identity mapping at addr 0x0 already exists, skipping\n");
> > +                                     ret = 0;
> > +                             }
> >                               if (ret)
> >                                       goto out;
> >                               map_size = 0;
> >
> > base-commit: 57d76ceccee4b497eb835831206b50e72915a501
>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-23  7:46   ` Antheas Kapenekakis
@ 2026-02-24  8:25     ` Vasant Hegde
  2026-02-24  9:16       ` Antheas Kapenekakis
  0 siblings, 1 reply; 16+ messages in thread
From: Vasant Hegde @ 2026-02-24  8:25 UTC (permalink / raw)
  To: Antheas Kapenekakis
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Alejandro Jimenez, dnaim, Mario.Limonciello

Hi Antheas,

On 2/23/2026 1:16 PM, Antheas Kapenekakis wrote:
> On Mon, 23 Feb 2026 at 07:02, Vasant Hegde <vasant.hegde@amd.com> wrote:
>>
>> Antheas,
>>
>> On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>> introduces the shared iommu page table for AMD IOMMU. Some BIOSes
>>> contain an identity mapping for address 0x0, which is not parsed
>>> properly (e.g., certain Strix Halo devices). This causes the DMA
>>> components of the device to fail to initialize (e.g., the NVMe SSD
>>> controller), leading to a failed POST.
>>>
>>> The failure is caused by iommu_create_device_direct_mappings(), which
>>> is the new mapping implementation. In it, address aliasing is handled
>>> via the following check:
>>>
>>> ```
>>> phys_addr = iommu_iova_to_phys(domain, addr);
>>> if (!phys_addr) {
>>>         map_size += pg_size;
>>>         continue;
>>> }
>>> ```
>>
>> Thanks for debugging and fixing it. Just wondering, why can't we
>> replace the above check with pfn_valid()?
>>
> 
> Hi Vasant,
> I can check later today.
> 
> From a cursory glance though, it does not seem very suitable... Can you expand?

Looking into the details, you are right. It won't work for direct_mapping().


Maybe we can have this patch for now, but I think in the long run we
should update the iova_to_phys() ops.

-Vasant



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-24  8:25     ` Vasant Hegde
@ 2026-02-24  9:16       ` Antheas Kapenekakis
  0 siblings, 0 replies; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-24  9:16 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Alejandro Jimenez, dnaim, Mario.Limonciello

On Tue, 24 Feb 2026 at 09:26, Vasant Hegde <vasant.hegde@amd.com> wrote:
>
> Hi Antheas,
>
> On 2/23/2026 1:16 PM, Antheas Kapenekakis wrote:
> > On Mon, 23 Feb 2026 at 07:02, Vasant Hegde <vasant.hegde@amd.com> wrote:
> >>
> >> Antheas,
> >>
> >> On 2/22/2026 5:20 AM, Antheas Kapenekakis wrote:
> >>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>> introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> >>> contain an identity mapping for address 0x0, which is not parsed
> >>> properly (e.g., certain Strix Halo devices). This causes the DMA
> >>> components of the device to fail to initialize (e.g., the NVMe SSD
> >>> controller), leading to a failed POST.
> >>>
> >>> The failure is caused by iommu_create_device_direct_mappings(), which
> >>> is the new mapping implementation. In it, address aliasing is handled
> >>> via the following check:
> >>>
> >>> ```
> >>> phys_addr = iommu_iova_to_phys(domain, addr);
> >>> if (!phys_addr) {
> >>>         map_size += pg_size;
> >>>         continue;
> >>> }
> >>> ```
> >>
> >> Thanks for debugging and fixing it. Just wondering, why can't we
> >> replace the above check with pfn_valid()?
> >>
> >
> > Hi Vasant,
> > I can check later today.
> >
> > From a cursory glance though, it does not seem very suitable... Can you expand?
>
> Looking into the details, you are right. It won't work for direct_mapping().
>
>
> Maybe we can have this patch for now, but I think in the long run we
> should update the iova_to_phys() ops.

Agreed. This commit can be backported, so I would like to see it merged.

When you move forward with the refactor, feel free to cc me to test.

Best,
Antheas

> -Vasant
>
>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-21 23:50 [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists Antheas Kapenekakis
  2026-02-23  6:02 ` Vasant Hegde
@ 2026-02-24 19:23 ` Jason Gunthorpe
  2026-02-24 19:33   ` Antheas Kapenekakis
  1 sibling, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-24 19:23 UTC (permalink / raw)
  To: Antheas Kapenekakis
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> contain an identity mapping for address 0x0, which is not parsed
> properly (e.g., certain Strix Halo devices). This causes the DMA
> components of the device to fail to initialize (e.g., the NVMe SSD
> controller), leading to a failed POST.

I'm trying to understand the issue here, is it that the old AMD code
incorrectly succeeded iommu_map() on top of an existing mapping while
the new code returns -EADDRINUSE?

Then the existing guard for double mapping doesn't work since 0 is an
ambiguous return?

> @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
>  				ret = iommu_map(domain, addr - map_size,
>  						addr - map_size, map_size,
>  						entry->prot, GFP_KERNEL);
> +				if (ret == -EADDRINUSE && addr - map_size == 0) {
> +					dev_warn_once(dev,
> +						"iommu: identity mapping at addr 0x0 already exists, skipping\n");
> +					ret = 0;
> +				}

Nothing else prints here, so I wouldn't print either..

Apparently we just silently ignore if the BIOS creates conflicting
mappings for some reason..

I think it is OK to just ignore EADDRINUSE always, it unambiguously
means a mapping is present and the intention of this logic is to
ignore double mappings to the same IOVA.

The cleaner fix is to correct the return code of iommu_iova_to_phys()
or to make EADDRINUSE reliable and remove the iommu_iova_to_phys(),
those are both a lot of trouble so I think this proposed single if is
reasonable. Just please clean up the commit message to be a bit clearer
on what caused this regression and add a short comment above the new if:

 0 return from iommu_iova_to_phys() is ambiguous, it could mean a
 present mapping. EADDRINUSE is reliable when supported, it means the
 IOVA is already mapped, so ignore it to resolve the ambiguity.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-24 19:23 ` Jason Gunthorpe
@ 2026-02-24 19:33   ` Antheas Kapenekakis
  2026-02-24 19:43     ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-24 19:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> > contain an identity mapping for address 0x0, which is not parsed
> > properly (e.g., certain Strix Halo devices). This causes the DMA
> > components of the device to fail to initialize (e.g., the NVMe SSD
> > controller), leading to a failed POST.
>
> I'm trying to understand the issue here, is it that the old AMD code
> incorrectly succeeded iommu_map() on top of an existing mapping while
> the new code returns -EADDRINUSE?
>
> Then the existing guard for double mapping doesn't work since 0 is an
> ambiguous return?

Hi Jason,

It seems like the previous code correctly handled the 0 case, and the
new code does not due to the ambiguous return.

> > @@ -1224,6 +1224,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> >                               ret = iommu_map(domain, addr - map_size,
> >                                               addr - map_size, map_size,
> >                                               entry->prot, GFP_KERNEL);
> > +                             if (ret == -EADDRINUSE && addr - map_size == 0) {
> > +                                     dev_warn_once(dev,
> > +                                             "iommu: identity mapping at addr 0x0 already exists, skipping\n");
> > +                                     ret = 0;
> > +                             }
>
> Nothing else prints here, so I wouldn't print either..
>
> Apparently we just silently ignore if the BIOS creates conflicting
> mappings for some reason..

It seems like multiple DMA devices get a shared memory space, so
applying the mappings multiple times is expected. The problem is
handling the 0 address.

> I think it is OK to just ignore EADDRINUSE always, it unambiguously
> means a mapping is present and the intention of this logic is to
> ignore double mappings to the same IOVA.
>
> The cleaner fix is to correct the return code of iommu_iova_to_phys()
> or to make EADDRINUSE reliable and remove the iommu_iova_to_phys(),
> those are both a lot of trouble so I think this proposed single if is
> reasonabe. Just please clean up the commit message to be a bit clearer
> on what caused this regression and add a short comment above the new if:
>
>  0 return from iommu_iova_to_phys() is ambiguous, it could mean a
>  present mapping. EADDRINUSE is reliable when supported, it means the
>  IOVA is already mapped, so ignore it to resolve the ambiguity.

If you think that this patch is a long term solution, I can drop the
print, the addr == 0 check, add a comment on top, then repost a V2
with an updated description to be more concrete.

Best,
Antheas

> Thanks,
> Jason
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-24 19:33   ` Antheas Kapenekakis
@ 2026-02-24 19:43     ` Jason Gunthorpe
  2026-02-24 19:51       ` Antheas Kapenekakis
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-24 19:43 UTC (permalink / raw)
  To: Antheas Kapenekakis
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> > > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > > introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> > > contain an identity mapping for address 0x0, which is not parsed
> > > properly (e.g., certain Strix Halo devices). This causes the DMA
> > > components of the device to fail to initialize (e.g., the NVMe SSD
> > > controller), leading to a failed POST.
> >
> > I'm trying to understand the issue here, is it that the old AMD code
> > incorrectly succeeded iommu_map() on top of an existing mapping while
> > the new code returns -EADDRINUSE?
> >
> > Then the existing guard for double mapping doesn't work since 0 is an
> > ambiguous return?
> 
> Hi Jason,
> 
> It seems like the previous code correctly handled the 0 case, and the
> new code does not due to the ambiguous return.

I checked the old AMD driver; it is definitely returning 0 from
iommu_iova_to_phys(), so that hasn't changed.

> > reasonable. Just please clean up the commit message to be a bit clearer
> > on what caused this regression and add a short comment above the new if:
> >
> >  0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> >  present mapping. EADDRINUSE is reliable when supported, it means the
> >  IOVA is already mapped, so ignore it to resolve the ambiguity.
> 
> If you think that this patch is a long term solution, I can drop the
> print, the addr == 0 check, add a comment on top, then repost a V2
> with an updated description to be more concrete.

I think it is a good solution as-is, I just want clarity on what
actually regressed here, and I think it is that iommu_map() behavior
changed.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-24 19:43     ` Jason Gunthorpe
@ 2026-02-24 19:51       ` Antheas Kapenekakis
  2026-02-25 18:27         ` Robin Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-24 19:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Robin Murphy,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> > On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> > > > Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > > > introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> > > > contain an identity mapping for address 0x0, which is not parsed
> > > > properly (e.g., certain Strix Halo devices). This causes the DMA
> > > > components of the device to fail to initialize (e.g., the NVMe SSD
> > > > controller), leading to a failed POST.
> > >
> > > I'm trying to understand the issue here, is it that the old AMD code
> > > incorrectly succeeded iommu_map() on top of an existing mapping while
> > > the new code returns -EADDRINUSE?
> > >
> > > Then the existing guard for double mapping doesn't work since 0 is an
> > > ambiguous return?
> >
> > Hi Jason,
> >
> > It seems like the previous code correctly handled the 0 case, and the
> > new code does not due to the ambiguous return.
>
> I checked the old AMD driver; it is definitely returning 0 from
> iommu_iova_to_phys(), so that hasn't changed.
>
> > > reasonable. Just please clean up the commit message to be a bit clearer
> > > on what caused this regression and add a short comment above the new if:
> > >
> > >  0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> > >  present mapping. EADDRINUSE is reliable when supported, it means the
> > >  IOVA is already mapped, so ignore it to resolve the ambiguity.
> >
> > If you think that this patch is a long term solution, I can drop the
> > print, the addr == 0 check, add a comment on top, then repost a V2
> > with an updated description to be more concrete.
>
> I think it is a good solution as-is, I just want clarity on what
> actually regressed here, and I think it is that iommu_map() behavior
> changed.

Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
plus two follow-up fixes also worked.

Looking through it, the removed amd_iommu_map_pages() /
amd_iommu_iova_to_phys() in the previous ops might have behaved
differently.

Antheas

> Jason
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-24 19:51       ` Antheas Kapenekakis
@ 2026-02-25 18:27         ` Robin Murphy
  2026-02-25 22:05           ` Antheas Kapenekakis
  0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2026-02-25 18:27 UTC (permalink / raw)
  To: Antheas Kapenekakis, Jason Gunthorpe
  Cc: iommu, linux-kernel, Joerg Roedel, Will Deacon, Vasant Hegde,
	Alejandro Jimenez, dnaim, Mario.Limonciello

On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
> On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>
>> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
>>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>
>>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
>>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>>>> introduces the shared iommu page table for AMD IOMMU. Some BIOSes
>>>>> contain an identity mapping for address 0x0, which is not parsed
>>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
>>>>> components of the device to fail to initialize (e.g., the NVMe SSD
>>>>> controller), leading to a failed POST.
>>>>
>>>> I'm trying to understand the issue here, is it that the old AMD code
>>>> incorrectly succeeded iommu_map() on top of an existing mapping while
>>>> the new code returns -EADDRINUSE?
>>>>
>>>> Then the existing guard for double mapping doesn't work since 0 is an
>>>> ambiguous return?
>>>
>>> Hi Jason,
>>>
>>> It seems like the previous code correctly handled the 0 case, and the
>>> new code does not due to the ambiguous return.
>>
>> I checked the old AMD driver; it is definitely returning 0 from
>> iommu_iova_to_phys(), so that hasn't changed.
>>
>>>> reasonable. Just please clean up the commit message to be a bit clearer
>>>> on what caused this regression and add a short comment above the new if:
>>>>
>>>>   0 return from iommu_iova_to_phys() is ambiguous, it could mean a
>>>>   present mapping. EADDRINUSE is reliable when supported, it means the
>>>>   IOVA is already mapped, so ignore it to resolve the ambiguity.
>>>
>>> If you think that this patch is a long term solution, I can drop the
>>> print, the addr == 0 check, add a comment on top, then repost a V2
>>> with an updated description to be more concrete.
>>
>> I think it is a good solution as-is, I just want clarity on what
>> actually regressed here, and I think it is that iommu_map() behavior
>> changed.
> 
> Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> plus two follow-up fixes also worked.
> 
> Looking through it, the removed amd_iommu_map_pages() /
> amd_iommu_iova_to_phys() in the previous ops might have behaved
> differently.

Yes, the old AMD code would happily replace an existing mapping without
question (but whether it would correctly invalidate TLBs if just
replacing a leaf entry with a different leaf entry at the same level
I'm not entirely sure...)

However, waiting for a mapping to fail (which may scream its own
warning) and then attempting to paper over it seems like an awful bodge.
Why not just do something like this?

Cheers,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 35db51780954..682e95c94236 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
  
  			if (addr == end)
  				goto map_end;
-
-			phys_addr = iommu_iova_to_phys(domain, addr);
+			/*
+			 * Check for existing translations using an offset, to
+			 * distinguish a mapping of the page at PA 0 from none.
+			 */
+			phys_addr = iommu_iova_to_phys(domain, addr + 1);
  			if (!phys_addr) {
  				map_size += pg_size;
  				continue;


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-25 18:27         ` Robin Murphy
@ 2026-02-25 22:05           ` Antheas Kapenekakis
  2026-02-26 19:46             ` Robin Murphy
  0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-25 22:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, iommu, linux-kernel, Joerg Roedel, Will Deacon,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Wed, 25 Feb 2026 at 19:27, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
> > On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>
> >> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> >>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>>
> >>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> >>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>>>> introduces the shared iommu page table for AMD IOMMU. Some BIOSes
> >>>>> contain an identity mapping for address 0x0, which is not parsed
> >>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
> >>>>> components of the device to fail to initialize (e.g., the NVMe SSD
> >>>>> controller), leading to a failed POST.
> >>>>
> >>>> I'm trying to understand the issue here, is it that the old AMD code
> >>>> incorrectly succeeded iommu_map() on top of an existing mapping while
> >>>> the new code returns -EADDRINUSE?
> >>>>
> >>>> Then the existing guard for double mapping doesn't work since 0 is an
> >>>> ambiguous return?
> >>>
> >>> Hi Jason,
> >>>
> >>> It seems like the previous code correctly handled the 0 case, and the
> >>> new code does not due to the ambiguous return.
> >>
> >> I checked the old AMD driver; it is definitely returning 0 from
> >> iommu_iova_to_phys(), so that hasn't changed.
> >>
> >>>> reasonable. Just please clean up the commit message to be a bit clearer
> >>>> on what caused this regression and add a short comment above the new if:
> >>>>
> >>>>   0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> >>>>   present mapping. EADDRINUSE is reliable when supported, it means the
> >>>>   IOVA is already mapped, so ignore it to resolve the ambiguity.
> >>>
> >>> If you think that this patch is a long term solution, I can drop the
> >>> print, the addr == 0 check, add a comment on top, then repost a V2
> >>> with an updated description to be more concrete.
> >>
> >> I think it is a good solution as-is, I just want clarity on what
> >> actually regressed here, and I think it is that iommu_map() behavior
> >> changed.
> >
> > Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> > plus two follow-up fixes also worked.
> >
> > Looking through it, the removed amd_iommu_map_pages() /
> > amd_iommu_iova_to_phys() in the previous ops might have behaved
> > differently.
>
> Yes, the old AMD code would happily replace an existing mapping without
> question (but whether it would correctly invalidate TLBs if just
> replacing a leaf entry with a different leaf entry at the same level
> I'm not entirely sure...)
>
> However, waiting for a mapping to fail (which may scream its own
> warning) and then attempting to paper over it seems like an awful bodge.
> Why not just do something like this?
>
> Cheers,
> Robin.
>
> ----->8-----
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 35db51780954..682e95c94236 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
>
>                         if (addr == end)
>                                 goto map_end;
> -
> -                       phys_addr = iommu_iova_to_phys(domain, addr);
> +                       /*
> +                        * Check for existing translations using an offset, to
> +                        * distinguish a mapping of the page at PA 0 from none.
> +                        */
> +                       phys_addr = iommu_iova_to_phys(domain, addr + 1);

Hi Robin,
Assuming pg_size != 1 (which seems to always hold, since it is derived
from `1UL << __ffs(domain->pgsize_bitmap)`), this is an interesting
solution. Then again, it assumes that the underlying implementations of
iommu_iova_to_phys() correctly translate unaligned addresses. I cannot
comment on that.

I would consider neither solution ideal, and lean towards Jason's
suggestion of universally ignoring EADDRINUSE without an error.
However, I am not an absolutist in that regard.

I am on a business trip and do not have the SMTP credentials with me
for git, so I will send a V2 tomorrow. I already drafted it. I will
make sure to cc new participants on this thread if not already
included.

As an aside, it seems that the issue fixed by this patch is not the
only one introduced in 6.19: [1] has users for whom this patch does
not help, and there appear to be additional issues with Thunderbolt
docks. iommu_map() could be the culprit, and perhaps forceful
remapping, as was done previously for AMD, would be a more universal
solution. That way, EADDRINUSE stops being an error.

Best,
Antheas

[1] https://github.com/CachyOS/linux-cachyos/issues/704

>                         if (!phys_addr) {
>                                 map_size += pg_size;
>                                 continue;
>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-25 22:05           ` Antheas Kapenekakis
@ 2026-02-26 19:46             ` Robin Murphy
  2026-02-26 20:40               ` Antheas Kapenekakis
  0 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2026-02-26 19:46 UTC (permalink / raw)
  To: Antheas Kapenekakis
  Cc: Jason Gunthorpe, iommu, linux-kernel, Joerg Roedel, Will Deacon,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On 25/02/2026 10:05 pm, Antheas Kapenekakis wrote:
> On Wed, 25 Feb 2026 at 19:27, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
>>> On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>
>>>> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
>>>>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>>>
>>>>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
>>>>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>>>>>> introduces the shared iommu page table for AMD IOMMU. Some bioses
>>>>>>> contain an identity mapping for address 0x0, which is not parsed
>>>>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
>>>>>>> components of the device to fail to initialize (e.g., the NVMe SSD
>>>>>>> controller), leading to a failed post.
>>>>>>
>>>>>> I'm trying to understand the issue here, is it that the old AMD code
>>>>>> incorrectly succeeded iommu_map() on top of an existing mapping while
>>>>>> the new code returns -EADDRINUSE?
>>>>>>
>>>>>> Then the existing guard for double mapping doesn't work since 0 is an
>>>>>> ambiguous return?
>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> It seems like the previous code correctly handled the 0 case, and the
>>>>> new code does not due to the ambiguous return.
>>>>
> >>>> I checked the old AMD driver, it is definitely returning 0 from
>>>> iommu_iova_to_phys(), so that hasn't changed.
>>>>
> >>>>>> reasonable. Just please clean up the commit message to be a bit clearer
>>>>>> on what caused this regression and add a short comment above the new if:
>>>>>>
>>>>>>    0 return from iommu_iova_to_phys() is ambiguous, it could mean a
>>>>>>    present mapping. EADDRINUSE is reliable when supported, it means the
>>>>>>    IOVA is already mapped, so ignore it to resolve the ambiguity.
>>>>>
>>>>> If you think that this patch is a long term solution, I can drop the
>>>>> print, the addr == 0 check, add a comment on top, then repost a V2
>>>>> with an updated description to be more concrete.
>>>>
>>>> I think it is a good solution as-is, I just want clarity on what
>>>> actually regressed here, and I think it is that iommu_map() behavior
>>>> changed.
>>>
>>> Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
>>> plus two follow-up fixes also worked.
>>>
>>> Looking through it, the removed amd_iommu_map_pages() /
>>> amd_iommu_iova_to_phys() in the previous ops might have behaved
>>> differently.
>>
>> Yes, the old AMD code would happily replace an existing mapping without
>> question (but whether it would correctly invalidate TLBs if just
>> replacing a leaf entry with a different leaf entry at the same level
>> I'm not entirely sure...)
>>
>> However, waiting for a mapping to fail (which may scream its own
>> warning) and then attempting to paper over it seems like an awful bodge.
>> Why not just do something like this?
>>
>> Cheers,
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 35db51780954..682e95c94236 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
>>
>>                          if (addr == end)
>>                                  goto map_end;
>> -
>> -                       phys_addr = iommu_iova_to_phys(domain, addr);
>> +                       /*
>> +                        * Check for existing translations using an offset, to
>> +                        * distinguish a mapping of the page at PA 0 from none.
>> +                        */
>> +                       phys_addr = iommu_iova_to_phys(domain, addr + 1);
> 
> Hi Robin,
> Assuming pg_size != 1 (which seems to always hold, since it is derived
> from `1UL << __ffs(domain->pgsize_bitmap)`), this is an interesting
> solution. Then again, it assumes that the underlying implementations of
> iommu_iova_to_phys() correctly translate unaligned addresses. I cannot
> comment on that.

It was a somewhat rhetorical question. We can safely assume that nobody 
would ever build an IOMMU with a 1-byte page size, and that even if 
someone was insane enough to do so, Linux would never support using that 
granularity (since it would be nice to have some memory left for 
anything other than infinite IOMMU pagetables). It's also indeed not 
impossible that a driver might have a bug in its iova_to_phys 
implementation, but then that driver would have a bug that would also 
break other iova_to_phys users (e.g. iommu-dma) anyway. Every other 
possible return value from iova_to_phys is unambiguous, and the single 
one that isn't is trivial to avoid in this case, so why do anything 
other than avoid it?

> I would consider neither solution ideal, and lean towards Jason's
> suggestion of universally ignoring EADDRINUSE without an error.
> However, I am not an absolutist in that regard.

Relying on magic return values doesn't even work around this issue 
robustly, since even among the drivers which do treat double-mapping as 
an error, they do not all use the same error codes, and as mentioned 
some would already log a visible warning before getting that far. This 
is some core code not quite working as intended, and the fact that it's 
apparently only shown up on AMD platforms which also previously happened 
to hide it hardly makes it right to paper over in a manner that only 
"works" for a few drivers including AMD, rather than properly fix the 
root cause and make the initial check consistently work as intended to 
prevent double-mapping attempts in the first place.

Thanks,
Robin.

> I am on a business trip and do not have the SMTP credentials with me
> for git, so I will send a V2 tomorrow. I already drafted it. I will
> make sure to cc new participants on this thread if not already
> included.
> 
> As an aside, it seems that the issue fixed by this patch is not the
> only one introduced in 6.19: [1] has users for whom this patch does
> not help, and there appear to be additional issues with Thunderbolt
> docks. iommu_map() could be the culprit, and perhaps forceful
> remapping, as was done previously for AMD, would be a more universal
> solution. That way, EADDRINUSE stops being an error.
> 
> Best,
> Antheas
> 
> [1] https://github.com/CachyOS/linux-cachyos/issues/704
> 
>>                          if (!phys_addr) {
>>                                  map_size += pg_size;
>>                                  continue;
>>
>>
> 



* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-26 19:46             ` Robin Murphy
@ 2026-02-26 20:40               ` Antheas Kapenekakis
  2026-02-27  1:03                 ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-26 20:40 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, iommu, linux-kernel, Joerg Roedel, Will Deacon,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Thu, 26 Feb 2026 at 20:46, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 25/02/2026 10:05 pm, Antheas Kapenekakis wrote:
> > On Wed, 25 Feb 2026 at 19:27, Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2026-02-24 7:51 pm, Antheas Kapenekakis wrote:
> >>> On Tue, 24 Feb 2026 at 20:43, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>>
> >>>> On Tue, Feb 24, 2026 at 08:33:50PM +0100, Antheas Kapenekakis wrote:
> >>>>> On Tue, 24 Feb 2026 at 20:23, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>>>>>
> >>>>>> On Sun, Feb 22, 2026 at 12:50:50AM +0100, Antheas Kapenekakis wrote:
> >>>>>>> Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>>>>>> introduces the shared iommu page table for AMD IOMMU. Some bioses
> >>>>>>> contain an identity mapping for address 0x0, which is not parsed
> >>>>>>> properly (e.g., certain Strix Halo devices). This causes the DMA
> >>>>>>> components of the device to fail to initialize (e.g., the NVMe SSD
> >>>>>>> controller), leading to a failed post.
> >>>>>>
> >>>>>> I'm trying to understand the issue here, is it that the old AMD code
> >>>>>> incorrectly succeeded iommu_map() on top of an existing mapping while
> >>>>>> the new code returns -EADDRINUSE?
> >>>>>>
> >>>>>> Then the existing guard for double mapping doesn't work since 0 is an
> >>>>>> ambiguous return?
> >>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> It seems like the previous code correctly handled the 0 case, and the
> >>>>> new code does not due to the ambiguous return.
> >>>>
> >>>> I checked the old AMD driver, it is definitely returning 0 from
> >>>> iommu_iova_to_phys(), so that hasn't changed.
> >>>>
> >>>>>> reasonable. Just please clean up the commit message to be a bit clearer
> >>>>>> on what caused this regression and add a short comment above the new if:
> >>>>>>
> >>>>>>    0 return from iommu_iova_to_phys() is ambiguous, it could mean a
> >>>>>>    present mapping. EADDRINUSE is reliable when supported, it means the
> >>>>>>    IOVA is already mapped, so ignore it to resolve the ambiguity.
> >>>>>
> >>>>> If you think that this patch is a long term solution, I can drop the
> >>>>> print, the addr == 0 check, add a comment on top, then repost a V2
> >>>>> with an updated description to be more concrete.
> >>>>
> >>>> I think it is a good solution as-is, I just want clarity on what
> >>>> actually regressed here, and I think it is that iommu_map() behavior
> >>>> changed.
> >>>
> >>> Reverting 789a5913b29c ("iommu/amd: Use the generic iommu page table")
> >>> plus two follow-up fixes also worked.
> >>>
> >>> Looking through it, the removed amd_iommu_map_pages() /
> >>> amd_iommu_iova_to_phys() in the previous ops might have behaved
> >>> differently.
> >>
> >> Yes, the old AMD code would happily replace an existing mapping without
> >> question (but whether it would correctly invalidate TLBs if just
> >> replacing a leaf entry with a different leaf entry at the same level
> >> I'm not entirely sure...)
> >>
> >> However, waiting for a mapping to fail (which may scream its own
> >> warning) and then attempting to paper over it seems like an awful bodge.
> >> Why not just do something like this?
> >>
> >> Cheers,
> >> Robin.
> >>
> >> ----->8-----
> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index 35db51780954..682e95c94236 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -1212,8 +1212,11 @@ static int iommu_create_device_direct_mappings(struct iommu_domain *domain,
> >>
> >>                          if (addr == end)
> >>                                  goto map_end;
> >> -
> >> -                       phys_addr = iommu_iova_to_phys(domain, addr);
> >> +                       /*
> >> +                        * Check for existing translations using an offset, to
> >> +                        * distinguish a mapping of the page at PA 0 from none.
> >> +                        */
> >> +                       phys_addr = iommu_iova_to_phys(domain, addr + 1);
> >
> > Hi Robin,
> > Assuming pg_size != 1 (which seems to always hold, since it is derived
> > from `1UL << __ffs(domain->pgsize_bitmap)`), this is an interesting
> > solution. Then again, it assumes that the underlying implementations of
> > iommu_iova_to_phys() correctly translate unaligned addresses. I cannot
> > comment on that.
>
> It was a somewhat rhetorical question. We can safely assume that nobody
> would ever build an IOMMU with a 1-byte page size, and that even if
> someone was insane enough to do so, Linux would never support using that

pgsize_bitmap must not be 0, otherwise the function errors out and
returns early; therefore pg_size cannot be 1.

> granularity (since it would be nice to have some memory left for
> anything other than infinite IOMMU pagetables). It's also indeed not
> impossible that a driver might have a bug in its iova_to_phys
> implementation, but then that driver would have a bug that would also
> break other iova_to_phys users (e.g. iommu-dma) anyway. Every other
> possible return value from iova_to_phys is unambiguous, and the single
> one that isn't is trivial to avoid in this case, so why do anything
> other than avoid it?

I do not know whether other iova_to_phys users perform unaligned
accesses with a single-byte offset, and I would not be eager to find
out whether all underlying implementations handle that robustly.

> > I would consider neither solution ideal, and lean towards Jason's
> > suggestion of universally ignoring EADDRINUSE without an error.
> > However, I am not an absolutist in that regard.
>
> Relying on magic return values doesn't even work around this issue
> robustly, since even among the drivers which do treat double-mapping as
> an error, they do not all use the same error codes, and as mentioned
> some would already log a visible warning before getting that far. This
> is some core code not quite working as intended, and the fact that it's
> apparently only shown up on AMD platforms which also previously happened
> to hide it hardly makes it right to paper over in a manner that only
> "works" for a few drivers including AMD, rather than properly fix the
> root cause and make the initial check consistently work as intended to
> prevent double-mapping attempts in the first place.

I am still concerned about unaligned checks. It is a functional change
that can cause regressions in all devices. The approach of this patch
does not affect behavior in other devices. I would like for Jason to
weigh in.

On my device, the only error printed due to this V1 is the warning I
added, and this seems to be an AMD-specific bug for certain BIOSes.

In the interim, I will send a V2 with Jason's suggestions.

Antheas

> Thanks,
> Robin.
>
> > I am on a business trip and do not have the SMTP credentials with me
> > for git, so I will send a V2 tomorrow. I already drafted it. I will
> > make sure to cc new participants on this thread if not already
> > included.
> >
> > As an aside, it seems that the issue fixed by this patch is not the
> > only one introduced in 6.19: [1] has users for whom this patch does
> > not help, and there appear to be additional issues with Thunderbolt
> > docks. iommu_map() could be the culprit, and perhaps forceful
> > remapping, as was done previously for AMD, would be a more universal
> > solution. That way, EADDRINUSE stops being an error.
> >
> > Best,
> > Antheas
> >
> > [1] https://github.com/CachyOS/linux-cachyos/issues/704
> >
> >>                          if (!phys_addr) {
> >>                                  map_size += pg_size;
> >>                                  continue;
> >>
> >>
> >
>
>



* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-26 20:40               ` Antheas Kapenekakis
@ 2026-02-27  1:03                 ` Jason Gunthorpe
  2026-02-27  8:06                   ` Antheas Kapenekakis
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-27  1:03 UTC (permalink / raw)
  To: Antheas Kapenekakis
  Cc: Robin Murphy, iommu, linux-kernel, Joerg Roedel, Will Deacon,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> I am still concerned about unaligned checks. It is a functional change
> that can cause regressions in all devices. The approach of this patch
> does not affect behavior in other devices. I would like for Jason to
> weigh in.

I think Robin's solution is very clever, but I share the concern
regarding what all the implementations do.

So, I fed this question to Claude. It did find two counterpoints (see
below for the whole report I had it generate):

     Implementations that lose the offset

     s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
     returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
     iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.

     mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
     iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
     step adds the sub-page offset back.

I checked myself and it seems correct. I didn't try to confirm that
the cases it says are OK are in fact OK, but it paints a convincing
picture.

I doubt S390 uses this function you are fixing, and I have no idea
about mtk. Below is also a diff showing how Claude thought to fix it;
I didn't try to check it.

So, I'd say if Robin is OK with these outliers then it is a good and
fine approach.

Jason

iova_to_phys Implementation Survey

Entry point: iommu_iova_to_phys() in drivers/iommu/iommu.c:2502 calls
domain->ops->iova_to_phys(domain, iova) via iommu_domain_ops.

Category 1 — Delegates to io-pgtable

These drivers hold an io_pgtable_ops * and call ops->iova_to_phys(ops, iova).
The actual walk happens in one of the io-pgtable backends listed in Category
4.

  ----------------------------------------------------------------------------------------------------------
  Driver        Function (file:line)                               Ops assignment          Notes
  ------------- -------------------------------------------------- ----------------------- -----------------
  arm-smmu-v3   arm_smmu_iova_to_phys                              :3767                   Pure delegation
                drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3471                           

  arm-smmu      arm_smmu_iova_to_phys                              :1655                   S1 with
  v1/v2         drivers/iommu/arm/arm-smmu/arm-smmu.c:1387                                 FEAT_TRANS_OPS
                                                                                           uses hw ATS1PR
                                                                                           registers
                                                                                           (CB_PAR),
                                                                                           otherwise
                                                                                           io-pgtable

  apple-dart    apple_dart_iova_to_phys                            :1021                   Pure delegation →
                drivers/iommu/apple-dart.c:531                                             io-pgtable-dart

  qcom_iommu    qcom_iommu_iova_to_phys                            :605                    Delegation with
                drivers/iommu/arm/arm-smmu/qcom_iommu.c:492                                spinlock

  ipmmu-vmsa    ipmmu_iova_to_phys drivers/iommu/ipmmu-vmsa.c:702  :895                    Uses
                                                                                           ARM_32_LPAE_S1
                                                                                           format

  mtk_iommu     mtk_iommu_iova_to_phys                             :1073                   Delegation + 4GB
                drivers/iommu/mtk_iommu.c:861                                              mode PA remap
                                                                                           fixup
  ----------------------------------------------------------------------------------------------------------

Category 2 — Open-coded page table walk

These drivers implement their own page table traversal without io-pgtable.

  ---------------------------------------------------------------------------------------------------------
  Driver           Function (file:line)                 Ops assignment       Walk structure
  ---------------- ------------------------------------ -------------------- ------------------------------
  sun50i-iommu     sun50i_iommu_iova_to_phys            :860                 2-level (DTE → PTE)
                   drivers/iommu/sun50i-iommu.c:662                          

  exynos-iommu     exynos_iommu_iova_to_phys            :1487                2-level (section/large/small
                   drivers/iommu/exynos-iommu.c:1375                         page)

  riscv-iommu      riscv_iommu_iova_to_phys             :1355                Sv39/48/57 via
                   drivers/iommu/riscv/iommu.c:1280                          riscv_iommu_pte_fetch (:1166)

  omap-iommu       omap_iommu_iova_to_phys              :1727                iopgtable_lookup_entry helper
                   drivers/iommu/omap-iommu.c:1596                           (super
                                                                             section/section/large/small)

  rockchip-iommu   rk_iommu_iova_to_phys                :1190                2-level (DTE → PTE)
                   drivers/iommu/rockchip-iommu.c:651                        

  msm_iommu        msm_iommu_iova_to_phys               :709                 Hardware walk: writes VA to
                   drivers/iommu/msm_iommu.c:526                             V2PPR register, reads PA from
                                                                             PAR register

  s390-iommu       s390_iommu_iova_to_phys              :1186                3-level ZPCI (region → segment
                   drivers/iommu/s390-iommu.c:989                            → page)

  tegra-smmu       tegra_smmu_iova_to_phys              :1010                2-level via
                   drivers/iommu/tegra-smmu.c:806                            tegra_smmu_pte_lookup

  mtk_iommu_v1     mtk_iommu_v1_iova_to_phys            :593                 Flat single-level table
                   drivers/iommu/mtk_iommu_v1.c:396                          

  sprd-iommu       sprd_iommu_iova_to_phys              :423                 Flat single-level table
                   drivers/iommu/sprd-iommu.c:369                            
  ---------------------------------------------------------------------------------------------------------

Category 3 — Special / trivial

  -------------------------------------------------------------------------------------------
  Driver         Function (file:line)                  Ops assignment         Mechanism
  -------------- ------------------------------------- ---------------------- ---------------
  fsl_pamu       fsl_pamu_iova_to_phys                 :438                   Identity:
                 drivers/iommu/fsl_pamu_domain.c:172                          returns iova
                                                                              (after aperture
                                                                              bounds check)

  virtio-iommu   viommu_iova_to_phys                   :1105                  Interval tree
                 drivers/iommu/virtio-iommu.c:915                             reverse lookup
                                                                              (no page table)
  -------------------------------------------------------------------------------------------

Category 4 — io_pgtable_ops backends

These implement struct io_pgtable_ops.iova_to_phys and are the ultimate walk
functions called by Category 1 drivers.

  --------------------------------------------------------------------------------------------------
  Backend     Function (file:line)                     Ops assignment       Walk strategy
  ----------- ---------------------------------------- -------------------- ------------------------
  ARM LPAE    arm_lpae_iova_to_phys                    :950                 Visitor pattern via
  (64-bit)    drivers/iommu/io-pgtable-arm.c:734                            __arm_lpae_iopte_walk;
                                                                            covers ARM_64_LPAE_S1,
                                                                            S2, ARM_MALI_LPAE

  ARM v7s     arm_v7s_iova_to_phys                     :716                 Iterative do-while
  (32-bit)    drivers/iommu/io-pgtable-arm-v7s.c:644                        2-level; handles
                                                                            contiguous entries

  Apple DART  dart_iova_to_phys                        :402                 dart_get_last pre-walks
              drivers/iommu/io-pgtable-dart.c:336                           to leaf table, then
                                                                            single lookup
  --------------------------------------------------------------------------------------------------

Category 5 — generic_pt framework

All these drivers use IOMMU_PT_DOMAIN_OPS(fmt) which routes iova_to_phys into
the template function pt_iommu_<fmt>_iova_to_phys at
drivers/iommu/generic_pt/iommu_pt.h:170. The walk uses pt_walk_range +
PT_MAKE_LEVELS to generate a fully-inlined unrolled per-level walk; OA
extracted via pt_entry_oa_exact.

  ---------------------------------------------------------------------------------
  Driver           Ops struct (file:line)                          Format
  ---------------- ----------------------------------------------- ----------------
  AMD IOMMU v1     amdv1_ops drivers/iommu/amd/iommu.c:2662        amdv1

  AMD IOMMU v2     amdv2_ops drivers/iommu/amd/iommu.c:2740        x86_64

  Intel VT-d       intel_fs_paging_domain_ops                      x86_64
  first-stage      drivers/iommu/intel/iommu.c:3886                

  Intel VT-d       intel_ss_paging_domain_ops                      vtdss
  second-stage     drivers/iommu/intel/iommu.c:3897                

  iommufd selftest mock_domain_ops etc                             amdv1_mock /
                   drivers/iommu/iommufd/selftest.c:403,411,425    amdv1

  KUnit wrapper    pgtbl_ops                                       Delegates to
                   drivers/iommu/generic_pt/kunit_iommu_cmp.h:86   io_pgtable_ops
                                                                   for comparison
                                                                   testing
  ---------------------------------------------------------------------------------

Sub-page offset handling

When iova_to_phys(iova) is called with an IOVA that is not aligned to the
start of the mapped page/block (e.g. iova_to_phys(1) when a 4KB page is mapped
at IOVA 0), most implementations return the exact physical address including
the sub-page offset (phys_base + offset). Two do not.

Summary

  ------------------------------------------------------------------------------------------------------
  Implementation            Offset preserved?         Mechanism
  ------------------------- ------------------------- --------------------------------------------------
  arm_lpae (io-pgtable)     YES                       iopte_to_paddr(pte) | (iova & (block_size-1))

  arm_v7s (io-pgtable)      YES                       iopte_to_paddr(pte) | (iova & ~LVL_MASK)

  dart (io-pgtable)         YES                       iopte_to_paddr(pte) | (iova & (pgsize-1))

  sun50i-iommu              YES                       page_addr + FIELD_GET(GENMASK(11,0), iova)

  exynos-iommu              YES                       *_phys(entry) + *_offs(iova) per granularity

  riscv-iommu               YES                       pfn_to_phys(pfn) | (iova & (pte_size-1))

  omap-iommu                YES                       (descriptor & mask) | (va & ~mask)

  rockchip-iommu            YES                       pt_address(pte) + rk_iova_page_offset(iova)

  msm_iommu                 YES                       HW PAR register + VA low bits spliced back in

  s390-iommu                NO                        pte & ZPCI_PTE_ADDR_MASK — offset discarded

  tegra-smmu                YES                       SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova)

  mtk_iommu_v1              NO                        pte & ~(page_size-1) — offset discarded

  sprd-iommu                YES                       (pte << PAGE_SHIFT) + (iova & (page_size-1))

  fsl_pamu                  YES (trivial)             return iova — identity mapping

  virtio-iommu              YES                       paddr + (iova - mapping->iova.start)

  generic_pt                YES                       _pt_entry_oa_fast() | log2_mod(va, entry_lg2sz)
  ------------------------------------------------------------------------------------------------------

Category 1 drivers (arm-smmu-v3, arm-smmu, apple-dart, qcom_iommu, ipmmu-vmsa,
mtk_iommu) inherit the behavior of their io-pgtable backend — all preserve
offset.

Implementations that lose the offset

s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.

mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.


diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c8d8eff5373d30..8db16989270cd8 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -401,7 +401,8 @@ static phys_addr_t mtk_iommu_v1_iova_to_phys(struct iommu_domain *domain, dma_ad
 
 	spin_lock_irqsave(&dom->pgtlock, flags);
 	pa = *(dom->pgt_va + (iova >> MT2701_IOMMU_PAGE_SHIFT));
-	pa = pa & (~(MT2701_IOMMU_PAGE_SIZE - 1));
+	pa = (pa & (~(MT2701_IOMMU_PAGE_SIZE - 1))) |
+	     (iova & (MT2701_IOMMU_PAGE_SIZE - 1));
 	spin_unlock_irqrestore(&dom->pgtlock, flags);
 
 	return pa;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fe679850af2861..57d27f3a984ed6 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -1015,7 +1015,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
 			pto = get_st_pto(ste);
 			pte = READ_ONCE(pto[px]);
 			if (pt_entry_isvalid(pte))
-				phys = pte & ZPCI_PTE_ADDR_MASK;
+				phys = (pte & ZPCI_PTE_ADDR_MASK) |
+				       (iova & ~ZPCI_PTE_ADDR_MASK);
 		}
 	}

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-27  1:03                 ` Jason Gunthorpe
@ 2026-02-27  8:06                   ` Antheas Kapenekakis
  2026-02-27 14:01                     ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Antheas Kapenekakis @ 2026-02-27  8:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, iommu, linux-kernel, Joerg Roedel, Will Deacon,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Fri, 27 Feb 2026 at 02:03, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> > I am still concerned about unaligned checks. It is a functional change
> > that can cause regressions in all devices. The approach of this patch
> > does not affect behavior in other devices. I would like for Jason to
> > weigh in.
>
> I think Robin's solution is very clever, but I share the concern
> regarding what all the implementations do.

I will send a v3 with a ternary instead, so that only IOVA 0 is
affected. I will compile-test now and runtime-test later today. Either
fix works for me.

```
phys_addr = iommu_iova_to_phys(domain, addr ? addr : 1);
```


Antheas

> So, I fed this question to Claude. It did find two counter points (see
> below for the whole report I had it generate):
>
>      Implementations that lose the offset
>
>      s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
>      returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
>      iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.
>
>      mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
>      iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
>      step adds the sub-page offset back.
>
> I checked myself and it seems correct. I didn't try to confirm that
> the cases it says are OK are in fact OK, but it paints a convincing
> picture.
>
> I doubt S390 uses this function you are fixing, and I have no idea
> about mtk. Below is also a diff how Claude thought to fix it, I didn't
> try to check it.
>
> So, I'd say if Robin is OK with these outliers then it is a good and
> fine approach.
>
> Jason
>
> [... "iova_to_phys Implementation Survey" and the suggested
> mtk_iommu_v1/s390 diffs snipped; quoted verbatim from the previous
> message ...]


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
  2026-02-27  8:06                   ` Antheas Kapenekakis
@ 2026-02-27 14:01                     ` Jason Gunthorpe
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2026-02-27 14:01 UTC (permalink / raw)
  To: Antheas Kapenekakis
  Cc: Robin Murphy, iommu, linux-kernel, Joerg Roedel, Will Deacon,
	Vasant Hegde, Alejandro Jimenez, dnaim, Mario.Limonciello

On Fri, Feb 27, 2026 at 09:06:27AM +0100, Antheas Kapenekakis wrote:
> On Fri, 27 Feb 2026 at 02:03, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> > > I am still concerned about unaligned checks. It is a functional change
> > > that can cause regressions in all devices. The approach of this patch
> > > does not affect behavior in other devices. I would like for Jason to
> > > weigh in.
> >
> > I think Robin's solution is very clever, but I share the concern
> > regarding what all the implementations do.
> 
> I will send a v3 with a ternary instead, so that only IOVA 0 is
> affected. I will compile-test now and runtime-test later today. Either
> fix works for me.
> 
> ```
> phys_addr = iommu_iova_to_phys(domain, addr ? addr : 1);
> ```

Yeah, that's a good idea

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-02-27 14:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-21 23:50 [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists Antheas Kapenekakis
2026-02-23  6:02 ` Vasant Hegde
2026-02-23  7:46   ` Antheas Kapenekakis
2026-02-24  8:25     ` Vasant Hegde
2026-02-24  9:16       ` Antheas Kapenekakis
2026-02-24 19:23 ` Jason Gunthorpe
2026-02-24 19:33   ` Antheas Kapenekakis
2026-02-24 19:43     ` Jason Gunthorpe
2026-02-24 19:51       ` Antheas Kapenekakis
2026-02-25 18:27         ` Robin Murphy
2026-02-25 22:05           ` Antheas Kapenekakis
2026-02-26 19:46             ` Robin Murphy
2026-02-26 20:40               ` Antheas Kapenekakis
2026-02-27  1:03                 ` Jason Gunthorpe
2026-02-27  8:06                   ` Antheas Kapenekakis
2026-02-27 14:01                     ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox