* [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables
@ 2025-04-17 16:21 Jason Gunthorpe
2025-04-24 7:56 ` Vasant Hegde
0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2025-04-17 16:21 UTC (permalink / raw)
To: iommu, Joerg Roedel, Robin Murphy, Vasant Hegde, Will Deacon
Cc: Joerg Roedel, Jerry Snitselaar, patches, Suravee Suthikulpanit
The AMD IOMMU documentation seems pretty clear that the V2 table follows
the normal CPU expectation of sign extension. This is shown in
Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
would have three regions in the page table: lower, non-canonical, upper.
The manual describes that the V1 table does not sign extend in section
2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
The iommu domain geometry does not directly support sign extended page
tables. The driver should report only one of the lower/upper spaces. Solve
this by removing the top VA bit from the geometry to use only the lower
space.
Adjust dma_max_address() to do this. It now returns:
5 Level:
Before 0x1ffffffffffffff
After 0x0ffffffffffffff
4 Level:
Before 0xffffffffffff
After 0x7fffffffffff
Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/amd/iommu.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
AMD folks: I'm just reading the documentation, it would be good to confirm
this understanding. I'm a bit surprised nobody hit this, but given the domain
aperture was wildly wrong up till 2023 maybe it never gets exercised
carefully.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index dea0fed7abb044..10af3e9d9ea54e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2495,8 +2495,14 @@ static inline u64 dma_max_address(enum protection_domain_mode pgtable)
if (pgtable == PD_MODE_V1)
return ~0ULL;
- /* V2 with 4/5 level page table */
- return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
+ /*
+ * V2 with 4/5 level page table. Note that "2.2.6.5 AMD64 4-Kbyte Page
+ * Translation" shows that the V2 table sign extends the top of the
+ * address space creating a reserved region in the middle of the
+ * translation, just like the CPU does. Due to this we can only use half
+ * of the IOVA.
+ */
+ return ((1ULL << (PM_LEVEL_SHIFT(amd_iommu_gpt_level) - 1)) - 1);
}
static bool amd_iommu_hd_support(struct amd_iommu *iommu)
base-commit: 47aadfbd64cdf1f19c83e85ff4ffa9c024619ad6
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables
2025-04-17 16:21 [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables Jason Gunthorpe
@ 2025-04-24 7:56 ` Vasant Hegde
2025-04-24 14:06 ` Jason Gunthorpe
0 siblings, 1 reply; 7+ messages in thread
From: Vasant Hegde @ 2025-04-24 7:56 UTC (permalink / raw)
To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy, Will Deacon
Cc: Joerg Roedel, Jerry Snitselaar, patches, Suravee Suthikulpanit
Hi Jason,
On 4/17/2025 9:51 PM, Jason Gunthorpe wrote:
> The AMD IOMMU documentation seems pretty clear that the V2 table follows
> the normal CPU expectation of sign extension. This is shown in
>
> Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
>
> Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
> would have three regions in the page table: lower, non-canonical, upper.
>
> The manual describes that the V1 table does not sign extend in section
> 2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
>
> The iommu domain geometry does not directly support sign extended page
> tables. The driver should report only one of the lower/upper spaces. Solve
> this by removing the top VA bit from the geometry to use only the lower
> space.
>
> Adjust dma_max_address() to do this. It now returns:
>
> 5 Level:
> Before 0x1ffffffffffffff
> After 0x0ffffffffffffff
> 4 Level:
> Before 0xffffffffffff
> After 0x7fffffffffff
>
> Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/amd/iommu.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> AMD folks: I'm just reading the documentation, it would be good to confirm
> this understanding. I'm a bit surprised nobody hit this, but given the domain
> aperture was wildly wrong up till 2023 maybe it never gets exercised
> carefully.
We have tested this with 4 and 5 level w/ some bench marks as well as w/
forcedac=1. It works fine. My understanding is IOMMU uses bit 56/47 as well for
address translation. Its fine for both DMA translation as well as PASID mode.
I have checked with MM folks. They are saying sign bit is notional thing and is
not reserved as such.
I can cross check with HW architects and request them to make it clear in
specification.
-Vasant
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index dea0fed7abb044..10af3e9d9ea54e 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2495,8 +2495,14 @@ static inline u64 dma_max_address(enum protection_domain_mode pgtable)
> if (pgtable == PD_MODE_V1)
> return ~0ULL;
>
> - /* V2 with 4/5 level page table */
> - return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
> + /*
> + * V2 with 4/5 level page table. Note that "2.2.6.5 AMD64 4-Kbyte Page
> + * Translation" shows that the V2 table sign extends the top of the
> + * address space creating a reserved region in the middle of the
> + * translation, just like the CPU does. Due to this we can only use half
> + * of the IOVA.
> + */
> + return ((1ULL << (PM_LEVEL_SHIFT(amd_iommu_gpt_level) - 1)) - 1);
> }
>
> static bool amd_iommu_hd_support(struct amd_iommu *iommu)
>
> base-commit: 47aadfbd64cdf1f19c83e85ff4ffa9c024619ad6
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables
2025-04-24 7:56 ` Vasant Hegde
@ 2025-04-24 14:06 ` Jason Gunthorpe
2025-04-29 6:03 ` Vasant Hegde
2025-05-28 8:47 ` Vasant Hegde
0 siblings, 2 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2025-04-24 14:06 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Joerg Roedel,
Jerry Snitselaar, patches, Suravee Suthikulpanit
On Thu, Apr 24, 2025 at 01:26:21PM +0530, Vasant Hegde wrote:
> Hi Jason,
>
> On 4/17/2025 9:51 PM, Jason Gunthorpe wrote:
> > The AMD IOMMU documentation seems pretty clear that the V2 table follows
> > the normal CPU expectation of sign extension. This is shown in
> >
> > Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
> >
> > Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
> > would have three regions in the page table: lower, non-canonical, upper.
> >
> > The manual describes that the V1 table does not sign extend in section
> > 2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
> >
> > The iommu domain geometry does not directly support sign extended page
> > tables. The driver should report only one of the lower/upper spaces. Solve
> > this by removing the top VA bit from the geometry to use only the lower
> > space.
> >
> > Adjust dma_max_address() to do this. It now returns:
> >
> > 5 Level:
> > Before 0x1ffffffffffffff
> > After 0x0ffffffffffffff
> > 4 Level:
> > Before 0xffffffffffff
> > After 0x7fffffffffff
> >
> > Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> > drivers/iommu/amd/iommu.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > AMD folks: I'm just reading the documentation, it would be good to confirm
> > this understanding. I'm a bit surprised nobody hit this, but given the domain
> > aperture was wildly wrong up till 2023 maybe it never gets exercised
> > carefully.
>
> We have tested this with 4 and 5 level w/ some bench marks as well as w/
> forcedac=1. It works fine. My understanding is IOMMU uses bit 56/47 as well for
> address translation.
Yes, it should use bit 56 for address translation, that is part of the
page table architecture.
The question is what happen if a device uses IOVA 0x0100000000000000
with the iommu. This is a non-canonical address, so I think
architecturally on x86 it should be rejected. I would not be surprised
if some HW treats it the same as 0xFF00000000000000 - though that
would be dangerous.
There is a significant correctness issue here with ATS, the IOMMU
*must not* allow address aliases to exist, so if it responds to ATS
queries at both 0x0100000000000000 and 0xFF00000000000000 with the
same PTE then it is security broken. The device ATC is only flushed
based on the canonical IOVA, so any aliases can remain in the ATC and
trigger UAF issues. This can possibly be triggered by userspace when
using VFIO :\
So the question is not about if bit 56/47 is used, but if the IOMMU hw
is validating the sign extension. Assuming it is validating then we
must not tell the iommu core code to use
0x0100000000000000 -> 0x01FFFFFFFFFFFFFF
as IOVA since it is not legal IOVA. This is why the bit width is
reduced by one when computing the aperture.
Given the security sensistivity with ATS the sign validation behavior
should be understood because I do plan to come with a patch to enable
the high address space for iommufd and if some AMD implementations
need to block that we should know :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables
2025-04-24 14:06 ` Jason Gunthorpe
@ 2025-04-29 6:03 ` Vasant Hegde
2025-05-28 8:47 ` Vasant Hegde
1 sibling, 0 replies; 7+ messages in thread
From: Vasant Hegde @ 2025-04-29 6:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Joerg Roedel,
Jerry Snitselaar, patches, Suravee Suthikulpanit
Hi,
On 4/24/2025 7:36 PM, Jason Gunthorpe wrote:
> On Thu, Apr 24, 2025 at 01:26:21PM +0530, Vasant Hegde wrote:
>> Hi Jason,
>>
>> On 4/17/2025 9:51 PM, Jason Gunthorpe wrote:
>>> The AMD IOMMU documentation seems pretty clear that the V2 table follows
>>> the normal CPU expectation of sign extension. This is shown in
>>>
>>> Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
>>>
>>> Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
>>> would have three regions in the page table: lower, non-canonical, upper.
>>>
>>> The manual describes that the V1 table does not sign extend in section
>>> 2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
>>>
>>> The iommu domain geometry does not directly support sign extended page
>>> tables. The driver should report only one of the lower/upper spaces. Solve
>>> this by removing the top VA bit from the geometry to use only the lower
>>> space.
>>>
>>> Adjust dma_max_address() to do this. It now returns:
>>>
>>> 5 Level:
>>> Before 0x1ffffffffffffff
>>> After 0x0ffffffffffffff
>>> 4 Level:
>>> Before 0xffffffffffff
>>> After 0x7fffffffffff
>>>
>>> Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
>>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>>> ---
>>> drivers/iommu/amd/iommu.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> AMD folks: I'm just reading the documentation, it would be good to confirm
>>> this understanding. I'm a bit surprised nobody hit this, but given the domain
>>> aperture was wildly wrong up till 2023 maybe it never gets exercised
>>> carefully.
>>
>> We have tested this with 4 and 5 level w/ some bench marks as well as w/
>> forcedac=1. It works fine. My understanding is IOMMU uses bit 56/47 as well for
>> address translation.
>
> Yes, it should use bit 56 for address translation, that is part of the
> page table architecture.
>
> The question is what happen if a device uses IOVA 0x0100000000000000
> with the iommu. This is a non-canonical address, so I think
> architecturally on x86 it should be rejected. I would not be surprised
> if some HW treats it the same as 0xFF00000000000000 - though that
> would be dangerous.
>
> There is a significant correctness issue here with ATS, the IOMMU
> *must not* allow address aliases to exist, so if it responds to ATS
> queries at both 0x0100000000000000 and 0xFF00000000000000 with the
> same PTE then it is security broken. The device ATC is only flushed
> based on the canonical IOVA, so any aliases can remain in the ATC and
> trigger UAF issues. This can possibly be triggered by userspace when
> using VFIO :\
>
> So the question is not about if bit 56/47 is used, but if the IOMMU hw
> is validating the sign extension. Assuming it is validating then we
> must not tell the iommu core code to use
> 0x0100000000000000 -> 0x01FFFFFFFFFFFFFF
> as IOVA since it is not legal IOVA. This is why the bit width is
> reduced by one when computing the aperture.
Got it . I am checking with HW architects on this scenario. Will let you the
behavior soon.
-Vasant
>
> Given the security sensistivity with ATS the sign validation behavior
> should be understood because I do plan to come with a patch to enable
> the high address space for iommufd and if some AMD implementations
> need to block that we should know :)
>
> Thanks,
> Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables
2025-04-24 14:06 ` Jason Gunthorpe
2025-04-29 6:03 ` Vasant Hegde
@ 2025-05-28 8:47 ` Vasant Hegde
2025-05-28 11:57 ` Jason Gunthorpe
1 sibling, 1 reply; 7+ messages in thread
From: Vasant Hegde @ 2025-05-28 8:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Joerg Roedel,
Jerry Snitselaar, patches, Suravee Suthikulpanit
Hi Jason,
On 4/24/2025 7:36 PM, Jason Gunthorpe wrote:
> On Thu, Apr 24, 2025 at 01:26:21PM +0530, Vasant Hegde wrote:
>> Hi Jason,
>>
>> On 4/17/2025 9:51 PM, Jason Gunthorpe wrote:
>>> The AMD IOMMU documentation seems pretty clear that the V2 table follows
>>> the normal CPU expectation of sign extension. This is shown in
>>>
>>> Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
>>>
>>> Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
>>> would have three regions in the page table: lower, non-canonical, upper.
>>>
>>> The manual describes that the V1 table does not sign extend in section
>>> 2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
>>>
>>> The iommu domain geometry does not directly support sign extended page
>>> tables. The driver should report only one of the lower/upper spaces. Solve
>>> this by removing the top VA bit from the geometry to use only the lower
>>> space.
>>>
>>> Adjust dma_max_address() to do this. It now returns:
>>>
>>> 5 Level:
>>> Before 0x1ffffffffffffff
>>> After 0x0ffffffffffffff
>>> 4 Level:
>>> Before 0xffffffffffff
>>> After 0x7fffffffffff
>>>
>>> Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
>>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>>> ---
>>> drivers/iommu/amd/iommu.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> AMD folks: I'm just reading the documentation, it would be good to confirm
>>> this understanding. I'm a bit surprised nobody hit this, but given the domain
>>> aperture was wildly wrong up till 2023 maybe it never gets exercised
>>> carefully.
>>
>> We have tested this with 4 and 5 level w/ some bench marks as well as w/
>> forcedac=1. It works fine. My understanding is IOMMU uses bit 56/47 as well for
>> address translation.
>
> Yes, it should use bit 56 for address translation, that is part of the
> page table architecture.
I have checked with HW architects.
In DMA API mode (PASID=0), IOMMU HW does not use canonical addresses. It's safe
to use bit 56 (5 level page table) -OR- bit 47 (4 level page table) for address
translation (we don't need sign extension). However when PASID is *enabled*
(PASID != 0), then IOMMU expects canonical address bit[63-57] should match
bit[56]. Otherwise it will abort the request.
I have requested spec writer to add details in the spec.
-Vasant
>
> The question is what happen if a device uses IOVA 0x0100000000000000
> with the iommu. This is a non-canonical address, so I think
> architecturally on x86 it should be rejected. I would not be surprised
> if some HW treats it the same as 0xFF00000000000000 - though that
> would be dangerous.
>
> There is a significant correctness issue here with ATS, the IOMMU
> *must not* allow address aliases to exist, so if it responds to ATS
> queries at both 0x0100000000000000 and 0xFF00000000000000 with the
> same PTE then it is security broken. The device ATC is only flushed
> based on the canonical IOVA, so any aliases can remain in the ATC and
> trigger UAF issues. This can possibly be triggered by userspace when
> using VFIO :\
>
> So the question is not about if bit 56/47 is used, but if the IOMMU hw
> is validating the sign extension. Assuming it is validating then we
> must not tell the iommu core code to use
> 0x0100000000000000 -> 0x01FFFFFFFFFFFFFF
> as IOVA since it is not legal IOVA. This is why the bit width is
> reduced by one when computing the aperture.
>
> Given the security sensistivity with ATS the sign validation behavior
> should be understood because I do plan to come with a patch to enable
> the high address space for iommufd and if some AMD implementations
> need to block that we should know :)
>
> Thanks,
> Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables
2025-05-28 8:47 ` Vasant Hegde
@ 2025-05-28 11:57 ` Jason Gunthorpe
2025-06-12 4:57 ` Vasant Hegde
0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2025-05-28 11:57 UTC (permalink / raw)
To: Vasant Hegde
Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Joerg Roedel,
Jerry Snitselaar, patches, Suravee Suthikulpanit
On Wed, May 28, 2025 at 02:17:56PM +0530, Vasant Hegde wrote:
> > Yes, it should use bit 56 for address translation, that is part of the
> > page table architecture.
>
> I have checked with HW architects.
>
> In DMA API mode (PASID=0), IOMMU HW does not use canonical
> addresses.
Do you really mean PASID=0 but still using a AMDv2 table with the
GCR3? That seems wild that only PASID0 would break the normal rules
for x86 page tables...
We already know AMDv1 is different and that is documented, and it only
works on PASID0.
> It's safe to use bit 56 (5 level page table) -OR- bit 47 (4 level
> page table) for address translation (we don't need sign
> extension).
But does it validate that the upper bits are zero or does it ignore
them? As I mentioned previously ignoring them breaks ATS.
> However when PASID is *enabled* (PASID != 0), then IOMMU
> expects canonical address bit[63-57] should match bit[56]. Otherwise
> it will abort the request.
Okay that's great at least.
But this is a pain.. Since iommu_domains don't act differently
depending on what PASID they are linked to we have to use the least
common denominator, meaning we should disable use of the top bit
entirely, and disable use of sign extension.
> I have requested spec writer to add details in the spec.
Yes please, this special behavior for PASID0 is very shocking.
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables
2025-05-28 11:57 ` Jason Gunthorpe
@ 2025-06-12 4:57 ` Vasant Hegde
0 siblings, 0 replies; 7+ messages in thread
From: Vasant Hegde @ 2025-06-12 4:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Joerg Roedel,
Jerry Snitselaar, patches, Suravee Suthikulpanit
Jason,
On 5/28/2025 5:27 PM, Jason Gunthorpe wrote:
> On Wed, May 28, 2025 at 02:17:56PM +0530, Vasant Hegde wrote:
>>> Yes, it should use bit 56 for address translation, that is part of the
>>> page table architecture.
>>
>> I have checked with HW architects.
>>
>> In DMA API mode (PASID=0), IOMMU HW does not use canonical
>> addresses.
>
> Do you really mean PASID=0 but still using a AMDv2 table with the
> GCR3? That seems wild that only PASID0 would break the normal rules
> for x86 page tables...
Right . AMD V2 (guest page table) with PASID zero (DMA API Mode).
>
> We already know AMDv1 is different and that is documented, and it only
> works on PASID0.
>
>> It's safe to use bit 56 (5 level page table) -OR- bit 47 (4 level
>> page table) for address translation (we don't need sign
>> extension).
>
> But does it validate that the upper bits are zero or does it ignore
> them? As I mentioned previously ignoring them breaks ATS.
I believe HW will validate and handle ATS requests properly.
>
>> However when PASID is *enabled* (PASID != 0), then IOMMU
>> expects canonical address bit[63-57] should match bit[56]. Otherwise
>> it will abort the request.
>
> Okay that's great at least.
>
> But this is a pain.. Since iommu_domains don't act differently
> depending on what PASID they are linked to we have to use the least
> common denominator, meaning we should disable use of the top bit
> entirely, and disable use of sign extension.
That should be fine. We can limit to `page level - 1` bits.
>
>> I have requested spec writer to add details in the spec.
>
> Yes please, this special behavior for PASID0 is very shocking.
yep!
-Vasant
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-06-12 4:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-17 16:21 [PATCH rc] iommu/amd: Fix geometry.aperture_end for V2 tables Jason Gunthorpe
2025-04-24 7:56 ` Vasant Hegde
2025-04-24 14:06 ` Jason Gunthorpe
2025-04-29 6:03 ` Vasant Hegde
2025-05-28 8:47 ` Vasant Hegde
2025-05-28 11:57 ` Jason Gunthorpe
2025-06-12 4:57 ` Vasant Hegde
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).