* [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page()
@ 2025-08-26 14:38 Eugene Koira
2025-08-26 14:55 ` Nicolas Saenz Julienne
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Eugene Koira @ 2025-08-26 14:38 UTC (permalink / raw)
To: iommu, linux-kernel
Cc: dwmw2, baolu.lu, joro, will, robin.murphy, longpeng2, graf,
nsaenz, nh-open-source, stable
switch_to_super_page() assumes the memory range it's working on is aligned
to the target large page level. Unfortunately, __domain_mapping() doesn't
take this into account when using it, and will pass unaligned ranges
ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400,
0x4c0600], which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
to switch_to_super_page() at a 1 GiB granularity, which will in turn
free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to
switch_to_super_page() in __domain_mapping()
to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
Signed-off-by: Eugene Koira <eugkoira@amazon.com>
Cc: stable@vger.kernel.org
---
drivers/iommu/intel/iommu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9c3ab9d9f69a..dff2d895b8ab 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,
unsigned long lvl_pages = lvl_to_nr_pages(level);
struct dma_pte *pte = NULL;
+ if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
+ !IS_ALIGNED(end_pfn + 1, lvl_pages)))
+ return;
+
while (start_pfn <= end_pfn) {
if (!pte)
pte = pfn_to_dma_pte(domain, start_pfn, &level,
@@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long pages_to_remove;
pteval |= DMA_PTE_LARGE_PAGE;
- pages_to_remove = min_t(unsigned long, nr_pages,
+ pages_to_remove = min_t(unsigned long,
+ round_down(nr_pages, lvl_pages),
nr_pte_to_next_page(pte) * lvl_pages);
end_pfn = iov_pfn + pages_to_remove - 1;
switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
--
2.47.3
Amazon Development Center (Netherlands) B.V., Johanna Westerdijkplein 1, NL-2521 EN The Hague, Registration No. Chamber of Commerce 56869649, VAT: NL 852339859B01
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page()
2025-08-26 14:38 [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page() Eugene Koira
@ 2025-08-26 14:55 ` Nicolas Saenz Julienne
2025-08-26 15:48 ` David Woodhouse
2025-09-01 5:55 ` Baolu Lu
2 siblings, 0 replies; 6+ messages in thread
From: Nicolas Saenz Julienne @ 2025-08-26 14:55 UTC (permalink / raw)
To: Eugene Koira, iommu, linux-kernel
Cc: dwmw2, baolu.lu, joro, will, robin.murphy, longpeng2, graf,
nsaenz, nh-open-source, stable
On Tue Aug 26, 2025 at 2:38 PM UTC, Eugene Koira wrote:
> switch_to_super_page() assumes the memory range it's working on is aligned
> to the target large page level. Unfortunately, __domain_mapping() doesn't
> take this into account when using it, and will pass unaligned ranges
> ultimately freeing a PTE range larger than expected.
>
> Take for example a mapping with the following iov_pfn range [0x3fe400,
> 0x4c0600], which should be backed by the following mappings:
>
> iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
> iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
> iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
>
> Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
> to switch_to_super_page() at a 1 GiB granularity, which will in turn
> free PTEs all the way to iov_pfn 0x4fffff.
>
> Mitigate this by rounding down the iov_pfn range passed to
> switch_to_super_page() in __domain_mapping()
> to the target large page level.
>
> Additionally add range alignment checks to switch_to_super_page.
>
> Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
> Signed-off-by: Eugene Koira <eugkoira@amazon.com>
> Cc: stable@vger.kernel.org
> ---
> drivers/iommu/intel/iommu.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 9c3ab9d9f69a..dff2d895b8ab 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,
> unsigned long lvl_pages = lvl_to_nr_pages(level);
> struct dma_pte *pte = NULL;
>
> + if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
> + !IS_ALIGNED(end_pfn + 1, lvl_pages)))
It might make sense to downgrade the warning to WARN_ON_ONCE().
Other than that:
Reviewed-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
Regards,
Nicolas
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page()
2025-08-26 14:38 [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page() Eugene Koira
2025-08-26 14:55 ` Nicolas Saenz Julienne
@ 2025-08-26 15:48 ` David Woodhouse
2025-08-28 6:33 ` Baolu Lu
2025-09-01 5:55 ` Baolu Lu
2 siblings, 1 reply; 6+ messages in thread
From: David Woodhouse @ 2025-08-26 15:48 UTC (permalink / raw)
To: Eugene Koira, iommu, linux-kernel
Cc: baolu.lu, joro, will, robin.murphy, longpeng2, graf, nsaenz,
nh-open-source, stable
[-- Attachment #1: Type: text/plain, Size: 3300 bytes --]
On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:
> switch_to_super_page() assumes the memory range it's working on is aligned
> to the target large page level. Unfortunately, __domain_mapping() doesn't
> take this into account when using it, and will pass unaligned ranges
> ultimately freeing a PTE range larger than expected.
>
> Take for example a mapping with the following iov_pfn range [0x3fe400,
> 0x4c0600], which should be backed by the following mappings:
The range is [0x3fe400, 0x4c0600) ?
> iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
> iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
> iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
>
> Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
> to switch_to_super_page() at a 1 GiB granularity, which will in turn
> free PTEs all the way to iov_pfn 0x4fffff.
>
> Mitigate this by rounding down the iov_pfn range passed to
> switch_to_super_page() in __domain_mapping()
> to the target large page level.
>
> Additionally add range alignment checks to switch_to_super_page.
>
> Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
> Signed-off-by: Eugene Koira <eugkoira@amazon.com>
> Cc: stable@vger.kernel.org
> ---
> drivers/iommu/intel/iommu.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 9c3ab9d9f69a..dff2d895b8ab 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,
> unsigned long lvl_pages = lvl_to_nr_pages(level);
> struct dma_pte *pte = NULL;
>
> + if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
> + !IS_ALIGNED(end_pfn + 1, lvl_pages)))
> + return;
> +
> while (start_pfn <= end_pfn) {
> if (!pte)
> pte = pfn_to_dma_pte(domain, start_pfn, &level,
> @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
> unsigned long pages_to_remove;
>
> pteval |= DMA_PTE_LARGE_PAGE;
> - pages_to_remove = min_t(unsigned long, nr_pages,
> + pages_to_remove = min_t(unsigned long,
> + round_down(nr_pages, lvl_pages),
> nr_pte_to_next_page(pte) * lvl_pages);
> end_pfn = iov_pfn + pages_to_remove - 1;
> switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
I'm mildly entertained by the fact that the *only* comment in this
block of code is a lie. Would you care to address that while you're
here? Maybe the comment could look something like...
/* If the new mapping is eligible for a large page, then
* remove all smaller pages that the existing pte at this
* level references.
* XX: do we even need to bother calling switch_to_super_page()
* if this PTE wasn't *present* before?
*/
I bet it would benefit from one or two other one-line comments to make
it clearer what's going on, too...
But in general, I think this looks sane even though this code makes my
brain hurt. Could do with a test case, in an ideal world. Maybe we can
work on that as part of the generic pagetable support which is coming?
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page()
2025-08-26 15:48 ` David Woodhouse
@ 2025-08-28 6:33 ` Baolu Lu
2025-08-28 7:29 ` David Woodhouse
0 siblings, 1 reply; 6+ messages in thread
From: Baolu Lu @ 2025-08-28 6:33 UTC (permalink / raw)
To: David Woodhouse, Eugene Koira, iommu, linux-kernel
Cc: joro, will, robin.murphy, longpeng2, graf, nsaenz, nh-open-source,
stable
On 8/26/25 23:48, David Woodhouse wrote:
> On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:
>> switch_to_super_page() assumes the memory range it's working on is aligned
>> to the target large page level. Unfortunately, __domain_mapping() doesn't
>> take this into account when using it, and will pass unaligned ranges
>> ultimately freeing a PTE range larger than expected.
>>
>> Take for example a mapping with the following iov_pfn range [0x3fe400,
>> 0x4c0600], which should be backed by the following mappings:
>
> The range is [0x3fe400, 0x4c0600) ?
>
>> iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
>> iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
>> iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
>>
>> Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
>> to switch_to_super_page() at a 1 GiB granularity, which will in turn
>> free PTEs all the way to iov_pfn 0x4fffff.
>>
>> Mitigate this by rounding down the iov_pfn range passed to
>> switch_to_super_page() in __domain_mapping()
>> to the target large page level.
>>
>> Additionally add range alignment checks to switch_to_super_page.
>>
>> Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
>> Signed-off-by: Eugene Koira <eugkoira@amazon.com>
>> Cc: stable@vger.kernel.org
>> ---
>> drivers/iommu/intel/iommu.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 9c3ab9d9f69a..dff2d895b8ab 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,
>> unsigned long lvl_pages = lvl_to_nr_pages(level);
>> struct dma_pte *pte = NULL;
>>
>> + if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
>> + !IS_ALIGNED(end_pfn + 1, lvl_pages)))
>> + return;
>> +
>> while (start_pfn <= end_pfn) {
>> if (!pte)
>> pte = pfn_to_dma_pte(domain, start_pfn, &level,
>> @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>> unsigned long pages_to_remove;
>>
>> pteval |= DMA_PTE_LARGE_PAGE;
>> - pages_to_remove = min_t(unsigned long, nr_pages,
>> + pages_to_remove = min_t(unsigned long,
>> + round_down(nr_pages, lvl_pages),
>> nr_pte_to_next_page(pte) * lvl_pages);
>> end_pfn = iov_pfn + pages_to_remove - 1;
>> switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
>
> I'm mildly entertained by the fact that the *only* comment in this
> block of code is a lie. Would you care to address that while you're
> here? Maybe the comment could look something like...
>
> /* If the new mapping is eligible for a large page, then
> * remove all smaller pages that the existing pte at this
> * level references.
> * XX: do we even need to bother calling switch_to_super_page()
> * if this PTE wasn't *present* before?
> */
>
> I bet it would benefit from one or two other one-line comments to make
> it clearer what's going on, too...
>
> But in general, I think this looks sane even though this code makes my
> brain hurt. Could do with a test case, in an ideal world. Maybe we can
> work on that as part of the generic pagetable support which is coming?
Agreed. The generic pagetable work will replace this code, so this will
be removed. Therefore, we need a fix patch that can be backported before
the generic pagetable for vt-d lands.
Thanks,
baolu
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page()
2025-08-28 6:33 ` Baolu Lu
@ 2025-08-28 7:29 ` David Woodhouse
0 siblings, 0 replies; 6+ messages in thread
From: David Woodhouse @ 2025-08-28 7:29 UTC (permalink / raw)
To: Baolu Lu, Eugene Koira, iommu, linux-kernel
Cc: joro, will, robin.murphy, longpeng2, graf, nsaenz, nh-open-source,
stable
[-- Attachment #1: Type: text/plain, Size: 4153 bytes --]
On Thu, 2025-08-28 at 14:33 +0800, Baolu Lu wrote:
> On 8/26/25 23:48, David Woodhouse wrote:
> > On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:
> > > switch_to_super_page() assumes the memory range it's working on is aligned
> > > to the target large page level. Unfortunately, __domain_mapping() doesn't
> > > take this into account when using it, and will pass unaligned ranges
> > > ultimately freeing a PTE range larger than expected.
> > >
> > > Take for example a mapping with the following iov_pfn range [0x3fe400,
> > > 0x4c0600], which should be backed by the following mappings:
> >
> > The range is [0x3fe400, 0x4c0600) ?
> >
> > > iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
> > > iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
> > > iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
> > >
> > > Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
> > > to switch_to_super_page() at a 1 GiB granularity, which will in turn
> > > free PTEs all the way to iov_pfn 0x4fffff.
> > >
> > > Mitigate this by rounding down the iov_pfn range passed to
> > > switch_to_super_page() in __domain_mapping()
> > > to the target large page level.
> > >
> > > Additionally add range alignment checks to switch_to_super_page.
> > >
> > > Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
> > > Signed-off-by: Eugene Koira <eugkoira@amazon.com>
> > > Cc: stable@vger.kernel.org
> > > ---
> > > drivers/iommu/intel/iommu.c | 7 ++++++-
> > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > > index 9c3ab9d9f69a..dff2d895b8ab 100644
> > > --- a/drivers/iommu/intel/iommu.c
> > > +++ b/drivers/iommu/intel/iommu.c
> > > @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,
> > > unsigned long lvl_pages = lvl_to_nr_pages(level);
> > > struct dma_pte *pte = NULL;
> > >
> > > + if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
> > > + !IS_ALIGNED(end_pfn + 1, lvl_pages)))
> > > + return;
> > > +
> > > while (start_pfn <= end_pfn) {
> > > if (!pte)
> > > pte = pfn_to_dma_pte(domain, start_pfn, &level,
> > > @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
> > > unsigned long pages_to_remove;
> > >
> > > pteval |= DMA_PTE_LARGE_PAGE;
> > > - pages_to_remove = min_t(unsigned long, nr_pages,
> > > + pages_to_remove = min_t(unsigned long,
> > > + round_down(nr_pages, lvl_pages),
> > > nr_pte_to_next_page(pte) * lvl_pages);
> > > end_pfn = iov_pfn + pages_to_remove - 1;
> > > switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
> >
> > I'm mildly entertained by the fact that the *only* comment in this
> > block of code is a lie. Would you care to address that while you're
> > here? Maybe the comment could look something like...
> >
> > /* If the new mapping is eligible for a large page, then
> > * remove all smaller pages that the existing pte at this
> > * level references.
> > * XX: do we even need to bother calling switch_to_super_page()
> > * if this PTE wasn't *present* before?
> > */
> >
> > I bet it would benefit from one or two other one-line comments to make
> > it clearer what's going on, too...
> >
> > But in general, I think this looks sane even though this code makes my
> > brain hurt. Could do with a test case, in an ideal world. Maybe we can
> > work on that as part of the generic pagetable support which is coming?
>
> Agreed. The generic pagetable work will replace this code, so this will
> be removed. Therefore, we need a fix patch that can be backported before
> the generic pagetable for vt-d lands.
Yep. And since Jason has in fact already posted the patches which
*delete* all this code, I don't think I even care about fixing the
comments. Eugene's patch is fine as-is.
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page()
2025-08-26 14:38 [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page() Eugene Koira
2025-08-26 14:55 ` Nicolas Saenz Julienne
2025-08-26 15:48 ` David Woodhouse
@ 2025-09-01 5:55 ` Baolu Lu
2 siblings, 0 replies; 6+ messages in thread
From: Baolu Lu @ 2025-09-01 5:55 UTC (permalink / raw)
To: Eugene Koira, iommu, linux-kernel
Cc: dwmw2, joro, will, robin.murphy, longpeng2, graf, nsaenz,
nh-open-source, stable
On 8/26/25 22:38, Eugene Koira wrote:
> switch_to_super_page() assumes the memory range it's working on is aligned
> to the target large page level. Unfortunately, __domain_mapping() doesn't
> take this into account when using it, and will pass unaligned ranges
> ultimately freeing a PTE range larger than expected.
>
> Take for example a mapping with the following iov_pfn range [0x3fe400,
> 0x4c0600], which should be backed by the following mappings:
>
> iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
> iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
> iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
>
> Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
> to switch_to_super_page() at a 1 GiB granularity, which will in turn
> free PTEs all the way to iov_pfn 0x4fffff.
>
> Mitigate this by rounding down the iov_pfn range passed to
> switch_to_super_page() in __domain_mapping()
> to the target large page level.
>
> Additionally add range alignment checks to switch_to_super_page.
>
> Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
> Signed-off-by: Eugene Koira<eugkoira@amazon.com>
> Cc:stable@vger.kernel.org
> ---
> drivers/iommu/intel/iommu.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
Queued for v6.17-rc. Thank you!
Thanks,
baolu
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-09-01 5:58 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-26 14:38 [PATCH] iommu/intel: Fix __domain_mapping()'s usage of switch_to_super_page() Eugene Koira
2025-08-26 14:55 ` Nicolas Saenz Julienne
2025-08-26 15:48 ` David Woodhouse
2025-08-28 6:33 ` Baolu Lu
2025-08-28 7:29 ` David Woodhouse
2025-09-01 5:55 ` Baolu Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).