* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
2021-11-11 0:32 [PATCH] iommu/vt-d: Fix unmap_pages support Alex Williamson
@ 2021-11-11 8:49 ` Ajay Garg
2021-11-11 9:52 ` Giovanni Cabiddu
2021-11-12 2:59 ` Lu Baolu
2 siblings, 0 replies; 6+ messages in thread
From: Ajay Garg @ 2021-11-11 8:49 UTC (permalink / raw)
To: Alex Williamson; +Cc: iommu, dwmw2
Thanks Alex, Baolu.
The patch fixes things at my end. No kernel log flooding is seen now
(tested starting/stopping the VM more than 10 times).
Thanks and Regards,
Ajay
On Thu, Nov 11, 2021 at 6:03 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap. VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments. If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level. We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level. When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level. In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done. Meanwhile we never cleared the ptes for the remainder
> of the range. When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
>
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> drivers/iommu/intel/iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
> freelist);
> }
> next:
> - pfn += level_size(level);
> + pfn = level_pfn + level_size(level);
> } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>
> if (first_pte)
>
>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
2021-11-11 0:32 [PATCH] iommu/vt-d: Fix unmap_pages support Alex Williamson
2021-11-11 8:49 ` Ajay Garg
@ 2021-11-11 9:52 ` Giovanni Cabiddu
2021-11-12 2:59 ` Lu Baolu
2 siblings, 0 replies; 6+ messages in thread
From: Giovanni Cabiddu @ 2021-11-11 9:52 UTC (permalink / raw)
To: Alex Williamson; +Cc: iommu, dwmw2
I was able to hit the issue that this patch is fixing using the memory
allocator in QATlib (https://github.com/intel/qatlib/) which uses vfio
to map and unmap memory.
This patch fixes the problem.
On Wed, Nov 10, 2021 at 05:32:50PM -0700, Alex Williamson wrote:
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap. VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments. If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level. We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level. When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level. In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done. Meanwhile we never cleared the ptes for the remainder
> of the range. When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
>
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
2021-11-11 0:32 [PATCH] iommu/vt-d: Fix unmap_pages support Alex Williamson
2021-11-11 8:49 ` Ajay Garg
2021-11-11 9:52 ` Giovanni Cabiddu
@ 2021-11-12 2:59 ` Lu Baolu
2021-11-18 19:48 ` Jerry Snitselaar
2 siblings, 1 reply; 6+ messages in thread
From: Lu Baolu @ 2021-11-12 2:59 UTC (permalink / raw)
To: Alex Williamson; +Cc: iommu, dwmw2
Hi Alex,
On 11/11/21 8:32 AM, Alex Williamson wrote:
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap. VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments. If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level. We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level. When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level. In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done. Meanwhile we never cleared the ptes for the remainder
> of the range. When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
>
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Thank you for fixing this! I will queue it for v5.16.
Best regards,
baolu
> ---
> drivers/iommu/intel/iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
> freelist);
> }
> next:
> - pfn += level_size(level);
> + pfn = level_pfn + level_size(level);
> } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>
> if (first_pte)
>
>
* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
2021-11-12 2:59 ` Lu Baolu
@ 2021-11-18 19:48 ` Jerry Snitselaar
2021-11-19 0:52 ` Lu Baolu
0 siblings, 1 reply; 6+ messages in thread
From: Jerry Snitselaar @ 2021-11-18 19:48 UTC (permalink / raw)
To: Lu Baolu, Alex Williamson; +Cc: iommu, dwmw2
On Fri, 2021-11-12 at 10:59 +0800, Lu Baolu wrote:
> Hi Alex,
>
> On 11/11/21 8:32 AM, Alex Williamson wrote:
> > When supporting only the .map and .unmap callbacks of iommu_ops,
> > the IOMMU driver can make assumptions about the size and alignment
> > used for mappings based on the driver provided pgsize_bitmap. VT-d
> > previously used essentially PAGE_MASK for this bitmap as any power
> > of two mapping was acceptably filled by native page sizes.
> >
> > However, with the .map_pages and .unmap_pages interface we're now
> > getting page-size and count arguments. If we simply combine these
> > as (page-size * count) and make use of the previous map/unmap
> > functions internally, any size and alignment assumptions are very
> > different.
> >
> > As an example, a given vfio device assignment VM will often create
> > a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
> > does not support IOMMU super pages, the unmap_pages interface will
> > ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
> > will recurse down to level 2 of the page table where the first half
> > of the pfn range exactly matches the entire pte level. We clear the
> > pte, increment the pfn by the level size, but (oops) the next pte is
> > on a new page, so we exit the loop and pop back up a level. When we
> > then update the pfn based on that higher level, we seem to assume
> > that the previous pfn value was at the start of the level. In this
> > case the level size is 256K pfns, which we add to the base pfn and
> > get a result of 0x7fe00, which is clearly greater than 0x401ff,
> > so we're done. Meanwhile we never cleared the ptes for the remainder
> > of the range. When the VM remaps this range, we're overwriting valid
> > ptes and the VT-d driver complains loudly, as reported by the user
> > report linked below.
> >
> > The fix for this seems relatively simple: if each iteration of the
> > loop in dma_pte_clear_level() is assumed to clear to the end of the
> > level pte page, then our next pfn should be calculated from level_pfn
> > rather than our working pfn.
> >
> > Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> > Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> > Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>
> Thank you for fixing this! I will queue it for v5.16.
>
> Best regards,
> baolu
>
Hi Baolu,
Do you have an estimate of when this will be submitted?
Regards,
Jerry
> > ---
> > drivers/iommu/intel/iommu.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c
> > b/drivers/iommu/intel/iommu.c
> > index d75f59ae28e6..f6395f5425f0 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1249,7 +1249,7 @@ static struct page
> > *dma_pte_clear_level(struct dmar_domain *domain, int level,
> > freelist);
> > }
> > next:
> > - pfn += level_size(level);
> > + pfn = level_pfn + level_size(level);
> > } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
> >
> > if (first_pte)
> >
> >
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
2021-11-18 19:48 ` Jerry Snitselaar
@ 2021-11-19 0:52 ` Lu Baolu
0 siblings, 0 replies; 6+ messages in thread
From: Lu Baolu @ 2021-11-19 0:52 UTC (permalink / raw)
To: Jerry Snitselaar, Alex Williamson; +Cc: iommu, dwmw2
Hi Jerry,
On 11/19/21 3:48 AM, Jerry Snitselaar wrote:
> On Fri, 2021-11-12 at 10:59 +0800, Lu Baolu wrote:
>> Hi Alex,
>>
>> On 11/11/21 8:32 AM, Alex Williamson wrote:
>>> When supporting only the .map and .unmap callbacks of iommu_ops,
>>> the IOMMU driver can make assumptions about the size and alignment
>>> used for mappings based on the driver provided pgsize_bitmap. VT-d
>>> previously used essentially PAGE_MASK for this bitmap as any power
>>> of two mapping was acceptably filled by native page sizes.
>>>
>>> However, with the .map_pages and .unmap_pages interface we're now
>>> getting page-size and count arguments. If we simply combine these
>>> as (page-size * count) and make use of the previous map/unmap
>>> functions internally, any size and alignment assumptions are very
>>> different.
>>>
>>> As an example, a given vfio device assignment VM will often create
>>> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
>>> does not support IOMMU super pages, the unmap_pages interface will
>>> ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
>>> will recurse down to level 2 of the page table where the first half
>>> of the pfn range exactly matches the entire pte level. We clear the
>>> pte, increment the pfn by the level size, but (oops) the next pte is
>>> on a new page, so we exit the loop and pop back up a level. When we
>>> then update the pfn based on that higher level, we seem to assume
>>> that the previous pfn value was at the start of the level. In this
>>> case the level size is 256K pfns, which we add to the base pfn and
>>> get a result of 0x7fe00, which is clearly greater than 0x401ff,
>>> so we're done. Meanwhile we never cleared the ptes for the remainder
>>> of the range. When the VM remaps this range, we're overwriting valid
>>> ptes and the VT-d driver complains loudly, as reported by the user
>>> report linked below.
>>>
>>> The fix for this seems relatively simple: if each iteration of the
>>> loop in dma_pte_clear_level() is assumed to clear to the end of the
>>> level pte page, then our next pfn should be calculated from level_pfn
>>> rather than our working pfn.
>>>
>>> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
>>> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
>>> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>
>> Thank you for fixing this! I will queue it for v5.16.
>>
>> Best regards,
>> baolu
>>
>
> Hi Baolu,
>
> Do you have an estimate of when this will be submitted?
I will submit all fix patches in my queue to Joerg early the next week.
Best regards,
baolu