* [PATCH 1/1] iommu/vt-d: Fix incorrect cache invalidation for mm notification
From: Lu Baolu @ 2023-11-17 9:09 UTC
To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
Kevin Tian
Cc: iommu, linux-kernel, Lu Baolu, stable, Huang Ying,
Alistair Popple, Luo Yuzhang, Tony Zhu
Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
invalidating TLBs") moved the secondary TLB invalidations into the TLB
invalidation functions to ensure that all secondary TLB invalidations
happen at the same time as the CPU invalidation and added a flush-all
type of secondary TLB invalidation for the batched mode, where a range
of [0, -1UL) is used to indicate that the range extends to the end of
the address space.
However, using an end address of -1UL caused an overflow in the Intel
IOMMU driver, where the end address was rounded up to the next page.
As a result, both the IOTLB and device ATC were not invalidated correctly.
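With 4 KiB pages, the wraparound makes the computed page count collapse
to zero, so nothing is flushed at all. A worked sketch of the
arithmetic (assuming unsigned 64-bit longs and VTD_PAGE_SHIFT == 12):

	unsigned long start = 0, end = -1UL;	/* the flush-all range */
	unsigned long pages;

	/* the driver's range-to-pages conversion, PAGE_SIZE == 0x1000 */
	pages = (end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT;
	/* 0xffffffffffffffff + 0xfff wraps to 0xffe; 0xffe >> 12 == 0 */
	/* pages == 0, so neither the IOTLB nor the device ATC is flushed */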
Add a flush-all helper function and call it when the invalidation range
is from 0 to -1UL, ensuring that the caches are invalidated in their
entirety.
Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
Cc: stable@vger.kernel.org
Cc: Huang Ying <ying.huang@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT
Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
drivers/iommu/intel/svm.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 50a481c895b8..588385050a07 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -216,6 +216,27 @@ static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
 	rcu_read_unlock();
 }
 
+static void intel_flush_svm_all(struct intel_svm *svm)
+{
+	struct device_domain_info *info;
+	struct intel_svm_dev *sdev;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdev, &svm->devs, list) {
+		info = dev_iommu_priv_get(sdev->dev);
+
+		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 1);
+		if (info->ats_enabled) {
+			qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
+						 svm->pasid, sdev->qdep,
+						 0, 64 - VTD_PAGE_SHIFT);
+			quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
+						  svm->pasid, sdev->qdep);
+		}
+	}
+	rcu_read_unlock();
+}
+
 /* Pages have been freed at this point */
 static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 					struct mm_struct *mm,
@@ -223,6 +244,11 @@ static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 {
 	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
 
+	if (start == 0 && end == -1UL) {
+		intel_flush_svm_all(svm);
+		return;
+	}
+
 	intel_flush_svm_range(svm, start,
 			      (end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT, 0);
 }
--
2.34.1
* Re: [PATCH 1/1] iommu/vt-d: Fix incorrect cache invalidation for mm notification
From: Jason Gunthorpe @ 2023-11-17 13:11 UTC
To: Lu Baolu
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, iommu,
linux-kernel, stable, Huang Ying, Alistair Popple, Luo Yuzhang,
Tony Zhu
On Fri, Nov 17, 2023 at 05:09:33PM +0800, Lu Baolu wrote:
> Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
> invalidating TLBs") moved the secondary TLB invalidations into the TLB
> invalidation functions to ensure that all secondary TLB invalidations
> happen at the same time as the CPU invalidation and added a flush-all
> type of secondary TLB invalidation for the batched mode, where a range
> of [0, -1UL) is used to indicate that the range extends to the end of
> the address space.
>
> However, using an end address of -1UL caused an overflow in the Intel
> IOMMU driver, where the end address was rounded up to the next page.
> As a result, both the IOTLB and device ATC were not invalidated correctly.
>
> Add a flush-all helper function and call it when the invalidation range
> is from 0 to -1UL, ensuring that the caches are invalidated in their
> entirety.
>
> Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
> Cc: stable@vger.kernel.org
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT
> Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
> drivers/iommu/intel/svm.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
This should go to -rc
Jason
* Re: [PATCH 1/1] iommu/vt-d: Fix incorrect cache invalidation for mm notification
From: Alistair Popple @ 2023-11-19 23:57 UTC
To: Lu Baolu
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
Kevin Tian, iommu, linux-kernel, stable, Huang Ying, Luo Yuzhang,
Tony Zhu
Lu Baolu <baolu.lu@linux.intel.com> writes:
> Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
> invalidating TLBs") moved the secondary TLB invalidations into the TLB
> invalidation functions to ensure that all secondary TLB invalidations
> happen at the same time as the CPU invalidation and added a flush-all
> type of secondary TLB invalidation for the batched mode, where a range
> of [0, -1UL) is used to indicate that the range extends to the end of
> the address space.
>
> However, using an end address of -1UL caused an overflow in the Intel
> IOMMU driver, where the end address was rounded up to the next page.
> As a result, both the IOTLB and device ATC were not invalidated correctly.
Thanks for the catch. This fix looks good, so:
Reviewed-by: Alistair Popple <apopple@nvidia.com>
However, examining the patch named in the Fixes: tag again, I note that
we are calling mmu_notifier_invalidate_range(mm, 0, -1UL) from
arch_tlbbatch_add_pending() in arch/x86/include/asm/tlbflush.h.
That seems suboptimal because we would be doing an invalidate-all for
every page unmap, and as of db6c1f6f236d ("mm/tlbbatch: introduce
arch_flush_tlb_batched_pending()") arch_flush_tlb_batched_pending()
calls flush_tlb_mm() anyway. So I think we can probably drop the
explicit notifier call from arch_flush_tlb_batched_pending().
Will put together a patch for that.
- Alistair
> Add a flush-all helper function and call it when the invalidation range
> is from 0 to -1UL, ensuring that the caches are invalidated in their
> entirety.
>
> Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
> Cc: stable@vger.kernel.org
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT
> Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
> drivers/iommu/intel/svm.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 50a481c895b8..588385050a07 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -216,6 +216,27 @@ static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
>  	rcu_read_unlock();
>  }
> 
> +static void intel_flush_svm_all(struct intel_svm *svm)
> +{
> +	struct device_domain_info *info;
> +	struct intel_svm_dev *sdev;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> +		info = dev_iommu_priv_get(sdev->dev);
> +
> +		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 1);
> +		if (info->ats_enabled) {
> +			qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
> +						 svm->pasid, sdev->qdep,
> +						 0, 64 - VTD_PAGE_SHIFT);
> +			quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
> +						  svm->pasid, sdev->qdep);
> +		}
> +	}
> +	rcu_read_unlock();
> +}
> +
>  /* Pages have been freed at this point */
>  static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
>  					struct mm_struct *mm,
> @@ -223,6 +244,11 @@ static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
>  {
>  	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
>  
> +	if (start == 0 && end == -1UL) {
> +		intel_flush_svm_all(svm);
> +		return;
> +	}
> +
>  	intel_flush_svm_range(svm, start,
>  			      (end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT, 0);
>  }
* Re: [PATCH 1/1] iommu/vt-d: Fix incorrect cache invalidation for mm notification
From: Huang, Ying @ 2023-11-20 2:55 UTC
To: Alistair Popple
Cc: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian, iommu, linux-kernel, stable,
Luo Yuzhang, Tony Zhu, Nadav Amit
Alistair Popple <apopple@nvidia.com> writes:
> Lu Baolu <baolu.lu@linux.intel.com> writes:
>
>> Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
>> invalidating TLBs") moved the secondary TLB invalidations into the TLB
>> invalidation functions to ensure that all secondary TLB invalidations
>> happen at the same time as the CPU invalidation and added a flush-all
>> type of secondary TLB invalidation for the batched mode, where a range
>> of [0, -1UL) is used to indicate that the range extends to the end of
>> the address space.
>>
>> However, using an end address of -1UL caused an overflow in the Intel
>> IOMMU driver, where the end address was rounded up to the next page.
>> As a result, both the IOTLB and device ATC were not invalidated correctly.
>
> Thanks for the catch. This fix looks good, so:
>
> Reviewed-by: Alistair Popple <apopple@nvidia.com>
>
> However, examining the patch named in the Fixes: tag again, I note that
> we are calling mmu_notifier_invalidate_range(mm, 0, -1UL) from
> arch_tlbbatch_add_pending() in arch/x86/include/asm/tlbflush.h.
>
> That seems suboptimal because we would be doing an invalidate-all for
> every page unmap,
Yes. This can be a performance regression for IOMMU TLB flushing. For
the CPU it's "flush smaller ranges with more IPIs" vs. "flush the whole
range with fewer IPIs", and in general the latter wins because of the
high overhead of IPIs. But, IIUC, for the IOMMU TLB it becomes "flush
smaller ranges" vs. "flush the whole range". That is generally bad. It
may be better to restore the original behavior. Can we just pass the
size of the TLB flush in
set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending(), and flush the
IOMMU TLB for that range?
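A minimal sketch of that suggestion (the start/end parameters are
hypothetical and the helper body is paraphrased, not a tested patch):

	/*
	 * Hypothetical x86 variant: thread the unmapped range through
	 * the batching path so that secondary TLBs can be invalidated
	 * for just that range instead of [0, -1UL) on every call.
	 */
	static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
						     struct mm_struct *mm,
						     unsigned long start, unsigned long end)
	{
		inc_mm_tlb_gen(mm);
		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
		mmu_notifier_invalidate_range(mm, start, end);
	}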
> and as of db6c1f6f236d ("mm/tlbbatch: introduce
> arch_flush_tlb_batched_pending()") arch_flush_tlb_batched_pending()
> calls flush_tlb_mm() anyway. So I think we can probably drop the
> explicit notifier call from arch_flush_tlb_batched_pending().
arch_flush_tlb_batched_pending() is used when we need to change the
page table (e.g., munmap()) in parallel with batched TLB flushing
(e.g., try_to_unmap()). The actual TLB-flushing counterpart of
set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending() is
try_to_unmap_flush()->arch_tlbbatch_flush().
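That is, the call chains look roughly like this (paraphrased from the
above, not verbatim source):

	try_to_unmap()
	  -> set_tlb_ubc_flush_pending()
	    -> arch_tlbbatch_add_pending()	/* notifier invoked here */
	...
	try_to_unmap_flush()
	  -> arch_tlbbatch_flush()		/* batched CPU TLB flush */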
> Will put together a patch for that.
>
> - Alistair
>
>> Add a flush-all helper function and call it when the invalidation range
>> is from 0 to -1UL, ensuring that the caches are invalidated in their
>> entirety.
>>
[snip]
--
Best Regards,
Huang, Ying
* RE: [PATCH 1/1] iommu/vt-d: Fix incorrect cache invalidation for mm notification
From: Tian, Kevin @ 2023-11-20 3:45 UTC
To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe
Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, Huang, Ying, Alistair Popple,
Luo, Yuzhang, Zhu, Tony
> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Friday, November 17, 2023 5:10 PM
>
> Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
> invalidating TLBs") moved the secondary TLB invalidations into the TLB
> invalidation functions to ensure that all secondary TLB invalidations
> happen at the same time as the CPU invalidation and added a flush-all
> type of secondary TLB invalidation for the batched mode, where a range
> of [0, -1UL) is used to indicate that the range extends to the end of
> the address space.
>
> However, using an end address of -1UL caused an overflow in the Intel
> IOMMU driver, where the end address was rounded up to the next page.
> As a result, both the IOTLB and device ATC were not invalidated correctly.
>
> Add a flush-all helper function and call it when the invalidation range
> is from 0 to -1UL, ensuring that the caches are invalidated in their
> entirety.
>
> Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
> Cc: stable@vger.kernel.org
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT
> Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
> drivers/iommu/intel/svm.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 50a481c895b8..588385050a07 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -216,6 +216,27 @@ static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
>  	rcu_read_unlock();
>  }
> 
> +static void intel_flush_svm_all(struct intel_svm *svm)
> +{
> +	struct device_domain_info *info;
> +	struct intel_svm_dev *sdev;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> +		info = dev_iommu_priv_get(sdev->dev);
> +
> +		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 1);
Why set 'ih' to skip invalidating the page-structure caches?
> +		if (info->ats_enabled) {
> +			qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
> +						 svm->pasid, sdev->qdep,
> +						 0, 64 - VTD_PAGE_SHIFT);
> +			quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
> +						  svm->pasid, sdev->qdep);
> +		}
> +	}
> +	rcu_read_unlock();
> +}
> +
>  /* Pages have been freed at this point */
>  static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
>  					struct mm_struct *mm,
> @@ -223,6 +244,11 @@ static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
>  {
>  	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
>  
> +	if (start == 0 && end == -1UL) {
> +		intel_flush_svm_all(svm);
> +		return;
> +	}
> +
>  	intel_flush_svm_range(svm, start,
>  			      (end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT, 0);
>  }
> --
> 2.34.1
* Re: [PATCH 1/1] iommu/vt-d: Fix incorrect cache invalidation for mm notification
From: Baolu Lu @ 2023-11-20 4:17 UTC
To: Tian, Kevin, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe
Cc: baolu.lu, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, Huang, Ying, Alistair Popple,
Luo, Yuzhang, Zhu, Tony
On 11/20/23 11:45 AM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Friday, November 17, 2023 5:10 PM
>>
>> Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
>> invalidating TLBs") moved the secondary TLB invalidations into the TLB
>> invalidation functions to ensure that all secondary TLB invalidations
>> happen at the same time as the CPU invalidation and added a flush-all
>> type of secondary TLB invalidation for the batched mode, where a range
>> of [0, -1UL) is used to indicate that the range extends to the end of
>> the address space.
>>
>> However, using an end address of -1UL caused an overflow in the Intel
>> IOMMU driver, where the end address was rounded up to the next page.
>> As a result, both the IOTLB and device ATC were not invalidated correctly.
>>
>> Add a flush-all helper function and call it when the invalidation range
>> is from 0 to -1UL, ensuring that the caches are invalidated in their
>> entirety.
>>
>> Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
>> Cc: stable@vger.kernel.org
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT
>> Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>> drivers/iommu/intel/svm.c | 26 ++++++++++++++++++++++++++
>> 1 file changed, 26 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
>> index 50a481c895b8..588385050a07 100644
>> --- a/drivers/iommu/intel/svm.c
>> +++ b/drivers/iommu/intel/svm.c
>> @@ -216,6 +216,27 @@ static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
>>  	rcu_read_unlock();
>>  }
>> 
>> +static void intel_flush_svm_all(struct intel_svm *svm)
>> +{
>> +	struct device_domain_info *info;
>> +	struct intel_svm_dev *sdev;
>> +
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>> +		info = dev_iommu_priv_get(sdev->dev);
>> +
>> +		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 1);
> Why set 'ih' to skip invalidating the page-structure caches?
It should be set to '0'. Good catch! Thank you!
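That is, only the trailing 'ih' argument changes:

	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 0);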
Best regards,
baolu
* Re: [PATCH 1/1] iommu/vt-d: Fix incorrect cache invalidation for mm notification
From: Alistair Popple @ 2023-11-22 6:14 UTC
To: Huang, Ying
Cc: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian, iommu, linux-kernel, stable,
Luo Yuzhang, Tony Zhu, Nadav Amit
"Huang, Ying" <ying.huang@intel.com> writes:
> Alistair Popple <apopple@nvidia.com> writes:
>
>> Lu Baolu <baolu.lu@linux.intel.com> writes:
>>
>>> Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when
>>> invalidating TLBs") moved the secondary TLB invalidations into the TLB
>>> invalidation functions to ensure that all secondary TLB invalidations
>>> happen at the same time as the CPU invalidation and added a flush-all
>>> type of secondary TLB invalidation for the batched mode, where a range
>>> of [0, -1UL) is used to indicate that the range extends to the end of
>>> the address space.
>>>
>>> However, using an end address of -1UL caused an overflow in the Intel
>>> IOMMU driver, where the end address was rounded up to the next page.
>>> As a result, both the IOTLB and device ATC were not invalidated correctly.
>>
>> Thanks for the catch. This fix looks good, so:
>>
>> Reviewed-by: Alistair Popple <apopple@nvidia.com>
>>
>> However, examining the patch named in the Fixes: tag again, I note that
>> we are calling mmu_notifier_invalidate_range(mm, 0, -1UL) from
>> arch_tlbbatch_add_pending() in arch/x86/include/asm/tlbflush.h.
>>
>> That seems suboptimal because we would be doing an invalidate-all for
>> every page unmap,
>
> Yes. This can be a performance regression for IOMMU TLB flushing. For
> the CPU it's "flush smaller ranges with more IPIs" vs. "flush the whole
> range with fewer IPIs", and in general the latter wins because of the
> high overhead of IPIs. But, IIUC, for the IOMMU TLB it becomes "flush
> smaller ranges" vs. "flush the whole range". That is generally bad.
The "flush smaller ranges" vs. "flush whole range" is equally valid for
some architectures, or at least some implementations of SMMU on ARM
because flushing the whole range is a single IOMMU command vs. multiple
for flushing a range. See for example
https://lore.kernel.org/linux-arm-kernel/20230920052257.8615-1-nicolinc@nvidia.com/
which switches to a full invalidate depending on the range. I've no idea
if that's true more generally though, although a similar situation
existed on POWER9.
> It may be better to restore the original behavior. Can we just pass
> the size of the TLB flush in
> set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending(), and flush
> the IOMMU TLB for that range?
Ideally we'd push the notifier call down the stack, closer to where the
actual HW TLB invalidate gets called. I think I was just getting lost
in all the indirection in the lower-level x86_64 TLB flushing and
batching code though. Will take another look.
>> and as of db6c1f6f236d ("mm/tlbbatch: introduce
>> arch_flush_tlb_batched_pending()") arch_flush_tlb_batched_pending()
>> calls flush_tlb_mm() anyway. So I think we can probably drop the
>> explicit notifier call from arch_flush_tlb_batched_pending().
>
> arch_flush_tlb_batched_pending() is used when we need to change the
> page table (e.g., munmap()) in parallel with batched TLB flushing
> (e.g., try_to_unmap()). The actual TLB-flushing counterpart of
> set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending() is
> try_to_unmap_flush()->arch_tlbbatch_flush().
Thanks for the pointer. I must have got arch_tlbbatch_flush() and
arch_flush_tlb_batched_pending() crossed at some point.
- Alistair
>> Will put together a patch for that.
>>
>> - Alistair
>>
>>> Add a flush-all helper function and call it when the invalidation range
>>> is from 0 to -1UL, ensuring that the caches are invalidated in their
>>> entirety.
>>>
>
> [snip]