public inbox for linux-kernel@vger.kernel.org
* [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
@ 2023-02-03 23:04 Jacob Pan
  2023-02-04  6:32 ` Baolu Lu
  2023-02-06 11:05 ` Robin Murphy
  0 siblings, 2 replies; 8+ messages in thread
From: Jacob Pan @ 2023-02-03 23:04 UTC (permalink / raw)
  To: LKML, iommu, Lu Baolu, Joerg Roedel
  Cc: David Woodhouse, Raj Ashok, Tian, Kevin, Yi Liu, Jacob Pan,
	stable, Sanjay Kumar

The Intel IOMMU driver implements its IOTLB flush queue with domain-selective
or PASID-selective invalidations. In this case there is no need to track the
IOVA page range and sync IOTLBs, which may cause a significant performance
hit.

This patch adds a check to skip IOVA gather page tracking and IOTLB sync on
the lazy (flush queue) path.

The performance on a Sapphire Rapids 100Gb NIC improves as follows (as
measured by iperf send):

w/o this fix: ~48 Gbits/s; with this fix: ~54 Gbits/s

Cc: <stable@vger.kernel.org>
Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b4878c7ac008..705a1c66691a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4352,7 +4352,13 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
 	if (dmar_domain->max_addr == iova + size)
 		dmar_domain->max_addr = iova;
 
-	iommu_iotlb_gather_add_page(domain, gather, iova, size);
+	/*
+	 * Page-selective IOTLB invalidation is not used in the flush queue,
+	 * so there is no need to track pages and sync the IOTLB. Domain-selective
+	 * or PASID-selective invalidation is used in the flush queue instead.
+	 */
+	if (!gather->queued)
+		iommu_iotlb_gather_add_page(domain, gather, iova, size);
 
 	return size;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  2023-02-03 23:04 [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode Jacob Pan
@ 2023-02-04  6:32 ` Baolu Lu
  2023-02-06  3:48   ` Tian, Kevin
  2023-02-06 11:20   ` Robin Murphy
  2023-02-06 11:05 ` Robin Murphy
  1 sibling, 2 replies; 8+ messages in thread
From: Baolu Lu @ 2023-02-04  6:32 UTC (permalink / raw)
  To: Jacob Pan, LKML, iommu, Joerg Roedel
  Cc: baolu.lu, David Woodhouse, Raj Ashok, Tian, Kevin, Yi Liu, stable,
	Sanjay Kumar, Robin Murphy

On 2023/2/4 7:04, Jacob Pan wrote:
> Intel IOMMU driver implements IOTLB flush queue with domain selective
> or PASID selective invalidations. In this case there's no need to track
> IOVA page range and sync IOTLBs, which may cause significant performance
> hit.

[Add cc Robin]

If I understand this patch correctly, this might be caused by below
helper:

/**
 * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
 * @domain: IOMMU domain to be invalidated
 * @gather: TLB gather data
 * @iova: start of page to invalidate
 * @size: size of page to invalidate
 *
 * Helper for IOMMU drivers to build invalidation commands based on individual
 * pages, or with page size/table level hints which cannot be gathered if they
 * differ.
 */
static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
                                               struct iommu_iotlb_gather *gather,
                                               unsigned long iova, size_t size)
{
        /*
         * If the new page is disjoint from the current range or is mapped at
         * a different granularity, then sync the TLB so that the gather
         * structure can be rewritten.
         */
        if ((gather->pgsize && gather->pgsize != size) ||
            iommu_iotlb_gather_is_disjoint(gather, iova, size))
                iommu_iotlb_sync(domain, gather);

        gather->pgsize = size;
        iommu_iotlb_gather_add_range(gather, iova, size);
}

As the comments for iommu_iotlb_gather_is_disjoint() says,

"...For many IOMMUs, flushing the IOMMU in this case is better
  than merging the two, which might lead to unnecessary invalidations.
  ..."

So, perhaps the right fix for this performance issue is to add

	if (!gather->queued)

in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
It should benefit other arches as well.
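For illustration only, the suggested change could look roughly like the
following self-contained sketch. The struct, field names, and range math
here are simplified mocks, not the real kernel definitions (the actual
helper and struct iommu_iotlb_gather live in include/linux/iommu.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct iommu_iotlb_gather. */
struct gather {
	unsigned long start, end;
	size_t pgsize;
	bool queued; /* true when the domain defers flushes to a flush queue */
};

static int syncs; /* counts intermediate iommu_iotlb_sync() calls */

/* Simplified model of iommu_iotlb_gather_is_disjoint(). */
static bool is_disjoint(const struct gather *g, unsigned long iova, size_t size)
{
	return g->pgsize && (iova + size < g->start || iova > g->end + 1);
}

/* The proposed variant: bail out early when invalidations are queued. */
static void gather_add_page(struct gather *g, unsigned long iova, size_t size)
{
	if (g->queued)
		return; /* the suggested "if (!gather->queued)" gate */

	if ((g->pgsize && g->pgsize != size) || is_disjoint(g, iova, size))
		syncs++; /* stands in for iommu_iotlb_sync(domain, gather) */

	g->pgsize = size;
	if (iova < g->start)
		g->start = iova;
	if (iova + size - 1 > g->end)
		g->end = iova + size - 1;
}
```

With this gate, a queued (lazy-mode) gather accumulates nothing and never
triggers an intermediate sync, while the strict path behaves as before.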

> 
> This patch adds a check to avoid IOVA gather page and IOTLB sync for
> the lazy path.
> 
> The performance difference on Sapphire Rapids 100Gb NIC is improved by
> the following (as measured by iperf send):

Which test case did you run? Please post the real data if you have any.

> 
> w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s
> 
> Cc: <stable@vger.kernel.org>

Again, add a Fixes tag so that people know how far this fix should be
back ported.

> Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

Best regards,
baolu



* RE: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  2023-02-04  6:32 ` Baolu Lu
@ 2023-02-06  3:48   ` Tian, Kevin
  2023-02-07  6:45     ` Baolu Lu
  2023-02-06 11:20   ` Robin Murphy
  1 sibling, 1 reply; 8+ messages in thread
From: Tian, Kevin @ 2023-02-06  3:48 UTC (permalink / raw)
  To: Baolu Lu, Jacob Pan, LKML, iommu@lists.linux.dev, Joerg Roedel
  Cc: David Woodhouse, Raj, Ashok, Liu, Yi L, stable@vger.kernel.org,
	Kumar, Sanjay K, Robin Murphy

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Saturday, February 4, 2023 2:32 PM
> 
> On 2023/2/4 7:04, Jacob Pan wrote:
> > Intel IOMMU driver implements IOTLB flush queue with domain selective
> > or PASID selective invalidations. In this case there's no need to track
> > IOVA page range and sync IOTLBs, which may cause significant
> performance
> > hit.
> 
> [Add cc Robin]
> 
> If I understand this patch correctly, this might be caused by below
> helper:
> 
> /**
>  * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
>  * @domain: IOMMU domain to be invalidated
>  * @gather: TLB gather data
>  * @iova: start of page to invalidate
>  * @size: size of page to invalidate
>  *
>  * Helper for IOMMU drivers to build invalidation commands based on individual
>  * pages, or with page size/table level hints which cannot be gathered if they
>  * differ.
>  */
> static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>                                                struct iommu_iotlb_gather *gather,
>                                                unsigned long iova, size_t size)
> {
>         /*
>          * If the new page is disjoint from the current range or is mapped at
>          * a different granularity, then sync the TLB so that the gather
>          * structure can be rewritten.
>          */
>         if ((gather->pgsize && gather->pgsize != size) ||
>             iommu_iotlb_gather_is_disjoint(gather, iova, size))
>                 iommu_iotlb_sync(domain, gather);
> 
>         gather->pgsize = size;
>         iommu_iotlb_gather_add_range(gather, iova, size);
> }
> 
> As the comments for iommu_iotlb_gather_is_disjoint() says,
> 
> "...For many IOMMUs, flushing the IOMMU in this case is better
>   than merging the two, which might lead to unnecessary invalidations.
>   ..."
> 
> So, perhaps the right fix for this performance issue is to add
> 
> 	if (!gather->queued)
> 
> in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
> It should benefit other arch's as well.
> 

There are only two callers of this helper: intel and arm-smmu-v3.

It looks like the other drivers just implement a direct flush via
io_pgtable_tlb_add_page(), and their unmap callbacks typically do:

if (!iommu_iotlb_gather_queued(gather))
	io_pgtable_tlb_add_page();

From this angle it's the same policy as Jacob's patch, i.e. if the
invalidation is already queued there is no need for the direct-flush
optimization.
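The pattern described above can be illustrated with a hedged, self-contained
sketch. The struct and counter below are simplified mocks, not the real
kernel definitions (the real iommu_iotlb_gather_queued() lives in
include/linux/iommu.h and simply reads gather->queued):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct iommu_iotlb_gather. */
struct mock_gather {
	bool queued; /* true when the domain defers flushes to a flush queue */
};

/* Models iommu_iotlb_gather_queued(), which just reads gather->queued. */
static bool mock_gather_queued(const struct mock_gather *gather)
{
	return gather->queued;
}

static int direct_flushes; /* stands in for io_pgtable_tlb_add_page() calls */

/* The typical unmap-path pattern in io_pgtable-based drivers: only issue
 * per-page TLB maintenance when the invalidation is NOT already queued. */
static void mock_unmap_page(struct mock_gather *gather)
{
	if (!mock_gather_queued(gather))
		direct_flushes++;
}
```

This is the same gating decision the VT-d patch makes, just expressed
through the generic helper rather than a direct field read.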


* Re: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  2023-02-03 23:04 [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode Jacob Pan
  2023-02-04  6:32 ` Baolu Lu
@ 2023-02-06 11:05 ` Robin Murphy
  1 sibling, 0 replies; 8+ messages in thread
From: Robin Murphy @ 2023-02-06 11:05 UTC (permalink / raw)
  To: Jacob Pan, LKML, iommu, Lu Baolu, Joerg Roedel
  Cc: David Woodhouse, Raj Ashok, Tian, Kevin, Yi Liu, stable,
	Sanjay Kumar

On 2023-02-03 23:04, Jacob Pan wrote:
> Intel IOMMU driver implements IOTLB flush queue with domain selective
> or PASID selective invalidations. In this case there's no need to track
> IOVA page range and sync IOTLBs, which may cause significant performance
> hit.
> 
> This patch adds a check to avoid IOVA gather page and IOTLB sync for
> the lazy path.
> 
> The performance difference on Sapphire Rapids 100Gb NIC is improved by
> the following (as measured by iperf send):
> 
> w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Cc: <stable@vger.kernel.org>
> Tested-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/intel/iommu.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index b4878c7ac008..705a1c66691a 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4352,7 +4352,13 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
>   	if (dmar_domain->max_addr == iova + size)
>   		dmar_domain->max_addr = iova;
>   
> -	iommu_iotlb_gather_add_page(domain, gather, iova, size);
> +	/*
> +	 * We do not use page-selective IOTLB invalidation in flush queue,
> +	 * There is no need to track page and sync iotlb. Domain-selective or
> +	 * PASID-selective validation are used in the flush queue.
> +	 */
> +	if (!gather->queued)
> +		iommu_iotlb_gather_add_page(domain, gather, iova, size);
>   
>   	return size;
>   }


* Re: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  2023-02-04  6:32 ` Baolu Lu
  2023-02-06  3:48   ` Tian, Kevin
@ 2023-02-06 11:20   ` Robin Murphy
  2023-02-07  6:42     ` Baolu Lu
  1 sibling, 1 reply; 8+ messages in thread
From: Robin Murphy @ 2023-02-06 11:20 UTC (permalink / raw)
  To: Baolu Lu, Jacob Pan, LKML, iommu, Joerg Roedel
  Cc: David Woodhouse, Raj Ashok, Tian, Kevin, Yi Liu, stable,
	Sanjay Kumar

On 2023-02-04 06:32, Baolu Lu wrote:
> On 2023/2/4 7:04, Jacob Pan wrote:
>> Intel IOMMU driver implements IOTLB flush queue with domain selective
>> or PASID selective invalidations. In this case there's no need to track
>> IOVA page range and sync IOTLBs, which may cause significant performance
>> hit.
> 
> [Add cc Robin]
> 
> If I understand this patch correctly, this might be caused by below
> helper:
> 
> /**
>  * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
>  * @domain: IOMMU domain to be invalidated
>  * @gather: TLB gather data
>  * @iova: start of page to invalidate
>  * @size: size of page to invalidate
>  *
>  * Helper for IOMMU drivers to build invalidation commands based on individual
>  * pages, or with page size/table level hints which cannot be gathered if they
>  * differ.
>  */
> static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>                                                struct iommu_iotlb_gather *gather,
>                                                unsigned long iova, size_t size)
> {
>         /*
>          * If the new page is disjoint from the current range or is mapped at
>          * a different granularity, then sync the TLB so that the gather
>          * structure can be rewritten.
>          */
>         if ((gather->pgsize && gather->pgsize != size) ||
>             iommu_iotlb_gather_is_disjoint(gather, iova, size))
>                 iommu_iotlb_sync(domain, gather);
> 
>         gather->pgsize = size;
>         iommu_iotlb_gather_add_range(gather, iova, size);
> }
> 
> As the comments for iommu_iotlb_gather_is_disjoint() says,
> 
> "...For many IOMMUs, flushing the IOMMU in this case is better
>   than merging the two, which might lead to unnecessary invalidations.
>   ..."
> 
> So, perhaps the right fix for this performance issue is to add
> 
>      if (!gather->queued)
> 
> in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
> It should benefit other arch's as well.

The iotlb_gather helpers are really just that - little tools to help 
drivers with various common iotlb_gather accounting patterns. The 
decision whether to bother with that accounting at all should really 
come beforehand, and whether a driver supports flush queues is 
orthogonal to whether it uses any particular gather helper(s) or not, so 
I think the patch as-is is correct.

>> This patch adds a check to avoid IOVA gather page and IOTLB sync for
>> the lazy path.
>>
>> The performance difference on Sapphire Rapids 100Gb NIC is improved by
>> the following (as measured by iperf send):
> 
> Which test case have you done? Post the real data if you have any.
> 
>>
>> w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s
>>
>> Cc: <stable@vger.kernel.org>
> 
> Again, add a Fixes tag so that people know how far this fix should be
> back ported.

Note that the overall issue probably dates back to the initial iommu-dma 
conversion, but if you think it's important enough to go back beyond 
5.15 when gather->queued was introduced, that'll need a different fix.

Cheers,
Robin.


* Re: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  2023-02-06 11:20   ` Robin Murphy
@ 2023-02-07  6:42     ` Baolu Lu
  0 siblings, 0 replies; 8+ messages in thread
From: Baolu Lu @ 2023-02-07  6:42 UTC (permalink / raw)
  To: Robin Murphy, Jacob Pan, LKML, iommu, Joerg Roedel
  Cc: baolu.lu, David Woodhouse, Raj Ashok, Tian, Kevin, Yi Liu, stable,
	Sanjay Kumar

On 2023/2/6 19:20, Robin Murphy wrote:
> On 2023-02-04 06:32, Baolu Lu wrote:
>> On 2023/2/4 7:04, Jacob Pan wrote:
>>> Intel IOMMU driver implements IOTLB flush queue with domain selective
>>> or PASID selective invalidations. In this case there's no need to track
>>> IOVA page range and sync IOTLBs, which may cause significant performance
>>> hit.
>>
>> [Add cc Robin]
>>
>> If I understand this patch correctly, this might be caused by below
>> helper:
>>
>> /**
>>  * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
>>  * @domain: IOMMU domain to be invalidated
>>  * @gather: TLB gather data
>>  * @iova: start of page to invalidate
>>  * @size: size of page to invalidate
>>  *
>>  * Helper for IOMMU drivers to build invalidation commands based on individual
>>  * pages, or with page size/table level hints which cannot be gathered if they
>>  * differ.
>>  */
>> static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>>                                                struct iommu_iotlb_gather *gather,
>>                                                unsigned long iova, size_t size)
>> {
>>         /*
>>          * If the new page is disjoint from the current range or is mapped at
>>          * a different granularity, then sync the TLB so that the gather
>>          * structure can be rewritten.
>>          */
>>         if ((gather->pgsize && gather->pgsize != size) ||
>>             iommu_iotlb_gather_is_disjoint(gather, iova, size))
>>                 iommu_iotlb_sync(domain, gather);
>> 
>>         gather->pgsize = size;
>>         iommu_iotlb_gather_add_range(gather, iova, size);
>> }
>>
>> As the comments for iommu_iotlb_gather_is_disjoint() says,
>>
>> "...For many IOMMUs, flushing the IOMMU in this case is better
>>   than merging the two, which might lead to unnecessary invalidations.
>>   ..."
>>
>> So, perhaps the right fix for this performance issue is to add
>>
>>      if (!gather->queued)
>>
>> in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
>> It should benefit other arch's as well.
> 
> The iotlb_gather helpers are really just that - little tools to help 
> drivers with various common iotlb_gather accounting patterns. The 
> decision whether to bother with that accounting at all should really 
> come beforehand, and whether a driver supports flush queues is 
> orthogonal to whether it uses any particular gather helper(s) or not, so 
> I think the patch as-is is correct.

Okay, that's fine. Thanks for the explanation.

> 
>>> This patch adds a check to avoid IOVA gather page and IOTLB sync for
>>> the lazy path.
>>>
>>> The performance difference on Sapphire Rapids 100Gb NIC is improved by
>>> the following (as measured by iperf send):
>>
>> Which test case have you done? Post the real data if you have any.
>>
>>>
>>> w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s
>>>
>>> Cc: <stable@vger.kernel.org>
>>
>> Again, add a Fixes tag so that people know how far this fix should be
>> back ported.
> 
> Note that the overall issue probably dates back to the initial iommu-dma 
> conversion, but if you think it's important enough to go back beyond 
> 5.15 when gather->queued was introduced, that'll need a different fix.

So perhaps

Cc: stable@vger.kernel.org # 5.15+

is enough?

Best regards,
baolu


* Re: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  2023-02-06  3:48   ` Tian, Kevin
@ 2023-02-07  6:45     ` Baolu Lu
  2023-02-07  9:18       ` Tian, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Baolu Lu @ 2023-02-07  6:45 UTC (permalink / raw)
  To: Tian, Kevin, Jacob Pan, LKML, iommu@lists.linux.dev, Joerg Roedel
  Cc: baolu.lu, David Woodhouse, Raj, Ashok, Liu, Yi L,
	stable@vger.kernel.org, Kumar, Sanjay K, Robin Murphy

On 2023/2/6 11:48, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Saturday, February 4, 2023 2:32 PM
>>
>> On 2023/2/4 7:04, Jacob Pan wrote:
>>> Intel IOMMU driver implements IOTLB flush queue with domain selective
>>> or PASID selective invalidations. In this case there's no need to track
>>> IOVA page range and sync IOTLBs, which may cause significant
>> performance
>>> hit.
>>
>> [Add cc Robin]
>>
>> If I understand this patch correctly, this might be caused by below
>> helper:
>>
>> /**
>>  * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
>>  * @domain: IOMMU domain to be invalidated
>>  * @gather: TLB gather data
>>  * @iova: start of page to invalidate
>>  * @size: size of page to invalidate
>>  *
>>  * Helper for IOMMU drivers to build invalidation commands based on individual
>>  * pages, or with page size/table level hints which cannot be gathered if they
>>  * differ.
>>  */
>> static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>>                                                struct iommu_iotlb_gather *gather,
>>                                                unsigned long iova, size_t size)
>> {
>>         /*
>>          * If the new page is disjoint from the current range or is mapped at
>>          * a different granularity, then sync the TLB so that the gather
>>          * structure can be rewritten.
>>          */
>>         if ((gather->pgsize && gather->pgsize != size) ||
>>             iommu_iotlb_gather_is_disjoint(gather, iova, size))
>>                 iommu_iotlb_sync(domain, gather);
>> 
>>         gather->pgsize = size;
>>         iommu_iotlb_gather_add_range(gather, iova, size);
>> }
>>
>> As the comments for iommu_iotlb_gather_is_disjoint() says,
>>
>> "...For many IOMMUs, flushing the IOMMU in this case is better
>>    than merging the two, which might lead to unnecessary invalidations.
>>    ..."
>>
>> So, perhaps the right fix for this performance issue is to add
>>
>> 	if (!gather->queued)
>>
>> in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
>> It should benefit other arch's as well.
>>
> 
> There are only two callers of this helper: intel and arm-smmu-v3.
> 
> Looks other drivers just implements direct flush via io_pgtable_tlb_add_page().
> 
> and their unmap callback typically does:
> 
> if (!iommu_iotlb_gather_queued(gather))
> 	io_pgtable_tlb_add_page();
> 
> from this angle it's same policy as Jacob's does, i.e. if it's already
> queued then no need to further call optimization for direct flush.

Perhaps we can use iommu_iotlb_gather_queued() to replace the direct
gather->queued check in this patch as well?

Best regards,
baolu


* RE: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode
  2023-02-07  6:45     ` Baolu Lu
@ 2023-02-07  9:18       ` Tian, Kevin
  0 siblings, 0 replies; 8+ messages in thread
From: Tian, Kevin @ 2023-02-07  9:18 UTC (permalink / raw)
  To: Baolu Lu, Jacob Pan, LKML, iommu@lists.linux.dev, Joerg Roedel
  Cc: David Woodhouse, Raj, Ashok, Liu, Yi L, stable@vger.kernel.org,
	Kumar, Sanjay K, Robin Murphy

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Tuesday, February 7, 2023 2:46 PM
> 
> On 2023/2/6 11:48, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Saturday, February 4, 2023 2:32 PM
> >>
> >> On 2023/2/4 7:04, Jacob Pan wrote:
> >>> Intel IOMMU driver implements IOTLB flush queue with domain selective
> >>> or PASID selective invalidations. In this case there's no need to track
> >>> IOVA page range and sync IOTLBs, which may cause significant
> >> performance
> >>> hit.
> >>
> >> [Add cc Robin]
> >>
> >> If I understand this patch correctly, this might be caused by below
> >> helper:
> >>
> >> /**
> >>  * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
> >>  * @domain: IOMMU domain to be invalidated
> >>  * @gather: TLB gather data
> >>  * @iova: start of page to invalidate
> >>  * @size: size of page to invalidate
> >>  *
> >>  * Helper for IOMMU drivers to build invalidation commands based on individual
> >>  * pages, or with page size/table level hints which cannot be gathered if they
> >>  * differ.
> >>  */
> >> static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
> >>                                                struct iommu_iotlb_gather *gather,
> >>                                                unsigned long iova, size_t size)
> >> {
> >>         /*
> >>          * If the new page is disjoint from the current range or is mapped at
> >>          * a different granularity, then sync the TLB so that the gather
> >>          * structure can be rewritten.
> >>          */
> >>         if ((gather->pgsize && gather->pgsize != size) ||
> >>             iommu_iotlb_gather_is_disjoint(gather, iova, size))
> >>                 iommu_iotlb_sync(domain, gather);
> >> 
> >>         gather->pgsize = size;
> >>         iommu_iotlb_gather_add_range(gather, iova, size);
> >> }
> >>
> >> As the comments for iommu_iotlb_gather_is_disjoint() says,
> >>
> >> "...For many IOMMUs, flushing the IOMMU in this case is better
> >>    than merging the two, which might lead to unnecessary invalidations.
> >>    ..."
> >>
> >> So, perhaps the right fix for this performance issue is to add
> >>
> >> 	if (!gather->queued)
> >>
> >> in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
> >> It should benefit other arch's as well.
> >>
> >
> > There are only two callers of this helper: intel and arm-smmu-v3.
> >
> > Looks other drivers just implements direct flush via
> io_pgtable_tlb_add_page().
> >
> > and their unmap callback typically does:
> >
> > if (!iommu_iotlb_gather_queued(gather))
> > 	io_pgtable_tlb_add_page();
> >
> > from this angle it's same policy as Jacob's does, i.e. if it's already
> > queued then no need to further call optimization for direct flush.
> 
> Perhaps we can use iommu_iotlb_gather_queued() to replace direct
> gather->queued check in this patch as well?
> 

yes


end of thread, other threads:[~2023-02-07  9:18 UTC | newest]

Thread overview: 8+ messages
2023-02-03 23:04 [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode Jacob Pan
2023-02-04  6:32 ` Baolu Lu
2023-02-06  3:48   ` Tian, Kevin
2023-02-07  6:45     ` Baolu Lu
2023-02-07  9:18       ` Tian, Kevin
2023-02-06 11:20   ` Robin Murphy
2023-02-07  6:42     ` Baolu Lu
2023-02-06 11:05 ` Robin Murphy
