Linux IOMMU Development
* [PATCH] iommu/iova: Don't reset cached_node in dac deallocation
@ 2023-05-04 12:56 Zaid Alali
From: Zaid Alali @ 2023-05-04 12:56 UTC
  To: Robin Murphy, Joerg Roedel, Will Deacon, iommu; +Cc: D Scott Phillips

The iova allocator has two rbtrees for allocations that are not satisfied
by the rcache. The two rbtrees track iovas for the 32-bit address space
and for the larger, >32-bit address space. On deallocation, cached_node
is updated to point to the deallocated iova.

Because cached_node is moved to point at the recently deallocated
iova, which has a higher address, the first-fit allocator must walk the
rbtree backwards, skipping holes that do not fit, while holding
iova_rbtree_lock; this hurts performance and can cause soft lockups.
On deallocation, do not reset cached_node to the freed iova for the
rbtree tracking DAC addresses; keep moving forward with new
allocations instead. This only affects addresses above 32 bits.

This patch was tested with 'iommu.forcedac=1', 20 parallel dd instances
each reading 8GB from NVMe, and a kernel compilation running at the same
time. The test results obtained from /proc/lock_stat show the following
improvements for iovad->iova_rbtree_lock:

	Wait time average: reduced by 31%
	Hold time average: reduced by 60%

Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
---
 drivers/iommu/iova.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index fe452ce46..d2a6cb573 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -106,7 +106,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
		iovad->max32_alloc_size = iovad->dma_32bit_pfn;
 
	cached_iova = to_iova(iovad->cached_node);
-	if (free->pfn_lo >= cached_iova->pfn_lo)
+	if (free == cached_iova)
		iovad->cached_node = rb_next(&free->node);
 }
 
-- 
2.34.1



* Re: [PATCH] iommu/iova: Don't reset cached_node in dac deallocation
@ 2023-05-05 18:55 ` Robin Murphy
From: Robin Murphy @ 2023-05-05 18:55 UTC
  To: Zaid Alali, Joerg Roedel, Will Deacon, iommu; +Cc: D Scott Phillips

On 2023-05-04 13:56, Zaid Alali wrote:
> The iova allocator has two rbtrees for allocations that are not satisfied
> by the rcache. The two rbtrees track iovas for the 32-bit address space
> and for the larger, >32-bit address space. On deallocation, cached_node
> is updated to point to the deallocated iova.
> 
> Because cached_node is moved to point at the recently deallocated
> iova, which has a higher address, the first-fit allocator must walk the
> rbtree backwards, skipping holes that do not fit, while holding
> iova_rbtree_lock; this hurts performance and can cause soft lockups.
> On deallocation, do not reset cached_node to the freed iova for the
> rbtree tracking DAC addresses; keep moving forward with new
> allocations instead. This only affects addresses above 32 bits.

The trouble with this is the long-term impact: the cached node basically 
never moves upwards, so over time as new IOVA allocations continue, DMA 
working sets slowly and steadily move down through their respective 
address spaces, leaving allocated-but-empty pagetables above. Given 
enough time, all memory is pagetables and the system withers and dies :(

> This patch was tested with 'iommu.forcedac=1', 20 parallel dd instances
> each reading 8GB from NVMe, and a kernel compilation running at the same time.

Hmm, if it's the case that you're hitting the rbtree all the time 
because your NVMe thinks it wants chunks that are too big for the IOVA 
rcaches, you might like this thread even more:

https://lore.kernel.org/linux-iommu/20230503161759.GA1614@lst.de/

Thanks,
Robin.

> The test results obtained from /proc/lock_stat show the following
> improvements for iovad->iova_rbtree_lock:
> 
> 	Wait time average: reduced by 31%
> 	Hold time average: reduced by 60%
> 
> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
> ---
>   drivers/iommu/iova.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index fe452ce46..d2a6cb573 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -106,7 +106,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> 		iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>   
> 	cached_iova = to_iova(iovad->cached_node);
> -	if (free->pfn_lo >= cached_iova->pfn_lo)
> +	if (free == cached_iova)
> 		iovad->cached_node = rb_next(&free->node);
>   }
>   

