* [PATCH] iommu/iova: Don't reset cached_node in dac deallocation
@ 2023-05-04 12:56 Zaid Alali
From: Zaid Alali @ 2023-05-04 12:56 UTC (permalink / raw)
To: Robin Murphy, Joerg Roedel, Will Deacon, iommu; +Cc: D Scott Phillips
The iova allocator has two rbtrees for allocations that are not satisfied
by the rcache. The two rbtrees track iovas in the 32-bit address range and
in the larger, >32-bit address range. On deallocation, cached_node is
updated to point to the deallocated iova.
Because the cached_node is moved to point to the recently deallocated
iova with higher address, the first-fit allocator needs to walk the
rbtree backwards skipping holes that do not fit while holding
iova_rbtree_lock, which impacts performance and can cause soft-lockups.
On deallocation, do not reset the cached_node to the freed iova for the
rbtree tracking the dac addresses and keep moving forward with new
allocations. This only affects addresses > 32bit.
This patch was tested with 'iommu.forcedac=1', running 20 parallel dd
instances each reading 8GB from NVMe alongside a kernel compilation.
The test results obtained from /proc/lock_stat show the following
improvements for iovad->iova_rbtree_lock:
Wait time average: reduced by 31%
Hold time average: reduced by 60%
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
---
drivers/iommu/iova.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index fe452ce46..d2a6cb573 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -106,7 +106,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
iovad->max32_alloc_size = iovad->dma_32bit_pfn;
cached_iova = to_iova(iovad->cached_node);
- if (free->pfn_lo >= cached_iova->pfn_lo)
+ if (free == cached_iova)
iovad->cached_node = rb_next(&free->node);
}
--
2.34.1
* Re: [PATCH] iommu/iova: Don't reset cached_node in dac deallocation
From: Robin Murphy @ 2023-05-05 18:55 UTC (permalink / raw)
To: Zaid Alali, Joerg Roedel, Will Deacon, iommu; +Cc: D Scott Phillips
On 2023-05-04 13:56, Zaid Alali wrote:
> The iova allocator has two rbtrees for allocations that are not satisfied
> by the rcache. The two rbtrees track iovas in the 32-bit address range and
> in the larger, >32-bit address range. On deallocation, cached_node is
> updated to point to the deallocated iova.
>
> Because the cached_node is moved to point to the recently deallocated
> iova with higher address, the first-fit allocator needs to walk the
> rbtree backwards skipping holes that do not fit while holding
> iova_rbtree_lock, which impacts performance and can cause soft-lockups.
> On deallocation, do not reset the cached_node to the freed iova for the
> rbtree tracking the dac addresses and keep moving forward with new
> allocations. This only affects addresses > 32bit.
The trouble with this is the long-term impact: the cached node basically
never moves upwards, so over time as new IOVA allocations continue, DMA
working sets slowly and steadily move down through their respective
address spaces, leaving allocated-but-empty pagetables above. Given
enough time, all memory is pagetables and the system withers and dies :(
> This patch was tested with 'iommu.forcedac=1', running 20 parallel dd
> instances each reading 8GB from NVMe alongside a kernel compilation.
Hmm, if it's the case that you're hitting the rbtree all the time
because your NVMe thinks it wants chunks that are too big for the IOVA
rcaches, you might like this thread even more:
https://lore.kernel.org/linux-iommu/20230503161759.GA1614@lst.de/
Thanks,
Robin.
> The test results obtained from /proc/lock_stat show the following
> improvements for iovad->iova_rbtree_lock:
>
> Wait time average: reduced by 31%
> Hold time average: reduced by 60%
>
> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
> ---
> drivers/iommu/iova.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index fe452ce46..d2a6cb573 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -106,7 +106,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>
> cached_iova = to_iova(iovad->cached_node);
> - if (free->pfn_lo >= cached_iova->pfn_lo)
> + if (free == cached_iova)
> iovad->cached_node = rb_next(&free->node);
> }
>