From: Robin Murphy <robin.murphy@arm.com>
To: Zaid Alali <zaidal@os.amperecomputing.com>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
iommu@lists.linux.dev
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Subject: Re: [PATCH] iommu/iova: Don't reset cached_node in dac deallocation
Date: Fri, 5 May 2023 19:55:17 +0100 [thread overview]
Message-ID: <7324a84b-09e5-c86d-4e11-dc970f124fea@arm.com> (raw)
In-Reply-To: <ZFOri/cwNVeEIPAf@zaid-VirtualBox>
On 2023-05-04 13:56, Zaid Alali wrote:
> The iova allocator has two rbtrees for allocations that are not satisfied
> by rcache. The two rbtrees track iovas for the ranges of 32bit address
> space and larger address space >32bit. On deallocation, the cached_node
> is updated to point to the deallocated iova.
>
> Because the cached_node is moved to point to the recently deallocated
> iova with higher address, the first-fit allocator needs to walk the
> rbtree backwards skipping holes that do not fit while holding
> iova_rbtree_lock, which impacts performance and can cause soft-lockups.
> On deallocation, do not reset the cached_node to the freed iova for the
> rbtree tracking the dac addresses and keep moving forward with new
> allocations. This only affects addresses > 32bit.
The trouble with this is the long-term impact: the cached node basically
never moves upwards, so over time as new IOVA allocations continue, DMA
working sets slowly and steadily move down through their respective
address spaces, leaving allocated-but-empty pagetables above. Given
enough time, all memory is pagetables and the system withers and dies :(
> This patch was tested with 'iommu.forcedac=1' and 20 dd read instances
> of 8GB from nvme as well as kernel compilation running in parallel.
Hmm, if it's the case that you're hitting the rbtree all the time
because your NVMe thinks it wants chunks that are too big for the IOVA
rcaches, you might like this thread even more:
https://lore.kernel.org/linux-iommu/20230503161759.GA1614@lst.de/
Thanks,
Robin.
> The test results obtained from /proc/lock_stat shows the following
> improvements for iovad->iova_rbtree_lock:
>
> Wait time average: reduced by 31%
> Hold time average: reduced by 60%
>
> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com>
> ---
> drivers/iommu/iova.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index fe452ce46..d2a6cb573 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -106,7 +106,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>
> cached_iova = to_iova(iovad->cached_node);
> - if (free->pfn_lo >= cached_iova->pfn_lo)
> + if (free == cached_iova)
> iovad->cached_node = rb_next(&free->node);
> }
>