Message-ID: <7324a84b-09e5-c86d-4e11-dc970f124fea@arm.com>
Date: Fri, 5 May 2023 19:55:17 +0100
Subject: Re: [PATCH] iommu/iova: Don't reset cached_node in dac deallocation
From: Robin Murphy
To: Zaid Alali, Joerg Roedel, Will Deacon, iommu@lists.linux.dev
Cc: D Scott Phillips
X-Mailing-List: iommu@lists.linux.dev

On 2023-05-04 13:56, Zaid Alali wrote:
> The iova allocator tracks allocations that are not satisfied by the
> rcache in an rbtree, with two cached nodes into it: one for the
> 32-bit address space and one for the space above 32 bits. On
> deallocation, the cached node is updated to point just above the
> freed iova (its rb_next()), so that the freed space is seen again by
> subsequent allocations.
>
> Because frees keep pulling cached_node back up to higher addresses,
> the first-fit allocator has to walk the rbtree backwards, skipping
> holes that do not fit, all while holding iova_rbtree_lock, which
> hurts performance and can cause soft lockups. On deallocation, do
> not reset cached_node to the freed iova for the range above 32 bits
> (the DAC range), and let new allocations keep moving downwards from
> the current cached position instead. This only affects addresses
> above 32 bits.

The trouble with this is the long-term impact: the cached node
basically never moves upwards, so over time, as new IOVA allocations
continue, DMA working sets slowly and steadily move down through their
respective address spaces, leaving allocated-but-empty pagetables
above them. Given enough time, all memory is pagetables and the system
withers and dies :(

> This patch was tested with 'iommu.forcedac=1', running 20 parallel
> dd instances each reading 8GB from an NVMe drive, with a kernel
> compilation running alongside.

Hmm, if it's the case that you're hitting the rbtree all the time
because your NVMe thinks it wants chunks that are too big for the IOVA
rcaches, you might like this thread even more:

https://lore.kernel.org/linux-iommu/20230503161759.GA1614@lst.de/

Thanks,
Robin.

> The test results obtained from /proc/lock_stat show the following
> improvements for iovad->iova_rbtree_lock:
>
> Wait time average: reduced by 31%
> Hold time average: reduced by 60%
>
> Signed-off-by: D Scott Phillips
> Signed-off-by: Zaid Alali
> ---
>  drivers/iommu/iova.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index fe452ce46..d2a6cb573 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -106,7 +106,7 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>  	if (free->pfn_lo < iovad->dma_32bit_pfn)
>  		iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>  
>  	cached_iova = to_iova(iovad->cached_node);
> -	if (free->pfn_lo >= cached_iova->pfn_lo)
> +	if (free == cached_iova)
>  		iovad->cached_node = rb_next(&free->node);
>  }
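
[For readers following along, here is a minimal user-space sketch of
the cached-node behaviour being discussed. It is NOT the kernel code:
the rbtree is modelled as an address-sorted linked list between a
"head" sentinel at the bottom and an "anchor" sentinel pinned at the
top (loosely like iovad->anchor), the rcaches and locking are omitted,
and every name in it (toy_iovad, toy_alloc, toy_free, ...) is invented
for illustration. The "patched" flag switches toy_free() between the
mainline policy (reset the cursor on any free at or above it) and the
proposed one (move the cursor only when the cursor node itself is
freed).]

    /* Toy model only -- not drivers/iommu/iova.c. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SPACE_TOP 1000000UL     /* top pfn of the toy address space */

    struct range {
            unsigned long lo, hi;           /* allocated pfns, inclusive */
            struct range *prev, *next;      /* address-ordered neighbours */
    };

    struct toy_iovad {
            struct range head;              /* sentinel below pfn 1 */
            struct range anchor;            /* sentinel at SPACE_TOP */
            struct range *cursor;           /* ~ iovad->cached_node */
            bool patched;                   /* use the proposed policy? */
    };

    static void toy_init(struct toy_iovad *d, bool patched)
    {
            d->head = (struct range){ 0, 0, NULL, &d->anchor };
            d->anchor = (struct range){ SPACE_TOP, SPACE_TOP, &d->head, NULL };
            d->cursor = &d->anchor;
            d->patched = patched;
    }

    /*
     * Allocate @size pfns: starting at the cursor, walk towards lower
     * addresses until the hole below the current node is big enough,
     * then take the top of that hole. The longer this walk, the longer
     * the real allocator would hold iova_rbtree_lock.
     */
    static struct range *toy_alloc(struct toy_iovad *d, unsigned long size)
    {
            struct range *curr;

            for (curr = d->cursor; curr->prev; curr = curr->prev) {
                    struct range *r;

                    if (curr->lo - curr->prev->hi - 1 < size)
                            continue;       /* hole too small, keep walking */

                    r = malloc(sizeof(*r));
                    r->hi = curr->lo - 1;
                    r->lo = r->hi - size + 1;
                    r->prev = curr->prev;
                    r->next = curr;
                    curr->prev->next = r;
                    curr->prev = r;
                    d->cursor = r;          /* next search resumes below us */
                    return r;
            }
            return NULL;                    /* nothing below the cursor fits */
    }

    /* The free-side policy the patch changes, transcribed onto the list. */
    static void toy_free(struct toy_iovad *d, struct range *r)
    {
            if (d->patched ? r == d->cursor : r->lo >= d->cursor->lo)
                    d->cursor = r->next;    /* the rb_next(&free->node) step */

            r->prev->next = r->next;
            r->next->prev = r->prev;
            free(r);
    }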
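
[And a small churn loop on top of the same sketch, approximating the
steady-state workload Robin's objection is about: a fixed set of live
buffers where the oldest is freed and replaced each step. The numbers
and sizes are arbitrary; this only demonstrates the toy model, not the
real allocator.]

    int main(void)
    {
            for (int p = 0; p <= 1; p++) {
                    struct toy_iovad d;
                    struct range *fifo[4];
                    unsigned long lowest = SPACE_TOP;

                    toy_init(&d, p);

                    /* Warm up a small working set of live buffers. */
                    for (int i = 0; i < 4; i++)
                            fifo[i] = toy_alloc(&d, 1000);

                    /* Churn: free the oldest buffer, allocate a fresh one. */
                    for (int step = 0; step < 500; step++) {
                            struct range *r;

                            toy_free(&d, fifo[step % 4]);
                            r = toy_alloc(&d, 1000);
                            if (!r)
                                    break;  /* walked off the bottom */
                            if (r->lo < lowest)
                                    lowest = r->lo;
                            fifo[step % 4] = r;
                    }

                    printf("%s lowest pfn touched: %lu\n",
                           p ? "patched: " : "mainline:", lowest);
            }
            return 0;
    }

[In this toy model the mainline policy keeps the working set cycling
within a few thousand pfns of the top of the space, because each free
above the cursor re-exposes its hole; the patched policy never
reconsiders those holes, so the working set drifts down by one
allocation's worth per step -- the slow downward migration described
above.]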