Subject: Re: [PATCH v2 3/4] iommu/iova: Flush CPU rcache for when a depot fills
From: Robin Murphy
To: John Garry, joro@8bytes.org
Cc: linuxarm@huawei.com, linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org, xiyou.wangcong@gmail.com
Date: Tue, 3 Nov 2020 16:48:12 +0000
Message-ID: <65b568ef-ff2a-0993-e6f5-b6414b3b19f8@arm.com>
In-Reply-To: <1603733501-211004-4-git-send-email-john.garry@huawei.com>
List-Id: Development issues for Linux IOMMU support

On 2020-10-26
17:31, John Garry wrote:
> Leizhen reported some time ago that IOVA performance may degrade over
> time [0], but unfortunately his solution to fix this problem was not
> given attention.
>
> To summarize, the issue is that as time goes by, the CPU rcache and
> depot rcache continue to grow. As such, IOVA RB tree access time also
> continues to grow.

I'm struggling to see how this is not simply indicative of a leak originating elsewhere. For the number of magazines to continually grow, it means IOVAs *of a particular size* are being freed faster than they are being allocated, while the only place that ongoing allocations should be coming from is those same magazines!

Now indeed that could happen over the short term if IOVAs are allocated and freed again in giant batches larger than the total global cache capacity, but that would show a cyclic behaviour: when activity starts, everything is first allocated straight from the tree; then when it ends, the caches get overwhelmed by the large burst of freeing and start having to release things back to the tree; but eventually that stops once everything *is* freed, and when activity begins again, the next round of allocating inherently clears out all the caches before going anywhere near the tree.

To me the "steady decline" behaviour suggests that someone somewhere is making DMA unmap calls with a smaller size than they were mapped with (you tend to notice it quicker the other way round, due to all the device errors and random memory corruption). In many cases that would appear to work out fine from the driver's point of view, but it would provoke exactly this behaviour in the IOVA allocator.

Robin.

> At a certain point, a depot may become full, and also some CPU rcaches
> may also be full when inserting another IOVA is attempted. For this
> scenario, currently the "loaded" CPU rcache is freed and a new one is
> created.
> This freeing means that many IOVAs in the RB tree need to be freed,
> which makes IO throughput performance fall off a cliff in some storage
> scenarios:
>
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
> Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]
>
> And it continues in this fashion, without recovering. Note that in this
> example it was required to wait 16 hours for this to occur. Also note
> that IO throughput gradually becomes more unstable leading up to this
> point.
>
> As a solution to this issue, judge that the IOVA caches have grown too
> big when cached magazines need to be freed, and just flush all the CPU
> rcaches instead.
>
> The depot rcaches, however, are not flushed, as they can be used to
> immediately replenish active CPUs.
>
> In future, some IOVA compaction could be implemented to solve the
> instability issue, which I figure could be quite complex to implement.
>
> [0] https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leizhen@huawei.com/
>
> Analyzed-by: Zhen Lei
> Reported-by: Xiang Chen
> Signed-off-by: John Garry
> ---
>  drivers/iommu/iova.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 1f3f0f8b12e0..386005055aca 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -901,7 +901,6 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>  				 struct iova_rcache *rcache,
>  				 unsigned long iova_pfn)
>  {
> -	struct iova_magazine *mag_to_free = NULL;
>  	struct iova_cpu_rcache *cpu_rcache;
>  	bool can_insert = false;
>  	unsigned long flags;
> @@ -923,13 +922,12 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>  			if (cpu_rcache->loaded)
>  				rcache->depot[rcache->depot_size++] =
>  						cpu_rcache->loaded;
> -		} else {
> -			mag_to_free = cpu_rcache->loaded;
> +			can_insert = true;
> +			cpu_rcache->loaded = new_mag;
>  		}
>  		spin_unlock(&rcache->lock);
> -
> -		cpu_rcache->loaded = new_mag;
> -		can_insert = true;
> +		if (!can_insert)
> +			iova_magazine_free(new_mag);
>  	}
>  }
>
> @@ -938,10 +936,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
>
>  	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
>
> -	if (mag_to_free) {
> -		iova_magazine_free_pfns(mag_to_free, iovad);
> -		iova_magazine_free(mag_to_free);
> -	}
> +	if (!can_insert)
> +		free_all_cpu_cached_iovas(iovad);
>
>  	return can_insert;
>  }