From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FFB1C433E0 for ; Thu, 18 Mar 2021 11:42:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E85B064EED for ; Thu, 18 Mar 2021 11:42:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230206AbhCRLlc (ORCPT ); Thu, 18 Mar 2021 07:41:32 -0400 Received: from frasgout.his.huawei.com ([185.176.79.56]:2710 "EHLO frasgout.his.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229854AbhCRLlB (ORCPT ); Thu, 18 Mar 2021 07:41:01 -0400 Received: from fraeml737-chm.china.huawei.com (unknown [172.18.147.226]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4F1Q0z6vdJz6804r; Thu, 18 Mar 2021 19:34:43 +0800 (CST) Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by fraeml737-chm.china.huawei.com (10.206.15.218) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2106.2; Thu, 18 Mar 2021 12:40:59 +0100 Received: from [10.47.0.142] (10.47.0.142) by lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2106.2; Thu, 18 Mar 2021 11:40:58 +0000 Subject: Re: [PATCH 2/2] iommu/iova: Improve restart logic From: John Garry To: Robin Murphy , CC: Linuxarm , , , References: <03931d86c0ad71f44b29394e3a8d38bfc32349cd.1614962123.git.robin.murphy@arm.com> <076b3484d1e5057b95d8c387c894bd6ad2514043.1614962123.git.robin.murphy@arm.com> Message-ID: <6cea11f9-e98d-98cb-6789-93abd8833fa0@huawei.com> Date: Thu, 18 Mar 2021 11:38:49 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.1.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.47.0.142] X-ClientProxiedBy: lhreml730-chm.china.huawei.com (10.201.108.81) To lhreml724-chm.china.huawei.com (10.201.108.75) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/03/2021 17:47, John Garry wrote: > On 09/03/2021 15:55, John Garry wrote: >> On 05/03/2021 16:35, Robin Murphy wrote: >> >> Hi Robin, >> >>> When restarting after searching below the cached node fails, resetting >>> the start point to the anchor node is often overly pessimistic. If >>> allocations are made with mixed limits - particularly in the case of the >>> opportunistic 32-bit allocation for PCI devices - this could mean >>> significant time wasted walking through the whole populated upper range >>> just to reach the initial limit. >> >> Right >> >>> We can improve on that by implementing >>> a proper tree traversal to find the first node above the relevant limit, >>> and set the exact start point. >> >> Thanks for this. However, as mentioned in the other thread, this does >> not help our performance regression Re: commit 4e89dce72521. >> >> And mentioning this "retry" approach again, in case it was missed, >> from my experiment on the affected HW, it also has generally a >> dreadfully low success rate - less than 1% for the 32b range retry. >> Retry rate is about 20%. So I am saying from this 20%, less than 1% of >> those succeed. >> > > So since the retry means that we search through the complete pfn range > most of the time (due to poor success rate), we should be able to do a > better job at maintaining an accurate max alloc size, by calculating it > from the range search, and not relying on max alloc failed or resetting > it frequently. Hopefully that would mean that we're smarter about not > trying the allocation. So I tried that out and we seem to be able to scrap back an appreciable amount of performance. Maybe 80% of original, with with another change, below. Thanks, John > > >> >>> >>> Signed-off-by: Robin Murphy >>> --- >>>   drivers/iommu/iova.c | 39 ++++++++++++++++++++++++++++++++++++++- >>>   1 file changed, 38 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c >>> index c28003e1d2ee..471c48dd71e7 100644 >>> --- a/drivers/iommu/iova.c >>> +++ b/drivers/iommu/iova.c >>> @@ -154,6 +154,43 @@ __cached_rbnode_delete_update(struct iova_domain >>> *iovad, struct iova *free) >>>           iovad->cached_node = rb_next(&free->node); >>>   } >>> +static struct rb_node *iova_find_limit(struct iova_domain *iovad, >>> unsigned long limit_pfn) >>> +{ >>> +    struct rb_node *node, *next; >>> +    /* >>> +     * Ideally what we'd like to judge here is whether limit_pfn is >>> close >>> +     * enough to the highest-allocated IOVA that starting the >>> allocation >>> +     * walk from the anchor node will be quicker than this initial >>> work to >>> +     * find an exact starting point (especially if that ends up >>> being the >>> +     * anchor node anyway). This is an incredibly crude >>> approximation which >>> +     * only really helps the most likely case, but is at least >>> trivially easy. >>> +     */ >>> +    if (limit_pfn > iovad->dma_32bit_pfn) >>> +        return &iovad->anchor.node; >>> + >>> +    node = iovad->rbroot.rb_node; >>> +    while (to_iova(node)->pfn_hi < limit_pfn) >>> +        node = node->rb_right; >>> + >>> +search_left: >>> +    while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= >>> limit_pfn) >>> +        node = node->rb_left; >>> + >>> +    if (!node->rb_left) >>> +        return node; >>> + >>> +    next = node->rb_left; >>> +    while (next->rb_right) { >>> +        next = next->rb_right; >>> +        if (to_iova(next)->pfn_lo >= limit_pfn) { >>> +            node = next; >>> +            goto search_left; >>> +        } >>> +    } >>> + >>> +    return node; >>> +} >>> + >>>   /* Insert the iova into domain rbtree by holding writer lock */ >>>   static void >>>   iova_insert_rbtree(struct rb_root *root, struct iova *iova, >>> @@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct >>> iova_domain *iovad, >>>           if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) { >>>               high_pfn = limit_pfn; >>>               low_pfn = retry_pfn; >>> -            curr = &iovad->anchor.node; >>> +            curr = iova_find_limit(iovad, limit_pfn); I see that it is now applied. However, alternatively could we just add a zero-length 32b boundary marker node for the 32b pfn restart point? >>>               curr_iova = to_iova(curr); >>>               goto retry; >>>           } >>> >> >> _______________________________________________ >> iommu mailing list >> iommu@lists.linux-foundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/iommu >> . >