From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 019AEEB64DC for ; Mon, 26 Jun 2023 01:52:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 73C3C8D0002; Sun, 25 Jun 2023 21:52:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6C53E8D0001; Sun, 25 Jun 2023 21:52:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 53F848D0002; Sun, 25 Jun 2023 21:52:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 3DCB78D0001 for ; Sun, 25 Jun 2023 21:52:03 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id EDFDFC04F2 for ; Mon, 26 Jun 2023 01:52:02 +0000 (UTC) X-FDA: 80943223284.16.BF5F5C8 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf03.hostedemail.com (Postfix) with ESMTP id A7ECD2000B for ; Mon, 26 Jun 2023 01:51:59 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=RgFP7yQe; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf03.hostedemail.com: domain of dakr@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=dakr@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1687744319; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=sfgyyq+iw8RNZpIOQEvl88qX1LmwtfH97CloAGRNmOk=; b=vFwTgsFpqp8Gdlrk4xSdbfJXWORjkY5hwgf9bBKk6kb/O/29yQhHIpbWEd7yxY0m5LNi/L rAix2SpWBWT2ixuSavFz3mPD6CjzY72EY4vJBz6Nv0btY+nkU4S4keUb1F9k63K67mgkF+ HKXjfnWkFXc2fCvVDGJi1vok9LQM+iA= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=RgFP7yQe; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf03.hostedemail.com: domain of dakr@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=dakr@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1687744319; a=rsa-sha256; cv=none; b=8a8BUM901lsErkwM+f3a2Wed1zKdFEFK/867bFM5Qq75j+LZ6qz+K9FKGp6nhdxGgwNIe/ mXaBRGrTM/ON7nmknSebPZHDU5GxygK4JS6x/AkhI0fWTnuWE4d7eXjdnVBUbQR7He4uAG 43v7USrhvbbx49kwIX5Zr1nZbPjk7pI= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1687744319; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sfgyyq+iw8RNZpIOQEvl88qX1LmwtfH97CloAGRNmOk=; b=RgFP7yQenEpE+joACsgmToOYLbxcR1C6AotX8NUSPyNEDGk60F4S2Vc+iLV37Zr1a6z2aW 0g39zNAsqQaeSVtbyTZxbIAOOZW0ID60pZCpiGYJ0OcHuI6AHtqnDuYtWrWYKLccsKu71M Al7h73+o6D7+giKCDSe+E8jomYdtR9U= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-57-5Ji2xKiJOiuUpyF9gGhd1g-1; Sun, 25 Jun 2023 21:51:57 -0400 X-MC-Unique: 5Ji2xKiJOiuUpyF9gGhd1g-1 Received: by mail-wr1-f72.google.com with SMTP id ffacd0b85a97d-31283f7f2d1so1583934f8f.0 for ; Sun, 25 Jun 2023 18:51:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687744316; x=1690336316; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=sfgyyq+iw8RNZpIOQEvl88qX1LmwtfH97CloAGRNmOk=; b=YSwBVEZpUsS644nlfZm0rNMjk+0kN9ry/Rgm3VsN2BAF2FWSrQqwEzNe4XHlXwchet 2jmlcoAndYrjZRDCeDW4FAKBAClJSkYSVxwngZboCsWrh3ws+EdAMbTb8OcAX+5MoFea 99WOyU6ZEh3vBK96AKg1m6uIOn8GfqYXNfP2r++kugYh+Y8LPd1tS6XBwZyWycjgAfgO 2fHuj0wAA2agZy10beBGWRZbiRDo+t3/6H77uBUltSHskC5HUZcaB2OvXbP4jrXV6l0x WqrwBzApPYUXPU7YsqgHsb1sS7+AWxR4TrZsb1omp7AuR357V7m7byjl/z/vufsxB6Fz tHGQ== X-Gm-Message-State: AC+VfDyPIW4EZXZr45Z3jDhlI3piSaul4u7FcvFa56QC8Co6WxvQDVnb ogYIy6GragU4Jx9R5orf+apMPxhdNIHK8RboMRgpX92fqCc7JcLHJ4m4tDQYWnU+lm8QravsjDi e8/OdoPGKRYU= X-Received: by 2002:a5d:448a:0:b0:30e:3ec4:e49e with SMTP id j10-20020a5d448a000000b0030e3ec4e49emr21850450wrq.70.1687744316674; Sun, 25 Jun 2023 18:51:56 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6gpIipJLYnEFx5Q6D/rGiHsZp2nIwlN2f6BezUcw8AvNfBg0LfKZogchXEwUma5epKE/hvvw== X-Received: by 2002:a5d:448a:0:b0:30e:3ec4:e49e with SMTP id j10-20020a5d448a000000b0030e3ec4e49emr21850445wrq.70.1687744316371; Sun, 25 Jun 2023 18:51:56 -0700 (PDT) Received: from ?IPV6:2a02:810d:4b3f:de9c:642:1aff:fe31:a15c? ([2a02:810d:4b3f:de9c:642:1aff:fe31:a15c]) by smtp.gmail.com with ESMTPSA id k26-20020a5d525a000000b003078354f774sm5830052wrc.36.2023.06.25.18.51.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 25 Jun 2023 18:51:55 -0700 (PDT) Message-ID: <43ce08db-210a-fec8-51b4-351625b3cdfb@redhat.com> Date: Mon, 26 Jun 2023 02:38:06 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Subject: Re: [PATCH v2 14/16] maple_tree: Refine mas_preallocate() node calculations To: Peng Zhang Cc: maple-tree@lists.infradead.org, linux-mm@kvack.org, Andrew Morton , "Liam R. Howlett" , linux-kernel@vger.kernel.org, Peng Zhang , David Airlie , Boris Brezillon References: <20230612203953.2093911-1-Liam.Howlett@oracle.com> <20230612203953.2093911-15-Liam.Howlett@oracle.com> <26d8fbcf-d34f-0a79-9d91-8c60e66f7341@redhat.com> From: Danilo Krummrich Organization: RedHat In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: A7ECD2000B X-Stat-Signature: yc1bxe6j1yec9eha7qawxhzb5ka6kbkh X-HE-Tag: 1687744319-387625 X-HE-Meta: U2FsdGVkX1/nYQ57VRYcb+N0LDuuIgW52uxLzW/23M9YxmxUG84KcBJ0ZX+szD6UmpJ8kygj/bVxwBZTqZkmqbfFYTzylEux+N5+Iw50ls0wpdweFITVtjwQRQH3jeCLT7JFsz0JEIEKHIrrfVdSBWsdew76MBx2dIHvcjKdfysJqP7/9diAG8I643hxXkswKofmMMK+MoffMx67skosrjmTGNwDHhGu33xu3z87SH+jZPO90kd+Or03AUVLkef9CJR/7JzWy+QkdQ6jfcVEAKOOR448nQq6EdtT4/QxXkqq96O9bs/HfxOBB4aYW0uY8r4PL33hfiRHXUm35tR6qY1E7hwlF3/8jpOhJy1u8JI8Zafz3PIqRaB31LP3zblJ/QkcY4yesRtxZWSXsTr4HnxI2bonH5ZeHDegoZ2utN8oFslIezCe7i6Kuntk49ir65GksntNqkJ8pqpBq3Tocc86VkiacXan21mI1c3jYMegcfuTUGp1yDj5M+xKnHpkT0I6nda+KmIOYKqDHkTTq3l8Mho9fcH+kEzMiTR6MlE/iDcKcde+IiaZkVhGmvVf/w11804XVpPyp39EGK+Th9Ur/r40RLgEtr1Jr/Wc2nzPKB1KbbRZjq9wnBEJtEbg2ZGf3yuj/LlLsAX2iEKNUID1da+x6MWeDBfPuzPw7IXGQ7XtoHwFrKscsoE4IrpHAui3zJCt21F9SwU8WRM97SOHFT1z28orhoe5Ur4EBmocB9L6pnOV5CfQK0/nVFL2TEQ2xPEp/LBr6HEuhBmBp1yjjwu2G/8sNd7qSs3s37Ldcz6SkCOw6QLlPfestJJS58vIXgxFkr5YqvFoum6yYfLRDQBViARqor3o4sptu6026ycFS4Jy8i2u//NVxC0dztfKFsIlwV0Rsc20lqxQ7scZgTr2/7ySgmDJQfd5D3FO9W9rqyhyykR9CL1YOj20KLDA5TJOl7W4QIQy18D 9zRFQtb7 /6/WE0XV3W/X+V+92Sw19Uo18PH1eF/WUUPnHUtUuucEG8O9hoV/0J57vh+iAYsXgJKEzGC/hjBZ0qPTPyyZyscUu7z4+dy/V0LBvBjhkWgGc3s89L+NuoDFzqvYi6b3JhLgj2WcRrOTTHwLe6PNLrgRVEloGv1kl3n6G3qpVLUzxs5zXdHz49LMjrcuINI3jVwO9eErlI7yu0hHx8n1SSxYYvHjSirks9nlW8HxVwOipKCjavLN2ycAK9i6yzhbP/n9+YlCfRZ0TWYD2EhatJBLXwa16DKWhnAmwMVVkgY+ojI137Jyp4wWDE/HZXjEX9oHJ0UIrEq8cle80hyTkcy026sokJNIgFjg6lLDZgcMSRl1Eqj8ncQo97GzSgNAilM6FOeoeLOH5blOVZd5XmAJ5uRJZPt92lx9RS+iPAV902JNozlbK/Xk+9hkT0lP1twmBQsMkL1C/2vxGHcqMDa6V0OdILX99jXfnJSrzDDdwJ4To/bsSsY14oPwDZz8A2h31ORjNBr0nN8F2MFVlHetYro5Nn6D6Ce9Suhm61hTSqRzspyAQASL2iA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Peng, On 6/25/23 05:28, Peng Zhang wrote: > > > 在 2023/6/23 00:41, Danilo Krummrich 写道: >> On 6/12/23 22:39, Liam R. Howlett wrote: >>> Calculate the number of nodes based on the pending write action instead >>> of assuming the worst case. >> >> Liam already gave me a heads-up on this patch, which I already replied >> to [1]. >> >> However, I think it might make sense to also reply to this patch >> directly. >> >> For a mas_preallocate() calculating the actual required nodes to be >> allocated instead of assuming the worst to work, it is required to >> ensure that the tree does not change between calling mas_preallocate() >> and mas_store_prealloc() if my understanding is correct. >> >> In DRM however, more specifically the DRM GPUVA Manager [2], we do >> have the case that we are not able to ensure this: >> >> Jobs to create GPU mappings can be submitted by userspace, are queued >> up by the kernel and are processed asynchronously in dma-fence >> signalling critical paths, e.g. by using the drm_gpu_scheduler. Hence, >> we must be able to allocate the worst case amount of node, since at >> the time a job is submitted we can't predict the state the maple tree >> keeping track of mappings has once a mapping is inserted in the >> (asynchronous) dma-fence signalling critical path. >> >> A more detailed explanation can be found in [1]. >> >> Could we keep a separate function for allocating the worst case amount >> of nodes additionally to this optimization? E.g. something like >> mas_preallocate_worst_case() or mas_preallocate_unlocked() (since I >> guess the new one requires the maple tree to be kept locked in order >> not to change)? > Hi Danilo, > > Your understanding seems incorrect. Even with previously unoptimized > mas_preallocate(), the maple tree cannot be modified between calls to > mas_preallocate() and mas_store_prealloc(). The calculation of the > number of pre-allocated nodes depends on the structure of the maple > tree. In the unoptimized mas_preallocate(), it depends on the height of > the tree. If the maple tree is modified before mas_store_prealloc() and > the height of the tree changes, the number of pre-allocated nodes is > inaccurate. Thanks for pointing this out! First of all, it's probably fair to say "naive me", it totally makes sense the tree height is needed - it's a b-tree. On the other hand, unless I miss something (and if so, please let me know), something is bogus with the API then. While the documentation of the Advanced API of the maple tree explicitly claims that the user of the API is responsible for locking, this should be limited to the bounds set by the maple tree implementation. Which means, the user must decide for either the internal (spin-) lock or an external lock (which possibly goes away in the future) and acquire and release it according to the rules maple tree enforces through lockdep checks. Let's say one picks the internal lock. How is one supposed to ensure the tree isn't modified using the internal lock with mas_preallocate()? Besides that, I think the documentation should definitely mention this limitation and give some guidance for the locking. Currently, from an API perspective, I can't see how anyone not familiar with the implementation details would be able to recognize this limitation. In terms of the GPUVA manager, unfortunately, it seems like I need to drop the maple tree and go back to using a rb-tree, since it seems there is no sane way doing a worst-case pre-allocation that does not suffer from this limitation. - Danilo > > Regards, > Peng > >> >> [1] >> https://lore.kernel.org/nouveau/68cd25de-e767-725e-2e7b-703217230bb0@redhat.com/T/#ma326e200b1de1e3c9df4e9fcb3bf243061fee8b5 >> >> [2] >> https://lore.kernel.org/linux-mm/20230620004217.4700-8-dakr@redhat.com/T/#m47ab82310f87793d0f0cc1825a316eb30ad5b653 >> >> - Danilo >> >>> >>> This addresses a performance regression introduced in platforms that >>> have longer allocation timing. >>> >>> Signed-off-by: Liam R. Howlett >>> --- >>>   lib/maple_tree.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++- >>>   1 file changed, 47 insertions(+), 1 deletion(-) >>> >>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >>> index 048d6413a114..7ac5b5457603 100644 >>> --- a/lib/maple_tree.c >>> +++ b/lib/maple_tree.c >>> @@ -5541,9 +5541,55 @@ EXPORT_SYMBOL_GPL(mas_store_prealloc); >>>    */ >>>   int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp) >>>   { >>> +    MA_WR_STATE(wr_mas, mas, entry); >>> +    unsigned char node_size; >>> +    int request = 1; >>>       int ret; >>> -    mas_node_count_gfp(mas, 1 + mas_mt_height(mas) * 3, gfp); >>> + >>> +    if (unlikely(!mas->index && mas->last == ULONG_MAX)) >>> +        goto ask_now; >>> + >>> +    mas_wr_store_setup(&wr_mas); >>> +    wr_mas.content = mas_start(mas); >>> +    /* Root expand */ >>> +    if (unlikely(mas_is_none(mas) || mas_is_ptr(mas))) >>> +        goto ask_now; >>> + >>> +    if (unlikely(!mas_wr_walk(&wr_mas))) { >>> +        /* Spanning store, use worst case for now */ >>> +        request = 1 + mas_mt_height(mas) * 3; >>> +        goto ask_now; >>> +    } >>> + >>> +    /* At this point, we are at the leaf node that needs to be >>> altered. */ >>> +    /* Exact fit, no nodes needed. */ >>> +    if (wr_mas.r_min == mas->index && wr_mas.r_max == mas->last) >>> +        return 0; >>> + >>> +    mas_wr_end_piv(&wr_mas); >>> +    node_size = mas_wr_new_end(&wr_mas); >>> +    /* Slot store can avoid using any nodes */ >>> +    if (node_size == wr_mas.node_end && wr_mas.offset_end - >>> mas->offset == 1) >>> +        return 0; >>> + >>> +    if (node_size >= mt_slots[wr_mas.type]) { >>> +        /* Split, worst case for now. */ >>> +        request = 1 + mas_mt_height(mas) * 2; >>> +        goto ask_now; >>> +    } >>> + >>> +    /* Appending does not need any nodes */ >>> +    if (node_size == wr_mas.node_end + 1 && mas->offset == >>> wr_mas.node_end) >>> +        return 0; >>> + >>> +    /* Potential spanning rebalance collapsing a node, use >>> worst-case */ >>> +    if (node_size  - 1 <= mt_min_slots[wr_mas.type]) >>> +        request = mas_mt_height(mas) * 2 - 1; >>> + >>> +    /* node store needs one node */ >>> +ask_now: >>> +    mas_node_count_gfp(mas, request, gfp); >>>       mas->mas_flags |= MA_STATE_PREALLOC; >>>       if (likely(!mas_is_err(mas))) >>>           return 0; >> >> >