From: Danilo Krummrich <dakr@redhat.com>
To: Peng Zhang <perlyzhang@gmail.com>
Cc: maple-tree@lists.infradead.org, linux-mm@kvack.org,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
linux-kernel@vger.kernel.org,
Peng Zhang <zhangpeng.00@bytedance.com>,
David Airlie <airlied@redhat.com>,
Boris Brezillon <boris.brezillon@collabora.com>
Subject: Re: [PATCH v2 14/16] maple_tree: Refine mas_preallocate() node calculations
Date: Mon, 26 Jun 2023 02:38:06 +0200
Message-ID: <43ce08db-210a-fec8-51b4-351625b3cdfb@redhat.com>
In-Reply-To: <cdab5e74-7559-cb31-90ca-b99a5c3a6dd6@gmail.com>
Hi Peng,
On 6/25/23 05:28, Peng Zhang wrote:
>
>
> On 2023/6/23 00:41, Danilo Krummrich wrote:
>> On 6/12/23 22:39, Liam R. Howlett wrote:
>>> Calculate the number of nodes based on the pending write action instead
>>> of assuming the worst case.
>>
>> Liam already gave me a heads-up on this patch, which I already replied
>> to [1].
>>
>> However, I think it might make sense to also reply to this patch
>> directly.
>>
>> If my understanding is correct, for a mas_preallocate() that
>> calculates the actual number of required nodes instead of assuming
>> the worst case, it must be ensured that the tree does not change
>> between the calls to mas_preallocate() and mas_store_prealloc().
>>
>> In DRM however, more specifically in the DRM GPUVA Manager [2], we
>> have the case that we are not able to ensure this:
>>
>> Jobs to create GPU mappings can be submitted by userspace, are queued
>> up by the kernel and are processed asynchronously in dma-fence
>> signalling critical paths, e.g. by using the drm_gpu_scheduler. Hence,
>> we must be able to allocate the worst-case number of nodes, since at
>> the time a job is submitted we can't predict the state the maple tree
>> keeping track of mappings will be in once the mapping is actually
>> inserted in the (asynchronous) dma-fence signalling critical path.
>>
>> A more detailed explanation can be found in [1].
>>
>> Could we keep a separate function for allocating the worst-case
>> number of nodes in addition to this optimization? E.g. something like
>> mas_preallocate_worst_case() or mas_preallocate_unlocked() (since I
>> guess the new one requires the maple tree to be kept locked so that it
>> does not change)?
> Hi Danilo,
>
> Your understanding seems incorrect. Even with the previously unoptimized
> mas_preallocate(), the maple tree cannot be modified between calls to
> mas_preallocate() and mas_store_prealloc(). The calculation of the
> number of pre-allocated nodes depends on the structure of the maple
> tree. In the unoptimized mas_preallocate(), it depends on the height of
> the tree. If the maple tree is modified before mas_store_prealloc() and
> the height of the tree changes, the number of pre-allocated nodes is
> inaccurate.
Thanks for pointing this out!
First of all, it's probably fair to say "naive me"; it totally makes
sense that the tree height is needed - it's a b-tree.
On the other hand, unless I'm missing something (and if so, please let
me know), something about the API is bogus then.
While the documentation of the maple tree's Advanced API explicitly
states that the user of the API is responsible for locking, this should
be limited to the bounds set by the maple tree implementation. That
means the user must choose either the internal (spin-) lock or an
external lock (which possibly goes away in the future) and acquire and
release it according to the rules the maple tree enforces through
lockdep checks.
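For instance, with an external lock I'd expect the usage pattern to
look roughly like the following (just a sketch; "mgr", its mutex and
the omitted error handling are made up):

	/* External lock: the tree must stay unchanged between
	 * mas_preallocate() and mas_store_prealloc(), so both run
	 * under the same critical section.
	 */
	MA_STATE(mas, &mgr->mtree, start, end);
	int ret;

	mutex_lock(&mgr->mutex);
	ret = mas_preallocate(&mas, entry, GFP_KERNEL);
	if (!ret)
		mas_store_prealloc(&mas, entry);
	mutex_unlock(&mgr->mutex);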
Let's say one picks the internal lock. How is one supposed to ensure,
with the internal lock, that the tree isn't modified between
mas_preallocate() and mas_store_prealloc()?
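The only pattern I can come up with would be something like the
following, which can't work with a sleeping allocation, since the
internal lock is a spinlock (again just a sketch with made-up names):

	/* Internal lock: mas_lock() takes the tree's internal spinlock. */
	MA_STATE(mas, &mgr->mtree, start, end);
	int ret;

	mas_lock(&mas);
	/* GFP_KERNEL may sleep, which is not allowed under a spinlock. */
	ret = mas_preallocate(&mas, entry, GFP_KERNEL);
	if (!ret)
		mas_store_prealloc(&mas, entry);
	mas_unlock(&mas);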
Besides that, I think the documentation should definitely mention this
limitation and give some guidance for the locking.
Currently, from an API perspective, I can't see how anyone not familiar
with the implementation details would be able to recognize this limitation.
In terms of the GPUVA manager, unfortunately, it seems like I need to
drop the maple tree and go back to using an rb-tree, since it seems
there is no sane way of doing a worst-case pre-allocation that does not
suffer from this limitation.
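Just to illustrate what I had in mind with a separate worst-case
variant: it would essentially keep the previous request size around,
e.g. something like the sketch below (the name and the error path are
made up, mirroring the code this patch replaces):

	int mas_preallocate_worst_case(struct ma_state *mas, gfp_t gfp)
	{
		int ret;

		/* Same worst-case request the code used before this patch. */
		mas_node_count_gfp(mas, 1 + mas_mt_height(mas) * 3, gfp);
		mas->mas_flags |= MA_STATE_PREALLOC;
		if (likely(!mas_is_err(mas)))
			return 0;

		ret = xa_err(mas->node);
		mas_reset(mas);
		mas_destroy(mas);
		mas_set_err(mas, 0);
		return ret;
	}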
- Danilo
>
> Regards,
> Peng
>
>>
>> [1]
>> https://lore.kernel.org/nouveau/68cd25de-e767-725e-2e7b-703217230bb0@redhat.com/T/#ma326e200b1de1e3c9df4e9fcb3bf243061fee8b5
>>
>> [2]
>> https://lore.kernel.org/linux-mm/20230620004217.4700-8-dakr@redhat.com/T/#m47ab82310f87793d0f0cc1825a316eb30ad5b653
>>
>> - Danilo
>>
>>>
>>> This addresses a performance regression introduced in platforms that
>>> have longer allocation timing.
>>>
>>> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>>> ---
>>> lib/maple_tree.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 47 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>>> index 048d6413a114..7ac5b5457603 100644
>>> --- a/lib/maple_tree.c
>>> +++ b/lib/maple_tree.c
>>> @@ -5541,9 +5541,55 @@ EXPORT_SYMBOL_GPL(mas_store_prealloc);
>>> */
>>> int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp)
>>> {
>>> + MA_WR_STATE(wr_mas, mas, entry);
>>> + unsigned char node_size;
>>> + int request = 1;
>>> int ret;
>>> - mas_node_count_gfp(mas, 1 + mas_mt_height(mas) * 3, gfp);
>>> +
>>> + if (unlikely(!mas->index && mas->last == ULONG_MAX))
>>> + goto ask_now;
>>> +
>>> + mas_wr_store_setup(&wr_mas);
>>> + wr_mas.content = mas_start(mas);
>>> + /* Root expand */
>>> + if (unlikely(mas_is_none(mas) || mas_is_ptr(mas)))
>>> + goto ask_now;
>>> +
>>> + if (unlikely(!mas_wr_walk(&wr_mas))) {
>>> + /* Spanning store, use worst case for now */
>>> + request = 1 + mas_mt_height(mas) * 3;
>>> + goto ask_now;
>>> + }
>>> +
>>> + /* At this point, we are at the leaf node that needs to be altered. */
>>> + /* Exact fit, no nodes needed. */
>>> + if (wr_mas.r_min == mas->index && wr_mas.r_max == mas->last)
>>> + return 0;
>>> +
>>> + mas_wr_end_piv(&wr_mas);
>>> + node_size = mas_wr_new_end(&wr_mas);
>>> + /* Slot store can avoid using any nodes */
>>> + if (node_size == wr_mas.node_end && wr_mas.offset_end - mas->offset == 1)
>>> + return 0;
>>> +
>>> + if (node_size >= mt_slots[wr_mas.type]) {
>>> + /* Split, worst case for now. */
>>> + request = 1 + mas_mt_height(mas) * 2;
>>> + goto ask_now;
>>> + }
>>> +
>>> + /* Appending does not need any nodes */
>>> + if (node_size == wr_mas.node_end + 1 && mas->offset == wr_mas.node_end)
>>> + return 0;
>>> +
>>> + /* Potential spanning rebalance collapsing a node, use worst-case */
>>> + if (node_size - 1 <= mt_min_slots[wr_mas.type])
>>> + request = mas_mt_height(mas) * 2 - 1;
>>> +
>>> + /* node store needs one node */
>>> +ask_now:
>>> + mas_node_count_gfp(mas, request, gfp);
>>> mas->mas_flags |= MA_STATE_PREALLOC;
>>> if (likely(!mas_is_err(mas)))
>>> return 0;
>>
>>
>