All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	zuoze <zuoze1@huawei.com>,
	gustavoars@kernel.org, akpm@linux-foundation.org,
	linux-hardening@vger.kernel.org, linux-mm@kvack.org,
	keescook@chromium.org
Subject: Re: [PATCH -next] mm: usercopy: add a debugfs interface to bypass the vmalloc check.
Date: Mon, 16 Dec 2024 20:18:24 +0100	[thread overview]
Message-ID: <Z2B9AMOSgTPboo2U@pc636> (raw)
In-Reply-To: <Z1-riu65--CviPba@casper.infradead.org>

Hello, Matthew!

> On Wed, Dec 04, 2024 at 09:51:07AM +0100, Uladzislau Rezki wrote:
> > I think, when i have more free cycles, i will check it from performance
> > point of view. Because i do not know how much a maple tree is efficient
> > when it comes to lookups, insert and removing.
> 
> Maple tree has a fanout of around 8-12 at each level, while an rbtree has
> a fanout of two (arguably 3, since we might find the node).  Let's say you
> have 1000 vmalloc areas.  A perfectly balanced rbtree would have 9 levels
> (and might well be 11+ levels if imperfectly balanced -- and part of the
> advantage of rbtrees over AVL trees is that they can be less balanced
> so need fewer rotations).  A perfectly balanced maple tree would have
> only 3 levels.
> 
Thank you for your explanation and some input on this topic. Density, a
high of tree and branching factor should make the work better :)

>
> Addition/removal is more expensive.  We biased the implementation heavily
> towards lookup, so we chose to keep it very compact.  Most users (and
> particularly the VMA tree which was our first client) do more lookups
> than modifications; a real application takes many more pagefaults than
> it does calls to mmap/munmap/mprotect/etc.
> 
This is what i see. Some use cases are degraded. For example stress-ng
forking bench is worse, test_vmalloc.sh also reports a degrade:

See below figures:

<snip>
# Default
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+   59.52%     7.15%  [kernel]        [k] __vmalloc_node_range_noprof
+   37.98%     0.22%  [test_vmalloc]  [k] fix_size_alloc_test
+   37.32%     8.56%  [kernel]        [k] vfree.part.0
+   35.31%     0.00%  [kernel]        [k] ret_from_fork_asm
+   35.31%     0.00%  [kernel]        [k] ret_from_fork
+   35.31%     0.00%  [kernel]        [k] kthread
+   35.05%     0.00%  [test_vmalloc]  [k] test_func
+   34.16%     0.06%  [test_vmalloc]  [k] long_busy_list_alloc_test
+   32.10%     0.12%  [kernel]        [k] __get_vm_area_node
+   31.69%     1.82%  [kernel]        [k] alloc_vmap_area
+   27.24%     5.01%  [kernel]        [k] _raw_spin_lock
+   25.45%     0.15%  [test_vmalloc]  [k] full_fit_alloc_test
+   23.57%     0.03%  [kernel]        [k] remove_vm_area
+   22.23%    22.23%  [kernel]        [k] native_queued_spin_lock_slowpath
+   14.34%     0.94%  [kernel]        [k] alloc_pages_bulk_noprof
+   10.80%     7.51%  [kernel]        [k] free_vmap_area_noflush
+   10.59%    10.59%  [kernel]        [k] clear_page_rep
+    9.52%     8.96%  [kernel]        [k] insert_vmap_area
+    7.39%     2.82%  [kernel]        [k] find_unlink_vmap_area

# Maple-tree
time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+   74.33%     1.50%  [kernel]        [k] __vmalloc_node_range_noprof
+   55.73%     0.06%  [kernel]        [k] __get_vm_area_node
+   55.53%     1.07%  [kernel]        [k] alloc_vmap_area
+   53.78%     0.09%  [test_vmalloc]  [k] long_busy_list_alloc_test
+   53.75%     1.76%  [kernel]        [k] _raw_spin_lock
+   52.81%    51.80%  [kernel]        [k] native_queued_spin_lock_slowpath
+   28.93%     0.09%  [test_vmalloc]  [k] full_fit_alloc_test
+   23.29%     2.43%  [kernel]        [k] vfree.part.0
+   20.29%     0.01%  [kernel]        [k] mt_insert_vmap_area
+   20.27%     0.34%  [kernel]        [k] mtree_insert_range
+   15.30%     0.05%  [test_vmalloc]  [k] fix_size_alloc_test
+   14.06%     0.05%  [kernel]        [k] remove_vm_area
+   13.73%     0.00%  [kernel]        [k] ret_from_fork_asm
+   13.73%     0.00%  [kernel]        [k] ret_from_fork
+   13.73%     0.00%  [kernel]        [k] kthread
+   13.51%     0.00%  [test_vmalloc]  [k] test_func
+   13.15%     0.87%  [kernel]        [k] alloc_pages_bulk_noprof
+    9.92%     9.54%  [kernel]        [k] clear_page_rep
+    9.62%     0.07%  [kernel]        [k] find_unlink_vmap_area
+    9.55%     0.04%  [kernel]        [k] mtree_erase
+    5.92%     1.44%  [kernel]        [k] free_unref_page
+    4.92%     0.24%  [kernel]        [k] mas_insert.isra.0
+    4.69%     0.93%  [kernel]        [k] mas_erase
+    4.47%     0.02%  [kernel]        [k] rcu_do_batch
+    3.35%     2.10%  [kernel]        [k] __vmap_pages_range_noflush
+    3.00%     2.81%  [kernel]        [k] mas_wr_store_type
<snip>

i.e. insert/remove are more expansive, at least my test show this.
It looks like, mtree_insert() uses a range_variant which implies
a tree update after an insert operation is completed. And probably
where an overhead comes from. If i use a b+tree(my own implementation),
as expected, it is better than rb-tree because of b+tree properties.

I have composed some data, you can find more bench data there:

wget ftp://vps418301.ovh.net/incoming/Maple_tree_comparison_with_rb_tree_in_vmalloc.pdf

>> That's what maple trees do; they store non-overlapping ranges.  So you
>> can look up any address in a range and it will return you the pointer
>> associated with that range.  Just like you'd want for a page fault ;-)
Thank you. I see. I though that it also can work as a regular b+ or b
tress so we do not spend cycles on updates to track ranges. Like below
code:

int ret = mtree_insert(t, va->va_start, va, GFP_KERNEL);

i do not store a range here, i store key -> value pair but maple-tree
considers it as range: [va_start:va_start]. Maybe we can improve this
case when not a range is passed?

This is just my thoughts :)

--
Uladzislau Rezki

  reply	other threads:[~2024-12-16 19:18 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-03  2:31 [PATCH -next] mm: usercopy: add a debugfs interface to bypass the vmalloc check Ze Zuo
2024-12-03  4:11 ` Matthew Wilcox
2024-12-03 11:23   ` zuoze
2024-12-03 12:39     ` Uladzislau Rezki
2024-12-03 13:10       ` zuoze
2024-12-03 13:25         ` Uladzislau Rezki
2024-12-03 13:30         ` Kefeng Wang
2024-12-03 13:39           ` Uladzislau Rezki
2024-12-03 13:45             ` Kefeng Wang
2024-12-03 13:51               ` Uladzislau Rezki
2024-12-03 14:10                 ` Kefeng Wang
2024-12-03 14:20                   ` Uladzislau Rezki
2024-12-03 19:02                     ` Uladzislau Rezki
2024-12-03 19:56                       ` Matthew Wilcox
2024-12-04  1:38                         ` zuoze
2024-12-04  4:43                         ` Kees Cook
2024-12-04  7:55                         ` Uladzislau Rezki
2024-12-04  9:21                           ` zuoze
2024-12-04  9:27                             ` Uladzislau Rezki
2024-12-04  8:51                         ` Uladzislau Rezki
2024-12-16  4:24                           ` Matthew Wilcox
2024-12-16 19:18                             ` Uladzislau Rezki [this message]
2024-12-04  1:21                       ` zuoze
2024-12-03  6:12 ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z2B9AMOSgTPboo2U@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=gustavoars@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=zuoze1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.