From: Andrii Nakryiko <andrii@kernel.org>
To: linux-trace-kernel@vger.kernel.org, peterz@infradead.org,
oleg@redhat.com
Cc: rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org, jolsa@kernel.org,
paulmck@kernel.org, willy@infradead.org, surenb@google.com,
akpm@linux-foundation.org, linux-mm@kvack.org,
Andrii Nakryiko <andrii@kernel.org>
Subject: [PATCH v3 07/13] uprobes: perform lockless SRCU-protected uprobes_tree lookup
Date: Mon, 12 Aug 2024 21:29:11 -0700 [thread overview]
Message-ID: <20240813042917.506057-8-andrii@kernel.org> (raw)
In-Reply-To: <20240813042917.506057-1-andrii@kernel.org>
Another big bottleneck to scalablity is uprobe_treelock that's taken in
a very hot path in handle_swbp(). Now that uprobes are SRCU-protected,
take advantage of that and make uprobes_tree RB-tree look up lockless.
To make RB-tree RCU-protected lockless lookup correct, we need to take
into account that such RB-tree lookup can return false negatives if there
are parallel RB-tree modifications (rotations) going on. We use seqcount
lock to detect whether RB-tree changed, and if we find nothing while
RB-tree got modified inbetween, we just retry. If uprobe was found, then
it's guaranteed to be a correct lookup.
With all the lock-avoiding changes done, we get a pretty decent
improvement in performance and scalability of uprobes with number of
CPUs, even though we are still nowhere near linear scalability. This is
due to SRCU not really scaling very well with number of CPUs on
a particular hardware that was used for testing (80-core Intel Xeon Gold
6138 CPU @ 2.00GHz), but also due to the remaning mmap_lock, which is
currently taken to resolve interrupt address to inode+offset and then
uprobe instance. And, of course, uretprobes still need similar RCU to
avoid refcount in the hot path, which will be addressed in the follow up
patches.
Nevertheless, the improvement is good. We used BPF selftest-based
uprobe-nop and uretprobe-nop benchmarks to get the below numbers,
varying number of CPUs on which uprobes and uretprobes are triggered.
BASELINE
========
uprobe-nop ( 1 cpus): 3.032 ± 0.023M/s ( 3.032M/s/cpu)
uprobe-nop ( 2 cpus): 3.452 ± 0.005M/s ( 1.726M/s/cpu)
uprobe-nop ( 4 cpus): 3.663 ± 0.005M/s ( 0.916M/s/cpu)
uprobe-nop ( 8 cpus): 3.718 ± 0.038M/s ( 0.465M/s/cpu)
uprobe-nop (16 cpus): 3.344 ± 0.008M/s ( 0.209M/s/cpu)
uprobe-nop (32 cpus): 2.288 ± 0.021M/s ( 0.071M/s/cpu)
uprobe-nop (64 cpus): 3.205 ± 0.004M/s ( 0.050M/s/cpu)
uretprobe-nop ( 1 cpus): 1.979 ± 0.005M/s ( 1.979M/s/cpu)
uretprobe-nop ( 2 cpus): 2.361 ± 0.005M/s ( 1.180M/s/cpu)
uretprobe-nop ( 4 cpus): 2.309 ± 0.002M/s ( 0.577M/s/cpu)
uretprobe-nop ( 8 cpus): 2.253 ± 0.001M/s ( 0.282M/s/cpu)
uretprobe-nop (16 cpus): 2.007 ± 0.000M/s ( 0.125M/s/cpu)
uretprobe-nop (32 cpus): 1.624 ± 0.003M/s ( 0.051M/s/cpu)
uretprobe-nop (64 cpus): 2.149 ± 0.001M/s ( 0.034M/s/cpu)
SRCU CHANGES
============
uprobe-nop ( 1 cpus): 3.276 ± 0.005M/s ( 3.276M/s/cpu)
uprobe-nop ( 2 cpus): 4.125 ± 0.002M/s ( 2.063M/s/cpu)
uprobe-nop ( 4 cpus): 7.713 ± 0.002M/s ( 1.928M/s/cpu)
uprobe-nop ( 8 cpus): 8.097 ± 0.006M/s ( 1.012M/s/cpu)
uprobe-nop (16 cpus): 6.501 ± 0.056M/s ( 0.406M/s/cpu)
uprobe-nop (32 cpus): 4.398 ± 0.084M/s ( 0.137M/s/cpu)
uprobe-nop (64 cpus): 6.452 ± 0.000M/s ( 0.101M/s/cpu)
uretprobe-nop ( 1 cpus): 2.055 ± 0.001M/s ( 2.055M/s/cpu)
uretprobe-nop ( 2 cpus): 2.677 ± 0.000M/s ( 1.339M/s/cpu)
uretprobe-nop ( 4 cpus): 4.561 ± 0.003M/s ( 1.140M/s/cpu)
uretprobe-nop ( 8 cpus): 5.291 ± 0.002M/s ( 0.661M/s/cpu)
uretprobe-nop (16 cpus): 5.065 ± 0.019M/s ( 0.317M/s/cpu)
uretprobe-nop (32 cpus): 3.622 ± 0.003M/s ( 0.113M/s/cpu)
uretprobe-nop (64 cpus): 3.723 ± 0.002M/s ( 0.058M/s/cpu)
Peak througput increased from 3.7 mln/s (uprobe triggerings) up to about
8 mln/s. For uretprobes it's a bit more modest with bump from 2.4 mln/s
to 5mln/s.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/events/uprobes.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0b6d4c0a0088..8559ca365679 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -40,6 +40,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
#define no_uprobe_events() RB_EMPTY_ROOT(&uprobes_tree)
static DEFINE_RWLOCK(uprobes_treelock); /* serialize rbtree access */
+static seqcount_rwlock_t uprobes_seqcount = SEQCNT_RWLOCK_ZERO(uprobes_seqcount, &uprobes_treelock);
DEFINE_STATIC_SRCU(uprobes_srcu);
@@ -634,8 +635,11 @@ static void put_uprobe(struct uprobe *uprobe)
write_lock(&uprobes_treelock);
- if (uprobe_is_active(uprobe))
+ if (uprobe_is_active(uprobe)) {
+ write_seqcount_begin(&uprobes_seqcount);
rb_erase(&uprobe->rb_node, &uprobes_tree);
+ write_seqcount_end(&uprobes_seqcount);
+ }
write_unlock(&uprobes_treelock);
@@ -701,14 +705,26 @@ static struct uprobe *find_uprobe_rcu(struct inode *inode, loff_t offset)
.offset = offset,
};
struct rb_node *node;
+ unsigned int seq;
lockdep_assert(srcu_read_lock_held(&uprobes_srcu));
- read_lock(&uprobes_treelock);
- node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
- read_unlock(&uprobes_treelock);
+ do {
+ seq = read_seqcount_begin(&uprobes_seqcount);
+ node = rb_find_rcu(&key, &uprobes_tree, __uprobe_cmp_key);
+ /*
+ * Lockless RB-tree lookups can result only in false negatives.
+ * If the element is found, it is correct and can be returned
+ * under RCU protection. If we find nothing, we need to
+ * validate that seqcount didn't change. If it did, we have to
+ * try again as we might have missed the element (false
+ * negative). If seqcount is unchanged, search truly failed.
+ */
+ if (node)
+ return __node_2_uprobe(node);
+ } while (read_seqcount_retry(&uprobes_seqcount, seq));
- return node ? __node_2_uprobe(node) : NULL;
+ return NULL;
}
/*
@@ -730,7 +746,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
{
struct rb_node *node;
again:
- node = rb_find_add(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
+ node = rb_find_add_rcu(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
if (node) {
struct uprobe *u = __node_2_uprobe(node);
@@ -755,7 +771,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
struct uprobe *u;
write_lock(&uprobes_treelock);
+ write_seqcount_begin(&uprobes_seqcount);
u = __insert_uprobe(uprobe);
+ write_seqcount_end(&uprobes_seqcount);
write_unlock(&uprobes_treelock);
return u;
--
2.43.5
next prev parent reply other threads:[~2024-08-13 4:30 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-13 4:29 [PATCH v3 00/13] uprobes: RCU-protected hot path optimizations Andrii Nakryiko
2024-08-13 4:29 ` [PATCH v3 01/13] uprobes: revamp uprobe refcounting and lifetime management Andrii Nakryiko
2024-08-13 4:29 ` [PATCH v3 02/13] uprobes: protected uprobe lifetime with SRCU Andrii Nakryiko
2024-08-13 4:29 ` [PATCH v3 03/13] uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks Andrii Nakryiko
2024-08-13 4:29 ` [PATCH v3 04/13] uprobes: travers uprobe's consumer list locklessly under SRCU protection Andrii Nakryiko
2024-08-22 14:22 ` Jiri Olsa
2024-08-22 16:59 ` Andrii Nakryiko
2024-08-22 17:35 ` Jiri Olsa
2024-08-22 17:51 ` Andrii Nakryiko
2024-08-13 4:29 ` [PATCH v3 05/13] perf/uprobe: split uprobe_unregister() Andrii Nakryiko
2024-08-13 4:29 ` [PATCH v3 06/13] rbtree: provide rb_find_rcu() / rb_find_add_rcu() Andrii Nakryiko
2024-08-13 4:29 ` Andrii Nakryiko [this message]
2024-08-13 4:29 ` [PATCH v3 08/13] uprobes: switch to RCU Tasks Trace flavor for better performance Andrii Nakryiko
2024-08-13 4:29 ` [PATCH RFC v3 09/13] uprobes: SRCU-protect uretprobe lifetime (with timeout) Andrii Nakryiko
2024-08-19 13:41 ` Oleg Nesterov
2024-08-19 20:34 ` Andrii Nakryiko
2024-08-20 15:05 ` Oleg Nesterov
2024-08-20 18:01 ` Andrii Nakryiko
2024-08-13 4:29 ` [PATCH RFC v3 10/13] uprobes: implement SRCU-protected lifetime for single-stepped uprobe Andrii Nakryiko
2024-08-13 4:29 ` [PATCH RFC v3 11/13] mm: introduce mmap_lock_speculation_{start|end} Andrii Nakryiko
2024-08-13 4:29 ` [PATCH RFC v3 12/13] mm: add SLAB_TYPESAFE_BY_RCU to files_cache Andrii Nakryiko
2024-08-13 6:07 ` Mateusz Guzik
2024-08-13 14:49 ` Suren Baghdasaryan
2024-08-13 18:15 ` Andrii Nakryiko
2024-08-13 4:29 ` [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution Andrii Nakryiko
2024-08-13 6:17 ` Mateusz Guzik
2024-08-13 15:36 ` Suren Baghdasaryan
2024-08-15 13:44 ` Mateusz Guzik
2024-08-15 16:47 ` Andrii Nakryiko
2024-08-15 17:45 ` Suren Baghdasaryan
2024-08-15 18:24 ` Mateusz Guzik
2024-08-15 18:58 ` Jann Horn
2024-08-15 19:07 ` Mateusz Guzik
2024-08-15 19:17 ` Arnaldo Carvalho de Melo
2024-08-15 19:18 ` Arnaldo Carvalho de Melo
2024-08-15 19:44 ` Suren Baghdasaryan
2024-08-15 20:17 ` Andrii Nakryiko
2024-08-15 13:24 ` [PATCH v3 00/13] uprobes: RCU-protected hot path optimizations Oleg Nesterov
2024-08-15 16:49 ` Andrii Nakryiko
2024-08-21 16:41 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240813042917.506057-8-andrii@kernel.org \
--to=andrii@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=bpf@vger.kernel.org \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=surenb@google.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).