From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Jonthan Haslam <jonathan.haslam@gmail.com>
Cc: linux-trace-kernel@vger.kernel.org, andrii@kernel.org,
bpf@vger.kernel.org, rostedt@goodmis.org,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] uprobes: reduce contention on uprobes_tree access
Date: Wed, 27 Mar 2024 08:42:58 +0900
Message-ID: <20240327084258.c385e997782a97fef07ba084@kernel.org>
In-Reply-To: <dk3obkyavqgzr2xbpykbz3knwgyxl73acuunocoygbhtz5imhm@mdqdefp6kz3t>

On Mon, 25 Mar 2024 19:04:59 +0000
Jonthan Haslam <jonathan.haslam@gmail.com> wrote:
> Hi Masami,
>
> > > This change has been tested against production workloads that exhibit
> > > significant contention on the spinlock and an almost order of magnitude
> > > reduction for mean uprobe execution time is observed (28 -> 3.5 microsecs).
> >
> > Looks good to me.
> >
> > Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >
> > BTW, how did you measure the overhead? I think spinlock overhead
> > will depend on how much lock contention happens.
>
> Absolutely. I have the original production workload to test this with and
> a derived one that mimics this test case. The production case has ~24
> threads running on a 192 core system which access 14 USDTs around 1.5
> million times per second in total (across all USDTs). My test case is
> similar but can drive a higher rate of USDT access across more threads and
> therefore generate higher contention.

Thanks for the info. So this result was measured on a sufficiently large
machine with high parallelism, where lock contention matters.
Could you also include this information, along with the numbers, in the
next version?

Thank you,

>
> All measurements are done using bpftrace scripts around relevant parts of
> code in uprobes.c and application code.
>
> Jon.
>
> >
> > Thank you,
> >
> > >
> > > [0] https://docs.kernel.org/locking/spinlocks.html
> > >
> > > Signed-off-by: Jonathan Haslam <jonathan.haslam@gmail.com>
> > > ---
> > > kernel/events/uprobes.c | 22 +++++++++++-----------
> > > 1 file changed, 11 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > index 929e98c62965..42bf9b6e8bc0 100644
> > > --- a/kernel/events/uprobes.c
> > > +++ b/kernel/events/uprobes.c
> > > @@ -39,7 +39,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
> > >   */
> > >  #define no_uprobe_events()	RB_EMPTY_ROOT(&uprobes_tree)
> > >
> > > -static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
> > > +static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
> > >
> > >  #define UPROBES_HASH_SZ	13
> > >  /* serialize uprobe->pending_list */
> > > @@ -669,9 +669,9 @@ static struct uprobe *find_uprobe(struct inode *inode, loff_t offset)
> > >  {
> > >  	struct uprobe *uprobe;
> > >
> > > -	spin_lock(&uprobes_treelock);
> > > +	read_lock(&uprobes_treelock);
> > >  	uprobe = __find_uprobe(inode, offset);
> > > -	spin_unlock(&uprobes_treelock);
> > > +	read_unlock(&uprobes_treelock);
> > >
> > >  	return uprobe;
> > >  }
> > > @@ -701,9 +701,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
> > >  {
> > >  	struct uprobe *u;
> > >
> > > -	spin_lock(&uprobes_treelock);
> > > +	write_lock(&uprobes_treelock);
> > >  	u = __insert_uprobe(uprobe);
> > > -	spin_unlock(&uprobes_treelock);
> > > +	write_unlock(&uprobes_treelock);
> > >
> > >  	return u;
> > >  }
> > > @@ -935,9 +935,9 @@ static void delete_uprobe(struct uprobe *uprobe)
> > >  	if (WARN_ON(!uprobe_is_active(uprobe)))
> > >  		return;
> > >
> > > -	spin_lock(&uprobes_treelock);
> > > +	write_lock(&uprobes_treelock);
> > >  	rb_erase(&uprobe->rb_node, &uprobes_tree);
> > > -	spin_unlock(&uprobes_treelock);
> > > +	write_unlock(&uprobes_treelock);
> > >  	RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
> > >  	put_uprobe(uprobe);
> > >  }
> > > @@ -1298,7 +1298,7 @@ static void build_probe_list(struct inode *inode,
> > >  	min = vaddr_to_offset(vma, start);
> > >  	max = min + (end - start) - 1;
> > >
> > > -	spin_lock(&uprobes_treelock);
> > > +	read_lock(&uprobes_treelock);
> > >  	n = find_node_in_range(inode, min, max);
> > >  	if (n) {
> > >  		for (t = n; t; t = rb_prev(t)) {
> > > @@ -1316,7 +1316,7 @@ static void build_probe_list(struct inode *inode,
> > >  			get_uprobe(u);
> > >  		}
> > >  	}
> > > -	spin_unlock(&uprobes_treelock);
> > > +	read_unlock(&uprobes_treelock);
> > >  }
> > >
> > >  /* @vma contains reference counter, not the probed instruction. */
> > > @@ -1407,9 +1407,9 @@ vma_has_uprobes(struct vm_area_struct *vma, unsigned long start, unsigned long e
> > >  	min = vaddr_to_offset(vma, start);
> > >  	max = min + (end - start) - 1;
> > >
> > > -	spin_lock(&uprobes_treelock);
> > > +	read_lock(&uprobes_treelock);
> > >  	n = find_node_in_range(inode, min, max);
> > > -	spin_unlock(&uprobes_treelock);
> > > +	read_unlock(&uprobes_treelock);
> > >
> > >  	return !!n;
> > >  }
> > > --
> > > 2.43.0
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google) <mhiramat@kernel.org>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>