From: "Paul E. McKenney" <paulmck@kernel.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <Kernel-team@fb.com>
Subject: Re: slow sync rcu_tasks_trace
Date: Wed, 9 Sep 2020 12:39:00 -0700 [thread overview]
Message-ID: <20200909193900.GK29330@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20200909180418.hlivoaekhkchlidw@ast-mbp.dhcp.thefacebook.com>
On Wed, Sep 09, 2020 at 11:04:18AM -0700, Alexei Starovoitov wrote:
> On Wed, Sep 09, 2020 at 10:35:12AM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 09, 2020 at 10:12:28AM -0700, Alexei Starovoitov wrote:
> > > On Wed, Sep 09, 2020 at 04:38:58AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Sep 08, 2020 at 07:34:20PM -0700, Alexei Starovoitov wrote:
> > > > > Hi Paul,
> > > > >
> > > > > Looks like sync rcu_tasks_trace got slower or we simply didn't notice
> > > > > it earlier.
> > > > >
> > > > > In selftests/bpf try:
> > > > > time ./test_progs -t trampoline_count
> > > > > #101 trampoline_count:OK
> > > > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > > > >
> > > > > real 1m17.082s
> > > > > user 0m0.145s
> > > > > sys 0m1.369s
> > > > >
> > > > > so it's really something going on with sync rcu_tasks_trace.
> > > > > Could you please take a look?
> > > >
> > > > I am guessing that your .config has CONFIG_TASKS_TRACE_RCU_READ_MB=n.
> > > > If I am wrong, please try CONFIG_TASKS_TRACE_RCU_READ_MB=y.
> > >
> > > I've added
> > > CONFIG_RCU_EXPERT=y
> > > CONFIG_TASKS_TRACE_RCU_READ_MB=y
> > >
> > > and it helped:
> > >
> > > time ./test_progs -t trampoline_count
> > > #101 trampoline_count:OK
> > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > >
> > > real 0m8.924s
> > > user 0m0.138s
> > > sys 0m1.408s
> > >
> > > But this is still bad. It's 4 times slower vs rcu_tasks
> > > and isn't really usable for bpf, since it adds memory barriers exactly
> > > where we need them removed.
> > >
> > > In the default configuration rcu_tasks_trace is 40! times slower than rcu_tasks.
> > > This huge difference in sync times concerns me a lot.
> > > If bpf has to use memory barriers in rcu_read_lock_trace
> > > and still be 4 times slower than rcu_tasks in the best case
> > > then there is no much point in rcu_tasks_trace.
> > > Converting everything to srcu would be better, but I really hope
> > > you can find a solution to this tasks_trace issue.
> > >
> > > > Otherwise (or alternatively), could you please try booting with
> > > > rcupdate.rcu_task_ipi_delay=50? The default value is 500, or half a
> > > > second on a HZ=1000 system, which on a busy system could easily result
> > > > in the grace-period delays that you are seeing. The value of this
> > > > kernel boot parameter does interact with the tasklist-scan backoffs,
> > > > so its effect will not likely be linear.
> > >
> > > The tests were run on freshly booted VM with 4 cpus. The VM is idle.
> > > The host is idle too.
> > >
> > > Adding rcupdate.rcu_task_ipi_delay=50 boot param sort-of helped:
> > > time ./test_progs -t trampoline_count
> > > #101 trampoline_count:OK
> > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > >
> > > real 0m25.890s
> > > user 0m0.124s
> > > sys 0m1.507s
> > > It is still awful.
> > >
> > > >From "perf report" there is little time spend in the kernel. The kernel is
> > > waiting on something. I thought in theory the rcu_tasks_trace should have been
> > > faster on update side vs rcu_tasks ? Could it be a bug somewhere and some
> > > missing wakeup? It doesn't feel that it works as intended. Whatever it is
> > > please try to reproduce it to remove me as a middle man.
> >
> > On it.
> >
> > To be fair, I was designing for a nominal one-second grace period,
> > which was also the rough goal for rcu_tasks.
> >
> > When do you need this by?
> >
> > Left to myself, I will aim for the merge window after the upcoming one,
> > and then backport to the prior -stable versions having RCU tasks trace.
>
> That would be too late.
> We would have to disable sleepable bpf progs or convert them to srcu.
> bcc/bpftrace have a limit of 1000 probes for regexes to make sure
> these tools don't add too many kprobes to the kernel at once.
> Right now fentry/fexit/freplace are using trampoline which does
> synchronize_rcu_tasks(). My measurements show that it's roughly
> equal to synchronize_rcu() on idle box and perfectly capable to
> be a replacement for kprobe based attaching.
> It's not uncommon to attach a hundred kprobes or fentry probes at
> a start time. So bpf trampoline has to be able to do 1000 in a second.
> And it was the case before sleepable got added to the trampoline.
> Now it's doing:
> synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
> and it's causing this massive slowdown which makes bpf trampoline
> pretty much unusable and everything that builds on top suffers.
> I can add a counter of sleepable progs to trampoline and do
> either sync rcu_tasks or sync_mult(tasks, tasks_trace),
> but we've discussed exactly that idea few months back and concluded that
> rcu_tasks is likely to be heavier than rcu_tasks_trace, so I didn't
> bother with the counter. I can still add it, but slow rcu_tasks_trace
> means that sleepable progs are not usable due to slow startup time,
> so have to do something with sleepable anyway.
> So "when do you need this by?" the answer is asap.
> I'm considering such changes to be a bugfix, not a feture.
Got it.
With the patch below, I am able to reproduce this issue, as expected.
My plan is to try the following:
1. Parameterize the backoff sequence so that RCU Tasks Trace
uses faster rechecking than does RCU Tasks. Experiment as
needed to arrive at a good backoff value.
2. If the tasks-list scan turns out to be a tighter bottleneck
than the backoff waits, look into parallelizing this scan.
(This seems unlikely, but the fact remains that RCU Tasks
Trace must do a bit more work per task than RCU Tasks.)
3. If these two approaches, still don't get the update-side
latency where it needs to be, improvise.
The exact path into mainline will of course depend on how far down this
list I must go, but first to get a solution.
Thanx, Paul
------------------------------------------------------------------------
commit 1b5b6a341cc17b5f236bceca3d1cfb23e39176b5
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed Sep 9 12:27:03 2020 -0700
rcuscale: Add RCU Tasks Trace
This commit adds the ability to test performance and scalability of RCU
Tasks Trace updaters.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 2819b95..c42f240 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -38,6 +38,7 @@
#include <asm/byteorder.h>
#include <linux/torture.h>
#include <linux/vmalloc.h>
+#include <linux/rcupdate_trace.h>
#include "rcu.h"
@@ -294,6 +295,35 @@ static struct rcu_scale_ops tasks_ops = {
.name = "tasks"
};
+/*
+ * Definitions for RCU-tasks-trace scalability testing.
+ */
+
+static int tasks_trace_scale_read_lock(void)
+{
+ rcu_read_lock_trace();
+ return 0;
+}
+
+static void tasks_trace_scale_read_unlock(int idx)
+{
+ rcu_read_unlock_trace();
+}
+
+static struct rcu_scale_ops tasks_tracing_ops = {
+ .ptype = RCU_TASKS_FLAVOR,
+ .init = rcu_sync_scale_init,
+ .readlock = tasks_trace_scale_read_lock,
+ .readunlock = tasks_trace_scale_read_unlock,
+ .get_gp_seq = rcu_no_completed,
+ .gp_diff = rcu_seq_diff,
+ .async = call_rcu_tasks_trace,
+ .gp_barrier = rcu_barrier_tasks_trace,
+ .sync = synchronize_rcu_tasks_trace,
+ .exp_sync = synchronize_rcu_tasks_trace,
+ .name = "tasks-tracing"
+};
+
static unsigned long rcuscale_seq_diff(unsigned long new, unsigned long old)
{
if (!cur_ops->gp_diff)
@@ -754,7 +784,7 @@ rcu_scale_init(void)
long i;
int firsterr = 0;
static struct rcu_scale_ops *scale_ops[] = {
- &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops,
+ &rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops, &tasks_tracing_ops
};
if (!torture_init_begin(scale_type, verbose))
diff --git a/tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon b/tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
index 87caa0e..90942bb 100644
--- a/tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
+++ b/tools/testing/selftests/rcutorture/configs/rcuscale/CFcommon
@@ -1,2 +1,5 @@
CONFIG_RCU_SCALE_TEST=y
CONFIG_PRINTK_TIME=y
+CONFIG_TASKS_RCU_GENERIC=y
+CONFIG_TASKS_RCU=y
+CONFIG_TASKS_TRACE_RCU=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01 b/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01
new file mode 100644
index 0000000..4255490
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01
@@ -0,0 +1,18 @@
+CONFIG_SMP=y
+CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=n
+CONFIG_HZ_PERIODIC=n
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_FULL=n
+CONFIG_RCU_FAST_NO_HZ=n
+CONFIG_HOTPLUG_CPU=n
+CONFIG_SUSPEND=n
+CONFIG_HIBERNATION=n
+CONFIG_RCU_NOCB_CPU=n
+CONFIG_DEBUG_LOCK_ALLOC=n
+CONFIG_PROVE_LOCKING=n
+CONFIG_RCU_BOOST=n
+CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
+CONFIG_RCU_EXPERT=y
+CONFIG_RCU_TRACE=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01.boot b/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01.boot
new file mode 100644
index 0000000..af0aff1
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcuscale/TRACE01.boot
@@ -0,0 +1 @@
+rcuscale.scale_type=tasks-tracing
next prev parent reply other threads:[~2020-09-09 19:39 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-09 2:34 slow sync rcu_tasks_trace Alexei Starovoitov
2020-09-09 11:38 ` Paul E. McKenney
2020-09-09 15:10 ` Jiri Olsa
2020-09-09 17:02 ` Paul E. McKenney
2020-09-09 17:12 ` Alexei Starovoitov
2020-09-09 17:35 ` Paul E. McKenney
2020-09-09 18:04 ` Alexei Starovoitov
2020-09-09 19:39 ` Paul E. McKenney [this message]
2020-09-09 19:48 ` Alexei Starovoitov
2020-09-09 21:04 ` Paul E. McKenney
2020-09-09 21:22 ` Paul E. McKenney
2020-09-10 5:27 ` Paul E. McKenney
2020-09-10 18:33 ` Alexei Starovoitov
2020-09-10 18:51 ` Paul E. McKenney
2020-09-10 19:04 ` Alexei Starovoitov
2020-09-10 20:24 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200909193900.GK29330@paulmck-ThinkPad-P72 \
--to=paulmck@kernel.org \
--cc=Kernel-team@fb.com \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox