From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Oleg Nesterov <oleg@redhat.com>,
linux-kernel@vger.kernel.org, mingo@kernel.org,
laijs@cn.fujitsu.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, dhowells@redhat.com,
edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com,
bobby.prani@gmail.com, masami.hiramatsu.pt@hitachi.com
Subject: Re: [PATCH v3 tip/core/rcu 3/9] rcu: Add synchronous grace-period waiting for RCU-tasks
Date: Fri, 8 Aug 2014 07:28:10 -0700
Message-ID: <20140808142810.GV5821@linux.vnet.ibm.com>
In-Reply-To: <20140808101221.21056900@gandalf.local.home>

On Fri, Aug 08, 2014 at 10:12:21AM -0400, Steven Rostedt wrote:
> On Fri, 8 Aug 2014 08:40:20 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Thu, Aug 07, 2014 at 05:18:23PM -0400, Steven Rostedt wrote:
> > > On Thu, 7 Aug 2014 22:08:13 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > > OK, you've got to start over and start at the beginning, because I'm
> > > > really not understanding this..
> > > >
> > > > What is a 'trampoline' and what are you going to use them for.
> > >
> > > Great question! :-)
> > >
> > > The trampoline is some code that is used to jump to and then jump
> > > someplace else. Currently, we use this for kprobes and ftrace. For
> > > ftrace we have the ftrace_caller trampoline, which is static. When
> > > booting, most functions in the kernel call the mcount code which
> > > simply returns without doing anything. This too is a "trampoline". At
> > > boot, we convert these calls to nops (as you already know). When we
> > > enable callbacks from functions, we convert those calls to call
> > > "ftrace_caller" which is a small assembly trampoline that will call
> > > some function that registered with ftrace.
> > >
> > > Now why do we need the call_rcu_tasks() routine?
> > >
> > > Right now, if you register multiple callbacks to ftrace, even if they
> > > are not tracing the same routine, ftrace has to change ftrace_caller to
> > > call another trampoline (in C), that does a loop of all ops registered
> > > with ftrace, and compares the function to the ops hash tables to see if
> > > the ops function should be called for that function.
> > >
> > > What we want to do is to create a dynamic trampoline that is a copy of
> > > the ftrace_caller code, but instead of calling this list trampoline, it
> > > calls the ops function directly. This way, each ops registered with
> > > ftrace can have its own custom trampoline that when called will only
> > > call the ops function and not have to iterate over a list. This only
> > > happens if the function being traced only has this one ops registered.
> > > For functions with multiple ops attached to it, we need to call the
> > > list anyway. But for the majority of the cases, this is not the case.
> > >
> > > The one caveat for this is, how do we free this custom trampoline when
> > > the ops is done with it? Especially for users of ftrace that
> > > dynamically create their own ops (like perf, and ftrace instances).
> > >
> > > We need to find a way to free it, but unfortunately, there's no way to
> > > know when it is safe to free it. There's no way to disable preemption
> > > or have some other notifier to let us know if a task has jumped to this
> > > trampoline and has been preempted (sleeping). The only safe way to know
> > > that no task is on the trampoline is to remove the calls to it,
> > > synchronize the CPUs (so the trampolines are not even in the caches),
> > > and then wait for all tasks to go through some quiescent state. This
> > > state happens to be either not running, in userspace, or when it
> > > voluntarily calls schedule. Because nothing that uses this trampoline
> > > should do that, and if the task voluntarily calls schedule, we know
> > > it's not on the trampoline.
> > >
> > > Make sense?
> >
> > Ok, so they're purely used in the function prologue/epilogue callchain.
>
> No, they are also used by optimized kprobes. This is why optimized
> kprobes depend on !CONFIG_PREEMPT. [ added Masami to the discussion ].
>
> Which reminds me. On !CONFIG_PREEMPT, call_rcu_tasks() should be
> equivalent to call_rcu_sched().

Almost. One difference is that call_rcu_sched() won't wait for
idle-task execution. So presumably you are currently prohibited from
putting kprobes in idle tasks.

Oleg slipped this one past me, and for more than a full hour
(https://lkml.org/lkml/2014/8/2/18), but this time I remembered. ;-)

							Thanx, Paul

> > And you don't want to use synchronize_rcu_tasks() because registering a
> > trace function is atomic?
>
> No. Has nothing to do with registering the trace function. The issue is
> that we have no idea when a task happens to be on a trampoline after it
> is registered. For example:
>
> ops adds a callback to sys_read:
>
>   sys_read() {
>     call trampoline ->
>       set up regs for function call.
>       <interrupt>
>       preempt_schedule();
>
>       [ new task runs for long time ]
>
>
> While this new task is running, we remove the trampoline and want to
> free it. Say this new task keeps the other task from running for
> minutes! We call synchronize_sched() or any other rcu call, and all
> grace periods finish and we free the trampoline. The sys_read() no
> longer calls our trampoline. Doesn't matter, because that task is still
> on it. Now we schedule that task back. It's on a trampoline that has
> just been freed! BOOM. It's executing code that no longer exists.
>
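
To make the failure mode above concrete, here is a minimal userspace
sketch in C. It is a toy model and not kernel code: struct sim_task,
sched_style_gp_done(), tasks_style_gp_done(), trampoline_unused(), and
task_resume_and_block() are all names invented for this illustration.
The assumption is that a sched-style wait only needs every CPU to have
context-switched, while a tasks-style wait needs every task to have
passed a voluntary quiescent state (blocking voluntarily or running in
userspace).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task bookkeeping for illustration; not a kernel structure. */
struct sim_task {
	bool on_trampoline;    /* preempted while executing trampoline code */
	bool passed_quiescent; /* voluntarily scheduled or ran in userspace
				* since the trampoline was unhooked */
};

/* Sched-style wait: satisfied once every CPU has context-switched,
 * regardless of which tasks were preempted along the way. */
static bool sched_style_gp_done(const bool *cpu_switched, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++)
		if (!cpu_switched[i])
			return false;
	return true;
}

/* Tasks-style wait: satisfied only once every task has passed a
 * voluntary quiescent state. */
static bool tasks_style_gp_done(const struct sim_task *tasks, size_t ntasks)
{
	for (size_t i = 0; i < ntasks; i++)
		if (!tasks[i].passed_quiescent)
			return false;
	return true;
}

/* Ground truth: freeing is safe only if no task is still on the trampoline. */
static bool trampoline_unused(const struct sim_task *tasks, size_t ntasks)
{
	for (size_t i = 0; i < ntasks; i++)
		if (tasks[i].on_trampoline)
			return false;
	return true;
}

/* Model a preempted task that runs again, leaves the trampoline, and
 * later blocks voluntarily. */
static void task_resume_and_block(struct sim_task *t)
{
	t->on_trampoline = false;
	t->passed_quiescent = true;
}
```

In this model, a task that was preempted on the trampoline leaves
sched_style_gp_done() true (its CPU did context-switch) while
trampoline_unused() is still false, which is the BOOM case.
tasks_style_gp_done() stays false until task_resume_and_block() has
run for that task, and by then it is off the trampoline.
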
> >
> > But why would you use dynamic memory allocation for these trampolines at
> > all? Why not use the one default trampoline for this?
>
> That's what ftrace does today.
>
> >
> > Suppose that thing looks like:
> >
> > ftrace_mcount_handler()
> > {
> > 	for_each_hlist_rcu(entry,..)
> > 		entry->func();
> > }
> >
> > so why not make it look like:
> >
> > ftrace_mcount_handler()
> > {
> > 	asm_volatile_goto("jmp %l[do_list]" : : : : do_list);
> > 	return;
> >
> > do_list:
> > 	for_each_hlist_rcu(entry,...)
> > 		entry->func();
> > }
> >
> > Then, for:
> > no entries -> NOP,
> > one entry -> "CALL $func",
> > more entries -> "JMP &do_list".
>
> Except that we don't use jump labels for this, but just update the
> trampoline directly (we've been doing this before jump labels ever
> existed, and the trampoline is all in assembly anyway).
>
> >
> > No need for extra allocations and fancy means of getting rid of them,
> > and only a few bytes extra wrt the existing function.
>
> This doesn't address the issue we want to solve.
>
> Say we have 1000 functions we want to trace with 1000 different
> callbacks. Each of these functions has one callback. How do you solve
> that with your solution? Today, we do the list for every function. That
> is, for each of these 1000 functions, we run through 1000 ops looking
> for the ops that registered for this function. Not very efficient, is it?
>
>
> What we want to do today, is to create a dynamic trampoline for each of
> these 1000 functions. Each function will call a separate trampoline
> that will only call the function that was registered to it. That way,
> we can have 1000 different ops registered to 1000 different functions
> and still have the same performance.
>
> -- Steve
>
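
The 1000-functions example above can be put into numbers with a small
userspace sketch in C. This is only a model under stated assumptions:
struct sim_ops, ops_list, filter_checks, dispatch_list(), and
dispatch_direct() are hypothetical names, and a single integer stands
in for the per-ops hash table that real ftrace consults. It counts how
many ops get examined when each traced function is hit once.

```c
#include <assert.h>
#include <stddef.h>

#define NR_FUNCS 1000	/* one ops per traced function, as in the example */

/* Hypothetical stand-in for an ftrace ops that traces exactly one function. */
struct sim_ops {
	int traced_func;	/* id of the function this ops registered for */
	unsigned long hits;	/* number of times the callback ran */
};

static struct sim_ops ops_list[NR_FUNCS];
static unsigned long filter_checks;	/* how many ops were examined */

/* Register ops i for function i and reset all counters. */
static void sim_init(void)
{
	for (int i = 0; i < NR_FUNCS; i++) {
		ops_list[i].traced_func = i;
		ops_list[i].hits = 0;
	}
	filter_checks = 0;
}

/* List-style dispatch: every traced call walks all registered ops and
 * checks whether each one wants this function. */
static void dispatch_list(int func)
{
	for (int i = 0; i < NR_FUNCS; i++) {
		filter_checks++;
		if (ops_list[i].traced_func == func)
			ops_list[i].hits++;
	}
}

/* Per-ops trampoline: the call site already targets its single ops,
 * so no list walk and no filter check is needed. */
static void dispatch_direct(struct sim_ops *ops)
{
	ops->hits++;
}
```

Calling dispatch_list() once per function examines NR_FUNCS * NR_FUNCS
ops in total, while calling dispatch_direct() once per function
examines none. Each callback still runs exactly once either way.
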