From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
LKML <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Jiri Olsa <jolsa@redhat.com>
Subject: Re: [RFC][PATCH] ftrace: Use schedule_on_each_cpu() as a heavy synchronize_sched()
Date: Wed, 29 May 2013 06:33:15 -0700
Message-ID: <20130529133315.GC6172@linux.vnet.ibm.com>
In-Reply-To: <20130529075249.GC12193@twins.programming.kicks-ass.net>

On Wed, May 29, 2013 at 09:52:49AM +0200, Peter Zijlstra wrote:
> On Tue, May 28, 2013 at 08:01:16PM -0400, Steven Rostedt wrote:
> > The function tracer uses preempt_disable/enable_notrace() for
> > synchronization between reading registered ftrace_ops and unregistering
> > them.
> >
> > Most of the ftrace_ops are global permanent structures that do not
> > require this synchronization. That is, ops may be added and removed from
> > the hlist but are never freed, and won't hurt if a synchronization is
> > missed.
> >
> > But this is not true for dynamically created ftrace_ops or control_ops,
> > which are used by the perf function tracing.
> >
> > The problem here is that the function tracer can be used to trace
> > kernel/user context switches as well as going to and from idle.
> > Basically, it can be used to trace blind spots of the RCU subsystem.
> > This means that even though preempt_disable() is done, a
> > synchronize_sched() will ignore CPUs that haven't made it out of user
> > space or idle. These can include functions that are being traced just
> > before entering or exiting the kernel sections.
>
> Just to be clear, it's the idle part that's a problem, right? Being stuck
> in userspace isn't a problem since if that CPU is in userspace it's
> certainly not got a reference to whatever list entry we're removing.

You got it! The problem is the exact definition of "idle". The way that
it works now is that the idle loop tells RCU when idle starts and ends
by invoking rcu_idle_enter() and rcu_idle_exit(), respectively. Right
now, these calls are in the top-level idle loop. They could in principle
be moved down further, but last time I tried it, it got pretty ugly.

> Now when the CPU really is idle, it's obviously not using tracing either;
> so only the gray area where RCU thinks we're idle but we're not actually
> idle is a problem?

Exactly. And there always will be a grey area, just like the grey area
between being in an interrupt handler and in_irq() knowing about it.

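To make the grey area concrete, here is a minimal userspace sketch in C.
It is a toy model, not kernel code: struct cpu_model and every model_*
function are hypothetical names invented for illustration, and the
grace-period check is reduced to one assumed rule, namely that CPUs which
RCU believes to be idle are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for a userspace model; not kernel code. */
struct cpu_model {
	bool rcu_idle;   /* true between the modeled idle enter and idle exit */
	bool in_reader;  /* true while a traced function runs, preemption off */
};

/* Model of rcu_idle_enter(): from here on, RCU stops watching this CPU. */
static void model_rcu_idle_enter(struct cpu_model *c)
{
	c->rcu_idle = true;
}

/* Model of rcu_idle_exit(): RCU watches this CPU again. */
static void model_rcu_idle_exit(struct cpu_model *c)
{
	c->rcu_idle = false;
}

/*
 * Model of the grace-period decision: CPUs that RCU believes are idle are
 * skipped, so only watched CPUs can hold the grace period open.
 */
static bool model_gp_may_end(const struct cpu_model *cpus, size_t n)
{
	assert(cpus != NULL);
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].rcu_idle)
			continue;       /* ignored: RCU thinks it is idle */
		if (cpus[i].in_reader)
			return false;   /* a watched reader is still running */
	}
	return true;
}

/* The grey area: RCU thinks the CPU is idle, yet a reader is running. */
static bool model_in_grey_area(const struct cpu_model *c)
{
	return c->rcu_idle && c->in_reader;
}
```

In this model, a reader running on a watched CPU makes model_gp_may_end()
return false. Once model_rcu_idle_enter() has been called for that CPU, it
returns true even though the reader is still running, which is the window
that a plain synchronize_sched() would miss.
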
> Is there something a little smarter we can do? Could we use
> on_each_cpu_cond() with a function that checks if the CPU really is
> fully idle?

One recent change that should help is making the _rcuidle variants of
the tracing functions callable from both idle and irq. To make the
on_each_cpu_cond() approach work, event tracing would need to switch
from RCU (which might be preemptible RCU) to RCU-sched (whose read-side
critical sections can pair with on_each_cpu()). I have to defer to Steven
on whether this is a good approach.

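The selection logic behind the on_each_cpu_cond() suggestion can be
sketched as a small userspace model in C. This is only an illustration
under stated assumptions: struct cpu_flags, MODEL_NR_CPUS, and all model_*
names are hypothetical, the callback receives the CPU number purely for
convenience, and nothing here models IPIs, memory ordering, or how the
kernel would actually decide that a CPU is fully idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical per-CPU flags; the kernel keeps nothing in this form. */
struct cpu_flags {
	bool fully_idle;  /* assumed: CPU is so deep in idle it cannot trace */
	bool synced;      /* set once the sync function has run for this CPU */
};

typedef bool (*model_cond_fn)(int cpu, void *info);
typedef void (*model_sync_fn)(int cpu, void *info);

/*
 * Model of a conditional cross-CPU call: run func for every CPU for which
 * cond returns true. Returns the number of CPUs that were touched.
 */
static int model_on_each_cpu_cond(model_cond_fn cond, model_sync_fn func,
				  void *info)
{
	int touched = 0;

	assert(cond != NULL && func != NULL);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!cond(cpu, info))
			continue;  /* fully idle: leave this CPU alone */
		func(cpu, info);
		touched++;
	}
	return touched;
}

/* Condition: only CPUs that are not fully idle need to be disturbed. */
static bool model_cpu_needs_sync(int cpu, void *info)
{
	const struct cpu_flags *flags = info;

	return !flags[cpu].fully_idle;
}

/* Stand-in for an empty sync stub: here it only records that it ran. */
static void model_sync_stub(int cpu, void *info)
{
	struct cpu_flags *flags = info;

	flags[cpu].synced = true;
}
```

Calling model_on_each_cpu_cond(model_cpu_needs_sync, model_sync_stub, flags)
on an array of MODEL_NR_CPUS entries touches only the CPUs whose fully_idle
flag is clear, so fully idle CPUs are not woken just to run the stub.
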
> > To implement the RCU synchronization, instead of using
> > synchronize_sched() the use of schedule_on_each_cpu() is performed. This
> > means that when a dynamically allocated ftrace_ops, or a control ops is
> > being unregistered, all CPUs must be touched and execute a ftrace_sync()
> > stub function via the work queues. This will rip CPUs out from idle or
> > in dynamic tick mode. This only happens when a user disables perf
> > function tracing or other dynamically allocated function tracers, but it
> > allows us to continue to debug RCU and context tracking with function
> > tracing.
>
> I don't suppose there's anything perf can do about this, right? Since
> it's all on user demand we're kinda stuck with dynamic memory.

I believe that Steven's earlier patch using on_each_cpu() solves this
problem.

Thanx, Paul