From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH RFC idle 2/3] arm: Avoid invoking RCU when CPU is idle
Date: Sat, 4 Feb 2012 06:21:23 -0800 [thread overview]
Message-ID: <20120204142123.GA14901@linux.vnet.ibm.com> (raw)
In-Reply-To: <1328297787.5882.203.camel@gandalf.stny.rr.com>
On Fri, Feb 03, 2012 at 02:36:27PM -0500, Steven Rostedt wrote:
> On Fri, 2012-02-03 at 10:41 -0800, Kevin Hilman wrote:
>
> > > How is it a step backwards if it is already broken.
> >
> > Well, I didn't know it was broken. ;) And, as Paul mentioned, this has
> > been broken for a long time. Apparently it's been working well enough
> > for nobody to notice until recently.
> >
> > > Obviously you haven't actually used any tracing here because it
> > > doesn't work right with things as is.
> >
> > It's been working well enough for me to debug several idle path problems
> > with tracing. Admittedly, this has been primarily on UP systems, but
> > I've recently started using the tracing on SMP as well. (however, due
> > to "coupled" low-power states on OMAP, large parts of the idle path are
> > effectively UP since one CPU0 has to wait for CPU1 to hit a low-power
> > state before it can.)
>
> It's used by all users of powertop, and we haven't heard about a bug
> yet. This doesn't mean that the bug doesn't exist. The race is extremely
> hard to hit. It's one of those "good bugs". You know, the kind that you
> don't really have to worry about because you are more likely to win the
> lottery, become President of the United States, and find a cure for
> cancer (all those together, not just one) than the chance of hitting
> this bug. But it's a bug regardless and should, unfortunately, be fixed.
>
> But here's the explanation of the bug:
>
> As Paul has stated, when rcu_idle_enter() is in effect, the calls to
> rcu_read_lock_* are ignored. Thus we can pretend they don't exist.
>
> The code in question is the __DO_TRACE() in include/linux/tracepoint.h:
>
> rcu_read_lock_sched_notrace(); \
> it_func_ptr = rcu_dereference_sched((tp)->funcs); \
> if (it_func_ptr) { \
> do { \
> it_func = (it_func_ptr)->func; \
> __data = (it_func_ptr)->data; \
> ((void(*)(proto))(it_func))(args); \
> } while ((++it_func_ptr)->func); \
> } \
> rcu_read_unlock_sched_notrace();
>
> As stated above, the rcu_read_(un)lock_sched_notrace() are worthless
> when in rcu_idle_enter().
>
> They protect the referencing of tp->funcs, which is an array of all
> funcs that are attached to this tracepoint.
>
> Now we need to look at kernel/tracepoint.c:
>
> The protection is needed against a simultaneous insertion or deletion of
> a tracepoint hook. This happens when a user enables or disables tracing.
>
> Note, this race is even made harder to hit, because due to the static
> branch that controls whether this gets called, will be off if no
> tracepoints are attached. So the race can only happen after at least one
> tracepoint is active.
I agree that this race is hard to hit when running Linux on bare metal.
But consider a Linux kernel running as a guest OS. Then the host might
preempt the guest in the middle of a tracepoint. Then from the guest OS's
viewpoint, that VCPU has just stopped, possibly for a very long time --
easily long enough for all the other VCPUs to pass through quiescent
states. And the guest OS is ignoring that VCPU, so a too-short grace
period could easily happen in this scenario.
Thanx, Paul
> But if two probes are are added to this tracepoint, then we can hit the
> race. And it is possible to trigger with only one probe on removal.
>
> When adding or removing a tracepoint, the array (the one that
> it_func_ptr points to) is updated by allocating a new array, copying the
> old array plus or minus the tracpoint being added or removed, setting
> the tp->funcs to the new array, and then it calls call_rcu_sched() to
> free it.
>
> Now for the bug to hit, something had to be coming in or out of idle,
> and jumping to this code. Between the time it got the it_func_ptr to the
> time it accessed any of that pointer's data in the loop, the tp->func
> had to be updated to the new array, and then all CPUs would have passed
> a schedule point (except the rcu_idle CPUs).
>
> On uniprocessor, this is not an issue, but on SMP, it is possible that
> with two CPUs the first being in rcu_idle may be ignored, and the second
> would have been adding the tracepoint and then going directly to freeing
> the code. But as tracepoints are very low weight, it is most likely that
> the tracepoints will finish before the first could even free the memory.
>
> But the chance does exist. As the chance of me winning the lottery,
> becoming President of the United States, and curing cancer also exists!
>
> ;-)
>
> -- Steve
>
>
next prev parent reply other threads:[~2012-02-04 14:21 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20120202004253.GA10946@linux.vnet.ibm.com>
[not found] ` <1328143404-11038-1-git-send-email-paulmck@linux.vnet.ibm.com>
2012-02-02 0:43 ` [PATCH RFC idle 2/3] arm: Avoid invoking RCU when CPU is idle Paul E. McKenney
2012-02-02 2:48 ` Rob Herring
2012-02-02 4:40 ` Paul E. McKenney
2012-02-02 3:49 ` Nicolas Pitre
2012-02-02 4:44 ` Paul E. McKenney
2012-02-02 17:13 ` Nicolas Pitre
2012-02-02 17:43 ` Paul E. McKenney
2012-02-02 18:31 ` Nicolas Pitre
2012-02-02 19:07 ` Paul E. McKenney
2012-02-02 22:20 ` Kevin Hilman
2012-02-02 22:49 ` Rob Herring
2012-02-02 23:03 ` Steven Rostedt
2012-02-02 23:27 ` Paul E. McKenney
2012-02-02 23:51 ` Paul E. McKenney
2012-02-03 2:45 ` Steven Rostedt
2012-02-03 6:04 ` Paul E. McKenney
2012-02-03 18:55 ` Steven Rostedt
2012-02-03 19:40 ` Paul E. McKenney
2012-02-03 20:02 ` Steven Rostedt
2012-02-03 20:23 ` Paul E. McKenney
2012-02-06 21:18 ` [PATCH][RFC] tracing/rcu: Add trace_##name##__rcuidle() static tracepoint for inside rcu_idle_exit() sections Steven Rostedt
2012-02-06 23:38 ` Paul E. McKenney
2012-02-07 12:32 ` Steven Rostedt
2012-02-07 14:11 ` Paul E. McKenney
2012-02-08 13:57 ` Frederic Weisbecker
2012-02-07 14:40 ` Josh Triplett
[not found] ` <20120206220502.GA21340@leaf>
2012-02-07 0:36 ` Steven Rostedt
[not found] ` <20120203025350.GF13456@leaf>
2012-02-03 6:06 ` [PATCH RFC idle 2/3] arm: Avoid invoking RCU when CPU is idle Paul E. McKenney
2012-02-02 23:39 ` Rob Herring
2012-02-03 18:41 ` Kevin Hilman
2012-02-03 19:26 ` Paul E. McKenney
2012-02-03 19:36 ` Steven Rostedt
2012-02-04 14:21 ` Paul E. McKenney [this message]
2012-02-06 19:32 ` Steven Rostedt
2012-02-02 23:03 ` Paul E. McKenney
2012-02-03 19:12 ` Kevin Hilman
2012-02-03 19:26 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120204142123.GA14901@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).