From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Subject: Re: [PATCH] tracing: add trace_event_read_lock()
Date: Tue, 19 May 2009 21:38:33 -0700 [thread overview]
Message-ID: <20090520043833.GB6925@linux.vnet.ibm.com> (raw)
In-Reply-To: <20090520005948.GF6066@nowhere>
On Wed, May 20, 2009 at 02:59:50AM +0200, Frederic Weisbecker wrote:
> On Tue, May 19, 2009 at 05:38:53AM -0700, Paul E. McKenney wrote:
> > On Tue, May 19, 2009 at 01:15:44PM +0800, Lai Jiangshan wrote:
> > > Paul E. McKenney wrote:
> > > > On Mon, May 18, 2009 at 03:59:31PM +0200, Frederic Weisbecker wrote:
> > > >> On Mon, May 18, 2009 at 07:35:34PM +0800, Lai Jiangshan wrote:
> > > >>> I found that there is nothing to protect event_hash in
> > > >>> ftrace_find_event().
> > > >> Actually, rcu protects it, but not enough. We have neither
> > > >> synchronize_rcu() nor rcu_read_lock.
> > > >>
> > > >> So we protect against concurrent hlist accesses.
> > > >> But the event can be removed when a module is unloaded,
> > > >> and that can happen between the time we get the event output
> > > >> callback and the time we actually use it.
> > > >
> > > > I will ask the stupid question... Would invoking rcu_barrier() in the
> > > > module-exit function take care of this? The rcu_barrier() primitive
> > > > waits for all in-flight RCU callbacks to complete execution.
> > > >
> > > > Thanx, Paul
> > > >
> > >
> > > We have no call_rcu* in it.
> >
> > OK, I will ask the next stupid question... Is it possible to use the
> > same trick rcu_barrier() uses, but for your events?
> >
> > Thanx, Paul
>
>
> Hi Paul,
>
> I think you're right, we could use RCU to solve this race.
>
> The current race window to solve is as follows:
>
>
> --Task A--
>
> print_trace_line(trace) {
> event = find_ftrace_event(trace) {
> hlist_for_each_entry_rcu(....) {
> ...
> return found;
> }
> }
>
>
> --Task B--
>
> /*
> * Can be quite confusing.
> * But we have "trace events" which are global objects
> * handling tracing, output, everything.
> * And we have also "ftrace events" which are unique
> * identifiers on traces than make them able to retrieve
> * their associated output callbacks.
> * So a "trace event" has an associated "ftrace event" to output
> * itself.
> */
>
> trace_module_remove_events(mod) {
> list_trace_events_module(ev, mod) {
> unregister_ftrace_event(ev->event) {
> // mutex protected
> hlist_del(ev->event->node)
> list_del(....)
> //
> }
> }
> }
> //module removed
Ouch!!! ;-)
> -- Task A--
>
> event->print(trace)
>
> KABOOM! print callback was on the module. event doesn't even exist anymore.
>
> So Lai's solution is correct to fix it.
> But rcu is also valid.
>
> Currently rcu only protects the hashlist but not the callback until
> the job is complete.
>
> What we could do is:
>
> (Read side)
>
> rcu_read_lock();
>
> event = find_ftrace_event();
> event->print(trace);
As long as event->print() never blocks...
Though you could use SRCU if it might block:
rcu_read_lock() -> i = srcu_read_lock(&my_srcu_struct);
rcu_read_unlock() -> srcu_read_unlock(&my_srcu_struct, i);
synchronize_rcu() -> synchronize_srcu(&my_srcu_struct);
> rcu_read_unlock()
>
> (Write side)
>
> unregister_ftrace_event(e)
> {
> mutex_lock(&writer);
> hlist_del(e);
> list_del(e);
> mutex_unlock(&writer);
>
> // Wait for every printing grace periods
> synchronize_rcu();
> }
>
>
> Then the only overhead is a preempt_disable()/preempt_enable()
> for each trace to be printed. Just a sub/add pair and quite few
> preemption disabled latency, just the time for a hashlist search
> and some work done to format a trace to be printed.
> But if latency does really matters, we have RCU_PREEMPT.
>
> On the other side, the down_read() is an extra call but it is called
> only once for a whole loop of traces seeking.
>
> I guess both solutions are valid.
> What do you think?
It could go either way. As a very rough rule of thumb, if the down_read()
happens seldom, there is little advantage to RCU. On the other hand,
if the down_read() could happen often enough that a new reader shows up
before the previous down_read() completes, then you really need RCU or
something like it.
On most systems, if down_read() was happening more than a few million
times per second, life might be hard. On the other hand, if down_read()
never happened more than a few tens of thousand times per second or so,
should be no problem.
Since it sounds like a single down_read() substitutes for a potentially
large number of RCU read-side critical sections, I do not believe that
contention-free read-side overhead would be a problem.
Does that help?
Thanx, Paul
next prev parent reply other threads:[~2009-05-25 22:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-05-18 11:35 [PATCH] tracing: add trace_event_read_lock() Lai Jiangshan
2009-05-18 13:59 ` Frederic Weisbecker
2009-05-19 0:35 ` Paul E. McKenney
2009-05-19 5:15 ` Lai Jiangshan
2009-05-19 12:38 ` Paul E. McKenney
2009-05-20 0:59 ` Frederic Weisbecker
2009-05-20 4:38 ` Paul E. McKenney [this message]
2009-05-19 2:05 ` Lai Jiangshan
2009-05-20 0:24 ` Frederic Weisbecker
2009-05-20 2:25 ` Lai Jiangshan
2009-05-20 15:41 ` Paul E. McKenney
2009-05-20 16:04 ` Steven Rostedt
2009-05-27 22:34 ` [tip:tracing/core] " tip-bot for Lai Jiangshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090520043833.GB6925@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=fweisbec@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.