From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Joe Korty <joe.korty@ccur.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
"mathieu.desnoyers@efficios.com" <mathieu.desnoyers@efficios.com>,
"dhowells@redhat.com" <dhowells@redhat.com>,
"loic.minier@linaro.org" <loic.minier@linaro.org>,
"dhaval.giani@gmail.com" <dhaval.giani@gmail.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"josh@joshtriplett.org" <josh@joshtriplett.org>,
"houston.jim@comcast.net" <houston.jim@comcast.net>
Subject: Re: [PATCH] An RCU for SMP with a single CPU garbage collector
Date: Sat, 12 Mar 2011 22:09:22 -0800 [thread overview]
Message-ID: <20110313060922.GX2234@linux.vnet.ibm.com> (raw)
In-Reply-To: <20110313012504.GB14518@tsunami.ccur.com>
On Sat, Mar 12, 2011 at 08:25:04PM -0500, Joe Korty wrote:
> On Sat, Mar 12, 2011 at 09:36:53AM -0500, Paul E. McKenney wrote:
> > Hello, Joe,
> >
> > My biggest question is "what does JRCU do that Frederic's patchset
> > does not do?" I am not seeing it at the moment. Given that Frederic's
> > patchset integrates into RCU, thus providing the full RCU API, I really
> > need a good answer to consider JRCU.
>
> Well, it's tiny, it's fast, and it does exactly one thing
> and does that really well. If a user doesn't need that
> one thing they shouldn't use JRCU. But mostly it is an
> exciting thought-experiment on another interesting way to
> do RCU. Who knows, maybe it may end up being better than
> for what it was aimed at.
Fair enough! From my perspective, I am likely to learn something from
watching you work on it, and it is reasonably likely that you will
come up with something useful for the existing RCU implementations.
And who knows what you might come up with?
> > For one, what sort of validation have you done?
> >
> > Thanx, Paul
>
> Not much, I'm writing the code and sending it out for
> comment. And it is currently missing many of the tweaks
> needed to make it a production RCU.
Ah, OK. I strongly recommend rcutorture.
> >> +struct rcu_data {
> >> + u8 wait; /* goes false when this cpu consents to
> >> + * the retirement of the current batch */
> >> + u8 which; /* selects the current callback list */
> >> + struct rcu_list cblist[2]; /* current & previous callback lists */
> >> +} ____cacheline_aligned_in_smp;
> >> +
> >> +static struct rcu_data rcu_data[NR_CPUS];
> >
> > Why not DEFINE_PER_CPU(struct rcu_data, rcu_data)?
>
> All part of being lockless. I didn't want to have to tie
> into cpu onlining and offlining and wanted to eliminate
> sprinking special tests and/or online locks throughout
> the code. Also, note the single for_each_present_cpu(cpu)
> statement in JRCU .. this loops over all offline cpus and
> gradually expires any residuals they have left behind.
OK, but the per-CPU variables do not come and go with the CPUs.
So I am still not seeing how the array is helping compared to
per-CPU variables.
> >> +/*
> >> + * Return our CPU id or zero if we are too early in the boot process to
> >> + * know what that is. For RCU to work correctly, a cpu named '0' must
> >> + * eventually be present (but need not ever be online).
> >> + */
> >> +static inline int rcu_cpu(void)
> >> +{
> >> + return current_thread_info()->cpu;
> >
> > OK, I'll bite... Why not smp_processor_id()?
>
> Until recently, it was :) but it was a multiline thing,
> with 'if' stmts and such, to handle early boot conditions
> when smp_processor_id() isn't valid.
>
> JRCU, perhaps quixotically, tries to do something
> meaningful all the way back to the first microsecond of
> existance, when the CPU is switched from 16 to 32 bit mode.
> In that early epoch, things like 'cpus' and 'interrupts'
> and 'tasks' don't quite yet exist in the form we are used
> to for them.
OK.
> > And what to do about the architectures that put the CPU number somewhere
> > else?
>
> I confess I keep forgetting to look at that other 21 or
> so other architectures, I had thought they all had ->cpu.
> I look into it and, at least for those, reintroduce the
> old smp_processor_id() expression.
But those that have ->cpu can safely use smp_processor_id(), right?
> >> +void rcu_barrier(void)
> >> +{
> >> + struct rcu_synchronize rcu;
> >> +
> >> + if (!rcu_scheduler_active)
> >> + return;
> >> +
> >> + init_completion(&rcu.completion);
> >> + call_rcu(&rcu.head, wakeme_after_rcu);
> >> + wait_for_completion(&rcu.completion);
> >> + atomic_inc(&rcu_stats.nbarriers);
> >> +
> >> +}
> >> +EXPORT_SYMBOL_GPL(rcu_barrier);
> >
> > The rcu_barrier() function must wait on all RCU callbacks, regardless of
> > which CPU they are queued on. This is important when unloading modules
> > that use call_rcu(). In contrast, the above looks to me like it waits
> > only on the current CPU's callbacks.
>
> Oops. I'll come up with an alternate mechanism. Thanks for finding this.
NP. ;-)
> > So, what am I missing?
>
> Nothing. You were right :)
>
> >> + /*
> >> + * Swap current and previous lists. Other cpus must not see this
> >> + * out-of-order w.r.t. the just-completed plist init, hence the above
> >> + * smp_wmb().
> >> + */
> >> + rd->which++;
> >
> > You do seem to have interrupts disabled when sampling ->which, but
> > this is not safe for cross-CPU accesses to ->which, right? The other
> > CPU might queue onto the wrong element. This would mean that you
> > would not be guaranteed a full 50ms delay from quiescent state to
> > corresponding RCU callback invocation.
> >
> > Or am I missing something subtle here?
>
> JRCU expects updates to the old queue to continue for a
> while, it only requires that they end and a trailing wmb
> be fully executed before the next sampling period ends.
OK. I remain skeptical, but mainly because similar setups have proven
buggy in the past.
> >> + /*
> >> + * End the current RCU batch and start a new one.
> >> + */
> >> + for_each_present_cpu(cpu) {
> >> + rd = &rcu_data[cpu];
> >
> > And here we get the cross-CPU accesses that I was worried about above.
>
> Yep. This is one of the trio of reasons why JRCU is for
> small SMP systems. It's the tradeoff I made to move the
> entire RCU load off onto one CPU. If that is not important
> (and it won't be to any but to specialized systems), one
> is expected to use RCU_TREE.
OK, so here is one of the things that you are trying to do with JRCU that
the current in-kernel RCU implementations do not do, namely, cause RCU
callbacks to be consistently invoked on some other CPU. If I remember
correctly, Frederic's changes do not support this either, because his
workloads run almost entirely in user space, and are therefore not
expected to generate RCU callbacks.
Oddly enough, user-space RCU -does- support this notion. ;-)
> The other two of the trio of reasons: doing kfree's on the
> 'wrong' cpu puts the freed buffer in the 'wrong' per-cpu
> free queue, and putting all the load on one cpu means
> that cpu could hit 100% cpu utilization just doing rcu
> callbacks, for systems with thousands of cpus and have
> the io fabrics necessary to keep those cpus busy.
That would be one reason I have been reluctant to implement callback
offloading in advance of a definite need!
Though I must admit that I am unsure how JRCU's call_rcu() implementation
does this safely -- it looks to me like it is only excluding irqs,
not other CPUs.
Thanx, Paul
prev parent reply other threads:[~2011-03-13 6:09 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-04 23:21 dyntick-hpc and RCU Paul E. McKenney
2010-11-05 5:27 ` Frederic Weisbecker
2010-11-05 5:38 ` Frederic Weisbecker
2010-11-05 15:06 ` Paul E. McKenney
2010-11-05 20:06 ` Dhaval Giani
2010-11-05 15:04 ` Paul E. McKenney
2010-11-08 14:10 ` Frederic Weisbecker
2010-11-05 21:00 ` [PATCH] a local-timer-free version of RCU Joe Korty
2010-11-06 19:28 ` Paul E. McKenney
2010-11-06 19:34 ` Mathieu Desnoyers
2010-11-06 19:42 ` Mathieu Desnoyers
2010-11-06 19:44 ` Paul E. McKenney
2010-11-08 2:11 ` Udo A. Steinberg
2010-11-08 2:19 ` Udo A. Steinberg
2010-11-08 2:54 ` Paul E. McKenney
2010-11-08 15:32 ` Frederic Weisbecker
2010-11-08 19:38 ` Paul E. McKenney
2010-11-08 20:40 ` Frederic Weisbecker
2010-11-10 18:08 ` Paul E. McKenney
2010-11-08 15:06 ` Frederic Weisbecker
2010-11-08 15:18 ` Joe Korty
2010-11-08 19:50 ` Paul E. McKenney
2010-11-08 19:49 ` Paul E. McKenney
2010-11-08 20:51 ` Frederic Weisbecker
2010-11-06 20:03 ` Mathieu Desnoyers
2010-11-09 9:22 ` Lai Jiangshan
2010-11-10 15:54 ` Frederic Weisbecker
2010-11-10 17:31 ` Peter Zijlstra
2010-11-10 17:45 ` Frederic Weisbecker
2010-11-11 4:19 ` Paul E. McKenney
2010-11-13 22:30 ` Frederic Weisbecker
2010-11-16 1:28 ` Paul E. McKenney
2010-11-16 13:52 ` Frederic Weisbecker
2010-11-16 15:51 ` Paul E. McKenney
2010-11-17 0:52 ` Frederic Weisbecker
2010-11-17 1:25 ` Paul E. McKenney
2011-03-07 20:31 ` [PATCH] An RCU for SMP with a single CPU garbage collector Joe Korty
[not found] ` <20110307210157.GG3104@linux.vnet.ibm.com>
2011-03-07 21:16 ` Joe Korty
2011-03-07 21:33 ` Joe Korty
2011-03-07 22:51 ` Joe Korty
2011-03-08 9:07 ` Paul E. McKenney
2011-03-08 15:57 ` Joe Korty
2011-03-08 22:53 ` Joe Korty
2011-03-10 0:30 ` Paul E. McKenney
2011-03-10 0:28 ` Paul E. McKenney
2011-03-09 22:29 ` Frederic Weisbecker
2011-03-09 22:15 ` [PATCH 2/4] jrcu: tap rcu_read_unlock Joe Korty
2011-03-10 0:34 ` Paul E. McKenney
2011-03-10 19:50 ` JRCU Theory of Operation Joe Korty
2011-03-12 14:36 ` Paul E. McKenney
2011-03-13 0:43 ` Joe Korty
2011-03-13 5:56 ` Paul E. McKenney
2011-03-13 23:53 ` Joe Korty
2011-03-14 0:50 ` Paul E. McKenney
2011-03-14 0:55 ` Josh Triplett
2011-03-09 22:16 ` [PATCH 3/4] jrcu: tap might_resched() Joe Korty
2011-03-09 22:17 ` [PATCH 4/4] jrcu: add new stat to /sys/kernel/debug/rcu/rcudata Joe Korty
2011-03-09 22:19 ` [PATCH 1/4] jrcu: remove preempt_enable() tap [resend] Joe Korty
2011-03-12 14:36 ` [PATCH] An RCU for SMP with a single CPU garbage collector Paul E. McKenney
2011-03-13 1:25 ` Joe Korty
2011-03-13 6:09 ` Paul E. McKenney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110313060922.GX2234@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=dhaval.giani@gmail.com \
--cc=dhowells@redhat.com \
--cc=fweisbec@gmail.com \
--cc=houston.jim@comcast.net \
--cc=joe.korty@ccur.com \
--cc=josh@joshtriplett.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=loic.minier@linaro.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).