From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
akpm@linux-foundation.org, oleg@tv-sign.ru, dipankar@in.ibm.com,
rostedt@goodmis.org, dvhltc@us.ibm.com, niv@us.ibm.com
Subject: Re: [PATCH tip/core/rcu] classic RCU locking and memory-barrier cleanups
Date: Wed, 6 Aug 2008 20:18:06 -0700
Message-ID: <20080807031806.GA6910@linux.vnet.ibm.com>
In-Reply-To: <489936E5.7020509@colorfullife.com>

On Wed, Aug 06, 2008 at 07:30:13AM +0200, Manfred Spraul wrote:
> Hi Paul,
>
> Paul E. McKenney wrote:
>> This patch is in preparation for moving to a hierarchical
>> algorithm to allow the very large SMP machines -- requested by some
>> people at OLS, and there seem to have been a few recent patches in the
>> 4096-CPU direction as well.
>
> I thought about hierarchical RCU, but I never found the time to implement
> it.
> Do you have a concept in mind?
Actually, you did submit a patch for a two-level hierarchy some years
back:
http://marc.theaimsgroup.com/?l=linux-kernel&m=108546384711797&w=2
I am looking to allow multiple levels to accommodate 4096 CPUs, which
pushes me towards locking on the nodes in the hierarchy. I have
a roughed-out design that (I hope!) avoids deadlock and that allows
adapting to machine topology. I am also trying to minimize the amount
of arch-specific code needed to construct the hierarchy -- hopefully
just a pair of config parameters.
More as it starts working...
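
For illustration only, here is a minimal sketch of how per-node locking in a multi-level hierarchy might look. It is not the roughed-out design mentioned above, which this message does not describe. FANOUT and NCPUS stand in for the "pair of config parameters", and every name (qs_node, qsmask, grpmask, report_quiescent_state) is hypothetical. Each CPU clears its bit in a leaf node; whichever CPU clears the last bit in a node carries the report up to the parent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FANOUT  4                   /* hypothetical config parameter */
#define NCPUS   16                  /* hypothetical config parameter */
#define NLEAVES (NCPUS / FANOUT)

struct qs_node {
	atomic_flag lock;           /* protects qsmask */
	unsigned long qsmask;       /* children that still owe a quiescent state */
	unsigned long grpmask;      /* this node's bit in parent->qsmask */
	struct qs_node *parent;     /* NULL for the root */
};

static struct qs_node root;
static struct qs_node leaf[NLEAVES];

static void node_lock(struct qs_node *n)
{
	while (atomic_flag_test_and_set(&n->lock))
		;                   /* spin until the flag was previously clear */
}

static void node_unlock(struct qs_node *n)
{
	atomic_flag_clear(&n->lock);
}

/* Mark every CPU and every leaf as owing a quiescent state. */
static void start_grace_period(void)
{
	atomic_flag_clear(&root.lock);
	root.qsmask = (1UL << NLEAVES) - 1;
	root.grpmask = 0;
	root.parent = NULL;
	for (int i = 0; i < NLEAVES; i++) {
		atomic_flag_clear(&leaf[i].lock);
		leaf[i].qsmask = (1UL << FANOUT) - 1;
		leaf[i].grpmask = 1UL << i;
		leaf[i].parent = &root;
	}
}

/* Returns 1 if this report was the one that ended the grace period. */
static int report_quiescent_state(int cpu)
{
	struct qs_node *n;
	unsigned long mask;

	assert(cpu >= 0 && cpu < NCPUS);
	n = &leaf[cpu / FANOUT];
	mask = 1UL << (cpu % FANOUT);

	for (;;) {
		struct qs_node *parent;
		int already_reported, node_done;

		node_lock(n);
		already_reported = !(n->qsmask & mask);
		n->qsmask &= ~mask;
		node_done = (n->qsmask == 0);
		mask = n->grpmask;
		parent = n->parent;
		node_unlock(n);     /* never hold two node locks at once */

		if (already_reported || !node_done)
			return 0;
		if (!parent)
			return 1;   /* root mask is empty: everyone has reported */
		n = parent;
	}
}
```

In this sketch a CPU holds at most one node lock at any time, which is one simple way to rule out deadlock between nodes. Most reports touch only a leaf lock, so contention on the root grows with the number of leaves rather than the number of CPUs.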
> Right now, I try to understand the current code first - and some of it
> doesn't make much sense.
>
> There are three per-cpu lists:
> ->nxt
> ->cur
> ->done.
>
> Obviously, there must be a quiescent state between cur and done.
> But why does the code require a quiescent state between nxt and cur?
> I think that's superfluous. The only thing that is required is that all cpus
> have moved their callbacks from nxt to cur. That doesn't need a quiescent
> state, this operation could be done in hard interrupt as well.
The deal is that we have to put incoming callbacks somewhere while
the batch in ->cur waits for an RCU grace period. That somewhere is
->nxt. So to be painfully pedantic, the callbacks in ->nxt are not
waiting for an RCU grace period. Instead, they are waiting for the
callbacks in ->cur to get out of the way.
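
A minimal sketch of that three-list arrangement, assuming singly linked lists with tail pointers. The list names follow the ->nxt, ->cur and ->done fields discussed here, but the struct layout, the helper names and the count_cb callback are hypothetical and not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

struct cb {
	struct cb *next;
	void (*func)(struct cb *cb);
};

struct cb_list {
	struct cb *head;
	struct cb **tail;           /* points at the last ->next, or at head */
};

struct cpu_cbs {
	struct cb_list nxt;         /* waiting for ->cur to get out of the way */
	struct cb_list cur;         /* waiting for a grace period to end */
	struct cb_list done;        /* grace period over, ready to invoke */
};

static void list_init(struct cb_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

static void cpu_cbs_init(struct cpu_cbs *c)
{
	list_init(&c->nxt);
	list_init(&c->cur);
	list_init(&c->done);
}

/* Append everything on 'from' to 'to', leaving 'from' empty. */
static void list_splice(struct cb_list *to, struct cb_list *from)
{
	if (!from->head)
		return;
	*to->tail = from->head;
	to->tail = from->tail;
	list_init(from);
}

/* New callbacks always land on ->nxt. */
static void enqueue(struct cpu_cbs *c, struct cb *cb, void (*func)(struct cb *))
{
	assert(cb != NULL && func != NULL);
	cb->next = NULL;
	cb->func = func;
	*c->nxt.tail = cb;
	c->nxt.tail = &cb->next;
}

/* ->nxt may advance only once ->cur is empty. Returns 1 if ->cur was filled. */
static int advance(struct cpu_cbs *c)
{
	if (c->cur.head)
		return 0;
	list_splice(&c->cur, &c->nxt);
	return c->cur.head != NULL;
}

/* The grace period that ->cur was waiting for has ended. */
static void grace_period_ended(struct cpu_cbs *c)
{
	list_splice(&c->done, &c->cur);
}

/* Invoke and drop everything on ->done. Returns the number invoked. */
static int invoke_done(struct cpu_cbs *c)
{
	struct cb *cb = c->done.head;
	int n = 0;

	list_init(&c->done);
	while (cb) {
		struct cb *next = cb->next;

		cb->func(cb);
		cb = next;
		n++;
	}
	return n;
}

/* Illustrative callback: just counts invocations. */
static int invoked;
static void count_cb(struct cb *cb)
{
	(void)cb;
	invoked++;
}
```

A callback that arrives while ->cur is occupied stays on ->nxt until grace_period_ended() has emptied ->cur, which is the "waiting for ->cur to get out of the way" case.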
> Thus I think this should work:
>
> 1) A callback is inserted into ->nxt.
Yep.
> 2) As soon as too many objects are sitting in the ->nxt lists, a new rcu
> cycle is started.
Yep, call_rcu() and friends now do this. (In response to
denial-of-service attacks some years back.)
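
A minimal sketch of the idea, assuming a per-CPU queue-length counter and a fixed high-water mark. QHIMARK, struct cpu_queue and request_grace_period() are hypothetical names chosen for illustration, and the threshold value is arbitrary.

```c
#include <assert.h>
#include <limits.h>

#define QHIMARK 10000               /* hypothetical high-water mark */

struct cpu_queue {
	long qlen;                  /* callbacks queued on this CPU */
	int gp_requested;           /* set once a grace period has been asked for */
};

/* Stand-in for whatever actually kicks off or hurries a grace period. */
static void request_grace_period(struct cpu_queue *q)
{
	q->gp_requested = 1;
}

/* Bookkeeping done each time a callback is queued. */
static void enqueue_callback(struct cpu_queue *q)
{
	assert(q->qlen < LONG_MAX);
	if (++q->qlen > QHIMARK)
		request_grace_period(q);
}
```

The point of the threshold is that a flood of callbacks cannot pile up indefinitely waiting for a grace period that nobody has asked for.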
> 3) As soon as a cpu sees that a new rcu cycle is started, it moves its
> callbacks from ->nxt to ->cur. No checks for hard_irq_count & friends
> necessary. Especially: same rule for _bh and normal.
Yep. The checks for hard_irq_count are instead intended to determine
if this CPU is already in a quiescent state for the newly started RCU
grace period. As long as we took the scheduling clock interrupt,
we might as well get our money's worth, right?
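
A simplified model of that check, assuming the scheduling clock interrupt can tell what it interrupted. The struct, its fields and the function name are hypothetical, and the rules are a sketch of the "different rules for _bh and normal" idea rather than a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* What the scheduling clock interrupt found when it arrived. */
struct tick_context {
	bool from_user;             /* interrupted user-mode execution */
	bool idle;                  /* interrupted the idle loop */
	bool in_softirq;            /* interrupted softirq processing */
	int hardirq_depth;          /* hard-irq nesting, counting this tick */
};

enum qs_kind {
	QS_NONE   = 0,
	QS_BH     = 1,              /* quiescent state for the _bh flavor */
	QS_NORMAL = 2               /* quiescent state for the normal flavor */
};

/* Returns a bitmask of the quiescent states this tick can report. */
static unsigned int quiescent_states_at_tick(const struct tick_context *t)
{
	assert(t->hardirq_depth >= 1);

	/* User mode, or idle with no interrupted irq or softirq handler. */
	if (t->from_user ||
	    (t->idle && !t->in_softirq && t->hardirq_depth <= 1))
		return QS_NORMAL | QS_BH;

	/* Kernel code outside softirq: quiescent for _bh only. */
	if (!t->in_softirq)
		return QS_BH;

	return QS_NONE;
}
```

The test is cheap because the interrupt has already been taken; it only inspects state that the interrupt entry path records anyway.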
> 4) As soon as all cpus have moved their lists from ->nxt to ->cur, the real
> grace period is started.
Jiangshan took a slightly different approach to handling this situation,
but yes, more or less. The trick is that the processing in (4) for
->nxt is overlapped with the processing in (5) for ->cur.
> 5) As soon as all cpus passed a quiescent state (i.e.: now with tests for
> hard_irq_count, different rules for _bh and normal), the list is moved from
> ->cur to ->completed. Once in completed, they can be destroyed by
> performing the callbacks.
To ->done rather than ->completed, but yes.
> What do you think? Would that work? It doesn't make much sense that step 3)
> tests for a quiescent state.
The trick is that the work for grace period n and grace period n+1
are overlapped.
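
A minimal sketch of the bookkeeping behind that overlap, assuming two global counters: the number of the most recently started grace period and the number of the most recently completed one. The struct and function names are hypothetical.

```c
#include <assert.h>

struct gp_state {
	long cur;                   /* most recently started grace period */
	long completed;             /* most recently completed grace period */
};

/*
 * Grace period that a batch queued right now must wait for. If grace
 * period 'cur' is already in flight, it may have started before these
 * callbacks were queued, so only the next one counts. If nothing is in
 * flight, the next one to start is also cur + 1.
 */
static long gp_needed(const struct gp_state *gs)
{
	return gs->cur + 1;
}

static int gp_done(const struct gp_state *gs, long needed)
{
	return gs->completed >= needed;
}

static void start_gp(struct gp_state *gs)
{
	assert(gs->cur == gs->completed);   /* one grace period at a time */
	gs->cur++;
}

static void end_gp(struct gp_state *gs)
{
	gs->completed = gs->cur;
}
```

While grace period n is running for one batch, newly queued callbacks are already being tagged for grace period n+1, so collecting the next batch and waiting on the current one proceed at the same time.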
> Step 2) could depend on memory pressure.
Yep.
> Steps 3) and 4) could be accelerated by force_quiescent_state(), if the
> memory pressure is too high.
Yep -- though we haven't done this except on paper.
Thanx, Paul
> --
> Manfred