public inbox for linux-kernel@vger.kernel.org
* [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf
@ 2009-10-14 17:15 Paul E. McKenney
  2009-10-14 17:15 ` [PATCH tip/core/rcu 1/6] rcu: Update trace.txt documentation to reflect recent changes Paul E. McKenney
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, npiggin,
	jens.axboe

This patchset contains a bug fix, a performance improvement, and
documentation updates:

o	Update Documentation/RCU/trace.txt to reflect recent changes
	(including the removal of rcupreempt.c).

o	Fix the severe performance problem of excessive IPIs and
	lock contention in the presence of very large (but legal)
	numbers of RCU callbacks (reported by Nick Piggin).

o	Stopgap fix for a bug in TREE_PREEMPT_RCU's implementation of
	synchronize_rcu_expedited().  This fix is correct, but no faster
	than synchronize_rcu().

o	Add exports for the updated synchronize_rcu_expedited()
	implementation, which moved from a static inline in
	include/linux/rcutree.h to a separately compiled function
	in kernel/rcutree_plugin.h.

o	Add the new rnp->blocked_tasks field to the rcuhier trace file
	in debugfs.

o	Update the Documentation/RCU/trace.txt documentation to include
	the rnp->blocked_tasks tracing.

I believe that this is 2.6.32 material.

 Documentation/RCU/trace.txt   |   22 ++-
 b/Documentation/RCU/trace.txt |  232 +++++-------------------------------------
 b/include/linux/rcutree.h     |    6 -
 b/kernel/rcutree.c            |   29 ++++-
 b/kernel/rcutree.h            |    5 
 b/kernel/rcutree_plugin.h     |   20 +++
 b/kernel/rcutree_trace.c      |    8 -
 kernel/rcutree_plugin.h       |    3 
 8 files changed, 103 insertions(+), 222 deletions(-)


* [PATCH tip/core/rcu 1/6] rcu: Update trace.txt documentation to reflect recent changes
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
@ 2009-10-14 17:15 ` Paul E. McKenney
  2009-10-15  9:25   ` [tip:core/rcu] " tip-bot for Paul E. McKenney
  2009-10-14 17:15 ` [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load Paul E. McKenney
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, npiggin,
	jens.axboe, Paul E. McKenney

o	Remove the CONFIG_PREEMPT_RCU documentation since this
	config option has now been removed.

o	Change the now-incorrect references to "rcu" labels to
	instead be "rcu_sched".

o	Add notes stating that CONFIG_TREE_PREEMPT_RCU kernels will
	have additional "rcu_preempt" output.

o	Note the new "oqlen" field in the rcuhier output (for
	RCU callbacks orphaned by an offlined CPU).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/trace.txt |  231 ++++++------------------------------------
 1 files changed, 33 insertions(+), 198 deletions(-)
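
Not part of the patch: a minimal userspace sketch of how a tool might read
one line of rcu/rcugp output as documented in trace.txt.  The helper name
gp_in_progress() is hypothetical, and the line layout is assumed to match
the sample output quoted in the documentation ("rcu_sched: completed=33062
gpnum=33063").  It only illustrates the rule that differing "completed" and
"gpnum" values mean a grace period is in progress.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one line of rcu/rcugp output, such as
 * "rcu_sched: completed=33062  gpnum=33063".
 * Returns 1 if a grace period is in progress (the fields differ),
 * 0 if that flavor is idle (the fields are equal), and -1 if the
 * line does not have the assumed layout.
 */
static int gp_in_progress(const char *line)
{
	char flavor[32];
	long completed, gpnum;

	/* A space in the format matches any run of whitespace. */
	if (sscanf(line, "%31[^:]: completed=%ld gpnum=%ld",
		   flavor, &completed, &gpnum) != 3)
		return -1;
	return completed != gpnum;
}
```

A caller would feed it each line read from the debugfs file in turn, so
that an additional "rcu_preempt" line on CONFIG_TREE_PREEMPT_RCU kernels
is handled the same way as the other flavors.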

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 187bbf1..c1a9550 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -1,185 +1,10 @@
 CONFIG_RCU_TRACE debugfs Files and Formats
 
 
-The rcupreempt and rcutree implementations of RCU provide debugfs trace
-output that summarizes counters and state.  This information is useful for
-debugging RCU itself, and can sometimes also help to debug abuses of RCU.
-Note that the rcuclassic implementation of RCU does not provide debugfs
-trace output.
-
-The following sections describe the debugfs files and formats for
-preemptable RCU (rcupreempt) and hierarchical RCU (rcutree).
-
-
-Preemptable RCU debugfs Files and Formats
-
-This implementation of RCU provides three debugfs files under the
-top-level directory RCU: rcu/rcuctrs (which displays the per-CPU
-counters used by preemptable RCU) rcu/rcugp (which displays grace-period
-counters), and rcu/rcustats (which internal counters for debugging RCU).
-
-The output of "cat rcu/rcuctrs" looks as follows:
-
-CPU last cur F M
-  0    5  -5 0 0
-  1   -1   0 0 0
-  2    0   1 0 0
-  3    0   1 0 0
-  4    0   1 0 0
-  5    0   1 0 0
-  6    0   2 0 0
-  7    0  -1 0 0
-  8    0   1 0 0
-ggp = 26226, state = waitzero
-
-The per-CPU fields are as follows:
-
-o	"CPU" gives the CPU number.  Offline CPUs are not displayed.
-
-o	"last" gives the value of the counter that is being decremented
-	for the current grace period phase.  In the example above,
-	the counters sum to 4, indicating that there are still four
-	RCU read-side critical sections still running that started
-	before the last counter flip.
-
-o	"cur" gives the value of the counter that is currently being
-	both incremented (by rcu_read_lock()) and decremented (by
-	rcu_read_unlock()).  In the example above, the counters sum to
-	1, indicating that there is only one RCU read-side critical section
-	still running that started after the last counter flip.
-
-o	"F" indicates whether RCU is waiting for this CPU to acknowledge
-	a counter flip.  In the above example, RCU is not waiting on any,
-	which is consistent with the state being "waitzero" rather than
-	"waitack".
-
-o	"M" indicates whether RCU is waiting for this CPU to execute a
-	memory barrier.  In the above example, RCU is not waiting on any,
-	which is consistent with the state being "waitzero" rather than
-	"waitmb".
-
-o	"ggp" is the global grace-period counter.
-
-o	"state" is the RCU state, which can be one of the following:
-
-	o	"idle": there is no grace period in progress.
-
-	o	"waitack": RCU just incremented the global grace-period
-		counter, which has the effect of reversing the roles of
-		the "last" and "cur" counters above, and is waiting for
-		all the CPUs to acknowledge the flip.  Once the flip has
-		been acknowledged, CPUs will no longer be incrementing
-		what are now the "last" counters, so that their sum will
-		decrease monotonically down to zero.
-
-	o	"waitzero": RCU is waiting for the sum of the "last" counters
-		to decrease to zero.
-
-	o	"waitmb": RCU is waiting for each CPU to execute a memory
-		barrier, which ensures that instructions from a given CPU's
-		last RCU read-side critical section cannot be reordered
-		with instructions following the memory-barrier instruction.
-
-The output of "cat rcu/rcugp" looks as follows:
-
-oldggp=48870  newggp=48873
-
-Note that reading from this file provokes a synchronize_rcu().  The
-"oldggp" value is that of "ggp" from rcu/rcuctrs above, taken before
-executing the synchronize_rcu(), and the "newggp" value is also the
-"ggp" value, but taken after the synchronize_rcu() command returns.
-
-
-The output of "cat rcu/rcugp" looks as follows:
-
-na=1337955 nl=40 wa=1337915 wl=44 da=1337871 dl=0 dr=1337871 di=1337871
-1=50989 e1=6138 i1=49722 ie1=82 g1=49640 a1=315203 ae1=265563 a2=49640
-z1=1401244 ze1=1351605 z2=49639 m1=5661253 me1=5611614 m2=49639
-
-These are counters tracking internal preemptable-RCU events, however,
-some of them may be useful for debugging algorithms using RCU.  In
-particular, the "nl", "wl", and "dl" values track the number of RCU
-callbacks in various states.  The fields are as follows:
-
-o	"na" is the total number of RCU callbacks that have been enqueued
-	since boot.
-
-o	"nl" is the number of RCU callbacks waiting for the previous
-	grace period to end so that they can start waiting on the next
-	grace period.
-
-o	"wa" is the total number of RCU callbacks that have started waiting
-	for a grace period since boot.  "na" should be roughly equal to
-	"nl" plus "wa".
-
-o	"wl" is the number of RCU callbacks currently waiting for their
-	grace period to end.
-
-o	"da" is the total number of RCU callbacks whose grace periods
-	have completed since boot.  "wa" should be roughly equal to
-	"wl" plus "da".
-
-o	"dr" is the total number of RCU callbacks that have been removed
-	from the list of callbacks ready to invoke.  "dr" should be roughly
-	equal to "da".
-
-o	"di" is the total number of RCU callbacks that have been invoked
-	since boot.  "di" should be roughly equal to "da", though some
-	early versions of preemptable RCU had a bug so that only the
-	last CPU's count of invocations was displayed, rather than the
-	sum of all CPU's counts.
-
-o	"1" is the number of calls to rcu_try_flip().  This should be
-	roughly equal to the sum of "e1", "i1", "a1", "z1", and "m1"
-	described below.  In other words, the number of times that
-	the state machine is visited should be equal to the sum of the
-	number of times that each state is visited plus the number of
-	times that the state-machine lock acquisition failed.
-
-o	"e1" is the number of times that rcu_try_flip() was unable to
-	acquire the fliplock.
-
-o	"i1" is the number of calls to rcu_try_flip_idle().
-
-o	"ie1" is the number of times rcu_try_flip_idle() exited early
-	due to the calling CPU having no work for RCU.
-
-o	"g1" is the number of times that rcu_try_flip_idle() decided
-	to start a new grace period.  "i1" should be roughly equal to
-	"ie1" plus "g1".
-
-o	"a1" is the number of calls to rcu_try_flip_waitack().
-
-o	"ae1" is the number of times that rcu_try_flip_waitack() found
-	that at least one CPU had not yet acknowledge the new grace period
-	(AKA "counter flip").
-
-o	"a2" is the number of time rcu_try_flip_waitack() found that
-	all CPUs had acknowledged.  "a1" should be roughly equal to
-	"ae1" plus "a2".  (This particular output was collected on
-	a 128-CPU machine, hence the smaller-than-usual fraction of
-	calls to rcu_try_flip_waitack() finding all CPUs having already
-	acknowledged.)
-
-o	"z1" is the number of calls to rcu_try_flip_waitzero().
-
-o	"ze1" is the number of times that rcu_try_flip_waitzero() found
-	that not all of the old RCU read-side critical sections had
-	completed.
-
-o	"z2" is the number of times that rcu_try_flip_waitzero() finds
-	the sum of the counters equal to zero, in other words, that
-	all of the old RCU read-side critical sections had completed.
-	The value of "z1" should be roughly equal to "ze1" plus
-	"z2".
-
-o	"m1" is the number of calls to rcu_try_flip_waitmb().
-
-o	"me1" is the number of times that rcu_try_flip_waitmb() finds
-	that at least one CPU has not yet executed a memory barrier.
-
-o	"m2" is the number of times that rcu_try_flip_waitmb() finds that
-	all CPUs have executed a memory barrier.
+The rcutree implementation of RCU provides debugfs trace output that
+summarizes counters and state.  This information is useful for debugging
+RCU itself, and can sometimes also help to debug abuses of RCU.
+The following sections describe the debugfs files and formats.
 
 
 Hierarchical RCU debugfs Files and Formats
@@ -210,9 +35,10 @@ rcu_bh:
   6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
   7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
 
-The first section lists the rcu_data structures for rcu, the second for
-rcu_bh.  Each section has one line per CPU, or eight for this 8-CPU system.
-The fields are as follows:
+The first section lists the rcu_data structures for rcu_sched, the second
+for rcu_bh.  Note that CONFIG_TREE_PREEMPT_RCU kernels will have an
+additional section for rcu_preempt.  Each section has one line per CPU,
+or eight for this 8-CPU system.  The fields are as follows:
 
 o	The number at the beginning of each line is the CPU number.
 	CPUs numbers followed by an exclamation mark are offline,
@@ -223,9 +49,9 @@ o	The number at the beginning of each line is the CPU number.
 
 o	"c" is the count of grace periods that this CPU believes have
 	completed.  CPUs in dynticks idle mode may lag quite a ways
-	behind, for example, CPU 4 under "rcu" above, which has slept
-	through the past 25 RCU grace periods.	It is not unusual to
-	see CPUs lagging by thousands of grace periods.
+	behind, for example, CPU 4 under "rcu_sched" above, which has
+	slept through the past 25 RCU grace periods.  It is not unusual
+	to see CPUs lagging by thousands of grace periods.
 
 o	"g" is the count of grace periods that this CPU believes have
 	started.  Again, CPUs in dynticks idle mode may lag behind.
@@ -308,8 +134,10 @@ The output of "cat rcu/rcugp" looks as follows:
 rcu_sched: completed=33062  gpnum=33063
 rcu_bh: completed=464  gpnum=464
 
-Again, this output is for both "rcu" and "rcu_bh".  The fields are
-taken from the rcu_state structure, and are as follows:
+Again, this output is for both "rcu_sched" and "rcu_bh".  Note that
+kernels built with CONFIG_TREE_PREEMPT_RCU will have an additional
+"rcu_preempt" line.  The fields are taken from the rcu_state structure,
+and are as follows:
 
 o	"completed" is the number of grace periods that have completed.
 	It is comparable to the "c" field from rcu/rcudata in that a
@@ -324,23 +152,24 @@ o	"gpnum" is the number of grace periods that have started.  It is
 	If these two fields are equal (as they are for "rcu_bh" above),
 	then there is no grace period in progress, in other words, RCU
 	is idle.  On the other hand, if the two fields differ (as they
-	do for "rcu" above), then an RCU grace period is in progress.
+	do for "rcu_sched" above), then an RCU grace period is in progress.
 
 
 The output of "cat rcu/rcuhier" looks as follows, with very long lines:
 
-c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
+c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0
 1/1 0:127 ^0    
 3/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
 3/3f 0:5 ^0    2/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3    
 rcu_bh:
-c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
+c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0
 0/1 0:127 ^0    
 0/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
 0/3f 0:5 ^0    0/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3
 
-This is once again split into "rcu" and "rcu_bh" portions.  The fields are
-as follows:
+This is once again split into "rcu_sched" and "rcu_bh" portions,
+and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
+"rcu_preempt" section.  The fields are as follows:
 
 o	"c" is exactly the same as "completed" under rcu/rcugp.
 
@@ -372,6 +201,11 @@ o	"fqlh" is the number of calls to force_quiescent_state() that
 	exited immediately (without even being counted in nfqs above)
 	due to contention on ->fqslock.
 
+o	"oqlen" is the number of callbacks on the "orphan" callback
+	list.  RCU callbacks are placed on this list by CPUs going
+	offline, and are "adopted" either by the CPU helping the outgoing
+	CPU or by the next rcu_barrier*() call, whichever comes first.
+
 o	Each element of the form "1/1 0:127 ^0" represents one struct
 	rcu_node.  Each line represents one level of the hierarchy, from
 	root to leaves.  It is best to think of the rcu_data structures
@@ -389,10 +223,10 @@ o	Each element of the form "1/1 0:127 ^0" represents one struct
 		The value of qsmaskinit is assigned to that of qsmask
 		at the beginning of each grace period.
 
-		For example, for "rcu", the qsmask of the first entry
-		of the lowest level is 0x14, meaning that we are still
-		waiting for CPUs 2 and 4 to check in for the current
-		grace period.
+		For example, for "rcu_sched", the qsmask of the first
+		entry of the lowest level is 0x14, meaning that we
+		are still waiting for CPUs 2 and 4 to check in for the
+		current grace period.
 
 	o	The numbers separated by the ":" are the range of CPUs
 		served by this struct rcu_node.  This can be helpful
@@ -431,8 +265,9 @@ rcu_bh:
   6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
   7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
 
-As always, this is once again split into "rcu" and "rcu_bh" portions.
-The fields are as follows:
+As always, this is once again split into "rcu_sched" and "rcu_bh"
+portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
+"rcu_preempt" section.  The fields are as follows:
 
 o	"np" is the number of times that __rcu_pending() has been invoked
 	for the corresponding flavor of RCU.
-- 
1.5.2.5



* [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
  2009-10-14 17:15 ` [PATCH tip/core/rcu 1/6] rcu: Update trace.txt documentation to reflect recent changes Paul E. McKenney
@ 2009-10-14 17:15 ` Paul E. McKenney
  2009-10-15  3:31   ` Nick Piggin
                     ` (2 more replies)
  2009-10-14 17:15 ` [PATCH tip/core/rcu 3/6] rcu: stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU Paul E. McKenney
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, npiggin,
	jens.axboe, Paul E. McKenney

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

As the number of callbacks on a given CPU rises, invoke
force_quiescent_state() only once per additional qhimark callbacks
(qhimark defaults to 10,000), and even then only if no other CPU has
invoked force_quiescent_state() in the meantime.

Reported-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c |   29 ++++++++++++++++++++++++-----
 kernel/rcutree.h |    4 ++++
 2 files changed, 28 insertions(+), 5 deletions(-)
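
Not part of the patch: a minimal userspace model of the hysteresis that
the hunks below add, in case it helps review.  The struct and function
names (sketch_rsp, sketch_rdp, sketch_enqueue, sketch_after_batch) are
hypothetical.  The model assumes that a successful
force_quiescent_state() call increments rsp->n_force_qs, replaces the
*rdp->nxttail[RCU_DONE_TAIL] test with a boolean argument, and leaves
out ->blimit and the jiffies-based branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fields this patch uses. */
struct sketch_rsp {
	unsigned long n_force_qs;	/* global count of QS forcings */
};

struct sketch_rdp {
	long qlen;			/* # of queued callbacks */
	long qlen_last_fqs_check;	/* qlen at last check for QS forcing */
	unsigned long n_force_qs_snap;	/* snapshot of rsp->n_force_qs */
};

/*
 * Model of the enqueue-side check in __call_rcu().  Returns true if
 * this enqueue would invoke force_quiescent_state().  "only_waiter"
 * stands in for the newly enqueued callback being the only one
 * waiting for a grace period.
 */
static bool sketch_enqueue(struct sketch_rsp *rsp, struct sketch_rdp *rdp,
			   long qhimark, bool only_waiter)
{
	bool forced = false;

	assert(qhimark > 0);
	if (++rdp->qlen > rdp->qlen_last_fqs_check + qhimark) {
		if (rsp->n_force_qs == rdp->n_force_qs_snap && !only_waiter) {
			rsp->n_force_qs++;	/* assumed effect of forcing */
			forced = true;
		}
		rdp->n_force_qs_snap = rsp->n_force_qs;
		rdp->qlen_last_fqs_check = rdp->qlen;
	}
	return forced;
}

/* Model of the drain-side reset added to rcu_do_batch(). */
static void sketch_after_batch(struct sketch_rsp *rsp, struct sketch_rdp *rdp,
			       long qhimark)
{
	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
		rdp->qlen_last_fqs_check = 0;
		rdp->n_force_qs_snap = rsp->n_force_qs;
	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark) {
		rdp->qlen_last_fqs_check = rdp->qlen;
	}
}
```

With qhimark set to 3, the fourth enqueue forces a quiescent state and
moves the trigger point up to 7, so the next check happens on the eighth
enqueue.  If some other CPU has bumped n_force_qs in between, that eighth
enqueue only refreshes the snapshot and does not force again.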

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 705f02a..ddbf111 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -958,7 +958,7 @@ static void rcu_offline_cpu(int cpu)
  * Invoke any RCU callbacks that have made it to the end of their grace
  * period.  Thottle as specified by rdp->blimit.
  */
-static void rcu_do_batch(struct rcu_data *rdp)
+static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 {
 	unsigned long flags;
 	struct rcu_head *next, *list, **tail;
@@ -1011,6 +1011,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
 		rdp->blimit = blimit;
 
+	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
+	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
+		rdp->qlen_last_fqs_check = 0;
+		rdp->n_force_qs_snap = rsp->n_force_qs;
+	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark)
+		rdp->qlen_last_fqs_check = rdp->qlen;
+
 	local_irq_restore(flags);
 
 	/* Re-raise the RCU softirq if there are callbacks remaining. */
@@ -1224,7 +1231,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
 	}
 
 	/* If there are callbacks ready, invoke them. */
-	rcu_do_batch(rdp);
+	rcu_do_batch(rsp, rdp);
 }
 
 /*
@@ -1288,10 +1295,20 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
 	}
 
-	/* Force the grace period if too many callbacks or too long waiting. */
-	if (unlikely(++rdp->qlen > qhimark)) {
+	/*
+	 * Force the grace period if too many callbacks or too long waiting.
+	 * Enforce hysteresis, and don't invoke force_quiescent_state()
+	 * if some other CPU has recently done so.  Also, don't bother
+	 * invoking force_quiescent_state() if the newly enqueued callback
+	 * is the only one waiting for a grace period to complete.
+	 */
+	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
 		rdp->blimit = LONG_MAX;
-		force_quiescent_state(rsp, 0);
+		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
+		    *rdp->nxttail[RCU_DONE_TAIL] != head)
+			force_quiescent_state(rsp, 0);
+		rdp->n_force_qs_snap = rsp->n_force_qs;
+		rdp->qlen_last_fqs_check = rdp->qlen;
 	} else if ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)
 		force_quiescent_state(rsp, 1);
 	local_irq_restore(flags);
@@ -1523,6 +1540,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
 	rdp->beenonline = 1;	 /* We have now been online. */
 	rdp->preemptable = preemptable;
 	rdp->passed_quiesc_completed = lastcomp - 1;
+	rdp->qlen_last_fqs_check = 0;
+	rdp->n_force_qs_snap = rsp->n_force_qs;
 	rdp->blimit = blimit;
 	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index b40ac57..599161f 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -167,6 +167,10 @@ struct rcu_data {
 	struct rcu_head *nxtlist;
 	struct rcu_head **nxttail[RCU_NEXT_SIZE];
 	long		qlen;		/* # of queued callbacks */
+	long		qlen_last_fqs_check;
+					/* qlen at last check for QS forcing */
+	unsigned long	n_force_qs_snap;
+					/* did other CPU force QS recently? */
 	long		blimit;		/* Upper limit on a processed batch */
 
 #ifdef CONFIG_NO_HZ
-- 
1.5.2.5



* [PATCH tip/core/rcu 3/6] rcu: stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
  2009-10-14 17:15 ` [PATCH tip/core/rcu 1/6] rcu: Update trace.txt documentation to reflect recent changes Paul E. McKenney
  2009-10-14 17:15 ` [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load Paul E. McKenney
@ 2009-10-14 17:15 ` Paul E. McKenney
  2009-10-15  9:25   ` [tip:core/rcu] rcu: Stopgap " tip-bot for Paul E. McKenney
  2009-10-14 17:15 ` [PATCH tip/core/rcu 4/6] rcu: add exports for synchronize_rcu_expedited() Paul E. McKenney
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, npiggin,
	jens.axboe, Paul E. McKenney

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

For the short term, map synchronize_rcu_expedited() to synchronize_rcu()
for TREE_PREEMPT_RCU and to synchronize_sched_expedited() for TREE_RCU.
Longer term, there needs to be a real expedited grace period for
TREE_PREEMPT_RCU, but candidate patches to date are considerably more
complex and intrusive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcutree.h |    6 +-----
 kernel/rcutree_plugin.h |   19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 46e9ab3..9642c6b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -76,11 +76,7 @@ static inline void __rcu_read_unlock_bh(void)
 
 extern void call_rcu_sched(struct rcu_head *head,
 			   void (*func)(struct rcu_head *rcu));
-
-static inline void synchronize_rcu_expedited(void)
-{
-	synchronize_sched_expedited();
-}
+extern void synchronize_rcu_expedited(void);
 
 static inline void synchronize_rcu_bh_expedited(void)
 {
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c0cb783..acef871 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -393,6 +393,16 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu);
 
 /*
+ * Wait for an rcu-preempt grace period.  We are supposed to expedite the
+ * grace period, but this is the crude slow compatibility hack, so just
+ * invoke synchronize_rcu().
+ */
+void synchronize_rcu_expedited(void)
+{
+	synchronize_rcu();
+}
+
+/*
  * Check to see if there is any immediate preemptable-RCU-related work
  * to be done.
  */
@@ -565,6 +575,15 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu);
 
 /*
+ * Wait for an rcu-preempt grace period, but make it happen quickly.
+ * But because preemptable RCU does not exist, map to rcu-sched.
+ */
+void synchronize_rcu_expedited(void)
+{
+	synchronize_sched_expedited();
+}
+
+/*
  * Because preemptable RCU does not exist, it never has any work to do.
  */
 static int rcu_preempt_pending(int cpu)
-- 
1.5.2.5



* [PATCH tip/core/rcu 4/6] rcu: add exports for synchronize_rcu_expedited()
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
                   ` (2 preceding siblings ...)
  2009-10-14 17:15 ` [PATCH tip/core/rcu 3/6] rcu: stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU Paul E. McKenney
@ 2009-10-14 17:15 ` Paul E. McKenney
  2009-10-14 17:15 ` [PATCH tip/core/rcu 5/6] rcu: add rnp->blocked_tasks to tracing Paul E. McKenney
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, npiggin,
	jens.axboe, Paul E. McKenney

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Export synchronize_rcu_expedited() to kernel modules via
EXPORT_SYMBOL_GPL().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index acef871..ebd20ee 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -401,6 +401,7 @@ void synchronize_rcu_expedited(void)
 {
 	synchronize_rcu();
 }
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
 /*
  * Check to see if there is any immediate preemptable-RCU-related work
@@ -582,6 +583,7 @@ void synchronize_rcu_expedited(void)
 {
 	synchronize_sched_expedited();
 }
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
 /*
  * Because preemptable RCU does not exist, it never has any work to do.
-- 
1.5.2.5



* [PATCH tip/core/rcu 5/6] rcu: add rnp->blocked_tasks to tracing
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
                   ` (3 preceding siblings ...)
  2009-10-14 17:15 ` [PATCH tip/core/rcu 4/6] rcu: add exports for synchronize_rcu_expedited() Paul E. McKenney
@ 2009-10-14 17:15 ` Paul E. McKenney
  2009-10-14 20:26   ` Josh Triplett
  2009-10-14 17:15 ` [PATCH tip/core/rcu 6/6] rcu: Update trace.txt documentation for blocked-tasks lists Paul E. McKenney
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, npiggin,
	jens.axboe, Paul E. McKenney

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Add the new rnp->blocked_tasks field to the rcuhier trace file in
debugfs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_trace.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)
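
Not part of the patch: a small userspace sketch of the "T."[...] idiom
used in the seq_printf() call below, for readers who have not met it.
Indexing the string literal "T." with a 0-or-1 value selects 'T' when
the list is non-empty and '.' when it is empty.  The names blocked_flag()
and format_rnp() are hypothetical, and snprintf() stands in for
seq_printf().

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns 'T' if at least one task is on the list, '.' if it is empty. */
static char blocked_flag(bool list_is_empty)
{
	return "T."[list_is_empty];	/* index 0 is 'T', index 1 is '.' */
}

/*
 * Format one rcu_node element the way the patched seq_printf() does.
 * cur_empty and next_empty stand in for list_empty() on
 * blocked_tasks[gpnum & 1] and blocked_tasks[!(gpnum & 1)].
 * Returns the number of characters snprintf() would have written.
 */
static int format_rnp(char *buf, size_t len,
		      unsigned long qsmask, unsigned long qsmaskinit,
		      bool cur_empty, bool next_empty,
		      int grplo, int grphi, int grpnum)
{
	return snprintf(buf, len, "%lx/%lx %c>%c %d:%d ^%d    ",
			qsmask, qsmaskinit,
			blocked_flag(cur_empty), blocked_flag(next_empty),
			grplo, grphi, grpnum);
}
```

The same effect could be had with a conditional expression; the literal
indexing just keeps each argument on one short line.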

diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 4b31c77..f9d01a3 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -155,12 +155,13 @@ static const struct file_operations rcudata_csv_fops = {
 
 static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 {
+	long gpnum;
 	int level = 0;
 	struct rcu_node *rnp;
 
 	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
 		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
-		   rsp->completed, rsp->gpnum, rsp->signaled,
+		   rsp->completed, gpnum = rsp->gpnum, rsp->signaled,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff),
 		   rsp->n_force_qs, rsp->n_force_qs_ngp,
@@ -171,8 +172,10 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 			seq_puts(m, "\n");
 			level = rnp->level;
 		}
-		seq_printf(m, "%lx/%lx %d:%d ^%d    ",
+		seq_printf(m, "%lx/%lx %c>%c %d:%d ^%d    ",
 			   rnp->qsmask, rnp->qsmaskinit,
+			   "T."[list_empty(&rnp->blocked_tasks[gpnum & 1])],
+			   "T."[list_empty(&rnp->blocked_tasks[!(gpnum & 1)])],
 			   rnp->grplo, rnp->grphi, rnp->grpnum);
 	}
 	seq_puts(m, "\n");
-- 
1.5.2.5



* [PATCH tip/core/rcu 6/6] rcu: Update trace.txt documentation for blocked-tasks lists
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
                   ` (4 preceding siblings ...)
  2009-10-14 17:15 ` [PATCH tip/core/rcu 5/6] rcu: add rnp->blocked_tasks to tracing Paul E. McKenney
@ 2009-10-14 17:15 ` Paul E. McKenney
  2009-10-15  9:25   ` [tip:core/rcu] " tip-bot for Paul E. McKenney
  2009-10-14 20:28 ` [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Josh Triplett
  2009-10-15  9:21 ` Ingo Molnar
  7 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 17:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, npiggin,
	jens.axboe, Paul E. McKenney

From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Update the Documentation/RCU/trace.txt documentation to include the
rnp->blocked_tasks tracing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/trace.txt |   21 +++++++++++++++------
 1 files changed, 15 insertions(+), 6 deletions(-)
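
Not part of the patch: a minimal userspace sketch of how one rcu_node
element in the new format (for example "3/3f .>. 0:5 ^0") might be
parsed, and how a leaf's qsmask maps to CPU numbers.  The struct and
function names are hypothetical.  The decoding assumes that bit i of a
leaf's qsmask corresponds to CPU grplo + i, which agrees with the
documented example of 0x14 meaning CPUs 2 and 4 for the leaf covering
CPUs 0:5.

```c
#include <stdio.h>

/* Hypothetical container for one "qsmask/qsmaskinit c>n lo:hi ^num" element. */
struct rnp_entry {
	unsigned long qsmask, qsmaskinit;
	char cur, next;		/* 'T' or '.' for current/next grace period */
	int grplo, grphi, grpnum;
};

/* Returns 0 on success, -1 if the element is not in the new format. */
static int parse_rnp_entry(const char *s, struct rnp_entry *e)
{
	int n = sscanf(s, "%lx/%lx %c>%c %d:%d ^%d",
		       &e->qsmask, &e->qsmaskinit, &e->cur, &e->next,
		       &e->grplo, &e->grphi, &e->grpnum);

	return n == 7 ? 0 : -1;
}

/*
 * For a leaf rcu_node, list the CPUs that have not yet checked in.
 * Assumes bit i of qsmask corresponds to CPU grplo + i.
 * Returns the number of CPUs stored in cpus[].
 */
static int waiting_cpus(unsigned long qsmask, int grplo, int grphi,
			int *cpus, int max)
{
	int n = 0;
	int nbits = (int)(sizeof(qsmask) * 8);

	for (int cpu = grplo; cpu <= grphi && n < max; cpu++) {
		int bit = cpu - grplo;

		if (bit < nbits && (qsmask & (1UL << bit)))
			cpus[n++] = cpu;
	}
	return n;
}
```

An element in the old format, without the ">" field, fails to parse
here, which gives a script a simple way to tell the two layouts apart.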

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index c1a9550..b1c5c67 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -158,14 +158,14 @@ o	"gpnum" is the number of grace periods that have started.  It is
 The output of "cat rcu/rcuhier" looks as follows, with very long lines:
 
 c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0
-1/1 0:127 ^0    
-3/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
-3/3f 0:5 ^0    2/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3    
+1/1 .>. 0:127 ^0    
+3/3 .>. 0:35 ^0    0/0 .>. 36:71 ^1    0/0 .>. 72:107 ^2    0/0 .>. 108:127 ^3    
+3/3f .>. 0:5 ^0    2/3 .>. 6:11 ^1    0/0 .>. 12:17 ^2    0/0 .>. 18:23 ^3    0/0 .>. 24:29 ^4    0/0 .>. 30:35 ^5    0/0 .>. 36:41 ^0    0/0 .>. 42:47 ^1    0/0 .>. 48:53 ^2    0/0 .>. 54:59 ^3    0/0 .>. 60:65 ^4    0/0 .>. 66:71 ^5    0/0 .>. 72:77 ^0    0/0 .>. 78:83 ^1    0/0 .>. 84:89 ^2    0/0 .>. 90:95 ^3    0/0 .>. 96:101 ^4    0/0 .>. 102:107 ^5    0/0 .>. 108:113 ^0    0/0 .>. 114:119 ^1    0/0 .>. 120:125 ^2    0/0 .>. 126:127 ^3    
 rcu_bh:
 c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0
-0/1 0:127 ^0    
-0/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
-0/3f 0:5 ^0    0/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3
+0/1 .>. 0:127 ^0    
+0/3 .>. 0:35 ^0    0/0 .>. 36:71 ^1    0/0 .>. 72:107 ^2    0/0 .>. 108:127 ^3    
+0/3f .>. 0:5 ^0    0/3 .>. 6:11 ^1    0/0 .>. 12:17 ^2    0/0 .>. 18:23 ^3    0/0 .>. 24:29 ^4    0/0 .>. 30:35 ^5    0/0 .>. 36:41 ^0    0/0 .>. 42:47 ^1    0/0 .>. 48:53 ^2    0/0 .>. 54:59 ^3    0/0 .>. 60:65 ^4    0/0 .>. 66:71 ^5    0/0 .>. 72:77 ^0    0/0 .>. 78:83 ^1    0/0 .>. 84:89 ^2    0/0 .>. 90:95 ^3    0/0 .>. 96:101 ^4    0/0 .>. 102:107 ^5    0/0 .>. 108:113 ^0    0/0 .>. 114:119 ^1    0/0 .>. 120:125 ^2    0/0 .>. 126:127 ^3
 
 This is once again split into "rcu_sched" and "rcu_bh" portions,
 and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
@@ -228,6 +228,15 @@ o	Each element of the form "1/1 0:127 ^0" represents one struct
 		are still waiting for CPUs 2 and 4 to check in for the
 		current grace period.
 
+	o	The characters separated by the ">" indicate the state
+		of the blocked-tasks lists.  A "T" preceding the ">"
+		indicates that at least one task blocked in an RCU
+		read-side critical section blocks the current grace
+		period, while a "." preceding the ">" indicates otherwise.
+		The character following the ">" indicates similarly for
+		the next grace period.  A "T" should appear in this
+		field only for rcu-preempt.
+
 	o	The numbers separated by the ":" are the range of CPUs
 		served by this struct rcu_node.  This can be helpful
 		in working out how the hierarchy is wired together.
-- 
1.5.2.5
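
For readers who post-process rcuhier output, the element layout described
above can be decoded with a single sscanf() call.  This is only a sketch:
the struct and function names are hypothetical and not part of the kernel,
and the format string simply mirrors the "%lx/%lx %c>%c %d:%d ^%d" layout
that patch 5/6 in this series prints.

```c
#include <stdio.h>

/* Hypothetical holder for one decoded rcuhier element; not a kernel type. */
struct rcuhier_elem {
	unsigned long qsmask;      /* who has yet to check in (hex) */
	unsigned long qsmaskinit;  /* initial mask for each grace period (hex) */
	char blocked_cur;          /* 'T' or '.': tasks blocking the current GP */
	char blocked_next;         /* 'T' or '.': tasks blocking the next GP */
	int grplo;                 /* lowest CPU served by this rcu_node */
	int grphi;                 /* highest CPU served by this rcu_node */
	int grpnum;                /* position within the parent node */
};

/* Returns 0 on success, -1 if the text does not match the new layout. */
int parse_rcuhier_elem(const char *s, struct rcuhier_elem *e)
{
	int n = sscanf(s, "%lx/%lx %c>%c %d:%d ^%d",
		       &e->qsmask, &e->qsmaskinit,
		       &e->blocked_cur, &e->blocked_next,
		       &e->grplo, &e->grphi, &e->grpnum);

	return n == 7 ? 0 : -1;
}
```

An element in the old layout (without the ">" field) fails to match, so a
script can use the return value to tell the two formats apart.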


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] rcu: add rnp->blocked_tasks to tracing
  2009-10-14 17:15 ` [PATCH tip/core/rcu 5/6] rcu: add rnp->blocked_tasks to tracing Paul E. McKenney
@ 2009-10-14 20:26   ` Josh Triplett
  2009-10-14 23:36     ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Josh Triplett @ 2009-10-14 20:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	npiggin, jens.axboe

On Wed, Oct 14, 2009 at 10:15:58AM -0700, Paul E. McKenney wrote:
> From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcutree_trace.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
> index 4b31c77..f9d01a3 100644
> --- a/kernel/rcutree_trace.c
> +++ b/kernel/rcutree_trace.c
> @@ -155,12 +155,13 @@ static const struct file_operations rcudata_csv_fops = {
>  
>  static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
>  {
> +	long gpnum;
>  	int level = 0;
>  	struct rcu_node *rnp;
>  
>  	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
>  		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
> -		   rsp->completed, rsp->gpnum, rsp->signaled,
> +		   rsp->completed, gpnum = rsp->gpnum, rsp->signaled,

Please don't hide this or any other assignment in the middle of a print
statement.  Just put it immediately preceding the statement.

- Josh Triplett

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
                   ` (5 preceding siblings ...)
  2009-10-14 17:15 ` [PATCH tip/core/rcu 6/6] rcu: Update trace.txt documentation for blocked-tasks lists Paul E. McKenney
@ 2009-10-14 20:28 ` Josh Triplett
  2009-10-15  9:21 ` Ingo Molnar
  7 siblings, 0 replies; 24+ messages in thread
From: Josh Triplett @ 2009-10-14 20:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	npiggin, jens.axboe

On Wed, Oct 14, 2009 at 10:15:17AM -0700, Paul E. McKenney wrote:
> This patchset contains a bug fix, a performance improvement, and
> documentation updates:
> 
> o	Update Documentation/RCU/trace.txt to reflect recent changes
> 	(including the removal of rcupreempt.c).
> 
> o	Fix to the severe performance problem with excessive IPIs and
> 	lock contention in presence of very large (but legal) numbers
> 	of RCU callbacks (reported by Nick Piggin).
> 
> o	Stopgap fix for a bug in TREE_PREEMPT_RCU's implementation of
> 	synchronize_rcu_expedited().  This fix is correct, but no faster
> 	than synchronize_rcu().
> 
> o	Add exports for the updated synchronize_rcu_expedited()
> 	implementation, which moved from a static inline in
> 	include/linux/rcupdate.h to a separately compiled function
> 	in kernel/rcutree_plugin.h.
> 
> o	Add the new rnp->blocked_tasks field to the rcuhier trace file
> 	in debugfs.
> 
> o	Update the Documentation/RCU/trace.txt documentation to include
> 	the rnp->blocked_tasks tracing.
> 
> I believe that this is 2.6.32 material.
> 
>  Documentation/RCU/trace.txt   |   22 ++-
>  b/Documentation/RCU/trace.txt |  232 +++++-------------------------------------
>  b/include/linux/rcutree.h     |    6 -
>  b/kernel/rcutree.c            |   29 ++++-
>  b/kernel/rcutree.h            |    5 
>  b/kernel/rcutree_plugin.h     |   20 +++
>  b/kernel/rcutree_trace.c      |    8 -
>  kernel/rcutree_plugin.h       |    3 
>  8 files changed, 103 insertions(+), 222 deletions(-)

For all of these patches except 5/6 (commented on separately):
Acked-by: Josh Triplett <josh@joshtriplett.org>

I agree that these need to go into 2.6.32, to accompany the switch to
using hierarchical RCU exclusively.

- Josh Triplett

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] rcu: add rnp->blocked_tasks to tracing
  2009-10-14 20:26   ` Josh Triplett
@ 2009-10-14 23:36     ` Paul E. McKenney
  2009-10-15  9:25       ` [tip:core/rcu] rcu: Add " tip-bot for Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-14 23:36 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	npiggin, jens.axboe

On Wed, Oct 14, 2009 at 01:26:50PM -0700, Josh Triplett wrote:
> On Wed, Oct 14, 2009 at 10:15:58AM -0700, Paul E. McKenney wrote:
> > From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcutree_trace.c |    7 +++++--
> >  1 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
> > index 4b31c77..f9d01a3 100644
> > --- a/kernel/rcutree_trace.c
> > +++ b/kernel/rcutree_trace.c
> > @@ -155,12 +155,13 @@ static const struct file_operations rcudata_csv_fops = {
> >  
> >  static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
> >  {
> > +	long gpnum;
> >  	int level = 0;
> >  	struct rcu_node *rnp;
> >  
> >  	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
> >  		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
> > -		   rsp->completed, rsp->gpnum, rsp->signaled,
> > +		   rsp->completed, gpnum = rsp->gpnum, rsp->signaled,
> 
> Please don't hide this or any other assignment in the middle of a print
> statement.  Just put it immediately preceding the statement.

Good point, here is the updated patch.

							Thanx, Paul

rcu: add rnp->blocked_tasks to tracing

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

 rcutree_trace.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 4b31c77..1984cdc 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -155,12 +155,14 @@ static const struct file_operations rcudata_csv_fops = {
 
 static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 {
+	long gpnum;
 	int level = 0;
 	struct rcu_node *rnp;
 
+	gpnum = rsp->gpnum;
 	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
 		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
-		   rsp->completed, rsp->gpnum, rsp->signaled,
+		   rsp->completed, gpnum, rsp->signaled,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff),
 		   rsp->n_force_qs, rsp->n_force_qs_ngp,
@@ -171,8 +173,10 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 			seq_puts(m, "\n");
 			level = rnp->level;
 		}
-		seq_printf(m, "%lx/%lx %d:%d ^%d    ",
+		seq_printf(m, "%lx/%lx %c>%c %d:%d ^%d    ",
 			   rnp->qsmask, rnp->qsmaskinit,
+			   "T."[list_empty(&rnp->blocked_tasks[gpnum & 1])],
+			   "T."[list_empty(&rnp->blocked_tasks[!(gpnum & 1)])],
 			   rnp->grplo, rnp->grphi, rnp->grpnum);
 	}
 	seq_puts(m, "\n");
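
The "T."[...] expressions in the patch above index a two-character string
literal with the 0-or-1 result of list_empty().  A stand-alone sketch of
that idiom follows; the helper names are hypothetical, and a plain bool
stands in for the kernel's list_empty() result.

```c
#include <stdbool.h>

/* Hypothetical helper: map "list is empty" to the trace character. */
char blocked_char(bool list_is_empty)
{
	/* "T."[0] is 'T' (tasks are blocked); "T."[1] is '.' (list is empty). */
	return "T."[list_is_empty];
}

/* The low bit of gpnum selects the list for the current grace period. */
int cur_list_index(long gpnum)
{
	return gpnum & 1;
}

/* The other list holds tasks blocking the next grace period. */
int next_list_index(long gpnum)
{
	return !(gpnum & 1);
}
```

With g=6903 as in the sample trace output, the current-grace-period list
is index 1 and the next-grace-period list is index 0.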

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load
  2009-10-14 17:15 ` [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load Paul E. McKenney
@ 2009-10-15  3:31   ` Nick Piggin
  2009-10-15  4:37     ` Paul E. McKenney
  2009-10-15  9:24   ` [tip:core/rcu] rcu: Prevent " tip-bot for Paul E. McKenney
  2009-10-15 11:04   ` [PATCH tip/core/rcu 2/6] rcu: prevent " Nick Piggin
  2 siblings, 1 reply; 24+ messages in thread
From: Nick Piggin @ 2009-10-15  3:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, jens.axboe

Hi Paul,

I wonder why you don't just use the existing relaxed logic in
force_quiescent_state? Is it because you are worried about
different granularity of jiffies, and cases where RCU callbacks
are being processed much more quickly than 1 jiffy? (this would
make sense, I'm just asking because I'm curious as to your
thinking behind it).

Thanks, and yes I will give this a test and let you know how
it goes.

On Wed, Oct 14, 2009 at 10:15:55AM -0700, Paul E. McKenney wrote:
> From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> As the number of callbacks on a given CPU rises, invoke
> force_quiescent_state() only every qhimark number of callbacks
> (defaults to 10,000), and even then only if no other CPU has invoked
> force_quiescent_state() in the meantime.
> 
> Reported-by: Nick Piggin <npiggin@suse.de>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcutree.c |   29 ++++++++++++++++++++++++-----
>  kernel/rcutree.h |    4 ++++
>  2 files changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 705f02a..ddbf111 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -958,7 +958,7 @@ static void rcu_offline_cpu(int cpu)
>   * Invoke any RCU callbacks that have made it to the end of their grace
>   * period.  Thottle as specified by rdp->blimit.
>   */
> -static void rcu_do_batch(struct rcu_data *rdp)
> +static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	struct rcu_head *next, *list, **tail;
> @@ -1011,6 +1011,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
>  		rdp->blimit = blimit;
>  
> +	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
> +	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
> +		rdp->qlen_last_fqs_check = 0;
> +		rdp->n_force_qs_snap = rsp->n_force_qs;
> +	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark)
> +		rdp->qlen_last_fqs_check = rdp->qlen;
> +
>  	local_irq_restore(flags);
>  
>  	/* Re-raise the RCU softirq if there are callbacks remaining. */
> @@ -1224,7 +1231,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
>  	}
>  
>  	/* If there are callbacks ready, invoke them. */
> -	rcu_do_batch(rdp);
> +	rcu_do_batch(rsp, rdp);
>  }
>  
>  /*
> @@ -1288,10 +1295,20 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
>  		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
>  	}
>  
> -	/* Force the grace period if too many callbacks or too long waiting. */
> -	if (unlikely(++rdp->qlen > qhimark)) {
> +	/*
> +	 * Force the grace period if too many callbacks or too long waiting.
> +	 * Enforce hysteresis, and don't invoke force_quiescent_state()
> +	 * if some other CPU has recently done so.  Also, don't bother
> +	 * invoking force_quiescent_state() if the newly enqueued callback
> +	 * is the only one waiting for a grace period to complete.
> +	 */
> +	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
>  		rdp->blimit = LONG_MAX;
> -		force_quiescent_state(rsp, 0);
> +		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
> +		    *rdp->nxttail[RCU_DONE_TAIL] != head)
> +			force_quiescent_state(rsp, 0);
> +		rdp->n_force_qs_snap = rsp->n_force_qs;
> +		rdp->qlen_last_fqs_check = rdp->qlen;
>  	} else if ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)
>  		force_quiescent_state(rsp, 1);
>  	local_irq_restore(flags);
> @@ -1523,6 +1540,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
>  	rdp->beenonline = 1;	 /* We have now been online. */
>  	rdp->preemptable = preemptable;
>  	rdp->passed_quiesc_completed = lastcomp - 1;
> +	rdp->qlen_last_fqs_check = 0;
> +	rdp->n_force_qs_snap = rsp->n_force_qs;
>  	rdp->blimit = blimit;
>  	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
>  
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index b40ac57..599161f 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -167,6 +167,10 @@ struct rcu_data {
>  	struct rcu_head *nxtlist;
>  	struct rcu_head **nxttail[RCU_NEXT_SIZE];
>  	long		qlen;		/* # of queued callbacks */
> +	long		qlen_last_fqs_check;
> +					/* qlen at last check for QS forcing */
> +	unsigned long	n_force_qs_snap;
> +					/* did other CPU force QS recently? */
>  	long		blimit;		/* Upper limit on a processed batch */
>  
>  #ifdef CONFIG_NO_HZ
> -- 
> 1.5.2.5

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load
  2009-10-15  3:31   ` Nick Piggin
@ 2009-10-15  4:37     ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-15  4:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, jens.axboe

On Thu, Oct 15, 2009 at 05:31:04AM +0200, Nick Piggin wrote:
> Hi Paul,
> 
> I wonder why you don't just use the existing relaxed logic in
> force_quiescent_state? Is it because you are worried about
> different granularity of jiffies, and cases where RCU callbacks
> are being processed much more quickly than 1 jiffy? (this would
> make sense, I'm just asking because I'm curious as to your
> thinking behind it).

This code is designed to handle the case where the kernel is generating
callbacks so quickly that you cannot afford to wait until the time at
which force_quiescent_state() would normally be invoked.  So, if callbacks
are piling up too quickly, do a force_quiescent_state() -- but only
one during such an interval in order to keep the IPIs and lock
contention down to a dull roar.

Quite possibly in need of further tuning or modification, hence the
request for testing.
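
For illustration only, here is a minimal user-space sketch of that
hysteresis.  The struct and function names are hypothetical stand-ins for
rsp and rdp, and incrementing n_force_qs stands in for an actual call to
force_quiescent_state() (assumed here to bump that counter).  The sketch
omits the nxttail check, the jiffies-based branch, and the reset done in
rcu_do_batch().

```c
/* Hypothetical stand-in for the rcu_state field used by the patch. */
struct sim_state {
	unsigned long n_force_qs;	/* global count of forcing attempts */
};

/* Hypothetical stand-in for the per-CPU rcu_data fields. */
struct sim_data {
	long qlen;			/* # of queued callbacks */
	long qlen_last_fqs_check;	/* qlen at last check for QS forcing */
	unsigned long n_force_qs_snap;	/* n_force_qs seen at that check */
};

/* Enqueue one callback; return 1 if this enqueue forces quiescent states. */
int sim_enqueue(struct sim_state *sp, struct sim_data *dp, long qhimark)
{
	int forced = 0;

	if (++dp->qlen > dp->qlen_last_fqs_check + qhimark) {
		/* Skip forcing if some other CPU has done so since our snapshot. */
		if (sp->n_force_qs == dp->n_force_qs_snap) {
			sp->n_force_qs++;	/* stands in for force_quiescent_state() */
			forced = 1;
		}
		/* Either way, rearm the trigger qhimark callbacks further out. */
		dp->n_force_qs_snap = sp->n_force_qs;
		dp->qlen_last_fqs_check = dp->qlen;
	}
	return forced;
}
```

With a small qhimark such as 3, a CPU enqueueing back to back forces on
its 4th and 8th callbacks rather than on every callback past the mark,
and it skips a forcing attempt entirely when another CPU has forced since
the last snapshot.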

> Thanks, and yes I will give this a test and let you know how
> it goes.

Look forward to seeing the results!

						Thanx, Paul

> On Wed, Oct 14, 2009 at 10:15:55AM -0700, Paul E. McKenney wrote:
> > From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > As the number of callbacks on a given CPU rises, invoke
> > force_quiescent_state() only every qhimark number of callbacks
> > (defaults to 10,000), and even then only if no other CPU has invoked
> > force_quiescent_state() in the meantime.
> > 
> > Reported-by: Nick Piggin <npiggin@suse.de>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcutree.c |   29 ++++++++++++++++++++++++-----
> >  kernel/rcutree.h |    4 ++++
> >  2 files changed, 28 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 705f02a..ddbf111 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -958,7 +958,7 @@ static void rcu_offline_cpu(int cpu)
> >   * Invoke any RCU callbacks that have made it to the end of their grace
> >   * period.  Thottle as specified by rdp->blimit.
> >   */
> > -static void rcu_do_batch(struct rcu_data *rdp)
> > +static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
> >  {
> >  	unsigned long flags;
> >  	struct rcu_head *next, *list, **tail;
> > @@ -1011,6 +1011,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  	if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
> >  		rdp->blimit = blimit;
> >  
> > +	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
> > +	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
> > +		rdp->qlen_last_fqs_check = 0;
> > +		rdp->n_force_qs_snap = rsp->n_force_qs;
> > +	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark)
> > +		rdp->qlen_last_fqs_check = rdp->qlen;
> > +
> >  	local_irq_restore(flags);
> >  
> >  	/* Re-raise the RCU softirq if there are callbacks remaining. */
> > @@ -1224,7 +1231,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
> >  	}
> >  
> >  	/* If there are callbacks ready, invoke them. */
> > -	rcu_do_batch(rdp);
> > +	rcu_do_batch(rsp, rdp);
> >  }
> >  
> >  /*
> > @@ -1288,10 +1295,20 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
> >  		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
> >  	}
> >  
> > -	/* Force the grace period if too many callbacks or too long waiting. */
> > -	if (unlikely(++rdp->qlen > qhimark)) {
> > +	/*
> > +	 * Force the grace period if too many callbacks or too long waiting.
> > +	 * Enforce hysteresis, and don't invoke force_quiescent_state()
> > +	 * if some other CPU has recently done so.  Also, don't bother
> > +	 * invoking force_quiescent_state() if the newly enqueued callback
> > +	 * is the only one waiting for a grace period to complete.
> > +	 */
> > +	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
> >  		rdp->blimit = LONG_MAX;
> > -		force_quiescent_state(rsp, 0);
> > +		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
> > +		    *rdp->nxttail[RCU_DONE_TAIL] != head)
> > +			force_quiescent_state(rsp, 0);
> > +		rdp->n_force_qs_snap = rsp->n_force_qs;
> > +		rdp->qlen_last_fqs_check = rdp->qlen;
> >  	} else if ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)
> >  		force_quiescent_state(rsp, 1);
> >  	local_irq_restore(flags);
> > @@ -1523,6 +1540,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
> >  	rdp->beenonline = 1;	 /* We have now been online. */
> >  	rdp->preemptable = preemptable;
> >  	rdp->passed_quiesc_completed = lastcomp - 1;
> > +	rdp->qlen_last_fqs_check = 0;
> > +	rdp->n_force_qs_snap = rsp->n_force_qs;
> >  	rdp->blimit = blimit;
> >  	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
> >  
> > diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> > index b40ac57..599161f 100644
> > --- a/kernel/rcutree.h
> > +++ b/kernel/rcutree.h
> > @@ -167,6 +167,10 @@ struct rcu_data {
> >  	struct rcu_head *nxtlist;
> >  	struct rcu_head **nxttail[RCU_NEXT_SIZE];
> >  	long		qlen;		/* # of queued callbacks */
> > +	long		qlen_last_fqs_check;
> > +					/* qlen at last check for QS forcing */
> > +	unsigned long	n_force_qs_snap;
> > +					/* did other CPU force QS recently? */
> >  	long		blimit;		/* Upper limit on a processed batch */
> >  
> >  #ifdef CONFIG_NO_HZ
> > -- 
> > 1.5.2.5

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf
  2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
                   ` (6 preceding siblings ...)
  2009-10-14 20:28 ` [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Josh Triplett
@ 2009-10-15  9:21 ` Ingo Molnar
  2009-10-15  9:35   ` Josh Triplett
  7 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2009-10-15  9:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, laijs, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	npiggin, jens.axboe


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> This patchset contains a bug fix, a performance improvement, and 
> documentation updates:
> 
> o	Update Documentation/RCU/trace.txt to reflect recent changes
> 	(including the removal of rcupreempt.c).

i've applied this to the .33 queue.

> 
> o	Fix to the severe performance problem with excessive IPIs and
> 	lock contention in presence of very large (but legal) numbers
> 	of RCU callbacks (reported by Nick Piggin).

i've applied this to the .32 queue.

> o	Stopgap fix for a bug in TREE_PREEMPT_RCU's implementation of
> 	synchronize_rcu_expedited().  This fix is correct, but no faster
> 	than synchronize_rcu().
> 
> o	Add exports for the updated synchronize_rcu_expedited()
> 	implementation, which moved from a static inline in
> 	include/linux/rcupdate.h to a separately compiled function
> 	in kernel/rcutree_plugin.h.

i've merged these two into the same commit and applied it to the .32 
queue.

> o	Add the new rnp->blocked_tasks field to the rcuhier trace file
> 	in debugfs.
> 
> o	Update the Documentation/RCU/trace.txt documentation to include
> 	the rnp->blocked_tasks tracing.

i've applied these to the .33 queue as well. (neither tracing nor
documentation is urgent material.) I also did minor edits to the
changelogs.

Thanks Paul!

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [tip:core/rcu] rcu: Prevent RCU IPI storms in presence of high call_rcu() load
  2009-10-14 17:15 ` [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load Paul E. McKenney
  2009-10-15  3:31   ` Nick Piggin
@ 2009-10-15  9:24   ` tip-bot for Paul E. McKenney
  2009-10-15 11:04   ` [PATCH tip/core/rcu 2/6] rcu: prevent " Nick Piggin
  2 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Paul E. McKenney @ 2009-10-15  9:24 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, npiggin, paulmck, hpa, mingo, tglx, mingo

Commit-ID:  37c72e56f6b234ea7387ba530434a80abf2658d8
Gitweb:     http://git.kernel.org/tip/37c72e56f6b234ea7387ba530434a80abf2658d8
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 14 Oct 2009 10:15:55 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 15 Oct 2009 11:17:16 +0200

rcu: Prevent RCU IPI storms in presence of high call_rcu() load

As the number of callbacks on a given CPU rises, invoke
force_quiescent_state() only every qhimark number of callbacks
(defaults to 10,000), and even then only if no other CPU has
invoked force_quiescent_state() in the meantime.

This should fix the performance regression reported by Nick.

Reported-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592133-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/rcutree.c |   29 ++++++++++++++++++++++++-----
 kernel/rcutree.h |    4 ++++
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 705f02a..ddbf111 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -958,7 +958,7 @@ static void rcu_offline_cpu(int cpu)
  * Invoke any RCU callbacks that have made it to the end of their grace
  * period.  Thottle as specified by rdp->blimit.
  */
-static void rcu_do_batch(struct rcu_data *rdp)
+static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 {
 	unsigned long flags;
 	struct rcu_head *next, *list, **tail;
@@ -1011,6 +1011,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
 		rdp->blimit = blimit;
 
+	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
+	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
+		rdp->qlen_last_fqs_check = 0;
+		rdp->n_force_qs_snap = rsp->n_force_qs;
+	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark)
+		rdp->qlen_last_fqs_check = rdp->qlen;
+
 	local_irq_restore(flags);
 
 	/* Re-raise the RCU softirq if there are callbacks remaining. */
@@ -1224,7 +1231,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
 	}
 
 	/* If there are callbacks ready, invoke them. */
-	rcu_do_batch(rdp);
+	rcu_do_batch(rsp, rdp);
 }
 
 /*
@@ -1288,10 +1295,20 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
 	}
 
-	/* Force the grace period if too many callbacks or too long waiting. */
-	if (unlikely(++rdp->qlen > qhimark)) {
+	/*
+	 * Force the grace period if too many callbacks or too long waiting.
+	 * Enforce hysteresis, and don't invoke force_quiescent_state()
+	 * if some other CPU has recently done so.  Also, don't bother
+	 * invoking force_quiescent_state() if the newly enqueued callback
+	 * is the only one waiting for a grace period to complete.
+	 */
+	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
 		rdp->blimit = LONG_MAX;
-		force_quiescent_state(rsp, 0);
+		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
+		    *rdp->nxttail[RCU_DONE_TAIL] != head)
+			force_quiescent_state(rsp, 0);
+		rdp->n_force_qs_snap = rsp->n_force_qs;
+		rdp->qlen_last_fqs_check = rdp->qlen;
 	} else if ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)
 		force_quiescent_state(rsp, 1);
 	local_irq_restore(flags);
@@ -1523,6 +1540,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
 	rdp->beenonline = 1;	 /* We have now been online. */
 	rdp->preemptable = preemptable;
 	rdp->passed_quiesc_completed = lastcomp - 1;
+	rdp->qlen_last_fqs_check = 0;
+	rdp->n_force_qs_snap = rsp->n_force_qs;
 	rdp->blimit = blimit;
 	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index b40ac57..599161f 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -167,6 +167,10 @@ struct rcu_data {
 	struct rcu_head *nxtlist;
 	struct rcu_head **nxttail[RCU_NEXT_SIZE];
 	long		qlen;		/* # of queued callbacks */
+	long		qlen_last_fqs_check;
+					/* qlen at last check for QS forcing */
+	unsigned long	n_force_qs_snap;
+					/* did other CPU force QS recently? */
 	long		blimit;		/* Upper limit on a processed batch */
 
 #ifdef CONFIG_NO_HZ

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:core/rcu] rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
  2009-10-14 17:15 ` [PATCH tip/core/rcu 3/6] rcu: stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU Paul E. McKenney
@ 2009-10-15  9:25   ` tip-bot for Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Paul E. McKenney @ 2009-10-15  9:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, paulmck, hpa, mingo, tglx, mingo

Commit-ID:  019129d595caaa5bd0b41d128308da1be6a91869
Gitweb:     http://git.kernel.org/tip/019129d595caaa5bd0b41d128308da1be6a91869
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 14 Oct 2009 10:15:56 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 15 Oct 2009 11:17:17 +0200

rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU

For the short term, map synchronize_rcu_expedited() to
synchronize_rcu() for TREE_PREEMPT_RCU and to
synchronize_sched_expedited() for TREE_RCU.

Longer term, there needs to be a real expedited grace period for
TREE_PREEMPT_RCU, but candidate patches to date are considerably
more complex and intrusive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592331-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/rcutree.h |    6 +-----
 kernel/rcutree_plugin.h |   21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 46e9ab3..9642c6b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -76,11 +76,7 @@ static inline void __rcu_read_unlock_bh(void)
 
 extern void call_rcu_sched(struct rcu_head *head,
 			   void (*func)(struct rcu_head *rcu));
-
-static inline void synchronize_rcu_expedited(void)
-{
-	synchronize_sched_expedited();
-}
+extern void synchronize_rcu_expedited(void);
 
 static inline void synchronize_rcu_bh_expedited(void)
 {
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c0cb783..ebd20ee 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -393,6 +393,17 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu);
 
 /*
+ * Wait for an rcu-preempt grace period.  We are supposed to expedite the
+ * grace period, but this is the crude slow compatability hack, so just
+ * invoke synchronize_rcu().
+ */
+void synchronize_rcu_expedited(void)
+{
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/*
  * Check to see if there is any immediate preemptable-RCU-related work
  * to be done.
  */
@@ -565,6 +576,16 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu);
 
 /*
+ * Wait for an rcu-preempt grace period, but make it happen quickly.
+ * But because preemptable RCU does not exist, map to rcu-sched.
+ */
+void synchronize_rcu_expedited(void)
+{
+	synchronize_sched_expedited();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/*
  * Because preemptable RCU does not exist, it never has any work to do.
  */
 static int rcu_preempt_pending(int cpu)

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:core/rcu] rcu: Add rnp->blocked_tasks to tracing
  2009-10-14 23:36     ` Paul E. McKenney
@ 2009-10-15  9:25       ` tip-bot for Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Paul E. McKenney @ 2009-10-15  9:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, paulmck, hpa, mingo, josh, tglx, mingo

Commit-ID:  3397e040dfacbb303498ced1baa96be983dcea06
Gitweb:     http://git.kernel.org/tip/3397e040dfacbb303498ced1baa96be983dcea06
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 14 Oct 2009 16:36:38 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 15 Oct 2009 11:20:22 +0200

rcu: Add rnp->blocked_tasks to tracing

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
Cc: Josh Triplett <josh@joshtriplett.org>
LKML-Reference: <20091014233638.GE6763@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/rcutree_trace.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 4b31c77..1984cdc 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -155,12 +155,14 @@ static const struct file_operations rcudata_csv_fops = {
 
 static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 {
+	long gpnum;
 	int level = 0;
 	struct rcu_node *rnp;
 
+	gpnum = rsp->gpnum;
 	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
 		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
-		   rsp->completed, rsp->gpnum, rsp->signaled,
+		   rsp->completed, gpnum, rsp->signaled,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff),
 		   rsp->n_force_qs, rsp->n_force_qs_ngp,
@@ -171,8 +173,10 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 			seq_puts(m, "\n");
 			level = rnp->level;
 		}
-		seq_printf(m, "%lx/%lx %d:%d ^%d    ",
+		seq_printf(m, "%lx/%lx %c>%c %d:%d ^%d    ",
 			   rnp->qsmask, rnp->qsmaskinit,
+			   "T."[list_empty(&rnp->blocked_tasks[gpnum & 1])],
+			   "T."[list_empty(&rnp->blocked_tasks[!(gpnum & 1)])],
 			   rnp->grplo, rnp->grphi, rnp->grpnum);
 	}
 	seq_puts(m, "\n");
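
The two new seq_printf() arguments above rely on indexing a string
literal with a boolean, and on the low-order bit of gpnum to choose
between the two blocked_tasks lists.  The following is a minimal
user-space sketch of that idiom, not kernel code: the helper names
blocked_flag(), current_list_index(), and next_list_index() are
hypothetical, and list_is_empty stands in for the result of
list_empty().

```c
#include <assert.h>

/*
 * Return 'T' when the list is non-empty and '.' when it is empty.
 * Indexing the literal "T." with 0 yields 'T' and with 1 yields '.',
 * which is the idiom the patch uses with list_empty() as the index.
 * The name list_is_empty is hypothetical.
 */
static char blocked_flag(int list_is_empty)
{
	assert(list_is_empty == 0 || list_is_empty == 1);
	return "T."[list_is_empty];
}

/*
 * As in the patch, the low-order bit of gpnum selects the
 * blocked_tasks list printed before the ">" ...
 */
static int current_list_index(long gpnum)
{
	return (int)(gpnum & 1);
}

/* ... and its complement selects the list printed after the ">". */
static int next_list_index(long gpnum)
{
	return !(gpnum & 1);
}
```

With these helpers, a node with no blocked tasks on either list would
print ".>.", which matches the sample output in the trace.txt update
later in this series.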

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:core/rcu] rcu: Update trace.txt documentation to reflect recent changes
  2009-10-14 17:15 ` [PATCH tip/core/rcu 1/6] rcu: Update trace.txt documentation to reflect recent changes Paul E. McKenney
@ 2009-10-15  9:25   ` tip-bot for Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Paul E. McKenney @ 2009-10-15  9:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, paulmck, hpa, mingo, tglx, mingo

Commit-ID:  bd58b430039435e4c981cf802b5b11d511d73abd
Gitweb:     http://git.kernel.org/tip/bd58b430039435e4c981cf802b5b11d511d73abd
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 14 Oct 2009 10:15:54 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 15 Oct 2009 11:20:23 +0200

rcu: Update trace.txt documentation to reflect recent changes

o	Remove the CONFIG_PREEMPT_RCU documentation since this
	config option has now been removed.

o	Change the now-incorrect references to "rcu" labels to
	instead be "rcu_sched".

o	Add notes stating that CONFIG_TREE_PREEMPT_RCU kernels will
	have additional "rcu_preempt" output.

o	Note the new "oqlen" field in the rcuhier output (for
	RCU callbacks orphaned by an offlined CPU).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <1255540559799-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/RCU/trace.txt |  231 ++++++------------------------------------
 1 files changed, 33 insertions(+), 198 deletions(-)

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 187bbf1..c1a9550 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -1,185 +1,10 @@
 CONFIG_RCU_TRACE debugfs Files and Formats
 
 
-The rcupreempt and rcutree implementations of RCU provide debugfs trace
-output that summarizes counters and state.  This information is useful for
-debugging RCU itself, and can sometimes also help to debug abuses of RCU.
-Note that the rcuclassic implementation of RCU does not provide debugfs
-trace output.
-
-The following sections describe the debugfs files and formats for
-preemptable RCU (rcupreempt) and hierarchical RCU (rcutree).
-
-
-Preemptable RCU debugfs Files and Formats
-
-This implementation of RCU provides three debugfs files under the
-top-level directory RCU: rcu/rcuctrs (which displays the per-CPU
-counters used by preemptable RCU) rcu/rcugp (which displays grace-period
-counters), and rcu/rcustats (which internal counters for debugging RCU).
-
-The output of "cat rcu/rcuctrs" looks as follows:
-
-CPU last cur F M
-  0    5  -5 0 0
-  1   -1   0 0 0
-  2    0   1 0 0
-  3    0   1 0 0
-  4    0   1 0 0
-  5    0   1 0 0
-  6    0   2 0 0
-  7    0  -1 0 0
-  8    0   1 0 0
-ggp = 26226, state = waitzero
-
-The per-CPU fields are as follows:
-
-o	"CPU" gives the CPU number.  Offline CPUs are not displayed.
-
-o	"last" gives the value of the counter that is being decremented
-	for the current grace period phase.  In the example above,
-	the counters sum to 4, indicating that there are still four
-	RCU read-side critical sections still running that started
-	before the last counter flip.
-
-o	"cur" gives the value of the counter that is currently being
-	both incremented (by rcu_read_lock()) and decremented (by
-	rcu_read_unlock()).  In the example above, the counters sum to
-	1, indicating that there is only one RCU read-side critical section
-	still running that started after the last counter flip.
-
-o	"F" indicates whether RCU is waiting for this CPU to acknowledge
-	a counter flip.  In the above example, RCU is not waiting on any,
-	which is consistent with the state being "waitzero" rather than
-	"waitack".
-
-o	"M" indicates whether RCU is waiting for this CPU to execute a
-	memory barrier.  In the above example, RCU is not waiting on any,
-	which is consistent with the state being "waitzero" rather than
-	"waitmb".
-
-o	"ggp" is the global grace-period counter.
-
-o	"state" is the RCU state, which can be one of the following:
-
-	o	"idle": there is no grace period in progress.
-
-	o	"waitack": RCU just incremented the global grace-period
-		counter, which has the effect of reversing the roles of
-		the "last" and "cur" counters above, and is waiting for
-		all the CPUs to acknowledge the flip.  Once the flip has
-		been acknowledged, CPUs will no longer be incrementing
-		what are now the "last" counters, so that their sum will
-		decrease monotonically down to zero.
-
-	o	"waitzero": RCU is waiting for the sum of the "last" counters
-		to decrease to zero.
-
-	o	"waitmb": RCU is waiting for each CPU to execute a memory
-		barrier, which ensures that instructions from a given CPU's
-		last RCU read-side critical section cannot be reordered
-		with instructions following the memory-barrier instruction.
-
-The output of "cat rcu/rcugp" looks as follows:
-
-oldggp=48870  newggp=48873
-
-Note that reading from this file provokes a synchronize_rcu().  The
-"oldggp" value is that of "ggp" from rcu/rcuctrs above, taken before
-executing the synchronize_rcu(), and the "newggp" value is also the
-"ggp" value, but taken after the synchronize_rcu() command returns.
-
-
-The output of "cat rcu/rcugp" looks as follows:
-
-na=1337955 nl=40 wa=1337915 wl=44 da=1337871 dl=0 dr=1337871 di=1337871
-1=50989 e1=6138 i1=49722 ie1=82 g1=49640 a1=315203 ae1=265563 a2=49640
-z1=1401244 ze1=1351605 z2=49639 m1=5661253 me1=5611614 m2=49639
-
-These are counters tracking internal preemptable-RCU events, however,
-some of them may be useful for debugging algorithms using RCU.  In
-particular, the "nl", "wl", and "dl" values track the number of RCU
-callbacks in various states.  The fields are as follows:
-
-o	"na" is the total number of RCU callbacks that have been enqueued
-	since boot.
-
-o	"nl" is the number of RCU callbacks waiting for the previous
-	grace period to end so that they can start waiting on the next
-	grace period.
-
-o	"wa" is the total number of RCU callbacks that have started waiting
-	for a grace period since boot.  "na" should be roughly equal to
-	"nl" plus "wa".
-
-o	"wl" is the number of RCU callbacks currently waiting for their
-	grace period to end.
-
-o	"da" is the total number of RCU callbacks whose grace periods
-	have completed since boot.  "wa" should be roughly equal to
-	"wl" plus "da".
-
-o	"dr" is the total number of RCU callbacks that have been removed
-	from the list of callbacks ready to invoke.  "dr" should be roughly
-	equal to "da".
-
-o	"di" is the total number of RCU callbacks that have been invoked
-	since boot.  "di" should be roughly equal to "da", though some
-	early versions of preemptable RCU had a bug so that only the
-	last CPU's count of invocations was displayed, rather than the
-	sum of all CPU's counts.
-
-o	"1" is the number of calls to rcu_try_flip().  This should be
-	roughly equal to the sum of "e1", "i1", "a1", "z1", and "m1"
-	described below.  In other words, the number of times that
-	the state machine is visited should be equal to the sum of the
-	number of times that each state is visited plus the number of
-	times that the state-machine lock acquisition failed.
-
-o	"e1" is the number of times that rcu_try_flip() was unable to
-	acquire the fliplock.
-
-o	"i1" is the number of calls to rcu_try_flip_idle().
-
-o	"ie1" is the number of times rcu_try_flip_idle() exited early
-	due to the calling CPU having no work for RCU.
-
-o	"g1" is the number of times that rcu_try_flip_idle() decided
-	to start a new grace period.  "i1" should be roughly equal to
-	"ie1" plus "g1".
-
-o	"a1" is the number of calls to rcu_try_flip_waitack().
-
-o	"ae1" is the number of times that rcu_try_flip_waitack() found
-	that at least one CPU had not yet acknowledge the new grace period
-	(AKA "counter flip").
-
-o	"a2" is the number of time rcu_try_flip_waitack() found that
-	all CPUs had acknowledged.  "a1" should be roughly equal to
-	"ae1" plus "a2".  (This particular output was collected on
-	a 128-CPU machine, hence the smaller-than-usual fraction of
-	calls to rcu_try_flip_waitack() finding all CPUs having already
-	acknowledged.)
-
-o	"z1" is the number of calls to rcu_try_flip_waitzero().
-
-o	"ze1" is the number of times that rcu_try_flip_waitzero() found
-	that not all of the old RCU read-side critical sections had
-	completed.
-
-o	"z2" is the number of times that rcu_try_flip_waitzero() finds
-	the sum of the counters equal to zero, in other words, that
-	all of the old RCU read-side critical sections had completed.
-	The value of "z1" should be roughly equal to "ze1" plus
-	"z2".
-
-o	"m1" is the number of calls to rcu_try_flip_waitmb().
-
-o	"me1" is the number of times that rcu_try_flip_waitmb() finds
-	that at least one CPU has not yet executed a memory barrier.
-
-o	"m2" is the number of times that rcu_try_flip_waitmb() finds that
-	all CPUs have executed a memory barrier.
+The rcutree implementation of RCU provides debugfs trace output that
+summarizes counters and state.  This information is useful for debugging
+RCU itself, and can sometimes also help to debug abuses of RCU.
+The following sections describe the debugfs files and formats.
 
 
 Hierarchical RCU debugfs Files and Formats
@@ -210,9 +35,10 @@ rcu_bh:
   6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
   7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
 
-The first section lists the rcu_data structures for rcu, the second for
-rcu_bh.  Each section has one line per CPU, or eight for this 8-CPU system.
-The fields are as follows:
+The first section lists the rcu_data structures for rcu_sched, the second
+for rcu_bh.  Note that CONFIG_TREE_PREEMPT_RCU kernels will have an
+additional section for rcu_preempt.  Each section has one line per CPU,
+or eight for this 8-CPU system.  The fields are as follows:
 
 o	The number at the beginning of each line is the CPU number.
 	CPUs numbers followed by an exclamation mark are offline,
@@ -223,9 +49,9 @@ o	The number at the beginning of each line is the CPU number.
 
 o	"c" is the count of grace periods that this CPU believes have
 	completed.  CPUs in dynticks idle mode may lag quite a ways
-	behind, for example, CPU 4 under "rcu" above, which has slept
-	through the past 25 RCU grace periods.	It is not unusual to
-	see CPUs lagging by thousands of grace periods.
+	behind, for example, CPU 4 under "rcu_sched" above, which has
+	slept through the past 25 RCU grace periods.  It is not unusual
+	to see CPUs lagging by thousands of grace periods.
 
 o	"g" is the count of grace periods that this CPU believes have
 	started.  Again, CPUs in dynticks idle mode may lag behind.
@@ -308,8 +134,10 @@ The output of "cat rcu/rcugp" looks as follows:
 rcu_sched: completed=33062  gpnum=33063
 rcu_bh: completed=464  gpnum=464
 
-Again, this output is for both "rcu" and "rcu_bh".  The fields are
-taken from the rcu_state structure, and are as follows:
+Again, this output is for both "rcu_sched" and "rcu_bh".  Note that
+kernels built with CONFIG_TREE_PREEMPT_RCU will have an additional
+"rcu_preempt" line.  The fields are taken from the rcu_state structure,
+and are as follows:
 
 o	"completed" is the number of grace periods that have completed.
 	It is comparable to the "c" field from rcu/rcudata in that a
@@ -324,23 +152,24 @@ o	"gpnum" is the number of grace periods that have started.  It is
 	If these two fields are equal (as they are for "rcu_bh" above),
 	then there is no grace period in progress, in other words, RCU
 	is idle.  On the other hand, if the two fields differ (as they
-	do for "rcu" above), then an RCU grace period is in progress.
+	do for "rcu_sched" above), then an RCU grace period is in progress.
 
 
 The output of "cat rcu/rcuhier" looks as follows, with very long lines:
 
-c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
+c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0
 1/1 0:127 ^0    
 3/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
 3/3f 0:5 ^0    2/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3    
 rcu_bh:
-c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
+c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0
 0/1 0:127 ^0    
 0/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
 0/3f 0:5 ^0    0/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3
 
-This is once again split into "rcu" and "rcu_bh" portions.  The fields are
-as follows:
+This is once again split into "rcu_sched" and "rcu_bh" portions,
+and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
+"rcu_preempt" section.  The fields are as follows:
 
 o	"c" is exactly the same as "completed" under rcu/rcugp.
 
@@ -372,6 +201,11 @@ o	"fqlh" is the number of calls to force_quiescent_state() that
 	exited immediately (without even being counted in nfqs above)
 	due to contention on ->fqslock.
 
+o	"oqlen" is the number of callbacks on the "orphan" callback
+	list.  RCU callbacks are placed on this list by CPUs going
+	offline, and are "adopted" either by the CPU helping the outgoing
+	CPU or by the next rcu_barrier*() call, whichever comes first.
+
 o	Each element of the form "1/1 0:127 ^0" represents one struct
 	rcu_node.  Each line represents one level of the hierarchy, from
 	root to leaves.  It is best to think of the rcu_data structures
@@ -389,10 +223,10 @@ o	Each element of the form "1/1 0:127 ^0" represents one struct
 		The value of qsmaskinit is assigned to that of qsmask
 		at the beginning of each grace period.
 
-		For example, for "rcu", the qsmask of the first entry
-		of the lowest level is 0x14, meaning that we are still
-		waiting for CPUs 2 and 4 to check in for the current
-		grace period.
+		For example, for "rcu_sched", the qsmask of the first
+		entry of the lowest level is 0x14, meaning that we
+		are still waiting for CPUs 2 and 4 to check in for the
+		current grace period.
 
 	o	The numbers separated by the ":" are the range of CPUs
 		served by this struct rcu_node.  This can be helpful
@@ -431,8 +265,9 @@ rcu_bh:
   6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
   7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
 
-As always, this is once again split into "rcu" and "rcu_bh" portions.
-The fields are as follows:
+As always, this is once again split into "rcu_sched" and "rcu_bh"
+portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
+"rcu_preempt" section.  The fields are as follows:
 
 o	"np" is the number of times that __rcu_pending() has been invoked
 	for the corresponding flavor of RCU.
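
The qsmask example in the documentation above (0x14 meaning CPUs 2
and 4) can be checked with a small user-space sketch.  It assumes that
bit N of a leaf rcu_node's qsmask corresponds to CPU grplo + N; the
helper name qsmask_to_cpus() and its parameters are hypothetical and
are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill cpus[] with the CPU numbers whose bits are set in qsmask for
 * a leaf rcu_node whose CPU range starts at grplo.  Returns the
 * number of entries written, never more than max.  Illustration
 * only, under the assumption that bit N maps to CPU grplo + N.
 */
static size_t qsmask_to_cpus(unsigned long qsmask, int grplo,
			     int *cpus, size_t max)
{
	size_t n = 0;
	int bit;

	assert(cpus != NULL);
	for (bit = 0; qsmask != 0 && n < max; bit++, qsmask >>= 1) {
		if (qsmask & 1UL)
			cpus[n++] = grplo + bit;
	}
	return n;
}
```

Calling qsmask_to_cpus(0x14, 0, cpus, 8) stores 2 and 4, since 0x14
is binary 10100.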

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:core/rcu] rcu: Update trace.txt documentation for blocked-tasks lists
  2009-10-14 17:15 ` [PATCH tip/core/rcu 6/6] rcu: Update trace.txt documentation for blocked-tasks lists Paul E. McKenney
@ 2009-10-15  9:25   ` tip-bot for Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Paul E. McKenney @ 2009-10-15  9:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, paulmck, hpa, mingo, tglx, mingo

Commit-ID:  0edf1a683e499191b27a067956ae9f5fa6e046c6
Gitweb:     http://git.kernel.org/tip/0edf1a683e499191b27a067956ae9f5fa6e046c6
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 14 Oct 2009 10:15:59 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 15 Oct 2009 11:20:23 +0200

rcu: Update trace.txt documentation for blocked-tasks lists

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592804-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/RCU/trace.txt |   23 ++++++++++++++++-------
 1 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index c1a9550..8608fd8 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -158,14 +158,14 @@ o	"gpnum" is the number of grace periods that have started.  It is
 The output of "cat rcu/rcuhier" looks as follows, with very long lines:
 
 c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0
-1/1 0:127 ^0    
-3/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
-3/3f 0:5 ^0    2/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3    
+1/1 .>. 0:127 ^0    
+3/3 .>. 0:35 ^0    0/0 .>. 36:71 ^1    0/0 .>. 72:107 ^2    0/0 .>. 108:127 ^3    
+3/3f .>. 0:5 ^0    2/3 .>. 6:11 ^1    0/0 .>. 12:17 ^2    0/0 .>. 18:23 ^3    0/0 .>. 24:29 ^4    0/0 .>. 30:35 ^5    0/0 .>. 36:41 ^0    0/0 .>. 42:47 ^1    0/0 .>. 48:53 ^2    0/0 .>. 54:59 ^3    0/0 .>. 60:65 ^4    0/0 .>. 66:71 ^5    0/0 .>. 72:77 ^0    0/0 .>. 78:83 ^1    0/0 .>. 84:89 ^2    0/0 .>. 90:95 ^3    0/0 .>. 96:101 ^4    0/0 .>. 102:107 ^5    0/0 .>. 108:113 ^0    0/0 .>. 114:119 ^1    0/0 .>. 120:125 ^2    0/0 .>. 126:127 ^3    
 rcu_bh:
 c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0
-0/1 0:127 ^0    
-0/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
-0/3f 0:5 ^0    0/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3
+0/1 .>. 0:127 ^0    
+0/3 .>. 0:35 ^0    0/0 .>. 36:71 ^1    0/0 .>. 72:107 ^2    0/0 .>. 108:127 ^3    
+0/3f .>. 0:5 ^0    0/3 .>. 6:11 ^1    0/0 .>. 12:17 ^2    0/0 .>. 18:23 ^3    0/0 .>. 24:29 ^4    0/0 .>. 30:35 ^5    0/0 .>. 36:41 ^0    0/0 .>. 42:47 ^1    0/0 .>. 48:53 ^2    0/0 .>. 54:59 ^3    0/0 .>. 60:65 ^4    0/0 .>. 66:71 ^5    0/0 .>. 72:77 ^0    0/0 .>. 78:83 ^1    0/0 .>. 84:89 ^2    0/0 .>. 90:95 ^3    0/0 .>. 96:101 ^4    0/0 .>. 102:107 ^5    0/0 .>. 108:113 ^0    0/0 .>. 114:119 ^1    0/0 .>. 120:125 ^2    0/0 .>. 126:127 ^3
 
 This is once again split into "rcu_sched" and "rcu_bh" portions,
 and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
@@ -213,7 +213,7 @@ o	Each element of the form "1/1 0:127 ^0" represents one struct
 	might be either one, two, or three levels of rcu_node structures,
 	depending on the relationship between CONFIG_RCU_FANOUT and
 	CONFIG_NR_CPUS.
-	
+
 	o	The numbers separated by the "/" are the qsmask followed
 		by the qsmaskinit.  The qsmask will have one bit
 		set for each entity in the next lower level that
@@ -228,6 +228,15 @@ o	Each element of the form "1/1 0:127 ^0" represents one struct
 		are still waiting for CPUs 2 and 4 to check in for the
 		current grace period.
 
+	o	The characters separated by the ">" indicate the state
+		of the blocked-tasks lists.  A "T" preceding the ">"
+		indicates that at least one task blocked in an RCU
+		read-side critical section blocks the current grace
+		period, while a "." preceding the ">" indicates otherwise.
+		The character following the ">" indicates similarly for
+		the next grace period.  A "T" should appear in this
+		field only for rcu-preempt.
+
 	o	The numbers separated by the ":" are the range of CPUs
 		served by this struct rcu_node.  This can be helpful
 		in working out how the hierarchy is wired together.
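
For scripts that post-process rcuhier output, one element in the new
format (for example "3/3f .>. 0:5 ^0") can be split into its fields
with sscanf().  This is a minimal user-space sketch: struct
rcuhier_elem and parse_rcuhier_elem() are hypothetical names, and the
format string simply mirrors the seq_printf() format used in
kernel/rcutree_trace.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one rcu_node element of rcuhier. */
struct rcuhier_elem {
	unsigned long qsmask;
	unsigned long qsmaskinit;
	char blocked_cur;	/* 'T' or '.', before the ">" */
	char blocked_next;	/* 'T' or '.', after the ">" */
	int grplo;
	int grphi;
	int grpnum;
};

/*
 * Parse one element such as "3/3f .>. 0:5 ^0".  Returns 0 when all
 * seven fields were converted and -1 otherwise, for example when the
 * input is in the older format that lacks the blocked-tasks field.
 */
static int parse_rcuhier_elem(const char *s, struct rcuhier_elem *e)
{
	int n;

	assert(s != NULL && e != NULL);
	n = sscanf(s, "%lx/%lx %c>%c %d:%d ^%d",
		   &e->qsmask, &e->qsmaskinit,
		   &e->blocked_cur, &e->blocked_next,
		   &e->grplo, &e->grphi, &e->grpnum);
	return n == 7 ? 0 : -1;
}
```

A caller can then treat blocked_cur == 'T' as "at least one blocked
task is holding up the current grace period", per the description
above.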

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf
  2009-10-15  9:21 ` Ingo Molnar
@ 2009-10-15  9:35   ` Josh Triplett
  2009-10-15 11:19     ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Josh Triplett @ 2009-10-15  9:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul E. McKenney, linux-kernel, laijs, dipankar, akpm,
	mathieu.desnoyers, dvhltc, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, npiggin, jens.axboe

On Thu, Oct 15, 2009 at 11:21:55AM +0200, Ingo Molnar wrote:
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > This patchset contains a bug fix, a performance improvement, and 
> > documentation updates:
> > 
> > o	Update Documentation/RCU/trace.txt to reflect recent changes
> > 	(including the removal of rcupreempt.c).
> 
> i've applied this to the .33 queue.

I realize this only represents a documentation change, but it updates
the documentation to match the code in 2.6.32, which seems worth doing.

> > o	Add the new rnp->blocked_tasks field to the rcuhier trace file
> > 	in debugfs.
> > 
> > o	Update the Documentation/RCU/trace.txt documentation to include
> > 	the rnp->blocked_tasks tracing.
> 
> i've applied these to the .33 queue as well. (both tracing and 
> documentation is not urgent material.) I also did minor edits to the 
> changelogs.

Those who debug RCU-related issues would disagree that having adequate
tracing information proves non-urgent. :) The tracing information this
adds proves essential for debugging issues with the new hierarchical
RCU.  (And the documentation patch just documents the added tracing
information, so both should go together as a unit; actually, perhaps
they should get merged.)

- Josh Triplett

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load
  2009-10-14 17:15 ` [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load Paul E. McKenney
  2009-10-15  3:31   ` Nick Piggin
  2009-10-15  9:24   ` [tip:core/rcu] rcu: Prevent " tip-bot for Paul E. McKenney
@ 2009-10-15 11:04   ` Nick Piggin
  2009-10-15 11:20     ` Ingo Molnar
  2009-10-15 16:07     ` Paul E. McKenney
  2 siblings, 2 replies; 24+ messages in thread
From: Nick Piggin @ 2009-10-15 11:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, jens.axboe

Testing this on top of my vfs-scale patches, a 64-way 32-node ia64
system runs the parallel open/close microbenchmark ~30 times faster
and is scaling linearly now. Single thread performance is not
noticeably changed. Thanks very much Paul.

On Wed, Oct 14, 2009 at 10:15:55AM -0700, Paul E. McKenney wrote:
> From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> As the number of callbacks on a given CPU rises, invoke
> force_quiescent_state() only every blimit number of callbacks
> (defaults to 10,000), and even then only if no other CPU has invoked
> force_quiescent_state() in the meantime.
> 
> Reported-by: Nick Piggin <npiggin@suse.de>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcutree.c |   29 ++++++++++++++++++++++++-----
>  kernel/rcutree.h |    4 ++++
>  2 files changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 705f02a..ddbf111 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -958,7 +958,7 @@ static void rcu_offline_cpu(int cpu)
>   * Invoke any RCU callbacks that have made it to the end of their grace
>   * period.  Thottle as specified by rdp->blimit.
>   */
> -static void rcu_do_batch(struct rcu_data *rdp)
> +static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
>  {
>  	unsigned long flags;
>  	struct rcu_head *next, *list, **tail;
> @@ -1011,6 +1011,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
>  		rdp->blimit = blimit;
>  
> +	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
> +	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
> +		rdp->qlen_last_fqs_check = 0;
> +		rdp->n_force_qs_snap = rsp->n_force_qs;
> +	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark)
> +		rdp->qlen_last_fqs_check = rdp->qlen;
> +
>  	local_irq_restore(flags);
>  
>  	/* Re-raise the RCU softirq if there are callbacks remaining. */
> @@ -1224,7 +1231,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
>  	}
>  
>  	/* If there are callbacks ready, invoke them. */
> -	rcu_do_batch(rdp);
> +	rcu_do_batch(rsp, rdp);
>  }
>  
>  /*
> @@ -1288,10 +1295,20 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
>  		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
>  	}
>  
> -	/* Force the grace period if too many callbacks or too long waiting. */
> -	if (unlikely(++rdp->qlen > qhimark)) {
> +	/*
> +	 * Force the grace period if too many callbacks or too long waiting.
> +	 * Enforce hysteresis, and don't invoke force_quiescent_state()
> +	 * if some other CPU has recently done so.  Also, don't bother
> +	 * invoking force_quiescent_state() if the newly enqueued callback
> +	 * is the only one waiting for a grace period to complete.
> +	 */
> +	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
>  		rdp->blimit = LONG_MAX;
> -		force_quiescent_state(rsp, 0);
> +		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
> +		    *rdp->nxttail[RCU_DONE_TAIL] != head)
> +			force_quiescent_state(rsp, 0);
> +		rdp->n_force_qs_snap = rsp->n_force_qs;
> +		rdp->qlen_last_fqs_check = rdp->qlen;
>  	} else if ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)
>  		force_quiescent_state(rsp, 1);
>  	local_irq_restore(flags);
> @@ -1523,6 +1540,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
>  	rdp->beenonline = 1;	 /* We have now been online. */
>  	rdp->preemptable = preemptable;
>  	rdp->passed_quiesc_completed = lastcomp - 1;
> +	rdp->qlen_last_fqs_check = 0;
> +	rdp->n_force_qs_snap = rsp->n_force_qs;
>  	rdp->blimit = blimit;
>  	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
>  
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index b40ac57..599161f 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -167,6 +167,10 @@ struct rcu_data {
>  	struct rcu_head *nxtlist;
>  	struct rcu_head **nxttail[RCU_NEXT_SIZE];
>  	long		qlen;		/* # of queued callbacks */
> +	long		qlen_last_fqs_check;
> +					/* qlen at last check for QS forcing */
> +	unsigned long	n_force_qs_snap;
> +					/* did other CPU force QS recently? */
>  	long		blimit;		/* Upper limit on a processed batch */
>  
>  #ifdef CONFIG_NO_HZ
> -- 
> 1.5.2.5

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf
  2009-10-15  9:35   ` Josh Triplett
@ 2009-10-15 11:19     ` Ingo Molnar
  2009-10-15 16:14       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2009-10-15 11:19 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Paul E. McKenney, linux-kernel, laijs, dipankar, akpm,
	mathieu.desnoyers, dvhltc, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, npiggin, jens.axboe


* Josh Triplett <josh@joshtriplett.org> wrote:

> On Thu, Oct 15, 2009 at 11:21:55AM +0200, Ingo Molnar wrote:
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > This patchset contains a bug fix, a performance improvement, and 
> > > documentation updates:
> > > 
> > > o	Update Documentation/RCU/trace.txt to reflect recent changes
> > > 	(including the removal of rcupreempt.c).
> > 
> > i've applied this to the .33 queue.
> 
> I realize this only represents a documentation change, but it updates 
> the documentation to match the code in 2.6.32, which seems worth 
> doing.
> 
> > > o	Add the new rnp->blocked_tasks field to the rcuhier trace file
> > > 	in debugfs.
> > > 
> > > o	Update the Documentation/RCU/trace.txt documentation to include
> > > 	the rnp->blocked_tasks tracing.
> > 
> > i've applied these to the .33 queue as well. (both tracing and 
> > documentation is not urgent material.) I also did minor edits to the 
> > changelogs.
> 
> Those who debug RCU-related issues would disagree that having adequate 
> tracing information proves non-urgent. :) The tracing information this 
> adds proves essential for debugging issues with the new hierarchical 
> RCU.  (And the documentation patch just documents the added tracing 
> information, so both should go together as a unit; actually, perhaps 
> they should get merged.)

No, we generally don't do such changes so late in -rc's (these would hit 
upstream in -rc6 - which is too late).

People doing development will use the latest RCU tree so the practical 
impact is small. Furthermore, we had a higher than usual rate of 
post-rc1 RCU changes in this cycle already, it needs to cool down a bit.

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load
  2009-10-15 11:04   ` [PATCH tip/core/rcu 2/6] rcu: prevent " Nick Piggin
@ 2009-10-15 11:20     ` Ingo Molnar
  2009-10-15 16:07     ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2009-10-15 11:20 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Paul E. McKenney, linux-kernel, laijs, dipankar, akpm,
	mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, jens.axboe


* Nick Piggin <npiggin@suse.de> wrote:

> Testing this on top of my vfs-scale patches, a 64-way 32-node ia64 
> system runs the parallel open/close microbenchmark ~30 times faster 
> and is scaling linearly now. Single thread performance is not 
> noticeably changed. Thanks very much Paul.

Cool, thanks for the testing Nick!

	Ingo

* Re: [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load
  2009-10-15 11:04   ` [PATCH tip/core/rcu 2/6] rcu: prevent " Nick Piggin
  2009-10-15 11:20     ` Ingo Molnar
@ 2009-10-15 16:07     ` Paul E. McKenney
  1 sibling, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-15 16:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, jens.axboe

On Thu, Oct 15, 2009 at 01:04:04PM +0200, Nick Piggin wrote:
> Testing this on top of my vfs-scale patches, a 64-way 32-node ia64
> system runs the parallel open/close microbenchmark ~30 times faster
> and is scaling linearly now. Single thread performance is not
> noticeably changed. Thanks very much Paul.

Good to hear, thank you for giving it a go!

						Thanx, Paul

> On Wed, Oct 14, 2009 at 10:15:55AM -0700, Paul E. McKenney wrote:
> > From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > As the number of callbacks on a given CPU rises, invoke
> > force_quiescent_state() only every blimit number of callbacks
> > (defaults to 10,000), and even then only if no other CPU has invoked
> > force_quiescent_state() in the meantime.
> > 
> > Reported-by: Nick Piggin <npiggin@suse.de>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcutree.c |   29 ++++++++++++++++++++++++-----
> >  kernel/rcutree.h |    4 ++++
> >  2 files changed, 28 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 705f02a..ddbf111 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -958,7 +958,7 @@ static void rcu_offline_cpu(int cpu)
> >   * Invoke any RCU callbacks that have made it to the end of their grace
> >   * period.  Thottle as specified by rdp->blimit.
> >   */
> > -static void rcu_do_batch(struct rcu_data *rdp)
> > +static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
> >  {
> >  	unsigned long flags;
> >  	struct rcu_head *next, *list, **tail;
> > @@ -1011,6 +1011,13 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  	if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
> >  		rdp->blimit = blimit;
> >  
> > +	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
> > +	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
> > +		rdp->qlen_last_fqs_check = 0;
> > +		rdp->n_force_qs_snap = rsp->n_force_qs;
> > +	} else if (rdp->qlen < rdp->qlen_last_fqs_check - qhimark)
> > +		rdp->qlen_last_fqs_check = rdp->qlen;
> > +
> >  	local_irq_restore(flags);
> >  
> >  	/* Re-raise the RCU softirq if there are callbacks remaining. */
> > @@ -1224,7 +1231,7 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
> >  	}
> >  
> >  	/* If there are callbacks ready, invoke them. */
> > -	rcu_do_batch(rdp);
> > +	rcu_do_batch(rsp, rdp);
> >  }
> >  
> >  /*
> > @@ -1288,10 +1295,20 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
> >  		rcu_start_gp(rsp, nestflag);  /* releases rnp_root->lock. */
> >  	}
> >  
> > -	/* Force the grace period if too many callbacks or too long waiting. */
> > -	if (unlikely(++rdp->qlen > qhimark)) {
> > +	/*
> > +	 * Force the grace period if too many callbacks or too long waiting.
> > +	 * Enforce hysteresis, and don't invoke force_quiescent_state()
> > +	 * if some other CPU has recently done so.  Also, don't bother
> > +	 * invoking force_quiescent_state() if the newly enqueued callback
> > +	 * is the only one waiting for a grace period to complete.
> > +	 */
> > +	if (unlikely(++rdp->qlen > rdp->qlen_last_fqs_check + qhimark)) {
> >  		rdp->blimit = LONG_MAX;
> > -		force_quiescent_state(rsp, 0);
> > +		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
> > +		    *rdp->nxttail[RCU_DONE_TAIL] != head)
> > +			force_quiescent_state(rsp, 0);
> > +		rdp->n_force_qs_snap = rsp->n_force_qs;
> > +		rdp->qlen_last_fqs_check = rdp->qlen;
> >  	} else if ((long)(ACCESS_ONCE(rsp->jiffies_force_qs) - jiffies) < 0)
> >  		force_quiescent_state(rsp, 1);
> >  	local_irq_restore(flags);
> > @@ -1523,6 +1540,8 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
> >  	rdp->beenonline = 1;	 /* We have now been online. */
> >  	rdp->preemptable = preemptable;
> >  	rdp->passed_quiesc_completed = lastcomp - 1;
> > +	rdp->qlen_last_fqs_check = 0;
> > +	rdp->n_force_qs_snap = rsp->n_force_qs;
> >  	rdp->blimit = blimit;
> >  	spin_unlock(&rnp->lock);		/* irqs remain disabled. */
> >  
> > diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> > index b40ac57..599161f 100644
> > --- a/kernel/rcutree.h
> > +++ b/kernel/rcutree.h
> > @@ -167,6 +167,10 @@ struct rcu_data {
> >  	struct rcu_head *nxtlist;
> >  	struct rcu_head **nxttail[RCU_NEXT_SIZE];
> >  	long		qlen;		/* # of queued callbacks */
> > +	long		qlen_last_fqs_check;
> > +					/* qlen at last check for QS forcing */
> > +	unsigned long	n_force_qs_snap;
> > +					/* did other CPU force QS recently? */
> >  	long		blimit;		/* Upper limit on a processed batch */
> >  
> >  #ifdef CONFIG_NO_HZ
> > -- 
> > 1.5.2.5
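
A minimal userspace sketch of the hysteresis in the quoted patch, for
readers who want to experiment with it outside the kernel.  The field
names qlen, qlen_last_fqs_check and n_force_qs_snap come from the patch;
everything prefixed model_ is hypothetical, the QHIMARK value is a small
illustrative stand-in for the kernel's qhimark tunable, and the sketch
assumes that force_quiescent_state() increments rsp->n_force_qs.  The
->blimit handling and the jiffies-based path are left out.

```c
#include <stdbool.h>

/* Illustrative threshold only; stands in for the kernel's qhimark. */
enum { QHIMARK = 4 };

/* Hypothetical stand-in for the global struct rcu_state. */
struct model_rsp {
	unsigned long n_force_qs;	/* forcing attempts made by any CPU */
};

/* Hypothetical stand-in for the per-CPU struct rcu_data. */
struct model_rdp {
	long qlen;			/* # of queued callbacks */
	long qlen_last_fqs_check;	/* qlen at last check for QS forcing */
	unsigned long n_force_qs_snap;	/* rsp->n_force_qs at that check */
};

/* Assumed behaviour of force_quiescent_state(): bump the global count. */
static void model_force_qs(struct model_rsp *rsp)
{
	rsp->n_force_qs++;
}

/*
 * Models the __call_rcu() hunk.  only_cb_waiting stands in for the
 * "*rdp->nxttail[RCU_DONE_TAIL] != head" test being false, i.e. the
 * newly enqueued callback is the only one still waiting.
 * Returns true if this enqueue forced a quiescent state.
 */
static bool model_enqueue(struct model_rsp *rsp, struct model_rdp *rdp,
			  bool only_cb_waiting)
{
	bool forced = false;

	if (++rdp->qlen > rdp->qlen_last_fqs_check + QHIMARK) {
		/* Force only if no other CPU has done so since the snapshot. */
		if (rsp->n_force_qs == rdp->n_force_qs_snap &&
		    !only_cb_waiting) {
			model_force_qs(rsp);
			forced = true;
		}
		rdp->n_force_qs_snap = rsp->n_force_qs;
		rdp->qlen_last_fqs_check = rdp->qlen;
	}
	return forced;
}

/* Models the rcu_do_batch() hunk: invoke n callbacks, then re-arm. */
static void model_drain(struct model_rsp *rsp, struct model_rdp *rdp, long n)
{
	rdp->qlen = n < rdp->qlen ? rdp->qlen - n : 0;

	if (rdp->qlen == 0 && rdp->qlen_last_fqs_check != 0) {
		rdp->qlen_last_fqs_check = 0;
		rdp->n_force_qs_snap = rsp->n_force_qs;
	} else if (rdp->qlen < rdp->qlen_last_fqs_check - QHIMARK) {
		rdp->qlen_last_fqs_check = rdp->qlen;
	}
}
```

With QHIMARK at 4, the fifth enqueue on an idle CPU forces a quiescent
state.  If some other CPU then increments n_force_qs, the next threshold
crossing (the tenth enqueue) only refreshes the snapshot and does not
force again, which is the IPI-storm avoidance the changelog describes.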

* Re: [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf
  2009-10-15 11:19     ` Ingo Molnar
@ 2009-10-15 16:14       ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2009-10-15 16:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Josh Triplett, linux-kernel, laijs, dipankar, akpm,
	mathieu.desnoyers, dvhltc, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, npiggin, jens.axboe

On Thu, Oct 15, 2009 at 01:19:44PM +0200, Ingo Molnar wrote:
> 
> * Josh Triplett <josh@joshtriplett.org> wrote:
> 
> > On Thu, Oct 15, 2009 at 11:21:55AM +0200, Ingo Molnar wrote:
> > > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > This patchset contains a bug fix, a performance improvement, and 
> > > > documentation updates:
> > > > 
> > > > o	Update Documentation/RCU/trace.txt to reflect recent changes
> > > > 	(including the removal of rcupreempt.c).
> > > 
> > > i've applied this to the .33 queue.
> > 
> > I realize this only represents a documentation change, but it updates 
> > the documentation to match the code in 2.6.32, which seems worth 
> > doing.
> > 
> > > > o	Add the new rnp->blocked_tasks field to the rcuhier trace file
> > > > 	in debugfs.
> > > > 
> > > > o	Update the Documentation/RCU/trace.txt documentation to include
> > > > 	the rnp->blocked_tasks tracing.
> > > 
> > > i've applied these to the .33 queue as well. (both tracing and 
> > > documentation are not urgent material.) I also did minor edits to the 
> > > changelogs.
> > 
> > Those who debug RCU-related issues would disagree that having adequate 
> > tracing information proves non-urgent. :) The tracing information this 
> > adds proves essential for debugging issues with the new hierarchical 
> > RCU.  (And the documentation patch just documents the added tracing 
> > information, so both should go together as a unit; actually, perhaps 
> > they should get merged.)
> 
> No, we generally don't do such changes so late in -rc's (these would hit 
> upstream in -rc6 - which is too late).
> 
> People doing development will use the latest RCU tree so the practical 
> impact is small. Furthermore, we had a higher than usual rate of 
> post-rc1 RCU changes in this cycle already; it needs to cool down a bit.

Works for me!  I can supply appropriate diffs off of one of the -rc
releases if people need this in order to experiment with recent RCU
patches.  (But I get to choose the -rc!)

							Thanx, Paul

Thread overview: 24+ messages
2009-10-14 17:15 [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Paul E. McKenney
2009-10-14 17:15 ` [PATCH tip/core/rcu 1/6] rcu: Update trace.txt documentation to reflect recent changes Paul E. McKenney
2009-10-15  9:25   ` [tip:core/rcu] " tip-bot for Paul E. McKenney
2009-10-14 17:15 ` [PATCH tip/core/rcu 2/6] rcu: prevent RCU IPI storms in presence of high call_rcu() load Paul E. McKenney
2009-10-15  3:31   ` Nick Piggin
2009-10-15  4:37     ` Paul E. McKenney
2009-10-15  9:24   ` [tip:core/rcu] rcu: Prevent " tip-bot for Paul E. McKenney
2009-10-15 11:04   ` [PATCH tip/core/rcu 2/6] rcu: prevent " Nick Piggin
2009-10-15 11:20     ` Ingo Molnar
2009-10-15 16:07     ` Paul E. McKenney
2009-10-14 17:15 ` [PATCH tip/core/rcu 3/6] rcu: stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU Paul E. McKenney
2009-10-15  9:25   ` [tip:core/rcu] rcu: Stopgap " tip-bot for Paul E. McKenney
2009-10-14 17:15 ` [PATCH tip/core/rcu 4/6] rcu: add exports for synchronize_rcu_expedited() Paul E. McKenney
2009-10-14 17:15 ` [PATCH tip/core/rcu 5/6] rcu: add rnp->blocked_tasks to tracing Paul E. McKenney
2009-10-14 20:26   ` Josh Triplett
2009-10-14 23:36     ` Paul E. McKenney
2009-10-15  9:25       ` [tip:core/rcu] rcu: Add " tip-bot for Paul E. McKenney
2009-10-14 17:15 ` [PATCH tip/core/rcu 6/6] rcu: Update trace.txt documentation for blocked-tasks lists Paul E. McKenney
2009-10-15  9:25   ` [tip:core/rcu] " tip-bot for Paul E. McKenney
2009-10-14 20:28 ` [PATCH tip/core/rcu 0/6] rcu: fix synchronize_rcu_expedited(), update docs, improve perf Josh Triplett
2009-10-15  9:21 ` Ingo Molnar
2009-10-15  9:35   ` Josh Triplett
2009-10-15 11:19     ` Ingo Molnar
2009-10-15 16:14       ` Paul E. McKenney
