* [PATCH 0/2] rcu: Fix series of spurious RCU softirqs
@ 2010-11-24 0:31 Frederic Weisbecker
2010-11-24 0:31 ` [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods Frederic Weisbecker
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-11-24 0:31 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Paul E. McKenney, Lai Jiangshan,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Ingo Molnar
Hi,
I've observed some not-so-infrequent series of spurious RCU
softirqs, sometimes happening on every tick for a while at
random.

These patches aim to fix them.
Thanks.
Frederic Weisbecker (2):
rcu: Don't chase unnecessary quiescent states after extended grace periods
rcu: Stop checking quiescent states after grace period completion from remote
kernel/rcutree.c | 22 ++++++++++++++++++++--
1 files changed, 20 insertions(+), 2 deletions(-)
* [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
From: Frederic Weisbecker @ 2010-11-24  0:31 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Frederic Weisbecker, Paul E. McKenney, Lai Jiangshan,
Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt

When a CPU is in an extended quiescent state, which includes idle
nohz or CPU offline, other CPUs will take care of the grace periods
on its behalf.

When this CPU exits its extended quiescent state, it will catch up
with the last started grace period and start chasing its own
quiescent states to end the current grace period.

However, in this case we always start to track quiescent states if the
grace period number has changed since we started our extended
quiescent state. We do this because we always assume that the last
grace period is not finished and needs us to complete it, which is
sometimes wrong.

This patch verifies whether the last grace period has completed. If it
has not, we start hunting local quiescent states as we always did.
Otherwise we do nothing, which spares us some work and an unnecessary
softirq.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/rcutree.c | 13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ccdc04c..5f038a1 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -620,8 +620,17 @@ static void __init check_cpu_stall_init(void)
 static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	if (rdp->gpnum != rnp->gpnum) {
-		rdp->qs_pending = 1;
-		rdp->passed_quiesc = 0;
+		/*
+		 * Another CPU might have taken care of this new grace period
+		 * while we were idle and handled us as being in an extended
+		 * quiescent state. In that case, we don't need to chase a
+		 * local quiescent state; otherwise:
+		 */
+		if (rdp->completed != rnp->gpnum) {
+			rdp->qs_pending = 1;
+			rdp->passed_quiesc = 0;
+		}
+
 		rdp->gpnum = rnp->gpnum;
 	}
 }
-- 
1.7.1
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 0:31 ` [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods Frederic Weisbecker @ 2010-11-24 0:58 ` Paul E. McKenney 2010-11-24 2:29 ` Frederic Weisbecker 0 siblings, 1 reply; 27+ messages in thread From: Paul E. McKenney @ 2010-11-24 0:58 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote: > When a cpu is in an extended quiescent state, which includes idle > nohz or CPU offline, others CPUs will take care of the grace periods > on its behalf. > > When this CPU exits its extended quiescent state, it will catch up > with the last started grace period and start chasing its own > quiescent states to end the current grace period. > > However in this case we always start to track quiescent states if the > grace period number has changed since we started our extended > quiescent state. And we do this because we always assume that the last > grace period is not finished and needs us to complete it, which is > sometimes wrong. > > This patch verifies if the last grace period has been completed and > if so, start hunting local quiescent states like we always did. > Otherwise don't do anything, this economizes us some work and > an unnecessary softirq. Interesting approach! I can see how this helps in the case where the CPU just came online, but I don't see it in the nohz case, because the nohz case does not update the rdp->completed variable. In contrast, the online path calls rcu_init_percpu_data() which sets up this variable. So, what am I missing here? Thanx, Paul PS. It might well be worthwhile for the online case alone, but the commit message does need to be accurate. > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > Cc: Lai Jiangshan <laijs@cn.fujitsu.com> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Steven Rostedt <rostedt@goodmis.org> > --- > kernel/rcutree.c | 13 +++++++++++-- > 1 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index ccdc04c..5f038a1 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -620,8 +620,17 @@ static void __init check_cpu_stall_init(void) > static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp) > { > if (rdp->gpnum != rnp->gpnum) { > - rdp->qs_pending = 1; > - rdp->passed_quiesc = 0; > + /* > + * Another CPU might have taken take of this new grace period > + * while we were idle and handled us as in an extended quiescent > + * state. In that case, we don't need to chase a local quiescent > + * state, otherwise: > + */ > + if (rdp->completed != rnp->gpnum) { > + rdp->qs_pending = 1; > + rdp->passed_quiesc = 0; > + } > + > rdp->gpnum = rnp->gpnum; > } > } > -- > 1.7.1 > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
From: Frederic Weisbecker @ 2010-11-24  2:29 UTC (permalink / raw)
To: paulmck
Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Steven Rostedt

On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote:
> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote:
> > When a cpu is in an extended quiescent state, which includes idle
> > nohz or CPU offline, others CPUs will take care of the grace periods
> > on its behalf.
> >
> > When this CPU exits its extended quiescent state, it will catch up
> > with the last started grace period and start chasing its own
> > quiescent states to end the current grace period.
> >
> > However in this case we always start to track quiescent states if the
> > grace period number has changed since we started our extended
> > quiescent state. And we do this because we always assume that the last
> > grace period is not finished and needs us to complete it, which is
> > sometimes wrong.
> >
> > This patch verifies if the last grace period has been completed and
> > if so, start hunting local quiescent states like we always did.
> > Otherwise don't do anything, this economizes us some work and
> > an unnecessary softirq.
>
> Interesting approach!  I can see how this helps in the case where the
> CPU just came online, but I don't see it in the nohz case, because the
> nohz case does not update the rdp->completed variable.  In contrast,
> the online path calls rcu_init_percpu_data() which sets up this variable.
>
> So, what am I missing here?
>
>							Thanx, Paul
>
> PS.  It might well be worthwhile for the online case alone, but
>      the commit message does need to be accurate.

So, let's take this scenario (inspired by a freshly dumped trace, to
clarify my ideas):

CPU 1 was idle, it has missed several grace periods, but CPU 0 took care
of that.

Hence, CPU 0's rdp->gpnum = rdp->completed = 4294967000

But the last grace period was 4294967002 and it's completed
(rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002).

Now CPU 0 gets a tick for a random reason, it calls rcu_check_callbacks()
and then rcu_pending() which raises the softirq because of this:

	/* Has another RCU grace period completed? */
	if (ACCESS_ONCE(rnp->completed) != rdp->completed) { /* outside lock */
		rdp->n_rp_gp_completed++;
		return 1;
	}

The softirq fires, we call rcu_process_gp_end() which will
update rdp->completed into the global state:
(rsp->completed = rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002).

But rsp->pgnum is still 2 offsets backwards.

Now we call rcu_check_quiescent_state() -> check_for_new_grace_period()
-> note_new_gpnum() and then we end up a requested quiescent state while
every grace periods are completed.

So, now that I describe all that, I wonder if actually the solution
would be better with changing the above condition to not fire the
softirq to begin with, because:

	rnp->completed != rdp->completed

doesn't seem to mean we need the current cpu. It just means that the
node was smart enough to make its way without us when we were in an
extended quiescent state :)
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 2:29 ` Frederic Weisbecker @ 2010-11-24 2:33 ` Frederic Weisbecker 2010-11-24 6:22 ` Paul E. McKenney 0 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-24 2:33 UTC (permalink / raw) To: paulmck Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt 2010/11/24 Frederic Weisbecker <fweisbec@gmail.com>: > On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote: >> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote: >> > When a cpu is in an extended quiescent state, which includes idle >> > nohz or CPU offline, others CPUs will take care of the grace periods >> > on its behalf. >> > >> > When this CPU exits its extended quiescent state, it will catch up >> > with the last started grace period and start chasing its own >> > quiescent states to end the current grace period. >> > >> > However in this case we always start to track quiescent states if the >> > grace period number has changed since we started our extended >> > quiescent state. And we do this because we always assume that the last >> > grace period is not finished and needs us to complete it, which is >> > sometimes wrong. >> > >> > This patch verifies if the last grace period has been completed and >> > if so, start hunting local quiescent states like we always did. >> > Otherwise don't do anything, this economizes us some work and >> > an unnecessary softirq. >> >> Interesting approach! I can see how this helps in the case where the >> CPU just came online, but I don't see it in the nohz case, because the >> nohz case does not update the rdp->completed variable. In contrast, >> the online path calls rcu_init_percpu_data() which sets up this variable. >> >> So, what am I missing here? >> >> Thanx, Paul >> >> PS. It might well be worthwhile for the online case alone, but >> the commit message does need to be accurate. 
> > > So, let's take this scenario (inspired from a freshly dumped trace to > clarify my ideas): > > CPU 1 was idle, it has missed several grace periods, but CPU 0 took care > of that. > > Hence, CPU 0's rdp->gpnum = rdp->completed = 4294967000 > > But the last grace period was 4294967002 and it's completed > (rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). > > Now CPU 0 gets a tick for a random reason, it calls rcu_check_callbacks() > and then rcu_pending() which raises the softirq because of this: > > /* Has another RCU grace period completed? */ > if (ACCESS_ONCE(rnp->completed) != rdp->completed) { /* outside lock */ > rdp->n_rp_gp_completed++; > return 1; > } > > The softirq fires, we call rcu_process_gp_end() which will > update rdp->completed into the global state: > (rsp->completed = rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). > > But rsp->pgnum is still 2 offsets backwards. > > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period() > -> note_new_gpnum() and then we end up a requested quiescent state while > every grace periods are completed. Sorry I should have described that in the changelogs but my ideas weren't as clear as they are now (at least I feel they are, doesn't mean they actually are ;) Chasing these RCU bugs for too much hours has toasted my brain.. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 2:33 ` Frederic Weisbecker @ 2010-11-24 6:22 ` Paul E. McKenney 2010-11-24 13:48 ` Frederic Weisbecker 0 siblings, 1 reply; 27+ messages in thread From: Paul E. McKenney @ 2010-11-24 6:22 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Wed, Nov 24, 2010 at 03:33:21AM +0100, Frederic Weisbecker wrote: > 2010/11/24 Frederic Weisbecker <fweisbec@gmail.com>: > > On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote: > >> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote: > >> > When a cpu is in an extended quiescent state, which includes idle > >> > nohz or CPU offline, others CPUs will take care of the grace periods > >> > on its behalf. > >> > > >> > When this CPU exits its extended quiescent state, it will catch up > >> > with the last started grace period and start chasing its own > >> > quiescent states to end the current grace period. > >> > > >> > However in this case we always start to track quiescent states if the > >> > grace period number has changed since we started our extended > >> > quiescent state. And we do this because we always assume that the last > >> > grace period is not finished and needs us to complete it, which is > >> > sometimes wrong. > >> > > >> > This patch verifies if the last grace period has been completed and > >> > if so, start hunting local quiescent states like we always did. > >> > Otherwise don't do anything, this economizes us some work and > >> > an unnecessary softirq. > >> > >> Interesting approach! I can see how this helps in the case where the > >> CPU just came online, but I don't see it in the nohz case, because the > >> nohz case does not update the rdp->completed variable. In contrast, > >> the online path calls rcu_init_percpu_data() which sets up this variable. > >> > >> So, what am I missing here? 
> >> > >> Thanx, Paul > >> > >> PS. It might well be worthwhile for the online case alone, but > >> the commit message does need to be accurate. > > > > > > So, let's take this scenario (inspired from a freshly dumped trace to > > clarify my ideas): > > > > CPU 1 was idle, it has missed several grace periods, but CPU 0 took care > > of that. > > > > Hence, CPU 0's rdp->gpnum = rdp->completed = 4294967000 But CPU 1's rdp state is outdated, due to locking design. > > But the last grace period was 4294967002 and it's completed > > (rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). > > > > Now CPU 0 gets a tick for a random reason, it calls rcu_check_callbacks() > > and then rcu_pending() which raises the softirq because of this: > > > > /* Has another RCU grace period completed? */ > > if (ACCESS_ONCE(rnp->completed) != rdp->completed) { /* outside lock */ > > rdp->n_rp_gp_completed++; > > return 1; > > } Yes, because CPU 0 has not yet updated its rdp state. > > The softirq fires, we call rcu_process_gp_end() which will > > update rdp->completed into the global state: > > (rsp->completed = rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). > > > > But rsp->pgnum is still 2 offsets backwards. This one is hard for me to believe -- rsp->gpnum drives the rest of the ->gpnum fields, right? > > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period() > > -> note_new_gpnum() and then we end up a requested quiescent state while > > every grace periods are completed. > > Sorry I should have described that in the changelogs but my ideas > weren't as clear as they > are now (at least I feel they are, doesn't mean they actually are ;) > Chasing these RCU bugs for too much hours has toasted my brain.. Welcome to my world!!! But keep in mind that an extra timer tick or two is much preferable to a potential hang! And you only get the extra timer tick if there was some other reason that the CPU came out of nohz mode, correct? 
Which is why checking the rnp fields makes more sense to me, actually. Acquiring rnp->lock is much less painful than pinning down the rsp state. Thanx, Paul ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 6:22 ` Paul E. McKenney @ 2010-11-24 13:48 ` Frederic Weisbecker 2010-11-24 14:42 ` Paul E. McKenney 0 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-24 13:48 UTC (permalink / raw) To: paulmck Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > On Wed, Nov 24, 2010 at 03:33:21AM +0100, Frederic Weisbecker wrote: >> 2010/11/24 Frederic Weisbecker <fweisbec@gmail.com>: >> > On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote: >> >> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote: >> >> > When a cpu is in an extended quiescent state, which includes idle >> >> > nohz or CPU offline, others CPUs will take care of the grace periods >> >> > on its behalf. >> >> > >> >> > When this CPU exits its extended quiescent state, it will catch up >> >> > with the last started grace period and start chasing its own >> >> > quiescent states to end the current grace period. >> >> > >> >> > However in this case we always start to track quiescent states if the >> >> > grace period number has changed since we started our extended >> >> > quiescent state. And we do this because we always assume that the last >> >> > grace period is not finished and needs us to complete it, which is >> >> > sometimes wrong. >> >> > >> >> > This patch verifies if the last grace period has been completed and >> >> > if so, start hunting local quiescent states like we always did. >> >> > Otherwise don't do anything, this economizes us some work and >> >> > an unnecessary softirq. >> >> >> >> Interesting approach! I can see how this helps in the case where the >> >> CPU just came online, but I don't see it in the nohz case, because the >> >> nohz case does not update the rdp->completed variable. 
In contrast, >> >> the online path calls rcu_init_percpu_data() which sets up this variable. >> >> >> >> So, what am I missing here? >> >> >> >> Thanx, Paul >> >> >> >> PS. It might well be worthwhile for the online case alone, but >> >> the commit message does need to be accurate. >> > >> > >> > So, let's take this scenario (inspired from a freshly dumped trace to >> > clarify my ideas): >> > >> > CPU 1 was idle, it has missed several grace periods, but CPU 0 took care >> > of that. >> > >> > Hence, CPU 0's rdp->gpnum = rdp->completed = 4294967000 (Actually I was talking about CPU 1 here. CPU was idle and has rdp->gpnum and rdp->completed at 4294967000. While the global state and even the node are on 4294967002 > But CPU 1's rdp state is outdated, due to locking design. Yeah. >> > But the last grace period was 4294967002 and it's completed >> > (rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). >> > >> > Now CPU 0 gets a tick for a random reason, it calls rcu_check_callbacks() >> > and then rcu_pending() which raises the softirq because of this: >> > >> > /* Has another RCU grace period completed? */ >> > if (ACCESS_ONCE(rnp->completed) != rdp->completed) { /* outside lock */ >> > rdp->n_rp_gp_completed++; >> > return 1; >> > } > > Yes, because CPU 0 has not yet updated its rdp state. So again I made a mistake. Here it's CPU 1 that takes this path, the outdated CPU that was idle. So yeah, here it has not yet updated its rdp state, it's still 2 offsets backwards. >> > The softirq fires, we call rcu_process_gp_end() which will >> > update rdp->completed into the global state: >> > (rsp->completed = rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). >> > >> > But rsp->pgnum is still 2 offsets backwards. > > This one is hard for me to believe -- rsp->gpnum drives the rest of > the ->gpnum fields, right? Oops, I meant rdp->pgnum is still 2 offsets backward, for CPU 1. Sorry. 
>> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period() >> > -> note_new_gpnum() and then we end up a requested quiescent state while >> > every grace periods are completed. >> >> Sorry I should have described that in the changelogs but my ideas >> weren't as clear as they >> are now (at least I feel they are, doesn't mean they actually are ;) >> Chasing these RCU bugs for too much hours has toasted my brain.. > > Welcome to my world!!! But keep in mind that an extra timer tick > or two is much preferable to a potential hang! And you only get > the extra timer tick if there was some other reason that the > CPU came out of nohz mode, correct? Yeah, either because of a timer, hrtimer, or a reschedule. But you still generate a spurious softirq in this scheme. Two in fact: one because of the rnp->completed != rsp->completed condition in rcu_pending(), another one because when we update the pgnum, we always start chasing QS, regardless of the last GP beeing completed or not. > Which is why checking the rnp fields makes more sense to me, actually. > Acquiring rnp->lock is much less painful than pinning down the rsp state. Right. Another thing, we already have the (rnp->gpnum) != rdp->gpnu check in rcu_pending(), why also checking (rnp->completed) != rdp->completed) ? Thanks. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 13:48 ` Frederic Weisbecker @ 2010-11-24 14:42 ` Paul E. McKenney 2010-11-24 15:45 ` Frederic Weisbecker 0 siblings, 1 reply; 27+ messages in thread From: Paul E. McKenney @ 2010-11-24 14:42 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Wed, Nov 24, 2010 at 02:48:46PM +0100, Frederic Weisbecker wrote: > 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > > On Wed, Nov 24, 2010 at 03:33:21AM +0100, Frederic Weisbecker wrote: > >> 2010/11/24 Frederic Weisbecker <fweisbec@gmail.com>: > >> > On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote: > >> >> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote: > >> >> > When a cpu is in an extended quiescent state, which includes idle > >> >> > nohz or CPU offline, others CPUs will take care of the grace periods > >> >> > on its behalf. > >> >> > > >> >> > When this CPU exits its extended quiescent state, it will catch up > >> >> > with the last started grace period and start chasing its own > >> >> > quiescent states to end the current grace period. > >> >> > > >> >> > However in this case we always start to track quiescent states if the > >> >> > grace period number has changed since we started our extended > >> >> > quiescent state. And we do this because we always assume that the last > >> >> > grace period is not finished and needs us to complete it, which is > >> >> > sometimes wrong. > >> >> > > >> >> > This patch verifies if the last grace period has been completed and > >> >> > if so, start hunting local quiescent states like we always did. > >> >> > Otherwise don't do anything, this economizes us some work and > >> >> > an unnecessary softirq. > >> >> > >> >> Interesting approach! 
I can see how this helps in the case where the > >> >> CPU just came online, but I don't see it in the nohz case, because the > >> >> nohz case does not update the rdp->completed variable. In contrast, > >> >> the online path calls rcu_init_percpu_data() which sets up this variable. > >> >> > >> >> So, what am I missing here? > >> >> > >> >> Thanx, Paul > >> >> > >> >> PS. It might well be worthwhile for the online case alone, but > >> >> the commit message does need to be accurate. > >> > > >> > > >> > So, let's take this scenario (inspired from a freshly dumped trace to > >> > clarify my ideas): > >> > > >> > CPU 1 was idle, it has missed several grace periods, but CPU 0 took care > >> > of that. > >> > > >> > Hence, CPU 0's rdp->gpnum = rdp->completed = 4294967000 > > > (Actually I was talking about CPU 1 here. CPU was idle and > has rdp->gpnum and rdp->completed at 4294967000. > > While the global state and even the node are on 4294967002 > > > > But CPU 1's rdp state is outdated, due to locking design. > > Yeah. > > > >> > But the last grace period was 4294967002 and it's completed > >> > (rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). > >> > > >> > Now CPU 0 gets a tick for a random reason, it calls rcu_check_callbacks() > >> > and then rcu_pending() which raises the softirq because of this: > >> > > >> > /* Has another RCU grace period completed? */ > >> > if (ACCESS_ONCE(rnp->completed) != rdp->completed) { /* outside lock */ > >> > rdp->n_rp_gp_completed++; > >> > return 1; > >> > } > > > > Yes, because CPU 0 has not yet updated its rdp state. > > So again I made a mistake. Here it's CPU 1 that takes this path, the outdated > CPU that was idle. > > So yeah, here it has not yet updated its rdp state, it's still 2 > offsets backwards. OK, we might now be on the same page. 
;-)

> >> > The softirq fires, we call rcu_process_gp_end() which will
> >> > update rdp->completed into the global state:
> >> > (rsp->completed = rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002).
> >> >
> >> > But rsp->pgnum is still 2 offsets backwards.
> >
> > This one is hard for me to believe -- rsp->gpnum drives the rest of
> > the ->gpnum fields, right?
>
> Oops, I meant rdp->pgnum is still 2 offsets backward, for CPU 1.
>
> Sorry.

CPU 1?  Or CPU 0?

> >> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period()
> >> > -> note_new_gpnum() and then we end up a requested quiescent state while
> >> > every grace periods are completed.
> >>
> >> Sorry I should have described that in the changelogs but my ideas
> >> weren't as clear as they are now (at least I feel they are, doesn't
> >> mean they actually are ;)
> >> Chasing these RCU bugs for too much hours has toasted my brain..
> >
> > Welcome to my world!!!  But keep in mind that an extra timer tick
> > or two is much preferable to a potential hang!  And you only get
> > the extra timer tick if there was some other reason that the
> > CPU came out of nohz mode, correct?
>
> Yeah, either because of a timer, hrtimer, or a reschedule.
> But you still generate a spurious softirq in this scheme.

Eliminating spurious softirqs is a good thing, but let's review the
overall priorities (probably missing a few, but this should give you
an overall view):

1.	No too-short grace periods!!!

2.	No grace-period hangs!!

3.	Minimize read-side overhead!

4.	Maintain decent update-side performance and scalability!

5.	Avoid inflicting real-time latency problems!

6.	Avoid waking up sleeping CPUs!

7.	Let CPUs that can do so go to sleep immediately (as opposed to
	waiting a few milliseconds).

8.	Avoid spurious softirqs.

9.	Super-optimization of update and grace-period paths.  (I normally
	just say "no" -- it has to be an impressive optimization for me
	to be willing to risk messing up #1 through #7.)

Nevertheless, simple changes that avoid spurious softirqs are good things.

> Two in fact: one because of the rnp->completed != rsp->completed condition
> in rcu_pending(), another one because when we update the pgnum, we
> always start chasing QS, regardless of the last GP being completed or not.

OK -- is this an example of #8 above, or is it really #7?  I am absolutely
not worried about a pair of back-to-back softirqs, and I don't believe
that you should be, either.  ;-)

> > Which is why checking the rnp fields makes more sense to me, actually.
> > Acquiring rnp->lock is much less painful than pinning down the rsp state.
>
> Right.
>
> Another thing, we already have the (rnp->gpnum != rdp->gpnum) check in
> rcu_pending(), why also check (rnp->completed != rdp->completed)?

Because if (rnp->completed != rdp->completed), we might need to process
some callbacks, either advancing them or invoking them.

By the way, have you introduced a config variable for your HPC dynticks
changes?  Longer term, __rcu_pending() for your HPC dynticks case
could check for the current CPU having any callbacks before the call to
check_cpu_stalls(), as in rcu_needs_cpu_quick_check().  That way, the
CPUs with callbacks would drive the RCU core state machine.  We don't
necessarily want this in the common case because it can increase
grace-period latency, but it could be very useful in the HPC-dyntick
case -- eliminate any number of sources of spurious ticks with low risk
to the high-priority RCU properties.  There are some corresponding
changes required on the force_quiescent_state() path, but these are
reasonably straightforward.

							Thanx, Paul
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 14:42 ` Paul E. McKenney @ 2010-11-24 15:45 ` Frederic Weisbecker 2010-11-24 16:15 ` Paul E. McKenney 0 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-24 15:45 UTC (permalink / raw) To: paulmck Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > On Wed, Nov 24, 2010 at 02:48:46PM +0100, Frederic Weisbecker wrote: >> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: >> > On Wed, Nov 24, 2010 at 03:33:21AM +0100, Frederic Weisbecker wrote: >> >> 2010/11/24 Frederic Weisbecker <fweisbec@gmail.com>: >> >> > On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote: >> >> >> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote: >> >> > The softirq fires, we call rcu_process_gp_end() which will >> >> > update rdp->completed into the global state: >> >> > (rsp->completed = rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). >> >> > >> >> > But rsp->pgnum is still 2 offsets backwards. >> > >> > This one is hard for me to believe -- rsp->gpnum drives the rest of >> > the ->gpnum fields, right? >> >> Oops, I meant rdp->pgnum is still 2 offsets backward, for CPU 1. >> >> Sorry. > > CPU 1? Or CPU 0? CPU 1, the one that was idle :-D So CPU 1 rdp did catch up with node and state for its completed field. But not its pgnum yet. >> >> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period() >> >> > -> note_new_gpnum() and then we end up a requested quiescent state while >> >> > every grace periods are completed. >> >> >> >> Sorry I should have described that in the changelogs but my ideas >> >> weren't as clear as they >> >> are now (at least I feel they are, doesn't mean they actually are ;) >> >> Chasing these RCU bugs for too much hours has toasted my brain.. >> > >> > Welcome to my world!!! 
But keep in mind that an extra timer tick >> > or two is much preferable to a potential hang! And you only get >> > the extra timer tick if there was some other reason that the >> > CPU came out of nohz mode, correct? >> >> Yeah, either because of a timer, hrtimer, or a reschedule. >> But you still generate a spurious softirq in this scheme. > > Eliminating spurious softirqs is a good thing, but let's review the > overall priorities (probably missing a few, but this should give you > an overall view): > > 1. No too-short grace periods!!! Oh, I did not know that. Is it to avoid too many short series of callback executions, or something else? > 2. No grace-period hangs!! > > 3. Minimize read-side overhead! > > 4. Maintain decent update-side performance and scalability! > > 5. Avoid inflicting real-time latency problems! > > 6. Avoid waking up sleeping CPUs! > > 7. Let CPUs that can do so go to sleep immediately (as opposed to > waiting a few milliseconds). > > 8. Avoid spurious softirqs. Avoiding spurious softirqs brings us back to 7, even though it's not a matter of ms but rather µs. > > 9. Super-optimization of update and grace-period paths. (I normally > just say "no" -- it has to be an impressive optimization for me > to be willing to risk messing up #1 through #7.) > > Nevertheless, simple changes that avoid spurious softirqs are good things. Ok, the original patch of this discussion seems to fit then. Although I need to respin the changelog. >> Two in fact: one because of the rnp->completed != rdp->completed condition >> in rcu_pending(), another one because when we update the gpnum, we >> always start chasing QS, regardless of the last GP being completed or not. > > OK -- is this an example of #8 above, or is it really #7? I am absolutely > not worried about a pair of back-to-back softirqs, and I don't believe > that you should be, either. ;-) This seems to be 8 and 7. A pair of back-to-back softirqs is no huge deal, but still time consuming, spurious, etc...
if we can easily spot that they are useless, why not get rid of them. At least we can easily and reliably avoid the second one from note_new_gpnum(). >> > Which is why checking the rnp fields makes more sense to me, actually. >> > Acquiring rnp->lock is much less painful than pinning down the rsp state. >> >> Right. >> >> Another thing, we already have the rnp->gpnum != rdp->gpnum check in >> rcu_pending(), >> why also check rnp->completed != rdp->completed? > > Because if (rnp->completed != rdp->completed), we might need to process > some callbacks, either advancing them or invoking them. The rdp->nxttail in rcu is still an obscure part for me, so I just believe you :) > By the way, have you introduced a config variable for your HPC dynticks > changes? I will, because I'll have to touch some fast paths and I would prefer to do that compile-conditional. But for now I don't care and am experimenting with drafts :) I'm quite close to something that seems to work well BTW, those series of softirqs were my latest problem as they made rcu_pending() return 1 for a good while and I couldn't stop the tick. After the 2nd patch it should be fine now, I hope. > Longer term, __rcu_pending() for your HPC dynticks case > could check for the current CPU having any callbacks before the call to > check_cpu_stalls(), as in rcu_needs_cpu_quick_check(). That way, the > CPUs with callbacks would drive the RCU core state machine. Ah, because currently every CPU calling rcu_pending() can be pulled into handling the state machine because of this check, right?

	/* Has an RCU GP gone long enough to send resched IPIs &c? */
	if (rcu_gp_in_progress(rsp) &&
	    ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs), jiffies)) {
		rdp->n_rp_need_fqs++;
		return 1;

So the goal would be to leave this job to CPUs that have callbacks, right? > We don't > necessarily want this in the common case because it can increase > grace-period latency Because it means fewer CPUs handle the grace period state machine?
> but it could be very useful in the HPC-dyntick > case -- eliminate any number of sources of spurious ticks with low > risk to the high-priority RCU properties. There are some corresponding > changes required on the force_quiescent_state() path, but these are > reasonably straightforward. Ok, I may have a try at it (if I completely understand what you suggest :-) But first I'll try to get that dyntick mode with the current state, which seems to be enough for the basic desired functionality. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 15:45 ` Frederic Weisbecker @ 2010-11-24 16:15 ` Paul E. McKenney 2010-11-24 17:38 ` Frederic Weisbecker 0 siblings, 1 reply; 27+ messages in thread From: Paul E. McKenney @ 2010-11-24 16:15 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Wed, Nov 24, 2010 at 04:45:11PM +0100, Frederic Weisbecker wrote: > 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > > On Wed, Nov 24, 2010 at 02:48:46PM +0100, Frederic Weisbecker wrote: > >> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > >> > On Wed, Nov 24, 2010 at 03:33:21AM +0100, Frederic Weisbecker wrote: > >> >> 2010/11/24 Frederic Weisbecker <fweisbec@gmail.com>: > >> >> > On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote: > >> >> >> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote: > >> >> > The softirq fires, we call rcu_process_gp_end() which will > >> >> > update rdp->completed into the global state: > >> >> > (rsp->completed = rnp->pgnum = rnp->completed = rsp->pgnum = 4294967002). > >> >> > > >> >> > But rsp->pgnum is still 2 offsets backwards. > >> > > >> > This one is hard for me to believe -- rsp->gpnum drives the rest of > >> > the ->gpnum fields, right? > >> > >> Oops, I meant rdp->pgnum is still 2 offsets backward, for CPU 1. > >> > >> Sorry. > > > > CPU 1? Or CPU 0? > > CPU 1, the one that was idle :-D > > So CPU 1 rdp did catch up with node and state for its completed field. > But not its pgnum yet. OK, I will need to take a closer look at the rdp->gpnum setting. > >> >> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period() > >> >> > -> note_new_gpnum() and then we end up a requested quiescent state while > >> >> > every grace periods are completed. 
> >> >> > >> >> Sorry I should have described that in the changelogs but my ideas > >> >> weren't as clear as they > >> >> are now (at least I feel they are, doesn't mean they actually are ;) > >> >> Chasing these RCU bugs for too much hours has toasted my brain.. > >> > > >> > Welcome to my world!!! But keep in mind that an extra timer tick > >> > or two is much preferable to a potential hang! And you only get > >> > the extra timer tick if there was some other reason that the > >> > CPU came out of nohz mode, correct? > >> > >> Yeah, either because of a timer, hrtimer, or a reschedule. > >> But you still generate a spurious softirq in this scheme. > > > > Eliminating spurious softirqs is a good thing, but let's review the > > overall priorities (probably missing a few, but this should give you > > an overall view): > > > > 1. No too-short grace periods!!! > > Oh, I did not know that. Is it to avoid too much > short series of callbacks execution or? To avoid breaking RCU's fundamental guarantee. ;-) It is easy to create bugs where a CPU thinks that it needs to respond to the grace period based on a prior quiescent state, but unknown to it that grace period has finished and a new one has started. This CPU might then wrongly report a quiescent state against the new grace period. Needless to say, this is very very very very very very bad. > > 2. No grace-period hangs!! > > > > 3. Minimize read-side overhead! > > > > 4. Maintain decent update-side performance and scalability! > > > > 5. Avoid inflicting real-time latency problems! > > > > 6. Avoid waking up sleeping CPUs! > > > > 7. Let CPUs that can do so go to sleep immediately (as opposed to > > waiting a few milliseconds). > > > > 8. Avoid spurious softirqs. > > Avoiding spurious softirqs bring us back to 7, even though it's not a matter > of ms but rather us. Microseconds should not be a problem as long as they are rare. 
Milliseconds are a problem for the battery-powered folks as well as for people who care about OS jitter. > > 9. Super-optimization of update and grace-period paths. (I normally > > just say "no" -- it has to be an impressive optimization for me > > to be willing to risk messing up #1 through #7.) > > > > Nevertheless, simple changes that avoid spurious softirqs are good things. > > Ok, then the original patch of this discussion seems to fit then. > Although I need to respin the changelog. Sounds good -- I will then review it carefully. > >> Two in fact: one because of the rnp->completed != rsp->completed condition > >> in rcu_pending(), another one because when we update the pgnum, we > >> always start chasing QS, regardless of the last GP beeing completed or not. > > > > OK -- is this an example of #8 above, or is it really #7? I am absolutely > > not worried about a pair of back-to-back softirqs, and I don't believe > > that you should be, either. ;-) > > This seems to be 8 and 7. > > A pair of back-to-back softirqs is no huge deal, but still time > consuming, spurious, etc... > if we can easily spott they are useless, why not get rid of them. At > least we can easily > and reliably avoid the second one from note_new_gpnum(). As long as I can prove to myself that the patch that gets rid of them does not cause other problems. ;-) > >> > Which is why checking the rnp fields makes more sense to me, actually. > >> > Acquiring rnp->lock is much less painful than pinning down the rsp state. > >> > >> Right. > >> > >> Another thing, we already have the (rnp->gpnum) != rdp->gpnu check in > >> rcu_pending(), > >> why also checking (rnp->completed) != rdp->completed) ? > > > > Because if (rnp->completed != rdp->completed), we might need to process > > some callbacks, either advancing them or invoking them. > > The rdp->nxttail in rcu is still an obscur part for me, so I just believe you :) It is just the place that the pending rcu_head callbacks are stored. 
> > By the way, have you introduced a config variable for your HPC dynticks > > changes? > > I will, because I'll have to touch some fast path and I would prefer to do > that compile-conditional. > > But for now I don't care and experiment drafts :) Well, if any of the softirq/tick streamlining in RCU is risky, it needs to go under that same #ifdef! ;-) > I'm quite close to something that seems to work well BTW, those series of > softirq were my latest problem as it made rcu_pending() returning 1 for > a good while and I coudn't stop the tick. After the 2nd patch it should be > fine now, I hope. By "couldn't stop the tick" you mean "the tick went on a few times more than you like" or do you really mean "couldn't ever stop the tick"? My guess is the former. If there were problems like the latter, Arjan would probably have beat me up about it long ago! > > Longer term, __rcu_pending() for your HPC dynticks case > > could check for the current CPU having any callbacks before the call to > > check_cpu_stalls(), as in rcu_needs_cpu_quick_check(). That way, the > > CPUs with callbacks would drive the RCU core state machine. > > Ah, because currently every CPUs calling rcu_pending() can be pulled into > handling the state machine because of this check, right? > > /* Has an RCU GP gone long enough to send resched IPIs &c? */ > if (rcu_gp_in_progress(rsp) && > ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs), jiffies)) { > rdp->n_rp_need_fqs++; > return 1; > > So the goal would be to let this job to CPUs that have callbacks, right? In the dynticks-HPC case, yes. > > We don't > > necessarily want this in the common case because it can increase > > grace-period latency > > Because it makes less CPUs to handle the grace period state machine? Not quite. Here I am not worried about the dyntick-HPC case, but rather the usual more-runnable-tasks-than-CPUs case. 
Here, CPUs are context switching frequently, so if they report their own quiescent states (even when they don't happen to have any callbacks queued) the grace period will complete more quickly. After all, force_quiescent_state() doesn't even get started until the third tick following the start of the grace period. In contrast, if we let the context-switching callback-free CPUs do their own work, the grace period will likely end in a tick or two. > > but it could be very useful in the HPC-dyntick > > case -- eliminate any number of sources of spurious ticks with low > > risk to the high-priority RCU properties. There are some corresponding > > changes required on the force_quiescent_state() path, but these are > > reasonably straightforward. > > Ok, I may have a try at it (if I completely understand what you suggest :-) > But first I'll try to get that dyntick mode with the current state, which seems > to be enough for the basic desired functionality. If the two changes you have posted thus far are all you need, this does sound like a good starting point. Thanx, Paul ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 16:15 ` Paul E. McKenney @ 2010-11-24 17:38 ` Frederic Weisbecker 2010-11-24 18:20 ` Paul E. McKenney 0 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-24 17:38 UTC (permalink / raw) To: paulmck Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > On Wed, Nov 24, 2010 at 04:45:11PM +0100, Frederic Weisbecker wrote: >> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: >> > On Wed, Nov 24, 2010 at 02:48:46PM +0100, Frederic Weisbecker wrote: >> CPU 1, the one that was idle :-D >> >> So CPU 1 rdp did catch up with node and state for its completed field. >> But not its pgnum yet. > > OK, I will need to take a closer look at the rdp->gpnum setting. Ok, do you want me to resend the patch with the changelog changed accordingly to our discussion or? >> >> >> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period() >> >> >> > -> note_new_gpnum() and then we end up a requested quiescent state while >> >> >> > every grace periods are completed. >> >> >> >> >> >> Sorry I should have described that in the changelogs but my ideas >> >> >> weren't as clear as they >> >> >> are now (at least I feel they are, doesn't mean they actually are ;) >> >> >> Chasing these RCU bugs for too much hours has toasted my brain.. >> >> > >> >> > Welcome to my world!!! But keep in mind that an extra timer tick >> >> > or two is much preferable to a potential hang! And you only get >> >> > the extra timer tick if there was some other reason that the >> >> > CPU came out of nohz mode, correct? >> >> >> >> Yeah, either because of a timer, hrtimer, or a reschedule. >> >> But you still generate a spurious softirq in this scheme. 
>> > >> > Eliminating spurious softirqs is a good thing, but let's review the >> > overall priorities (probably missing a few, but this should give you >> > an overall view): >> > >> > 1. No too-short grace periods!!! >> >> Oh, I did not know that. Is it to avoid too much >> short series of callbacks execution or? > > To avoid breaking RCU's fundamental guarantee. ;-) > > It is easy to create bugs where a CPU thinks that it needs to respond > to the grace period based on a prior quiescent state, but unknown to > it that grace period has finished and a new one has started. This > CPU might then wrongly report a quiescent state against the new grace > period. Needless to say, this is very very very very very very bad. Aaah ok. I thought you were talking about the time of a correct grace period, not a bug that would make it seem shorter than it actually is. Ok. Of course, priority number 1. >> > 7. Let CPUs that can do so go to sleep immediately (as opposed to >> > waiting a few milliseconds). >> > >> > 8. Avoid spurious softirqs. >> >> Avoiding spurious softirqs bring us back to 7, even though it's not a matter >> of ms but rather us. > > Microseconds should not be a problem as long as they are rare. Milliseconds > are a problem for the battery-powered folks as well as for people who care > about OS jitter. Right. >> >> Two in fact: one because of the rnp->completed != rsp->completed condition >> >> in rcu_pending(), another one because when we update the pgnum, we >> >> always start chasing QS, regardless of the last GP beeing completed or not. >> > >> > OK -- is this an example of #8 above, or is it really #7? I am absolutely >> > not worried about a pair of back-to-back softirqs, and I don't believe >> > that you should be, either. ;-) >> >> This seems to be 8 and 7. >> >> A pair of back-to-back softirqs is no huge deal, but still time >> consuming, spurious, etc... >> if we can easily spott they are useless, why not get rid of them. 
At >> least we can easily >> and reliably avoid the second one from note_new_gpnum(). > > As long as I can prove to myself that the patch that gets rid of them > does not cause other problems. ;-) :) >> >> > Which is why checking the rnp fields makes more sense to me, actually. >> >> > Acquiring rnp->lock is much less painful than pinning down the rsp state. >> >> >> >> Right. >> >> >> >> Another thing, we already have the (rnp->gpnum) != rdp->gpnu check in >> >> rcu_pending(), >> >> why also checking (rnp->completed) != rdp->completed) ? >> > >> > Because if (rnp->completed != rdp->completed), we might need to process >> > some callbacks, either advancing them or invoking them. >> >> The rdp->nxttail in rcu is still an obscur part for me, so I just believe you :) > > It is just the place that the pending rcu_head callbacks are stored. Yeah. I mean, I need to read how the code manages the different queues. But __rcu_process_gp_end() seems to sum it up quite well. >> > By the way, have you introduced a config variable for your HPC dynticks >> > changes? >> >> I will, because I'll have to touch some fast path and I would prefer to do >> that compile-conditional. >> >> But for now I don't care and experiment drafts :) > > Well, if any of the softirq/tick streamlining in RCU is risky, it needs > to go under that same #ifdef! ;-) Sure, if something needs a reorg or a change even in the dyntick-hpc off-case code, I'll propose that to you, like I did for these two patches. But I'll try to keep the invasive changes that are only for dyntick-hpc only under that config. >> I'm quite close to something that seems to work well BTW, those series of >> softirq were my latest problem as it made rcu_pending() returning 1 for >> a good while and I coudn't stop the tick. After the 2nd patch it should be >> fine now, I hope. > > By "couldn't stop the tick" you mean "the tick went on a few times more > than you like" or do you really mean "couldn't ever stop the tick"? 
> My guess is the former. If there were problems like the latter, Arjan > would probably have beat me up about it long ago! It's more like couldn't ever stop the tick. But that doesn't concern mainline. This is because I have a hook that prevents the tick from being stopped until rcu_pending() == 0. In mainline it doesn't prevent the CPU from going nohz idle though, because the softirq is armed from the tick. Once the softirq is processed, the CPU can go to sleep. On the next timer tick it would again raise the softirq and then could again go to sleep, etc.. I still have a trace of that, with my rcu_pending() hook, in dyntick-hpc, that kept returning 1 for at least 100 seconds and on each tick. I did not dig much further into this on my code as I immediately switched to tip:master to check if the problem came from my code or not. And then I discovered that rcu_pending() indeed kept returning 1 for a while in mainline (I don't remember how long "a while" could be, though). I saw all these spurious rcu softirqs at each tick, caused by rcu_pending(), for random time slices: probably between a wake up from idle and the next grace period, if my theory is right. I think it happened most likely with the bh flavour, probably because it's subject to fewer grace periods. And this is what the second patch fixes in mainline and that also seems to fix my issue in dyntick-hpc. Probably it happened more easily on dyntick-hpc as I was calling rcu_pending() after calling rcu_enter_nohz() (some buggy part of mine). >> > Longer term, __rcu_pending() for your HPC dynticks case >> > could check for the current CPU having any callbacks before the call to >> > check_cpu_stalls(), as in rcu_needs_cpu_quick_check(). That way, the >> > CPUs with callbacks would drive the RCU core state machine. >> >> Ah, because currently every CPU calling rcu_pending() can be pulled into >> handling the state machine because of this check, right?
>>
>> /* Has an RCU GP gone long enough to send resched IPIs &c? */
>> if (rcu_gp_in_progress(rsp) &&
>>     ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs), jiffies)) {
>> 	rdp->n_rp_need_fqs++;
>> 	return 1;
>>
>> So the goal would be to leave this job to CPUs that have callbacks, right? > In the dynticks-HPC case, yes. > >> > We don't >> > necessarily want this in the common case because it can increase >> > grace-period latency >> >> Because it means fewer CPUs handle the grace period state machine? > Not quite. Here I am not worried about the dyntick-HPC case, but rather > the usual more-runnable-tasks-than-CPUs case. Here, CPUs are context > switching frequently, so if they report their own quiescent states (even > when they don't happen to have any callbacks queued) the grace period will > complete more quickly. After all, force_quiescent_state() doesn't even > get started until the third tick following the start of the grace period. Ah, I see what you mean. So you would suggest even ignoring those explicit QS reports when in dyntick-hpc mode for CPUs that don't have callbacks? Why not keep them? >> > but it could be very useful in the HPC-dyntick >> > case -- eliminate any number of sources of spurious ticks with low >> > risk to the high-priority RCU properties. There are some corresponding >> > changes required on the force_quiescent_state() path, but these are >> > reasonably straightforward. >> >> Ok, I may have a try at it (if I completely understand what you suggest :-) >> But first I'll try to get that dyntick mode with the current state, which seems >> to be enough for the basic desired functionality. > > If the two changes you have posted thus far are all you need, this does > sound like a good starting point. For now yeah, it seems to work well. I may run into further surprises though :) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-24 17:38 ` Frederic Weisbecker @ 2010-11-24 18:20 ` Paul E. McKenney 2010-11-24 20:22 ` Paul E. McKenney 2010-11-24 22:42 ` Frederic Weisbecker 0 siblings, 2 replies; 27+ messages in thread From: Paul E. McKenney @ 2010-11-24 18:20 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote: > 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > > On Wed, Nov 24, 2010 at 04:45:11PM +0100, Frederic Weisbecker wrote: > >> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > >> > On Wed, Nov 24, 2010 at 02:48:46PM +0100, Frederic Weisbecker wrote: > >> CPU 1, the one that was idle :-D > >> > >> So CPU 1 rdp did catch up with node and state for its completed field. > >> But not its pgnum yet. > > > > OK, I will need to take a closer look at the rdp->gpnum setting. > > Ok, do you want me to resend the patch with the changelog changed accordingly > to our discussion or? Please! > >> >> >> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period() > >> >> >> > -> note_new_gpnum() and then we end up a requested quiescent state while > >> >> >> > every grace periods are completed. > >> >> >> > >> >> >> Sorry I should have described that in the changelogs but my ideas > >> >> >> weren't as clear as they > >> >> >> are now (at least I feel they are, doesn't mean they actually are ;) > >> >> >> Chasing these RCU bugs for too much hours has toasted my brain.. > >> >> > > >> >> > Welcome to my world!!! But keep in mind that an extra timer tick > >> >> > or two is much preferable to a potential hang! And you only get > >> >> > the extra timer tick if there was some other reason that the > >> >> > CPU came out of nohz mode, correct? > >> >> > >> >> Yeah, either because of a timer, hrtimer, or a reschedule. 
> >> >> But you still generate a spurious softirq in this scheme. > >> > > >> > Eliminating spurious softirqs is a good thing, but let's review the > >> > overall priorities (probably missing a few, but this should give you > >> > an overall view): > >> > > >> > 1. No too-short grace periods!!! > >> > >> Oh, I did not know that. Is it to avoid too much > >> short series of callbacks execution or? > > > > To avoid breaking RCU's fundamental guarantee. ;-) > > > > It is easy to create bugs where a CPU thinks that it needs to respond > > to the grace period based on a prior quiescent state, but unknown to > > it that grace period has finished and a new one has started. This > > CPU might then wrongly report a quiescent state against the new grace > > period. Needless to say, this is very very very very very very bad. > > Aaah ok. I thought you were talking about the time of a correct grace period, > not a bug that would make it seem shorter than it actually is. > > Ok. Of course, priority number 1. ;-) > >> > 7. Let CPUs that can do so go to sleep immediately (as opposed to > >> > waiting a few milliseconds). > >> > > >> > 8. Avoid spurious softirqs. > >> > >> Avoiding spurious softirqs bring us back to 7, even though it's not a matter > >> of ms but rather us. > > > > Microseconds should not be a problem as long as they are rare. Milliseconds > > are a problem for the battery-powered folks as well as for people who care > > about OS jitter. > > Right. > > > >> >> Two in fact: one because of the rnp->completed != rsp->completed condition > >> >> in rcu_pending(), another one because when we update the pgnum, we > >> >> always start chasing QS, regardless of the last GP beeing completed or not. > >> > > >> > OK -- is this an example of #8 above, or is it really #7? I am absolutely > >> > not worried about a pair of back-to-back softirqs, and I don't believe > >> > that you should be, either. ;-) > >> > >> This seems to be 8 and 7. 
> >> > >> A pair of back-to-back softirqs is no huge deal, but still time > >> consuming, spurious, etc... > >> if we can easily spott they are useless, why not get rid of them. At > >> least we can easily > >> and reliably avoid the second one from note_new_gpnum(). > > > > As long as I can prove to myself that the patch that gets rid of them > > does not cause other problems. ;-) > > :) > > >> >> > Which is why checking the rnp fields makes more sense to me, actually. > >> >> > Acquiring rnp->lock is much less painful than pinning down the rsp state. > >> >> > >> >> Right. > >> >> > >> >> Another thing, we already have the (rnp->gpnum) != rdp->gpnu check in > >> >> rcu_pending(), > >> >> why also checking (rnp->completed) != rdp->completed) ? > >> > > >> > Because if (rnp->completed != rdp->completed), we might need to process > >> > some callbacks, either advancing them or invoking them. > >> > >> The rdp->nxttail in rcu is still an obscur part for me, so I just believe you :) > > > > It is just the place that the pending rcu_head callbacks are stored. > > Yeah. I mean, I need to read how the code manages the different queues. > But __rcu_process_gp_end() seems to sum it up quite well. For advancing callbacks, that is the one! For invocation of callbacks, see rcu_do_batch(). > >> > By the way, have you introduced a config variable for your HPC dynticks > >> > changes? > >> > >> I will, because I'll have to touch some fast path and I would prefer to do > >> that compile-conditional. > >> > >> But for now I don't care and experiment drafts :) > > > > Well, if any of the softirq/tick streamlining in RCU is risky, it needs > > to go under that same #ifdef! ;-) > > Sure, if something needs a reorg or a change even in the dyntick-hpc > off-case code, > I'll propose that to you, like I did for these two patches. > > But I'll try to keep the invasive changes that are only for > dyntick-hpc only under that config. Sounds good! 
> >> I'm quite close to something that seems to work well BTW, those series of > >> softirq were my latest problem as it made rcu_pending() returning 1 for > >> a good while and I coudn't stop the tick. After the 2nd patch it should be > >> fine now, I hope. > > > > By "couldn't stop the tick" you mean "the tick went on a few times more > > than you like" or do you really mean "couldn't ever stop the tick"? > > My guess is the former. If there were problems like the latter, Arjan > > would probably have beat me up about it long ago! > > It's more like couldn't ever stop the tick. But that doesn't concern mainline. > This is because I have a hook that prevents the tick from beeing stopped > until rcu_pending() == 0. That would certainly change behavior!!! Why did you need to do that? Ah, because force_quiescent_state() has not yet been taught about dyntick-HPC, got it... > In mainline it doesn't prevent the CPU from going nohz idle though, because > the softirq is armed from the tick. Once the softirq is processed, the CPU > can go to sleep. On the next timer tick it would again raise the softirq and > then could again go to sleep, etc.. You lost me on this one. If the CPU goes to sleep (AKA enters dyntick-idle mode, right?), then there wouldn't be a next timer tick, right? > I still have a trace of that, with my rcu_pending() hook, in > dyntick-hpc, that kept > returning 1 during at least 100 seconds and on each tick. > I did not go really further into this from my code as I immediately > switched to tip:master > to check if the problem came from my code or not. 
> And then I discovered that rcu_pending() indeed kept returning 1 for some while > in mainline (don't remember how much could be "some while" though), I > saw all these > spurious rcu softirqs at each tick caused by rcu_pending() and for > random time slices: > probably between a wake up from idle and the next grace period, if my > theory is right, and I > think that happened likely with bh flavour probably because it's > subject to fewer grace periods. > > And this is what the second patch fixes in mainline and that also > seems to fix my issue in > dyntick-hpc. > > Probably it happened more easily on dyntick-hpc as I was calling > rcu_pending() after > calling rcu_enter_nohz() (some buggy part of mine). OK, but that is why dyntick-idle is governed by rcu_needs_cpu() rather than rcu_pending(). But yes, need to upgrade force_quiescent_state(). One hacky way to do that would be to replace smp_send_reschedule() with an smp_call_function_single() that invoked something like the following on the target CPU:

	static void rcu_poke_cpu(void *info)
	{
		raise_softirq(RCU_SOFTIRQ);
	}

So rcu_implicit_offline_qs() does something like the following in place of the smp_send_reschedule():

	smp_call_function_single(rdp->cpu, rcu_poke_cpu, NULL, 0);

The call to set_need_resched() can remain as is. Of course, a mainline version would need to be a bit more discerning, but this should work just fine for your experimental use. This should allow you to revert back to rcu_needs_cpu(). Or am I missing something here? > >> > Longer term, __rcu_pending() for your HPC dynticks case > >> > could check for the current CPU having any callbacks before the call to > >> > check_cpu_stalls(), as in rcu_needs_cpu_quick_check(). That way, the > >> > CPUs with callbacks would drive the RCU core state machine. > >> > >> Ah, because currently every CPU calling rcu_pending() can be pulled into > >> handling the state machine because of this check, right?
> >> > >> /* Has an RCU GP gone long enough to send resched IPIs &c? */ > >> if (rcu_gp_in_progress(rsp) && > >> ULONG_CMP_LT(ACCESS_ONCE(rsp->jiffies_force_qs), jiffies)) { > >> rdp->n_rp_need_fqs++; > >> return 1; > >> > >> So the goal would be to let this job to CPUs that have callbacks, right? > > > > In the dynticks-HPC case, yes. > > > >> > We don't > >> > necessarily want this in the common case because it can increase > >> > grace-period latency > >> > >> Because it makes less CPUs to handle the grace period state machine? > > > > Not quite. Here I am not worried about the dyntick-HPC case, but rather > > the usual more-runnable-tasks-than-CPUs case. Here, CPUs are context > > switching frequently, so if they report their own quiescent states (even > > when they don't happen to have any callbacks queued) the grace period will > > complete more quickly. After all, force_quiescent_state() doesn't even > > get started until the third tick following the start of the grace period. > > Ah, I see what you mean. So you would suggest to even ignore those > explicit QS report when in dynticj-hpc mode for CPUs that don't have callbacks? > > Why not keeping them? My belief is that people needing dyntick-HPC are OK with RCU grace periods taking a few jiffies longer than they might otherwise. Besides, when you are running dyntick-HPC, you aren't context switching much, so keeping the tick doesn't buy you as much reduction in grace-period latency. > >> > but it could be very useful in the HPC-dyntick > >> > case -- eliminate any number of sources of spurious ticks with low > >> > risk to the high-priority RCU properties. There are some corresponding > >> > changes required on the force_quiescent_state() path, but these are > >> > reasonably straightforward. 
> >> > >> Ok, I may have a try at it (if I completely understand what you suggest :-) > >> But first I'll try to get that dyntick mode with the current state, which seems > >> to be enough for the basic desired functionality. > > > > If the two changes you have posted thus far are all you need, this does > > sound like a good starting point. > > For now yeah, it seems to work well. I may run into further surpises though :) Perish the thought!!! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
  2010-11-24 18:20         ` Paul E. McKenney
@ 2010-11-24 20:22           ` Paul E. McKenney
  2010-11-24 20:45             ` Frederic Weisbecker
  2010-11-24 22:42           ` Frederic Weisbecker
  1 sibling, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2010-11-24 20:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

On Wed, Nov 24, 2010 at 10:20:51AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote:
> > 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > > On Wed, Nov 24, 2010 at 04:45:11PM +0100, Frederic Weisbecker wrote:
> > >> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > >> > On Wed, Nov 24, 2010 at 02:48:46PM +0100, Frederic Weisbecker wrote:
> > >> CPU 1, the one that was idle :-D
> > >>
> > >> So CPU 1's rdp did catch up with the node and state for its completed
> > >> field, but not for its gpnum yet.
> > >
> > > OK, I will need to take a closer look at the rdp->gpnum setting.
> >
> > Ok, do you want me to resend the patch with the changelog changed
> > according to our discussion, or?
>
> Please!

I take it back.  I queued the following -- your code, but updated
comment and commit message.  Please let me know if I missed anything.

							Thanx, Paul

------------------------------------------------------------------------

commit 1d9d947bb882371a0877ba05207a0b996dcb38ee
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Wed Nov 24 01:31:12 2010 +0100

    rcu: Don't chase unnecessary quiescent states after extended grace periods

    When a CPU is in an extended quiescent state, including offline and
    dyntick-idle mode, other CPUs will detect the extended quiescent state
    and respond to the current grace period on that CPU's behalf.
    However, the locking design prevents those other CPUs from updating
    the first CPU's rcu_data state.

    Therefore, when this CPU exits its extended quiescent state, it must
    update its rcu_data state.  Because such a CPU will usually check for
    the completion of a prior grace period before checking for the start
    of a new grace period, the rcu_data ->completed field will be updated
    before the rcu_data ->gpnum field.  This means that if RCU is currently
    idle, the CPU will usually enter __note_new_gpnum() with ->completed
    set to the current grace-period number, but with ->gpnum set to some
    long-ago grace-period number.  Unfortunately, __note_new_gpnum() will
    then insist that the current CPU needlessly check for a new quiescent
    state.  This checking can result in this CPU needlessly taking several
    scheduling-clock interrupts.

    This bug is harmless in most cases, but is a problem for users concerned
    with OS jitter for HPC applications or concerned with battery lifetime
    for portable SMP embedded devices.  This commit therefore makes the
    test in __note_new_gpnum() check for this situation and avoid the
    needless quiescent-state checks.

    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5df948f..76cd5d2 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -616,8 +616,20 @@ static void __init check_cpu_stall_init(void)
 static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	if (rdp->gpnum != rnp->gpnum) {
-		rdp->qs_pending = 1;
-		rdp->passed_quiesc = 0;
+		/*
+		 * Because RCU checks for the prior grace period ending
+		 * before checking for a new grace period starting, it
+		 * is possible for rdp->gpnum to be set to the old grace
+		 * period and rdp->completed to be set to the new grace
+		 * period.  So don't bother checking for a quiescent state
+		 * for the rnp->gpnum grace period unless it really is
+		 * waiting for this CPU.
+		 */
+		if (rdp->completed != rnp->gpnum) {
+			rdp->qs_pending = 1;
+			rdp->passed_quiesc = 0;
+		}
+
 		rdp->gpnum = rnp->gpnum;
 	}
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
  2010-11-24 20:22           ` Paul E. McKenney
@ 2010-11-24 20:45             ` Frederic Weisbecker
  2010-11-24 21:19               ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2010-11-24 20:45 UTC (permalink / raw)
To: paulmck
Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> I take it back.  I queued the following -- your code, but updated
> comment and commit message.  Please let me know if I missed anything.
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 1d9d947bb882371a0877ba05207a0b996dcb38ee
> Author: Frederic Weisbecker <fweisbec@gmail.com>
> Date:   Wed Nov 24 01:31:12 2010 +0100
>
>     rcu: Don't chase unnecessary quiescent states after extended grace periods
>
>     When a CPU is in an extended quiescent state, including offline and
>     dyntick-idle mode, other CPUs will detect the extended quiescent
>     state and respond to the current grace period on that CPU's behalf.
>     However, the locking design prevents those other CPUs from updating
>     the first CPU's rcu_data state.
>
>     Therefore, when this CPU exits its extended quiescent state, it must
>     update its rcu_data state.  Because such a CPU will usually check for
>     the completion of a prior grace period before checking for the start
>     of a new grace period, the rcu_data ->completed field will be updated
>     before the rcu_data ->gpnum field.  This means that if RCU is
>     currently idle, the CPU will usually enter __note_new_gpnum() with
>     ->completed set to the current grace-period number, but with ->gpnum
>     set to some long-ago grace-period number.  Unfortunately,
>     __note_new_gpnum() will then insist that the current CPU needlessly
>     check for a new quiescent state.  This checking can result in this
>     CPU needlessly taking several scheduling-clock interrupts.

So I'm all ok with the commit and the updated comments. But just a doubt
about the above sentence.

The effect seems more that there will be one extra softirq, but not an
extra tick, because before sleeping the CPU will check rcu_needs_cpu(),
which doesn't check for the need of noting a quiescent state, IIRC...

And I think the softirq will only be raised on the next tick.

Hm?

>     This bug is harmless in most cases, but is a problem for users
>     concerned with OS jitter for HPC applications or concerned with
>     battery lifetime for portable SMP embedded devices.  This commit
>     therefore makes the test in __note_new_gpnum() check for this
>     situation and avoid the needless quiescent-state checks.
>
>     Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
>     Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
>     Cc: Ingo Molnar <mingo@elte.hu>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>     Cc: Steven Rostedt <rostedt@goodmis.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 5df948f..76cd5d2 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -616,8 +616,20 @@ static void __init check_cpu_stall_init(void)
>  static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
>  {
>  	if (rdp->gpnum != rnp->gpnum) {
> -		rdp->qs_pending = 1;
> -		rdp->passed_quiesc = 0;
> +		/*
> +		 * Because RCU checks for the prior grace period ending
> +		 * before checking for a new grace period starting, it
> +		 * is possible for rdp->gpnum to be set to the old grace
> +		 * period and rdp->completed to be set to the new grace
> +		 * period.  So don't bother checking for a quiescent state
> +		 * for the rnp->gpnum grace period unless it really is
> +		 * waiting for this CPU.
> +		 */
> +		if (rdp->completed != rnp->gpnum) {
> +			rdp->qs_pending = 1;
> +			rdp->passed_quiesc = 0;
> +		}
> +
>  		rdp->gpnum = rnp->gpnum;
>  	}
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
  2010-11-24 20:45             ` Frederic Weisbecker
@ 2010-11-24 21:19               ` Paul E. McKenney
  2010-11-24 21:50                 ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2010-11-24 21:19 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

On Wed, Nov 24, 2010 at 09:45:08PM +0100, Frederic Weisbecker wrote:
> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > I take it back.  I queued the following -- your code, but updated
> > comment and commit message.  Please let me know if I missed anything.
> >
> > 							Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 1d9d947bb882371a0877ba05207a0b996dcb38ee
> > Author: Frederic Weisbecker <fweisbec@gmail.com>
> > Date:   Wed Nov 24 01:31:12 2010 +0100
> >
> >     rcu: Don't chase unnecessary quiescent states after extended grace periods
> >
> >     When a CPU is in an extended quiescent state, including offline and
> >     dyntick-idle mode, other CPUs will detect the extended quiescent
> >     state and respond to the current grace period on that CPU's behalf.
> >     However, the locking design prevents those other CPUs from updating
> >     the first CPU's rcu_data state.
> >
> >     Therefore, when this CPU exits its extended quiescent state, it must
> >     update its rcu_data state.  Because such a CPU will usually check
> >     for the completion of a prior grace period before checking for the
> >     start of a new grace period, the rcu_data ->completed field will be
> >     updated before the rcu_data ->gpnum field.  This means that if RCU
> >     is currently idle, the CPU will usually enter __note_new_gpnum()
> >     with ->completed set to the current grace-period number, but with
> >     ->gpnum set to some long-ago grace-period number.  Unfortunately,
> >     __note_new_gpnum() will then insist that the current CPU needlessly
> >     check for a new quiescent state.  This checking can result in this
> >     CPU needlessly taking several scheduling-clock interrupts.
>
> So I'm all ok with the commit and the updated comments. But just a doubt
> about the above sentence.
>
> The effect seems more that there will be one extra softirq, but not an
> extra tick, because before sleeping the CPU will check rcu_needs_cpu(),
> which doesn't check for the need of noting a quiescent state, IIRC...
>
> And I think the softirq will only be raised on the next tick.
>
> Hm?

Good point!  This paragraph now reads:

	Therefore, when this CPU exits its extended quiescent state,
	it must update its rcu_data state.  Because such a CPU will
	usually check for the completion of a prior grace period
	before checking for the start of a new grace period, the
	rcu_data ->completed field will be updated before the rcu_data
	->gpnum field.  This means that if RCU is currently idle, the
	CPU will usually enter __note_new_gpnum() with ->completed set
	to the current grace-period number, but with ->gpnum set to some
	long-ago grace period number.  Unfortunately, __note_new_gpnum()
	will then insist that the current CPU needlessly check for a new
	quiescent state.  This checking can result in this CPU needlessly
	taking an additional softirq for unnecessary RCU processing.

Fair enough?

							Thanx, Paul

> >     This bug is harmless in most cases, but is a problem for users
> >     concerned with OS jitter for HPC applications or concerned with
> >     battery lifetime for portable SMP embedded devices.  This commit
> >     therefore makes the test in __note_new_gpnum() check for this
> >     situation and avoid the needless quiescent-state checks.
> >
> >     Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> >     Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >     Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> >     Cc: Ingo Molnar <mingo@elte.hu>
> >     Cc: Thomas Gleixner <tglx@linutronix.de>
> >     Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >     Cc: Steven Rostedt <rostedt@goodmis.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 5df948f..76cd5d2 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -616,8 +616,20 @@ static void __init check_cpu_stall_init(void)
> >  static void __note_new_gpnum(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_data *rdp)
> >  {
> >  	if (rdp->gpnum != rnp->gpnum) {
> > -		rdp->qs_pending = 1;
> > -		rdp->passed_quiesc = 0;
> > +		/*
> > +		 * Because RCU checks for the prior grace period ending
> > +		 * before checking for a new grace period starting, it
> > +		 * is possible for rdp->gpnum to be set to the old grace
> > +		 * period and rdp->completed to be set to the new grace
> > +		 * period.  So don't bother checking for a quiescent state
> > +		 * for the rnp->gpnum grace period unless it really is
> > +		 * waiting for this CPU.
> > +		 */
> > +		if (rdp->completed != rnp->gpnum) {
> > +			rdp->qs_pending = 1;
> > +			rdp->passed_quiesc = 0;
> > +		}
> > +
> >  		rdp->gpnum = rnp->gpnum;
> >  	}
> >  }

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
  2010-11-24 21:19               ` Paul E. McKenney
@ 2010-11-24 21:50                 ` Frederic Weisbecker
  0 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2010-11-24 21:50 UTC (permalink / raw)
To: paulmck
Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Wed, Nov 24, 2010 at 09:45:08PM +0100, Frederic Weisbecker wrote:
>> 2010/11/24 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
>> > I take it back.  I queued the following -- your code, but updated
>> > comment and commit message.  Please let me know if I missed anything.
>> >
>> > 							Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > commit 1d9d947bb882371a0877ba05207a0b996dcb38ee
>> > Author: Frederic Weisbecker <fweisbec@gmail.com>
>> > Date:   Wed Nov 24 01:31:12 2010 +0100
>> >
>> >     rcu: Don't chase unnecessary quiescent states after extended grace periods
>> >
>> >     When a CPU is in an extended quiescent state, including offline and
>> >     dyntick-idle mode, other CPUs will detect the extended quiescent
>> >     state and respond to the current grace period on that CPU's behalf.
>> >     However, the locking design prevents those other CPUs from updating
>> >     the first CPU's rcu_data state.
>> >
>> >     Therefore, when this CPU exits its extended quiescent state, it
>> >     must update its rcu_data state.  Because such a CPU will usually
>> >     check for the completion of a prior grace period before checking
>> >     for the start of a new grace period, the rcu_data ->completed field
>> >     will be updated before the rcu_data ->gpnum field.  This means that
>> >     if RCU is currently idle, the CPU will usually enter
>> >     __note_new_gpnum() with ->completed set to the current grace-period
>> >     number, but with ->gpnum set to some long-ago grace-period number.
>> >     Unfortunately, __note_new_gpnum() will then insist that the current
>> >     CPU needlessly check for a new quiescent state.  This checking can
>> >     result in this CPU needlessly taking several scheduling-clock
>> >     interrupts.
>>
>> So I'm all ok with the commit and the updated comments. But just a doubt
>> about the above sentence.
>>
>> The effect seems more that there will be one extra softirq, but not an
>> extra tick, because before sleeping the CPU will check rcu_needs_cpu(),
>> which doesn't check for the need of noting a quiescent state, IIRC...
>>
>> And I think the softirq will only be raised on the next tick.
>>
>> Hm?
>
> Good point!  This paragraph now reads:
>
> 	Therefore, when this CPU exits its extended quiescent state,
> 	it must update its rcu_data state.  Because such a CPU will
> 	usually check for the completion of a prior grace period
> 	before checking for the start of a new grace period, the
> 	rcu_data ->completed field will be updated before the rcu_data
> 	->gpnum field.  This means that if RCU is currently idle, the
> 	CPU will usually enter __note_new_gpnum() with ->completed set
> 	to the current grace-period number, but with ->gpnum set to some
> 	long-ago grace period number.  Unfortunately, __note_new_gpnum()
> 	will then insist that the current CPU needlessly check for a new
> 	quiescent state.  This checking can result in this CPU needlessly
> 	taking an additional softirq for unnecessary RCU processing.
>
> Fair enough?

Perfect! :)

Thanks a lot!

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
  2010-11-24 18:20         ` Paul E. McKenney
  2010-11-24 20:22           ` Paul E. McKenney
@ 2010-11-24 22:42           ` Frederic Weisbecker
  2010-11-25 14:56             ` Paul E. McKenney
  1 sibling, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2010-11-24 22:42 UTC (permalink / raw)
To: paulmck
Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

On Wed, 24 Nov 2010 10:20:51 -0800,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote:
> > Yeah. I mean, I need to read how the code manages the different
> > queues. But __rcu_process_gp_end() seems to sum it up quite well.
>
> For advancing callbacks, that is the one!  For invocation of
> callbacks, see rcu_do_batch().

Ok.

> > It's more like couldn't ever stop the tick. But that doesn't
> > concern mainline. This is because I have a hook that prevents the
> > tick from being stopped until rcu_pending() == 0.
>
> That would certainly change behavior!!!  Why did you need to do that?
>
> Ah, because force_quiescent_state() has not yet been taught about
> dyntick-HPC, got it...

Oh, actually I have taught it about that. For such an isolated CPU that
doesn't respond, it sends a specific IPI that will restart the tick if
we are not in nohz.

The point in restarting the tick is to find some quiescent states and
also to keep the tick for a little while for potential further grace
periods to complete while we are in the kernel.

This is why I use rcu_pending() from the tick: to check if we still
need the tick for rcu.

> > In mainline it doesn't prevent the CPU from going nohz idle though,
> > because the softirq is armed from the tick. Once the softirq is
> > processed, the CPU can go to sleep. On the next timer tick it would
> > again raise the softirq and then could again go to sleep, etc.
>
> You lost me on this one.  If the CPU goes to sleep (AKA enters
> dyntick-idle mode, right?), then there wouldn't be a next timer tick,
> right?

If there is a timer queued (timer_list or hrtimer), then the next timer
tick is programmed to fire for the next timer. Until then the CPU can
go to sleep and it will be woken up on that next timer interrupt.

> > I still have a trace of that, with my rcu_pending() hook in
> > dyntick-hpc, that kept returning 1 during at least 100 seconds, on
> > each tick. I did not go much further into this from my code as I
> > immediately switched to tip:master to check whether the problem
> > came from my code or not.
> > And then I discovered that rcu_pending() indeed kept returning 1
> > for some while in mainline (don't remember how long "some while"
> > could be, though). I saw all these spurious rcu softirqs on each
> > tick, caused by rcu_pending(), and for random time slices: probably
> > between a wake up from idle and the next grace period, if my theory
> > is right. And I think that happened most likely with the bh
> > flavour, probably because it is subject to fewer grace periods.
> >
> > And this is what the second patch fixes in mainline, and it also
> > seems to fix my issue in dyntick-hpc.
> >
> > It probably happened more easily on dyntick-hpc, as I was calling
> > rcu_pending() after calling rcu_enter_nohz() (some buggy part of
> > mine).
>
> OK, but that is why dyntick-idle is governed by rcu_needs_cpu() rather
> than rcu_pending().  But yes, need to upgrade force_quiescent_state().
>
> One hacky way to do that would be to replace smp_send_reschedule()
> with an smp_call_function_single() that invoked something like the
> following on the target CPU:
>
> 	static void rcu_poke_cpu(void *info)
> 	{
> 		raise_softirq(RCU_SOFTIRQ);
> 	}
>
> So rcu_implicit_offline_qs() does something like the following in
> place of the smp_send_reschedule():
>
> 	smp_call_function_single(rdp->cpu, rcu_poke_cpu, NULL, 0);
>
> The call to set_need_resched() can remain as is.
>
> Of course, a mainline version would need to be a bit more discerning,
> but this should work just fine for your experimental use.
>
> This should allow you to revert back to rcu_needs_cpu().
>
> Or am I missing something here?

So, as I explained above, I'm currently using such an alternate IPI. But
raising the softirq would only take care of:

 * checking if there is a new grace period (rearming rdp->qs_pending
   and so on)
 * taking care of callbacks

But it's not enough to track quiescent states. And we have no more timer
interrupts to track them. So we need to restart the tick at least until
we find a quiescent state for the grace period that is waiting for us.

But I may be missing something too :)

> > Ah, I see what you mean. So you would suggest to even ignore those
> > explicit QS reports in dyntick-hpc mode for CPUs that don't have
> > callbacks?
> >
> > Why not keep them?
>
> My belief is that people needing dyntick-HPC are OK with RCU grace
> periods taking a few jiffies longer than they might otherwise.
> Besides, when you are running dyntick-HPC, you aren't context
> switching much, so keeping the tick doesn't buy you as much reduction
> in grace-period latency.

But don't we still need the tick in such cases (if we aren't in
userspace) when a grace period starts, to note our quiescent states?
The rcu IPI itself doesn't seem to be sufficient for that.

I'm not sure I understand what you mean.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods
  2010-11-24 22:42           ` Frederic Weisbecker
@ 2010-11-25 14:56             ` Paul E. McKenney
  2010-11-26 14:06               ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2010-11-25 14:56 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

On Wed, Nov 24, 2010 at 11:42:57PM +0100, Frederic Weisbecker wrote:
> On Wed, 24 Nov 2010 10:20:51 -0800,
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote:
> > > Yeah. I mean, I need to read how the code manages the different
> > > queues. But __rcu_process_gp_end() seems to sum it up quite well.
> >
> > For advancing callbacks, that is the one!  For invocation of
> > callbacks, see rcu_do_batch().
>
> Ok.
>
> > > It's more like couldn't ever stop the tick. But that doesn't
> > > concern mainline. This is because I have a hook that prevents the
> > > tick from being stopped until rcu_pending() == 0.
> >
> > That would certainly change behavior!!!  Why did you need to do that?
> >
> > Ah, because force_quiescent_state() has not yet been taught about
> > dyntick-HPC, got it...
>
> Oh, actually I have taught it about that. For such an isolated CPU that
> doesn't respond, it sends a specific IPI that will restart the tick if
> we are not in nohz.
>
> The point in restarting the tick is to find some quiescent states and
> also to keep the tick for a little while for potential further grace
> periods to complete while we are in the kernel.
>
> This is why I use rcu_pending() from the tick: to check if we still
> need the tick for rcu.

So if you have received the IPI, then you turn on the tick.  As soon as
rcu_pending() returns false, you can turn off the tick.  Once the tick
is off, you can go back to using rcu_needs_cpu(), right?  (At least
until that CPU receives another RCU IPI.)

Either that or you need to check for both rcu_needs_cpu() -and-
rcu_pending().  Actually, you do need to check both during the time that
you have the tick on after receiving an RCU IPI.

> > > In mainline it doesn't prevent the CPU from going nohz idle though,
> > > because the softirq is armed from the tick. Once the softirq is
> > > processed, the CPU can go to sleep. On the next timer tick it would
> > > again raise the softirq and then could again go to sleep, etc.
> >
> > You lost me on this one.  If the CPU goes to sleep (AKA enters
> > dyntick-idle mode, right?), then there wouldn't be a next timer tick,
> > right?
>
> If there is a timer queued (timer_list or hrtimer), then the next timer
> tick is programmed to fire for the next timer. Until then the CPU can
> go to sleep and it will be woken up on that next timer interrupt.

OK.

> > > I still have a trace of that, with my rcu_pending() hook in
> > > dyntick-hpc, that kept returning 1 during at least 100 seconds, on
> > > each tick. I did not go much further into this from my code as I
> > > immediately switched to tip:master to check whether the problem
> > > came from my code or not.
> > > And then I discovered that rcu_pending() indeed kept returning 1
> > > for some while in mainline (don't remember how long "some while"
> > > could be, though). I saw all these spurious rcu softirqs on each
> > > tick, caused by rcu_pending(), and for random time slices: probably
> > > between a wake up from idle and the next grace period, if my theory
> > > is right. And I think that happened most likely with the bh
> > > flavour, probably because it is subject to fewer grace periods.
> > >
> > > And this is what the second patch fixes in mainline, and it also
> > > seems to fix my issue in dyntick-hpc.
> > >
> > > It probably happened more easily on dyntick-hpc, as I was calling
> > > rcu_pending() after calling rcu_enter_nohz() (some buggy part of
> > > mine).
> >
> > OK, but that is why dyntick-idle is governed by rcu_needs_cpu() rather
> > than rcu_pending().  But yes, need to upgrade force_quiescent_state().
> >
> > One hacky way to do that would be to replace smp_send_reschedule()
> > with an smp_call_function_single() that invoked something like the
> > following on the target CPU:
> >
> > 	static void rcu_poke_cpu(void *info)
> > 	{
> > 		raise_softirq(RCU_SOFTIRQ);
> > 	}
> >
> > So rcu_implicit_offline_qs() does something like the following in
> > place of the smp_send_reschedule():
> >
> > 	smp_call_function_single(rdp->cpu, rcu_poke_cpu, NULL, 0);
> >
> > The call to set_need_resched() can remain as is.
> >
> > Of course, a mainline version would need to be a bit more discerning,
> > but this should work just fine for your experimental use.
> >
> > This should allow you to revert back to rcu_needs_cpu().
> >
> > Or am I missing something here?
>
> So, as I explained above, I'm currently using such an alternate IPI. But
> raising the softirq would only take care of:
>
>  * checking if there is a new grace period (rearming rdp->qs_pending
>    and so on)
>  * taking care of callbacks
>
> But it's not enough to track quiescent states. And we have no more timer
> interrupts to track them. So we need to restart the tick at least until
> we find a quiescent state for the grace period that is waiting for us.
>
> But I may be missing something too :)

Well, I don't know whether or not you are missing it, but I do need to
get my act together and get RCU priority boosting ported from tiny RCU
to tree RCU.  Otherwise, force_quiescent_state() cannot do much for
preemptible RCU.

> > > Ah, I see what you mean. So you would suggest to even ignore those
> > > explicit QS reports in dyntick-hpc mode for CPUs that don't have
> > > callbacks?
> > >
> > > Why not keep them?
> >
> > My belief is that people needing dyntick-HPC are OK with RCU grace
> > periods taking a few jiffies longer than they might otherwise.
> > Besides, when you are running dyntick-HPC, you aren't context
> > switching much, so keeping the tick doesn't buy you as much reduction
> > in grace-period latency.
>
> But don't we still need the tick in such cases (if we aren't in
> userspace) when a grace period starts, to note our quiescent states?
> The rcu IPI itself doesn't seem to be sufficient for that.
>
> I'm not sure I understand what you mean.

If there is a grace period, there must be an RCU callback.  The CPU that
has the RCU callback queued will keep its own tick going, because
rcu_needs_cpu() will return true.

The reason for ignoring the explicit QS reports in at least some of the
cases is that paying attention to them requires that the tick be enabled.
Which is fine for most workloads, but not so good for workloads that care
about OS jitter.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-25 14:56 ` Paul E. McKenney @ 2010-11-26 14:06 ` Frederic Weisbecker 2010-11-29 23:06 ` Paul E. McKenney 0 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-26 14:06 UTC (permalink / raw) To: Paul E. McKenney Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Thu, Nov 25, 2010 at 06:56:32AM -0800, Paul E. McKenney wrote: > On Wed, Nov 24, 2010 at 11:42:57PM +0100, Frederic Weisbecker wrote: > > Le Wed, 24 Nov 2010 10:20:51 -0800, > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> a écrit : > > > > > On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote: > > > > Yeah. I mean, I need to read how the code manages the different > > > > queues. But __rcu_process_gp_end() seems to sum it up quite well. > > > > > > For advancing callbacks, that is the one! For invocation of > > > callbacks, see rcu_do_batch(). > > > > Ok. > > > > > > It's more like couldn't ever stop the tick. But that doesn't > > > > concern mainline. This is because I have a hook that prevents the > > > > tick from beeing stopped until rcu_pending() == 0. > > > > > > That would certainly change behavior!!! Why did you need to do that? > > > > > > Ah, because force_quiescent_state() has not yet been taught about > > > dyntick-HPC, got it... > > > > Oh actually I have taught it about that. For such isolated CPU that > > doesn't respond, it sends a specific IPI that will restart the tick if > > we are not in nohz. > > > > The point in restarting the tick is to find some quiescent states and > > also to keep the tick for a little while for potential further grace > > periods to complete while we are in the kernel. > > > > This is why I use rcu_pending() from the tick: to check if we still > > need the tick for rcu. > > So if you have received the IPI, then you turn on the tick. 
As soon as > rcu_pending() returns false, you can turn off the tick. Once the tick > is off, you can go back to using rcu_needs_cpu(), right? (At least > until that CPU receives another RCU IPI.) Yep. Although there is already a test for rcu_needs_cpu() in tick_nohz_stop_sched_tick() that cancels the switch to nohz mode. > Either that or you need to check for both rcu_needs_cpu() -and- > rcu_pending(). Actually, you do need to check both during the time that > you have the tick on after receiving an RCU IPI. Yeah. > > But it's not enough to track quiescent states. And we have no more timer > > interrupts to track them. So we need to restart the tick at least until > > we find a quiescent state for the grace period waiting for us. > > > > But I may be missing something :) > > Well, I don't know whether or not you are missing it, but I do need to > get my act together and get RCU priority boosting ported from tiny > RCU to tree RCU. Otherwise, force_quiescent_state() cannot do much > for preemptible RCU. And if I understood it correctly, RCU priority boosting involves a new kthread that handles the callbacks instead of a softirq? So that you can give dynamic priority to this thread, and so on? How will that help in the preemptible RCU case to force the quiescent state? Scheduling the kthread doesn't mean that every task that was preempted has been rescheduled and exited its rcu_read_unlock(), so I guess you have some other trickery planned :) > > > > Ah, I see what you mean. So you would suggest even ignoring those > > > > explicit QS reports when in dyntick-hpc mode for CPUs that don't > > > > have callbacks? > > > > > > > > Why not keep them? > > > > > > My belief is that people needing dyntick-HPC are OK with RCU grace > > > periods taking a few jiffies longer than they might otherwise. > > > Besides, when you are running dyntick-HPC, you aren't context > > > switching much, so keeping the tick doesn't buy you as much reduction > > > in grace-period latency.
> > > > But don't we still need the tick in such cases, (if we aren't in > > userspace) when a grace period starts, to note our grace periods? > > The rcu IPI itself doesn't seem to be sufficient for that. > > > > I'm not sure I understand what you mean. > > If there is a grace period, there must be an RCU callback. The CPU that > has the RCU callback queued will keep its own tick going, because > rcu_needs_cpu() will return true. > > The reason for ignoring the explicit QS reports in at least some of the > cases is that paying attention to them requires that the tick be enabled. > Which is fine for most workloads, but not so good for workloads that care > about OS jitter. Is there another solution than restarting the tick on a local CPU when we need a quiescent state from it? The simple IPI is not enough to ensure we find a decent grace period; in that case we need to restart the tick anyway if we are in the kernel, no? Enjoy your turkey(s) ;-) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods 2010-11-26 14:06 ` Frederic Weisbecker @ 2010-11-29 23:06 ` Paul E. McKenney 0 siblings, 0 replies; 27+ messages in thread From: Paul E. McKenney @ 2010-11-29 23:06 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Fri, Nov 26, 2010 at 03:06:43PM +0100, Frederic Weisbecker wrote: > On Thu, Nov 25, 2010 at 06:56:32AM -0800, Paul E. McKenney wrote: > > On Wed, Nov 24, 2010 at 11:42:57PM +0100, Frederic Weisbecker wrote: > > > Le Wed, 24 Nov 2010 10:20:51 -0800, > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> a écrit : > > > > > > > On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote: > > > > > Yeah. I mean, I need to read how the code manages the different > > > > > queues. But __rcu_process_gp_end() seems to sum it up quite well. > > > > > > > > For advancing callbacks, that is the one! For invocation of > > > > callbacks, see rcu_do_batch(). > > > > > > Ok. > > > > > > > > It's more like couldn't ever stop the tick. But that doesn't > > > > > concern mainline. This is because I have a hook that prevents the > > > > > tick from beeing stopped until rcu_pending() == 0. > > > > > > > > That would certainly change behavior!!! Why did you need to do that? > > > > > > > > Ah, because force_quiescent_state() has not yet been taught about > > > > dyntick-HPC, got it... > > > > > > Oh actually I have taught it about that. For such isolated CPU that > > > doesn't respond, it sends a specific IPI that will restart the tick if > > > we are not in nohz. > > > > > > The point in restarting the tick is to find some quiescent states and > > > also to keep the tick for a little while for potential further grace > > > periods to complete while we are in the kernel. > > > > > > This is why I use rcu_pending() from the tick: to check if we still > > > need the tick for rcu. 
> > > > So if you have received the IPI, then you turn on the tick. As soon as > > rcu_pending() returns false, you can turn off the tick. Once the tick > > is off, you can go back to using rcu_needs_cpu(), right? (At least > > until that CPU receives another RCU IPI.) > > Yep. Although there is already a test for rcu_needs_cpu() in > tick_nohz_stop_sched_tick() that cancels the switch to nohz mode. Sounds plausible. ;-) > > Either that or you need to check for both rcu_needs_cpu() -and- > > rcu_pending(). Actually, you do need to check both during the time that > > you have the tick on after receiving an RCU IPI. > > Yeah. Before I forget (again!) -- my thinking over the past few months has been in terms of making rcu_pending() simpler by permitting more false positives. This makes sense if the only penalty is a needless softirq. I am guessing that this sort of change would be a bad idea from your viewpoint, but thought I should check. > > > But it's not enough to track quiescent states. And we have no more timer > > > interrupts to track them. So we need to restart the tick at least until > > > we find a quiescent state for the grace period waiting for us. > > > > > > But I may be missing something either :) > > > > Well, I don't know whether or not you are missing it, but I do need to > > get my act together and get RCU priority boosting ported from tiny > > RCU to tree RCU. Otherwise, force_quiescent_state() cannot do much > > for preemptible RCU. > > And if I understood it correctly, RCU priority boosting involves a new > kthread that handles the callbacks instead of a softirq? So that you > can give dynamic priority to this thread and so? Yes, there will be a new kthread, and yes, because this allows its priority to be adjusted independently of non-RCU stuff. My current design (and code, in the case of Tiny RCU) runs this at a fixed real-time priority. 
This priority can be adjusted via the chrt command, if desired, though you can of course shoot yourself in the foot quite impressively if you do this carelessly. > How will that help in the preemptible RCU case to force the quiescent > state? Scheduling the kthread doesn't mean that every task that was > preempted has been rescheduled and exited its rcu_read_unlock(), so > I guess you have some other trickery planned :) There is one kthread per CPU, so that there can be lock-free interaction similar to that between mainline and the RCU softirq. ;-) But in the meantime, your approach is good for experimental purposes. And might well be the way it needs to happen longer term, for all I know. > > > > > Ah, I see what you mean. So you would suggest even ignoring those > > > > > explicit QS reports when in dyntick-hpc mode for CPUs that don't > > > > > have callbacks? > > > > > > > > > > Why not keep them? > > > > > > > > My belief is that people needing dyntick-HPC are OK with RCU grace > > > > periods taking a few jiffies longer than they might otherwise. > > > > Besides, when you are running dyntick-HPC, you aren't context > > > > switching much, so keeping the tick doesn't buy you as much reduction > > > > in grace-period latency. > > > > > > But don't we still need the tick in such cases, (if we aren't in > > > userspace) when a grace period starts, to note our grace periods? > > > The rcu IPI itself doesn't seem to be sufficient for that. > > > > > > I'm not sure I understand what you mean. > > > > If there is a grace period, there must be an RCU callback. The CPU that > > has the RCU callback queued will keep its own tick going, because > > rcu_needs_cpu() will return true. > > > > The reason for ignoring the explicit QS reports in at least some of the > > cases is that paying attention to them requires that the tick be enabled. > > Which is fine for most workloads, but not so good for workloads that care > > about OS jitter.
> > Is there another solution than restarting the tick on a local CPU when we need > a quiescent state from it? The simple IPI is not enough to ensure we find a > decent grace period; in that case we need to restart the tick anyway if we > are in the kernel, no? It depends on the flavor of RCU. RCU-sched and RCU-bh are taken care of if the IPI handler does a set_need_resched() followed by a raise_softirq(). Of course, perhaps set_need_resched() ends up restarting the tick, depending on the order of checks in your schedule() path. But it doesn't need to. RCU's force_quiescent_state() would resend the IPI periodically, which would force the CPU through its state machine. The set_need_resched() would force a quiescent state, and the IPIs following that would push RCU through the steps needed to note that the corresponding CPU was no longer blocking the grace period. Preemptible RCU works similarly, unless there is a low-priority task that is being indefinitely preempted in an RCU read-side critical section. In that case, the low-priority task is in need of priority boosting. Or am I still missing something? > Enjoy your turkey(s) ;-) I did, thank you! Only one turkey, though. You must be confusing me with my 30-years-ago self. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/2] rcu: Stop checking quiescent states after grace period completion from remote 2010-11-24 0:31 [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker 2010-11-24 0:31 ` [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods Frederic Weisbecker @ 2010-11-24 0:31 ` Frederic Weisbecker 2010-11-24 1:03 ` Paul E. McKenney 2010-11-25 3:42 ` [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Lai Jiangshan 2 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-24 0:31 UTC (permalink / raw) To: Paul E. McKenney Cc: LKML, Frederic Weisbecker, Paul E. McKenney, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt After a CPU starts to chase its quiescent states by setting rdp->qs_pending to 1, it can still enter an extended quiescent state, in which case another CPU will take care of this and complete the grace period if necessary. rcu_report_qs_rdp() currently doesn't handle this case well and assumes it must try again later to report its quiescent state. However, if the last grace period has been completed, there is nothing left to do for the current CPU. It means that until the next grace period starts, the CPU that runs into that case will keep chasing its own quiescent states by raising a softirq on every tick for no good reason. It can take a while before a new grace period starts, and in the meantime that time slice is covered by spurious softirqs and other kinds of RCU checks. Fix this by resetting rdp->qs_pending if the last grace period has been completed by a remote CPU while we were in an extended quiescent state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E.
McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> --- kernel/rcutree.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 5f038a1..f287eaa 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -937,6 +937,15 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long las * race occurred. */ rdp->passed_quiesc = 0; /* try again later! */ + + /* + * Another CPU may have taken care of us if we were in an + * extended quiescent state, in which case we don't need + * to continue to track anything. + */ + if (rnp->gpnum == rnp->completed) + rdp->qs_pending = 0; + raw_spin_unlock_irqrestore(&rnp->lock, flags); return; } -- 1.7.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 2/2] rcu: Stop checking quiescent states after grace period completion from remote 2010-11-24 0:31 ` [PATCH 2/2] rcu: Stop checking quiescent states after grace period completion from remote Frederic Weisbecker @ 2010-11-24 1:03 ` Paul E. McKenney 0 siblings, 0 replies; 27+ messages in thread From: Paul E. McKenney @ 2010-11-24 1:03 UTC (permalink / raw) To: Frederic Weisbecker Cc: LKML, Lai Jiangshan, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt On Wed, Nov 24, 2010 at 01:31:13AM +0100, Frederic Weisbecker wrote: > After a CPU starts to chase its quiescent states by setting > rdp->qs_pending to 1, it can still enter into an extended > quiescent state and then another CPU will take care of this > and complete the grace period if necessary. > > rcu_report_qs_rdp() currently doesn't handle well this case > and considers it must try later to notify its quiescent state. > > However if the last grace period has been completed there is > nothing left to do for the current CPU. > > It means that until a next grace period starts, the CPU that > runs into that case will keep chasing its own quiescent states > by raising a softirq on every tick for no good reason. > > This can take a while before a new grace period starts and > this time slice is covered by spurious softirqs and other > kinds of rcu checks. > > Fix this by resetting rdp->qs_pending if the last grace > period has been completed by a remote CPU while we were > in an extended quiescent state. This one looks very good, at least at first glance!!! Queued. Thanx, Paul > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > Cc: Lai Jiangshan <laijs@cn.fujitsu.com> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Steven Rostedt <rostedt@goodmis.org> > --- > kernel/rcutree.c | 9 +++++++++ > 1 files changed, 9 insertions(+), 0 deletions(-) > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index 5f038a1..f287eaa 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -937,6 +937,15 @@ rcu_report_qs_rdp(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long las > * race occurred. > */ > rdp->passed_quiesc = 0; /* try again later! */ > + > + /* > + * Another CPU may have taken care of us if we were in an > + * extended quiescent state, in which case we don't need > + * to continue to track anything. > + */ > + if (rnp->gpnum == rnp->completed) > + rdp->qs_pending = 0; > + > raw_spin_unlock_irqrestore(&rnp->lock, flags); > return; > } > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] rcu: Fix series of spurious RCU softirqs 2010-11-24 0:31 [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker 2010-11-24 0:31 ` [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods Frederic Weisbecker 2010-11-24 0:31 ` [PATCH 2/2] rcu: Stop checking quiescent states after grace period completion from remote Frederic Weisbecker @ 2010-11-25 3:42 ` Lai Jiangshan 2010-11-25 7:38 ` Frederic Weisbecker 2 siblings, 1 reply; 27+ messages in thread From: Lai Jiangshan @ 2010-11-25 3:42 UTC (permalink / raw) To: Frederic Weisbecker Cc: Paul E. McKenney, LKML, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Ingo Molnar On 11/24/2010 08:31 AM, Frederic Weisbecker wrote: > Hi, > > I've observed some not so infrequent series of spurious rcu > softirqs, sometimes happening at each tick for a random > while. > > These patches aim to fix them. > > Thanks. > > Frederic Weisbecker (2): > rcu: Don't chase unnecessary quiescent states after extended grace periods > rcu: Stop checking quiescent states after grace period completion from remote > If we ensure rdp->gpnum >= rdp->completed is always true, the problems you described will not exist. Or maybe I misunderstand you. rdp->gpnum >= rdp->completed is a very important guarantee I think. (In my RCURING, it is guaranteed.) I'm afraid there are some other problems still hidden if it is not guaranteed. so I recommend: (code is better than words) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index d5bc439..af4e87a 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -648,6 +648,13 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat /* Remember that we saw this grace-period completion.
*/ rdp->completed = rnp->completed; + + /* Ensure ->gpnum >= ->completed after NO_HZ */ + if (unlikely(rnp->completed - rdp->gpnum > 0 + || rdp->gpnum - rnp->gpnum > 0)) { + rdp->gpnum = rnp->completed; + rdp->qs_pending = 0; + } } } ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] rcu: Fix series of spurious RCU softirqs 2010-11-25 3:42 ` [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Lai Jiangshan @ 2010-11-25 7:38 ` Frederic Weisbecker 2010-11-25 8:35 ` Lai Jiangshan 0 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-25 7:38 UTC (permalink / raw) To: Lai Jiangshan Cc: Paul E. McKenney, LKML, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Ingo Molnar On Thu, Nov 25, 2010 at 11:42:34AM +0800, Lai Jiangshan wrote: > On 11/24/2010 08:31 AM, Frederic Weisbecker wrote: > > Hi, > > > > I've observed some not so unfrequent series of spurious rcu > > softirqs, sometimes happening at each ticks for a random > > while. > > > > These patches aims at fixing them. > > > > Thanks. > > > > Frederic Weisbecker (2): > > rcu: Don't chase unnecessary quiescent states after extended grace periods > > rcu: Stop checking quiescent states after grace period completion from remote > > > > If we ensure rdp->gpnum >= rdp->completed is always true, the problems as > you described will not be existed. Or maybe I misunderstand you. > > rdp->gpnum >= rdp->completed is a very important guarantee I think. > (In my RCURING, it is guaranteed.) I'm afraid there are some other > problems still hidden if it is not guaranteed. > > so I recommend: (code is better than words) > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index d5bc439..af4e87a 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -648,6 +648,13 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat > > /* Remember that we saw this grace-period completion. */ > rdp->completed = rnp->completed; > + > + /* Ensure ->gpnum >= ->completed after NO_HZ */ > + if (unlikely(rnp->completed - rdp->gpnum > 0 > + || rdp->gpnum - rnp->gpnum > 0)) { > + rdp->gpnum = rnp->completed; > + rdp->qs_pending = 0; That's an alternative to my first patch yeah. 
And if rdp->gpnum >= rdp->completed must be a guarantee outside the rnp lock, then it's certainly better because the lock is relaxed between rcu_process_gp_end() and note_new_gpnum(), and both values can be out of sync in this lockless window. But perhaps this shouldn't touch rdp->qs_pending: "if (rnp->completed > rdp->gpnum || rdp->gpnum > rnp->gpnum)" is not a guarantee that we don't need to find quiescent states. But rnp->completed == rnp->gpnum would provide that guarantee. That said, note_new_gpnum() would fix the value of rdp->qs_pending. Thanks. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] rcu: Fix series of spurious RCU softirqs 2010-11-25 7:38 ` Frederic Weisbecker @ 2010-11-25 8:35 ` Lai Jiangshan 2010-11-25 9:27 ` Frederic Weisbecker 0 siblings, 1 reply; 27+ messages in thread From: Lai Jiangshan @ 2010-11-25 8:35 UTC (permalink / raw) To: Frederic Weisbecker Cc: Paul E. McKenney, LKML, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Ingo Molnar On 11/25/2010 03:38 PM, Frederic Weisbecker wrote: > On Thu, Nov 25, 2010 at 11:42:34AM +0800, Lai Jiangshan wrote: >> On 11/24/2010 08:31 AM, Frederic Weisbecker wrote: >>> Hi, >>> >>> I've observed some not so unfrequent series of spurious rcu >>> softirqs, sometimes happening at each ticks for a random >>> while. >>> >>> These patches aims at fixing them. >>> >>> Thanks. >>> >>> Frederic Weisbecker (2): >>> rcu: Don't chase unnecessary quiescent states after extended grace periods >>> rcu: Stop checking quiescent states after grace period completion from remote >>> >> >> If we ensure rdp->gpnum >= rdp->completed is always true, the problems as >> you described will not be existed. Or maybe I misunderstand you. >> >> rdp->gpnum >= rdp->completed is a very important guarantee I think. >> (In my RCURING, it is guaranteed.) I'm afraid there are some other >> problems still hidden if it is not guaranteed. >> >> so I recommend: (code is better than words) >> >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c >> index d5bc439..af4e87a 100644 >> --- a/kernel/rcutree.c >> +++ b/kernel/rcutree.c >> @@ -648,6 +648,13 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat >> >> /* Remember that we saw this grace-period completion. */ >> rdp->completed = rnp->completed; >> + >> + /* Ensure ->gpnum >= ->completed after NO_HZ */ >> + if (unlikely(rnp->completed - rdp->gpnum > 0 >> + || rdp->gpnum - rnp->gpnum > 0)) { >> + rdp->gpnum = rnp->completed; >> + rdp->qs_pending = 0; > > > That's an alternative to my first patch yeah. 
Since rdp->gpnum >= rdp->completed is guaranteed, your second patch is not needed; the problem is also fixed. If rnp->gpnum == rnp->completed, rcu_report_qs_rdp() will not be called; that is because rdp->qs_pending == 0 when rnp->gpnum == rnp->completed. And if rdp->gpnum >= rdp->completed > must be a guarantee outside the rnp lock, then it's certainly better because > the lock is relaxed between rcu_process_gp_end() and note_new_gpnum(), and > both values are async in this lockless frame. > > But perhaps this shouldn't touch rdp->qs_pending: If rdp->gpnum == rnp->completed, it means we don't need a qs for rdp->gpnum; it is completed, so we must set rdp->qs_pending = 0; when we really need a qs, rdp->qs_pending will be fixed in note_new_gpnum(). > > "if (rnp->completed > rdp->gpnum || rdp->gpnum > rnp->gpnum)" is not > a guarantee that we don't need to find quiescent states. > > but rnp->completed == rnp->gpnum would provide that guarantee. > That said, note_new_gpnum() would fix the value of rdp->qs_pending. > > Thanks. > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] rcu: Fix series of spurious RCU softirqs 2010-11-25 8:35 ` Lai Jiangshan @ 2010-11-25 9:27 ` Frederic Weisbecker 2010-11-25 14:58 ` Paul E. McKenney 0 siblings, 1 reply; 27+ messages in thread From: Frederic Weisbecker @ 2010-11-25 9:27 UTC (permalink / raw) To: Lai Jiangshan Cc: Paul E. McKenney, LKML, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Ingo Molnar On Thu, Nov 25, 2010 at 04:35:30PM +0800, Lai Jiangshan wrote: > On 11/25/2010 03:38 PM, Frederic Weisbecker wrote: > > On Thu, Nov 25, 2010 at 11:42:34AM +0800, Lai Jiangshan wrote: > >> On 11/24/2010 08:31 AM, Frederic Weisbecker wrote: > >>> Hi, > >>> > >>> I've observed some not so unfrequent series of spurious rcu > >>> softirqs, sometimes happening at each ticks for a random > >>> while. > >>> > >>> These patches aims at fixing them. > >>> > >>> Thanks. > >>> > >>> Frederic Weisbecker (2): > >>> rcu: Don't chase unnecessary quiescent states after extended grace periods > >>> rcu: Stop checking quiescent states after grace period completion from remote > >>> > >> > >> If we ensure rdp->gpnum >= rdp->completed is always true, the problems as > >> you described will not be existed. Or maybe I misunderstand you. > >> > >> rdp->gpnum >= rdp->completed is a very important guarantee I think. > >> (In my RCURING, it is guaranteed.) I'm afraid there are some other > >> problems still hidden if it is not guaranteed. > >> > >> so I recommend: (code is better than words) > >> > >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c > >> index d5bc439..af4e87a 100644 > >> --- a/kernel/rcutree.c > >> +++ b/kernel/rcutree.c > >> @@ -648,6 +648,13 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat > >> > >> /* Remember that we saw this grace-period completion. 
*/ > >> rdp->completed = rnp->completed; > >> + > >> + /* Ensure ->gpnum >= ->completed after NO_HZ */ > >> + if (unlikely(rnp->completed - rdp->gpnum > 0 > >> + || rdp->gpnum - rnp->gpnum > 0)) { > >> + rdp->gpnum = rnp->completed; > >> + rdp->qs_pending = 0; > > > > > > That's an alternative to my first patch yeah. > > Since rdp->gpnum >= rdp->completed is guaranteed, > your second patch is not needed; the problem is also fixed. > > If rnp->gpnum == rnp->completed, rcu_report_qs_rdp() will not be called; > that is because rdp->qs_pending == 0 when rnp->gpnum == rnp->completed. Aaah... > > And if rdp->gpnum >= rdp->completed > > must be a guarantee outside the rnp lock, then it's certainly better because > > the lock is relaxed between rcu_process_gp_end() and note_new_gpnum(), and > > both values are async in this lockless frame. > > > > But perhaps this shouldn't touch rdp->qs_pending: > > If rdp->gpnum == rnp->completed, it means we don't need a qs for rdp->gpnum; > it is completed, so we must set rdp->qs_pending = 0; > > when we really need a qs, rdp->qs_pending will be fixed in note_new_gpnum(). OK, that all makes sense now! I'm just not sure about your check above. (rdp->gpnum - rnp->gpnum > 0) can never happen, right? Also perhaps we should set rdp->qs_pending = 0 only if rnp->completed == rdp->completed? Which in the end would be: /* Remember that we saw this grace-period completion. */ rdp->completed = rnp->completed; + if (rdp->gpnum < rdp->completed) + rdp->gpnum = rdp->completed; + + if (rdp->gpnum == rdp->completed) + rdp->qs_pending = 0; And then if there is a new grace period to handle, this will be done through note_new_gpnum(). Hm? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/2] rcu: Fix series of spurious RCU softirqs 2010-11-25 9:27 ` Frederic Weisbecker @ 2010-11-25 14:58 ` Paul E. McKenney 0 siblings, 0 replies; 27+ messages in thread From: Paul E. McKenney @ 2010-11-25 14:58 UTC (permalink / raw) To: Frederic Weisbecker Cc: Lai Jiangshan, LKML, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Ingo Molnar On Thu, Nov 25, 2010 at 10:27:08AM +0100, Frederic Weisbecker wrote: > On Thu, Nov 25, 2010 at 04:35:30PM +0800, Lai Jiangshan wrote: > > On 11/25/2010 03:38 PM, Frederic Weisbecker wrote: > > > On Thu, Nov 25, 2010 at 11:42:34AM +0800, Lai Jiangshan wrote: > > >> On 11/24/2010 08:31 AM, Frederic Weisbecker wrote: > > >>> Hi, > > >>> > > >>> I've observed some not so unfrequent series of spurious rcu > > >>> softirqs, sometimes happening at each ticks for a random > > >>> while. > > >>> > > >>> These patches aims at fixing them. > > >>> > > >>> Thanks. > > >>> > > >>> Frederic Weisbecker (2): > > >>> rcu: Don't chase unnecessary quiescent states after extended grace periods > > >>> rcu: Stop checking quiescent states after grace period completion from remote > > >>> > > >> > > >> If we ensure rdp->gpnum >= rdp->completed is always true, the problems as > > >> you described will not be existed. Or maybe I misunderstand you. > > >> > > >> rdp->gpnum >= rdp->completed is a very important guarantee I think. > > >> (In my RCURING, it is guaranteed.) I'm afraid there are some other > > >> problems still hidden if it is not guaranteed. > > >> > > >> so I recommend: (code is better than words) > > >> > > >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c > > >> index d5bc439..af4e87a 100644 > > >> --- a/kernel/rcutree.c > > >> +++ b/kernel/rcutree.c > > >> @@ -648,6 +648,13 @@ __rcu_process_gp_end(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat > > >> > > >> /* Remember that we saw this grace-period completion. 
*/ > > >> rdp->completed = rnp->completed; > > >> + > > >> + /* Ensure ->gpnum >= ->completed after NO_HZ */ > > >> + if (unlikely(rnp->completed - rdp->gpnum > 0 > > >> + || rdp->gpnum - rnp->gpnum > 0)) { > > >> + rdp->gpnum = rnp->completed; > > >> + rdp->qs_pending = 0; > > > > > > > > > That's an alternative to my first patch yeah. > > > > Since rdp->gpnum >= rdp->completed is guaranteed. > > your second patch is not needed, the problem is also fixed. > > > > if rnp->gpnum == rnp->completed, rcu_report_qs_rdp() will not be called. > > it is because rdp->qs_pending == 0 when rnp->gpnum == rnp->completed. > > > Aaah... > > > > > > And if rdp->gpnum >= rdp->completed > > > must be a guarantee outside the rnp lock, then it's certainly better because > > > the lock is relaxed between rcu_process_gp_end() and note_new_gpnum(), and > > > both values are async in this lockless frame. > > > > > > But perhaps this shouldn't touch rdp->qs_pending: > > > > if rdp->gpnum == rnp->completed, it means we don't need a qs for rdp->gpnum, > > it is completed. so we must set rdp->qs_pending = 0; > > > > when we really need a qs, rdp->qs_pending will be fixed in note_new_gp_new(). > > > Ok that makes all sense now! > > I'm just not sure about your check above. > > (rdp->gpnum - rnp->gpnum > 0) can never happen, right? > > Also perhaps we should set rdp->qs_pending = 0 only if > rnp->completed == rdp->completed? > > Which in the end would be: > > /* Remember that we saw this grace-period completion. */ > rdp->completed = rnp->completed; > > + if (rdp->gpnum < rdp->completed) > + rdp->gpnum = rdp->completed; > + > + if (rdp->gpnum == rdp->completed) > + rdp->qs_pending = 0; > > > And then if there is a new grace period to handle, this will > be done through note_new_pgnum(). > > Hm? Given that it is Thanksgiving holiday here in USA, I am going to give you guys a several days to come to agreement on this. Then I will inspect the resulting patch. 
;-) Thanx, Paul ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2010-11-29 23:07 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-24 0:31 [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Frederic Weisbecker 2010-11-24 0:31 ` [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods Frederic Weisbecker 2010-11-24 0:58 ` Paul E. McKenney 2010-11-24 2:29 ` Frederic Weisbecker 2010-11-24 2:33 ` Frederic Weisbecker 2010-11-24 6:22 ` Paul E. McKenney 2010-11-24 13:48 ` Frederic Weisbecker 2010-11-24 14:42 ` Paul E. McKenney 2010-11-24 15:45 ` Frederic Weisbecker 2010-11-24 16:15 ` Paul E. McKenney 2010-11-24 17:38 ` Frederic Weisbecker 2010-11-24 18:20 ` Paul E. McKenney 2010-11-24 20:22 ` Paul E. McKenney 2010-11-24 20:45 ` Frederic Weisbecker 2010-11-24 21:19 ` Paul E. McKenney 2010-11-24 21:50 ` Frederic Weisbecker 2010-11-24 22:42 ` Frederic Weisbecker 2010-11-25 14:56 ` Paul E. McKenney 2010-11-26 14:06 ` Frederic Weisbecker 2010-11-29 23:06 ` Paul E. McKenney 2010-11-24 0:31 ` [PATCH 2/2] rcu: Stop checking quiescent states after grace period completion from remote Frederic Weisbecker 2010-11-24 1:03 ` Paul E. McKenney 2010-11-25 3:42 ` [PATCH 0/2] rcu: Fix series of spurious RCU softirqs Lai Jiangshan 2010-11-25 7:38 ` Frederic Weisbecker 2010-11-25 8:35 ` Lai Jiangshan 2010-11-25 9:27 ` Frederic Weisbecker 2010-11-25 14:58 ` Paul E. McKenney