public inbox for linux-kernel@vger.kernel.org
* cond_resched() and RCU CPU stall warnings
@ 2014-03-16  1:59 Paul E. McKenney
  2014-03-16  6:09 ` Mike Galbraith
  2014-03-17 10:13 ` Peter Zijlstra
  0 siblings, 2 replies; 14+ messages in thread
From: Paul E. McKenney @ 2014-03-16  1:59 UTC (permalink / raw)
  To: peterz, mingo, josh, laijs; +Cc: linux-kernel

So I have been tightening up rcutorture a bit over the past year.
The other day, I came across what looked like a great opportunity for
further tightening, namely the schedule() in rcu_torture_reader().
Why not turn this into a cond_resched(), speeding up the readers a bit
and placing more stress on RCU?

And boy does it increase stress!

Unfortunately, this increased stress sometimes shows up in the form of
lots of RCU CPU stall warnings.  These can appear when an instance of
rcu_torture_reader() gets a CPU to itself, in which case it won't ever
enter the scheduler, and RCU will never see a quiescent state from that
CPU, which means the grace period never ends.

So I am taking a more measured approach to cond_resched() in
rcu_torture_reader() for the moment.

But longer term, should cond_resched() imply a set of RCU
quiescent states?  One way to do this would be to add calls to
rcu_note_context_switch() in each of the various cond_resched() functions.
Easy change, but of course adds some overhead.  On the other hand,
there might be more than a few of the 500+ calls to cond_resched() that
expect that RCU CPU stalls will be prevented (to say nothing of
might_sleep() and cond_resched_lock()).

Thoughts?

(Untested patch below, FWIW.)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131ef6aab..994d2b0fd0b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
 		__cond_resched();
 		return 1;
 	}
+	preempt_disable();
+	rcu_note_context_switch(smp_processor_id());
+	preempt_enable();
 	return 0;
 }
 EXPORT_SYMBOL(_cond_resched);


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: cond_resched() and RCU CPU stall warnings
  2014-03-16  1:59 cond_resched() and RCU CPU stall warnings Paul E. McKenney
@ 2014-03-16  6:09 ` Mike Galbraith
  2014-03-16  6:14   ` Mike Galbraith
  2014-03-16  6:25   ` Paul E. McKenney
  2014-03-17 10:13 ` Peter Zijlstra
  1 sibling, 2 replies; 14+ messages in thread
From: Mike Galbraith @ 2014-03-16  6:09 UTC (permalink / raw)
  To: paulmck; +Cc: peterz, mingo, josh, laijs, linux-kernel

On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote: 
> So I have been tightening up rcutorture a bit over the past year.
> The other day, I came across what looked like a great opportunity for
> further tightening, namely the schedule() in rcu_torture_reader().
> Why not turn this into a cond_resched(), speeding up the readers a bit
> and placing more stress on RCU?
> 
> And boy does it increase stress!
> 
> Unfortunately, this increased stress sometimes shows up in the form of
> lots of RCU CPU stall warnings.  These can appear when an instance of
> rcu_torture_reader() gets a CPU to itself, in which case it won't ever
> enter the scheduler, and RCU will never see a quiescent state from that
> CPU, which means the grace period never ends.
> 
> So I am taking a more measured approach to cond_resched() in
> rcu_torture_reader() for the moment.
> 
> But longer term, should cond_resched() imply a set of RCU
> quiescent states?  One way to do this would be to add calls to
> rcu_note_context_switch() in each of the various cond_resched() functions.
> Easy change, but of course adds some overhead.  On the other hand,
> there might be more than a few of the 500+ calls to cond_resched() that
> expect that RCU CPU stalls will be prevented (to say nothing of
> might_sleep() and cond_resched_lock()).
> 
> Thoughts?
> 
> (Untested patch below, FWIW.)
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b46131ef6aab..994d2b0fd0b2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
>  		__cond_resched();
>  		return 1;
>  	}
> +	preempt_disable();
> +	rcu_note_context_switch(smp_processor_id());
> +	preempt_enable();
>  	return 0;
>  }
>  EXPORT_SYMBOL(_cond_resched);

Hm.  Since you only care about the case where your task is solo, how
about doing racy checks?  100% accuracy isn't required, is it?  Seems you
wouldn't want to do that unconditionally in tight loops.

-Mike



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-16  6:09 ` Mike Galbraith
@ 2014-03-16  6:14   ` Mike Galbraith
  2014-03-16  6:27     ` Paul E. McKenney
  2014-03-16  6:25   ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2014-03-16  6:14 UTC (permalink / raw)
  To: paulmck; +Cc: peterz, mingo, josh, laijs, linux-kernel

On Sun, 2014-03-16 at 07:09 +0100, Mike Galbraith wrote: 
> On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote: 

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b46131ef6aab..994d2b0fd0b2 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
> >  		__cond_resched();
> >  		return 1;
> >  	}
> > +	preempt_disable();
> > +	rcu_note_context_switch(smp_processor_id());
> > +	preempt_enable();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(_cond_resched);
> 
> Hm.  Since you only care about the case where your task is solo, how
> about doing racy checks?  100% accuracy isn't required, is it?  Seems you
> wouldn't want to do that unconditionally in tight loops.

(or do that in scheduler_tick()?)



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-16  6:09 ` Mike Galbraith
  2014-03-16  6:14   ` Mike Galbraith
@ 2014-03-16  6:25   ` Paul E. McKenney
  2014-03-16  7:30     ` Mike Galbraith
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2014-03-16  6:25 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: peterz, mingo, josh, laijs, linux-kernel

On Sun, Mar 16, 2014 at 07:09:42AM +0100, Mike Galbraith wrote:
> On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote: 
> > So I have been tightening up rcutorture a bit over the past year.
> > The other day, I came across what looked like a great opportunity for
> > further tightening, namely the schedule() in rcu_torture_reader().
> > Why not turn this into a cond_resched(), speeding up the readers a bit
> > and placing more stress on RCU?
> > 
> > And boy does it increase stress!
> > 
> > Unfortunately, this increased stress sometimes shows up in the form of
> > lots of RCU CPU stall warnings.  These can appear when an instance of
> > rcu_torture_reader() gets a CPU to itself, in which case it won't ever
> > enter the scheduler, and RCU will never see a quiescent state from that
> > CPU, which means the grace period never ends.
> > 
> > So I am taking a more measured approach to cond_resched() in
> > rcu_torture_reader() for the moment.
> > 
> > But longer term, should cond_resched() imply a set of RCU
> > quiescent states?  One way to do this would be to add calls to
> > rcu_note_context_switch() in each of the various cond_resched() functions.
> > Easy change, but of course adds some overhead.  On the other hand,
> > there might be more than a few of the 500+ calls to cond_resched() that
> > expect that RCU CPU stalls will be prevented (to say nothing of
> > might_sleep() and cond_resched_lock()).
> > 
> > Thoughts?
> > 
> > (Untested patch below, FWIW.)
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b46131ef6aab..994d2b0fd0b2 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
> >  		__cond_resched();
> >  		return 1;
> >  	}
> > +	preempt_disable();
> > +	rcu_note_context_switch(smp_processor_id());
> > +	preempt_enable();
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(_cond_resched);
> 
> Hm.  Since you only care about the case where your task is solo, how
> about doing racy checks?  100% accuracy isn't required, is it?  Seems you
> wouldn't want to do that unconditionally in tight loops.

And indeed, my current workaround unconditionally does schedule() one
out of 256 loops.  I would do something similar here, perhaps based
on per-CPU counters, perhaps even with racy accesses to avoid always
doing preempt_disable()/preempt_enable().

Or did you have something else in mind?

							Thanx, Paul



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-16  6:14   ` Mike Galbraith
@ 2014-03-16  6:27     ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2014-03-16  6:27 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: peterz, mingo, josh, laijs, linux-kernel

On Sun, Mar 16, 2014 at 07:14:15AM +0100, Mike Galbraith wrote:
> On Sun, 2014-03-16 at 07:09 +0100, Mike Galbraith wrote: 
> > On Sat, 2014-03-15 at 18:59 -0700, Paul E. McKenney wrote: 
> 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index b46131ef6aab..994d2b0fd0b2 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -4075,6 +4075,9 @@ int __sched _cond_resched(void)
> > >  		__cond_resched();
> > >  		return 1;
> > >  	}
> > > +	preempt_disable();
> > > +	rcu_note_context_switch(smp_processor_id());
> > > +	preempt_enable();
> > >  	return 0;
> > >  }
> > >  EXPORT_SYMBOL(_cond_resched);
> > 
> > Hm.  Since you only care about the case where your task is solo, how
> > about doing racy checks?  100% accuracy isn't required, is it?  Seems you
> > wouldn't want to do that unconditionally in tight loops.
> 
> (or do that in scheduler_tick()?)

There are checks in scheduler_tick(), but they have to assume that
if they interrupt kernel execution, they might have also interrupted
an RCU read-side critical section.  And in fact, the point of having
cond_resched() (sometimes) do a quiescent state is to communicate with
the existing checks invoked from scheduler_tick().

							Thanx, Paul



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-16  6:25   ` Paul E. McKenney
@ 2014-03-16  7:30     ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2014-03-16  7:30 UTC (permalink / raw)
  To: paulmck; +Cc: peterz, mingo, josh, laijs, linux-kernel

On Sat, 2014-03-15 at 23:25 -0700, Paul E. McKenney wrote: 
> On Sun, Mar 16, 2014 at 07:09:42AM +0100, Mike Galbraith wrote:

> > Hm.  Since you only care about the case where your task is solo, how
> > about doing racy checks?  100% accuracy isn't required, is it?  Seems you
> > wouldn't want to do that unconditionally in tight loops.
> 
> And indeed, my current workaround unconditionally does schedule() one
> out of 256 loops.  I would do something similar here, perhaps based
> on per-CPU counters, perhaps even with racy accesses to avoid always
> doing preempt_disable()/preempt_enable().
> 
> Or did you have something else in mind?

Exactly what I meant, take a racy peek or two first.

-Mike



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-16  1:59 cond_resched() and RCU CPU stall warnings Paul E. McKenney
  2014-03-16  6:09 ` Mike Galbraith
@ 2014-03-17 10:13 ` Peter Zijlstra
  2014-03-17 16:58   ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-03-17 10:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: mingo, josh, laijs, linux-kernel

On Sat, Mar 15, 2014 at 06:59:14PM -0700, Paul E. McKenney wrote:
> So I have been tightening up rcutorture a bit over the past year.
> The other day, I came across what looked like a great opportunity for
> further tightening, namely the schedule() in rcu_torture_reader().
> Why not turn this into a cond_resched(), speeding up the readers a bit
> and placing more stress on RCU?
> 
> And boy does it increase stress!
> 
> Unfortunately, this increased stress sometimes shows up in the form of
> lots of RCU CPU stall warnings.  These can appear when an instance of
> rcu_torture_reader() gets a CPU to itself, in which case it won't ever
> enter the scheduler, and RCU will never see a quiescent state from that
> CPU, which means the grace period never ends.
> 
> So I am taking a more measured approach to cond_resched() in
> rcu_torture_reader() for the moment.
> 
> But longer term, should cond_resched() imply a set of RCU
> quiescent states?  One way to do this would be to add calls to
> rcu_note_context_switch() in each of the various cond_resched() functions.
> Easy change, but of course adds some overhead.  On the other hand,
> there might be more than a few of the 500+ calls to cond_resched() that
> expect that RCU CPU stalls will be prevented (to say nothing of
> might_sleep() and cond_resched_lock()).
> 
> Thoughts?

I share Mike's concern. Some of those functions might be too expensive
to do in the loops where we have the cond_resched()s. And while it's only
strictly required when nr_running==1, keying off of that seems
unfortunate in that it makes things behave differently with a single
running task.

I suppose your proposed per-cpu counter is the best option, even though
it's still an extra cacheline hit in cond_resched().

As to the other cond_resched() variants; they might be a little more
tricky, eg. cond_resched_lock() would have you drop the lock in order to
note the QS, etc.

So one thing that might make sense is to have something like
rcu_should_qs(), which would indicate RCU's need for the grace period to
end. Then we can augment the various should_resched()/spin_needbreak()
etc. with that condition.

That also gets rid of the counter (or at least hides it in the
implementation if RCU really can't do anything better).



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-17 10:13 ` Peter Zijlstra
@ 2014-03-17 16:58   ` Paul E. McKenney
  2014-03-17 17:14     ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2014-03-17 16:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, josh, laijs, linux-kernel

On Mon, Mar 17, 2014 at 11:13:00AM +0100, Peter Zijlstra wrote:
> On Sat, Mar 15, 2014 at 06:59:14PM -0700, Paul E. McKenney wrote:
> > So I have been tightening up rcutorture a bit over the past year.
> > The other day, I came across what looked like a great opportunity for
> > further tightening, namely the schedule() in rcu_torture_reader().
> > Why not turn this into a cond_resched(), speeding up the readers a bit
> > and placing more stress on RCU?
> > 
> > And boy does it increase stress!
> > 
> > Unfortunately, this increased stress sometimes shows up in the form of
> > lots of RCU CPU stall warnings.  These can appear when an instance of
> > rcu_torture_reader() gets a CPU to itself, in which case it won't ever
> > enter the scheduler, and RCU will never see a quiescent state from that
> > CPU, which means the grace period never ends.
> > 
> > So I am taking a more measured approach to cond_resched() in
> > rcu_torture_reader() for the moment.
> > 
> > But longer term, should cond_resched() imply a set of RCU
> > quiescent states?  One way to do this would be to add calls to
> > rcu_note_context_switch() in each of the various cond_resched() functions.
> > Easy change, but of course adds some overhead.  On the other hand,
> > there might be more than a few of the 500+ calls to cond_resched() that
> > expect that RCU CPU stalls will be prevented (to say nothing of
> > might_sleep() and cond_resched_lock()).
> > 
> > Thoughts?
> 
> I share Mike's concern. Some of those functions might be too expensive
> to do in the loops where we have the cond_resched()s. And while it's only
> strictly required when nr_running==1, keying off of that seems
> unfortunate in that it makes things behave differently with a single
> running task.
> 
> I suppose your proposed per-cpu counter is the best option, even though
> it's still an extra cacheline hit in cond_resched().
> 
> As to the other cond_resched() variants; they might be a little more
> tricky, eg. cond_resched_lock() would have you drop the lock in order to
> note the QS, etc.
> 
> So one thing that might make sense is to have something like
> rcu_should_qs(), which would indicate RCU's need for the grace period to
> end. Then we can augment the various should_resched()/spin_needbreak()
> etc. with that condition.
> 
> That also gets rid of the counter (or at least hides it in the
> implementation if RCU really can't do anything better).

I did code up a version having a per-CPU bitmask indicating
which flavors of RCU needed quiescent states, and only invoking
rcu_note_context_switch() if at least one of the flavors needed
a quiescent state.  This implementation ended up being more
complex, but worse, slowed down the fast path.  Hard to beat
__this_cpu_inc_return()...  Might be able to break even with a bit more
work, but on most real systems there is pretty much always a grace period
in flight anyway, so it does not appear to be worth it.

So how about the following?  Passes moderate rcutorture testing, no RCU
CPU stall warnings despite cond_resched() in rcu_torture_reader().

							Thanx, Paul

------------------------------------------------------------------------

sched,rcu: Make cond_resched() report RCU quiescent states

Given a CPU running a loop containing cond_resched(), with no
other tasks runnable on that CPU, RCU will eventually report RCU
CPU stall warnings due to lack of quiescent states.  Fortunately,
every call to cond_resched() is a perfectly good quiescent state.
Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
for cond_resched(), especially given the need to disable preemption,
and, for RCU-preempt, interrupts as well.

This commit therefore maintains a per-CPU counter that causes
cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
rcu_note_context_switch(), but only about once per 256 invocations.
This ratio was chosen in keeping with the relative time constants of
RCU grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3cea28c64ebe..8d64878111ea 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -44,6 +44,7 @@
 #include <linux/debugobjects.h>
 #include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/percpu.h>
 #include <asm/barrier.h>
 
 extern int rcu_expedited; /* for sysctl */
@@ -287,6 +288,41 @@ bool __rcu_is_watching(void);
 #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
 
 /*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
+DECLARE_PER_CPU(int, rcu_cond_resched_count);
+void rcu_resched(void);
+
+/*
+ * Is it time to report RCU quiescent states?
+ *
+ * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
+ * increment some random CPU's count, and possibly also load the result from
+ * yet another CPU's count.  We might even clobber some other CPU's attempt
+ * to zero its counter.  This is all OK because the goal is not precision,
+ * but rather reasonable amortization of rcu_note_context_switch() overhead
+ * and extremely high probability of avoiding RCU CPU stall warnings.
+ * Note that this function has to be preempted in just the wrong place,
+ * many thousands of times in a row, for anything bad to happen.
+ */
+static inline bool rcu_should_resched(void)
+{
+	return __this_cpu_inc_return(rcu_cond_resched_count) >=
+	       RCU_COND_RESCHED_LIM;
+}
+
+/*
+ * Report quiescent states to RCU if it is time to do so.
+ */
+static inline void rcu_cond_resched(void)
+{
+	if (unlikely(rcu_should_resched()))
+		rcu_resched();
+}
+
+/*
  * Infrastructure to implement the synchronize_() primitives in
  * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
  */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c0a9b0af469..30eb6bb52be6 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -337,4 +337,22 @@ static int __init check_cpu_stall_init(void)
 }
 early_initcall(check_cpu_stall_init);
 
+/*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+DEFINE_PER_CPU(int, rcu_cond_resched_count);
+
+/*
+ * Report a set of RCU quiescent states, for use by cond_resched()
+ * and friends.  Out of line due to being called infrequently.
+ */
+void rcu_resched(void)
+{
+	preempt_disable();
+	__this_cpu_write(rcu_cond_resched_count, 0);
+	rcu_note_context_switch(smp_processor_id());
+	preempt_enable();
+}
+
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131ef6aab..b5c942a5f7ae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4071,6 +4071,7 @@ static void __cond_resched(void)
 
 int __sched _cond_resched(void)
 {
+	rcu_cond_resched();
 	if (should_resched()) {
 		__cond_resched();
 		return 1;
@@ -4089,6 +4090,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int __cond_resched_lock(spinlock_t *lock)
 {
+	bool need_rcu_resched = rcu_should_resched();
 	int resched = should_resched();
 	int ret = 0;
 
@@ -4098,6 +4100,8 @@ int __cond_resched_lock(spinlock_t *lock)
 		spin_unlock(lock);
 		if (resched)
 			__cond_resched();
+		else if (unlikely(need_rcu_resched))
+			rcu_resched();
 		else
 			cpu_relax();
 		ret = 1;
@@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
 {
 	BUG_ON(!in_softirq());
 
+	rcu_cond_resched();
 	if (should_resched()) {
 		local_bh_enable();
 		__cond_resched();



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-17 16:58   ` Paul E. McKenney
@ 2014-03-17 17:14     ` Peter Zijlstra
  2014-03-18  2:17       ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-03-17 17:14 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: mingo, josh, laijs, linux-kernel

On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote:
> @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
>  {
>  	BUG_ON(!in_softirq());
>  
> +	rcu_cond_resched();

Don't you have to enable BHs before doing that? And if not; that needs a
comment for sure! :-)

>  	if (should_resched()) {
>  		local_bh_enable();
>  		__cond_resched();
> 


* Re: cond_resched() and RCU CPU stall warnings
  2014-03-17 17:14     ` Peter Zijlstra
@ 2014-03-18  2:17       ` Paul E. McKenney
  2014-03-18  8:51         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2014-03-18  2:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, josh, laijs, linux-kernel

On Mon, Mar 17, 2014 at 06:14:34PM +0100, Peter Zijlstra wrote:
> On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote:
> > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
> >  {
> >  	BUG_ON(!in_softirq());
> >  
> > +	rcu_cond_resched();
> 
> Don't you have to enable BHs before doing that? And if not; that needs a
> comment for sure! :-)

No need to enable BHs, just RCU marking quiescent states.  But yes,
the name does look a bit scary in that context, doesn't it?  Added
a comment, please see below for updated patch.

> >  	if (should_resched()) {
> >  		local_bh_enable();
> >  		__cond_resched();
> > 

------------------------------------------------------------------------

torture: Better summary diagnostics for build failures

The reaction of kvm-recheck.sh to a build failure is obscure at best,
and easy to miss completely.  This commit therefore prints "BUG: Build
failed" in the summary at the end of a run.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh
index 829186e19eb1..7f1ff1a8fc4b 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh
@@ -35,7 +35,7 @@ configfile=`echo $i | sed -e 's/^.*\///'`
 ncs=`grep "Writes:  Total:" $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* Total: //' -e 's/ .*$//'`
 if test -z "$ncs"
 then
-	echo $configfile
+	echo "$configfile -------"
 else
 	title="$configfile ------- $ncs acquisitions/releases"
 	dur=`sed -e 's/^.* locktorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null`
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
index d75b1dc5ae53..307c4b95f325 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh
@@ -35,7 +35,7 @@ configfile=`echo $i | sed -e 's/^.*\///'`
 ngps=`grep ver: $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* ver: //' -e 's/ .*$//'`
 if test -z "$ngps"
 then
-	echo $configfile
+	echo "$configfile -------"
 else
 	title="$configfile ------- $ngps grace periods"
 	dur=`sed -e 's/^.* rcutorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null`
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
index 26d78b7eaccf..ee1f6cae3d70 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh
@@ -25,6 +25,7 @@
 # Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
 
 PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH
+. tools/testing/selftests/rcutorture/bin/functions.sh
 for rd in "$@"
 do
 	firsttime=1
@@ -39,13 +40,24 @@ do
 		fi
 		TORTURE_SUITE="`cat $i/../TORTURE_SUITE`"
 		kvm-recheck-${TORTURE_SUITE}.sh $i
-		configcheck.sh $i/.config $i/ConfigFragment
-		parse-build.sh $i/Make.out $configfile
-		parse-torture.sh $i/console.log $configfile
-		parse-console.sh $i/console.log $configfile
-		if test -r $i/Warnings
+		if test -f "$i/console.log"
 		then
-			cat $i/Warnings
+			configcheck.sh $i/.config $i/ConfigFragment
+			parse-build.sh $i/Make.out $configfile
+			parse-torture.sh $i/console.log $configfile
+			parse-console.sh $i/console.log $configfile
+			if test -r $i/Warnings
+			then
+				cat $i/Warnings
+			fi
+		else
+			if test -f "$i/qemu-cmd"
+			then
+				print_bug qemu failed
+			else
+				print_bug Build failed
+			fi
+			echo "   $i"
 		fi
 	done
 done
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index 7a95f86cc85a..51c34a91a375 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -106,6 +106,7 @@ then
 	fi
 else
 	cp $builddir/Make*.out $resdir
+	cp $builddir/.config $resdir || :
 	echo Build failed, not running KVM, see $resdir.
 	if test -f $builddir.wait
 	then



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-18  2:17       ` Paul E. McKenney
@ 2014-03-18  8:51         ` Peter Zijlstra
  2014-03-18 12:49           ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-03-18  8:51 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: mingo, josh, laijs, linux-kernel

On Mon, Mar 17, 2014 at 07:17:06PM -0700, Paul E. McKenney wrote:
> On Mon, Mar 17, 2014 at 06:14:34PM +0100, Peter Zijlstra wrote:
> > On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote:
> > > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
> > >  {
> > >  	BUG_ON(!in_softirq());
> > >  
> > > +	rcu_cond_resched();
> > 
> > Don't you have to enable BHs before doing that? And if not; that needs a
> > comment for sure! :-)
> 
> No need to enable BHs, just RCU marking quiescent states.  But yes,
> the name does look a bit scary in that context, doesn't it?  Added
> a comment, please see below for updated patch.

The below seemed like an unrelated patch entirely :-)

> ------------------------------------------------------------------------
> 
> torture: Better summary diagnostics for build failures
> 
> The reaction of kvm-recheck.sh to a build failure is obscure at best,
> and easy to miss completely.  This commit therefore prints "BUG: Build
> failed" in the summary at the end of a run.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


* Re: cond_resched() and RCU CPU stall warnings
  2014-03-18  8:51         ` Peter Zijlstra
@ 2014-03-18 12:49           ` Paul E. McKenney
  2014-03-18 13:45             ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2014-03-18 12:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, josh, laijs, linux-kernel

On Tue, Mar 18, 2014 at 09:51:18AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 17, 2014 at 07:17:06PM -0700, Paul E. McKenney wrote:
> > On Mon, Mar 17, 2014 at 06:14:34PM +0100, Peter Zijlstra wrote:
> > > On Mon, Mar 17, 2014 at 09:58:07AM -0700, Paul E. McKenney wrote:
> > > > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
> > > >  {
> > > >  	BUG_ON(!in_softirq());
> > > >  
> > > > +	rcu_cond_resched();
> > > 
> > > Don't you have to enable BHs before doing that? And if not; that needs a
> > > comment for sure! :-)
> > 
> > No need to enable BHs, just RCU marking quiescent states.  But yes,
> > the name does look a bit scary in that context, doesn't it?  Added
> > a comment, please see below for updated patch.
> 
> The below seemed like an unrelated patch entirely :-)

Well, it certainly avoids issues with BH...

> > ------------------------------------------------------------------------
> > 
> > torture: Better summary diagnostics for build failures
> > 
> > The reaction of kvm-recheck.sh to a build failure is obscure at best,
> > and easy to miss completely.  This commit therefore prints "BUG: Build
> > failed" in the summary at the end of a run.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

How about this one instead?  ;-)

							Thanx, Paul

------------------------------------------------------------------------

sched,rcu: Make cond_resched() report RCU quiescent states

Given a CPU running a loop containing cond_resched(), with no
other tasks runnable on that CPU, RCU will eventually report RCU
CPU stall warnings due to lack of quiescent states.  Fortunately,
every call to cond_resched() is a perfectly good quiescent state.
Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
for cond_resched(), especially given the need to disable preemption,
and, for RCU-preempt, interrupts as well.

This commit therefore maintains a per-CPU counter that causes
cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
rcu_note_context_switch(), but only about once per 256 invocations.
This ratio was chosen in keeping with the relative time constants of
RCU grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3cea28c64ebe..8d64878111ea 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -44,6 +44,7 @@
 #include <linux/debugobjects.h>
 #include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/percpu.h>
 #include <asm/barrier.h>
 
 extern int rcu_expedited; /* for sysctl */
@@ -287,6 +288,41 @@ bool __rcu_is_watching(void);
 #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
 
 /*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
+DECLARE_PER_CPU(int, rcu_cond_resched_count);
+void rcu_resched(void);
+
+/*
+ * Is it time to report RCU quiescent states?
+ *
+ * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
+ * increment some random CPU's count, and possibly also load the result from
+ * yet another CPU's count.  We might even clobber some other CPU's attempt
+ * to zero its counter.  This is all OK because the goal is not precision,
+ * but rather reasonable amortization of rcu_note_context_switch() overhead
+ * and extremely high probability of avoiding RCU CPU stall warnings.
+ * Note that this function has to be preempted in just the wrong place,
+ * many thousands of times in a row, for anything bad to happen.
+ */
+static inline bool rcu_should_resched(void)
+{
+	return __this_cpu_inc_return(rcu_cond_resched_count) >=
+	       RCU_COND_RESCHED_LIM;
+}
+
+/*
+ * Report quiescent states to RCU if it is time to do so.
+ */
+static inline void rcu_cond_resched(void)
+{
+	if (unlikely(rcu_should_resched()))
+		rcu_resched();
+}
+
+/*
  * Infrastructure to implement the synchronize_() primitives in
  * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
  */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c0a9b0af469..ed7a0d72562c 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -338,3 +338,21 @@ static int __init check_cpu_stall_init(void)
 early_initcall(check_cpu_stall_init);
 
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
+
+/*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+DEFINE_PER_CPU(int, rcu_cond_resched_count);
+
+/*
+ * Report a set of RCU quiescent states, for use by cond_resched()
+ * and friends.  Out of line due to being called infrequently.
+ */
+void rcu_resched(void)
+{
+	preempt_disable();
+	__this_cpu_write(rcu_cond_resched_count, 0);
+	rcu_note_context_switch(smp_processor_id());
+	preempt_enable();
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131ef6aab..1f53e8871dd2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4071,6 +4071,7 @@ static void __cond_resched(void)
 
 int __sched _cond_resched(void)
 {
+	rcu_cond_resched();
 	if (should_resched()) {
 		__cond_resched();
 		return 1;
@@ -4089,6 +4090,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int __cond_resched_lock(spinlock_t *lock)
 {
+	bool need_rcu_resched = rcu_should_resched();
 	int resched = should_resched();
 	int ret = 0;
 
@@ -4098,6 +4100,8 @@ int __cond_resched_lock(spinlock_t *lock)
 		spin_unlock(lock);
 		if (resched)
 			__cond_resched();
+		else if (unlikely(need_rcu_resched))
+			rcu_resched();
 		else
 			cpu_relax();
 		ret = 1;
@@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
 {
 	BUG_ON(!in_softirq());
 
+	rcu_cond_resched();  /* BH disabled OK, just recording QSes. */
 	if (should_resched()) {
 		local_bh_enable();
 		__cond_resched();



* Re: cond_resched() and RCU CPU stall warnings
  2014-03-18 12:49           ` Paul E. McKenney
@ 2014-03-18 13:45             ` Peter Zijlstra
  2014-03-18 15:15               ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-03-18 13:45 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: mingo, josh, laijs, linux-kernel

> sched,rcu: Make cond_resched() report RCU quiescent states
> 
> Given a CPU running a loop containing cond_resched(), with no
> other tasks runnable on that CPU, RCU will eventually report RCU
> CPU stall warnings due to lack of quiescent states.  Fortunately,
> every call to cond_resched() is a perfectly good quiescent state.
> Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
> for cond_resched(), especially given the need to disable preemption,
> and, for RCU-preempt, interrupts as well.
> 
> This commit therefore maintains a per-CPU counter that causes
> cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
> rcu_note_context_switch(), but only about once per 256 invocations.
> This ratio was chosen in keeping with the relative time constants of
> RCU grace periods.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 3cea28c64ebe..8d64878111ea 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -44,6 +44,7 @@
>  #include <linux/debugobjects.h>
>  #include <linux/bug.h>
>  #include <linux/compiler.h>
> +#include <linux/percpu.h>
>  #include <asm/barrier.h>
>  
>  extern int rcu_expedited; /* for sysctl */
> @@ -287,6 +288,41 @@ bool __rcu_is_watching(void);
>  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
>  
>  /*
> + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> + */
> +
> +#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
> +DECLARE_PER_CPU(int, rcu_cond_resched_count);
> +void rcu_resched(void);
> +
> +/*
> + * Is it time to report RCU quiescent states?
> + *
> + * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
> + * increment some random CPU's count, and possibly also load the result from
> + * yet another CPU's count.  We might even clobber some other CPU's attempt
> + * to zero its counter.  This is all OK because the goal is not precision,
> + * but rather reasonable amortization of rcu_note_context_switch() overhead
> + * and extremely high probability of avoiding RCU CPU stall warnings.
> + * Note that this function has to be preempted in just the wrong place,
> + * many thousands of times in a row, for anything bad to happen.
> + */
> +static inline bool rcu_should_resched(void)
> +{
> +	return __this_cpu_inc_return(rcu_cond_resched_count) >=
> +	       RCU_COND_RESCHED_LIM;
> +}
> +
> +/*
> > + * Report quiescent states to RCU if it is time to do so.
> + */
> +static inline void rcu_cond_resched(void)
> +{
> +	if (unlikely(rcu_should_resched()))
> +		rcu_resched();
> +}
> +
> +/*
>   * Infrastructure to implement the synchronize_() primitives in
>   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
>   */
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 4c0a9b0af469..ed7a0d72562c 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -338,3 +338,21 @@ static int __init check_cpu_stall_init(void)
>  early_initcall(check_cpu_stall_init);
>  
>  #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
> +
> +/*
> + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> + */
> +
> +DEFINE_PER_CPU(int, rcu_cond_resched_count);
> +
> +/*
> + * Report a set of RCU quiescent states, for use by cond_resched()
> + * and friends.  Out of line due to being called infrequently.
> + */
> +void rcu_resched(void)
> +{
> +	preempt_disable();
> +	__this_cpu_write(rcu_cond_resched_count, 0);

Somehow I've got the urge to write:

  __this_cpu_add(rcu_cond_resched_count, -RCU_COND_RESCHED_LIM);

But the immediate write is safer, and I suppose it doesn't matter at
all.

> +	rcu_note_context_switch(smp_processor_id());
> +	preempt_enable();
> +}
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b46131ef6aab..1f53e8871dd2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4071,6 +4071,7 @@ static void __cond_resched(void)
>  
>  int __sched _cond_resched(void)
>  {
> +	rcu_cond_resched();
>  	if (should_resched()) {
>  		__cond_resched();
>  		return 1;
> @@ -4089,6 +4090,7 @@ EXPORT_SYMBOL(_cond_resched);
>   */
>  int __cond_resched_lock(spinlock_t *lock)
>  {
> +	bool need_rcu_resched = rcu_should_resched();
>  	int resched = should_resched();
>  	int ret = 0;
>  

I suppose you also need:

-	if (spin_needbreak(lock) || resched)
+	if (spin_needbreak(lock) || resched || need_rcu_resched)

> @@ -4098,6 +4100,8 @@ int __cond_resched_lock(spinlock_t *lock)
>  		spin_unlock(lock);
>  		if (resched)
>  			__cond_resched();
> +		else if (unlikely(need_rcu_resched))
> +			rcu_resched();
>  		else
>  			cpu_relax();
>  		ret = 1;
> @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
>  {
>  	BUG_ON(!in_softirq());
>  
> +	rcu_cond_resched();  /* BH disabled OK, just recording QSes. */
>  	if (should_resched()) {
>  		local_bh_enable();
>  		__cond_resched();

At which point I'm wondering; do you also want __cond_resched_softirq()
to advance rcu_bh? I'm forever forgetting the exact rcu_bh semantics
though.


* Re: cond_resched() and RCU CPU stall warnings
  2014-03-18 13:45             ` Peter Zijlstra
@ 2014-03-18 15:15               ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2014-03-18 15:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, josh, laijs, linux-kernel

On Tue, Mar 18, 2014 at 02:45:34PM +0100, Peter Zijlstra wrote:
> > sched,rcu: Make cond_resched() report RCU quiescent states
> > 
> > Given a CPU running a loop containing cond_resched(), with no
> > other tasks runnable on that CPU, RCU will eventually report RCU
> > CPU stall warnings due to lack of quiescent states.  Fortunately,
> > every call to cond_resched() is a perfectly good quiescent state.
> > Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
> > for cond_resched(), especially given the need to disable preemption,
> > and, for RCU-preempt, interrupts as well.
> > 
> > This commit therefore maintains a per-CPU counter that causes
> > cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
> > rcu_note_context_switch(), but only about once per 256 invocations.
> > This ratio was chosen in keeping with the relative time constants of
> > RCU grace periods.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 3cea28c64ebe..8d64878111ea 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -44,6 +44,7 @@
> >  #include <linux/debugobjects.h>
> >  #include <linux/bug.h>
> >  #include <linux/compiler.h>
> > +#include <linux/percpu.h>
> >  #include <asm/barrier.h>
> >  
> >  extern int rcu_expedited; /* for sysctl */
> > @@ -287,6 +288,41 @@ bool __rcu_is_watching(void);
> >  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
> >  
> >  /*
> > + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> > + */
> > +
> > +#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
> > +DECLARE_PER_CPU(int, rcu_cond_resched_count);
> > +void rcu_resched(void);
> > +
> > +/*
> > + * Is it time to report RCU quiescent states?
> > + *
> > + * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
> > + * increment some random CPU's count, and possibly also load the result from
> > + * yet another CPU's count.  We might even clobber some other CPU's attempt
> > + * to zero its counter.  This is all OK because the goal is not precision,
> > + * but rather reasonable amortization of rcu_note_context_switch() overhead
> > + * and extremely high probability of avoiding RCU CPU stall warnings.
> > + * Note that this function has to be preempted in just the wrong place,
> > + * many thousands of times in a row, for anything bad to happen.
> > + */
> > +static inline bool rcu_should_resched(void)
> > +{
> > +	return __this_cpu_inc_return(rcu_cond_resched_count) >=
> > +	       RCU_COND_RESCHED_LIM;
> > +}
> > +
> > +/*
> > > + * Report quiescent states to RCU if it is time to do so.
> > + */
> > +static inline void rcu_cond_resched(void)
> > +{
> > +	if (unlikely(rcu_should_resched()))
> > +		rcu_resched();
> > +}
> > +
> > +/*
> >   * Infrastructure to implement the synchronize_() primitives in
> >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> >   */
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 4c0a9b0af469..ed7a0d72562c 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -338,3 +338,21 @@ static int __init check_cpu_stall_init(void)
> >  early_initcall(check_cpu_stall_init);
> >  
> >  #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
> > +
> > +/*
> > + * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> > + */
> > +
> > +DEFINE_PER_CPU(int, rcu_cond_resched_count);
> > +
> > +/*
> > + * Report a set of RCU quiescent states, for use by cond_resched()
> > + * and friends.  Out of line due to being called infrequently.
> > + */
> > +void rcu_resched(void)
> > +{
> > +	preempt_disable();
> > +	__this_cpu_write(rcu_cond_resched_count, 0);
> 
> Somehow I've got the urge to write:
> 
>   __this_cpu_add(rcu_cond_resched_count, -RCU_COND_RESCHED_LIM);
> 
> But the immediate write is safer, and I suppose it doesn't matter at
> all.

My concern is that an unfortunate series of preemptions might cause a
victim CPU's counter to be driven to a large negative number, which
could then result in RCU CPU stall warnings given a subsequent time
with no schedules.  In contrast, the series of preemptions that result
in large positive numbers result in context switches during that time,
which keeps the stall warnings at bay.

> > +	rcu_note_context_switch(smp_processor_id());
> > +	preempt_enable();
> > +}
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b46131ef6aab..1f53e8871dd2 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4071,6 +4071,7 @@ static void __cond_resched(void)
> >  
> >  int __sched _cond_resched(void)
> >  {
> > +	rcu_cond_resched();
> >  	if (should_resched()) {
> >  		__cond_resched();
> >  		return 1;
> > @@ -4089,6 +4090,7 @@ EXPORT_SYMBOL(_cond_resched);
> >   */
> >  int __cond_resched_lock(spinlock_t *lock)
> >  {
> > +	bool need_rcu_resched = rcu_should_resched();
> >  	int resched = should_resched();
> >  	int ret = 0;
> >  
> 
> I suppose you also need:
> 
> -	if (spin_needbreak(lock) || resched)
> +	if (spin_needbreak(lock) || resched || need_rcu_resched)

Indeed I do, thank you!

> > @@ -4098,6 +4100,8 @@ int __cond_resched_lock(spinlock_t *lock)
> >  		spin_unlock(lock);
> >  		if (resched)
> >  			__cond_resched();
> > +		else if (unlikely(need_rcu_resched))
> > +			rcu_resched();
> >  		else
> >  			cpu_relax();
> >  		ret = 1;
> > @@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
> >  {
> >  	BUG_ON(!in_softirq());
> >  
> > +	rcu_cond_resched();  /* BH disabled OK, just recording QSes. */
> >  	if (should_resched()) {
> >  		local_bh_enable();
> >  		__cond_resched();
> 
> At which point I'm wondering; do you also want __cond_resched_softirq()
> to advance rcu_bh? I'm forever forgetting the exact rcu_bh semantics
> though.

My assumption is that a given CPU spends a significant fraction of its
time outside of BH.  This means that we have a high probability of seeing
an RCU-bh quiescent state on the next scheduling-clock interrupt.  (If
there is no next scheduling-clock interrupt, RCU's dyntick-idle logic
should take care of things for us.)

But you are right:  If we start getting RCU-bh CPU stall warnings,
then one potential fix is putting RCU-bh quiescent states into
cond_resched() and friends.

Updated patch below.

							Thanx, Paul

------------------------------------------------------------------------

sched,rcu: Make cond_resched() report RCU quiescent states

Given a CPU running a loop containing cond_resched(), with no
other tasks runnable on that CPU, RCU will eventually report RCU
CPU stall warnings due to lack of quiescent states.  Fortunately,
every call to cond_resched() is a perfectly good quiescent state.
Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
for cond_resched(), especially given the need to disable preemption,
and, for RCU-preempt, interrupts as well.

This commit therefore maintains a per-CPU counter that causes
cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
rcu_note_context_switch(), but only about once per 256 invocations.
This ratio was chosen in keeping with the relative time constants of
RCU grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3cea28c64ebe..8d64878111ea 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -44,6 +44,7 @@
 #include <linux/debugobjects.h>
 #include <linux/bug.h>
 #include <linux/compiler.h>
+#include <linux/percpu.h>
 #include <asm/barrier.h>
 
 extern int rcu_expedited; /* for sysctl */
@@ -287,6 +288,41 @@ bool __rcu_is_watching(void);
 #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
 
 /*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+#define RCU_COND_RESCHED_LIM 256	/* ms vs. 100s of ms. */
+DECLARE_PER_CPU(int, rcu_cond_resched_count);
+void rcu_resched(void);
+
+/*
+ * Is it time to report RCU quiescent states?
+ *
+ * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
+ * increment some random CPU's count, and possibly also load the result from
+ * yet another CPU's count.  We might even clobber some other CPU's attempt
+ * to zero its counter.  This is all OK because the goal is not precision,
+ * but rather reasonable amortization of rcu_note_context_switch() overhead
+ * and extremely high probability of avoiding RCU CPU stall warnings.
+ * Note that this function has to be preempted in just the wrong place,
+ * many thousands of times in a row, for anything bad to happen.
+ */
+static inline bool rcu_should_resched(void)
+{
+	return __this_cpu_inc_return(rcu_cond_resched_count) >=
+	       RCU_COND_RESCHED_LIM;
+}
+
+/*
+ * Report quiescent states to RCU if it is time to do so.
+ */
+static inline void rcu_cond_resched(void)
+{
+	if (unlikely(rcu_should_resched()))
+		rcu_resched();
+}
+
+/*
  * Infrastructure to implement the synchronize_() primitives in
  * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
  */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c0a9b0af469..ed7a0d72562c 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -338,3 +338,21 @@ static int __init check_cpu_stall_init(void)
 early_initcall(check_cpu_stall_init);
 
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */
+
+/*
+ * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
+ */
+
+DEFINE_PER_CPU(int, rcu_cond_resched_count);
+
+/*
+ * Report a set of RCU quiescent states, for use by cond_resched()
+ * and friends.  Out of line due to being called infrequently.
+ */
+void rcu_resched(void)
+{
+	preempt_disable();
+	__this_cpu_write(rcu_cond_resched_count, 0);
+	rcu_note_context_switch(smp_processor_id());
+	preempt_enable();
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131ef6aab..7b8ef161be4c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4071,6 +4071,7 @@ static void __cond_resched(void)
 
 int __sched _cond_resched(void)
 {
+	rcu_cond_resched();
 	if (should_resched()) {
 		__cond_resched();
 		return 1;
@@ -4089,15 +4090,18 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int __cond_resched_lock(spinlock_t *lock)
 {
+	bool need_rcu_resched = rcu_should_resched();
 	int resched = should_resched();
 	int ret = 0;
 
 	lockdep_assert_held(lock);
 
-	if (spin_needbreak(lock) || resched) {
+	if (spin_needbreak(lock) || resched || need_rcu_resched) {
 		spin_unlock(lock);
 		if (resched)
 			__cond_resched();
+		else if (unlikely(need_rcu_resched))
+			rcu_resched();
 		else
 			cpu_relax();
 		ret = 1;
@@ -4111,6 +4115,7 @@ int __sched __cond_resched_softirq(void)
 {
 	BUG_ON(!in_softirq());
 
+	rcu_cond_resched();  /* BH disabled OK, just recording QSes. */
 	if (should_resched()) {
 		local_bh_enable();
 		__cond_resched();




Thread overview: 14+ messages
2014-03-16  1:59 cond_resched() and RCU CPU stall warnings Paul E. McKenney
2014-03-16  6:09 ` Mike Galbraith
2014-03-16  6:14   ` Mike Galbraith
2014-03-16  6:27     ` Paul E. McKenney
2014-03-16  6:25   ` Paul E. McKenney
2014-03-16  7:30     ` Mike Galbraith
2014-03-17 10:13 ` Peter Zijlstra
2014-03-17 16:58   ` Paul E. McKenney
2014-03-17 17:14     ` Peter Zijlstra
2014-03-18  2:17       ` Paul E. McKenney
2014-03-18  8:51         ` Peter Zijlstra
2014-03-18 12:49           ` Paul E. McKenney
2014-03-18 13:45             ` Peter Zijlstra
2014-03-18 15:15               ` Paul E. McKenney
