* [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
@ 2010-01-25 3:48 Paul E. McKenney
2010-01-25 12:28 ` Lai Jiangshan
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Paul E. McKenney @ 2010-01-25 3:48 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells
[Experimental RFC, not for inclusion.]
I recently received a complaint that RCU was refusing to let a system
go into low-power state immediately, instead waiting a few ticks after
the system had gone idle before letting go of the last CPU. Of course,
the reason for this was that there were a couple of RCU callbacks on
the last CPU.
Currently, rcu_needs_cpu() simply checks whether the current CPU has
an outstanding RCU callback, which means that the last CPU to go into
dyntick-idle mode might wait a few ticks for the relevant grace periods
to complete. However, if all the other CPUs are in dyntick-idle mode,
and if this CPU is in a quiescent state (which it is for RCU-bh and
RCU-sched any time that we are considering going into dyntick-idle mode),
then the grace period is instantly complete.
This patch therefore repeatedly invokes the RCU grace-period machinery
in order to force any needed grace periods to complete quickly. It does
so a limited number of times in order to prevent starvation by an RCU
callback function that might pass itself to call_rcu().
Thoughts?
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/init/Kconfig b/init/Kconfig
index d95ca7c..42bf914 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -396,6 +396,22 @@ config RCU_FANOUT_EXACT
Say N if unsure.
+config RCU_FAST_NO_HZ
+ bool "Accelerate last non-dyntick-idle CPU's grace periods"
+ depends on TREE_RCU && NO_HZ && SMP
+ default n
+ help
+ This option causes RCU to attempt to accelerate grace periods
+ in order to allow the final CPU to enter dynticks-idle state
+ more quickly. On the other hand, this option increases the
+ overhead of the dynticks-idle checking, particularly on systems
+ with large numbers of CPUs.
+
+ Say Y if energy efficiency is critically important, particularly
+ if you have relatively few CPUs.
+
+ Say N if you are unsure.
+
config TREE_RCU_TRACE
def_bool RCU_TRACE && ( TREE_RCU || TREE_PREEMPT_RCU )
select DEBUG_FS
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 099a255..29d88c0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1550,10 +1550,9 @@ static int rcu_pending(int cpu)
/*
* Check to see if any future RCU-related work will need to be done
* by the current CPU, even if none need be done immediately, returning
- * 1 if so. This function is part of the RCU implementation; it is -not-
- * an exported member of the RCU API.
+ * 1 if so.
*/
-int rcu_needs_cpu(int cpu)
+static int rcu_needs_cpu_quick_check(int cpu)
{
/* RCU callbacks either ready or pending? */
return per_cpu(rcu_sched_data, cpu).nxtlist ||
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index e77cdf3..d6170a9 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -906,3 +906,72 @@ static void __init __rcu_init_preempt(void)
}
#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
+
+#if defined(CONFIG_TREE_PREEMPT_RCU) || !defined(CONFIG_RCU_FAST_NO_HZ)
+
+/*
+ * Check to see if any future RCU-related work will need to be done
+ * by the current CPU, even if none need be done immediately, returning
+ * 1 if so. This function is part of the RCU implementation; it is -not-
+ * an exported member of the RCU API.
+ *
+ * Because we have preemptible RCU, just check whether this CPU needs
+ * any flavor of RCU. Do not chew up lots of CPU cycles with preemption
+ * disabled in a most-likely vain attempt to cause RCU not to need this CPU.
+ */
+int rcu_needs_cpu(int cpu)
+{
+ return rcu_needs_cpu_quick_check(cpu);
+}
+
+#else
+
+#define RCU_NEEDS_CPU_FLUSHES 5
+
+/*
+ * Check to see if any future RCU-related work will need to be done
+ * by the current CPU, even if none need be done immediately, returning
+ * 1 if so. This function is part of the RCU implementation; it is -not-
+ * an exported member of the RCU API.
+ *
+ * Because we are not supporting preemptible RCU, attempt to accelerate
+ * any current grace periods so that RCU no longer needs this CPU, but
+ * only if all other CPUs are already in dynticks-idle mode. This will
+ * allow the CPU cores to be powered down immediately, as opposed to after
+ * waiting many milliseconds for grace periods to elapse.
+ */
+int rcu_needs_cpu(int cpu)
+{
+ int c = 1;
+ int i;
+ int thatcpu;
+
+ /* Don't bother unless we are the last non-dyntick-idle CPU. */
+ for_each_cpu(thatcpu, nohz_cpu_mask)
+ if (thatcpu != cpu)
+ return rcu_needs_cpu_quick_check(cpu);
+
+ /* Try to push remaining RCU-sched and RCU-bh callbacks through. */
+ for (i = 0; i < RCU_NEEDS_CPU_FLUSHES && c; i++) {
+ c = 0;
+ if (per_cpu(rcu_sched_data, cpu).nxtlist) {
+ c = 1;
+ rcu_sched_qs(cpu);
+ force_quiescent_state(&rcu_sched_state, 0);
+ __rcu_process_callbacks(&rcu_sched_state,
+ &per_cpu(rcu_sched_data, cpu));
+ }
+ if (per_cpu(rcu_bh_data, cpu).nxtlist) {
+ c = 1;
+ rcu_bh_qs(cpu);
+ force_quiescent_state(&rcu_bh_state, 0);
+ __rcu_process_callbacks(&rcu_bh_state,
+ &per_cpu(rcu_bh_data, cpu));
+ }
+ }
+
+ /* If RCU callbacks are still pending, RCU still needs this CPU. */
+ return c;
+}
+
+#endif
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-25  3:48 [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU Paul E. McKenney
@ 2010-01-25 12:28 ` Lai Jiangshan
  2010-01-25 12:35   ` Peter Zijlstra
  2010-01-25 15:08   ` Steven Rostedt
  2010-01-27  5:17   ` Paul E. McKenney
  2 siblings, 3 replies; 23+ messages in thread
From: Lai Jiangshan @ 2010-01-25 12:28 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells

Paul E. McKenney wrote:
> [Experimental RFC, not for inclusion.]
>
> [ . . . full patch quoted in the original message; trimmed here to the
> hunk under discussion . . . ]
>
> +	/* Don't bother unless we are the last non-dyntick-idle CPU. */
> +	for_each_cpu(thatcpu, nohz_cpu_mask)
> +		if (thatcpu != cpu)
> +			return rcu_needs_cpu_quick_check(cpu);

The comment and the code are not the same, I think.

-----------
I found this thing, although I think it is an ugly thing.
Does it help?

See select_nohz_load_balancer():

/*
 * This routine will try to nominate the ilb (idle load balancing)
 * owner among the cpus whose ticks are stopped. ilb owner will do the idle
 * load balancing on behalf of all those cpus. If all the cpus in the system
 * go into this tickless mode, then there will be no ilb owner (as there is
 * no need for one) and all the cpus will sleep till the next wakeup event
 * arrives...
 *
 * For the ilb owner, tick is not stopped. And this tick will be used
 * for idle load balancing. ilb owner will still be part of
 * nohz.cpu_mask..
 *
 * While stopping the tick, this cpu will become the ilb owner if there
 * is no other owner. And will be the owner till that cpu becomes busy
 * or if all cpus in the system stop their ticks at which point
 * there is no need for ilb owner.
 *
 * When the ilb owner becomes busy, it nominates another owner, during the
 * next busy scheduler_tick()
 */
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-25 12:28 ` Lai Jiangshan
@ 2010-01-25 12:35   ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2010-01-25 12:35 UTC (permalink / raw)
To: Lai Jiangshan
Cc: paulmck, linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, rostedt, Valdis.Kletnieks, dhowells,
	Pallipadi, Venkatesh

On Mon, 2010-01-25 at 20:28 +0800, Lai Jiangshan wrote:
> [ . . . full quote of the patch and of the select_nohz_load_balancer()
> comment trimmed . . . ]
>
> See select_nohz_load_balancer().

Not quite sure what your point is, but Venki was poking at the ILB so
he might want to be aware of things...
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-25 12:28 ` Lai Jiangshan
  2010-01-25 12:35   ` Peter Zijlstra
@ 2010-01-25 15:08   ` Steven Rostedt
  2010-01-27  5:17   ` Paul E. McKenney
  2 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2010-01-25 15:08 UTC (permalink / raw)
To: Lai Jiangshan
Cc: paulmck, linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, Valdis.Kletnieks, dhowells

On Mon, 2010-01-25 at 20:28 +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > +	/* Don't bother unless we are the last non-dyntick-idle CPU. */
> > +	for_each_cpu(thatcpu, nohz_cpu_mask)
> > +		if (thatcpu != cpu)
> > +			return rcu_needs_cpu_quick_check(cpu);
>
> The comment and the code are not the same, I think.

I once heard this quote, but I don't know who said it:

  "If the comment and the code do not match, they probably are both wrong"

Anyway, you are correct, the comment does not match, but I think the
code is wrong. The code takes the slow path as soon as it finds any
dyntick-idle CPU other than ourselves, instead of checking whether some
CPU other than ourselves is still *non*-dyntick-idle. Reading the
change log, that looks wrong. Perhaps this is what was needed:

	for_each_online_cpu(thatcpu) {
		if (thatcpu != cpu &&
		    !cpumask_test_cpu(thatcpu, nohz_cpu_mask))
			return rcu_needs_cpu_quick_check(cpu);
	}

-- Steve
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-25 12:28 ` Lai Jiangshan
  2010-01-25 12:35   ` Peter Zijlstra
  2010-01-25 15:08   ` Steven Rostedt
@ 2010-01-27  5:17   ` Paul E. McKenney
  2 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2010-01-27 5:17 UTC (permalink / raw)
To: Lai Jiangshan
Cc: linux-kernel, mingo, dipankar, akpm, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells

On Mon, Jan 25, 2010 at 08:28:34PM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > [ . . . full patch quoted in the original message; trimmed here to the
> > hunk under discussion . . . ]
> >
> > +	/* Don't bother unless we are the last non-dyntick-idle CPU. */
> > +	for_each_cpu(thatcpu, nohz_cpu_mask)
> > +		if (thatcpu != cpu)
> > +			return rcu_needs_cpu_quick_check(cpu);
>
> The comment and the code are not the same, I think.

Indeed, for this code to be correct, I would need to sequence through
the non-dyntick-idle CPUs, not the dyntick-idle ones. Good catch!

I will likely come back with something similar to Steve Rostedt's
suggestion. Probably better to sequence through all the CPUs rather
than to allocate a cpumask and invert it. Or a 'for_each_cpu_not()'
or some such. ;-)

There does appear to be a cpumask_next_zero() that I should be able
to use.

							Thanx, Paul

> [ . . . select_nohz_load_balancer() comment trimmed . . . ]
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-25  3:48 [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU Paul E. McKenney
  2010-01-25 12:28 ` Lai Jiangshan
@ 2010-01-25 15:12 ` Steven Rostedt
  2010-01-27 14:11   ` Paul E. McKenney
  2010-01-26 21:30 ` Andi Kleen
  2 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2010-01-25 15:12 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, Valdis.Kletnieks, dhowells

On Sun, 2010-01-24 at 19:48 -0800, Paul E. McKenney wrote:
> +int rcu_needs_cpu(int cpu)
> +{
> +	int c = 1;
> +	int i;
> +	int thatcpu;
> +
> +	/* Don't bother unless we are the last non-dyntick-idle CPU. */
> +	for_each_cpu(thatcpu, nohz_cpu_mask)
> +		if (thatcpu != cpu)
> +			return rcu_needs_cpu_quick_check(cpu);
> +
> +	/* Try to push remaining RCU-sched and RCU-bh callbacks through. */
> +	for (i = 0; i < RCU_NEEDS_CPU_FLUSHES && c; i++) {
> +		c = 0;
> +		if (per_cpu(rcu_sched_data, cpu).nxtlist) {
> +			c = 1;
> +			rcu_sched_qs(cpu);
> +			force_quiescent_state(&rcu_sched_state, 0);
> +			__rcu_process_callbacks(&rcu_sched_state,
> +						&per_cpu(rcu_sched_data, cpu));
> +		}
> +		if (per_cpu(rcu_bh_data, cpu).nxtlist) {
> +			c = 1;
> +			rcu_bh_qs(cpu);
> +			force_quiescent_state(&rcu_bh_state, 0);
> +			__rcu_process_callbacks(&rcu_bh_state,
> +						&per_cpu(rcu_bh_data, cpu));
> +		}
> +	}
> +
> +	/* If RCU callbacks are still pending, RCU still needs this CPU. */
> +	return c;
> +}

What happens if the last loop pushes out all callbacks? Then we would
be returning 1 when we could really be returning 0. Wouldn't a better
answer be:

	return per_cpu(rcu_sched_data, cpu).nxtlist ||
	       per_cpu(rcu_bh_data, cpu).nxtlist;

-- Steve
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-25 15:12 ` Steven Rostedt
@ 2010-01-27 14:11   ` Paul E. McKenney
  2010-01-27 14:52     ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2010-01-27 14:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, Valdis.Kletnieks, dhowells

On Mon, Jan 25, 2010 at 10:12:03AM -0500, Steven Rostedt wrote:
> On Sun, 2010-01-24 at 19:48 -0800, Paul E. McKenney wrote:
> > [ . . . quoted rcu_needs_cpu() trimmed . . . ]
> >
> > +	/* If RCU callbacks are still pending, RCU still needs this CPU. */
> > +	return c;
>
> What happens if the last loop pushes out all callbacks? Then we would
> be returning 1 when we could really be returning 0. Wouldn't a better
> answer be:
>
> 	return per_cpu(rcu_sched_data, cpu).nxtlist ||
> 	       per_cpu(rcu_bh_data, cpu).nxtlist;

Good point!!!

Or I can move the assignment to "c" to the end of each branch of the
"if" statement, and do something like the following:

	c = !!per_cpu(rcu_sched_data, cpu).nxtlist;

But either way, you are right, it does not make sense to go to all the
trouble of forcing a grace period and then failing to take advantage
of it.

							Thanx, Paul
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-27 14:11 ` Paul E. McKenney
@ 2010-01-27 14:52   ` Steven Rostedt
  0 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2010-01-27 14:52 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh,
    dvhltc, niv, tglx, peterz, Valdis.Kletnieks, dhowells

On Wed, 2010-01-27 at 06:11 -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2010 at 10:12:03AM -0500, Steven Rostedt wrote:
> > On Sun, 2010-01-24 at 19:48 -0800, Paul E. McKenney wrote:
> >
> > > +/*
> > > + * Check to see if any future RCU-related work will need to be done
> > > + * by the current CPU, even if none need be done immediately, returning
> > > + * 1 if so.  This function is part of the RCU implementation; it is -not-
> > > + * an exported member of the RCU API.
> > > + *
> > > + * Because we are not supporting preemptible RCU, attempt to accelerate
> > > + * any current grace periods so that RCU no longer needs this CPU, but
> > > + * only if all other CPUs are already in dynticks-idle mode.  This will
> > > + * allow the CPU cores to be powered down immediately, as opposed to after
> > > + * waiting many milliseconds for grace periods to elapse.
> > > + */
> > > +int rcu_needs_cpu(int cpu)
> > > +{
> > > +	int c = 1;
> > > +	int i;
> > > +	int thatcpu;
> > > +
> > > +	/* Don't bother unless we are the last non-dyntick-idle CPU. */
> > > +	for_each_cpu(thatcpu, nohz_cpu_mask)
> > > +		if (thatcpu != cpu)
> > > +			return rcu_needs_cpu_quick_check(cpu);
> > > +
> > > +	/* Try to push remaining RCU-sched and RCU-bh callbacks through. */
> > > +	for (i = 0; i < RCU_NEEDS_CPU_FLUSHES && c; i++) {
> > > +		c = 0;
> > > +		if (per_cpu(rcu_sched_data, cpu).nxtlist) {
> > > +			c = 1;
> > > +			rcu_sched_qs(cpu);
> > > +			force_quiescent_state(&rcu_sched_state, 0);
> > > +			__rcu_process_callbacks(&rcu_sched_state,
> > > +						&per_cpu(rcu_sched_data, cpu));
>
> > > +		}
> > > +		if (per_cpu(rcu_bh_data, cpu).nxtlist) {
> > > +			c = 1;
> > > +			rcu_bh_qs(cpu);
> > > +			force_quiescent_state(&rcu_bh_state, 0);
> > > +			__rcu_process_callbacks(&rcu_bh_state,
> > > +						&per_cpu(rcu_bh_data, cpu));
> > > +		}
> > > +	}
> > > +
> > > +	/* If RCU callbacks are still pending, RCU still needs this CPU. */
> > > +	return c;
> >
> > What happens if the last loop pushes out all callbacks? Then we would be
> > returning 1 when we could really be returning 0. Wouldn't a better
> > answer be:
> >
> > 	return per_cpu(rcu_sched_data, cpu).nxtlist ||
> > 		per_cpu(rcu_bh_data, cpu).nxtlist;
>
> Good point!!!
>
> Or I can move the assignment to "c" to the end of each branch of the
> "if" statement, and do something like the following:
>
> 	c = !!per_cpu(rcu_sched_data, cpu).nxtlist;

Hmm, that may just add obfuscation to those looking at the code.

> But either way, you are right, it does not make sense to go to all the
> trouble of forcing a grace period and then failing to take advantage
> of it.

Yeah, whatever implementation is fine, as long as it works and takes
advantage of all forced grace periods.

-- Steve

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-25 3:48 [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU Paul E. McKenney 2010-01-25 12:28 ` Lai Jiangshan 2010-01-25 15:12 ` Steven Rostedt @ 2010-01-26 21:30 ` Andi Kleen 2010-01-26 23:55 ` Mathieu Desnoyers 2010-01-27 5:20 ` Paul E. McKenney 2 siblings, 2 replies; 23+ messages in thread From: Andi Kleen @ 2010-01-26 21:30 UTC (permalink / raw) To: paulmck Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes: Kind of offtopic to the original patch, but I couldn't resist... > +config RCU_FAST_NO_HZ > + bool "Accelerate last non-dyntick-idle CPU's grace periods" > + depends on TREE_RCU && NO_HZ && SMP Having such a thing as a config option doesn't really make any sense to me. Who would want to recompile their kernel to enable/disable this? If anything it should be runtime, or better just unconditionally on. In general RCU could probably reduce its number of "weird" Kconfig options. While I think I have a better understanding of RCU than a lot of normal users I often have no clue what to set there when building a kernel. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-26 21:30 ` Andi Kleen
@ 2010-01-26 23:55   ` Mathieu Desnoyers
  2010-01-27  5:21     ` Paul E. McKenney
  1 sibling, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2010-01-26 23:55 UTC (permalink / raw)
To: Andi Kleen
Cc: paulmck, linux-kernel, mingo, laijs, dipankar, akpm, josh, dvhltc,
    niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells

* Andi Kleen (andi@firstfloor.org) wrote:
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
>
> Kind of offtopic to the original patch, but I couldn't
> resist...
>
> > +config RCU_FAST_NO_HZ
> > +	bool "Accelerate last non-dyntick-idle CPU's grace periods"
> > +	depends on TREE_RCU && NO_HZ && SMP
>
> Having such a thing as a config option doesn't really make
> any sense to me. Who would want to recompile their kernel
> to enable/disable this? If anything it should be runtime, or better
> just unconditionally on.
>
> In general RCU could probably reduce its number of "weird"
> Kconfig options.
>
> While I think I have a better understanding of RCU than a lot
> of normal users I often have no clue what to set there when
> building a kernel.

Maybe we could keep them under a CONFIG_DEBUG_RCU umbrella. Compiling
out parts of the rcu options can be useful for debugging purposes, but
I agree with you that end users should not be bothered with that many
options when some of them are "obviously" wanted.

OTOH, I understand that Paul seems to want to introduce new RCU
features gradually, without hitting all kernel users with bugs in newer
features. That's a sane approach to keep things generally stable, but
maybe it is time to set some of the stabilized RCU options to default Y
and move their config to the debug menu.

Let's see what Paul has to say about this...

Thanks,

Mathieu

> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only.
-- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
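Mathieu's suggestion could look roughly like the following Kconfig fragment. This is purely illustrative: the use of a `DEBUG_RCU` umbrella symbol and the `default y` are his proposal sketched out, not anything in the posted patch (which uses `default n` and an unconditional prompt):

```kconfig
# Hypothetical: prompt for RCU_FAST_NO_HZ only when RCU debugging is
# enabled; everyone else silently gets the stabilized default.
config RCU_FAST_NO_HZ
	bool "Accelerate last non-dyntick-idle CPU's grace periods" if DEBUG_RCU
	default y
	depends on TREE_RCU && NO_HZ && SMP
```

With `if DEBUG_RCU` on the prompt, distributor configs never see the question, while RCU developers can still compile the feature out.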
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-26 23:55 ` Mathieu Desnoyers @ 2010-01-27 5:21 ` Paul E. McKenney 0 siblings, 0 replies; 23+ messages in thread From: Paul E. McKenney @ 2010-01-27 5:21 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Andi Kleen, linux-kernel, mingo, laijs, dipankar, akpm, josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells On Tue, Jan 26, 2010 at 06:55:16PM -0500, Mathieu Desnoyers wrote: > * Andi Kleen (andi@firstfloor.org) wrote: > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes: > > > > Kind of offtopic to the original patch, but I couldn't > > resist... > > > > > +config RCU_FAST_NO_HZ > > > + bool "Accelerate last non-dyntick-idle CPU's grace periods" > > > + depends on TREE_RCU && NO_HZ && SMP > > > > Having such a thing as a config option doesn't really make > > any sense to me. Who would want to recompile their kernel > > to enable/disable this? If anything it should be runtime, or better > > just unconditionally on. > > > > In general RCU could probably reduce its number of "weird" > > Kconfig options. > > > > While I think I have a better understanding of RCU than a lot > > of normal users I often have no clue what to set there when > > building a kernel. > > Maybe we could keep them under a CONFIG_DEBUG_RCU umbrella. Compiling > out parts of the rcu options can be useful for debugging purposes, but > I agree with you that end users should not be bothered with that much > options when some of them are "obviously" wanted. > > OTOH, I understand that Paul seems to want to introduce new RCU > features gradually, without hitting all kernel users with bugs in newer > features. That's a sane approach to keep things generally stable, but > maybe it is time to set some of the stabilized RCU options to default Y > and move their config to the debug menu. That is indeed a big part of the motivation. ;-) Thanx, Paul > Let's see what Paul has to say about this... 
> > Thanks, > > Mathieu > > > > > -Andi > > > > -- > > ak@linux.intel.com -- Speaking for myself only. > > -- > Mathieu Desnoyers > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-26 21:30 ` Andi Kleen 2010-01-26 23:55 ` Mathieu Desnoyers @ 2010-01-27 5:20 ` Paul E. McKenney 2010-01-27 9:43 ` Andi Kleen 1 sibling, 1 reply; 23+ messages in thread From: Paul E. McKenney @ 2010-01-27 5:20 UTC (permalink / raw) To: Andi Kleen Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells On Tue, Jan 26, 2010 at 10:30:57PM +0100, Andi Kleen wrote: > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes: > > Kind of offtopic to the original patch, but I couldn't > resist... > > > +config RCU_FAST_NO_HZ > > + bool "Accelerate last non-dyntick-idle CPU's grace periods" > > + depends on TREE_RCU && NO_HZ && SMP > > Having such a thing as a config option doesn't really make > any sense to me. Who would want to recompile their kernel > to enable/disable this? If anything it should be runtime, or better > just unconditionally on. It adds significant overhead on entry to dyntick-idle mode for systems with large numbers of CPUs. :-( > In general RCU could probably reduce its number of "weird" > Kconfig options. > > While I think I have a better understanding of RCU than a lot > of normal users I often have no clue what to set there when > building a kernel. One approach would be to make it be default for small numbers of CPUs (as in systems likely to be battery powered) but not for large numbers of CPUs. The reason I didn't do this initially is that a server-class four-CPU system would have no need for this, but a four-core cellphone most definitely would. So I just created another config variable. In any case, I do agree with your point about reducing the number of config variables. Thanx, Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-27  5:20 ` Paul E. McKenney
@ 2010-01-27  9:43   ` Andi Kleen
  2010-01-27  9:50     ` Peter Zijlstra
  2010-01-27 10:01     ` Paul E. McKenney
  0 siblings, 2 replies; 23+ messages in thread
From: Andi Kleen @ 2010-01-27 9:43 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Andi Kleen, linux-kernel, mingo, laijs, dipankar, akpm,
    mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt,
    Valdis.Kletnieks, dhowells

On Tue, Jan 26, 2010 at 09:20:50PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2010 at 10:30:57PM +0100, Andi Kleen wrote:
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> >
> > Kind of offtopic to the original patch, but I couldn't
> > resist...
> >
> > > +config RCU_FAST_NO_HZ
> > > +	bool "Accelerate last non-dyntick-idle CPU's grace periods"
> > > +	depends on TREE_RCU && NO_HZ && SMP
> >
> > Having such a thing as a config option doesn't really make
> > any sense to me. Who would want to recompile their kernel
> > to enable/disable this? If anything it should be runtime, or better
> > just unconditionally on.
>
> It adds significant overhead on entry to dyntick-idle mode for systems
> with large numbers of CPUs. :-(

Can't you simply check that at runtime then?

	if (num_possible_cpus() > 20)
		...

BTW, the new small is large. This year's high-end desktop PC will come
with up to 12 CPU threads. It would likely be challenging to find a good
value for that 20 that holds up in the future.

Or better, perhaps have some threshold so that you don't do it that
often, or only do it when you expect to be idle for a long enough time
that the CPU can enter deeper idle states.

(In higher idle states, a few more wakeups typically don't matter that
much.)

The cpufreq/cstate governors have a reasonably good idea now of how
"idle" the system is and will be. Maybe you can reuse that information
somehow.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 23+ messages in thread
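Andi's two suggestions — gate on CPU count at runtime, and gate on how long the CPU expects to idle — can be combined into a single predicate. The sketch below is hypothetical: the function name, the threshold of 16, and the latency parameters are all illustrative stand-ins (picking the "right" cutoff is exactly the difficulty Andi points out), and a real implementation would pull the expected-idle estimate from the cpuidle governor rather than take it as an argument:

```c
#include <stdbool.h>

/* Illustrative placeholder only, not a recommendation: Andi's point is
 * precisely that any fixed CPU-count cutoff ages badly. */
#define RCU_FAST_NO_HZ_CPU_LIMIT 16

/* Hypothetical runtime gate replacing the Kconfig option: accelerate
 * grace periods only when the machine is small enough for the extra
 * dyntick-idle-entry work to be cheap, and only when the CPU expects
 * to idle long enough to reach a deeper C-state. */
static bool rcu_fast_no_hz_wanted(int possible_cpus,
				  unsigned int expected_idle_us,
				  unsigned int deep_idle_entry_us)
{
	if (possible_cpus > RCU_FAST_NO_HZ_CPU_LIMIT)
		return false;
	return expected_idle_us >= deep_idle_entry_us;
}
```

A small battery-powered system expecting a long idle period passes both tests; a many-CPU server, or a CPU about to be woken again anyway, skips the acceleration.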
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 9:43 ` Andi Kleen @ 2010-01-27 9:50 ` Peter Zijlstra 2010-01-27 10:00 ` Andi Kleen 2010-01-27 10:04 ` Paul E. McKenney 2010-01-27 10:01 ` Paul E. McKenney 1 sibling, 2 replies; 23+ messages in thread From: Peter Zijlstra @ 2010-01-27 9:50 UTC (permalink / raw) To: Andi Kleen Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, rostedt, Valdis.Kletnieks, dhowells On Wed, 2010-01-27 at 10:43 +0100, Andi Kleen wrote: > > Can't you simply check that at runtime then? > > if (num_possible_cpus() > 20) > ... > > BTW the new small is large. This years high end desktop PC will come with > upto 12 CPU threads. It would likely be challenging to find a good > number for 20 that holds up with the future. If only scalability were that easy :/ These massive core/thread count things are causing more problems as well, the cpus/node ratios are constantly growing, giving grief in the page allocator as well as other places that used to scale per node. As to the current problem, the call_rcu() interface doesn't make a hard promise that the callback will be done on the same cpu, right? So why not simply move the callback list over to a more active cpu? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 9:50 ` Peter Zijlstra @ 2010-01-27 10:00 ` Andi Kleen 2010-01-27 10:04 ` Paul E. McKenney 1 sibling, 0 replies; 23+ messages in thread From: Andi Kleen @ 2010-01-27 10:00 UTC (permalink / raw) To: Peter Zijlstra Cc: Andi Kleen, Paul E. McKenney, linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, rostedt, Valdis.Kletnieks, dhowells On Wed, Jan 27, 2010 at 10:50:50AM +0100, Peter Zijlstra wrote: > On Wed, 2010-01-27 at 10:43 +0100, Andi Kleen wrote: > > > > Can't you simply check that at runtime then? > > > > if (num_possible_cpus() > 20) > > ... > > > > BTW the new small is large. This years high end desktop PC will come with > > upto 12 CPU threads. It would likely be challenging to find a good > > number for 20 that holds up with the future. > > If only scalability were that easy :/ See the rest of my email... I see you're already suffering from "Linus disease" ... only reading the first paragraph/sentence in the email. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 9:50 ` Peter Zijlstra 2010-01-27 10:00 ` Andi Kleen @ 2010-01-27 10:04 ` Paul E. McKenney 2010-01-27 11:39 ` Nick Piggin 1 sibling, 1 reply; 23+ messages in thread From: Paul E. McKenney @ 2010-01-27 10:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Andi Kleen, linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, rostedt, Valdis.Kletnieks, dhowells On Wed, Jan 27, 2010 at 10:50:50AM +0100, Peter Zijlstra wrote: > On Wed, 2010-01-27 at 10:43 +0100, Andi Kleen wrote: > > > > Can't you simply check that at runtime then? > > > > if (num_possible_cpus() > 20) > > ... > > > > BTW the new small is large. This years high end desktop PC will come with > > upto 12 CPU threads. It would likely be challenging to find a good > > number for 20 that holds up with the future. > > If only scalability were that easy :/ > > These massive core/thread count things are causing more problems as > well, the cpus/node ratios are constantly growing, giving grief in the > page allocator as well as other places that used to scale per node. > > As to the current problem, the call_rcu() interface doesn't make a hard > promise that the callback will be done on the same cpu, right? So why > not simply move the callback list over to a more active cpu? I could indeed do that. However, there is nothing stopping the more-active CPU from going into dynticks-idle mode between the time that I decide to push the callback to it and the time I actually do the pushing. :-( I considered pushing the callbacks to the orphanage, but that is a global lock that I would rather not acquire on each dyntick-idle transition. This conversation is having the effect of making me much more comfortable adding a kernel configuration parameter. Might not have been the intent, but there you have it! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 10:04 ` Paul E. McKenney @ 2010-01-27 11:39 ` Nick Piggin 2010-01-27 11:59 ` Paul E. McKenney 0 siblings, 1 reply; 23+ messages in thread From: Nick Piggin @ 2010-01-27 11:39 UTC (permalink / raw) To: Paul E. McKenney Cc: Peter Zijlstra, Andi Kleen, linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, rostedt, Valdis.Kletnieks, dhowells On Wed, Jan 27, 2010 at 02:04:34AM -0800, Paul E. McKenney wrote: > I could indeed do that. However, there is nothing stopping the > more-active CPU from going into dynticks-idle mode between the time > that I decide to push the callback to it and the time I actually do > the pushing. :-( > > I considered pushing the callbacks to the orphanage, but that is a > global lock that I would rather not acquire on each dyntick-idle > transition. Well we already have to do atomic operations on the nohz mask, so maybe it would be acceptable to actually have a spinlock there to serialise operations on the nohz mask and also allow some subsystem specific things (synchronisation here should allow either one of those above approaches). It's not going to be zero cost, but seeing as there is already the contended cacheline there, it's not going to introduce a fundamentally new bottleneck. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 11:39 ` Nick Piggin @ 2010-01-27 11:59 ` Paul E. McKenney 0 siblings, 0 replies; 23+ messages in thread From: Paul E. McKenney @ 2010-01-27 11:59 UTC (permalink / raw) To: Nick Piggin Cc: Peter Zijlstra, Andi Kleen, linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, rostedt, Valdis.Kletnieks, dhowells On Wed, Jan 27, 2010 at 10:39:22PM +1100, Nick Piggin wrote: > On Wed, Jan 27, 2010 at 02:04:34AM -0800, Paul E. McKenney wrote: > > I could indeed do that. However, there is nothing stopping the > > more-active CPU from going into dynticks-idle mode between the time > > that I decide to push the callback to it and the time I actually do > > the pushing. :-( > > > > I considered pushing the callbacks to the orphanage, but that is a > > global lock that I would rather not acquire on each dyntick-idle > > transition. > > Well we already have to do atomic operations on the nohz mask, so > maybe it would be acceptable to actually have a spinlock there to > serialise operations on the nohz mask and also allow some subsystem > specific things (synchronisation here should allow either one of > those above approaches). > > It's not going to be zero cost, but seeing as there is already the > contended cacheline there, it's not going to introduce a > fundamentally new bottleneck. Good point, although a contended global lock is nastier than a contended cache line. Thanx, Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 9:43 ` Andi Kleen 2010-01-27 9:50 ` Peter Zijlstra @ 2010-01-27 10:01 ` Paul E. McKenney 2010-01-27 10:13 ` Andi Kleen 1 sibling, 1 reply; 23+ messages in thread From: Paul E. McKenney @ 2010-01-27 10:01 UTC (permalink / raw) To: Andi Kleen Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells On Wed, Jan 27, 2010 at 10:43:36AM +0100, Andi Kleen wrote: > On Tue, Jan 26, 2010 at 09:20:50PM -0800, Paul E. McKenney wrote: > > On Tue, Jan 26, 2010 at 10:30:57PM +0100, Andi Kleen wrote: > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes: > > > > > > Kind of offtopic to the original patch, but I couldn't > > > resist... > > > > > > > +config RCU_FAST_NO_HZ > > > > + bool "Accelerate last non-dyntick-idle CPU's grace periods" > > > > + depends on TREE_RCU && NO_HZ && SMP > > > > > > Having such a thing as a config option doesn't really make > > > any sense to me. Who would want to recompile their kernel > > > to enable/disable this? If anything it should be runtime, or better > > > just unconditionally on. > > > > It adds significant overhead on entry to dyntick-idle mode for systems > > with large numbers of CPUs. :-( > > Can't you simply check that at runtime then? > > if (num_possible_cpus() > 20) > ... > > BTW the new small is large. This years high end desktop PC will come with > upto 12 CPU threads. It would likely be challenging to find a good > number for 20 that holds up with the future. And this was another line of reasoning that lead me to the extra kernel config parameter. 
> Or better perhaps have some threshold that you don't do it > that often, or only do it when you expect to be idle for a long > enough time that the CPU can enter deeper idle states > > (I higher idle states some more wakeups typically don't matter > that much) > > The cpufreq/cstate governour have a reasonable good idea > now how "idle" the system is and will be. Maybe you can reuse > that information somehow. My first thought was to find an existing "I am a small device running on battery power" or "low power consumption is critical to me" config parameter. I didn't find anything that looked like that. If there was one, I would make RCU_FAST_NO_HZ depend on it. Or did I miss some kernel parameter or API? Thanx, Paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 10:01 ` Paul E. McKenney @ 2010-01-27 10:13 ` Andi Kleen 2010-01-27 11:44 ` Paul E. McKenney 0 siblings, 1 reply; 23+ messages in thread From: Andi Kleen @ 2010-01-27 10:13 UTC (permalink / raw) To: Paul E. McKenney Cc: Andi Kleen, linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, arjan > > Can't you simply check that at runtime then? > > > > if (num_possible_cpus() > 20) > > ... > > > > BTW the new small is large. This years high end desktop PC will come with > > upto 12 CPU threads. It would likely be challenging to find a good > > number for 20 that holds up with the future. > > And this was another line of reasoning that lead me to the extra kernel > config parameter. Which doesn't solve the problem at all. > > Or better perhaps have some threshold that you don't do it > > that often, or only do it when you expect to be idle for a long > > enough time that the CPU can enter deeper idle states > > > > (I higher idle states some more wakeups typically don't matter > > that much) > > > > The cpufreq/cstate governour have a reasonable good idea > > now how "idle" the system is and will be. Maybe you can reuse > > that information somehow. > > My first thought was to find an existing "I am a small device running on > battery power" or "low power consumption is critical to me" config > parameter. I didn't find anything that looked like that. If there was > one, I would make RCU_FAST_NO_HZ depend on it. > > Or did I miss some kernel parameter or API? There are a few for scalability (e.g. numa_distance()), but they're obscure. The really good ones are just known somewhere. But I think in this case scalability is not the key thing to check for, but expected idle latency. Even on a large system if near all CPUs are idle spending some time to keep them idle even longer is a good thing. 
But only if the CPUs actually benefit from long idle.

There's the "pm_qos_latency" framework that could be used for this in
theory, but it's not 100% the right match because it's not dynamic.
Unfortunately, last time I looked, the interfaces were rather clumsy
(e.g. they don't allow interrupt-level notifiers).

Better would be some insight into the expected future latency: look at
exporting this information from the various frequency/idle governors.

Perhaps pm_qos_latency could be extended to support that? CC Arjan,
maybe he has some ideas on that.

After all of that there would still, of course, be the question of what
the right latency threshold would be, but at least that's a much easier
question than the number of CPUs.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 10:13 ` Andi Kleen @ 2010-01-27 11:44 ` Paul E. McKenney 2010-01-27 12:11 ` Andi Kleen 0 siblings, 1 reply; 23+ messages in thread From: Paul E. McKenney @ 2010-01-27 11:44 UTC (permalink / raw) To: Andi Kleen Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, arjan On Wed, Jan 27, 2010 at 11:13:42AM +0100, Andi Kleen wrote: > > > Can't you simply check that at runtime then? > > > > > > if (num_possible_cpus() > 20) > > > ... > > > > > > BTW the new small is large. This years high end desktop PC will come with > > > upto 12 CPU threads. It would likely be challenging to find a good > > > number for 20 that holds up with the future. > > > > And this was another line of reasoning that lead me to the extra kernel > > config parameter. > > Which doesn't solve the problem at all. Depending on what you consider the problem to be, of course. >From what I can see, most people would want RCU_FAST_NO_HZ=n. Only people with extreme power-consumption concerns would likely care enough to select this. > > > Or better perhaps have some threshold that you don't do it > > > that often, or only do it when you expect to be idle for a long > > > enough time that the CPU can enter deeper idle states > > > > > > (I higher idle states some more wakeups typically don't matter > > > that much) > > > > > > The cpufreq/cstate governour have a reasonable good idea > > > now how "idle" the system is and will be. Maybe you can reuse > > > that information somehow. > > > > My first thought was to find an existing "I am a small device running on > > battery power" or "low power consumption is critical to me" config > > parameter. I didn't find anything that looked like that. If there was > > one, I would make RCU_FAST_NO_HZ depend on it. > > > > Or did I miss some kernel parameter or API? 
> > There are a few for scalability (e.g. numa_distance()), but they're > obscure. The really good ones are just known somewhere. > > But I think in this case scalability is not the key thing to check > for, but expected idle latency. Even on a large system if near all > CPUs are idle spending some time to keep them idle even longer is a good > thing. But only if the CPUs actually benefit from long idle. The larger the number of CPUs, the lower the probability of all of them going idle, so the less difference this patch makes. Perhaps some larger system will care about this on a per-socket basis, but I have yet to hear any requests. > There's the "pm_qos_latency" frame work that could be used for this > in theory, but it's not 100% the right match because it's not > dynamic. > > Unfortunately last time I looked the interfaces were rather clumpsy > (e.g. don't allow interrupt level notifiers) I do need to query from interrupt context, but could potentially have a notifier set up state for me. Still, the real question is "how important is a small reduction in power consumption?" > Better would be some insight into the expected future latency: > look at exporting this information from the various frequency/idle > governours. > > Perhaps pm_qos_latency could be extended to support that? > CC Arjan, maybe he has some ideas on that. > > After all of that there would be still of course the question > what the right latency threshold would be, but at least that's > a much easier question that number of CPUs. Hmmm... I am still believing that very few people want RCU_FAST_NO_HZ, and that those who want it can select it for their devices. 
Trying to apply this to server-class machines gets into questions like
"where are the core/socket boundaries", "can this hardware turn entire
cores/sockets off", "given the current workload, does it really make
sense to try to turn off entire cores/sockets", and "are a few ticks
important when migrating processes, irqs, timers, and whatever else is
required to actually turn off a given core or socket for a significant
time period".

I took a quick look at the pm_qos_latency, and, as you note, it doesn't
really seem to be designed to handle this situation.

And we really should not be gold-plating this thing. I have one
requester (off list) who needs it badly, and who is willing to deal
with a kernel configuration parameter. I have no other requesters, and
therefore cannot reasonably anticipate their needs. As a result, we
cannot justify building any kind of infrastructure beyond what is
reasonable for the single requester.

Maybe the situation will be different next year. But if so, we would
then have some information on what people really need. So, if it turns
out that more will be needed in 2011, I will be happy to do something
about it once I have some hard information on what will really be
needed.

Fair enough?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU 2010-01-27 11:44 ` Paul E. McKenney @ 2010-01-27 12:11 ` Andi Kleen 2010-01-27 13:23 ` Paul E. McKenney 0 siblings, 1 reply; 23+ messages in thread From: Andi Kleen @ 2010-01-27 12:11 UTC (permalink / raw) To: Paul E. McKenney Cc: Andi Kleen, linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells, arjan > From what I can see, most people would want RCU_FAST_NO_HZ=n. Only Most people do not recompile their kernel. And even those that do most likely will not have enough information to make an informed choice at build time. > people with extreme power-consumption concerns would likely care enough > to select this. What would a distributor shipping binary kernels use? > > But I think in this case scalability is not the key thing to check > > for, but expected idle latency. Even on a large system if near all > > CPUs are idle spending some time to keep them idle even longer is a good > > thing. But only if the CPUs actually benefit from long idle. > > The larger the number of CPUs, the lower the probability of all of them > going idle, so the less difference this patch makes. Perhaps some My shiny new 8 CPU threads desktop is not less likely to go idle when I do nothing on it than an older dual core 2 CPU thread desktop. Especially not given all the recent optimizations (no idle tick) in this area etc. And core/thread counts are growing. In terms of CPU numbers today's large machine is tomorrow's small machine. > I do need to query from interrupt context, but could potentially have a > notifier set up state for me. Still, the real question is "how important > is a small reduction in power consumption?" I think any (measurable) power saving is important. Also on modern Intel CPUs power saving often directly translates into performance: if more cores are idle the others can clock faster. 
> I took a quick look at the pm_qos_latency code, and, as you note, it
> doesn't really seem to be designed to handle this situation.

It could be extended for it.  It's just software after all; we can
change it.

> And we really should not be gold-plating this thing.  I have one requester
> (off list) who needs it badly, and who is willing to deal with a kernel
> configuration parameter.  I have no other requesters, and therefore
> cannot reasonably anticipate their needs.  As a result, we cannot justify
> building any kind of infrastructure beyond what is reasonable for the
> single requester.

If this has a measurable power advantage, I think it's better to do the
extra steps to make it usable everywhere, with automatic heuristics and
no Kconfig hacks.

If it's not, then it's probably not worth merging.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
* Re: [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU
  2010-01-27 12:11               ` Andi Kleen
@ 2010-01-27 13:23                 ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2010-01-27 13:23 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	josh, dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks,
	dhowells, arjan

On Wed, Jan 27, 2010 at 01:11:50PM +0100, Andi Kleen wrote:
> > From what I can see, most people would want RCU_FAST_NO_HZ=n.  Only
>
> Most people do not recompile their kernel.  And even those
> that do most likely will not have enough information to make
> an informed choice at build time.

I believe that only a few embedded people will be using RCU_FAST_NO_HZ=y.

> > people with extreme power-consumption concerns would likely care enough
> > to select this.
>
> What would a distributor shipping binary kernels use?

RCU_FAST_NO_HZ=n.

> > > But I think in this case scalability is not the key thing to check
> > > for, but expected idle latency.  Even on a large system if near all
> > > CPUs are idle spending some time to keep them idle even longer is a good
> > > thing.  But only if the CPUs actually benefit from long idle.
> >
> > The larger the number of CPUs, the lower the probability of all of them
> > going idle, so the less difference this patch makes.  Perhaps some
>
> My shiny new 8 CPU threads desktop is not less likely to go idle when I
> do nothing on it than an older dual core 2 CPU thread desktop.
>
> Especially not given all the recent optimizations (no idle tick) in this
> area etc.
>
> And core/thread counts are growing.  In terms of CPU numbers, today's
> large machine is tomorrow's small machine.

But your shiny new 8-CPU-thread desktop runs off of AC power, right?
If so, I don't think you will care about a 4-5-tick delay for the last
CPU going into dyntick-idle mode.
And I bet you won't be able to measure the difference on your
battery-powered laptop.

> > I do need to query from interrupt context, but could potentially have a
> > notifier set up state for me.  Still, the real question is "how important
> > is a small reduction in power consumption?"
>
> I think any (measurable) power saving is important.  Also, on modern
> Intel CPUs power saving often directly translates into performance: if
> more cores are idle, the others can clock faster.

OK, I am testing a corrected patch with the kernel configuration
parameter.  If you can show a measurable difference on typical
desktop/server systems, then we can look into doing something more
generally useful.

> > I took a quick look at the pm_qos_latency code, and, as you note, it
> > doesn't really seem to be designed to handle this situation.
>
> It could be extended for it.  It's just software after all; we can
> change it.

Of course we can change it.  But should we?

> > And we really should not be gold-plating this thing.  I have one requester
> > (off list) who needs it badly, and who is willing to deal with a kernel
> > configuration parameter.  I have no other requesters, and therefore
> > cannot reasonably anticipate their needs.  As a result, we cannot justify
> > building any kind of infrastructure beyond what is reasonable for the
> > single requester.
>
> If this has a measurable power advantage, I think it's better to do the
> extra steps to make it usable everywhere, with automatic heuristics and
> no Kconfig hacks.

I would agree with the following: if this has a measurable power
advantage -on- -a- -large- -fraction- -of- -systems-, then it -might-
be better to do extra steps to make it usable everywhere, which -might-
involve heuristics instead of a kernel configuration parameter.

> If it's not, then it's probably not worth merging.

This is not necessarily the case.
It can make a lot of sense to try something for a special case, and
then use the experience gained in that special case to produce a good
solution.  On the other hand, it does not necessarily make sense to do
a lot of possibly useless work based on vague guesses as to what is
needed.

If we merge the special case, then others have the opportunity to try
it out, thus getting us the experience required to see (1) whether
something more general-purpose is needed in the first place and (2) if
so, what that more general-purpose thing might look like.

							Thanx, Paul
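For reference, the special-case mechanism being debated here can be sketched as a small standalone model.  This is not the kernel implementation — the struct, function names, and the flush limit below are all hypothetical — but it illustrates the idea from the patch: on entry to dyntick-idle mode, repeatedly force grace periods to drain this CPU's callbacks, giving up after a fixed number of passes so that a callback that re-posts itself via call_rcu() cannot starve entry into the low-power state.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed small bound on forced grace-period passes (illustrative value). */
#define MAX_FLUSHES 5

/* Hypothetical model of one CPU's callback state. */
struct cpu_model {
	int pending_callbacks;  /* callbacks currently queued on this CPU */
	int callbacks_reposted; /* callbacks that re-queue themselves when invoked */
};

/*
 * Model of one pass of the grace-period machinery: complete a grace
 * period and invoke the callbacks it covers.  Self-posting callbacks
 * re-queue themselves, so the queue may refill.
 */
static void force_grace_period(struct cpu_model *cpu)
{
	int invoked = cpu->pending_callbacks;

	cpu->pending_callbacks = 0;
	if (invoked > 0 && cpu->callbacks_reposted > 0)
		cpu->pending_callbacks = cpu->callbacks_reposted;
}

/*
 * Model of the rcu_needs_cpu() decision under the patch: try up to
 * MAX_FLUSHES passes to drain the queue, then report whether this CPU
 * still needs the scheduling-clock tick.
 */
static bool rcu_needs_cpu_model(struct cpu_model *cpu)
{
	for (int i = 0; i < MAX_FLUSHES && cpu->pending_callbacks > 0; i++)
		force_grace_period(cpu);
	return cpu->pending_callbacks > 0;
}
```

With ordinary callbacks the queue drains in one pass and the CPU may enter dyntick-idle immediately; with a self-posting callback the bound is hit and the model keeps the tick, which is the starvation-avoidance property the patch description calls out.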
end of thread, other threads:[~2010-01-27 14:52 UTC | newest]

Thread overview: 23+ messages:
2010-01-25  3:48 [PATCH RFC tip/core/rcu] accelerate grace period if last non-dynticked CPU Paul E. McKenney
2010-01-25 12:28 ` Lai Jiangshan
2010-01-25 12:35 ` Peter Zijlstra
2010-01-25 15:08 ` Steven Rostedt
2010-01-27  5:17 ` Paul E. McKenney
2010-01-25 15:12 ` Steven Rostedt
2010-01-27 14:11 ` Paul E. McKenney
2010-01-27 14:52 ` Steven Rostedt
2010-01-26 21:30 ` Andi Kleen
2010-01-26 23:55 ` Mathieu Desnoyers
2010-01-27  5:21 ` Paul E. McKenney
2010-01-27  5:20 ` Paul E. McKenney
2010-01-27  9:43 ` Andi Kleen
2010-01-27  9:50 ` Peter Zijlstra
2010-01-27 10:00 ` Andi Kleen
2010-01-27 10:04 ` Paul E. McKenney
2010-01-27 11:39 ` Nick Piggin
2010-01-27 11:59 ` Paul E. McKenney
2010-01-27 10:01 ` Paul E. McKenney
2010-01-27 10:13 ` Andi Kleen
2010-01-27 11:44 ` Paul E. McKenney
2010-01-27 12:11 ` Andi Kleen
2010-01-27 13:23 ` Paul E. McKenney