* [PATCH] sched: avoid large irq-latencies in smp-balancing
@ 2007-11-07 11:29 Peter Zijlstra
2007-11-07 12:17 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2007-11-07 11:29 UTC (permalink / raw)
To: Ingo Molnar, Peter Williams; +Cc: linux-kernel
Subject: sched: avoid large irq-latencies in smp-balancing
SMP balancing is done with IRQs disabled and can iterate the full rq. When rqs
are large this can cause large irq-latencies. Limit the nr of iterations on
each run.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Peter Williams <pwil3058@bigpond.net.au>
---
kernel/sched.c | 10 ++++++++--
kernel/sysctl.c | 8 ++++++++
2 files changed, 16 insertions(+), 2 deletions(-)
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -474,6 +474,12 @@ const_debug unsigned int sysctl_sched_fe
#define sched_feat(x) (sysctl_sched_features & SCHED_FEAT_##x)
/*
+ * Number of tasks to iterate in a single balance run.
+ * Limited because this is done with IRQs disabled.
+ */
+const_debug unsigned int sysctl_sched_nr_migrate = 32;
+
+/*
* For kernel-internal use: high-speed (but slightly incorrect) per-cpu
* clock constructed from sched_clock():
*/
@@ -2237,7 +2243,7 @@ balance_tasks(struct rq *this_rq, int th
enum cpu_idle_type idle, int *all_pinned,
int *this_best_prio, struct rq_iterator *iterator)
{
- int pulled = 0, pinned = 0, skip_for_load;
+ int loops = 0, pulled = 0, pinned = 0, skip_for_load;
struct task_struct *p;
long rem_load_move = max_load_move;
@@ -2251,7 +2257,7 @@ balance_tasks(struct rq *this_rq, int th
*/
p = iterator->start(iterator->arg);
next:
- if (!p)
+ if (!p || loops++ > sysctl_sched_nr_migrate)
goto out;
/*
* To help distribute high priority tasks accross CPUs we don't
Index: linux-2.6-2/kernel/sysctl.c
===================================================================
--- linux-2.6-2.orig/kernel/sysctl.c
+++ linux-2.6-2/kernel/sysctl.c
@@ -298,6 +298,14 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "sched_nr_migrate",
+ .data = &sysctl_sched_nr_migrate,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#endif
{
.ctl_name = CTL_UNNUMBERED,
* Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
2007-11-07 11:29 [PATCH] sched: avoid large irq-latencies in smp-balancing Peter Zijlstra
@ 2007-11-07 12:17 ` Peter Zijlstra
2007-11-07 18:27 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Peter Zijlstra @ 2007-11-07 12:17 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Williams, linux-kernel
Bah, missed a hunk
---
Subject: sched: avoid large irq-latencies in smp-balancing
SMP balancing is done with IRQs disabled and can iterate the full rq. When rqs
are large this can cause large irq-latencies. Limit the nr of iterations on
each run.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Peter Williams <pwil3058@bigpond.net.au>
---
include/linux/sched.h | 1 +
kernel/sched.c | 15 ++++++++++-----
kernel/sysctl.c | 8 ++++++++
3 files changed, 19 insertions(+), 5 deletions(-)
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -474,6 +474,12 @@ const_debug unsigned int sysctl_sched_fe
#define sched_feat(x) (sysctl_sched_features & SCHED_FEAT_##x)
/*
+ * Number of tasks to iterate in a single balance run.
+ * Limited because this is done with IRQs disabled.
+ */
+const_debug unsigned int sysctl_sched_nr_migrate = 32;
+
+/*
* For kernel-internal use: high-speed (but slightly incorrect) per-cpu
* clock constructed from sched_clock():
*/
@@ -2237,7 +2243,7 @@ balance_tasks(struct rq *this_rq, int th
enum cpu_idle_type idle, int *all_pinned,
int *this_best_prio, struct rq_iterator *iterator)
{
- int pulled = 0, pinned = 0, skip_for_load;
+ int loops = 0, pulled = 0, pinned = 0, skip_for_load;
struct task_struct *p;
long rem_load_move = max_load_move;
@@ -2251,10 +2257,10 @@ balance_tasks(struct rq *this_rq, int th
*/
p = iterator->start(iterator->arg);
next:
- if (!p)
+ if (!p || loops++ > sysctl_sched_nr_migrate)
goto out;
/*
- * To help distribute high priority tasks accross CPUs we don't
+ * To help distribute high priority tasks across CPUs we don't
* skip a task if it will be the highest priority task (i.e. smallest
* prio value) on its new queue regardless of its load weight
*/
@@ -2271,8 +2277,7 @@ next:
rem_load_move -= p->se.load.weight;
/*
- * We only want to steal up to the prescribed number of tasks
- * and the prescribed amount of weighted load.
+ * We only want to steal up to the prescribed amount of weighted load.
*/
if (rem_load_move > 0) {
if (p->prio < *this_best_prio)
Index: linux-2.6-2/kernel/sysctl.c
===================================================================
--- linux-2.6-2.orig/kernel/sysctl.c
+++ linux-2.6-2/kernel/sysctl.c
@@ -298,6 +298,14 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "sched_nr_migrate",
+ .data = &sysctl_sched_nr_migrate,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#endif
{
.ctl_name = CTL_UNNUMBERED,
Index: linux-2.6-2/include/linux/sched.h
===================================================================
--- linux-2.6-2.orig/include/linux/sched.h
+++ linux-2.6-2/include/linux/sched.h
@@ -1466,6 +1466,7 @@ extern unsigned int sysctl_sched_batch_w
extern unsigned int sysctl_sched_child_runs_first;
extern unsigned int sysctl_sched_features;
extern unsigned int sysctl_sched_migration_cost;
+extern unsigned int sysctl_sched_nr_migrate;
#endif
extern unsigned int sysctl_sched_compat_yield;
* Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
2007-11-07 12:17 ` Peter Zijlstra
@ 2007-11-07 18:27 ` Andrew Morton
2007-11-07 18:58 ` Peter Zijlstra
2007-11-07 22:10 ` Steven Rostedt
2007-11-08 4:37 ` Gregory Haskins
2007-11-09 12:41 ` DM
2 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2007-11-07 18:27 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, pwil3058, linux-kernel
> On Wed, 07 Nov 2007 13:17:00 +0100 Peter Zijlstra <peterz@infradead.org> wrote:
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "sched_nr_migrate",
> + .data = &sysctl_sched_nr_migrate,
> + .maxlen = sizeof(unsigned int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
This (and all the other stuff in that table) should be described in
Documentation/, please.
It would be nice if sched_nr_migrate didn't exist, really. It's hard to
imagine anyone wanting to tweak it, apart from developers.
* Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
2007-11-07 18:27 ` Andrew Morton
@ 2007-11-07 18:58 ` Peter Zijlstra
2007-11-07 22:10 ` Steven Rostedt
1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2007-11-07 18:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: mingo, pwil3058, linux-kernel
On Wed, 2007-11-07 at 10:27 -0800, Andrew Morton wrote:
> > On Wed, 07 Nov 2007 13:17:00 +0100 Peter Zijlstra <peterz@infradead.org> wrote:
> > + {
> > + .ctl_name = CTL_UNNUMBERED,
> > + .procname = "sched_nr_migrate",
> > + .data = &sysctl_sched_nr_migrate,
> > + .maxlen = sizeof(unsigned int),
> > + .mode = 0644,
> > + .proc_handler = &proc_dointvec,
> > + },
>
> This (and all the other stuff in that table) should be described in
> Documentation/, please.
>
> It would be nice if sched_nr_migrate didn't exist, really. It's hard to
> imagine anyone wanting to tweak it, apart from developers.
Right, most of these SCHED_DEBUG sysctls are only interesting to
developers, although stuff like sched_latency might be interesting to
some users.
I'll try and write up a documentation thingy before .24 hits the street.
* Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
2007-11-07 18:27 ` Andrew Morton
2007-11-07 18:58 ` Peter Zijlstra
@ 2007-11-07 22:10 ` Steven Rostedt
2007-11-08 0:24 ` Eric St-Laurent
1 sibling, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2007-11-07 22:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: Peter Zijlstra, mingo, pwil3058, linux-kernel
On Wed, Nov 07, 2007 at 10:27:25AM -0800, Andrew Morton wrote:
> > On Wed, 07 Nov 2007 13:17:00 +0100 Peter Zijlstra <peterz@infradead.org> wrote:
> > + {
> > + .ctl_name = CTL_UNNUMBERED,
> > + .procname = "sched_nr_migrate",
> > + .data = &sysctl_sched_nr_migrate,
> > + .maxlen = sizeof(unsigned int),
> > + .mode = 0644,
> > + .proc_handler = &proc_dointvec,
> > + },
>
> This (and all the other stuff in that table) should be described in
> Documentation/, please.
>
> It would be nice if sched_nr_migrate didn't exist, really. It's hard to
> imagine anyone wanting to tweak it, apart from developers.
I'm not so sure about that. It is a tunable for RT; that is, we can tweak
this value to be smaller if we don't like the latencies it gives us.
This is one of those knobs that trades performance for latency: the higher
the number, the better the balancer can spread tasks around, but the larger
the irq-off latencies it causes.
I've just included this patch into 2.6.23.1-rt11 and it brought down an
unbounded latency to just 42us. (previously we got into the
milliseconds!).
Perhaps when this feature matures, we can come to a good defined value
that would be good for all. But until then, I recommend keeping this a
tunable.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
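As a tunable, the limit can then be adjusted at runtime through the sysctl the
patch adds; a sketch of such a tweak, assuming the entry lands under
/proc/sys/kernel/ as the kern_table hunk suggests (requires root):

```shell
# Read the current per-run migration limit (default 32 in the patch):
cat /proc/sys/kernel/sched_nr_migrate

# On a latency-sensitive box, trade some balancing throughput for
# shorter irq-disabled sections by lowering the limit:
echo 8 > /proc/sys/kernel/sched_nr_migrate
```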
2007-11-07 22:10 ` Steven Rostedt
@ 2007-11-08 0:24 ` Eric St-Laurent
0 siblings, 0 replies; 9+ messages in thread
From: Eric St-Laurent @ 2007-11-08 0:24 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andrew Morton, Peter Zijlstra, mingo, pwil3058, linux-kernel
On Wed, 2007-11-07 at 17:10 -0500, Steven Rostedt wrote:
> >
> > It would be nice if sched_nr_migrate didn't exist, really. It's hard to
> > imagine anyone wanting to tweak it, apart from developers.
>
> I'm not so sure about that. It is a tunable for RT. That is we can tweak
> this value to be smaller if we don't like the latencies it gives us.
>
> This is one of those things that sacrifices performance for latency.
> The higher the number, the better it can spread tasks around, but it
> also causes large latencies.
>
> I've just included this patch into 2.6.23.1-rt11 and it brought down an
> unbounded latency to just 42us. (previously we got into the
> milliseconds!).
>
> Perhaps when this feature matures, we can come to a good defined value
> that would be good for all. But until then, I recommend keeping this a
> tunable.
Why not use the latency-expectation infrastructure?
Iterate under the lock only until (or, better, just before) the system-wide
latency expectation would be exceeded.
- Eric
* Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
2007-11-07 12:17 ` Peter Zijlstra
2007-11-07 18:27 ` Andrew Morton
@ 2007-11-08 4:37 ` Gregory Haskins
2007-11-09 13:09 ` Nick Piggin
2007-11-09 12:41 ` DM
2 siblings, 1 reply; 9+ messages in thread
From: Gregory Haskins @ 2007-11-08 4:37 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, Peter Williams, linux-kernel
Peter Zijlstra wrote:
> Bah, missed a hunk
>
> ---
> Subject: sched: avoid large irq-latencies in smp-balancing
>
> SMP balancing is done with IRQs disabled and can iterate the full rq. When rqs
> are large this can cause large irq-latencies. Limit the nr of iterations on
> each run.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> CC: Peter Williams <pwil3058@bigpond.net.au>
Tested-by: Gregory Haskins <ghaskins@novell.com> (as part of 23.1-rt11)
> ---
> include/linux/sched.h | 1 +
> kernel/sched.c | 15 ++++++++++-----
> kernel/sysctl.c | 8 ++++++++
> 3 files changed, 19 insertions(+), 5 deletions(-)
>
> Index: linux-2.6-2/kernel/sched.c
> ===================================================================
> --- linux-2.6-2.orig/kernel/sched.c
> +++ linux-2.6-2/kernel/sched.c
> @@ -474,6 +474,12 @@ const_debug unsigned int sysctl_sched_fe
> #define sched_feat(x) (sysctl_sched_features & SCHED_FEAT_##x)
>
> /*
> + * Number of tasks to iterate in a single balance run.
> + * Limited because this is done with IRQs disabled.
> + */
> +const_debug unsigned int sysctl_sched_nr_migrate = 32;
> +
> +/*
> * For kernel-internal use: high-speed (but slightly incorrect) per-cpu
> * clock constructed from sched_clock():
> */
> @@ -2237,7 +2243,7 @@ balance_tasks(struct rq *this_rq, int th
> enum cpu_idle_type idle, int *all_pinned,
> int *this_best_prio, struct rq_iterator *iterator)
> {
> - int pulled = 0, pinned = 0, skip_for_load;
> + int loops = 0, pulled = 0, pinned = 0, skip_for_load;
> struct task_struct *p;
> long rem_load_move = max_load_move;
>
> @@ -2251,10 +2257,10 @@ balance_tasks(struct rq *this_rq, int th
> */
> p = iterator->start(iterator->arg);
> next:
> - if (!p)
> + if (!p || loops++ > sysctl_sched_nr_migrate)
> goto out;
> /*
> - * To help distribute high priority tasks accross CPUs we don't
> + * To help distribute high priority tasks across CPUs we don't
> * skip a task if it will be the highest priority task (i.e. smallest
> * prio value) on its new queue regardless of its load weight
> */
> @@ -2271,8 +2277,7 @@ next:
> rem_load_move -= p->se.load.weight;
>
> /*
> - * We only want to steal up to the prescribed number of tasks
> - * and the prescribed amount of weighted load.
> + * We only want to steal up to the prescribed amount of weighted load.
> */
> if (rem_load_move > 0) {
> if (p->prio < *this_best_prio)
> Index: linux-2.6-2/kernel/sysctl.c
> ===================================================================
> --- linux-2.6-2.orig/kernel/sysctl.c
> +++ linux-2.6-2/kernel/sysctl.c
> @@ -298,6 +298,14 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = &proc_dointvec,
> },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "sched_nr_migrate",
> + .data = &sysctl_sched_nr_migrate,
> + .maxlen = sizeof(unsigned int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> #endif
> {
> .ctl_name = CTL_UNNUMBERED,
> Index: linux-2.6-2/include/linux/sched.h
> ===================================================================
> --- linux-2.6-2.orig/include/linux/sched.h
> +++ linux-2.6-2/include/linux/sched.h
> @@ -1466,6 +1466,7 @@ extern unsigned int sysctl_sched_batch_w
> extern unsigned int sysctl_sched_child_runs_first;
> extern unsigned int sysctl_sched_features;
> extern unsigned int sysctl_sched_migration_cost;
> +extern unsigned int sysctl_sched_nr_migrate;
> #endif
>
> extern unsigned int sysctl_sched_compat_yield;
>
>
* Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
2007-11-08 4:37 ` Gregory Haskins
@ 2007-11-09 13:09 ` Nick Piggin
0 siblings, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2007-11-09 13:09 UTC (permalink / raw)
To: Gregory Haskins; +Cc: Peter Zijlstra, Ingo Molnar, Peter Williams, linux-kernel
On Thursday 08 November 2007 15:37, Gregory Haskins wrote:
> Peter Zijlstra wrote:
> > Bah, missed a hunk
> >
> > ---
> > Subject: sched: avoid large irq-latencies in smp-balancing
> >
> > SMP balancing is done with IRQs disabled and can iterate the full rq.
> > When rqs are large this can cause large irq-latencies. Limit the nr of
> > iterations on each run.
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > CC: Peter Williams <pwil3058@bigpond.net.au>
>
> Tested-by: Gregory Haskins <ghaskins@novell.com> (as part of 23.1-rt11)
OK, but let's not make this default for mainline just yet please. There
are so many performance regression possibilities in 2.6.24 already that
we should go a little bit slower for now IMO.
* Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
2007-11-07 12:17 ` Peter Zijlstra
2007-11-07 18:27 ` Andrew Morton
2007-11-08 4:37 ` Gregory Haskins
@ 2007-11-09 12:41 ` DM
2 siblings, 0 replies; 9+ messages in thread
From: DM @ 2007-11-09 12:41 UTC (permalink / raw)
To: linux-kernel
Peter Zijlstra <peterz <at> infradead.org> writes:
> @@ -2237,7 +2243,7 @@ balance_tasks(struct rq *this_rq, int th
> enum cpu_idle_type idle, int *all_pinned,
> int *this_best_prio, struct rq_iterator *iterator)
> {
> - int pulled = 0, pinned = 0, skip_for_load;
> + int loops = 0, pulled = 0, pinned = 0, skip_for_load;
> struct task_struct *p;
> long rem_load_move = max_load_move;
>
> @@ -2251,10 +2257,10 @@ balance_tasks(struct rq *this_rq, int th
> */
> p = iterator->start(iterator->arg);
> next:
> - if (!p)
> + if (!p || loops++ > sysctl_sched_nr_migrate)
> goto out;
Looks to me like an off-by-one thingy. If sysctl_sched_nr_migrate is zero we
can still migrate one task. Feature or bug?
/dm