* [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems
@ 2008-04-23 22:07 Peter Zijlstra
2008-04-23 22:09 ` [RFC][PATCH 2/2] sched: aggregate_group_shares no loop Peter Zijlstra
2008-04-24 0:27 ` [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems David Miller
0 siblings, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2008-04-23 22:07 UTC (permalink / raw)
To: Ingo Molnar, Dhaval Giani, Srivatsa Vaddagiri, Dmitry Adamushko
Cc: linux-kernel, David Miller, Mike Galbraith
Hi
The below is an RFC because for some reason it regresses kbuild by 5% on
my machine (and by more on the LargeSMP machines that are the reason for it).
I'm failing to see how adding a few shifts can cause this.
---
Subject: sched: higher granularity load on 64bit systems
Group scheduling stretches the 10-bit fixed point arithmetic in two ways:
1) shares - the fraction of a group's weight
2) group load - the recursive fraction of load
Especially on LargeSMP, 1) is a big problem, since a group with load 1024
can easily run into numerical trouble on a 128 CPU machine.
Increase the fixed point fraction to 20 bits on 64-bit machines (as LargeSMP
is hardly available on 32-bit).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/sched.h | 5 +++++
kernel/sched.c | 28 ++++++++++++++++++++--------
kernel/sched_fair.c | 2 +-
3 files changed, 26 insertions(+), 9 deletions(-)
Index: linux-2.6-2/include/linux/sched.h
===================================================================
--- linux-2.6-2.orig/include/linux/sched.h
+++ linux-2.6-2/include/linux/sched.h
@@ -686,7 +686,12 @@ enum cpu_idle_type {
/*
* Increase resolution of nice-level calculations:
*/
+#if BITS_PER_LONG == 64
+#define SCHED_LOAD_SHIFT 20
+#else
#define SCHED_LOAD_SHIFT 10
+#endif
+
#define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)
#define SCHED_LOAD_SCALE_FUZZ SCHED_LOAD_SCALE
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -1416,6 +1416,15 @@ static void __resched_task(struct task_s
}
#endif
+/*
+ * We keep the prio_to_weight and its inverse in base WEIGHT_SHIFT
+ */
+#define WEIGHT_SHIFT 10
+#define WEIGHT_LOAD_SHIFT (SCHED_LOAD_SHIFT - WEIGHT_SHIFT)
+
+#define WLS(x) ((x) << WEIGHT_LOAD_SHIFT)
+#define inv_WLS(x) ((x) >> WEIGHT_LOAD_SHIFT)
+
#if BITS_PER_LONG == 32
# define WMULT_CONST (~0UL)
#else
@@ -1438,10 +1447,13 @@ calc_delta_mine(unsigned long delta_exec
{
u64 tmp;
- if (unlikely(!lw->inv_weight))
- lw->inv_weight = (WMULT_CONST-lw->weight/2) / (lw->weight+1);
+ if (unlikely(!lw->inv_weight)) {
+ unsigned long inv_wls = inv_WLS(lw->weight);
+
+ lw->inv_weight = 1 + (WMULT_CONST-inv_wls/2) / (inv_wls+1);
+ }
- tmp = (u64)delta_exec * weight;
+ tmp = inv_WLS((u64)delta_exec * weight);
/*
* Check whether we'd overflow the 64-bit multiplication:
*/
@@ -1960,7 +1972,7 @@ static void dec_nr_running(struct rq *rq
static void set_load_weight(struct task_struct *p)
{
if (task_has_rt_policy(p)) {
- p->se.load.weight = prio_to_weight[0] * 2;
+ p->se.load.weight = WLS(prio_to_weight[0] * 2);
p->se.load.inv_weight = prio_to_wmult[0] >> 1;
return;
}
@@ -1969,12 +1981,12 @@ static void set_load_weight(struct task_
* SCHED_IDLE tasks get minimal weight:
*/
if (p->policy == SCHED_IDLE) {
- p->se.load.weight = WEIGHT_IDLEPRIO;
+ p->se.load.weight = WLS(WEIGHT_IDLEPRIO);
p->se.load.inv_weight = WMULT_IDLEPRIO;
return;
}
- p->se.load.weight = prio_to_weight[p->static_prio - MAX_RT_PRIO];
+ p->se.load.weight = WLS(prio_to_weight[p->static_prio - MAX_RT_PRIO]);
p->se.load.inv_weight = prio_to_wmult[p->static_prio - MAX_RT_PRIO];
}
@@ -8072,7 +8084,7 @@ static void init_tg_cfs_entry(struct tas
se->my_q = cfs_rq;
se->load.weight = tg->shares;
- se->load.inv_weight = div64_64(1ULL<<32, se->load.weight);
+ se->load.inv_weight = 0;
se->parent = parent;
}
#endif
@@ -8739,7 +8751,7 @@ static void __set_se_shares(struct sched
dequeue_entity(cfs_rq, se, 0);
se->load.weight = shares;
- se->load.inv_weight = div64_64((1ULL<<32), shares);
+ se->load.inv_weight = 0;
if (on_rq)
enqueue_entity(cfs_rq, se, 0);
Index: linux-2.6-2/kernel/sched_fair.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_fair.c
+++ linux-2.6-2/kernel/sched_fair.c
@@ -424,7 +424,7 @@ calc_delta_asym(unsigned long delta, str
{
struct load_weight lw = {
.weight = NICE_0_LOAD,
- .inv_weight = 1UL << (WMULT_SHIFT-NICE_0_SHIFT)
+ .inv_weight = 1UL << (WMULT_SHIFT-WEIGHT_SHIFT),
};
for_each_sched_entity(se) {
* [RFC][PATCH 2/2] sched: aggregate_group_shares no loop
2008-04-23 22:07 [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems Peter Zijlstra
@ 2008-04-23 22:09 ` Peter Zijlstra
2008-04-24 0:27 ` [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems David Miller
1 sibling, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2008-04-23 22:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Dhaval Giani, Srivatsa Vaddagiri, Dmitry Adamushko, linux-kernel,
David Miller, Mike Galbraith
Subject: sched: aggregate_group_shares no loop
Remove the chance of getting trapped in the again: loop - instead of
retrying indefinitely via goto, redistribute and re-sum the shares once.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -1713,7 +1713,6 @@ void aggregate_group_shares(struct task_
unsigned long shares = 0;
int i;
-again:
for_each_cpu_mask(i, sd->span)
shares += tg->cfs_rq[i]->shares;
@@ -1723,7 +1722,9 @@ again:
*/
if (unlikely(!shares && aggregate(tg, sd)->rq_weight)) {
__aggregate_redistribute_shares(tg);
- goto again;
+
+ for_each_cpu_mask(i, sd->span)
+ shares += tg->cfs_rq[i]->shares;
}
aggregate(tg, sd)->shares = shares;
* Re: [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems
2008-04-23 22:07 [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems Peter Zijlstra
2008-04-23 22:09 ` [RFC][PATCH 2/2] sched: aggregate_group_shares no loop Peter Zijlstra
@ 2008-04-24 0:27 ` David Miller
2008-04-24 1:58 ` Dhaval Giani
2008-04-24 6:47 ` Peter Zijlstra
1 sibling, 2 replies; 6+ messages in thread
From: David Miller @ 2008-04-24 0:27 UTC (permalink / raw)
To: a.p.zijlstra; +Cc: mingo, dhaval, vatsa, dmitry.adamushko, linux-kernel, efault
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Thu, 24 Apr 2008 00:07:56 +0200
> The below is an RFC because for some reason it regresses kbuild by 5% on
> my machine (and more on the largesmp that are the reason for it).
This causes my 64-cpu Niagara2 box to completely hang when I run "make
clean" on a kernel tree after a fresh bootup.
Can we just revert all of this broken code until it's sorted out? :-/
We're going on 4 days with unfixed major regressions from the
scheduler tree merge, and these regressions make systems unusable.
This is blocking my own work, and I'm starting to lose my patience.
* Re: [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems
2008-04-24 0:27 ` [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems David Miller
@ 2008-04-24 1:58 ` Dhaval Giani
2008-04-24 2:13 ` David Miller
2008-04-24 6:47 ` Peter Zijlstra
1 sibling, 1 reply; 6+ messages in thread
From: Dhaval Giani @ 2008-04-24 1:58 UTC (permalink / raw)
To: David Miller; +Cc: a.p.zijlstra, mingo, vatsa, dmitry.adamushko, efault, lkml
On Wed, Apr 23, 2008 at 05:27:10PM -0700, David Miller wrote:
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Thu, 24 Apr 2008 00:07:56 +0200
>
> > The below is an RFC because for some reason it regresses kbuild by 5% on
> > my machine (and more on the largesmp that are the reason for it).
>
> This causes my 64-cpu Niagara2 box to completely hang when I run "make
> clean" on a kernel tree after a fresh bootup.
>
Any traces? Or anything? It stayed pretty stable on the 128-way I was
trying it on (but for the performance regression), and I was able to
reproduce your group scheduler hang (or so I think; it might be a
separate one altogether).
> Can we just revert all of this broken code until it's sorted out? :-/
>
> We're going on 4 days with unfixed major regressions from the
> scheduler tree merge, and these regressions make systems unusable.
>
> This is blocking my own work, and I'm starting to lose my patience.
--
regards,
Dhaval
* Re: [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems
2008-04-24 1:58 ` Dhaval Giani
@ 2008-04-24 2:13 ` David Miller
0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2008-04-24 2:13 UTC (permalink / raw)
To: dhaval; +Cc: a.p.zijlstra, mingo, vatsa, dmitry.adamushko, efault,
linux-kernel
From: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Date: Thu, 24 Apr 2008 07:28:14 +0530
> and I was able to reproduce your group scheduler hang (or so i
> think, it might a separate one altogether.)
I think it's likely the same problem as the one I'm seeing here.
* Re: [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems
2008-04-24 0:27 ` [RFC][PATCH 1/2] sched: higher granularity load on 64bit systems David Miller
2008-04-24 1:58 ` Dhaval Giani
@ 2008-04-24 6:47 ` Peter Zijlstra
1 sibling, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2008-04-24 6:47 UTC (permalink / raw)
To: David Miller; +Cc: mingo, dhaval, vatsa, dmitry.adamushko, linux-kernel, efault
On Wed, 2008-04-23 at 17:27 -0700, David Miller wrote:
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Thu, 24 Apr 2008 00:07:56 +0200
>
> > The below is an RFC because for some reason it regresses kbuild by 5% on
> > my machine (and more on the largesmp that are the reason for it).
>
> This causes my 64-cpu Niagara2 box to completely hang when I run "make
> clean" on a kernel tree after a fresh bootup.
>
> Can we just revert all of this broken code until it's sorted out? :-/
>
> We're going on 4 days with unfixed major regressions from the
> scheduler tree merge, and these regressions make systems unusable.
>
> This is blocking my own work, and I'm starting to lose my patience.
Yes, this is all rather embarrassing - if you want a revert of the
offending patches I can spin you one, and if this isn't solved quickly
I'm afraid we'll indeed have to revert in -linus :-/
Sorry about this.