public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/3] scheduler patches
From: Peter Zijlstra @ 2008-12-16  7:45 UTC
  To: mingo, efault; +Cc: linux-kernel, Peter Zijlstra

Ingo, please consider for .29
-- 



* [PATCH 1/3] From: Mike Galbraith <efault@gmx.de>
From: Peter Zijlstra @ 2008-12-16  7:45 UTC
  To: mingo, efault; +Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-fix-wakeup-clock.patch --]
[-- Type: text/plain, Size: 1500 bytes --]

From: Mike Galbraith <efault@gmx.de>

It was possible to do the preemption check against an old time stamp.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c      |    2 +-
 kernel/sched_fair.c |    7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index ce55b6a..efe5c6d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2259,6 +2259,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 
 	smp_wmb();
 	rq = task_rq_lock(p, &flags);
+	update_rq_clock(rq);
 	old_state = p->state;
 	if (!(old_state & state))
 		goto out;
@@ -2316,7 +2317,6 @@ out_activate:
 		schedstat_inc(p, se.nr_wakeups_local);
 	else
 		schedstat_inc(p, se.nr_wakeups_remote);
-	update_rq_clock(rq);
 	activate_task(rq, p, 1);
 	success = 1;
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 08ffffd..6ae5115 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1343,12 +1343,11 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_entity *se = &curr->se, *pse = &p->se;
+	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
 
-	if (unlikely(rt_prio(p->prio))) {
-		struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	update_curr(cfs_rq);
 
-		update_rq_clock(rq);
-		update_curr(cfs_rq);
+	if (unlikely(rt_prio(p->prio))) {
 		resched_task(curr);
 		return;
 	}
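
Roughly, the try_to_wake_up() flow around this change (a simplified
sketch, details elided; not the exact kernel code):

	rq = task_rq_lock(p, &flags);
	update_rq_clock(rq);			/* new: every path gets a fresh clock */
	...
	if (p->se.on_rq)
		goto out_running;		/* this path never updated the clock */
	...
out_activate:
	activate_task(rq, p, 1);		/* the old code updated the clock only here */
out_running:
	check_preempt_curr(rq, p, sync);	/* the preemption check */

Since check_preempt_wakeup() now runs update_curr() unconditionally, a
stale rq clock would feed a stale vruntime into the preemption decision;
updating the clock right after taking the runqueue lock closes that
window.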

-- 



* [PATCH 2/3] sched: optimize update_curr()
From: Peter Zijlstra @ 2008-12-16  7:45 UTC
  To: mingo, efault; +Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: sched-opt-update_curr_fair.patch --]
[-- Type: text/plain, Size: 640 bytes --]

Skip the hard work when there is none.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
---
 kernel/sched_fair.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -492,6 +492,8 @@ static void update_curr(struct cfs_rq *c
 	 * overflow on 32 bits):
 	 */
 	delta_exec = (unsigned long)(now - curr->exec_start);
+	if (!delta_exec)
+		return;
 
 	__update_curr(cfs_rq, curr, delta_exec);
 	curr->exec_start = now;
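
Why delta_exec can be zero here (a sketch, assuming the previous patch in
this series is applied): a single wakeup can now reach update_curr()
twice under one rq->clock reading, e.g.:

	update_rq_clock(rq);	/* rq->clock = T */
	update_curr(cfs_rq);	/* accounts T - exec_start, then exec_start = T */
	...			/* no clock update in between */
	update_curr(cfs_rq);	/* now - exec_start == 0, nothing to account */

The early return makes the second call a no-op instead of running the
weight-scaled arithmetic in __update_curr() on a zero delta.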

-- 



* [PATCH 3/3] sched: prefer wakers
From: Peter Zijlstra @ 2008-12-16  7:45 UTC
  To: mingo, efault; +Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: wakeup-preempt-fiddle.patch --]
[-- Type: text/plain, Size: 7006 bytes --]

Prefer tasks that wake other tasks by letting them preempt quickly. This
improves performance because more work becomes available sooner.

The workload that prompted this patch was a kernel build over NFS4 (for
some curious, as yet not understood reason we had to revert commit
18de9735300756e3ca9c361ef58409d8561dfe0d to make any progress at all).

Without this patch a make -j8 bzImage (of an x86-64 defconfig) took
3m30-ish; with this patch we're down to 2m50-ish.

psql-sysbench and mysql-sysbench show a slight improvement in peak
performance as well; tbench and vmark seemed not to care.

It is possible to improve the build time further (to 2m20-ish), but that
seriously hurts other benchmarks, which just shows there's more room for
tinkering.

Many thanks to Mike, who put in a lot of effort benchmarking and proved a
worthy opponent with a competing patch.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 include/linux/sched.h   |    2 +
 kernel/sched.c          |   30 +++++++++++++++++++++---
 kernel/sched_debug.c    |    1 +
 kernel/sched_fair.c     |   59 +++++++++++++++++++++++++++++++++++++++++++-----
 kernel/sched_features.h |    3 +-
 5 files changed, 84 insertions(+), 11 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1690,17 +1690,35 @@ static void update_avg(u64 *avg, u64 sam
 
 static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 {
+	if (wakeup)
+		p->se.start_runtime = p->se.sum_exec_runtime;
+
 	sched_info_queued(p);
 	p->sched_class->enqueue_task(rq, p, wakeup);
 	p->se.on_rq = 1;
 }
 
+static void update_avg_wakeup(struct sched_entity *se)
+{
+	u64 sample = se->sum_exec_runtime;
+	if (se->last_wakeup)
+		sample -= se->last_wakeup;
+	else
+		sample -= se->start_runtime;
+	update_avg(&se->avg_wakeup, sample);
+}
+
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
-	if (sleep && p->se.last_wakeup) {
-		update_avg(&p->se.avg_overlap,
-			   p->se.sum_exec_runtime - p->se.last_wakeup);
-		p->se.last_wakeup = 0;
+	if (sleep) {
+		if (p->se.last_wakeup) {
+			update_avg(&p->se.avg_overlap,
+				p->se.sum_exec_runtime - p->se.last_wakeup);
+			p->se.last_wakeup = 0;
+		} else {
+			update_avg(&p->se.avg_wakeup,
+				sysctl_sched_wakeup_granularity);
+		}
 	}
 
 	sched_info_dequeued(p);
@@ -2327,6 +2345,8 @@ out_activate:
 		schedstat_inc(p, se.nr_wakeups_remote);
 	activate_task(rq, p, 1);
 	success = 1;
+	if (!in_interrupt())
+		update_avg_wakeup(&current->se);
 
 out_running:
 	trace_sched_wakeup(rq, p);
@@ -2369,6 +2389,8 @@ static void __sched_fork(struct task_str
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.last_wakeup		= 0;
 	p->se.avg_overlap		= 0;
+	p->se.start_runtime		= 0;
+	p->se.avg_wakeup		= sysctl_sched_wakeup_granularity;
 
 #ifdef CONFIG_SCHEDSTATS
 	p->se.wait_start		= 0;
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1283,16 +1283,63 @@ out:
 }
 #endif /* CONFIG_SMP */
 
-static unsigned long wakeup_gran(struct sched_entity *se)
+/*
+ * Adaptive granularity
+ *
+ * se->avg_wakeup gives the average time a task runs until it does a wakeup,
+ * defaulting to wakeup_gran for a task that never does a wakeup.
+ *
+ * So the smaller avg_wakeup is, the faster we want this task to preempt,
+ * but we don't want to treat the preemptee unfairly and therefore allow it
+ * to run for at least the amount of time we'd like to run.
+ *
+ * NOTE: we use 2*avg_wakeup to increase the odds of a wakeup actually occurring.
+ *
+ * NOTE: we use *nr_running to scale with load; this nicely matches the
+ *       latency degradation under load.
+ */
+static unsigned long
+adaptive_gran(struct sched_entity *curr, struct sched_entity *se)
+{
+	u64 this_run = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+	u64 expected_wakeup = 2*se->avg_wakeup * cfs_rq_of(se)->nr_running;
+	u64 gran = 0;
+
+	if (this_run < expected_wakeup)
+		gran = expected_wakeup - this_run;
+
+	return min_t(s64, gran, sysctl_sched_wakeup_granularity);
+}
+
+static unsigned long
+wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
 {
 	unsigned long gran = sysctl_sched_wakeup_granularity;
 
+	if (cfs_rq_of(curr)->curr && sched_feat(ADAPTIVE_GRAN))
+		gran = adaptive_gran(curr, se);
+
 	/*
-	 * More easily preempt - nice tasks, while not making it harder for
-	 * + nice tasks.
+	 * Since it's curr that is running now, convert the gran from
+	 * real-time to virtual-time in its units.
 	 */
-	if (!sched_feat(ASYM_GRAN) || se->load.weight > NICE_0_LOAD)
-		gran = calc_delta_fair(sysctl_sched_wakeup_granularity, se);
+	if (sched_feat(ASYM_GRAN)) {
+		/*
+		 * By using 'se' instead of 'curr' we penalize light tasks, so
+		 * they get preempted more easily. That is, if 'se' < 'curr'
+		 * then the resulting gran will be larger, therefore penalizing
+		 * the lighter; if OTOH 'se' > 'curr' then the resulting gran
+		 * will be smaller, again penalizing the lighter task.
+		 *
+		 * This is especially important for buddies when the leftmost
+		 * task is higher priority than the buddy.
+		 */
+		if (unlikely(se->load.weight != NICE_0_LOAD))
+			gran = calc_delta_fair(gran, se);
+	} else {
+		if (unlikely(curr->load.weight != NICE_0_LOAD))
+			gran = calc_delta_fair(gran, curr);
+	}
 
 	return gran;
 }
@@ -1319,7 +1366,7 @@ wakeup_preempt_entity(struct sched_entit
 	if (vdiff <= 0)
 		return -1;
 
-	gran = wakeup_gran(curr);
+	gran = wakeup_gran(curr, se);
 	if (vdiff > gran)
 		return 1;
 
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1024,6 +1024,8 @@ struct sched_entity {
 
 	u64			last_wakeup;
 	u64			avg_overlap;
+	u64			start_runtime;
+	u64			avg_wakeup;
 
 #ifdef CONFIG_SCHEDSTATS
 	u64			wait_start;
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -1,5 +1,5 @@
 SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)
-SCHED_FEAT(NORMALIZED_SLEEPER, 1)
+SCHED_FEAT(NORMALIZED_SLEEPER, 0)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(AFFINE_WAKEUPS, 1)
@@ -13,3 +13,4 @@ SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
 SCHED_FEAT(ASYM_EFF_LOAD, 1)
 SCHED_FEAT(WAKEUP_OVERLAP, 0)
 SCHED_FEAT(LAST_BUDDY, 1)
+SCHED_FEAT(ADAPTIVE_GRAN, 1)
Index: linux-2.6/kernel/sched_debug.c
===================================================================
--- linux-2.6.orig/kernel/sched_debug.c
+++ linux-2.6/kernel/sched_debug.c
@@ -384,6 +384,7 @@ void proc_sched_show_task(struct task_st
 	PN(se.vruntime);
 	PN(se.sum_exec_runtime);
 	PN(se.avg_overlap);
+	PN(se.avg_wakeup);
 
 	nr_switches = p->nvcsw + p->nivcsw;
 

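To make the adaptive granularity concrete, a worked example with
hypothetical numbers (units in ms for readability; the kernel computes
in ns):

	/* the wakee wakes others roughly every 1ms; 4 tasks on the runqueue */
	avg_wakeup      = 1;	/* se->avg_wakeup of the freshly woken task */
	nr_running      = 4;
	this_run        = 3;	/* curr's runtime since it got the cpu */
	sysctl_gran     = 10;	/* sysctl_sched_wakeup_granularity */

	expected_wakeup = 2 * avg_wakeup * nr_running;	/* = 8 */
	gran = expected_wakeup > this_run
	     ? expected_wakeup - this_run : 0;		/* = 5 */
	gran = min(gran, sysctl_gran);			/* = 5, below the 10ms default */

So the woken task still needs a 5ms (virtual-time scaled) lead to preempt
right now; once curr has run the full 8ms expected window, gran drops to
0 and any positive vruntime difference preempts. A frequent waker (small
avg_wakeup) thus earns a smaller preemption threshold, while a task that
never wakes anyone keeps the full sysctl granularity.
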
-- 



* Re: [PATCH 1/3] From: Mike Galbraith <efault@gmx.de>
From: Peter Zijlstra @ 2008-12-16  7:54 UTC
  To: mingo; +Cc: efault, linux-kernel

I suck!

Subject: sched: fix wakeup preemption clock

On Tue, 2008-12-16 at 08:45 +0100, Peter Zijlstra wrote:
> plain text document attachment (sched-fix-wakeup-clock.patch)
> From: Mike Galbraith <efault@gmx.de>
> 
> It was possible to do the preemption check against an old time stamp.
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/sched.c      |    2 +-
>  kernel/sched_fair.c |    7 +++----
>  2 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index ce55b6a..efe5c6d 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2259,6 +2259,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
>  
>  	smp_wmb();
>  	rq = task_rq_lock(p, &flags);
> +	update_rq_clock(rq);
>  	old_state = p->state;
>  	if (!(old_state & state))
>  		goto out;
> @@ -2316,7 +2317,6 @@ out_activate:
>  		schedstat_inc(p, se.nr_wakeups_local);
>  	else
>  		schedstat_inc(p, se.nr_wakeups_remote);
> -	update_rq_clock(rq);
>  	activate_task(rq, p, 1);
>  	success = 1;
>  
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 08ffffd..6ae5115 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1343,12 +1343,11 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
>  {
>  	struct task_struct *curr = rq->curr;
>  	struct sched_entity *se = &curr->se, *pse = &p->se;
> +	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
>  
> -	if (unlikely(rt_prio(p->prio))) {
> -		struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> +	update_curr(cfs_rq);
>  
> -		update_rq_clock(rq);
> -		update_curr(cfs_rq);
> +	if (unlikely(rt_prio(p->prio))) {
>  		resched_task(curr);
>  		return;
>  	}
> 



* Re: [PATCH 3/3] sched: prefer wakers
From: Mike Galbraith @ 2008-12-16  9:05 UTC
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

On Tue, 2008-12-16 at 08:45 +0100, Peter Zijlstra wrote:
> plain text document attachment (wakeup-preempt-fiddle.patch)
> Prefer tasks that wake other tasks by letting them preempt quickly. This
> improves performance because more work becomes available sooner.
> 
> The workload that prompted this patch was a kernel build over NFS4 (for
> some curious, as yet not understood reason we had to revert commit
> 18de9735300756e3ca9c361ef58409d8561dfe0d to make any progress at all).
> 
> Without this patch a make -j8 bzImage (of an x86-64 defconfig) took
> 3m30-ish; with this patch we're down to 2m50-ish.

Here, with a wimpy 100Mbps NIC and my normal config, the improvement is
considerably larger: a full make -j8 build went from ~6m17s to ~4m28s.

	-Mike


