* [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
@ 2010-09-14 20:25 Mathieu Desnoyers
  2010-09-15  8:37 ` Mike Galbraith
  2010-09-20 11:43 ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2010-09-14 20:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Mike Galbraith, Peter Zijlstra, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, Tony Lindgren

This patch tweaks the fair vruntime calculation of both the parent and the child
after a fork so that their vruntime increments at twice the normal speed, but
only for their first slice after the fork. Because both tasks burn through
their fair share faster during that slice, a workload doing many forks (e.g.
make -j10) has a limited impact on latency-sensitive workloads.

This is an alternative to START_DEBIT that avoids the downside of moving newly
forked threads to the end of the runqueue.
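
To see the intended effect in isolation, here is a stand-alone user-space model
(illustration only, not the kernel code; the variable names mirror the fields
added below):

#include <stdio.h>

int main(void)
{
	unsigned long long vruntime = 0, sum_exec = 0;
	unsigned long long fork_nice_timeout = 4000000;	/* one 4ms slice */
	unsigned int fork_nice_penality = 1;		/* shift 1 => 2x speed */
	unsigned long long tick = 1000000;		/* 1ms of runtime */
	int i;

	for (i = 0; i < 8; i++) {
		unsigned long long delta = tick;

		sum_exec += tick;
		if (fork_nice_penality) {
			/* penalized: vruntime advances twice as fast */
			delta <<= fork_nice_penality;
			if (sum_exec > fork_nice_timeout)
				fork_nice_penality = 0;	/* penalty expires */
		}
		vruntime += delta;
		printf("after %d ms of runtime: vruntime = %llu ns\n",
		       i + 1, vruntime);
	}
	return 0;
}

For roughly the first slice's worth of runtime the task's vruntime advances at
double speed, after which it returns to normal - exactly as if the task had
been reniced for a single slice.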

Changelog since v2:
- Apply the vruntime penalty even the first time the exec time moves across
  the timeout.

Changelog since v1:
- Moving away from modifying the task weight from within the scheduler, as it is
  error-prone: modifying the weight of a queued task leads to cpu weight errors.
  For the moment, just tweak calc_delta_fair vruntime calculation. Eventually we
  could revisit the weight modification approach if we decide that it's worth
  the more intrusive changes. I redid the START_NICE benchmark, which did not
  change much: it is still appealing.


Latency benchmark:

* wakeup-latency.c (SIGEV_THREAD) with make -j10 on UP 2.0GHz

Kernel used: mainline 2.6.35.2 with smaller min_granularity and check_preempt
vruntime vs runtime comparison patches applied.

- START_DEBIT (vanilla setting)

maximum latency: 26409.0 µs
average latency: 6762.1 µs
missed timer events: 0

- NO_START_DEBIT, NO_START_NICE

maximum latency: 10001.8 µs
average latency: 1618.7 µs
missed timer events: 0

- START_NICE

maximum latency: 8351.2 µs
average latency: 1597.7 µs
missed timer events: 0


On the Xorg interactivity side, I notice a major improvement with START_NICE
compared to the two other settings. I came up with a very simple, repeatable,
low-tech test that exercises both input and video update responsiveness:

Start make -j10 in a gnome-terminal. In another gnome-terminal, press and hold
the space bar. Use the cursor speed (my cursor is a full rectangle) as the
latency indicator: with low latency, its speed should be constant, with no
stalls and no sudden accelerations.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/sched.h   |    2 ++
 kernel/sched.c          |    2 ++
 kernel/sched_debug.c    |   11 ++++++++---
 kernel/sched_fair.c     |   34 +++++++++++++++++++++++++++++++++-
 kernel/sched_features.h |    6 ++++++
 5 files changed, 51 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng.git/kernel/sched_features.h
===================================================================
--- linux-2.6-lttng.git.orig/kernel/sched_features.h
+++ linux-2.6-lttng.git/kernel/sched_features.h
@@ -12,6 +12,12 @@ SCHED_FEAT(GENTLE_FAIR_SLEEPERS, 1)
 SCHED_FEAT(START_DEBIT, 1)
 
 /*
+ * After a fork, ensure both the parent and the child get niced for their
+ * following slice.
+ */
+SCHED_FEAT(START_NICE, 0)
+
+/*
  * Should wakeups try to preempt running tasks.
  */
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
Index: linux-2.6-lttng.git/include/linux/sched.h
===================================================================
--- linux-2.6-lttng.git.orig/include/linux/sched.h
+++ linux-2.6-lttng.git/include/linux/sched.h
@@ -1132,6 +1132,8 @@ struct sched_entity {
 	u64			prev_sum_exec_runtime;
 
 	u64			nr_migrations;
+	u64			fork_nice_timeout;
+	unsigned int		fork_nice_penality;
 
 #ifdef CONFIG_SCHEDSTATS
 	struct sched_statistics statistics;
Index: linux-2.6-lttng.git/kernel/sched.c
===================================================================
--- linux-2.6-lttng.git.orig/kernel/sched.c
+++ linux-2.6-lttng.git/kernel/sched.c
@@ -2421,6 +2421,8 @@ static void __sched_fork(struct task_str
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
+	p->se.fork_nice_timeout		= 0;
+	p->se.fork_nice_penality	= 0;
 
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
Index: linux-2.6-lttng.git/kernel/sched_fair.c
===================================================================
--- linux-2.6-lttng.git.orig/kernel/sched_fair.c
+++ linux-2.6-lttng.git/kernel/sched_fair.c
@@ -433,6 +433,14 @@ calc_delta_fair(unsigned long delta, str
 	if (unlikely(se->load.weight != NICE_0_LOAD))
 		delta = calc_delta_mine(delta, NICE_0_LOAD, &se->load);
 
+	if (se->fork_nice_penality) {
+		delta <<= se->fork_nice_penality;
+		if ((s64)(se->sum_exec_runtime - se->fork_nice_timeout) > 0) {
+			se->fork_nice_penality = 0;
+			se->fork_nice_timeout = 0;
+		}
+	}
+
 	return delta;
 }
 
@@ -832,6 +840,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 	 */
 	if (!(flags & DEQUEUE_SLEEP))
 		se->vruntime -= cfs_rq->min_vruntime;
+
+	if (se->fork_nice_penality) {
+		se->fork_nice_penality = 0;
+		se->fork_nice_timeout = 0;
+	}
 }
 
 /*
@@ -3544,8 +3557,27 @@ static void task_fork_fair(struct task_s
 
 	update_curr(cfs_rq);
 
-	if (curr)
+	if (curr) {
 		se->vruntime = curr->vruntime;
+		if (sched_feat(START_NICE)) {
+			if (curr->fork_nice_penality &&
+			    (s64)(curr->sum_exec_runtime
+				  - curr->fork_nice_timeout) > 0) {
+				curr->fork_nice_penality = 0;
+				curr->fork_nice_timeout = 0;
+			}
+
+			if (!curr->fork_nice_timeout)
+				curr->fork_nice_timeout =
+					curr->sum_exec_runtime;
+			curr->fork_nice_timeout += sched_slice(cfs_rq, curr);
+			curr->fork_nice_penality = min_t(unsigned int,
+							 curr->fork_nice_penality + 1, 8);
+			se->fork_nice_timeout = curr->fork_nice_timeout
+						- curr->sum_exec_runtime;
+			se->fork_nice_penality = curr->fork_nice_penality;
+		}
+	}
 	place_entity(cfs_rq, se, 1);
 
 	if (sysctl_sched_child_runs_first && curr && entity_before(curr, se)) {
Index: linux-2.6-lttng.git/kernel/sched_debug.c
===================================================================
--- linux-2.6-lttng.git.orig/kernel/sched_debug.c
+++ linux-2.6-lttng.git/kernel/sched_debug.c
@@ -120,6 +120,10 @@ print_task(struct seq_file *m, struct rq
 		SEQ_printf(m, " %s", path);
 	}
 #endif
+
+	SEQ_printf(m, " %d", p->se.fork_nice_penality);
+	SEQ_printf(m, " %9Ld.%06ld", SPLIT_NS(p->se.fork_nice_timeout));
+
 	SEQ_printf(m, "\n");
 }
 
@@ -131,9 +135,10 @@ static void print_rq(struct seq_file *m,
 	SEQ_printf(m,
 	"\nrunnable tasks:\n"
 	"            task   PID         tree-key  switches  prio"
-	"     exec-runtime         sum-exec        sum-sleep\n"
-	"------------------------------------------------------"
-	"----------------------------------------------------\n");
+	"     exec-runtime         sum-exec        sum-sleep    nice-pen"
+	"     nice-pen-timeout\n"
+	"---------------------------------------------------------------"
+	"---------------------------------------------------------------\n");
 
 	read_lock_irqsave(&tasklist_lock, flags);
 
-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-14 20:25 [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3) Mathieu Desnoyers
@ 2010-09-15  8:37 ` Mike Galbraith
  2010-09-15  9:03   ` Mike Galbraith
  2010-09-20 11:43 ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2010-09-15  8:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, Tony Lindgren

On Tue, 2010-09-14 at 16:25 -0400, Mathieu Desnoyers wrote:
> This patch tweaks the fair vruntime calculation of both the parent and the child
> after a fork so that their vruntime increments at twice the normal speed, but
> only for their first slice after the fork. Because both tasks burn through
> their fair share faster during that slice, a workload doing many forks (e.g.
> make -j10) has a limited impact on latency-sensitive workloads.
> 
> This is an alternative to START_DEBIT that avoids the downside of moving newly
> forked threads to the end of the runqueue.
> 
> Changelog since v2:
> - Apply the vruntime penalty even the first time the exec time moves across
>   the timeout.
> 
> Changelog since v1:
> - Moving away from modifying the task weight from within the scheduler, as it is
>   error-prone: modifying the weight of a queued task leads to cpu weight errors.
>   For the moment, just tweak calc_delta_fair vruntime calculation. Eventually we
>   could revisit the weight modification approach if we decide that it's worth
>   the more intrusive changes. I redid the START_NICE benchmark, which did not
>   change much: it is still appealing.

Some numbers from my Q6600 box, with the patch applied to v2.6.36-rc3-409-g3e6dce7.

pinned wakeup-latency competing with pinned make -j10
(taskset -c 3 ./wakeup-latency& sleep 30 && killall wakeup-latency)

STOCK (ouch ouch ouch!, excellent testcase)
maximum latency: 80558.8 µs     111838.0 µs      98196.6 µs
average latency:  9682.4 µs       9357.8 µs       9559.0 µs
missed timer events:   0               0               0

NO_START_DEBIT
maximum latency: 20015.5 µs      19247.8 µs      17204.2 µs  
average latency:  4923.0 µs       4985.2 µs       4879.5 µs
missed timer events:   0               0               0

START_NICE
maximum latency: 19185.1 µs      19988.8 µs      19492.3 µs
average latency: 4616.2 µs        4688.4 µs       4842.9 µs
missed timer events: 0                 0               0


x264 8 threads (ultrafast)
STOCK
     414.43     412.46     406.37 fps

NO_START_DEBIT
     425.13     426.70     427.01 fps

START_NICE
     423.80     424.87     419.00 fps


pert (100% hog perturbation measurement proggy) competing with make -j3, 10 sec sample interval

STOCK
pert/s:      131 >16763.81us:      239 min:  0.15 max:54498.30 avg:5702.18 sum/s:746986us overhead:74.70%
pert/s:      126 >18677.42us:      224 min:  0.13 max:65777.41 avg:6022.36 sum/s:761227us overhead:76.09%
pert/s:      127 >20308.15us:      215 min:  0.11 max:64017.28 avg:5952.13 sum/s:757706us overhead:75.71%
pert/s:      129 >21925.00us:      204 min:  0.41 max:67134.26 avg:5819.16 sum/s:752418us overhead:75.17% <== competition got

NO_START_DEBIT
pert/s:      113 >32566.80us:      140 min:  0.33 max:70870.38 avg:6937.03 sum/s:783885us overhead:78.39%
pert/s:      112 >33261.53us:      127 min:  0.35 max:72063.85 avg:6964.25 sum/s:784871us overhead:78.40%
pert/s:      111 >33372.61us:      112 min:  0.37 max:61722.37 avg:7022.35 sum/s:784397us overhead:78.44%
pert/s:      112 >34183.11us:      115 min:  0.38 max:76023.87 avg:6931.32 sum/s:781853us overhead:78.19%

START_NICE
pert/s:      153 >20082.61us:      181 min:  0.06 max:52532.34 avg:4563.81 sum/s:702371us overhead:70.24%
pert/s:      144 >21018.81us:      191 min:  0.14 max:53890.46 avg:4998.69 sum/s:722810us overhead:72.25%
pert/s:      145 >22969.69us:      185 min:  0.12 max:76028.37 avg:5026.94 sum/s:729912us overhead:72.69%
pert/s:      150 >23758.97us:      168 min:  0.12 max:56992.14 avg:4732.80 sum/s:709920us overhead:70.94%





* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-15  8:37 ` Mike Galbraith
@ 2010-09-15  9:03   ` Mike Galbraith
  2010-09-15  9:22     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2010-09-15  9:03 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, Tony Lindgren


Tangent: cgroups with classification user/pgid

taskset -c 3 ./wakeup-latency& sleep 30 && killall wakeup-latency
(taskset -c 3 make -j10 running)

maximum latency: 256.5 µs
average latency: 25.7 µs
missed timer events: 0

Rather spiffy for this case.

	-Mike


* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-15  9:03   ` Mike Galbraith
@ 2010-09-15  9:22     ` Ingo Molnar
  2010-09-15 13:12       ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2010-09-15  9:22 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Mathieu Desnoyers, LKML, Peter Zijlstra, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, Tony Lindgren


* Mike Galbraith <efault@gmx.de> wrote:

> Tangent: cgroups with classification user/pgid
>
> taskset -c 3 ./wakeup-latency& sleep 30 && killall wakeup-latency
> (taskset -c 3 make -j10 running)
> 
> maximum latency: 256.5 µs
> average latency: 25.7 µs
> missed timer events: 0
> 
> Rather spiffy for this case.

Ouch, nothing beats proper segregation of workloads.

It would be _really_ nice to get that automatically somehow, for make -j 
jobs. Perhaps a per tty auto-cgroup-classification kernel feature?

We cannot really rely on distros to pull this off.

Thanks,

	Ingo

* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-15  9:22     ` Ingo Molnar
@ 2010-09-15 13:12       ` Mike Galbraith
  2010-09-15 14:02         ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2010-09-15 13:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, LKML, Peter Zijlstra, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, Tony Lindgren

On Wed, 2010-09-15 at 11:22 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > Tangent: cgroups with classification user/pgid
> >
> > taskset -c 3 ./wakeup-latency& sleep 30 && killall wakeup-latency
> > (taskset -c 3 make -j10 running)
> > 
> > maximum latency: 256.5 µs
> > average latency: 25.7 µs
> > missed timer events: 0
> > 
> > Rather spiffy for this case.
> 
> Ouch, nothing beats proper segregation of workloads.
> 
> It would be _really_ nice to get that automatically somehow, for make -j 
> jobs. Perhaps a per tty auto-cgroup-classification kernel feature?
> 
> We cannot really rely on distros to pull this off.

Dunno, wasn't hard.

I'm no good at scripting, but got mine cobbled together in fairly short
order.  With it all in userland, I can turn pgid classification on/off
at the drop of a hat, and/or launch self reaping cgroups with shares set
ala nice level with another script.  Easy and flexible.
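
The userland side really is small; the core of a pgid classifier boils down to
something like this (sketch only, assuming a cpu-controller cgroup v1 hierarchy
mounted at /cgroups; the paths and names are illustrative, not my actual script):

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Move a task into a per-process-group cgroup by writing its pid
 * into the group's "tasks" file. */
static int classify(pid_t pid)
{
	char dir[64], tasks[80];
	FILE *f;

	snprintf(dir, sizeof(dir), "/cgroups/pgid-%d", getpgid(pid));
	mkdir(dir, 0755);		/* one group per process group */
	snprintf(tasks, sizeof(tasks), "%s/tasks", dir);
	f = fopen(tasks, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", pid);	/* task now scheduled in its group */
	fclose(f);
	return 0;
}

int main(void)
{
	return classify(getpid());
}

Shares "ala nice level" would then just be a value written into the group's
cpu.shares file.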

(I usually don't have cgroups enabled, cobbled pgid thing together to
see if userland could deal with it easily enough.. seems so, if little
ole /me can do it, anybody can)

	-Mike


* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-15 13:12       ` Mike Galbraith
@ 2010-09-15 14:02         ` Ingo Molnar
  2010-09-16 10:30           ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2010-09-15 14:02 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Mathieu Desnoyers, LKML, Peter Zijlstra, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, Tony Lindgren


* Mike Galbraith <efault@gmx.de> wrote:

> On Wed, 2010-09-15 at 11:22 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> > 
> > > Tangent: cgroups with classification user/pgid
> > >
> > > taskset -c 3 ./wakeup-latency& sleep 30 && killall wakeup-latency
> > > (taskset -c 3 make -j10 running)
> > > 
> > > maximum latency: 256.5 µs
> > > average latency: 25.7 µs
> > > missed timer events: 0
> > > 
> > > Rather spiffy for this case.
> > 
> > Ouch, nothing beats proper segregation of workloads.
> > 
> > It would be _really_ nice to get that automatically somehow, for make -j 
> > jobs. Perhaps a per tty auto-cgroup-classification kernel feature?
> > 
> > We cannot really rely on distros to pull this off.
> 
> Dunno, wasn't hard.

Renicing a make -j 10 isn't hard either - still people/apps don't do it
:-/

Thanks,

	Ingo

* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-15 14:02         ` Ingo Molnar
@ 2010-09-16 10:30           ` Mike Galbraith
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2010-09-16 10:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, LKML, Peter Zijlstra, Linus Torvalds,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, Tony Lindgren

On Wed, 2010-09-15 at 16:02 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Wed, 2010-09-15 at 11:22 +0200, Ingo Molnar wrote:
> > > * Mike Galbraith <efault@gmx.de> wrote:
> > > 
> > > > Tangent: cgroups with classification user/pgid
> > > >
> > > > taskset -c 3 ./wakeup-latency& sleep 30 && killall wakeup-latency
> > > > (taskset -c 3 make -j10 running)
> > > > 
> > > > maximum latency: 256.5 µs
> > > > average latency: 25.7 µs
> > > > missed timer events: 0
> > > > 
> > > > Rather spiffy for this case.
> > > 
> > > Ouch, nothing beats proper segregation of workloads.
> > > 
> > > It would be _really_ nice to get that automatically somehow, for make -j 
> > > jobs. Perhaps a per tty auto-cgroup-classification kernel feature?
> > > 
> > > We cannot really rely on distros to pull this off.
> > 
> > Dunno, wasn't hard.
> 
> Renicing a make -j 10 isn't hard either - still people/apps don't do it
> :-/

True.  I'll put resurrect/tinker usersched back on the good intentions list.

	-Mike


* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-14 20:25 [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3) Mathieu Desnoyers
  2010-09-15  8:37 ` Mike Galbraith
@ 2010-09-20 11:43 ` Peter Zijlstra
  2010-09-20 16:02   ` Mathieu Desnoyers
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2010-09-20 11:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, LKML, Mike Galbraith, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, Tony Lindgren

On Tue, 2010-09-14 at 16:25 -0400, Mathieu Desnoyers wrote:


> Index: linux-2.6-lttng.git/include/linux/sched.h
> ===================================================================
> --- linux-2.6-lttng.git.orig/include/linux/sched.h
> +++ linux-2.6-lttng.git/include/linux/sched.h
> @@ -1132,6 +1132,8 @@ struct sched_entity {
>  	u64			prev_sum_exec_runtime;
>  
>  	u64			nr_migrations;
> +	u64			fork_nice_timeout;
> +	unsigned int		fork_nice_penality;
>  
>  #ifdef CONFIG_SCHEDSTATS
>  	struct sched_statistics statistics;
> Index: linux-2.6-lttng.git/kernel/sched.c
> ===================================================================
> --- linux-2.6-lttng.git.orig/kernel/sched.c
> +++ linux-2.6-lttng.git/kernel/sched.c
> @@ -2421,6 +2421,8 @@ static void __sched_fork(struct task_str
>  	p->se.sum_exec_runtime		= 0;
>  	p->se.prev_sum_exec_runtime	= 0;
>  	p->se.nr_migrations		= 0;
> +	p->se.fork_nice_timeout		= 0;
> +	p->se.fork_nice_penality	= 0;
>  
>  #ifdef CONFIG_SCHEDSTATS
>  	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
> Index: linux-2.6-lttng.git/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6-lttng.git.orig/kernel/sched_fair.c
> +++ linux-2.6-lttng.git/kernel/sched_fair.c
> @@ -433,6 +433,14 @@ calc_delta_fair(unsigned long delta, str
>  	if (unlikely(se->load.weight != NICE_0_LOAD))
>  		delta = calc_delta_mine(delta, NICE_0_LOAD, &se->load);
>  
> +	if (se->fork_nice_penality) {
> +		delta <<= se->fork_nice_penality;
> +		if ((s64)(se->sum_exec_runtime - se->fork_nice_timeout) > 0) {
> +			se->fork_nice_penality = 0;
> +			se->fork_nice_timeout = 0;
> +		}
> +	}
> +
>  	return delta;
>  }

Something like this ought to live at every place where you use se->load,
including sched_slice(), possibly wakeup_gran(), although that's more
heuristic, so you could possibly leave it out there.
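
One way to do that is to expose the penalized weight through a helper (name
illustrative; uncompiled sketch against the v3 naming):

	/*
	 * Effective weight while the temporary fork penalty is active:
	 * doubling the vruntime speed per penalty level is equivalent to
	 * halving the weight per penalty level.
	 */
	static inline unsigned long fork_nice_weight(struct sched_entity *se)
	{
		return se->load.weight >> se->fork_nice_penality;
	}

sched_slice() could then use fork_nice_weight(se) in place of se->load.weight,
so a penalized task also receives a proportionally shorter wall-clock slice.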

> @@ -832,6 +840,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
>  	 */
>  	if (!(flags & DEQUEUE_SLEEP))
>  		se->vruntime -= cfs_rq->min_vruntime;
> +
> +	if (se->fork_nice_penality) {
> +		se->fork_nice_penality = 0;
> +		se->fork_nice_timeout = 0;
> +	}
>  }
>  
>  /*

So you want to reset this penalty on each de-schedule, not only sleep
(but also preemptions)?

> @@ -3544,8 +3557,27 @@ static void task_fork_fair(struct task_s
>  
>  	update_curr(cfs_rq);
>  
> -	if (curr)
> +	if (curr) {
>  		se->vruntime = curr->vruntime;
> +		if (sched_feat(START_NICE)) {
> +			if (curr->fork_nice_penality &&
> +			    (s64)(curr->sum_exec_runtime
> +				  - curr->fork_nice_timeout) > 0) {
> +				curr->fork_nice_penality = 0;
> +				curr->fork_nice_timeout = 0;
> +			}
> +
> +			if (!curr->fork_nice_timeout)
> +				curr->fork_nice_timeout =
> +					curr->sum_exec_runtime;
> +			curr->fork_nice_timeout += sched_slice(cfs_rq, curr);
> +			curr->fork_nice_penality = min_t(unsigned int,
> +							 curr->fork_nice_penality + 1, 8);
> +			se->fork_nice_timeout = curr->fork_nice_timeout
> +						- curr->sum_exec_runtime;
> +			se->fork_nice_penality = curr->fork_nice_penality;
> +		}
> +	}
>  	place_entity(cfs_rq, se, 1);
>  
>  	if (sysctl_sched_child_runs_first && curr && entity_before(curr, se)) {

If you stick that in a separate function you can lose 2 indent levels,
which would help with readability.

* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-20 11:43 ` Peter Zijlstra
@ 2010-09-20 16:02   ` Mathieu Desnoyers
  2010-09-20 16:15     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2010-09-20 16:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Mike Galbraith, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, Tony Lindgren

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Tue, 2010-09-14 at 16:25 -0400, Mathieu Desnoyers wrote:
> 
> 
> > Index: linux-2.6-lttng.git/include/linux/sched.h
> > ===================================================================
> > --- linux-2.6-lttng.git.orig/include/linux/sched.h
> > +++ linux-2.6-lttng.git/include/linux/sched.h
> > @@ -1132,6 +1132,8 @@ struct sched_entity {
> >  	u64			prev_sum_exec_runtime;
> >  
> >  	u64			nr_migrations;
> > +	u64			fork_nice_timeout;
> > +	unsigned int		fork_nice_penality;
> >  
> >  #ifdef CONFIG_SCHEDSTATS
> >  	struct sched_statistics statistics;
> > Index: linux-2.6-lttng.git/kernel/sched.c
> > ===================================================================
> > --- linux-2.6-lttng.git.orig/kernel/sched.c
> > +++ linux-2.6-lttng.git/kernel/sched.c
> > @@ -2421,6 +2421,8 @@ static void __sched_fork(struct task_str
> >  	p->se.sum_exec_runtime		= 0;
> >  	p->se.prev_sum_exec_runtime	= 0;
> >  	p->se.nr_migrations		= 0;
> > +	p->se.fork_nice_timeout		= 0;
> > +	p->se.fork_nice_penality	= 0;
> >  
> >  #ifdef CONFIG_SCHEDSTATS
> >  	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
> > Index: linux-2.6-lttng.git/kernel/sched_fair.c
> > ===================================================================
> > --- linux-2.6-lttng.git.orig/kernel/sched_fair.c
> > +++ linux-2.6-lttng.git/kernel/sched_fair.c
> > @@ -433,6 +433,14 @@ calc_delta_fair(unsigned long delta, str
> >  	if (unlikely(se->load.weight != NICE_0_LOAD))
> >  		delta = calc_delta_mine(delta, NICE_0_LOAD, &se->load);
> >  
> > +	if (se->fork_nice_penality) {
> > +		delta <<= se->fork_nice_penality;
> > +		if ((s64)(se->sum_exec_runtime - se->fork_nice_timeout) > 0) {
> > +			se->fork_nice_penality = 0;
> > +			se->fork_nice_timeout = 0;
> > +		}
> > +	}
> > +
> >  	return delta;
> >  }
> 
> Something like this ought to live at every place where you use se->load,
> including sched_slice(), possibly wakeup_gran(), although that's more
> heuristic, so you could possibly leave it out there.

Agreed for wakeup_gran(). I'll just remove the duplicate "if
(unlikely(se->load.weight != NICE_0_LOAD))" check.

For sched_slice(), I don't know. sched_vslice() is used to take nice level into
account when placing new tasks. sched_slice() takes only the weight into
account, not the nice level. So given that I want to mimic the nice level
impact, I'm not sure we have to take this into account at the sched_slice level.

Also, I wonder if leaving it out of the account_entity_enqueue/dequeue() calls
to add_cfs_task_weight() and inc/dec_cpu_load() is OK, because it can be a pain
to re-equilibrate the cpu and task weights when the timeout occurs. The
temporary effect of this nice-on-fork is to make the tasks a little lighter, so
the weight is not accurate. But I wonder if we really care that much about it.

> 
> > @@ -832,6 +840,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
> >  	 */
> >  	if (!(flags & DEQUEUE_SLEEP))
> >  		se->vruntime -= cfs_rq->min_vruntime;
> > +
> > +	if (se->fork_nice_penality) {
> > +		se->fork_nice_penality = 0;
> > +		se->fork_nice_timeout = 0;
> > +	}
> >  }
> >  
> >  /*
> 
> So you want to reset this penalty on each de-schedule, not only sleep
> (but also preemptions)?

only sleeps. So I should put this within a 

if (flags & DEQUEUE_SLEEP) {
  ...
}

I suppose?
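
Concretely, the tail of dequeue_entity() would then read something like this
(untested sketch, reusing the v3 reset code):

	if (!(flags & DEQUEUE_SLEEP))
		se->vruntime -= cfs_rq->min_vruntime;

	/*
	 * Reset the temporary fork penalty only when the task really
	 * sleeps; other dequeues (e.g. migration) keep it in place.
	 */
	if (flags & DEQUEUE_SLEEP) {
		se->fork_nice_penality = 0;
		se->fork_nice_timeout = 0;
	}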

> 
> > @@ -3544,8 +3557,27 @@ static void task_fork_fair(struct task_s
> >  
> >  	update_curr(cfs_rq);
> >  
> > -	if (curr)
> > +	if (curr) {
> >  		se->vruntime = curr->vruntime;
> > +		if (sched_feat(START_NICE)) {
> > +			if (curr->fork_nice_penality &&
> > +			    (s64)(curr->sum_exec_runtime
> > +				  - curr->fork_nice_timeout) > 0) {
> > +				curr->fork_nice_penality = 0;
> > +				curr->fork_nice_timeout = 0;
> > +			}
> > +
> > +			if (!curr->fork_nice_timeout)
> > +				curr->fork_nice_timeout =
> > +					curr->sum_exec_runtime;
> > +			curr->fork_nice_timeout += sched_slice(cfs_rq, curr);
> > +			curr->fork_nice_penality = min_t(unsigned int,
> > +							 curr->fork_nice_penality + 1, 8);
> > +			se->fork_nice_timeout = curr->fork_nice_timeout
> > +						- curr->sum_exec_runtime;
> > +			se->fork_nice_penality = curr->fork_nice_penality;
> > +		}
> > +	}
> >  	place_entity(cfs_rq, se, 1);
> >  
> >  	if (sysctl_sched_child_runs_first && curr && entity_before(curr, se)) {
> 
> If you stick that in a separate function you can lose 2 indent levels,
> which would help with readability.

Excellent point, will do! That will let me add more comments into the function
too.

Thanks a lot!

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-20 16:02   ` Mathieu Desnoyers
@ 2010-09-20 16:15     ` Peter Zijlstra
  2010-09-20 18:49       ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2010-09-20 16:15 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, LKML, Mike Galbraith, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, Tony Lindgren

On Mon, 2010-09-20 at 12:02 -0400, Mathieu Desnoyers wrote:
> > > Index: linux-2.6-lttng.git/kernel/sched_fair.c
> > > ===================================================================
> > > --- linux-2.6-lttng.git.orig/kernel/sched_fair.c
> > > +++ linux-2.6-lttng.git/kernel/sched_fair.c
> > > @@ -433,6 +433,14 @@ calc_delta_fair(unsigned long delta, str
> > >     if (unlikely(se->load.weight != NICE_0_LOAD))
> > >             delta = calc_delta_mine(delta, NICE_0_LOAD, &se->load);
> > >  
> > > +   if (se->fork_nice_penality) {
> > > +           delta <<= se->fork_nice_penality;
> > > +           if ((s64)(se->sum_exec_runtime - se->fork_nice_timeout) > 0) {
> > > +                   se->fork_nice_penality = 0;
> > > +                   se->fork_nice_timeout = 0;
> > > +           }
> > > +   }
> > > +
> > >     return delta;
> > >  }
> > 
> > Something like this ought to live at every place where you use se->load,
> > including sched_slice(), possibly wakeup_gran(), although that's more
> > heuristic, so you could possibly leave it out there.
> 
> Agreed for wakeup_gran(). I'll just remove the duplicate "if
> (unlikely(se->load.weight != NICE_0_LOAD))" check.
> 
> For sched_slice(), I don't know. sched_vslice() is used to take nice level into
> account when placing new tasks. sched_slice() takes only the weight into
> account, not the nice level. 

nice-level == weight

> So given that I want to mimic the nice level
> impact, I'm not sure we have to take this into account at the sched_slice level.

If you renice, we change the weight, hence you need to propagate this
penalty to every place we use the weight.

> Also, I wonder if leaving it out of the account_entity_enqueue/dequeue() calls
> to add_cfs_task_weight() and inc/dec_cpu_load() is OK, because it can be a pain
> to re-equilibrate the cpu and task weights when the timeout occurs. The
> temporary effect of this nice-on-fork is to make the tasks a little lighter, so
> the weight is not accurate. But I wonder if we really care that much about it.

Yeah, propagating the accumulated weight effect is a bit of a bother,
like you noticed.

We can simply try it: by lowering the effective weight and not propagating
this to the accumulated weight, the effect is even stronger. Suppose you
have 2 tasks of weight 1, and one then forks, so that parent and child each
run at half weight.

Then if you propagate the accumulated weight it would look like:
1:.5:.5 with a total weight of 2, so that each of these light tasks gets
1/4th of the time. If, however, you do not propagate, you get something
like: 1:.5:.5 on 3, so that each of these light tasks gets 1/6th of the
total time.
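
In numbers (stand-alone check, not kernel code):

#include <stdio.h>

int main(void)
{
	double heavy = 1.0, light = 0.5;	/* effective weights 1:.5:.5 */
	double sum_prop = heavy + light + light;	/* propagated: 2.0 */
	double sum_noprop = 3.0;	/* not propagated: still 3 full tasks */

	printf("propagated:     light task share = %.3f\n",
	       light / sum_prop);	/* 0.250 = 1/4 */
	printf("not propagated: light task share = %.3f\n",
	       light / sum_noprop);	/* 0.167 = 1/6 */
	return 0;
}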
 
It's a bit of a trade-off: not propagating is simpler, less code, slightly
wrong numbers; propagating is more complex/expensive but gives slightly
better numbers.

If you care you can implement both and measure it, but I'm not too
bothered -- we can always fix it if it turns out to have definite
down-sides.

> > > @@ -832,6 +840,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
> > >      */
> > >     if (!(flags & DEQUEUE_SLEEP))
> > >             se->vruntime -= cfs_rq->min_vruntime;
> > > +
> > > +   if (se->fork_nice_penality) {
> > > +           se->fork_nice_penality = 0;
> > > +           se->fork_nice_timeout = 0;
> > > +   }
> > >  }
> > >  
> > >  /*
> > 
> > So you want to reset this penalty on each de-schedule, not only sleep
> > (but also preemptions)?
> 
> only sleeps. So I should put this within a 
> 
> if (flags & DEQUEUE_SLEEP) {
>   ...
> }
> 
> I suppose?

Yep.

* Re: [RFC PATCH] sched: START_NICE feature (temporarily niced forks) (v3)
  2010-09-20 16:15     ` Peter Zijlstra
@ 2010-09-20 18:49       ` Mathieu Desnoyers
  0 siblings, 0 replies; 11+ messages in thread
From: Mathieu Desnoyers @ 2010-09-20 18:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Mike Galbraith, Linus Torvalds, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, Tony Lindgren

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Mon, 2010-09-20 at 12:02 -0400, Mathieu Desnoyers wrote:
> > > > Index: linux-2.6-lttng.git/kernel/sched_fair.c
> > > > ===================================================================
> > > > --- linux-2.6-lttng.git.orig/kernel/sched_fair.c
> > > > +++ linux-2.6-lttng.git/kernel/sched_fair.c
> > > > @@ -433,6 +433,14 @@ calc_delta_fair(unsigned long delta, str
> > > >     if (unlikely(se->load.weight != NICE_0_LOAD))
> > > >             delta = calc_delta_mine(delta, NICE_0_LOAD, &se->load);
> > > >  
> > > > +   if (se->fork_nice_penality) {
> > > > +           delta <<= se->fork_nice_penality;
> > > > +           if ((s64)(se->sum_exec_runtime - se->fork_nice_timeout) > 0) {
> > > > +                   se->fork_nice_penality = 0;
> > > > +                   se->fork_nice_timeout = 0;
> > > > +           }
> > > > +   }
> > > > +
> > > >     return delta;
> > > >  }
> > > 
> > > Something like this ought to live at every place where you use se->load,
> > > including sched_slice(), possibly wakeup_gran(), although that's more
> > > heuristic, so you could possibly leave it out there.
> > 
> > Agreed for wakeup_gran(). I'll just remove the duplicate "if
> > (unlikely(se->load.weight != NICE_0_LOAD))" check.
> > 
> > For sched_slice(), I don't know. sched_vslice() is used to take nice level into
> > account when placing new tasks. sched_slice() takes only the weight into
> > account, not the nice level. 
> 
> nice-level == weight
> 
> > So given that I want to mimic the nice level
> > impact, I'm not sure we have to take this into account at the sched_slice level.
> 
> If you renice, we change the weight, hence you need to propagate this
> penalty to every place we use the weight.

OK

> 
> > Also, I wonder if leaving it out of the account_entity_enqueue/dequeue() calls
> > to add_cfs_task_weight() and inc/dec_cpu_load() is OK, because it can be a pain
> > to re-equilibrate the cpu and task weights when the timeout occurs. The
> > temporary effect of this nice-on-fork is to make the tasks a little lighter, so
> > the weight is not accurate. But I wonder if we really care that much about it.
> 
> Yeah, propagating the accumulated weight effect is a bit of a bother,
> like you noticed.
> 
> We can simply try it: by lowering the effective weight and not propagating
> this to the accumulated weight, the effect is even stronger. Suppose you
> have 2 tasks of weight 1, and one then forks, so that parent and child each
> run at half weight.
> 
> Then if you propagate the accumulated weight it would look like:
> 1:.5:.5 with a total weight of 2, so that each of these light tasks gets
> 1/4th of the time. If, however, you do not propagate, you get something
> like: 1:.5:.5 on 3, so that each of these light tasks gets 1/6th of the
> total time.
> 
> It's a bit of a trade-off: not propagating is simpler, less code, slightly
> wrong numbers; propagating is more complex/expensive but gives slightly
> better numbers.
> 
> If you care you can implement both and measure it, but I'm not too
> bothered -- we can always fix it if it turns out to have definite
> down-sides.

Yeah, I think an approximation will be enough too. I'll keep my current
approach, which does not update the accumulated weight.

> 
> > > > @@ -832,6 +840,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
> > > >      */
> > > >     if (!(flags & DEQUEUE_SLEEP))
> > > >             se->vruntime -= cfs_rq->min_vruntime;
> > > > +
> > > > +   if (se->fork_nice_penality) {
> > > > +           se->fork_nice_penality = 0;
> > > > +           se->fork_nice_timeout = 0;
> > > > +   }
> > > >  }
> > > >  
> > > >  /*
> > > 
> > > So you want to reset this penalty on each de-schedule, not only sleep
> > > (but also preemptions)?
> > 
> > only sleeps. So I should put this within a 
> > 
> > if (flags & DEQUEUE_SLEEP) {
> >   ...
> > }
> > 
> > I suppose?
> 
> Yep.

OK, thanks!

I'll post v4 soon, incorporating the changes you recommended.

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
