public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [announce] CFS-devel, performance improvements
@ 2007-09-11 20:04 Ingo Molnar
  2007-09-12  0:42 ` Roman Zippel
                   ` (3 more replies)
  0 siblings, 4 replies; 49+ messages in thread
From: Ingo Molnar @ 2007-09-11 20:04 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Mike Galbraith, Roman Zippel


fresh back from the Kernel Summit, Peter Zijlstra and me are pleased to 
announce the latest iteration of the CFS scheduler development tree. Our 
main focus has been on simplifications and performance - and as part of 
that we've also picked up some ideas from Roman Zippel's 'Really Fair 
Scheduler' patch as well and integrated them into CFS. We'd like to ask 
people go give these patches a good workout, especially with an eye on 
any interactivity regressions.

The combo patch against 2.6.23-rc6 can be picked up from:

  http://people.redhat.com/mingo/cfs-scheduler/devel/

The sched-devel.git tree can be pulled from:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

There are lots of small performance improvements in form of a 
finegrained 29-patch series. We have removed a number of features and 
metrics from CFS that might have been needed but ended up being 
superfluous - while keeping the things that worked out fine, like 
sleeper fairness. On 32-bit x86 there's a ~16% speedup (over -rc6) in 
lmbench (lat_ctx -s 0 2) results:

                                  (microseconds, lower is better)
     ------------------------------------------------------------
        v2.6.22    2.6.23-rc6(CFS)     v2.6.23-rc6-CFS-devel
     ----------------------------------------------------
           0.70          0.75                0.65
           0.62          0.66                0.63
           0.60          0.72                0.69
           0.62          0.74                0.61
           0.69          0.73                0.53
           0.66          0.73                0.63
           0.63          0.69                0.61
           0.63          0.70                0.64
           0.61          0.76                0.61
           0.69          0.74                0.63
     ----------------------------------------------------
      avg: 0.64          0.72 (+12%)         0.62 (-3%)

there is a similar speedup on 64-bit x86 as well. We are now a bit 
faster than the O(1) scheduler was under v2.6.22 - even on 32-bit. The 
main speedup comes from the avoidance of divisions (or shifts) in the 
wakeup and context-switch fastpaths.

there's also a visible reduction in code size:

   text    data     bss     dec     hex filename
  13369     228    2036   15633    3d11 sched.o.before  (UP, nodebug)
  11167     224    1988   13379    3443 sched.o.after   (UP, nodebug)

which obviously helps embedded and is good for performance as well. Even 
on 32-bit we are now within 1% of the size of v2.6.22's sched.o, which 
was:

   text    data     bss     dec     hex filename
   9915      24    3344   13283    33e3 sched.o.v2.6.22

and on SMP the new scheduler is now substantially smaller:

   text    data     bss     dec     hex filename
  24972    4149      24   29145    71d9 sched.o-v2.6.22
  24056    2594      16   26666    682a sched.o-CFS-devel

Changes: besides the many micro-optimizations, one of the changes is 
that se->vruntime (virtual runtime) based scheduling has been introduced 
gradually, step by step - while keeping the wait_runtime metric working 
too. (so that the two methods are comparable side by side, in the same 
scheduler)

The ->vruntime metric is similar to the ->time_norm metric used by 
Roman's patch (and both are losely related to the already existing 
sum_exec_runtime metric in CFS), it's in essence the sum of CPU time 
executed by a task, in nanoseconds - weighted up or down by their nice 
level (or kept the same on the default nice 0 level). Besides this basic 
metric our implementation and math differs from RFS. The two approaches 
should be conceptually more comparable from now on.

We have also picked up two cleanups from RFS (the cfs_rq->curr approach 
and an uninlining optimization) and there's also a cleanup patch from 
Matthias Kaehlcke. We welcome and encourage finegrained patches against 
this patchset. As usual, bugreports, fixes and suggestions are welcome,

	Ingo, Peter

------------------>
Matthias Kaehlcke (1):
      sched: use list_for_each_entry_safe() in __wake_up_common()

Peter Zijlstra (5):
      sched: simplify SCHED_FEAT_* code
      sched: new task placement for vruntime
      sched: simplify adaptive latency
      sched: clean up new task placement
      sched: add tree based averages

Ingo Molnar (23):
      sched: fix new-task method
      sched: small sched_debug cleanup
      sched: debug: track maximum 'slice'
      sched: uniform tunings
      sched: use constants if !CONFIG_SCHED_DEBUG
      sched: remove stat_gran
      sched: remove precise CPU load
      sched: remove precise CPU load calculations #2
      sched: track cfs_rq->curr on !group-scheduling too
      sched: cleanup: simplify cfs_rq_curr() methods
      sched: uninline __enqueue_entity()/__dequeue_entity()
      sched: speed up update_load_add/_sub()
      sched: clean up calc_weighted()
      sched: introduce se->vruntime
      sched: move sched_feat() definitions
      sched: optimize vruntime based scheduling
      sched: simplify check_preempt() methods
      sched: wakeup granularity fix
      sched: add se->vruntime debugging
      sched: debug: update exec_clock only when SCHED_DEBUG
      sched: remove wait_runtime limit
      sched: remove wait_runtime fields and features
      sched: x86: allow single-depth wchan output

 arch/i386/Kconfig     |   11 
 include/linux/sched.h |   17 -
 kernel/sched.c        |  196 ++++-------------
 kernel/sched_debug.c  |   86 +++----
 kernel/sched_fair.c   |  557 +++++++++++++-------------------------------------
 kernel/sysctl.c       |   22 -
 6 files changed, 243 insertions(+), 646 deletions(-)

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [announce] CFS-devel, performance improvements
@ 2007-09-13 22:51 dimm
  2007-09-14  8:13 ` Ingo Molnar
  2007-09-14  8:13 ` Ingo Molnar
  0 siblings, 2 replies; 49+ messages in thread
From: dimm @ 2007-09-13 22:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Roman Zippel, Mike Galbraith, dmitry.adamushko,
	linux-kernel


Hi,

please find a couple of minor cleanups below (on top of sched-cfs-v2.6.23-rc6-v21-combo-3.patch):


(1)

Better placement of #ifdef CONFIG_SCHEDSTAT block in dequeue_entity().

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

---
diff -upr linux-2.6.23-rc6/kernel/sched_fair.c linux-2.6.23-rc6-my/kernel/sched_fair.c
--- linux-2.6.23-rc6/kernel/sched_fair.c	2007-09-13 21:38:49.000000000 +0200
+++ linux-2.6.23-rc6-my/kernel/sched_fair.c	2007-09-13 21:48:50.000000000 +0200
@@ -453,8 +453,8 @@ static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int sleep)
 {
 	update_stats_dequeue(cfs_rq, se);
-	if (sleep) {
 #ifdef CONFIG_SCHEDSTATS
+	if (sleep) {
 		if (entity_is_task(se)) {
 			struct task_struct *tsk = task_of(se);
 
@@ -463,8 +463,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 			if (tsk->state & TASK_UNINTERRUPTIBLE)
 				se->block_start = rq_of(cfs_rq)->clock;
 		}
-#endif
 	}
+#endif
 	__dequeue_entity(cfs_rq, se);
 }
 
---


(2)

unless we are very eager to keep an additional layer of abstraction,
'struct load_stat' is redundant now so let's get rid of it.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>


---
diff -upr linux-2.6.23-rc6/kernel/sched.c linux-2.6.23-rc6-sched-dev/kernel/sched.c
--- linux-2.6.23-rc6/kernel/sched.c	2007-09-12 21:37:41.000000000 +0200
+++ linux-2.6.23-rc6-sched-dev/kernel/sched.c	2007-09-12 21:26:10.000000000 +0200
@@ -170,10 +170,6 @@ struct rt_prio_array {
 	struct list_head queue[MAX_RT_PRIO];
 };
 
-struct load_stat {
-	struct load_weight load;
-};
-
 /* CFS-related fields in a runqueue */
 struct cfs_rq {
 	struct load_weight load;
@@ -232,7 +228,7 @@ struct rq {
 #ifdef CONFIG_NO_HZ
 	unsigned char in_nohz_recently;
 #endif
-	struct load_stat ls;	/* capture load from *all* tasks on this cpu */
+	struct load_weight load;	/* capture load from *all* tasks on this cpu */
 	unsigned long nr_load_updates;
 	u64 nr_switches;
 
@@ -804,7 +800,7 @@ static int balance_tasks(struct rq *this
  * Update delta_exec, delta_fair fields for rq.
  *
  * delta_fair clock advances at a rate inversely proportional to
- * total load (rq->ls.load.weight) on the runqueue, while
+ * total load (rq->load.weight) on the runqueue, while
  * delta_exec advances at the same rate as wall-clock (provided
  * cpu is not idle).
  *
@@ -812,17 +808,17 @@ static int balance_tasks(struct rq *this
  * runqueue over any given interval. This (smoothened) load is used
  * during load balance.
  *
- * This function is called /before/ updating rq->ls.load
+ * This function is called /before/ updating rq->load
  * and when switching tasks.
  */
 static inline void inc_load(struct rq *rq, const struct task_struct *p)
 {
-	update_load_add(&rq->ls.load, p->se.load.weight);
+	update_load_add(&rq->load, p->se.load.weight);
 }
 
 static inline void dec_load(struct rq *rq, const struct task_struct *p)
 {
-	update_load_sub(&rq->ls.load, p->se.load.weight);
+	update_load_sub(&rq->load, p->se.load.weight);
 }
 
 static void inc_nr_running(struct task_struct *p, struct rq *rq)
@@ -967,7 +963,7 @@ inline int task_curr(const struct task_s
 /* Used instead of source_load when we know the type == 0 */
 unsigned long weighted_cpuload(const int cpu)
 {
-	return cpu_rq(cpu)->ls.load.weight;
+	return cpu_rq(cpu)->load.weight;
 }
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
@@ -1933,7 +1929,7 @@ unsigned long nr_active(void)
  */
 static void update_cpu_load(struct rq *this_rq)
 {
-	unsigned long this_load = this_rq->ls.load.weight;
+	unsigned long this_load = this_rq->load.weight;
 	int i, scale;
 
 	this_rq->nr_load_updates++;
diff -upr linux-2.6.23-rc6/kernel/sched_debug.c linux-2.6.23-rc6-sched-dev/kernel/sched_debug.c
--- linux-2.6.23-rc6/kernel/sched_debug.c	2007-09-12 21:37:41.000000000 +0200
+++ linux-2.6.23-rc6-sched-dev/kernel/sched_debug.c	2007-09-12 21:36:04.000000000 +0200
@@ -137,7 +137,7 @@ static void print_cpu(struct seq_file *m
 
 	P(nr_running);
 	SEQ_printf(m, "  .%-30s: %lu\n", "load",
-		   rq->ls.load.weight);
+		   rq->load.weight);
 	P(nr_switches);
 	P(nr_load_updates);
 	P(nr_uninterruptible);
diff -upr linux-2.6.23-rc6/kernel/sched_fair.c linux-2.6.23-rc6-sched-dev/kernel/sched_fair.c
--- linux-2.6.23-rc6/kernel/sched_fair.c	2007-09-12 21:37:41.000000000 +0200
+++ linux-2.6.23-rc6-sched-dev/kernel/sched_fair.c	2007-09-12 21:35:27.000000000 +0200
@@ -499,7 +499,7 @@ set_next_entity(struct cfs_rq *cfs_rq, s
 	 * least twice that of our own weight (i.e. dont track it
 	 * when there are only lesser-weight tasks around):
 	 */
-	if (rq_of(cfs_rq)->ls.load.weight >= 2*se->load.weight) {
+	if (rq_of(cfs_rq)->load.weight >= 2*se->load.weight) {
 		se->slice_max = max(se->slice_max,
 			se->sum_exec_runtime - se->prev_sum_exec_runtime);
 	}

---


^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [announce] CFS-devel, performance improvements
@ 2007-09-13 23:25 dimm
  2007-09-14  8:17 ` Ingo Molnar
  0 siblings, 1 reply; 49+ messages in thread
From: dimm @ 2007-09-13 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Roman Zippel, Mike Galbraith, dmitry.adamushko,
	linux-kernel


and here's something a bit more intrusive.

The initial idea was to completely get rid of 'se->fair_key'. It's always equal to 'se->vruntime' for
all runnable tasks but the 'current'. The exact key within the tree for the 'current' has to be known in
order for __enqueue_entity() to work properly (if we just use 'vruntime', we may go a wrong way down the tree
while looking for the correct position for a new element).
Sure, it's possible to cache the current's key in the 'cfs_rq' and add a few additional checks, but that's
not very nice... so what if we don't keep the 'current' within the tree? :-)

The illustration is below. Some bits can be missed so far but a patched kernel boots/works
(haven't done real regression tests yet... can say that the mail client is still working
at this very moment :-).

There are 2 benefits:

(1) no more 'fair_key' ;
(2) entity_tick() is simpler/more effective : 'update_curr()' now vs.
'dequeue_entity() + enqueue_entity()' before.

anyway, consider it as mainly an illustration of idea so far.

---
diff -upr linux-2.6.23-rc6/include/linux/sched.h linux-2.6.23-rc6-my/include/linux/sched.h
--- linux-2.6.23-rc6/include/linux/sched.h	2007-09-13 21:38:49.000000000 +0200
+++ linux-2.6.23-rc6-my/include/linux/sched.h	2007-09-13 23:01:21.000000000 +0200
@@ -890,7 +890,6 @@ struct load_weight {
  *     6 se->load.weight
  */
 struct sched_entity {
-	s64			fair_key;
 	struct load_weight	load;		/* for load-balancing */
 	struct rb_node		run_node;
 	unsigned int		on_rq;
diff -upr linux-2.6.23-rc6/kernel/sched.c linux-2.6.23-rc6-my/kernel/sched.c
--- linux-2.6.23-rc6/kernel/sched.c	2007-09-13 21:52:13.000000000 +0200
+++ linux-2.6.23-rc6-my/kernel/sched.c	2007-09-13 23:00:19.000000000 +0200
@@ -6534,7 +6534,6 @@ void normalize_rt_tasks(void)
 
 	read_lock_irq(&tasklist_lock);
 	do_each_thread(g, p) {
-		p->se.fair_key			= 0;
 		p->se.exec_start		= 0;
 #ifdef CONFIG_SCHEDSTATS
 		p->se.wait_start		= 0;
diff -upr linux-2.6.23-rc6/kernel/sched_debug.c linux-2.6.23-rc6-my/kernel/sched_debug.c
--- linux-2.6.23-rc6/kernel/sched_debug.c	2007-09-13 21:52:13.000000000 +0200
+++ linux-2.6.23-rc6-my/kernel/sched_debug.c	2007-09-13 23:00:50.000000000 +0200
@@ -38,7 +38,7 @@ print_task(struct seq_file *m, struct rq
 
 	SEQ_printf(m, "%15s %5d %15Ld %13Ld %5d ",
 		p->comm, p->pid,
-		(long long)p->se.fair_key,
+		(long long)p->se.vruntime,
 		(long long)(p->nvcsw + p->nivcsw),
 		p->prio);
 #ifdef CONFIG_SCHEDSTATS
diff -upr linux-2.6.23-rc6/kernel/sched_fair.c linux-2.6.23-rc6-my/kernel/sched_fair.c
--- linux-2.6.23-rc6/kernel/sched_fair.c	2007-09-13 21:52:13.000000000 +0200
+++ linux-2.6.23-rc6-my/kernel/sched_fair.c	2007-09-13 23:48:02.000000000 +0200
@@ -125,7 +125,7 @@ set_leftmost(struct cfs_rq *cfs_rq, stru
 
 s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	return se->fair_key - cfs_rq->min_vruntime;
+	return se->vruntime - cfs_rq->min_vruntime;
 }
 
 /*
@@ -167,9 +167,6 @@ __enqueue_entity(struct cfs_rq *cfs_rq, 
 
 	rb_link_node(&se->run_node, parent, link);
 	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
-	update_load_add(&cfs_rq->load, se->load.weight);
-	cfs_rq->nr_running++;
-	se->on_rq = 1;
 }
 
 static void
@@ -179,9 +176,6 @@ __dequeue_entity(struct cfs_rq *cfs_rq, 
 		set_leftmost(cfs_rq, rb_next(&se->run_node));
 
 	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
-	update_load_sub(&cfs_rq->load, se->load.weight);
-	cfs_rq->nr_running--;
-	se->on_rq = 0;
 }
 
 static inline struct rb_node *first_fair(struct cfs_rq *cfs_rq)
@@ -320,10 +314,6 @@ static void update_stats_enqueue(struct 
 	 */
 	if (se != cfs_rq->curr)
 		update_stats_wait_start(cfs_rq, se);
-	/*
-	 * Update the key:
-	 */
-	se->fair_key = se->vruntime;
 }
 
 static void
@@ -371,6 +361,22 @@ update_stats_curr_end(struct cfs_rq *cfs
  * Scheduling class queueing methods:
  */
 
+static void
+account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	update_load_add(&cfs_rq->load, se->load.weight);
+	cfs_rq->nr_running++;
+	se->on_rq = 1;
+}
+
+static void
+account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	update_load_sub(&cfs_rq->load, se->load.weight);
+	cfs_rq->nr_running--;
+	se->on_rq = 0;
+}
+
 static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 #ifdef CONFIG_SCHEDSTATS
@@ -446,7 +452,9 @@ enqueue_entity(struct cfs_rq *cfs_rq, st
 	}
 
 	update_stats_enqueue(cfs_rq, se);
-	__enqueue_entity(cfs_rq, se);
+	if (se != cfs_rq->curr)
+		__enqueue_entity(cfs_rq, se);
+	account_entity_enqueue(cfs_rq, se);
 }
 
 static void
@@ -465,7 +473,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 		}
 	}
 #endif
-	__dequeue_entity(cfs_rq, se);
+	if (se != cfs_rq->curr)
+		__dequeue_entity(cfs_rq, se);
+	account_entity_dequeue(cfs_rq, se);
 }
 
 /*
@@ -511,6 +521,9 @@ static struct sched_entity *pick_next_en
 {
 	struct sched_entity *se = __pick_next_entity(cfs_rq);
 
+	if (se)
+		__dequeue_entity(cfs_rq, se);
+
 	set_next_entity(cfs_rq, se);
 
 	return se;
@@ -522,8 +535,11 @@ static void put_prev_entity(struct cfs_r
 	 * If still on the runqueue then deactivate_task()
 	 * was not called and update_curr() has to be done:
 	 */
-	if (prev->on_rq)
+	if (prev->on_rq) {
 		update_curr(cfs_rq);
+		/* Put the current back into the tree. */
+		__enqueue_entity(cfs_rq, prev);
+	}	
 
 	update_stats_curr_end(cfs_rq, prev);
 
@@ -535,11 +551,9 @@ static void put_prev_entity(struct cfs_r
 static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
 	/*
-	 * Dequeue and enqueue the task to update its
-	 * position within the tree:
+	 * Update run-time statistics of the 'current'.
 	 */
-	dequeue_entity(cfs_rq, curr, 0);
-	enqueue_entity(cfs_rq, curr, 0);
+	update_curr(cfs_rq);	
 
 	if (cfs_rq->nr_running > 1)
 		check_preempt_tick(cfs_rq, curr);
@@ -890,6 +904,7 @@ static void task_new_fair(struct rq *rq,
 
 	update_stats_enqueue(cfs_rq, se);
 	__enqueue_entity(cfs_rq, se);
+	account_entity_enqueue(cfs_rq, se);
 	resched_task(rq->curr);
 }
 

---



^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2007-09-14 17:55 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-11 20:04 [announce] CFS-devel, performance improvements Ingo Molnar
2007-09-12  0:42 ` Roman Zippel
2007-09-13  7:52   ` Ingo Molnar
2007-09-13 12:35     ` Roman Zippel
2007-09-13 14:28       ` Ingo Molnar
2007-09-13 16:50         ` Roman Zippel
2007-09-13 17:06           ` Peter Zijlstra
2007-09-13 17:09             ` Peter Zijlstra
2007-09-14 12:04             ` Roman Zippel
2007-09-14 12:17               ` Peter Zijlstra
2007-09-13 18:28           ` Kyle Moffett
2007-09-13 19:08             ` Peter Zijlstra
2007-09-14 15:26           ` Arjan van de Ven
2007-09-14 14:50             ` Roman Zippel
2007-09-14 15:56               ` Arjan van de Ven
2007-09-14 15:13                 ` Roman Zippel
2007-09-13 19:01       ` Sam Ravnborg
2007-09-14 12:26         ` Roman Zippel
2007-09-12  1:16 ` Rob Hussey
2007-09-13  8:42   ` Rob Hussey
2007-09-13  9:06     ` Ingo Molnar
2007-09-13  9:24       ` Rob Hussey
2007-09-13  9:31         ` Ingo Molnar
2007-09-13  9:36           ` Rob Hussey
2007-09-13  9:43             ` Rob Hussey
2007-09-13 10:17               ` Rob Hussey
2007-09-13 11:48             ` Ingo Molnar
2007-09-14  1:47               ` Rob Hussey
2007-09-14  2:26                 ` Rob Hussey
2007-09-14  6:59                 ` Kyle Moffett
2007-09-12  6:20 ` Mike Galbraith
2007-09-12 22:17 ` Roman Zippel
2007-09-13  7:17   ` debian developer
2007-09-13  7:34   ` debian developer
2007-09-13  9:19   ` Ingo Molnar
2007-09-13 11:35   ` Peter Zijlstra
2007-09-13 12:14     ` Roman Zippel
2007-09-13 12:44       ` Peter Zijlstra
2007-09-14 11:16         ` Peter Zijlstra
2007-09-13 12:47   ` Ingo Molnar
2007-09-14 11:46     ` Roman Zippel
2007-09-13 23:08   ` Willy Tarreau
2007-09-14 13:10     ` Roman Zippel
2007-09-14 17:54       ` Willy Tarreau
  -- strict thread matches above, loose matches on Subject: below --
2007-09-13 22:51 dimm
2007-09-14  8:13 ` Ingo Molnar
2007-09-14  8:13 ` Ingo Molnar
2007-09-13 23:25 dimm
2007-09-14  8:17 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox