* [patch] CFS scheduler, -v7
@ 2007-04-28 15:25 Ingo Molnar
From: Ingo Molnar @ 2007-04-28 15:25 UTC (permalink / raw)
To: linux-kernel
Cc: Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
main goal of CFS is to implement "desktop scheduling" with as high
quality as technically possible.)
The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
from the usual place:
http://redhat.com/~mingo/cfs-scheduler/
-v6 got lots of nice feedback, and the list of regressions from -v5 has
shrunk considerably. The most user-visible change in -v7 should be a fix
for an interactivity problem that could explain the 'audio skipping'
problem reported by Kasper Sandberg, which was the only major regression
reported against -v6. Please re-report any remaining regressions.
the rate of change is moderate:
15 files changed, 150 insertions(+), 124 deletions(-)
half of that code-flux is due to the removal of the X auto-renice patch
and most of the rest is debugging related. It seems the CFS codebase is
slowly starting to settle down. (-v7 has been test-built and test-booted
on i686 and x86_64 UP and SMP systems.)
Changes since -v6:
- speedup: cache rb_leftmost better (Srivatsa Vaddagiri)
- bugfix: handle Priority Inheritance properly (Thomas Gleixner)
- interactivity fix: tighten up the arithmetic some more.
- feature removal: remove the X auto-renicing feature, CONFIG_BOOST_X.
- debugging feature: introduce the sched_sleep_history_max_ns tunable
to modify sleep-history handling.
- debugging feature: /proc/<PID>/sched file contains various useful
scheduler statistics about every task.
- debugging feature: track the maximum amount of time a task has been
waiting to get on the CPU, the maximum amount of time it was blocked
involuntarily and the maximum amount of time it was sleeping
voluntarily.
As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome,
Ingo
^ permalink raw reply [flat|nested] 26+ messages in thread

* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-28 19:20 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, Ingo Molnar wrote:
> - feature removal: remove the X auto-renicing feature, CONFIG_BOOST_X.

This is the first time I have used CFS without X auto-renicing, and I
think it is not as smooth as before :( While compiling something (say
alsa-drivers) with just "make", firefox starts to scroll slowly, kmail
freezes from time to time, applications are not as responsive as before
under that load, and the system is not as fast as before even while
idle. Also, somehow boot takes longer:

v6;
Apr 28 16:40:41 (up 10.57) /sbin/mudur.py sysinit
Apr 28 13:40:49 (up 17.75) /sbin/mudur.py boot
Apr 28 13:41:00 (up 28.61) /sbin/mudur.py default

v7;
Apr 29 00:35:49 (up 10.61) /sbin/mudur.py sysinit
Apr 28 21:36:14 (up 33.77) /sbin/mudur.py boot
Apr 28 21:36:16 (up 36.21) /sbin/mudur.py default

And while trying to understand what caused this (or trying to confirm
that X is the problem), I realized Xorg is not the only reniced process
in v6:

[caglar@zangetsu][~]> diff -u top.v7 top.v6
...
   PID USER      PR  NI VIRT RES SHR S %CPU %MEM   TIME+  COMMAND
...
     2 root      RT   0    0   0   0 S  0.0  0.0  0:00.00 migration/0
     3 root      39  19    0   0   0 S  0.0  0.0  0:00.00 ksoftirqd/0
-    4 root      15  -5    0   0   0 S  0.0  0.0  0:00.11 events/0
-    5 root      15  -5    0   0   0 S  0.0  0.0  0:00.01 khelper
-    6 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kthread
-   26 root      15  -5    0   0   0 S  0.0  0.0  0:00.02 kblockd/0
-   27 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kacpid
-  124 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kseriod
-  137 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kapmd
+    4 root      10 -10    0   0   0 S  0.0  0.0  0:00.01 events/0
+    5 root      10 -10    0   0   0 S  0.0  0.0  0:00.01 khelper
+    6 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kthread
+   26 root      10 -10    0   0   0 S  0.0  0.0  0:00.03 kblockd/0
+   27 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kacpid
+  124 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kseriod
+  137 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kapmd
...

Until now I haven't tried renicing X manually to see if it changes
anything in v7, but I will (argh, I'll simply boot with v6).

> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,

If you want some more output/info etc. please just say so; I have both
v6 and v7 available.

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Ingo Molnar @ 2007-04-28 19:24 UTC (permalink / raw)
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

* S.Çağlar Onur <caglar@pardus.org.tr> wrote:

> If you want some more output/info etc. please just say so; I have both
> v6 and v7 available.

could you try the auto-renice patch ontop of -v7:

  http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-auto-renice.patch

does this make it behave like -v6?

	Ingo

[-- Attachment: sched-cfs-auto-renice.patch --]

---
 arch/i386/kernel/ioport.c   |   17 ++++++++++++++---
 arch/x86_64/kernel/ioport.c |   12 ++++++++++--
 drivers/block/loop.c        |    5 ++++-
 include/linux/sched.h       |    7 +++++++
 kernel/Kconfig.preempt      |   17 +++++++++++++++++
 kernel/sched.c              |   40 ++++++++++++++++++++++++++++++++++++++++
 kernel/workqueue.c          |    2 +-
 mm/oom_kill.c               |    4 +++-
 8 files changed, 96 insertions(+), 8 deletions(-)

Index: linux/arch/i386/kernel/ioport.c
===================================================================
--- linux.orig/arch/i386/kernel/ioport.c
+++ linux/arch/i386/kernel/ioport.c
@@ -64,9 +64,17 @@ asmlinkage long sys_ioperm(unsigned long
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
-	if (turn_on && !capable(CAP_SYS_RAWIO))
-		return -EPERM;
-
+	if (turn_on) {
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+		/*
+		 * Task will be accessing hardware IO ports,
+		 * mark it as special with the scheduler too:
+		 */
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
+	}
 	/*
 	 * If it's the first ioperm() call in this thread's lifetime, set the
 	 * IO bitmap up. ioperm() is much less timing critical than clone(),
@@ -145,6 +153,9 @@ asmlinkage long sys_iopl(unsigned long u
 	if (level > old) {
 		if (!capable(CAP_SYS_RAWIO))
 			return -EPERM;
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
 	}
 	t->iopl = level << 12;
 	regs->eflags = (regs->eflags & ~X86_EFLAGS_IOPL) | t->iopl;
Index: linux/arch/x86_64/kernel/ioport.c
===================================================================
--- linux.orig/arch/x86_64/kernel/ioport.c
+++ linux/arch/x86_64/kernel/ioport.c
@@ -41,8 +41,13 @@ asmlinkage long sys_ioperm(unsigned long
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
-	if (turn_on && !capable(CAP_SYS_RAWIO))
-		return -EPERM;
+	if (turn_on) {
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
+	}

 	/*
 	 * If it's the first ioperm() call in this thread's lifetime, set the
@@ -113,6 +118,9 @@ asmlinkage long sys_iopl(unsigned int le
 	if (level > old) {
 		if (!capable(CAP_SYS_RAWIO))
 			return -EPERM;
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
 	}
 	regs->eflags = (regs->eflags &~ X86_EFLAGS_IOPL) | (level << 12);
 	return 0;
Index: linux/drivers/block/loop.c
===================================================================
--- linux.orig/drivers/block/loop.c
+++ linux/drivers/block/loop.c
@@ -588,7 +588,10 @@ static int loop_thread(void *data)
 	 */
 	current->flags |= PF_NOFREEZE;

-	set_user_nice(current, -20);
+	/*
+	 * The loop thread is important enough to be given a boost:
+	 */
+	sched_privileged_task(current);

 	while (!kthread_should_stop() || lo->lo_bio) {
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -1268,6 +1268,13 @@ static inline int rt_mutex_getprio(struc
 #endif

 extern void set_user_nice(struct task_struct *p, long nice);
+/*
+ * Task has special privileges, give it more CPU power:
+ */
+extern void sched_privileged_task(struct task_struct *p);
+
+extern int sysctl_sched_privileged_nice_level;
+
 extern int task_prio(const struct task_struct *p);
 extern int task_nice(const struct task_struct *p);
 extern int can_nice(const struct task_struct *p, const int nice);
Index: linux/kernel/Kconfig.preempt
===================================================================
--- linux.orig/kernel/Kconfig.preempt
+++ linux/kernel/Kconfig.preempt
@@ -63,3 +63,20 @@ config PREEMPT_BKL
 	  Say Y here if you are building a kernel for a desktop system.
 	  Say N if you are unsure.

+config BOOST_X
+	bool "Boost X"
+	default y
+	help
+	  This option instructs the kernel to guarantee more CPU time to
+	  X than to other tasks, which is useful if you want to have a
+	  faster desktop even under high system load.
+
+	  This option works by automatically boosting X's priority via
+	  renicing it to -10. NOTE: CFS does not suffer from
+	  "overscheduling" problems when X is reniced to -10, so if this
+	  is a predominantly desktop box it makes sense to select this
+	  option.
+
+	  Say Y here if you are building a kernel for a desktop system.
+	  Say N if you want X to be treated as a normal task.
+
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3323,6 +3323,46 @@ out_unlock:
 EXPORT_SYMBOL(set_user_nice);

 /*
+ * Nice level for privileged tasks. (can be set to 0 for this
+ * to be turned off)
+ */
+int sysctl_sched_privileged_nice_level __read_mostly = -10;
+
+static int __init privileged_nice_level_setup(char *str)
+{
+	sysctl_sched_privileged_nice_level = simple_strtol(str, NULL, 0);
+	return 1;
+}
+__setup("privileged_nice_level=", privileged_nice_level_setup);
+
+/*
+ * Tasks with special privileges call this and gain extra nice
+ * levels:
+ */
+void sched_privileged_task(struct task_struct *p)
+{
+	long new_nice = sysctl_sched_privileged_nice_level;
+	long old_nice = TASK_NICE(p);
+
+	if (new_nice >= old_nice)
+		return;
+	/*
+	 * Setting the sysctl to 0 turns off the boosting:
+	 */
+	if (unlikely(!new_nice))
+		return;
+
+	if (new_nice < -20)
+		new_nice = -20;
+	else if (new_nice > 19)
+		new_nice = 19;
+
+	set_user_nice(p, new_nice);
+}
+
+EXPORT_SYMBOL(sched_privileged_task);
+
+/*
  * can_nice - check if a task can reduce its nice value
  * @p: task
  * @nice: nice value
Index: linux/kernel/workqueue.c
===================================================================
--- linux.orig/kernel/workqueue.c
+++ linux/kernel/workqueue.c
@@ -355,7 +355,7 @@ static int worker_thread(void *__cwq)
 	if (!cwq->freezeable)
 		current->flags |= PF_NOFREEZE;

-	set_user_nice(current, -5);
+	sched_privileged_task(current);

 	/* Block and flush all signals */
 	sigfillset(&blocked);
Index: linux/mm/oom_kill.c
===================================================================
--- linux.orig/mm/oom_kill.c
+++ linux/mm/oom_kill.c
@@ -293,7 +293,9 @@ static void __oom_kill_task(struct task_
 	 * all the memory it needs. That way it should be able to
 	 * exit() and clear out its resources quickly...
 	 */
-	p->time_slice = HZ;
+	if (p->policy == SCHED_NORMAL || p->policy == SCHED_BATCH)
+		sched_privileged_task(p);
+
 	set_tsk_thread_flag(p, TIF_MEMDIE);
 	force_sig(SIGKILL, p);
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-28 23:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > If you want some more output/info etc. please just say so; I have
> > both v6 and v7 available.
>
> could you try the auto-renice patch ontop of -v7:
>
> http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-auto-renice.patch
>
> does this make it behave like -v6?

Ingo, please ignore my first report until I find a proper way to
reproduce the slowness, because currently CFS-v7, CFS-v7 + the renice
patch, CFS-v7 + renice + your private mail suggestions, and CFS-v6 +
the "PI support for futexes" patch all seem to work equally well (which
is a good thing: X renicing seems really not needed, and there was no
regression, just my daydreams). Or I'm too tired to see the
differences.

--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Ingo Molnar @ 2007-04-29 7:11 UTC (permalink / raw)
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

* S.Çağlar Onur <caglar@pardus.org.tr> wrote:

> Ingo, please ignore my first report until i found a proper way to
> reproduce the slowness cause currently CFS-v7, CFS-v7 + "renice
> patch", CFS-v7 + renice + your private mail suggestions and CFS-v6 +
> "PI support for futexes patch" seems works equally (which is a good
> thing so X renicing seems really not needed, [...]

oh, good!

> [...] and there were no regression instead of my daydreams) or im too
> tired to understand the differences.

could the CPU have dropped speed for that bootup (some CPUs do that
automatically upon overheating), or perhaps if you are using some RAID
array, could it have done a background resync? Especially the bootup
slowdown you saw seemed significant, and because bootup speed is 90% IO
dominated, the CPU scheduler seems an unlikely candidate.

	Ingo
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-29 12:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Sunday 29 April 2007, Ingo Molnar wrote:
> > [...] and there were no regression instead of my daydreams) or im too
> > tired to understand the differences.
>
> could the CPU have dropped speed for that bootup (some CPUs do that
> automatically upon overheating), or perhaps if you are using some RAID
> array, could it have done a background resync? Especially the bootup
> slowdown you saw seemed significant, and because bootup speed is 90% IO
> dominated, the CPU scheduler seems an unlikely candidate.

It could be some overheating problem, but if so, I think this is the
first time it has occurred :). And I don't have any array; a SAMSUNG
HM120JC IDE disk runs in a SONY VAIO FS-215B.

I just booted with plain CFSv7 and boot time seems normal:

Apr 29 15:02:54 (up 10.72) /sbin/mudur.py sysinit
Apr 29 15:03:02 (up 17.26) /sbin/mudur.py boot
Apr 29 15:03:06 (up 21.34) /sbin/mudur.py default

I'll report if I can find any reproducible problem; so far CFSv7 works
as expected :)

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Ingo Molnar @ 2007-04-29 15:58 UTC (permalink / raw)
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

* S.Çağlar Onur <caglar@pardus.org.tr> wrote:

> I'll report if i can find any reproducable problem, so far CFSv7 works
> as expected :)

ok :)

	Ingo
* Re: [patch] CFS scheduler, -v7
From: Dennis Brendel @ 2007-04-29 22:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: S.Çağlar Onur, linux-kernel, Linus Torvalds, Andrew Morton,
    Con Kolivas, Nick Piggin, Mike Galbraith, Arjan van de Ven,
    Peter Williams, Thomas Gleixner, Willy Tarreau, Gene Heskett,
    Mark Lord, Zach Carter, Kasper Sandberg, Srivatsa Vaddagiri

On Sunday 29 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > I'll report if i can find any reproducable problem, so far CFSv7 works
> > as expected :)
>
> ok :)
>
> 	Ingo

One small regression: I worked for hours with CFS v7 as the scheduler
while listening to music (with the amarok/xine engine), and the music
stuttered 3 times in 5 hours. Then I watched a movie and that stuttered
twice, too. But when I watched the previously stuttering scenes again,
there was no stuttering. It's not directly reproducible; it happens
randomly.
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-30 14:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

Hi Ingo;

On Sunday 29 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > I'll report if i can find any reproducable problem, so far CFSv7 works
> > as expected :)

After two full days of usage I still can't reproduce the earlier
problems. I can even manage to burn a DVD with k3b while one console
compiles a kernel, another checks out svn repos, amarok plays music,
virtualbox boots another distro, and firefox runs with 3 tabs open :).
And I must say the system was still usable during all that; the only
thing that happened was a little firefox slowdown, but it is really
negligible, because with mainline, firefox (and the whole system) is
really unusable if I fire up a guest in virtualbox.

And lastly, boot time is still around ~20 sec, which is expected:

Apr 29 01:39:04 (up 10.48) /sbin/mudur.py sysinit
Apr 29 01:39:13 (up 18.83) /sbin/mudur.py boot
Apr 29 01:39:18 (up 23.29) /sbin/mudur.py default
...
Apr 29 15:02:54 (up 10.72) /sbin/mudur.py sysinit
Apr 29 15:03:02 (up 17.26) /sbin/mudur.py boot
Apr 29 15:03:06 (up 21.34) /sbin/mudur.py default
...
Apr 30 14:21:56 (up 22.80) /sbin/mudur.py default
Apr 30 14:28:34 (up 15.67) /sbin/mudur.py sysinit
Apr 30 14:28:40 (up 21.02) /sbin/mudur.py boot
...
Apr 30 14:33:26 (up 10.59) /sbin/mudur.py sysinit
Apr 30 14:33:32 (up 15.97) /sbin/mudur.py boot
Apr 30 14:33:35 (up 18.69) /sbin/mudur.py default
...
Apr 30 17:02:09 (up 10.70) /sbin/mudur.py sysinit
Apr 30 17:02:16 (up 17.15) /sbin/mudur.py boot
Apr 30 17:02:19 (up 20.16) /sbin/mudur.py default

So for me, there is no problem at all :)...

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-28 19:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, S.Çağlar Onur wrote:
> Also somehow boot takes longer;
>
> v6;
> Apr 28 16:40:41 (up 10.57) /sbin/mudur.py sysinit
> Apr 28 13:40:49 (up 17.75) /sbin/mudur.py boot
> Apr 28 13:41:00 (up 28.61) /sbin/mudur.py default
>
> v7;
> Apr 29 00:35:49 (up 10.61) /sbin/mudur.py sysinit
> Apr 28 21:36:14 (up 33.77) /sbin/mudur.py boot
> Apr 28 21:36:16 (up 36.21) /sbin/mudur.py default

The values in parentheses (seconds) are the correct ones; the date/time
is screwed up because of my BIOS :(

--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Prakash Punnoor @ 2007-04-29 17:28 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
>
> The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> from the usual place:

I made a quick test with the AC3 encoder aften; I tested against RSDL
0.36 (I think it was). Time was slightly worse: >5.9 secs (with two
threads on an Athlon64 X2, x86_64). Mainline gives me 5.4 sec; RSDL
took 5.8 sec. I haven't tested earlier CFS nor later SD. Does the
scheduler get optimized more for single core than SMP?

http://aften.sourceforge.net/

Cheers,
--
(°=  =°)
//\ Prakash Punnoor /\\
V_/ \_V
* Re: [patch] CFS scheduler, -v7
From: Prakash Punnoor @ 2007-05-04 13:05 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Sunday 29 April 2007, Prakash Punnoor wrote:
> On Saturday 28 April 2007, Ingo Molnar wrote:
> > i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> > main goal of CFS is to implement "desktop scheduling" with as high
> > quality as technically possible.)
> >
> > The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> > from the usual place:
>
> I made a quick test with the AC3 encoder aften; I tested against RSDL
> 0.36 (I think it was). Time was slightly worse: >5.9 secs (with two
> threads on an Athlon64 X2, x86_64). Mainline gives me 5.4 sec; RSDL
> took 5.8 sec.
>
> I haven't tested earlier CFS nor later SD. Does the scheduler get
> optimized more for single core than SMP?
>
> http://aften.sourceforge.net/

Just as an update: Ingo managed to fix this issue in v9. Nice work!

Cheers,
--
(°=  =°)
//\ Prakash Punnoor /\\
V_/ \_V
* Re: [patch] CFS scheduler, -v7
From: Srivatsa Vaddagiri @ 2007-04-30 16:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod

On Sat, Apr 28, 2007 at 05:25:39PM +0200, Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)

+unsigned int sysctl_sched_granularity __read_mostly = 2000000;

Any reason why this tunable can't be 64-bit?

--
Regards,
vatsa
* Re: [patch] CFS scheduler, -v7
From: Balbir Singh @ 2007-04-30 18:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
>
> The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> from the usual place:
>
> http://redhat.com/~mingo/cfs-scheduler/

Hi Ingo,

I needed the following fixes on my powerpc box to silence all the
warnings the compiler generated during compilation. Without these
fixes, I was seeing negative values in /proc/sched_debug on my box. I
still see a negative value for a "waiting" task:

  PID  tree-key  delta  waiting  switches  prio  wstart-fair  sum-exec  sum-wait
-------------------------------------------------------------------------------
R bash 5594 69839216478 6238568 -6238568 1617 120 -69832977910 66377151352 27173793

I've started on cfs late (with -v7); hopefully I'll catch up. More
questions and feedback will follow.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

[-- Attachment: cfs-v7-fix-sched-debug-warnings --]

Index: linux-2.6.21/kernel/sched_debug.c
===================================================================
--- linux-2.6.21.orig/kernel/sched_debug.c	2007-04-30 23:17:10.000000000 +0530
+++ linux-2.6.21/kernel/sched_debug.c	2007-04-30 23:49:40.000000000 +0530
@@ -45,13 +45,13 @@
 	SEQ_printf(m, "%14s %5d %12Ld %11Ld %10Ld %9Ld %5d "
 		      "%13Ld %13Ld %13Ld\n",
 		p->comm, p->pid,
-		p->fair_key, p->fair_key - rq->fair_clock,
-		p->wait_runtime,
-		p->nr_switches,
+		(long long)p->fair_key, (long long)p->fair_key - rq->fair_clock,
+		(long long)p->wait_runtime,
+		(long long)p->nr_switches,
 		p->prio,
-		p->wait_start_fair - rq->fair_clock,
-		p->sum_exec_runtime,
-		p->sum_wait_runtime);
+		(long long)p->wait_start_fair - rq->fair_clock,
+		(long long)p->sum_exec_runtime,
+		(long long)p->sum_wait_runtime);
 }

 static void print_rq(struct seq_file *m, struct rq *rq, u64 now)
@@ -83,7 +83,7 @@
 	SEQ_printf(m, "\ncpu: %d\n", cpu);

 #define P(x) \
-	SEQ_printf(m, "  .%-22s: %Ld\n", #x, (u64)(rq->x))
+	SEQ_printf(m, "  .%-22s: %Lu\n", #x, (unsigned long long)(rq->x))

 	P(nr_running);
 	P(raw_weighted_load);
@@ -110,7 +110,7 @@
 	int cpu;

 	SEQ_printf(m, "Sched Debug Version: v0.02\n");
-	SEQ_printf(m, "now at %Ld nsecs\n", (unsigned long long)now);
+	SEQ_printf(m, "now at %Lu nsecs\n", (unsigned long long)now);

 	for_each_online_cpu(cpu)
 		print_cpu(m, cpu, now);
Index: linux-2.6.21/kernel/sched.c
===================================================================
--- linux-2.6.21.orig/kernel/sched.c	2007-04-30 23:42:04.000000000 +0530
+++ linux-2.6.21/kernel/sched.c	2007-04-30 23:49:44.000000000 +0530
@@ -229,7 +229,7 @@
 	unsigned long long t0, t1;

 #define P(F) \
-	buffer += sprintf(buffer, "%-25s:%20Ld\n", #F, p->F)
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", #F, (long long)p->F)

 	P(wait_start);
 	P(wait_start_fair);
@@ -248,22 +248,22 @@
 	t0 = sched_clock();
 	t1 = sched_clock();
-	buffer += sprintf(buffer, "%-25s:%20Ld\n", "clock-delta", t1-t0);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-wait_runtime", this_rq->wait_runtime);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-fair_clock", this_rq->fair_clock);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-clock", this_rq->clock);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-prev_clock_raw", this_rq->prev_clock_raw);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-clock_max_delta", this_rq->clock_max_delta);
-	buffer += sprintf(buffer, "%-25s:%20u\n",
-		"rq-clock_warps", this_rq->clock_warps);
-	buffer += sprintf(buffer, "%-25s:%20u\n",
-		"rq-clock_unstable_events",
-		this_rq->clock_unstable_events);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "clock-delta",
+		(long long)t1-t0);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-wait_runtime",
+		(long long)this_rq->wait_runtime);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-fair_clock",
+		(long long)this_rq->fair_clock);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-clock",
+		(long long)this_rq->clock);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-prev_clock_raw",
+		(long long)this_rq->prev_clock_raw);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-clock_max_delta",
+		(long long)this_rq->clock_max_delta);
+	buffer += sprintf(buffer, "%-25s:%20u\n", "rq-clock_warps",
+		this_rq->clock_warps);
+	buffer += sprintf(buffer, "%-25s:%20u\n", "rq-clock_unstable_events",
+		this_rq->clock_unstable_events);

 	return buffer;
 }
* Re: [patch] CFS scheduler, -v7
@ 2007-04-30  5:20 Al Boldi
  2007-05-03  7:45 ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Al Boldi @ 2007-04-30 5:20 UTC (permalink / raw)
To: linux-kernel

Ingo Molnar wrote:
>
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
:
:
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,

This one seems on par with SD, but there are still some nice issues.

Try running 3 chew.c's, then renicing one to -10: it starves the others
for some seconds while switching prio-levels. Now renice it back to 10;
it starves them for up to 45 sec.

Also, nice levels are only effective on every other step; ie:
... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-04-30  5:20 Al Boldi
@ 2007-05-03  7:45 ` Ingo Molnar
  2007-05-03  8:07   ` Ingo Molnar
  ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 7:45 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-kernel

* Al Boldi <a1426z@gawab.com> wrote:

> > i'm pleased to announce release -v7 of the CFS scheduler patchset.
> > (The main goal of CFS is to implement "desktop scheduling" with as
> > high quality as technically possible.)
> :
> :
> > As usual, any sort of feedback, bugreport, fix and suggestion is
> > more than welcome,
>
> This one seems on par with SD, [...]

excellent :-)

> [...] but there are still some nice issues.
>
> Try running 3 chew.c's, then renicing one to -10, starves others for
> some seconds while switching prio-level. Now renice it back to 10, it
> starves for up to 45sec.

ok - to make sure i understood you correctly: does this starvation only
occur right when you renice it (when switching prio levels), and does it
get rectified quickly once they get over a few reschedules?

> Also, nice levels are only effective on every other step; ie:
> ... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.

yeah - this is a first-approximation thing.

Some background: in the upstream scheduler (and in SD) nice levels are
linearly scaled, while in CFS they are exponentially scaled. I did this
because i believe exponential is more logical: regardless of which nice
level a task uses, if it goes +2 nice levels up then it will halve its
"fair CPU share". So for example the CPU consumption delta between nice
0 and nice +10 is 1/32 - and so is the delta between -5 and +5, or -10
and 0, etc. This makes nice levels _a lot_ more potent than upstream's
linear approach.

	Ingo
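[Editor's note: the exponential scale Ingo describes above can be sketched
numerically. This is an illustration of the arithmetic only, not the
kernel's actual weight tables; `weight_ratio` is a made-up helper.]

```python
# Illustrative sketch of CFS's exponential nice scale: going +2 nice
# levels up halves the fair CPU share, so a 10-level gap always yields
# a 2**5 = 32x ratio, regardless of where on the scale it sits.
def weight_ratio(nice_a, nice_b):
    """CPU-share ratio of a task at nice_a relative to one at nice_b."""
    return 2.0 ** ((nice_b - nice_a) / 2.0)

print(weight_ratio(0, 10))   # nice 0 vs nice +10 -> 32x
print(weight_ratio(-5, 5))   # same 32x for the -5..+5 gap
print(weight_ratio(0, 2))    # +2 levels: exactly half the share -> 2.0
```

Under a linear scale (upstream, SD) the same renice step would change the
share by a fixed additive amount instead, which is why CFS's nice levels
are so much more potent.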
* Re: [patch] CFS scheduler, -v7
  2007-05-03  7:45 ` Ingo Molnar
@ 2007-05-03  8:07   ` Ingo Molnar
  2007-05-03 11:16     ` Al Boldi
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 8:07 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-kernel

* Ingo Molnar <mingo@elte.hu> wrote:

> > [...] but there are still some nice issues.
> >
> > Try running 3 chew.c's, then renicing one to -10, starves others for
> > some seconds while switching prio-level. Now renice it back to 10,
> > it starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation
> only occur right when you renice it (when switching prio levels), and
> it gets rectified quickly once they get over a few reschedules?

meanwhile i managed to reproduce it by following the exact steps you
described, and i've fixed the bug in my tree. Can you confirm that the
patch below fixes it for you too?

	Ingo

----------------->
From: Ingo Molnar <mingo@elte.hu>
Subject: [patch] sched, cfs: fix starvation upon nice level switching

Al Boldi reported the following bug: when switching a CPU-intense task's
nice levels they can get unfairly starved right after the priority level
switching.

The bug was that when changing the load_weight the ->wait_runtime value
did not get rescaled. So clear wait_runtime when switching nice levels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -575,6 +580,7 @@ static void set_load_weight(struct task_
 {
 	p->load_shift = get_load_shift(p);
 	p->load_weight = 1 << p->load_shift;
+	p->wait_runtime = 0;
 }

 static inline void
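[Editor's note: the semantics of the one-line fix can be modeled in
miniature. `Task` below is a made-up stand-in for task_struct with
illustrative field values, not kernel code.]

```python
# Miniature model of the patched set_load_weight() above.  A CPU hog at
# a high weight piles up wait_runtime (fairness credit); if that credit
# survived a renice, it would be paid back at the *new* weight's much
# slower key-advance rate and could starve other tasks for many seconds.
class Task:
    def __init__(self, load_shift, wait_runtime):
        self.load_shift = load_shift
        self.load_weight = 1 << load_shift
        self.wait_runtime = wait_runtime  # fairness credit, in ns

def set_load_weight(p, new_shift):
    p.load_shift = new_shift
    p.load_weight = 1 << new_shift
    p.wait_runtime = 0   # the fix: drop credit earned at the old weight

p = Task(load_shift=15, wait_runtime=45_000_000_000)  # credit from nice -10
set_load_weight(p, 5)                                 # renice back down
print(p.wait_runtime)   # 0 -- no stale credit survives the renice
```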
* Re: [patch] CFS scheduler, -v7
  2007-05-03  8:07 ` Ingo Molnar
@ 2007-05-03 11:16   ` Al Boldi
  2007-05-03 12:36     ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Al Boldi @ 2007-05-03 11:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > > [...] but there are still some nice issues.
> > >
> > > Try running 3 chew.c's, then renicing one to -10, starves others for
> > > some seconds while switching prio-level. Now renice it back to 10,
> > > it starves for up to 45sec.
> >
> > ok - to make sure i understood you correctly: does this starvation
> > only occur right when you renice it (when switching prio levels), and
> > it gets rectified quickly once they get over a few reschedules?
>
> meanwhile i managed to reproduce it by following the exact steps you
> described, and i've fixed the bug in my tree. Can you confirm that the
> patch below fixes it for you too?

Seems like this fixed it. But I can still see these awful latency blips
in the presence of a negatively niced chew.c at -10 and two chew.c's at
nice 0. This gets really bad when sched_granularity_ns >= 5,000,000.

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-05-03 11:16 ` Al Boldi
@ 2007-05-03 12:36   ` Ingo Molnar
  2007-05-03 13:49     ` Al Boldi
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 12:36 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-kernel

* Al Boldi <a1426z@gawab.com> wrote:

> [...] But I can still see these awful latency blips in the presence of
> a negatively niced chew.c at -10 and two chew.c's at nice 0. [...]

of course: you asked for the chew's to be treated like that and CFS
delivered it! :-)

nice -10 means the reniced chew will get ~90+% of the CPU time, and all
other nice 0 tasks will get <10% of CPU time.

in the previous mail i have described the new exponential-scale nice
levels that CFS introduces. In practice this means that the vanilla
kernel's nice -20 level is roughly equivalent to CFS's nice -6. CFS's
nice -10 would be roughly equivalent to vanilla nice -80 (if it
existed).

	Ingo
* Re: [patch] CFS scheduler, -v7
  2007-05-03 12:36 ` Ingo Molnar
@ 2007-05-03 13:49   ` Al Boldi
  0 siblings, 0 replies; 26+ messages in thread
From: Al Boldi @ 2007-05-03 13:49 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > [...] But I can still see these awful latency blips in the presence of
> > a negatively niced chew.c at -10 and two chew.c's at nice 0. [...]
>
> of course: you asked for the chew's to be treated like that and CFS
> delivered it! :-)
>
> nice -10 means the reniced chew will get ~90+% of the CPU time, and all
> other nice 0 tasks will get <10% of CPU time.

Yes, but the latencies fluctuate wildly, from 5ms up to
sched_granularity_ns * 1000. Isn't it possible to smooth this?

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-05-03  7:45 ` Ingo Molnar
  2007-05-03  8:07   ` Ingo Molnar
@ 2007-05-03  8:42   ` Al Boldi
  2007-05-03 15:02   ` Ting Yang
  2 siblings, 0 replies; 26+ messages in thread
From: Al Boldi @ 2007-05-03 8:42 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > i'm pleased to announce release -v7 of the CFS scheduler patchset.
> > > (The main goal of CFS is to implement "desktop scheduling" with as
> > > high quality as technically possible.)
> > :
> > :
> > > As usual, any sort of feedback, bugreport, fix and suggestion is
> > > more than welcome,
> >
> > This one seems on par with SD, [...]
>
> excellent :-)
>
> > [...] but there are still some nice issues.
> >
> > Try running 3 chew.c's, then renicing one to -10, starves others for
> > some seconds while switching prio-level. Now renice it back to 10, it
> > starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation only
> occur right when you renice it (when switching prio levels),

Yes.

> and it gets rectified quickly once they get over a few reschedules?

Well, depending on the nice level, this delay may be more than 45 sec.
And in cfs-v8 there is an additional repeating latency blip, akin to an
expiry, when running procs at different nice levels. chew.c shows this
clearly.

> > Also, nice levels are only effective on every other step; ie:
> > ... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.
>
> yeah - this is a first-approximation thing.
>
> Some background: in the upstream scheduler (and in SD) nice levels are
> linearly scaled, while in CFS they are exponentially scaled. I did this
> because i believe exponential is more logical: regardless of which nice
> level a task uses, if it goes +2 nice levels up then it will halve its
> "fair CPU share". So for example the CPU consumption delta between nice
> 0 and nice +10 is 1/32 - and so is the delta between -5 and +5, or -10
> and 0, etc. This makes nice levels _a lot_ more potent than upstream's
> linear approach.

Actually, I think 1/32 for +10 is a bit too strong. Introducing a
scalefactor tunable may be useful. Also, don't you think it reasonable
to lower-bound the timeslices?

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-05-03  7:45 ` Ingo Molnar
  2007-05-03  8:07   ` Ingo Molnar
  2007-05-03  8:42   ` Al Boldi
@ 2007-05-03 15:02   ` Ting Yang
  2007-05-03 15:17     ` Ingo Molnar
  2 siblings, 1 reply; 26+ messages in thread
From: Ting Yang @ 2007-05-03 15:02 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Hi, Ingo

This is the test case that I think is worth discussing, and it led me to
find 2 things.

>> [...] but there are still some nice issues.
>>
>> Try running 3 chew.c's, then renicing one to -10, starves others for
>> some seconds while switching prio-level. Now renice it back to 10, it
>> starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation only
> occur right when you renice it (when switching prio levels), and it gets
> rectified quickly once they get over a few reschedules?

The main problem behind what Al Boldi saw might come from this piece of
code in sched_fair.c, which scales the fair_key difference needed to
preempt the current task:

+static u64
+rescale_load(struct task_struct *p, u64 value)
+{
+	int load_shift = p->load_shift;
+
+	if (load_shift == SCHED_LOAD_SHIFT)
+		return value;
+
+	return (value << load_shift) >> SCHED_LOAD_SHIFT;
+}
+
+static u64
+niced_granularity(struct rq *rq, struct task_struct *curr,
+		  unsigned long granularity)
+{
+	return rescale_load(curr, granularity);
+}

Here is the check for preemption:

+static inline void
+__check_preempt_curr_fair(struct rq *rq, struct task_struct *p,
+			  struct task_struct *curr, unsigned long granularity)
+{
+	s64 __delta = curr->fair_key - p->fair_key;
+
+	/*
+	 * Take scheduling granularity into account - do not
+	 * preempt the current task unless the best task has
+	 * a larger than sched_granularity fairness advantage:
+	 */
+	if (__delta > niced_granularity(rq, curr, granularity))
+		resched_task(curr);
+}

This code now says: the fair_key difference needed to preempt the
current task is amplified by a factor of its weight (in Al Boldi's
example, 32). However, the weighted task also has its p->fair_key
advance scaled down by its weight (also 32 here). The combination of the
two becomes quadratic!

Let's start from three nice 0 tasks p1, p2, p3 at time t=0, with the
granularity set to 5ms. Originally each task executes 5 ms in turn: 5ms
for p1, 5ms for p2, 5ms for p3, 5ms for p1, 5ms for p2, 5ms for p3 ...

If somehow p3 is re-niced to -10 _right before_ the 16th ms, we run into
the worst case after p3 gets the cpu: p1->fair_key = p2->fair_key = 10,
p3->fair_key = 5. Now, in order for p3 to be preempted, it has to make
its fair_key 5 * 32 larger than p1's and p2's fair_key. Furthermore, p3
now has a higher weight, so pushing its fair_key up by 1 takes 32 ms;
thus p3 will stay on the cpu for 5 * 32 * 32 ms, which is about 5
seconds!

Besides this quadratic effect, another minor issue amplifies it a little
further: p->wait_runtime accumulated before the renice is not adjusted
to match the new nice value. The p->wait_runtime earned using the
previous weight has to be paid off using the current weight: renicing to
a larger weight pays off more than needed, renicing to a smaller weight
pays off less, which introduces unfairness.

Ingo has now partially solved this problem by clearing p->wait_runtime
when a task is reniced, but the quadratic effect of the scaling is still
there.

Thanks

Ting
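[Editor's note: the arithmetic above can be checked directly. Units and
numbers follow the example in the mail (5 ms granularity, weight 32);
`preempt_delay_ms` is an illustrative helper, not kernel code.]

```python
# Back-of-envelope check of the quadratic effect described above: the
# preemption threshold on the fair_key is scaled *up* by the weight
# (niced_granularity), while the running task's fair_key advances
# scaled *down* by the same weight -- the wall-clock delay before a
# preemption multiplies both factors.
def preempt_delay_ms(granularity_ms, weight):
    key_threshold = granularity_ms * weight  # rescale_load() on the granularity
    key_per_ms = 1.0 / weight                # weighted fair_key advance
    return key_threshold / key_per_ms        # = granularity_ms * weight**2

print(preempt_delay_ms(5, 32))   # 5 * 32 * 32 = 5120 ms, ~5 seconds
print(preempt_delay_ms(5, 1))    # a weight-1 task: just 5 ms
```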
* Re: [patch] CFS scheduler, -v7
  2007-05-03 15:02 ` Ting Yang
@ 2007-05-03 15:17   ` Ingo Molnar
  2007-05-03 16:00     ` Ting Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 15:17 UTC (permalink / raw)
To: Ting Yang; +Cc: linux-kernel

* Ting Yang <tingy@cs.umass.edu> wrote:

> +	s64 __delta = curr->fair_key - p->fair_key;
> +
> +	/*
> +	 * Take scheduling granularity into account - do not
> +	 * preempt the current task unless the best task has
> +	 * a larger than sched_granularity fairness advantage:
> +	 */
> +	if (__delta > niced_granularity(rq, curr, granularity))
> +		resched_task(curr);
> +}
>
> This code now says: the fair_key difference needed to preempt the
> current task is amplified by a factor of its weight (in Al Boldi's
> example, 32). However, the weighted task also has its p->fair_key
> advance scaled down by its weight (also 32 here). The combination of
> the two becomes quadratic!

it's not quadratic in terms of CPU share: the first factor impacts the
CPU share, the second factor impacts the granularity. This means that
reniced workloads will be preempted in a more finegrained way - but
otherwise there's _no_ quadratic effect for CPU time - which is a
completely separate metric. Remember: there are no timeslices in CFS, so
a task can be preempted any number of times without being at a
disadvantage.

> Besides this quadratic effect, another minor issue amplifies it a
> little further: p->wait_runtime accumulated before. [...]

actually, this 'minor issue' was the main issue that caused the bug ;-)

	Ingo
* Re: [patch] CFS scheduler, -v7
  2007-05-03 15:17 ` Ingo Molnar
@ 2007-05-03 16:00   ` Ting Yang
  2007-05-03 19:48     ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Ting Yang @ 2007-05-03 16:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Hi, Ingo

I wrote that email in a hurry and therefore might not have explained the
problem clearly. However, after carefully reading the code again, I do
think there is a problem in this part. Now I want to try again :-)
Hopefully, this time I will do a better job.

Starting from the following code:

+	if (__delta > niced_granularity(rq, curr, granularity))
+		resched_task(curr);

Suppose "curr" has nice value -10; then curr->load_shift = 15. The
granularity passed into this function is a fixed 2,000,000 (for CFS
-v8). Let's just divide everything by 1,000,000 for simplicity, and say
the granularity used is 2.

Now, we look at how the granularity is rescaled:

+	int load_shift = p->load_shift;
+
+	if (load_shift == SCHED_LOAD_SHIFT)
+		return value;
+
+	return (value << load_shift) >> SCHED_LOAD_SHIFT;

It returns (2 << 15) >> 10 = 2 * 32 = 64, therefore __delta has to be
larger than 64 for the current process to be preempted.

Suppose "curr" executes for 1 tick and a timer interrupt comes. It has
executed about 1,000,000 ns (roughly speaking, since timer interrupts
come 1000 times per second). Since we divided everything by 1,000,000,
this becomes 1 in this discussion. After this execution, how much does
"curr" increment its fair_key? It is weighted: 1/32.

Then how much time is needed for "curr" to build a 2 * 32 difference on
its fair_key, when every 1 ms it updates the fair_key by 1/32?
2 * 32 * 32! On the other hand, for a task with weight 1, the amount of
work needed before preemption is 2 * 1 * 1.

If we have only 2 tasks running, p1 with nice value -10, p2 with nice
value 0, then:

  p1 gets cpu share: (32 * 32) / (32 * 32 + 1 * 1)
  p2 gets cpu share: ( 1 *  1) / (32 * 32 + 1 * 1)

I do see a quadratic effect here. Did I miss anything?

Sorry to bother you again, I just want to help :-)

Thanks a lot!

Ting
* Re: [patch] CFS scheduler, -v7
  2007-05-03 16:00 ` Ting Yang
@ 2007-05-03 19:48   ` Ingo Molnar
  2007-05-03 19:57     ` William Lee Irwin III
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 19:48 UTC (permalink / raw)
To: Ting Yang; +Cc: linux-kernel

* Ting Yang <tingy@cs.umass.edu> wrote:

> then how much time is needed for "curr" to build a 2 * 32 difference
> on fair_key, with every 1 ms it updates fair_key by 1/32 ? 2 * 32 *
> 32 !

yes - but the "*32" impacts the rescheduling granularity, the "/32"
impacts the speed with which the key moves. So the total execution speed
of the nice -10 task is still "*32" that of a nice 0 task - it's just
that not only does it get 32 times more CPU time, it also gets it in 32
times larger chunks at once. But the rescheduling granularity does _not_
impact the CPU share the task gets, so there's no quadratic effect.

but this is really simple to test: boot up CFS, start two infinite
loops, one at nice 0 and one at nice +10, and look at it via "top" (type
's 60' in top to get a really long update interval for precise results).
You won't see quadratically less CPU time used up by the nice +10 task;
you'll see it getting the intended 1/32 share of CPU time.

	Ingo
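[Editor's note: Ingo's point — the chunk size changes, but the long-run
split stays linear in the weights — can also be checked with a toy model.
This is a deliberate simplification of the fair-clock accounting, not
kernel code.]

```python
# Toy fair-clock model: always run the task with the smallest fair_key;
# each tick of CPU advances the running task's key by 1/weight.  The
# resulting CPU split is linear in the weights (about 32:1 here), not
# quadratic, regardless of how large the preemption chunks become.
def simulate(weights, ticks=100_000):
    key = [0.0] * len(weights)
    cpu = [0] * len(weights)
    for _ in range(ticks):
        i = min(range(len(weights)), key=lambda j: key[j])  # pick leftmost key
        key[i] += 1.0 / weights[i]   # weighted fair_key advance
        cpu[i] += 1                  # one tick of CPU consumed
    return cpu

cpu = simulate([32, 1])
print(cpu[0] / cpu[1])   # close to 32, not 32 * 32
```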
* Re: [patch] CFS scheduler, -v7
  2007-05-03 19:48 ` Ingo Molnar
@ 2007-05-03 19:57   ` William Lee Irwin III
  0 siblings, 0 replies; 26+ messages in thread
From: William Lee Irwin III @ 2007-05-03 19:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Ting Yang, linux-kernel, davidel

* Ting Yang <tingy@cs.umass.edu> wrote:
>> then how much time is needed for "curr" to build a 2 * 32 difference
>> on fair_key, with every 1 ms it updates fair_key by 1/32 ? 2 * 32 *
>> 32 !

On Thu, May 03, 2007 at 09:48:27PM +0200, Ingo Molnar wrote:
> yes - but the "*32" impacts the rescheduling granularity, the "/32"
> impacts the speed with which the key moves. So the total execution
> speed of the nice -10 task is still "*32" that of a nice 0 task - it's
> just that not only does it get 32 times more CPU time, it also gets it
> in 32 times larger chunks at once. But the rescheduling granularity
> does _not_ impact the CPU share the task gets, so there's no quadratic
> effect.
> but this is really simple to test: boot up CFS, start two infinite
> loops, one at nice 0 and one at nice +10, and look at it via "top"
> (type 's 60' in top to get a really long update interval for precise
> results). You won't see quadratically less CPU time used up by the
> nice +10 task; you'll see it getting the intended 1/32 share of CPU
> time.

Davide has code to test this more rigorously. Looks like I don't need to
do very much to get a nice test going at all, besides fiddling with
options parsing and maybe a few other things.

-- wli
end of thread, other threads:[~2007-05-04 13:05 UTC | newest]

Thread overview: 26+ messages
2007-04-28 15:25 [patch] CFS scheduler, -v7 Ingo Molnar
2007-04-28 19:20 ` S.Çağlar Onur
2007-04-28 19:24 ` Ingo Molnar
2007-04-28 23:42 ` S.Çağlar Onur
2007-04-29  7:11 ` Ingo Molnar
2007-04-29 12:37 ` S.Çağlar Onur
2007-04-29 15:58 ` Ingo Molnar
2007-04-29 22:29 ` Dennis Brendel
2007-04-30 14:38 ` S.Çağlar Onur
2007-04-28 19:27 ` S.Çağlar Onur
2007-04-29 17:28 ` Prakash Punnoor
2007-05-04 13:05 ` Prakash Punnoor
2007-04-30 16:29 ` Srivatsa Vaddagiri
2007-04-30 18:30 ` Balbir Singh
  -- strict thread matches above, loose matches on Subject: below --
2007-04-30  5:20 Al Boldi
2007-05-03  7:45 ` Ingo Molnar
2007-05-03  8:07 ` Ingo Molnar
2007-05-03 11:16 ` Al Boldi
2007-05-03 12:36 ` Ingo Molnar
2007-05-03 13:49 ` Al Boldi
2007-05-03  8:42 ` Al Boldi
2007-05-03 15:02 ` Ting Yang
2007-05-03 15:17 ` Ingo Molnar
2007-05-03 16:00 ` Ting Yang
2007-05-03 19:48 ` Ingo Molnar
2007-05-03 19:57 ` William Lee Irwin III