* [patch] CFS scheduler, -v7
@ 2007-04-28 15:25 Ingo Molnar
2007-04-28 19:20 ` S.Çağlar Onur
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-04-28 15:25 UTC
To: linux-kernel
Cc: Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
main goal of CFS is to implement "desktop scheduling" with as high
quality as technically possible.)
The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
from the usual place:
http://redhat.com/~mingo/cfs-scheduler/
-v6 got lots of nice feedback, and the list of regressions reported
against -v5 has shrunk considerably. The most user-visible change in -v7
should be a fix for an interactivity problem that could explain the
'audio skipping' problem reported by Kasper Sandberg (the only major
regression reported against -v6). Please re-report any remaining
regressions.
the rate of change is moderate:
15 files changed, 150 insertions(+), 124 deletions(-)
half of that code-flux is due to the removal of the X auto-renice patch
and most of the rest is debugging related. It seems the CFS codebase is
slowly starting to settle down. (-v7 has been test-built and test-booted
on i686 and x86_64 UP and SMP systems.)
Changes since -v6:
- speedup: cache rb_leftmost better (Srivatsa Vaddagiri)
- bugfix: handle Priority Inheritance properly (Thomas Gleixner)
- interactivity fix: tighten up the arithmetic some more.
- feature removal: remove the X auto-renicing feature, CONFIG_BOOST_X.
- debugging feature: introduce the sched_sleep_history_max_ns tunable
to modify sleep-history handling.
- debugging feature: a new /proc/<PID>/sched file contains various useful
scheduler statistics for each task.
- debugging feature: track the maximum amount of time a task has been
waiting to get on the CPU, the maximum amount of time it was blocked
involuntarily and the maximum amount of time it was sleeping
voluntarily.
As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome,
Ingo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch] CFS scheduler, -v7
2007-04-28 15:25 Ingo Molnar
@ 2007-04-28 19:20 ` S.Çağlar Onur
2007-04-28 19:24 ` Ingo Molnar
2007-04-28 19:27 ` S.Çağlar Onur
2007-04-29 17:28 ` Prakash Punnoor
` (2 subsequent siblings)
3 siblings, 2 replies; 26+ messages in thread
From: S.Çağlar Onur @ 2007-04-28 19:20 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
On Saturday 28 April 2007, Ingo Molnar wrote:
> - feature removal: remove the X auto-renicing feature, CONFIG_BOOST_X.
This is the first time I've used CFS without X auto-renicing, and I think it's
not as smooth as before :(
While compiling something (say, alsa-drivers) with just "make", Firefox starts
to scroll slowly, KMail freezes from time to time, applications are not as
responsive under that load as before, and the system is also not as fast as
before when idle. Somehow boot also takes longer:
v6;
Apr 28 16:40:41 (up 10.57) /sbin/mudur.py sysinit
Apr 28 13:40:49 (up 17.75) /sbin/mudur.py boot
Apr 28 13:41:00 (up 28.61) /sbin/mudur.py default
v7;
Apr 29 00:35:49 (up 10.61) /sbin/mudur.py sysinit
Apr 28 21:36:14 (up 33.77) /sbin/mudur.py boot
Apr 28 21:36:16 (up 36.21) /sbin/mudur.py default
And while trying to understand what caused this (or trying to confirm that X
is the problem), I realized Xorg is not the only reniced process in v6:
[caglar@zangetsu][~]> diff -u top.v7 top.v6
...
   PID USER  PR  NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
...
     2 root  RT   0    0   0   0 S  0.0  0.0 0:00.00 migration/0
     3 root  39  19    0   0   0 S  0.0  0.0 0:00.00 ksoftirqd/0
-    4 root  15  -5    0   0   0 S  0.0  0.0 0:00.11 events/0
-    5 root  15  -5    0   0   0 S  0.0  0.0 0:00.01 khelper
-    6 root  15  -5    0   0   0 S  0.0  0.0 0:00.00 kthread
-   26 root  15  -5    0   0   0 S  0.0  0.0 0:00.02 kblockd/0
-   27 root  15  -5    0   0   0 S  0.0  0.0 0:00.00 kacpid
-  124 root  15  -5    0   0   0 S  0.0  0.0 0:00.00 kseriod
-  137 root  15  -5    0   0   0 S  0.0  0.0 0:00.00 kapmd
+    4 root  10 -10    0   0   0 S  0.0  0.0 0:00.01 events/0
+    5 root  10 -10    0   0   0 S  0.0  0.0 0:00.01 khelper
+    6 root  10 -10    0   0   0 S  0.0  0.0 0:00.00 kthread
+   26 root  10 -10    0   0   0 S  0.0  0.0 0:00.03 kblockd/0
+   27 root  10 -10    0   0   0 S  0.0  0.0 0:00.00 kacpid
+  124 root  10 -10    0   0   0 S  0.0  0.0 0:00.00 kseriod
+  137 root  10 -10    0   0   0 S  0.0  0.0 0:00.00 kapmd
...
I haven't yet tried renicing X manually to see if it changes anything in
v7, but I will (urgh, for now I simply boot v6).
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,
If you want more output/info etc., please just say so; I have both v6 and v7
available.
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
2007-04-28 19:20 ` S.Çağlar Onur
@ 2007-04-28 19:24 ` Ingo Molnar
2007-04-28 23:42 ` S.Çağlar Onur
2007-04-28 19:27 ` S.Çağlar Onur
1 sibling, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-04-28 19:24 UTC
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
* S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> If you want more output/info etc., please just say so; I have both v6
> and v7 available.
could you try the auto-renice patch on top of -v7:
http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-auto-renice.patch
does this make it behave like -v6?
Ingo
[-- Attachment #2: sched-cfs-auto-renice.patch --]
[-- Type: text/plain, Size: 6597 bytes --]
---
arch/i386/kernel/ioport.c | 17 ++++++++++++++---
arch/x86_64/kernel/ioport.c | 12 ++++++++++--
drivers/block/loop.c | 5 ++++-
include/linux/sched.h | 7 +++++++
kernel/Kconfig.preempt | 17 +++++++++++++++++
kernel/sched.c | 40 ++++++++++++++++++++++++++++++++++++++++
kernel/workqueue.c | 2 +-
mm/oom_kill.c | 4 +++-
8 files changed, 96 insertions(+), 8 deletions(-)
Index: linux/arch/i386/kernel/ioport.c
===================================================================
--- linux.orig/arch/i386/kernel/ioport.c
+++ linux/arch/i386/kernel/ioport.c
@@ -64,9 +64,17 @@ asmlinkage long sys_ioperm(unsigned long
if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
return -EINVAL;
- if (turn_on && !capable(CAP_SYS_RAWIO))
- return -EPERM;
-
+ if (turn_on) {
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+ /*
+ * Task will be accessing hardware IO ports,
+ * mark it as special with the scheduler too:
+ */
+#ifdef CONFIG_BOOST_X
+ sched_privileged_task(current);
+#endif
+ }
/*
* If it's the first ioperm() call in this thread's lifetime, set the
* IO bitmap up. ioperm() is much less timing critical than clone(),
@@ -145,6 +153,9 @@ asmlinkage long sys_iopl(unsigned long u
if (level > old) {
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
+#ifdef CONFIG_BOOST_X
+ sched_privileged_task(current);
+#endif
}
t->iopl = level << 12;
regs->eflags = (regs->eflags & ~X86_EFLAGS_IOPL) | t->iopl;
Index: linux/arch/x86_64/kernel/ioport.c
===================================================================
--- linux.orig/arch/x86_64/kernel/ioport.c
+++ linux/arch/x86_64/kernel/ioport.c
@@ -41,8 +41,13 @@ asmlinkage long sys_ioperm(unsigned long
if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
return -EINVAL;
- if (turn_on && !capable(CAP_SYS_RAWIO))
- return -EPERM;
+ if (turn_on) {
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+#ifdef CONFIG_BOOST_X
+ sched_privileged_task(current);
+#endif
+ }
/*
* If it's the first ioperm() call in this thread's lifetime, set the
@@ -113,6 +118,9 @@ asmlinkage long sys_iopl(unsigned int le
if (level > old) {
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
+#ifdef CONFIG_BOOST_X
+ sched_privileged_task(current);
+#endif
}
regs->eflags = (regs->eflags &~ X86_EFLAGS_IOPL) | (level << 12);
return 0;
Index: linux/drivers/block/loop.c
===================================================================
--- linux.orig/drivers/block/loop.c
+++ linux/drivers/block/loop.c
@@ -588,7 +588,10 @@ static int loop_thread(void *data)
*/
current->flags |= PF_NOFREEZE;
- set_user_nice(current, -20);
+ /*
+ * The loop thread is important enough to be given a boost:
+ */
+ sched_privileged_task(current);
while (!kthread_should_stop() || lo->lo_bio) {
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -1268,6 +1268,13 @@ static inline int rt_mutex_getprio(struc
#endif
extern void set_user_nice(struct task_struct *p, long nice);
+/*
+ * Task has special privileges, give it more CPU power:
+ */
+extern void sched_privileged_task(struct task_struct *p);
+
+extern int sysctl_sched_privileged_nice_level;
+
extern int task_prio(const struct task_struct *p);
extern int task_nice(const struct task_struct *p);
extern int can_nice(const struct task_struct *p, const int nice);
Index: linux/kernel/Kconfig.preempt
===================================================================
--- linux.orig/kernel/Kconfig.preempt
+++ linux/kernel/Kconfig.preempt
@@ -63,3 +63,20 @@ config PREEMPT_BKL
Say Y here if you are building a kernel for a desktop system.
Say N if you are unsure.
+config BOOST_X
+ bool "Boost X"
+ default y
+ help
+ This option instructs the kernel to guarantee more CPU time to
+ X than to other tasks, which is useful if you want to have a
+ faster desktop even under high system load.
+
+ This option works by automatically boosting X's priority via
+ renicing it to -10. NOTE: CFS does not suffer from
+ "overscheduling" problems when X is reniced to -10, so if this
+ is a predominantly desktop box it makes sense to select this
+ option.
+
+ Say Y here if you are building a kernel for a desktop system.
+ Say N if you want X to be treated as a normal task.
+
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3323,6 +3323,46 @@ out_unlock:
EXPORT_SYMBOL(set_user_nice);
/*
+ * Nice level for privileged tasks. (can be set to 0 for this
+ * to be turned off)
+ */
+int sysctl_sched_privileged_nice_level __read_mostly = -10;
+
+static int __init privileged_nice_level_setup(char *str)
+{
+ sysctl_sched_privileged_nice_level = simple_strtol(str, NULL, 0);
+ return 1;
+}
+__setup("privileged_nice_level=", privileged_nice_level_setup);
+
+/*
+ * Tasks with special privileges call this and gain extra nice
+ * levels:
+ */
+void sched_privileged_task(struct task_struct *p)
+{
+ long new_nice = sysctl_sched_privileged_nice_level;
+ long old_nice = TASK_NICE(p);
+
+ if (new_nice >= old_nice)
+ return;
+ /*
+ * Setting the sysctl to 0 turns off the boosting:
+ */
+ if (unlikely(!new_nice))
+ return;
+
+ if (new_nice < -20)
+ new_nice = -20;
+ else if (new_nice > 19)
+ new_nice = 19;
+
+ set_user_nice(p, new_nice);
+}
+
+EXPORT_SYMBOL(sched_privileged_task);
+
+/*
* can_nice - check if a task can reduce its nice value
* @p: task
* @nice: nice value
Index: linux/kernel/workqueue.c
===================================================================
--- linux.orig/kernel/workqueue.c
+++ linux/kernel/workqueue.c
@@ -355,7 +355,7 @@ static int worker_thread(void *__cwq)
if (!cwq->freezeable)
current->flags |= PF_NOFREEZE;
- set_user_nice(current, -5);
+ sched_privileged_task(current);
/* Block and flush all signals */
sigfillset(&blocked);
Index: linux/mm/oom_kill.c
===================================================================
--- linux.orig/mm/oom_kill.c
+++ linux/mm/oom_kill.c
@@ -293,7 +293,9 @@ static void __oom_kill_task(struct task_
* all the memory it needs. That way it should be able to
* exit() and clear out its resources quickly...
*/
- p->time_slice = HZ;
+ if (p->policy == SCHED_NORMAL || p->policy == SCHED_BATCH)
+ sched_privileged_task(p);
+
set_tsk_thread_flag(p, TIF_MEMDIE);
force_sig(SIGKILL, p);
* Re: [patch] CFS scheduler, -v7
2007-04-28 19:20 ` S.Çağlar Onur
2007-04-28 19:24 ` Ingo Molnar
@ 2007-04-28 19:27 ` S.Çağlar Onur
1 sibling, 0 replies; 26+ messages in thread
From: S.Çağlar Onur @ 2007-04-28 19:27 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
On Saturday 28 April 2007, S.Çağlar Onur wrote:
> Somehow boot also takes longer:
>
> v6;
> Apr 28 16:40:41 (up 10.57) /sbin/mudur.py sysinit
> Apr 28 13:40:49 (up 17.75) /sbin/mudur.py boot
> Apr 28 13:41:00 (up 28.61) /sbin/mudur.py default
>
> v7;
> Apr 29 00:35:49 (up 10.61) /sbin/mudur.py sysinit
> Apr 28 21:36:14 (up 33.77) /sbin/mudur.py boot
> Apr 28 21:36:16 (up 36.21) /sbin/mudur.py default
The values in parentheses (the second field) are the correct ones; the
date/time gets screwed up because of my BIOS :(
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
2007-04-28 19:24 ` Ingo Molnar
@ 2007-04-28 23:42 ` S.Çağlar Onur
2007-04-29 7:11 ` Ingo Molnar
0 siblings, 1 reply; 26+ messages in thread
From: S.Çağlar Onur @ 2007-04-28 23:42 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
On Saturday 28 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > If you want more output/info etc., please just say so; I have both v6
> > and v7 available.
>
> could you try the auto-renice patch on top of -v7:
>
> http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-auto-renice.patch
>
> does this make it behave like -v6?
Ingo, please ignore my first report until I find a proper way to reproduce
the slowness, because currently CFS-v7, CFS-v7 + "renice patch", CFS-v7 +
renice + your private-mail suggestions, and CFS-v6 + "PI support for futexes
patch" all seem to work equally well (which is a good thing: X renicing
really seems unnecessary, and there was no regression outside my daydreams),
or I'm too tired to tell the difference.
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
2007-04-28 23:42 ` S.Çağlar Onur
@ 2007-04-29 7:11 ` Ingo Molnar
2007-04-29 12:37 ` S.Çağlar Onur
0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-04-29 7:11 UTC
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
* S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> Ingo, please ignore my first report until I find a proper way to
> reproduce the slowness, because currently CFS-v7, CFS-v7 + "renice
> patch", CFS-v7 + renice + your private-mail suggestions, and CFS-v6 +
> "PI support for futexes patch" all seem to work equally well (which is
> a good thing: X renicing really seems unnecessary, [...]
oh, good!
> [...] and there was no regression outside my daydreams) or I'm too
> tired to tell the difference.
could the CPU have dropped speed for that bootup (some CPUs do that
automatically upon overheating), or perhaps if you are using some RAID
array, could it have done a background resync? Especially the bootup
slowdown you saw seemed significant, and because bootup speed is 90% IO
dominated, the CPU scheduler seems an unlikely candidate.
Ingo
* Re: [patch] CFS scheduler, -v7
2007-04-29 7:11 ` Ingo Molnar
@ 2007-04-29 12:37 ` S.Çağlar Onur
2007-04-29 15:58 ` Ingo Molnar
0 siblings, 1 reply; 26+ messages in thread
From: S.Çağlar Onur @ 2007-04-29 12:37 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
On Sunday 29 April 2007, Ingo Molnar wrote:
> > [...] and there was no regression outside my daydreams) or I'm too
> > tired to tell the difference.
>
> could the CPU have dropped speed for that bootup (some CPUs do that
> automatically upon overheating), or perhaps if you are using some RAID
> array, could it have done a background resync? Especially the bootup
> slowdown you saw seemed significant, and because bootup speed is 90% IO
> dominated, the CPU scheduler seems an unlikely candidate.
It could be an overheating problem, but if so, this is the first time it has
happened :). And I don't have any RAID array, just a SAMSUNG HM120JC IDE disk
in a SONY VAIO FS-215B.
I just booted plain CFSv7 and the boot time seems normal:
Apr 29 15:02:54 (up 10.72) /sbin/mudur.py sysinit
Apr 29 15:03:02 (up 17.26) /sbin/mudur.py boot
Apr 29 15:03:06 (up 21.34) /sbin/mudur.py default
I'll report back if I can find any reproducible problem; so far CFSv7 works as
expected :)
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
2007-04-29 12:37 ` S.Çağlar Onur
@ 2007-04-29 15:58 ` Ingo Molnar
2007-04-29 22:29 ` Dennis Brendel
2007-04-30 14:38 ` S.Çağlar Onur
0 siblings, 2 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-04-29 15:58 UTC
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
* S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> I'll report back if I can find any reproducible problem; so far CFSv7
> works as expected :)
ok :)
Ingo
* Re: [patch] CFS scheduler, -v7
2007-04-28 15:25 Ingo Molnar
2007-04-28 19:20 ` S.Çağlar Onur
@ 2007-04-29 17:28 ` Prakash Punnoor
2007-05-04 13:05 ` Prakash Punnoor
2007-04-30 16:29 ` Srivatsa Vaddagiri
2007-04-30 18:30 ` Balbir Singh
3 siblings, 1 reply; 26+ messages in thread
From: Prakash Punnoor @ 2007-04-29 17:28 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, caglar, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
On Saturday 28 April 2007, Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
>
> The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> from the usual place:
I made a quick test with the AC3 encoder aften and compared against RSDL 0.36
(I think that was the version). The time was slightly worse: >5.9 sec (with
two threads on an Athlon64 X2, x86_64). Mainline gives me 5.4 sec; RSDL took
5.8 sec. I haven't tested earlier CFS versions or later SD versions. Could it
be that the scheduler is being optimized more for single-core than for SMP?
http://aften.sourceforge.net/
Cheers,
--
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V
* Re: [patch] CFS scheduler, -v7
2007-04-29 15:58 ` Ingo Molnar
@ 2007-04-29 22:29 ` Dennis Brendel
2007-04-30 14:38 ` S.Çağlar Onur
1 sibling, 0 replies; 26+ messages in thread
From: Dennis Brendel @ 2007-04-29 22:29 UTC
To: Ingo Molnar
Cc: S.Çağlar Onur, linux-kernel, Linus Torvalds,
Andrew Morton, Con Kolivas, Nick Piggin, Mike Galbraith,
Arjan van de Ven, Peter Williams, Thomas Gleixner, Willy Tarreau,
Gene Heskett, Mark Lord, Zach Carter, Kasper Sandberg,
Srivatsa Vaddagiri
On Sunday 29 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > I'll report back if I can find any reproducible problem; so far CFSv7
> > works as expected :)
>
> ok :)
>
> Ingo
One small regression: I worked for hours with CFS v7 as the scheduler while
listening to music (amarok with the xine engine), and the music stuttered 3
times in 5 hours. Afterwards I watched a movie, and it stuttered twice as
well. But when I watched the previously stuttering scenes again, there was no
stuttering.
It's not directly reproducible; it happens randomly.
* Re: [patch] CFS scheduler, -v7
@ 2007-04-30 5:20 Al Boldi
2007-05-03 7:45 ` Ingo Molnar
0 siblings, 1 reply; 26+ messages in thread
From: Al Boldi @ 2007-04-30 5:20 UTC
To: linux-kernel
Ingo Molnar wrote:
>
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
:
:
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,
This one seems on par with SD, but there are still some nice-level issues.
Try running 3 instances of chew.c, then renice one of them to -10: the others
starve for some seconds while it switches prio level. Now renice it back to
10, and it starves for up to 45 sec.
Also, nice levels are only effective on every other step, i.e.:
... -3/-2, -1/0, 1/2 ... yields only 20 instead of 40 prio levels.
Thanks!
--
Al
* Re: [patch] CFS scheduler, -v7
2007-04-29 15:58 ` Ingo Molnar
2007-04-29 22:29 ` Dennis Brendel
@ 2007-04-30 14:38 ` S.Çağlar Onur
1 sibling, 0 replies; 26+ messages in thread
From: S.Çağlar Onur @ 2007-04-30 14:38 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
Hi Ingo,
On Sunday 29 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > I'll report back if I can find any reproducible problem; so far CFSv7
> > works as expected :)
After two full days of use, I still can't reproduce the previous problems. I
even managed to burn a DVD with k3b while one console was compiling a kernel,
another was checking out svn repos, amarok was playing music, VirtualBox was
booting another distro, and Firefox had 3 tabs open :). And I must say the
system was still usable at that time; the only thing that happened was a
slight Firefox slowdown, which is really negligible, because with mainline
Firefox (and the whole system) becomes really unusable if I fire up a guest
in VirtualBox.
And lastly, boot time is still around 20 sec, which is expected:
Apr 29 01:39:04 (up 10.48) /sbin/mudur.py sysinit
Apr 29 01:39:13 (up 18.83) /sbin/mudur.py boot
Apr 29 01:39:18 (up 23.29) /sbin/mudur.py default
...
Apr 29 15:02:54 (up 10.72) /sbin/mudur.py sysinit
Apr 29 15:03:02 (up 17.26) /sbin/mudur.py boot
Apr 29 15:03:06 (up 21.34) /sbin/mudur.py default
...
Apr 30 14:21:56 (up 22.80) /sbin/mudur.py default
Apr 30 14:28:34 (up 15.67) /sbin/mudur.py sysinit
Apr 30 14:28:40 (up 21.02) /sbin/mudur.py boot
...
Apr 30 14:33:26 (up 10.59) /sbin/mudur.py sysinit
Apr 30 14:33:32 (up 15.97) /sbin/mudur.py boot
Apr 30 14:33:35 (up 18.69) /sbin/mudur.py default
...
Apr 30 17:02:09 (up 10.70) /sbin/mudur.py sysinit
Apr 30 17:02:16 (up 17.15) /sbin/mudur.py boot
Apr 30 17:02:19 (up 20.16) /sbin/mudur.py default
So for me, there is no problem at all :)...
Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/
Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
2007-04-28 15:25 Ingo Molnar
2007-04-28 19:20 ` S.Çağlar Onur
2007-04-29 17:28 ` Prakash Punnoor
@ 2007-04-30 16:29 ` Srivatsa Vaddagiri
2007-04-30 18:30 ` Balbir Singh
3 siblings, 0 replies; 26+ messages in thread
From: Srivatsa Vaddagiri @ 2007-04-30 16:29 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, caglar, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod
On Sat, Apr 28, 2007 at 05:25:39PM +0200, Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
+unsigned int sysctl_sched_granularity __read_mostly = 2000000;
Any reason why this tunable can't be 64-bit?
--
Regards,
vatsa
* Re: [patch] CFS scheduler, -v7
2007-04-28 15:25 Ingo Molnar
` (2 preceding siblings ...)
2007-04-30 16:29 ` Srivatsa Vaddagiri
@ 2007-04-30 18:30 ` Balbir Singh
3 siblings, 0 replies; 26+ messages in thread
From: Balbir Singh @ 2007-04-30 18:30 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, caglar, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
>
> The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> from the usual place:
>
> http://redhat.com/~mingo/cfs-scheduler/
>
Hi, Ingo,
I needed the following fixes on my powerpc box to silence all the compiler
warnings generated during compilation. Without these fixes, I was seeing
negative values in /proc/sched_debug on my box. I still see a negative value
for "waiting":
            task   PID     tree-key       delta    waiting  switches  prio   wstart-fair      sum-exec      sum-wait
-------------------------------------------------------------------------------------------------------------------
R           bash  5594  69839216478     6238568   -6238568      1617   120  -69832977910   66377151352      27173793
I've started on CFS late (with -v7); hopefully I'll catch up.
More questions and feedback will follow.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
[-- Attachment #2: cfs-v7-fix-sched-debug-warnings --]
[-- Type: text/plain, Size: 3498 bytes --]
Index: linux-2.6.21/kernel/sched_debug.c
===================================================================
--- linux-2.6.21.orig/kernel/sched_debug.c 2007-04-30 23:17:10.000000000 +0530
+++ linux-2.6.21/kernel/sched_debug.c 2007-04-30 23:49:40.000000000 +0530
@@ -45,13 +45,13 @@
SEQ_printf(m, "%14s %5d %12Ld %11Ld %10Ld %9Ld %5d "
"%13Ld %13Ld %13Ld\n",
p->comm, p->pid,
- p->fair_key, p->fair_key - rq->fair_clock,
- p->wait_runtime,
- p->nr_switches,
+ (long long)p->fair_key, (long long)p->fair_key - rq->fair_clock,
+ (long long)p->wait_runtime,
+ (long long)p->nr_switches,
p->prio,
- p->wait_start_fair - rq->fair_clock,
- p->sum_exec_runtime,
- p->sum_wait_runtime);
+ (long long)p->wait_start_fair - rq->fair_clock,
+ (long long)p->sum_exec_runtime,
+ (long long)p->sum_wait_runtime);
}
static void print_rq(struct seq_file *m, struct rq *rq, u64 now)
@@ -83,7 +83,7 @@
SEQ_printf(m, "\ncpu: %d\n", cpu);
#define P(x) \
- SEQ_printf(m, " .%-22s: %Ld\n", #x, (u64)(rq->x))
+ SEQ_printf(m, " .%-22s: %Lu\n", #x, (unsigned long long)(rq->x))
P(nr_running);
P(raw_weighted_load);
@@ -110,7 +110,7 @@
int cpu;
SEQ_printf(m, "Sched Debug Version: v0.02\n");
- SEQ_printf(m, "now at %Ld nsecs\n", (unsigned long long)now);
+ SEQ_printf(m, "now at %Lu nsecs\n", (unsigned long long)now);
for_each_online_cpu(cpu)
print_cpu(m, cpu, now);
Index: linux-2.6.21/kernel/sched.c
===================================================================
--- linux-2.6.21.orig/kernel/sched.c 2007-04-30 23:42:04.000000000 +0530
+++ linux-2.6.21/kernel/sched.c 2007-04-30 23:49:44.000000000 +0530
@@ -229,7 +229,7 @@
unsigned long long t0, t1;
#define P(F) \
- buffer += sprintf(buffer, "%-25s:%20Ld\n", #F, p->F)
+ buffer += sprintf(buffer, "%-25s:%20Ld\n", #F, (long long)p->F)
P(wait_start);
P(wait_start_fair);
@@ -248,22 +248,22 @@
t0 = sched_clock();
t1 = sched_clock();
- buffer += sprintf(buffer, "%-25s:%20Ld\n", "clock-delta", t1-t0);
- buffer += sprintf(buffer, "%-25s:%20Ld\n",
- "rq-wait_runtime", this_rq->wait_runtime);
- buffer += sprintf(buffer, "%-25s:%20Ld\n",
- "rq-fair_clock", this_rq->fair_clock);
- buffer += sprintf(buffer, "%-25s:%20Ld\n",
- "rq-clock", this_rq->clock);
- buffer += sprintf(buffer, "%-25s:%20Ld\n",
- "rq-prev_clock_raw", this_rq->prev_clock_raw);
- buffer += sprintf(buffer, "%-25s:%20Ld\n",
- "rq-clock_max_delta", this_rq->clock_max_delta);
- buffer += sprintf(buffer, "%-25s:%20u\n",
- "rq-clock_warps", this_rq->clock_warps);
- buffer += sprintf(buffer, "%-25s:%20u\n",
- "rq-clock_unstable_events",
- this_rq->clock_unstable_events);
+ buffer += sprintf(buffer, "%-25s:%20Ld\n", "clock-delta",
+ (long long)t1-t0);
+ buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-wait_runtime",
+ (long long)this_rq->wait_runtime);
+ buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-fair_clock",
+ (long long)this_rq->fair_clock);
+ buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-clock",
+ (long long)this_rq->clock);
+ buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-prev_clock_raw",
+ (long long)this_rq->prev_clock_raw);
+ buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-clock_max_delta",
+ (long long)this_rq->clock_max_delta);
+ buffer += sprintf(buffer, "%-25s:%20u\n", "rq-clock_warps",
+ this_rq->clock_warps);
+ buffer += sprintf(buffer, "%-25s:%20u\n", "rq-clock_unstable_events",
+ this_rq->clock_unstable_events);
return buffer;
}
* Re: [patch] CFS scheduler, -v7
2007-04-30 5:20 [patch] CFS scheduler, -v7 Al Boldi
@ 2007-05-03 7:45 ` Ingo Molnar
2007-05-03 8:07 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 7:45 UTC
To: Al Boldi; +Cc: linux-kernel
* Al Boldi <a1426z@gawab.com> wrote:
> > i'm pleased to announce release -v7 of the CFS scheduler patchset.
> > (The main goal of CFS is to implement "desktop scheduling" with as
> > high quality as technically possible.)
> :
> :
> > As usual, any sort of feedback, bugreport, fix and suggestion is
> > more than welcome,
>
> This one seems on par with SD, [...]
excellent :-)
> [...] but there are still some nice issues.
>
> Try running 3 chew.c's, then renicing one to -10, starves others for
> some seconds while switching prio-level. Now renice it back to 10, it
> starves for up to 45sec.
ok - to make sure i understood you correctly: does this starvation only
occur right when you renice it (when switching prio levels), and it gets
rectified quickly once they get over a few reschedules?
> Also, nice levels are only effective on every other step; ie:
> ... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.
yeah - this is a first-approximation thing.
Some background: in the upstream scheduler (and in SD) nice levels are
linearly scaled, while in CFS they are exponentially scaled. I did this
because i believe exponential is more logical: regardless of which nice
level a task uses, if it goes +2 nice levels up then it will halve its
"fair CPU share". So for example the CPU consumption delta between nice
0 and nice +10 is 1/32 - and so is the delta between -5 and +5, -10 and
0, etc. This makes nice levels _a lot_ more potent than upstream's
linear approach.
Ingo
* Re: [patch] CFS scheduler, -v7
2007-05-03 7:45 ` Ingo Molnar
@ 2007-05-03 8:07 ` Ingo Molnar
2007-05-03 11:16 ` Al Boldi
2007-05-03 8:42 ` Al Boldi
2007-05-03 15:02 ` Ting Yang
2 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 8:07 UTC
To: Al Boldi; +Cc: linux-kernel
* Ingo Molnar <mingo@elte.hu> wrote:
> > [...] but there are still some nice issues.
> >
> > Try running 3 chew.c's, then renicing one to -10, starves others for
> > some seconds while switching prio-level. Now renice it back to 10,
> > it starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation
> only occur right when you renice it (when switching prio levels), and
> it gets rectified quickly once they get over a few reschedules?
meanwhile i managed to reproduce it by following the exact steps you
described, and i've fixed the bug in my tree. Can you confirm that the
patch below fixes it for you too?
Ingo
----------------->
From: Ingo Molnar <mingo@elte.hu>
Subject: [patch] sched, cfs: fix starvation upon nice level switching
Al Boldi reported the following bug: when switching a CPU-intensive task's
nice level, tasks can get unfairly starved right after the priority level
switch. The bug was that when load_weight changed, the
->wait_runtime value did not get rescaled. So clear wait_runtime when
switching nice levels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -575,6 +580,7 @@ static void set_load_weight(struct task_
{
p->load_shift = get_load_shift(p);
p->load_weight = 1 << p->load_shift;
+ p->wait_runtime = 0;
}
static inline void
* Re: [patch] CFS scheduler, -v7
2007-05-03 7:45 ` Ingo Molnar
2007-05-03 8:07 ` Ingo Molnar
@ 2007-05-03 8:42 ` Al Boldi
2007-05-03 15:02 ` Ting Yang
2 siblings, 0 replies; 26+ messages in thread
From: Al Boldi @ 2007-05-03 8:42 UTC
To: Ingo Molnar; +Cc: linux-kernel
Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > i'm pleased to announce release -v7 of the CFS scheduler patchset.
> > > (The main goal of CFS is to implement "desktop scheduling" with as
> > > high quality as technically possible.)
> > >
> > >
> > > As usual, any sort of feedback, bugreport, fix and suggestion is
> > > more than welcome,
> >
> > This one seems on par with SD, [...]
>
> excellent :-)
>
> > [...] but there are still some nice issues.
> >
> > Try running 3 chew.c's, then renicing one to -10, starves others for
> > some seconds while switching prio-level. Now renice it back to 10, it
> > starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation only
> occur right when you renice it (when switching prio levels),
Yes.
> and it gets
> rectified quickly once they get over a few reschedules?
Well, depending on nice level this delay may be more than 45sec.
And, in cfs-v8 there is an additional repeating latency blip akin to an
expiry when running procs at different nice levels. chew.c shows this
clearly.
> > Also, nice levels are only effective on every other step; ie:
> > ... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.
>
> yeah - this is a first-approximation thing.
>
> Some background: in the upstream scheduler (and in SD) nice levels are
> linearly scaled, while in CFS they are exponentially scaled. I did this
> because i believe exponential is more logical: regardless of which nice
> level a task uses, if it goes +2 nice levels up then it will halve its
> "fair CPU share". So for example the CPU consumption delta between nice
> 0 and nice +10 is 1/32 - and so is the delta between -5 and +5, -10 and
> 0, etc. This makes nice levels _a lot_ more potent than upstream's
> linear approach.
Actually, I think 1/32 for +10 is a bit too strong. Introducing a scale-factor
tunable may be useful.
Also, don't you think it would be reasonable to lower-bound the timeslices?
Thanks!
--
Al
* Re: [patch] CFS scheduler, -v7
2007-05-03 8:07 ` Ingo Molnar
@ 2007-05-03 11:16 ` Al Boldi
2007-05-03 12:36 ` Ingo Molnar
0 siblings, 1 reply; 26+ messages in thread
From: Al Boldi @ 2007-05-03 11:16 UTC
To: Ingo Molnar; +Cc: linux-kernel
Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > > [...] but there are still some nice issues.
> > >
> > > Try running 3 chew.c's, then renicing one to -10, starves others for
> > > some seconds while switching prio-level. Now renice it back to 10,
> > > it starves for up to 45sec.
> >
> > ok - to make sure i understood you correctly: does this starvation
> > only occur right when you renice it (when switching prio levels), and
> > it gets rectified quickly once they get over a few reschedules?
>
> meanwhile i managed to reproduce it by following the exact steps you
> described, and i've fixed the bug in my tree. Can you confirm that the
> patch below fixes it for you too?
Seems like this fixed it. But I can still see these awful latency blips in
the presence of a negatively niced chew.c at -10 and two chew.c's at nice 0.
This gets really bad when sched_granularity_ns >= 5,000,000.
Thanks!
--
Al
* Re: [patch] CFS scheduler, -v7
2007-05-03 11:16 ` Al Boldi
@ 2007-05-03 12:36 ` Ingo Molnar
2007-05-03 13:49 ` Al Boldi
0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 12:36 UTC
To: Al Boldi; +Cc: linux-kernel
* Al Boldi <a1426z@gawab.com> wrote:
> [...] But I can still see these awful latency blips in the presence of
> negatively niced chew.c at -10 and two chew.c's at nice 0. [...]
of course: you asked for the two chew's to be treated like that and CFS
delivered it! :-)
nice -10 means the two chew's will get ~90+% of the CPU time, and all
other nice 0 tasks will get <10% of CPU time.
in the previous mail i have described the new exponential-scale nice
levels that CFS introduces. In practice this means that vanilla kernel's
nice -20 level is roughly equivalent to CFS's nice -6. CFS's nice -10
would be roughly equivalent to vanilla nice -80 (if it existed).
Ingo
* Re: [patch] CFS scheduler, -v7
2007-05-03 12:36 ` Ingo Molnar
@ 2007-05-03 13:49 ` Al Boldi
0 siblings, 0 replies; 26+ messages in thread
From: Al Boldi @ 2007-05-03 13:49 UTC
To: Ingo Molnar; +Cc: linux-kernel
Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > [...] But I can still see these awful latency blips in the presence of
> > negatively niced chew.c at -10 and two chew.c's at nice 0. [...]
>
> of course: you asked for the two chew's to be treated like that and CFS
> delivered it! :-)
>
> nice -10 means the two chew's will get ~90+% of the CPU time, and all
> other nice 0 tasks will get <10% of CPU time.
Yes, but the latencies fluctuate wildly from 5ms to
sched_granularity_ns*1000. Isn't it possible to smooth this?
Thanks!
--
Al
* Re: [patch] CFS scheduler, -v7
2007-05-03 7:45 ` Ingo Molnar
2007-05-03 8:07 ` Ingo Molnar
2007-05-03 8:42 ` Al Boldi
@ 2007-05-03 15:02 ` Ting Yang
2007-05-03 15:17 ` Ingo Molnar
2 siblings, 1 reply; 26+ messages in thread
From: Ting Yang @ 2007-05-03 15:02 UTC
To: Ingo Molnar; +Cc: linux-kernel
Hi, Ingo
This is a test case that I think is worth discussing, and it led me to
find two things.
>> [...] but there are still some nice issues.
>>
>> Try running 3 chew.c's, then renicing one to -10, starves others for
>> some seconds while switching prio-level. Now renice it back to 10, it
>> starves for up to 45sec.
>>
>
> ok - to make sure i understood you correctly: does this starvation only
> occur right when you renice it (when switching prio levels), and it gets
> rectified quickly once they get over a few reschedules?
The main problem Al Boldi saw might come from this piece of code
in sched_fair.c, which scales the fair_key difference needed to preempt
the current task:
+rescale_load(struct task_struct *p, u64 value)
+{
+ int load_shift = p->load_shift;
+
+ if (load_shift == SCHED_LOAD_SHIFT)
+ return value;
+
+ return (value << load_shift) >> SCHED_LOAD_SHIFT;
+}
+
+static u64
+niced_granularity(struct rq *rq, struct task_struct *curr,
+ unsigned long granularity)
+{
+ return rescale_load(curr, granularity);
+}
Here is the preemption check:
+static inline void
+__check_preempt_curr_fair(struct rq *rq, struct task_struct *p,
+ struct task_struct *curr, unsigned long granularity)
+{
+ s64 __delta = curr->fair_key - p->fair_key;
+
+ /*
+ * Take scheduling granularity into account - do not
+ * preempt the current task unless the best task has
+ * a larger than sched_granularity fairness advantage:
+ */
+ if (__delta > niced_granularity(rq, curr, granularity))
+ resched_task(curr);
+}
This code now effectively says that the fair_key difference needed to
preempt the current task is amplified by a factor of its weight (32 in Al
Boldi's example). However, the weighted task already advances its
p->fair_key scaled down by its weight (also 32 here). The combination of
the two becomes quadratic!
Let's start from three nice 0 tasks p1, p2, p3 at time t=0, with
niced_granularity set to 5ms.
Originally each task executes 5 ms in turn:
5ms for p1, 5ms for p2, 5ms for p3, 5ms for p1, 5ms for
p2, 5ms for p3 ...
If p3 happens to be reniced to -10 _right before_ the 16th ms, we run
into the worst case after p3 gets the CPU:
p1->fair_key = p2->fair_key = 10, p3->fair_key = 5.
Now, in order for p3 to be preempted, its fair_key has to become 5 * 32
larger than p1's and p2's fair_key. Furthermore, p3 now has a higher
weight, so increasing its fair_key by 1 now takes 32ms;
thus p3 will stay on the CPU for 5 * 32 * 32ms, which is about 5 seconds!
Besides this quadratic effect, another minor issue amplifies this a
little further: the previously accumulated p->wait_runtime. During renice
it is not adjusted to match the new nice value, so the p->wait_runtime
earned with the previous weight has to be paid off with the current
weight. If a task is reniced to a larger weight it pays more than it
needs to; otherwise it pays less, which introduces unfairness.
Ingo has now partially solved this problem by clearing
p->wait_runtime when a task is reniced, but the quadratic effect of
scaling is still there.
Thanks
Ting
* Re: [patch] CFS scheduler, -v7
2007-05-03 15:02 ` Ting Yang
@ 2007-05-03 15:17 ` Ingo Molnar
2007-05-03 16:00 ` Ting Yang
0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 15:17 UTC
To: Ting Yang; +Cc: linux-kernel
* Ting Yang <tingy@cs.umass.edu> wrote:
> + s64 __delta = curr->fair_key - p->fair_key;
> +
> + /*
> + * Take scheduling granularity into account - do not
> + * preempt the current task unless the best task has
> + * a larger than sched_granularity fairness advantage:
> + */
> + if (__delta > niced_granularity(rq, curr, granularity))
> + resched_task(curr);
> +}
>
> This code actually now says, the difference of fair_key needed to
> preempt the current task is amplified by a factor of its weight (in Al
> Boldi's example 32). However, the weighted task already advances its
> p->fair_key by its weight, (also 32 here). The combination of them
> becomes quadratic!
it's not quadratic in terms of CPU share: the first factor impacts the
CPU share, the second factor impacts the granularity. This means that
reniced workloads will be preempted in a more fine-grained way - but
otherwise there's _no_ quadratic effect for CPU time - which is a
completely separate metric. Remember: there are no timeslices in CFS, so
a task can be preempted any number of times without being at a
disadvantage.
> Besides this quadratic effect, another minor issue amplified this
> a little bit further: p->wait_runtime accumulated before. [...]
actually, this 'minor issue' was the main issue that caused the bug ;-)
Ingo
* Re: [patch] CFS scheduler, -v7
2007-05-03 15:17 ` Ingo Molnar
@ 2007-05-03 16:00 ` Ting Yang
2007-05-03 19:48 ` Ingo Molnar
0 siblings, 1 reply; 26+ messages in thread
From: Ting Yang @ 2007-05-03 16:00 UTC
To: Ingo Molnar; +Cc: linux-kernel
Hi, Ingo
I wrote that email in a hurry and might not have explained the
problem clearly. However, after carefully reading the code again, I do
think there is a problem in this part. Let me try again :-)
Hopefully this time I will do a better job.
Starting from the following code:
+ if (__delta > niced_granularity(rq, curr, granularity))
+ resched_task(curr);
Suppose "curr" has nice value -10; then curr->load_shift = 15.
The granularity passed into this function is
fixed at 2,000,000 (for CFS -v8). For simplicity, let's divide everything
by 1,000,000, so the granularity used is 2.
Now, we look at how granularity is rescaled:
+ int load_shift = p->load_shift;
+
+ if (load_shift == SCHED_LOAD_SHIFT)
+ return value;
+
+ return (value << load_shift) >> SCHED_LOAD_SHIFT;
it returns (2 << 15) >> 10 = 2 * 32 = 64, therefore __delta has to be
larger than 64 so that the current process can be preempted.
Suppose "curr" executes for 1 tick before a timer interrupt comes. It
executes for about 1,000,000 ns (roughly speaking, since timer interrupts
come 1000 times per second). Since we divided everything by 1,000,000, that
becomes 1 in this discussion. After this execution, by how much does
"curr" increment its fair_key?
It is weighted: 1/32.
Then how much time does "curr" need to build up a 2 * 32 difference in
fair_key, if every 1 ms it advances fair_key by 1/32? 2 * 32 * 32 ms!
On the other hand, for a task with nice value 0, the amount of work needed
for preemption is 2 * 1 * 1.
If we have only 2 tasks running, p1 with nice value -10 and p2 with nice
value 0:
p1 gets CPU share: (32 * 32) / (32 * 32 + 1 * 1)
p2 gets CPU share: (1 * 1) / (32 * 32 + 1 * 1)
I do see a quadratic effect here. Did I miss anything? Sorry to bother
you again, I just want to help :-)
Thanks a lot !
Ting
* Re: [patch] CFS scheduler, -v7
2007-05-03 16:00 ` Ting Yang
@ 2007-05-03 19:48 ` Ingo Molnar
2007-05-03 19:57 ` William Lee Irwin III
0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 19:48 UTC
To: Ting Yang; +Cc: linux-kernel
* Ting Yang <tingy@cs.umass.edu> wrote:
> then how much time is needed for "curr" to build a 2 * 32 difference
> on fair_key, with every 1 ms it updates fair_key by 1/32 ? 2 * 32 *
> 32 !
yes - but the "*32" impacts the rescheduling granularity, the "/32"
impacts the speed of how the key moves. So the total execution speed of
the nice -10 task is still "*32" of a nice 0 task - it's just that not
only does it get 32 times more CPU time, it also gets it in 32 times larger
chunks at once. But the rescheduling granularity does _not_ impact the
CPU share the task gets, so there's no quadratic effect.
but this is really simple to test: boot up CFS, start two infinite
loops, one at nice 0 and one at nice +10 and look at it via "top" and
type 's 60' in top to get a really long update interval for precise
results. You won't see quadratically less CPU time used up by the nice
+10 task, you'll see it getting the intended 1/32 share of CPU time.
Ingo
* Re: [patch] CFS scheduler, -v7
2007-05-03 19:48 ` Ingo Molnar
@ 2007-05-03 19:57 ` William Lee Irwin III
0 siblings, 0 replies; 26+ messages in thread
From: William Lee Irwin III @ 2007-05-03 19:57 UTC
To: Ingo Molnar; +Cc: Ting Yang, linux-kernel, davidel
* Ting Yang <tingy@cs.umass.edu> wrote:
>> then how much time is needed for "curr" to build a 2 * 32 difference
>> on fair_key, with every 1 ms it updates fair_key by 1/32 ? 2 * 32 *
>> 32 !
On Thu, May 03, 2007 at 09:48:27PM +0200, Ingo Molnar wrote:
> yes - but the "*32" impacts the rescheduling granularity, the "/32"
> impacts the speed of how the key moves. So the total execution speed of
> the nice -10 task is still "*32" of a nice 0 task - it's just that not
> only does it get 32 times more CPU time, it also gets it in 32 times larger
> chunks at once. But the rescheduling granularity does _not_ impact the
> CPU share the task gets, so there's no quadratic effect.
> but this is really simple to test: boot up CFS, start two infinite
> loops, one at nice 0 and one at nice +10 and look at it via "top" and
> type 's 60' in top to get a really long update interval for precise
> results. You won't see quadratically less CPU time used up by the nice
> +10 task, you'll see it getting the intended 1/32 share of CPU time.
Davide has code to test this more rigorously. Looks like I don't need
to do very much to get a nice test going at all, besides fiddling with
options parsing and maybe a few other things.
-- wli
* Re: [patch] CFS scheduler, -v7
2007-04-29 17:28 ` Prakash Punnoor
@ 2007-05-04 13:05 ` Prakash Punnoor
0 siblings, 0 replies; 26+ messages in thread
From: Prakash Punnoor @ 2007-05-04 13:05 UTC
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas,
Nick Piggin, Mike Galbraith, Arjan van de Ven, Peter Williams,
Thomas Gleixner, caglar, Willy Tarreau, Gene Heskett, Mark Lord,
Zach Carter, Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
On Sunday 29 April 2007, Prakash Punnoor wrote:
> On Saturday 28 April 2007, Ingo Molnar wrote:
> > i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> > main goal of CFS is to implement "desktop scheduling" with as high
> > quality as technically possible.)
> >
> > The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> > from the usual place:
>
> I made a quick test with the ac3 encoder aften and compared it against rsdl
> 0.36 (I think it was). Time was slightly worse: >5.9 secs (with two threads
> on Athlon64 X2 x86_64). mainline gives me 5.4sec. rsdl took 5.8 sec.
>
> I haven't tested earlier cfs or later sd. It seems the scheduler gets
> optimized more for single core than smp?
>
> http://aften.sourceforge.net/
Just as an update: Ingo managed to fix this issue in v9. Nice work!
Cheers,
--
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V
Thread overview: 26+ messages
2007-04-30 5:20 [patch] CFS scheduler, -v7 Al Boldi
2007-05-03 7:45 ` Ingo Molnar
2007-05-03 8:07 ` Ingo Molnar
2007-05-03 11:16 ` Al Boldi
2007-05-03 12:36 ` Ingo Molnar
2007-05-03 13:49 ` Al Boldi
2007-05-03 8:42 ` Al Boldi
2007-05-03 15:02 ` Ting Yang
2007-05-03 15:17 ` Ingo Molnar
2007-05-03 16:00 ` Ting Yang
2007-05-03 19:48 ` Ingo Molnar
2007-05-03 19:57 ` William Lee Irwin III
-- strict thread matches above, loose matches on Subject: below --
2007-04-28 15:25 Ingo Molnar
2007-04-28 19:20 ` S.Çağlar Onur
2007-04-28 19:24 ` Ingo Molnar
2007-04-28 23:42 ` S.Çağlar Onur
2007-04-29 7:11 ` Ingo Molnar
2007-04-29 12:37 ` S.Çağlar Onur
2007-04-29 15:58 ` Ingo Molnar
2007-04-29 22:29 ` Dennis Brendel
2007-04-30 14:38 ` S.Çağlar Onur
2007-04-28 19:27 ` S.Çağlar Onur
2007-04-29 17:28 ` Prakash Punnoor
2007-05-04 13:05 ` Prakash Punnoor
2007-04-30 16:29 ` Srivatsa Vaddagiri
2007-04-30 18:30 ` Balbir Singh