* [patch] CFS scheduler, -v7
@ 2007-04-28 15:25 Ingo Molnar
From: Ingo Molnar @ 2007-04-28 15:25 UTC (permalink / raw)
To: linux-kernel
Cc: Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
Kasper Sandberg, buddabrod, Srivatsa Vaddagiri
i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
main goal of CFS is to implement "desktop scheduling" with as high
quality as technically possible.)
The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
from the usual place:
http://redhat.com/~mingo/cfs-scheduler/
-v6 got lots of nice feedback, and the list of regressions from -v5 has
shrunk considerably. The most user-visible change in -v7 should be a fix
for an interactivity problem that could explain the 'audio skipping'
problem reported by Kasper Sandberg, which was the only major regression
reported against -v6. Please re-report any remaining regressions.
the rate of change is moderate:
15 files changed, 150 insertions(+), 124 deletions(-)
half of that code-flux is due to the removal of the X auto-renice patch
and most of the rest is debugging related. It seems the CFS codebase is
slowly starting to settle down. (-v7 has been test-built and test-booted
on i686 and x86_64 UP and SMP systems.)
Changes since -v6:
- speedup: cache rb_leftmost better (Srivatsa Vaddagiri)
- bugfix: handle Priority Inheritance properly (Thomas Gleixner)
- interactivity fix: tighten up the arithmetic some more.
- feature removal: remove the X auto-renicing feature, CONFIG_BOOST_X.
- debugging feature: introduce the sched_sleep_history_max_ns tunable
to modify sleep-history handling.
- debugging feature: /proc/<PID>/sched file contains various useful
scheduler statistics about every task.
- debugging feature: track the maximum amount of time a task has been
waiting to get on the CPU, the maximum amount of time it was blocked
involuntarily and the maximum amount of time it was sleeping
voluntarily.
As usual, any sort of feedback, bugreport, fix and suggestion is more
than welcome,
Ingo
^ permalink raw reply [flat|nested] 26+ messages in thread

* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-28 19:20 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, Ingo Molnar wrote:
> - feature removal: remove the X auto-renicing feature, CONFIG_BOOST_X.

This is the first time I have used CFS without X auto-renicing, and I
think it is not as smooth as before :( While compiling something (say
alsa-drivers) with just "make", firefox starts to scroll slowly, kmail
freezes from time to time, applications are not as responsive as before
under that load, and the system is not as fast as before even while
idle. Also, somehow boot takes longer:

v6;
Apr 28 16:40:41 (up 10.57) /sbin/mudur.py sysinit
Apr 28 13:40:49 (up 17.75) /sbin/mudur.py boot
Apr 28 13:41:00 (up 28.61) /sbin/mudur.py default

v7;
Apr 29 00:35:49 (up 10.61) /sbin/mudur.py sysinit
Apr 28 21:36:14 (up 33.77) /sbin/mudur.py boot
Apr 28 21:36:16 (up 36.21) /sbin/mudur.py default

And while trying to understand what caused this (or trying to confirm
that X is the problem), I realized Xorg is not the only reniced process
in v6:

[caglar@zangetsu][~]> diff -u top.v7 top.v6
...
   PID USER      PR  NI VIRT RES SHR S %CPU %MEM   TIME+  COMMAND
...
     2 root      RT   0    0   0   0 S  0.0  0.0  0:00.00 migration/0
     3 root      39  19    0   0   0 S  0.0  0.0  0:00.00 ksoftirqd/0
-    4 root      15  -5    0   0   0 S  0.0  0.0  0:00.11 events/0
-    5 root      15  -5    0   0   0 S  0.0  0.0  0:00.01 khelper
-    6 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kthread
-   26 root      15  -5    0   0   0 S  0.0  0.0  0:00.02 kblockd/0
-   27 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kacpid
-  124 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kseriod
-  137 root      15  -5    0   0   0 S  0.0  0.0  0:00.00 kapmd
+    4 root      10 -10    0   0   0 S  0.0  0.0  0:00.01 events/0
+    5 root      10 -10    0   0   0 S  0.0  0.0  0:00.01 khelper
+    6 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kthread
+   26 root      10 -10    0   0   0 S  0.0  0.0  0:00.03 kblockd/0
+   27 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kacpid
+  124 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kseriod
+  137 root      10 -10    0   0   0 S  0.0  0.0  0:00.00 kapmd
...

Until now I haven't tried renicing X manually to see if it changes
anything in v7, but I will (argh, I'll simply boot with v6).

> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,

If you want some more output/info etc. please just say so; I have both
v6 and v7 available.

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Ingo Molnar @ 2007-04-28 19:24 UTC (permalink / raw)
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

* S.Çağlar Onur <caglar@pardus.org.tr> wrote:

> If you want some more output/info etc. please just say so; I have both
> v6 and v7 available.

could you try the auto-renice patch ontop of -v7:

  http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-auto-renice.patch

does this make it behave like -v6?

	Ingo

[-- Attachment: sched-cfs-auto-renice.patch --]

---
 arch/i386/kernel/ioport.c   |   17 ++++++++++++++---
 arch/x86_64/kernel/ioport.c |   12 ++++++++++--
 drivers/block/loop.c        |    5 ++++-
 include/linux/sched.h       |    7 +++++++
 kernel/Kconfig.preempt      |   17 +++++++++++++++++
 kernel/sched.c              |   40 ++++++++++++++++++++++++++++++++++++++++
 kernel/workqueue.c          |    2 +-
 mm/oom_kill.c               |    4 +++-
 8 files changed, 96 insertions(+), 8 deletions(-)

Index: linux/arch/i386/kernel/ioport.c
===================================================================
--- linux.orig/arch/i386/kernel/ioport.c
+++ linux/arch/i386/kernel/ioport.c
@@ -64,9 +64,17 @@ asmlinkage long sys_ioperm(unsigned long
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
-	if (turn_on && !capable(CAP_SYS_RAWIO))
-		return -EPERM;
-
+	if (turn_on) {
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+		/*
+		 * Task will be accessing hardware IO ports,
+		 * mark it as special with the scheduler too:
+		 */
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
+	}
 	/*
 	 * If it's the first ioperm() call in this thread's lifetime, set the
 	 * IO bitmap up. ioperm() is much less timing critical than clone(),
@@ -145,6 +153,9 @@ asmlinkage long sys_iopl(unsigned long u
 	if (level > old) {
 		if (!capable(CAP_SYS_RAWIO))
 			return -EPERM;
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
 	}
 	t->iopl = level << 12;
 	regs->eflags = (regs->eflags & ~X86_EFLAGS_IOPL) | t->iopl;
Index: linux/arch/x86_64/kernel/ioport.c
===================================================================
--- linux.orig/arch/x86_64/kernel/ioport.c
+++ linux/arch/x86_64/kernel/ioport.c
@@ -41,8 +41,13 @@ asmlinkage long sys_ioperm(unsigned long
 	if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
 		return -EINVAL;
-	if (turn_on && !capable(CAP_SYS_RAWIO))
-		return -EPERM;
+	if (turn_on) {
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
+	}

 	/*
 	 * If it's the first ioperm() call in this thread's lifetime, set the
@@ -113,6 +118,9 @@ asmlinkage long sys_iopl(unsigned int le
 	if (level > old) {
 		if (!capable(CAP_SYS_RAWIO))
 			return -EPERM;
+#ifdef CONFIG_BOOST_X
+		sched_privileged_task(current);
+#endif
 	}
 	regs->eflags = (regs->eflags &~ X86_EFLAGS_IOPL) | (level << 12);
 	return 0;
Index: linux/drivers/block/loop.c
===================================================================
--- linux.orig/drivers/block/loop.c
+++ linux/drivers/block/loop.c
@@ -588,7 +588,10 @@ static int loop_thread(void *data)
 	 */
 	current->flags |= PF_NOFREEZE;

-	set_user_nice(current, -20);
+	/*
+	 * The loop thread is important enough to be given a boost:
+	 */
+	sched_privileged_task(current);

 	while (!kthread_should_stop() || lo->lo_bio) {
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -1268,6 +1268,13 @@ static inline int rt_mutex_getprio(struc
 #endif

 extern void set_user_nice(struct task_struct *p, long nice);
+/*
+ * Task has special privileges, give it more CPU power:
+ */
+extern void sched_privileged_task(struct task_struct *p);
+
+extern int sysctl_sched_privileged_nice_level;
+
 extern int task_prio(const struct task_struct *p);
 extern int task_nice(const struct task_struct *p);
 extern int can_nice(const struct task_struct *p, const int nice);
Index: linux/kernel/Kconfig.preempt
===================================================================
--- linux.orig/kernel/Kconfig.preempt
+++ linux/kernel/Kconfig.preempt
@@ -63,3 +63,20 @@ config PREEMPT_BKL
 	  Say Y here if you are building a kernel for a desktop system.
 	  Say N if you are unsure.

+config BOOST_X
+	bool "Boost X"
+	default y
+	help
+	  This option instructs the kernel to guarantee more CPU time to
+	  X than to other tasks, which is useful if you want to have a
+	  faster desktop even under high system load.
+
+	  This option works by automatically boosting X's priority via
+	  renicing it to -10. NOTE: CFS does not suffer from
+	  "overscheduling" problems when X is reniced to -10, so if this
+	  is a predominantly desktop box it makes sense to select this
+	  option.
+
+	  Say Y here if you are building a kernel for a desktop system.
+	  Say N if you want X to be treated as a normal task.
+
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3323,6 +3323,46 @@ out_unlock:
 EXPORT_SYMBOL(set_user_nice);

 /*
+ * Nice level for privileged tasks. (can be set to 0 for this
+ * to be turned off)
+ */
+int sysctl_sched_privileged_nice_level __read_mostly = -10;
+
+static int __init privileged_nice_level_setup(char *str)
+{
+	sysctl_sched_privileged_nice_level = simple_strtol(str, NULL, 0);
+	return 1;
+}
+__setup("privileged_nice_level=", privileged_nice_level_setup);
+
+/*
+ * Tasks with special privileges call this and gain extra nice
+ * levels:
+ */
+void sched_privileged_task(struct task_struct *p)
+{
+	long new_nice = sysctl_sched_privileged_nice_level;
+	long old_nice = TASK_NICE(p);
+
+	if (new_nice >= old_nice)
+		return;
+	/*
+	 * Setting the sysctl to 0 turns off the boosting:
+	 */
+	if (unlikely(!new_nice))
+		return;
+
+	if (new_nice < -20)
+		new_nice = -20;
+	else if (new_nice > 19)
+		new_nice = 19;
+
+	set_user_nice(p, new_nice);
+}
+
+EXPORT_SYMBOL(sched_privileged_task);
+
+/*
  * can_nice - check if a task can reduce its nice value
  * @p: task
  * @nice: nice value
Index: linux/kernel/workqueue.c
===================================================================
--- linux.orig/kernel/workqueue.c
+++ linux/kernel/workqueue.c
@@ -355,7 +355,7 @@ static int worker_thread(void *__cwq)
 	if (!cwq->freezeable)
 		current->flags |= PF_NOFREEZE;

-	set_user_nice(current, -5);
+	sched_privileged_task(current);

 	/* Block and flush all signals */
 	sigfillset(&blocked);
Index: linux/mm/oom_kill.c
===================================================================
--- linux.orig/mm/oom_kill.c
+++ linux/mm/oom_kill.c
@@ -293,7 +293,9 @@ static void __oom_kill_task(struct task_
 	 * all the memory it needs. That way it should be able to
 	 * exit() and clear out its resources quickly...
 	 */
-	p->time_slice = HZ;
+	if (p->policy == SCHED_NORMAL || p->policy == SCHED_BATCH)
+		sched_privileged_task(p);
+
 	set_tsk_thread_flag(p, TIF_MEMDIE);
 	force_sig(SIGKILL, p);
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-28 23:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > If you want some more output/info etc. please just say so; I have
> > both v6 and v7 available.
>
> could you try the auto-renice patch ontop of -v7:
>
> http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-auto-renice.patch
>
> does this make it behave like -v6?

Ingo, please ignore my first report until I find a proper way to
reproduce the slowness, because currently CFS-v7, CFS-v7 + the renice
patch, CFS-v7 + renice + your private mail suggestions, and CFS-v6 +
the "PI support for futexes" patch all seem to work equally well (which
is a good thing: X renicing seems really not needed, and there was no
regression, just my daydreams). Or I'm too tired to see the
differences.

--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Ingo Molnar @ 2007-04-29 7:11 UTC (permalink / raw)
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

* S.Çağlar Onur <caglar@pardus.org.tr> wrote:

> Ingo, please ignore my first report until i found a proper way to
> reproduce the slowness cause currently CFS-v7, CFS-v7 + "renice
> patch", CFS-v7 + renice + your private mail suggestions and CFS-v6 +
> "PI support for futexes patch" seems works equally (which is a good
> thing so X renicing seems really not needed, [...]

oh, good!

> [...] and there were no regression instead of my daydreams) or im too
> tired to understand the differences.

could the CPU have dropped speed for that bootup (some CPUs do that
automatically upon overheating), or perhaps if you are using some RAID
array, could it have done a background resync? Especially the bootup
slowdown you saw seemed significant, and because bootup speed is 90% IO
dominated, the CPU scheduler seems an unlikely candidate.

	Ingo
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-29 12:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Sunday 29 April 2007, Ingo Molnar wrote:
> > [...] and there were no regression instead of my daydreams) or im too
> > tired to understand the differences.
>
> could the CPU have dropped speed for that bootup (some CPUs do that
> automatically upon overheating), or perhaps if you are using some RAID
> array, could it have done a background resync? Especially the bootup
> slowdown you saw seemed significant, and because bootup speed is 90% IO
> dominated, the CPU scheduler seems an unlikely candidate.

It could be some overheating problem, but if so, I think this is the
first time it has occurred :). And I don't have any array; a SAMSUNG
HM120JC IDE disk runs in a SONY VAIO FS-215B.

I just booted with plain CFSv7 and boot time seems normal:

Apr 29 15:02:54 (up 10.72) /sbin/mudur.py sysinit
Apr 29 15:03:02 (up 17.26) /sbin/mudur.py boot
Apr 29 15:03:06 (up 21.34) /sbin/mudur.py default

I'll report if I can find any reproducible problem; so far CFSv7 works
as expected :)

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Ingo Molnar @ 2007-04-29 15:58 UTC (permalink / raw)
To: S.Çağlar Onur
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

* S.Çağlar Onur <caglar@pardus.org.tr> wrote:

> I'll report if i can find any reproducable problem, so far CFSv7 works
> as expected :)

ok :)

	Ingo
* Re: [patch] CFS scheduler, -v7
From: Dennis Brendel @ 2007-04-29 22:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: S.Çağlar Onur, linux-kernel, Linus Torvalds, Andrew Morton,
    Con Kolivas, Nick Piggin, Mike Galbraith, Arjan van de Ven,
    Peter Williams, Thomas Gleixner, Willy Tarreau, Gene Heskett,
    Mark Lord, Zach Carter, Kasper Sandberg, Srivatsa Vaddagiri

On Sunday 29 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > I'll report if i can find any reproducable problem, so far CFSv7 works
> > as expected :)
>
> ok :)
>
> 	Ingo

One small regression: I worked for hours with CFS v7 as the scheduler
while listening to music (with the amarok/xine engine), and the music
stuttered 3 times in 5 hours. Then I watched a movie and that stuttered
twice, too. But when I watched the previously stuttering scenes again,
there was no stuttering. It's not directly reproducible; it happens
randomly.
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-30 14:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

Hi Ingo;

On Sunday 29 April 2007, Ingo Molnar wrote:
> * S.Çağlar Onur <caglar@pardus.org.tr> wrote:
> > I'll report if i can find any reproducable problem, so far CFSv7 works
> > as expected :)

After two full days of usage I still can't reproduce the earlier
problems. I can even manage to burn a DVD with k3b while one console
compiles a kernel, another checks out svn repos, amarok plays music,
virtualbox boots another distro, and firefox runs with 3 tabs open :).
And I must say the system was still usable during all that; the only
thing that happened was a little firefox slowdown, but it is really
negligible, because with mainline, firefox (and the whole system) is
really unusable if I fire up a guest in virtualbox.

And lastly, boot time is still around ~20 sec, which is expected:

Apr 29 01:39:04 (up 10.48) /sbin/mudur.py sysinit
Apr 29 01:39:13 (up 18.83) /sbin/mudur.py boot
Apr 29 01:39:18 (up 23.29) /sbin/mudur.py default
...
Apr 29 15:02:54 (up 10.72) /sbin/mudur.py sysinit
Apr 29 15:03:02 (up 17.26) /sbin/mudur.py boot
Apr 29 15:03:06 (up 21.34) /sbin/mudur.py default
...
Apr 30 14:21:56 (up 22.80) /sbin/mudur.py default
Apr 30 14:28:34 (up 15.67) /sbin/mudur.py sysinit
Apr 30 14:28:40 (up 21.02) /sbin/mudur.py boot
...
Apr 30 14:33:26 (up 10.59) /sbin/mudur.py sysinit
Apr 30 14:33:32 (up 15.97) /sbin/mudur.py boot
Apr 30 14:33:35 (up 18.69) /sbin/mudur.py default
...
Apr 30 17:02:09 (up 10.70) /sbin/mudur.py sysinit
Apr 30 17:02:16 (up 17.15) /sbin/mudur.py boot
Apr 30 17:02:19 (up 20.16) /sbin/mudur.py default

So for me, there is no problem at all :)...

Cheers
--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: S.Çağlar Onur @ 2007-04-28 19:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, S.Çağlar Onur wrote:
> Also somehow boot takes longer;
>
> v6;
> Apr 28 16:40:41 (up 10.57) /sbin/mudur.py sysinit
> Apr 28 13:40:49 (up 17.75) /sbin/mudur.py boot
> Apr 28 13:41:00 (up 28.61) /sbin/mudur.py default
>
> v7;
> Apr 29 00:35:49 (up 10.61) /sbin/mudur.py sysinit
> Apr 28 21:36:14 (up 33.77) /sbin/mudur.py boot
> Apr 28 21:36:16 (up 36.21) /sbin/mudur.py default

The values in parentheses (seconds) are the correct ones; the date/time
is screwed up because of my BIOS :(

--
S.Çağlar Onur <caglar@pardus.org.tr>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
* Re: [patch] CFS scheduler, -v7
From: Prakash Punnoor @ 2007-04-29 17:28 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Saturday 28 April 2007, Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
>
> The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> from the usual place:

I made a quick test with the AC3 encoder aften; I tested against RSDL
0.36 (I think it was). Time was slightly worse: >5.9 secs (with two
threads on an Athlon64 X2, x86_64). Mainline gives me 5.4 sec; RSDL
took 5.8 sec. I haven't tested earlier CFS nor later SD. Does the
scheduler get optimized more for single core than SMP?

http://aften.sourceforge.net/

Cheers,
--
(°=  =°)
//\ Prakash Punnoor /\\
V_/ \_V
* Re: [patch] CFS scheduler, -v7
From: Prakash Punnoor @ 2007-05-04 13:05 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

On Sunday 29 April 2007, Prakash Punnoor wrote:
> On Saturday 28 April 2007, Ingo Molnar wrote:
> > i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> > main goal of CFS is to implement "desktop scheduling" with as high
> > quality as technically possible.)
> >
> > The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> > from the usual place:
>
> I made a quick test with the AC3 encoder aften; I tested against RSDL
> 0.36 (I think it was). Time was slightly worse: >5.9 secs (with two
> threads on an Athlon64 X2, x86_64). Mainline gives me 5.4 sec; RSDL
> took 5.8 sec.
>
> I haven't tested earlier CFS nor later SD. Does the scheduler get
> optimized more for single core than SMP?
>
> http://aften.sourceforge.net/

Just as an update: Ingo managed to fix this issue in v9. Nice work!

Cheers,
--
(°=  =°)
//\ Prakash Punnoor /\\
V_/ \_V
* Re: [patch] CFS scheduler, -v7
From: Srivatsa Vaddagiri @ 2007-04-30 16:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod

On Sat, Apr 28, 2007 at 05:25:39PM +0200, Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)

+unsigned int sysctl_sched_granularity __read_mostly = 2000000;

Any reason why this tunable can't be 64-bit?

--
Regards,
vatsa
* Re: [patch] CFS scheduler, -v7
From: Balbir Singh @ 2007-04-30 18:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Linus Torvalds, Andrew Morton, Con Kolivas, Nick Piggin,
    Mike Galbraith, Arjan van de Ven, Peter Williams, Thomas Gleixner,
    caglar, Willy Tarreau, Gene Heskett, Mark Lord, Zach Carter,
    Kasper Sandberg, buddabrod, Srivatsa Vaddagiri

Ingo Molnar wrote:
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
>
> The CFS patch against v2.6.21 (or against v2.6.20.8) can be downloaded
> from the usual place:
>
> http://redhat.com/~mingo/cfs-scheduler/

Hi Ingo,

I needed the following fixes on my powerpc box to silence all the
warnings the compiler generated during compilation. Without these
fixes, I was seeing negative values in /proc/sched_debug on my box. I
still see a negative value for a "waiting" task:

  PID  tree-key  delta  waiting  switches  prio  wstart-fair  sum-exec  sum-wait
-------------------------------------------------------------------------------
R bash 5594 69839216478 6238568 -6238568 1617 120 -69832977910 66377151352 27173793

I've started on cfs late (with -v7); hopefully I'll catch up. More
questions and feedback will follow.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

[-- Attachment: cfs-v7-fix-sched-debug-warnings --]

Index: linux-2.6.21/kernel/sched_debug.c
===================================================================
--- linux-2.6.21.orig/kernel/sched_debug.c	2007-04-30 23:17:10.000000000 +0530
+++ linux-2.6.21/kernel/sched_debug.c	2007-04-30 23:49:40.000000000 +0530
@@ -45,13 +45,13 @@
 	SEQ_printf(m, "%14s %5d %12Ld %11Ld %10Ld %9Ld %5d "
 		      "%13Ld %13Ld %13Ld\n",
 		p->comm, p->pid,
-		p->fair_key, p->fair_key - rq->fair_clock,
-		p->wait_runtime,
-		p->nr_switches,
+		(long long)p->fair_key, (long long)p->fair_key - rq->fair_clock,
+		(long long)p->wait_runtime,
+		(long long)p->nr_switches,
 		p->prio,
-		p->wait_start_fair - rq->fair_clock,
-		p->sum_exec_runtime,
-		p->sum_wait_runtime);
+		(long long)p->wait_start_fair - rq->fair_clock,
+		(long long)p->sum_exec_runtime,
+		(long long)p->sum_wait_runtime);
 }

 static void print_rq(struct seq_file *m, struct rq *rq, u64 now)
@@ -83,7 +83,7 @@
 	SEQ_printf(m, "\ncpu: %d\n", cpu);

 #define P(x) \
-	SEQ_printf(m, "  .%-22s: %Ld\n", #x, (u64)(rq->x))
+	SEQ_printf(m, "  .%-22s: %Lu\n", #x, (unsigned long long)(rq->x))

 	P(nr_running);
 	P(raw_weighted_load);
@@ -110,7 +110,7 @@
 	int cpu;

 	SEQ_printf(m, "Sched Debug Version: v0.02\n");
-	SEQ_printf(m, "now at %Ld nsecs\n", (unsigned long long)now);
+	SEQ_printf(m, "now at %Lu nsecs\n", (unsigned long long)now);

 	for_each_online_cpu(cpu)
 		print_cpu(m, cpu, now);
Index: linux-2.6.21/kernel/sched.c
===================================================================
--- linux-2.6.21.orig/kernel/sched.c	2007-04-30 23:42:04.000000000 +0530
+++ linux-2.6.21/kernel/sched.c	2007-04-30 23:49:44.000000000 +0530
@@ -229,7 +229,7 @@
 	unsigned long long t0, t1;

 #define P(F) \
-	buffer += sprintf(buffer, "%-25s:%20Ld\n", #F, p->F)
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", #F, (long long)p->F)

 	P(wait_start);
 	P(wait_start_fair);
@@ -248,22 +248,22 @@
 	t0 = sched_clock();
 	t1 = sched_clock();
-	buffer += sprintf(buffer, "%-25s:%20Ld\n", "clock-delta", t1-t0);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-wait_runtime", this_rq->wait_runtime);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-fair_clock", this_rq->fair_clock);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-clock", this_rq->clock);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-prev_clock_raw", this_rq->prev_clock_raw);
-	buffer += sprintf(buffer, "%-25s:%20Ld\n",
-		"rq-clock_max_delta", this_rq->clock_max_delta);
-	buffer += sprintf(buffer, "%-25s:%20u\n",
-		"rq-clock_warps", this_rq->clock_warps);
-	buffer += sprintf(buffer, "%-25s:%20u\n",
-		"rq-clock_unstable_events",
-		this_rq->clock_unstable_events);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "clock-delta",
+		(long long)t1-t0);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-wait_runtime",
+		(long long)this_rq->wait_runtime);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-fair_clock",
+		(long long)this_rq->fair_clock);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-clock",
+		(long long)this_rq->clock);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-prev_clock_raw",
+		(long long)this_rq->prev_clock_raw);
+	buffer += sprintf(buffer, "%-25s:%20Ld\n", "rq-clock_max_delta",
+		(long long)this_rq->clock_max_delta);
+	buffer += sprintf(buffer, "%-25s:%20u\n", "rq-clock_warps",
+		this_rq->clock_warps);
+	buffer += sprintf(buffer, "%-25s:%20u\n", "rq-clock_unstable_events",
+		this_rq->clock_unstable_events);

 	return buffer;
 }
* Re: [patch] CFS scheduler, -v7
@ 2007-04-30  5:20 Al Boldi
  2007-05-03  7:45 ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Al Boldi @ 2007-04-30 5:20 UTC (permalink / raw)
To: linux-kernel

Ingo Molnar wrote:
>
> i'm pleased to announce release -v7 of the CFS scheduler patchset. (The
> main goal of CFS is to implement "desktop scheduling" with as high
> quality as technically possible.)
:
:
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,

This one seems on par with SD, but there are still some nice issues.

Try running 3 chew.c's, then renicing one to -10: it starves the others
for some seconds while switching prio-levels. Now renice it back to 10;
it starves them for up to 45 sec.

Also, nice levels are only effective on every other step; ie:
... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-04-30  5:20 Al Boldi
@ 2007-05-03  7:45 ` Ingo Molnar
  2007-05-03  8:07   ` Ingo Molnar
  ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 7:45 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-kernel

* Al Boldi <a1426z@gawab.com> wrote:

> > i'm pleased to announce release -v7 of the CFS scheduler patchset.
> > (The main goal of CFS is to implement "desktop scheduling" with as
> > high quality as technically possible.)
> :
> :
> > As usual, any sort of feedback, bugreport, fix and suggestion is
> > more than welcome,
>
> This one seems on par with SD, [...]

excellent :-)

> [...] but there are still some nice issues.
>
> Try running 3 chew.c's, then renicing one to -10, starves others for
> some seconds while switching prio-level. Now renice it back to 10, it
> starves for up to 45sec.

ok - to make sure i understood you correctly: does this starvation only
occur right when you renice it (when switching prio levels), and does it
get rectified quickly once they get over a few reschedules?

> Also, nice levels are only effective on every other step; ie:
> ... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.

yeah - this is a first-approximation thing.

Some background: in the upstream scheduler (and in SD) nice levels are
linearly scaled, while in CFS they are exponentially scaled. I did this
because i believe exponential is more logical: regardless of which nice
level a task uses, if it goes +2 nice levels up then it will halve its
"fair CPU share". So for example the CPU consumption delta between nice
0 and nice +10 is 1/32 - and so is the delta between -5 and +5, or -10
and 0, etc. This makes nice levels _a lot_ more potent than upstream's
linear approach.

	Ingo
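[Editor's note: the exponential scale Ingo describes above can be sketched
numerically. This is an illustration of the arithmetic only, not the
kernel's actual weight tables; `weight_ratio` is a made-up helper.]

```python
# Illustrative sketch of CFS's exponential nice scale: going +2 nice
# levels up halves the fair CPU share, so a 10-level gap always yields
# a 2**5 = 32x ratio, regardless of where on the scale it sits.
def weight_ratio(nice_a, nice_b):
    """CPU-share ratio of a task at nice_a relative to one at nice_b."""
    return 2.0 ** ((nice_b - nice_a) / 2.0)

print(weight_ratio(0, 10))   # nice 0 vs nice +10 -> 32x
print(weight_ratio(-5, 5))   # same 32x for the -5..+5 gap
print(weight_ratio(0, 2))    # +2 levels: exactly half the share -> 2.0
```

Under a linear scale (upstream, SD) the same renice step would change the
share by a fixed additive amount instead, which is why CFS's nice levels
are so much more potent.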
* Re: [patch] CFS scheduler, -v7
  2007-05-03  7:45 ` Ingo Molnar
@ 2007-05-03  8:07   ` Ingo Molnar
  2007-05-03 11:16     ` Al Boldi
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 8:07 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-kernel

* Ingo Molnar <mingo@elte.hu> wrote:

> > [...] but there are still some nice issues.
> >
> > Try running 3 chew.c's, then renicing one to -10, starves others for
> > some seconds while switching prio-level. Now renice it back to 10,
> > it starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation
> only occur right when you renice it (when switching prio levels), and
> it gets rectified quickly once they get over a few reschedules?

meanwhile i managed to reproduce it by following the exact steps you
described, and i've fixed the bug in my tree. Can you confirm that the
patch below fixes it for you too?

	Ingo

----------------->
From: Ingo Molnar <mingo@elte.hu>
Subject: [patch] sched, cfs: fix starvation upon nice level switching

Al Boldi reported the following bug: when switching a CPU-intense task's
nice levels they can get unfairly starved right after the priority level
switching.

The bug was that when changing the load_weight the ->wait_runtime value
did not get rescaled. So clear wait_runtime when switching nice levels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -575,6 +580,7 @@ static void set_load_weight(struct task_
 {
 	p->load_shift = get_load_shift(p);
 	p->load_weight = 1 << p->load_shift;
+	p->wait_runtime = 0;
 }

 static inline void
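[Editor's note: the semantics of the one-line fix can be modeled in
miniature. `Task` below is a made-up stand-in for task_struct with
illustrative field values, not kernel code.]

```python
# Miniature model of the patched set_load_weight() above.  A CPU hog at
# a high weight piles up wait_runtime (fairness credit); if that credit
# survived a renice, it would be paid back at the *new* weight's much
# slower key-advance rate and could starve other tasks for many seconds.
class Task:
    def __init__(self, load_shift, wait_runtime):
        self.load_shift = load_shift
        self.load_weight = 1 << load_shift
        self.wait_runtime = wait_runtime  # fairness credit, in ns

def set_load_weight(p, new_shift):
    p.load_shift = new_shift
    p.load_weight = 1 << new_shift
    p.wait_runtime = 0   # the fix: drop credit earned at the old weight

p = Task(load_shift=15, wait_runtime=45_000_000_000)  # credit from nice -10
set_load_weight(p, 5)                                 # renice back down
print(p.wait_runtime)   # 0 -- no stale credit survives the renice
```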
* Re: [patch] CFS scheduler, -v7
  2007-05-03  8:07 ` Ingo Molnar
@ 2007-05-03 11:16   ` Al Boldi
  2007-05-03 12:36     ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Al Boldi @ 2007-05-03 11:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > > [...] but there are still some nice issues.
> > >
> > > Try running 3 chew.c's, then renicing one to -10, starves others for
> > > some seconds while switching prio-level. Now renice it back to 10,
> > > it starves for up to 45sec.
> >
> > ok - to make sure i understood you correctly: does this starvation
> > only occur right when you renice it (when switching prio levels), and
> > it gets rectified quickly once they get over a few reschedules?
>
> meanwhile i managed to reproduce it by following the exact steps you
> described, and i've fixed the bug in my tree. Can you confirm that the
> patch below fixes it for you too?

Seems like this fixed it. But I can still see these awful latency blips
in the presence of a negatively niced chew.c at -10 and two chew.c's at
nice 0. This gets really bad when sched_granularity_ns >= 5,000,000.

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-05-03 11:16 ` Al Boldi
@ 2007-05-03 12:36   ` Ingo Molnar
  2007-05-03 13:49     ` Al Boldi
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 12:36 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-kernel

* Al Boldi <a1426z@gawab.com> wrote:

> [...] But I can still see these awful latency blips in the presence of
> a negatively niced chew.c at -10 and two chew.c's at nice 0. [...]

of course: you asked for the chew's to be treated like that and CFS
delivered it! :-)

nice -10 means the reniced chew will get ~90+% of the CPU time, and all
other nice 0 tasks will get <10% of CPU time.

in the previous mail i have described the new exponential-scale nice
levels that CFS introduces. In practice this means that the vanilla
kernel's nice -20 level is roughly equivalent to CFS's nice -6. CFS's
nice -10 would be roughly equivalent to vanilla nice -80 (if it
existed).

	Ingo
* Re: [patch] CFS scheduler, -v7
  2007-05-03 12:36 ` Ingo Molnar
@ 2007-05-03 13:49   ` Al Boldi
  0 siblings, 0 replies; 26+ messages in thread
From: Al Boldi @ 2007-05-03 13:49 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > [...] But I can still see these awful latency blips in the presence of
> > a negatively niced chew.c at -10 and two chew.c's at nice 0. [...]
>
> of course: you asked for the chew's to be treated like that and CFS
> delivered it! :-)
>
> nice -10 means the reniced chew will get ~90+% of the CPU time, and all
> other nice 0 tasks will get <10% of CPU time.

Yes, but the latencies fluctuate wildly, from 5ms up to
sched_granularity_ns * 1000. Isn't it possible to smooth this?

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-05-03  7:45 ` Ingo Molnar
  2007-05-03  8:07   ` Ingo Molnar
@ 2007-05-03  8:42   ` Al Boldi
  2007-05-03 15:02   ` Ting Yang
  2 siblings, 0 replies; 26+ messages in thread
From: Al Boldi @ 2007-05-03 8:42 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Ingo Molnar wrote:
> * Al Boldi <a1426z@gawab.com> wrote:
> > > i'm pleased to announce release -v7 of the CFS scheduler patchset.
> > > (The main goal of CFS is to implement "desktop scheduling" with as
> > > high quality as technically possible.)
> > :
> > :
> > > As usual, any sort of feedback, bugreport, fix and suggestion is
> > > more than welcome,
> >
> > This one seems on par with SD, [...]
>
> excellent :-)
>
> > [...] but there are still some nice issues.
> >
> > Try running 3 chew.c's, then renicing one to -10, starves others for
> > some seconds while switching prio-level. Now renice it back to 10, it
> > starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation only
> occur right when you renice it (when switching prio levels),

Yes.

> and it gets rectified quickly once they get over a few reschedules?

Well, depending on the nice level, this delay may be more than 45 sec.
And in cfs-v8 there is an additional repeating latency blip, akin to an
expiry, when running procs at different nice levels. chew.c shows this
clearly.

> > Also, nice levels are only effective on every other step; ie:
> > ... -3/-2 , -1/0 , 1/2 ... yields only 20 instead of 40 prio-levels.
>
> yeah - this is a first-approximation thing.
>
> Some background: in the upstream scheduler (and in SD) nice levels are
> linearly scaled, while in CFS they are exponentially scaled. I did this
> because i believe exponential is more logical: regardless of which nice
> level a task uses, if it goes +2 nice levels up then it will halve its
> "fair CPU share". So for example the CPU consumption delta between nice
> 0 and nice +10 is 1/32 - and so is the delta between -5 and +5, or -10
> and 0, etc. This makes nice levels _a lot_ more potent than upstream's
> linear approach.

Actually, I think 1/32 for +10 is a bit too strong. Introducing a
scalefactor tunable may be useful. Also, don't you think it reasonable
to lower-bound the timeslices?

Thanks!

--
Al
* Re: [patch] CFS scheduler, -v7
  2007-05-03  7:45 ` Ingo Molnar
  2007-05-03  8:07   ` Ingo Molnar
  2007-05-03  8:42   ` Al Boldi
@ 2007-05-03 15:02   ` Ting Yang
  2007-05-03 15:17     ` Ingo Molnar
  2 siblings, 1 reply; 26+ messages in thread
From: Ting Yang @ 2007-05-03 15:02 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Hi, Ingo

This is the test case that I think is worth discussing, and it led me to
find 2 things.

>> [...] but there are still some nice issues.
>>
>> Try running 3 chew.c's, then renicing one to -10, starves others for
>> some seconds while switching prio-level. Now renice it back to 10, it
>> starves for up to 45sec.
>
> ok - to make sure i understood you correctly: does this starvation only
> occur right when you renice it (when switching prio levels), and it gets
> rectified quickly once they get over a few reschedules?

The main problem behind what Al Boldi saw might come from this piece of
code in sched_fair.c, which scales the fair_key difference needed to
preempt the current task:

+static u64
+rescale_load(struct task_struct *p, u64 value)
+{
+	int load_shift = p->load_shift;
+
+	if (load_shift == SCHED_LOAD_SHIFT)
+		return value;
+
+	return (value << load_shift) >> SCHED_LOAD_SHIFT;
+}
+
+static u64
+niced_granularity(struct rq *rq, struct task_struct *curr,
+		  unsigned long granularity)
+{
+	return rescale_load(curr, granularity);
+}

Here is the check for preemption:

+static inline void
+__check_preempt_curr_fair(struct rq *rq, struct task_struct *p,
+			  struct task_struct *curr, unsigned long granularity)
+{
+	s64 __delta = curr->fair_key - p->fair_key;
+
+	/*
+	 * Take scheduling granularity into account - do not
+	 * preempt the current task unless the best task has
+	 * a larger than sched_granularity fairness advantage:
+	 */
+	if (__delta > niced_granularity(rq, curr, granularity))
+		resched_task(curr);
+}

This code now says: the fair_key difference needed to preempt the
current task is amplified by a factor of its weight (in Al Boldi's
example, 32). However, the weighted task also has its p->fair_key
advance scaled down by its weight (also 32 here). The combination of the
two becomes quadratic!

Let's start from three nice 0 tasks p1, p2, p3 at time t=0, with the
granularity set to 5ms. Originally each task executes 5 ms in turn: 5ms
for p1, 5ms for p2, 5ms for p3, 5ms for p1, 5ms for p2, 5ms for p3 ...

If somehow p3 is re-niced to -10 _right before_ the 16th ms, we run into
the worst case after p3 gets the cpu: p1->fair_key = p2->fair_key = 10,
p3->fair_key = 5. Now, in order for p3 to be preempted, it has to make
its fair_key 5 * 32 larger than p1's and p2's fair_key. Furthermore, p3
now has a higher weight, so pushing its fair_key up by 1 takes 32 ms;
thus p3 will stay on the cpu for 5 * 32 * 32 ms, which is about 5
seconds!

Besides this quadratic effect, another minor issue amplifies it a little
further: p->wait_runtime accumulated before the renice is not adjusted
to match the new nice value. The p->wait_runtime earned using the
previous weight has to be paid off using the current weight: renicing to
a larger weight pays off more than needed, renicing to a smaller weight
pays off less, which introduces unfairness.

Ingo has now partially solved this problem by clearing p->wait_runtime
when a task is reniced, but the quadratic effect of the scaling is still
there.

Thanks

Ting
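[Editor's note: the arithmetic above can be checked directly. Units and
numbers follow the example in the mail (5 ms granularity, weight 32);
`preempt_delay_ms` is an illustrative helper, not kernel code.]

```python
# Back-of-envelope check of the quadratic effect described above: the
# preemption threshold on the fair_key is scaled *up* by the weight
# (niced_granularity), while the running task's fair_key advances
# scaled *down* by the same weight -- the wall-clock delay before a
# preemption multiplies both factors.
def preempt_delay_ms(granularity_ms, weight):
    key_threshold = granularity_ms * weight  # rescale_load() on the granularity
    key_per_ms = 1.0 / weight                # weighted fair_key advance
    return key_threshold / key_per_ms        # = granularity_ms * weight**2

print(preempt_delay_ms(5, 32))   # 5 * 32 * 32 = 5120 ms, ~5 seconds
print(preempt_delay_ms(5, 1))    # a weight-1 task: just 5 ms
```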
* Re: [patch] CFS scheduler, -v7
  2007-05-03 15:02 ` Ting Yang
@ 2007-05-03 15:17   ` Ingo Molnar
  2007-05-03 16:00     ` Ting Yang
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 15:17 UTC (permalink / raw)
To: Ting Yang; +Cc: linux-kernel

* Ting Yang <tingy@cs.umass.edu> wrote:

> +	s64 __delta = curr->fair_key - p->fair_key;
> +
> +	/*
> +	 * Take scheduling granularity into account - do not
> +	 * preempt the current task unless the best task has
> +	 * a larger than sched_granularity fairness advantage:
> +	 */
> +	if (__delta > niced_granularity(rq, curr, granularity))
> +		resched_task(curr);
> +}
>
> This code now says: the fair_key difference needed to preempt the
> current task is amplified by a factor of its weight (in Al Boldi's
> example, 32). However, the weighted task also has its p->fair_key
> advance scaled down by its weight (also 32 here). The combination of
> the two becomes quadratic!

it's not quadratic in terms of CPU share: the first factor impacts the
CPU share, the second factor impacts the granularity. This means that
reniced workloads will be preempted in a more finegrained way - but
otherwise there's _no_ quadratic effect for CPU time - which is a
completely separate metric. Remember: there are no timeslices in CFS, so
a task can be preempted any number of times without being at a
disadvantage.

> Besides this quadratic effect, another minor issue amplifies it a
> little further: p->wait_runtime accumulated before. [...]

actually, this 'minor issue' was the main issue that caused the bug ;-)

	Ingo
* Re: [patch] CFS scheduler, -v7
  2007-05-03 15:17 ` Ingo Molnar
@ 2007-05-03 16:00   ` Ting Yang
  2007-05-03 19:48     ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Ting Yang @ 2007-05-03 16:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

Hi, Ingo

I wrote that email in a hurry and therefore might not have explained the
problem clearly. However, after carefully reading the code again, I do
think there is a problem in this part. Now I want to try again :-)
Hopefully, this time I will do a better job.

Starting from the following code:

+	if (__delta > niced_granularity(rq, curr, granularity))
+		resched_task(curr);

Suppose "curr" has nice value -10; then curr->load_shift = 15. The
granularity passed into this function is a fixed 2,000,000 (for CFS
-v8). Let's just divide everything by 1,000,000 for simplicity, and say
the granularity used is 2.

Now, we look at how the granularity is rescaled:

+	int load_shift = p->load_shift;
+
+	if (load_shift == SCHED_LOAD_SHIFT)
+		return value;
+
+	return (value << load_shift) >> SCHED_LOAD_SHIFT;

It returns (2 << 15) >> 10 = 2 * 32 = 64, therefore __delta has to be
larger than 64 for the current process to be preempted.

Suppose "curr" executes for 1 tick and a timer interrupt comes. It has
executed about 1,000,000 ns (roughly speaking, since timer interrupts
come 1000 times per second). Since we divided everything by 1,000,000,
this becomes 1 in this discussion. After this execution, how much does
"curr" increment its fair_key? It is weighted: 1/32.

Then how much time is needed for "curr" to build a 2 * 32 difference on
its fair_key, when every 1 ms it updates the fair_key by 1/32?
2 * 32 * 32! On the other hand, for a task with weight 1, the amount of
work needed before preemption is 2 * 1 * 1.

If we have only 2 tasks running, p1 with nice value -10, p2 with nice
value 0, then:

  p1 gets cpu share: (32 * 32) / (32 * 32 + 1 * 1)
  p2 gets cpu share: ( 1 *  1) / (32 * 32 + 1 * 1)

I do see a quadratic effect here. Did I miss anything?

Sorry to bother you again, I just want to help :-)

Thanks a lot!

Ting
* Re: [patch] CFS scheduler, -v7
  2007-05-03 16:00 ` Ting Yang
@ 2007-05-03 19:48   ` Ingo Molnar
  2007-05-03 19:57     ` William Lee Irwin III
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2007-05-03 19:48 UTC (permalink / raw)
To: Ting Yang; +Cc: linux-kernel

* Ting Yang <tingy@cs.umass.edu> wrote:

> then how much time is needed for "curr" to build a 2 * 32 difference
> on fair_key, with every 1 ms it updates fair_key by 1/32 ? 2 * 32 *
> 32 !

yes - but the "*32" impacts the rescheduling granularity, the "/32"
impacts the speed with which the key moves. So the total execution speed
of the nice -10 task is still "*32" that of a nice 0 task - it's just
that not only does it get 32 times more CPU time, it also gets it in 32
times larger chunks at once. But the rescheduling granularity does _not_
impact the CPU share the task gets, so there's no quadratic effect.

but this is really simple to test: boot up CFS, start two infinite
loops, one at nice 0 and one at nice +10, and look at it via "top" (type
's 60' in top to get a really long update interval for precise results).
You won't see quadratically less CPU time used up by the nice +10 task;
you'll see it getting the intended 1/32 share of CPU time.

	Ingo
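[Editor's note: Ingo's point — the chunk size changes, but the long-run
split stays linear in the weights — can also be checked with a toy model.
This is a deliberate simplification of the fair-clock accounting, not
kernel code.]

```python
# Toy fair-clock model: always run the task with the smallest fair_key;
# each tick of CPU advances the running task's key by 1/weight.  The
# resulting CPU split is linear in the weights (about 32:1 here), not
# quadratic, regardless of how large the preemption chunks become.
def simulate(weights, ticks=100_000):
    key = [0.0] * len(weights)
    cpu = [0] * len(weights)
    for _ in range(ticks):
        i = min(range(len(weights)), key=lambda j: key[j])  # pick leftmost key
        key[i] += 1.0 / weights[i]   # weighted fair_key advance
        cpu[i] += 1                  # one tick of CPU consumed
    return cpu

cpu = simulate([32, 1])
print(cpu[0] / cpu[1])   # close to 32, not 32 * 32
```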
* Re: [patch] CFS scheduler, -v7
  2007-05-03 19:48 ` Ingo Molnar
@ 2007-05-03 19:57   ` William Lee Irwin III
  0 siblings, 0 replies; 26+ messages in thread
From: William Lee Irwin III @ 2007-05-03 19:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Ting Yang, linux-kernel, davidel

* Ting Yang <tingy@cs.umass.edu> wrote:
>> then how much time is needed for "curr" to build a 2 * 32 difference
>> on fair_key, with every 1 ms it updates fair_key by 1/32 ? 2 * 32 *
>> 32 !

On Thu, May 03, 2007 at 09:48:27PM +0200, Ingo Molnar wrote:
> yes - but the "*32" impacts the rescheduling granularity, the "/32"
> impacts the speed with which the key moves. So the total execution
> speed of the nice -10 task is still "*32" that of a nice 0 task - it's
> just that not only does it get 32 times more CPU time, it also gets it
> in 32 times larger chunks at once. But the rescheduling granularity
> does _not_ impact the CPU share the task gets, so there's no quadratic
> effect.
> but this is really simple to test: boot up CFS, start two infinite
> loops, one at nice 0 and one at nice +10, and look at it via "top"
> (type 's 60' in top to get a really long update interval for precise
> results). You won't see quadratically less CPU time used up by the
> nice +10 task; you'll see it getting the intended 1/32 share of CPU
> time.

Davide has code to test this more rigorously. Looks like I don't need to
do very much to get a nice test going at all, besides fiddling with
options parsing and maybe a few other things.

-- wli
end of thread, other threads:[~2007-05-04 13:05 UTC | newest]

Thread overview: 26+ messages
2007-04-28 15:25 [patch] CFS scheduler, -v7 Ingo Molnar
2007-04-28 19:20 ` S.Çağlar Onur
2007-04-28 19:24 ` Ingo Molnar
2007-04-28 23:42 ` S.Çağlar Onur
2007-04-29  7:11 ` Ingo Molnar
2007-04-29 12:37 ` S.Çağlar Onur
2007-04-29 15:58 ` Ingo Molnar
2007-04-29 22:29 ` Dennis Brendel
2007-04-30 14:38 ` S.Çağlar Onur
2007-04-28 19:27 ` S.Çağlar Onur
2007-04-29 17:28 ` Prakash Punnoor
2007-05-04 13:05 ` Prakash Punnoor
2007-04-30 16:29 ` Srivatsa Vaddagiri
2007-04-30 18:30 ` Balbir Singh
  -- strict thread matches above, loose matches on Subject: below --
2007-04-30  5:20 Al Boldi
2007-05-03  7:45 ` Ingo Molnar
2007-05-03  8:07 ` Ingo Molnar
2007-05-03 11:16 ` Al Boldi
2007-05-03 12:36 ` Ingo Molnar
2007-05-03 13:49 ` Al Boldi
2007-05-03  8:42 ` Al Boldi
2007-05-03 15:02 ` Ting Yang
2007-05-03 15:17 ` Ingo Molnar
2007-05-03 16:00 ` Ting Yang
2007-05-03 19:48 ` Ingo Molnar
2007-05-03 19:57 ` William Lee Irwin III