* scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
@ 2009-01-30 5:49 Nathanael Hoyle
2009-01-30 6:16 ` Jan Engelhardt
` (4 more replies)
0 siblings, 5 replies; 31+ messages in thread
From: Nathanael Hoyle @ 2009-01-30 5:49 UTC (permalink / raw)
To: linux-kernel
All (though perhaps of special interest to a few such as Ingo, Peter,
and David),
I am posting regarding an issue I have been dealing with recently,
though this post is not really a request for troubleshooting. Instead
I'd like to ramble for just a moment about my understanding of the
current 2.6 scheduler, describe the behavior I'm seeing, and discuss a
couple of the architectural solutions I've considered, as well as pose
the question whether anyone else views this as a general-case problem
worthy of being addressed, or whether this is something that gets
ignored by and large. It is my hope that this is not too off-topic for
this group.
First, let me explain the issue I encountered. I am running a relatively
powerful system for a home desktop, an Intel Core 2 Quad Q9450 with 4 GB
of RAM. If it matters for the discussion, it also has 4 drives in an
mdraid raid-5 array, and decent I/O throughput. In normal circumstances
it is quite responsive as a desktop (kde 3.5.4 atm). It further runs a
very carefully configured kernel build, including only those things
which I truly need and excluding everything else. I often use it to
watch DVD movies, and have had no trouble with performance in general.
Recently I installed the Folding@Home client, which many of you may be
familiar with, intended to utilize spare CPU cycles to perform protein
folding simulations in order to further medical research. It is not a
multi-threaded client at this point, so it simply runs four instances on
my system, since it has four cores. It is configured to run at
nice-level 19.
Because it is heavily optimized, and needs little external data to
perform its work, it spends almost all of its time cpu-bound, with
little to no io-wait or blocking on network calls, etc. I had been
using it for about a week with no real difficulty until I went to watch
another DVD and found that the video was slightly stuttery/jerky so long
as foldingathome was running in the background. Once I shut it down,
the video playback resumed its normal smooth form.
There are a couple of simple workarounds for this:
Substantially boosting the process priority of the mplayer process also
returns the video to smooth playback, but this is undesirable in that it
requires manual intervention each time, and root privileges. It fails to
achieve what I want, which is for the foldingathome computation to not
interfere with anything else I may try to do. I want my compiles to be
as close to *exactly* as fast as they were without it as possible, etc.
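The root-privileges point is the setpriority(2) asymmetry: a process may always lower its own priority, but raising one needs CAP_SYS_NICE. A quick illustration (a sketch, assuming an unprivileged shell on Linux):

```python
import os

# Raising our own nice value (lowering priority) is always permitted:
before = os.nice(0)   # an increment of 0 just reports the current value
after = os.nice(5)    # drop ourselves five priority levels
assert after == min(19, before + 5)

# Going the other way needs CAP_SYS_NICE, which is why boosting the
# mplayer process requires root:
try:
    os.nice(-5)
    print("raised priority (running with CAP_SYS_NICE/root)")
except PermissionError:
    print("denied: raising priority needs privileges")
```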
Stopping foldingathome before I do something performance sensitive is
also possible, but again smacks of workaround rather than solution. The
scheduler should be able to resolve the goal without me stopping the
other work.
I have done a bit of research on how the kernel scheduler works, and why
I am seeing this behavior. I had previously, apparently ignorantly,
equated 'nice 19' with being akin to Microsoft Windows' 'idle' thread
priority, and assumed it would never steal CPU cycles from a process
with a higher (lower, depending on nomenclature) priority.
It is my current understanding that when mplayer is running (also
typically CPU bound, occasionally it becomes I/O bound briefly), one of
the instances of foldingathome, which is sharing the CPU (core) with
mplayer starts getting starved, and the scheduler dynamically rewards it
with up to four additional priority levels based on the time remaining
in its quantum which it was not allowed to execute for.
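For concreteness, the boost described above can be modelled in a few lines. This is a toy model, not kernel code: the constants (MAX_BONUS = 10, a one-second sleep-average ceiling) are patterned on the 2.6-era O(1) scheduler's interactivity estimator, and the exact numbers varied between kernel versions:

```python
# Toy model of the O(1) scheduler's interactivity bonus (illustrative
# constants; not actual kernel/sched.c code).

MAX_BONUS = 10  # total bonus range; the effective boost is +/- MAX_BONUS/2

def effective_prio(static_prio, sleep_avg_ms, max_sleep_avg_ms=1000):
    """static_prio: 100..139 (nice -20..19); sleep_avg_ms: credited sleep."""
    bonus = sleep_avg_ms * MAX_BONUS // max_sleep_avg_ms - MAX_BONUS // 2
    prio = static_prio - bonus
    return max(100, min(139, prio))  # clamp to the non-realtime range

# A nice-19 task (static priority 139) that has been starved long enough
# to be credited with maximum sleep time gets boosted several levels:
print(effective_prio(139, 0))      # no credit: stays at 139
print(effective_prio(139, 1000))   # full credit: boosted to 134
```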
At this point, when mplayer blocks for just a moment, say to page in the
data for the next video frame, foldingathome gets scheduled again, and
gets to run for at least MIN_TIMESLICE (plus, due to the lack of kernel
pre-emptibility, possibly longer). It appears that it takes too long to
switch back to mplayer and the result is the stuttering picture I
observe.
I have tried adjusting CONFIG_HZ_xxx from 300 (where I had it) to 1000,
and noted some improvement, but not complete remedy.
In my prior searching on this, I found only one poster with the same
essential problem (from 2004, and regarding distributed.net in the
background, which is essentially the same problem). The only technical
answer given to him was to perhaps try tuning the MIN_TIMESLICE value
downward. It is my understanding that this parameter is relatively
important in order to avoid cache thrashing, and I do not wish to alter
it and have not so far.
Given all of the above, I am unconvinced that I see a good overall
solution. However, one thing that seems to me a glaring weakness of the
scheduler is that only realtime priority threads can be given static
priorities. What I really want for foldingathome, and similar tasks, is
static, low priority. Something that would not boost up, no matter how
well behaved it was or how much it had been starved, or how close to the
same memory segments the needed code was.
I think that there are probably (at least) three approaches here. One I
consider unacceptable at the outset, which is to alter the semantics of
nice 19 such that it does not boost. Since this would break existing
assumptions and code, I do not think it is feasible.
Secondly, one could add additional nice levels which would correspond to
new static priorities below the bottom of the current user ones. This
should not interfere with the O(1) scheduler implementation as I
understand it, because currently, I believe, five 32-bit words are used
to flag the queue usage, and 140 priorities leave 20 more bits available
for new priorities. This has its own problems however, in that existing
tools which examine process priorities could break on priorities outside
the known 'nice' range of -20 to 19.
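The bit arithmetic behind that claim:

```python
# O(1) scheduler bookkeeping: one bit per priority level in the runqueue
# bitmap, rounded up to whole 32-bit words.
NUM_PRIOS = 140                 # 100 realtime + 40 nice levels (-20..19)
WORDS = -(-NUM_PRIOS // 32)     # ceiling division
SPARE_BITS = WORDS * 32 - NUM_PRIOS

print(WORDS, SPARE_BITS)        # 5 words, 20 unused bits
```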
Finally, new scheduling classes could be introduced, together with new
system calls so that applications could select a different scheduling
class at startup. In this way, applications could volunteer to use a
scheduling class which never received dynamic 'reward' boosts that would
raise their priorities. I believe Solaris has done this since Solaris
9, with the 'FX' scheduling class.
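Linux did end up exposing this kind of choice through sched_setscheduler(2). A sketch of the policy constants as Python's os module exposes them (the values are the Linux ABI numbers from linux/sched.h; this tooling long postdates the thread, and the gap at 4 is, I believe, the slot reserved for -ck's SCHED_ISO):

```python
import os

# Linux scheduling policy constants and their ABI values.
policies = {
    "SCHED_OTHER": os.SCHED_OTHER,  # 0: default time-sharing class
    "SCHED_FIFO":  os.SCHED_FIFO,   # 1: static-priority realtime
    "SCHED_RR":    os.SCHED_RR,     # 2: realtime, round-robin
    "SCHED_BATCH": os.SCHED_BATCH,  # 3: CPU-bound batch work
    "SCHED_IDLE":  os.SCHED_IDLE,   # 5: runs only when nothing else wants the CPU
}

print(policies)
print(os.sched_getscheduler(0))     # policy of the current process
```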
Stepping back:
1) Is my problem 'expected' based on others' understanding of the
current design of the scheduler, or do I have a one-off problem to
troubleshoot here?
2) Am I overlooking obvious alternative (but clean) fixes?
3) Does anyone else see the need for static, but low process priorities?
4) What is the view of introducing a new scheduler class to handle this?
I welcome any further feedback on this. I will try to follow replies
on-list, but would appreciate being CC'd off-list as well. Please make
the obvious substitution to my email address in order to bypass the
spam-killer.
Thanks,
Nathanael Hoyle
^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 5:49 scheduler nice 19 versus 'idle' behavior / static low-priority scheduling Nathanael Hoyle
@ 2009-01-30 6:16 ` Jan Engelhardt
  2009-01-30 6:40   ` Nathanael Hoyle
  2009-01-30 6:17 ` V.Radhakrishnan
  ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Jan Engelhardt @ 2009-01-30 6:16 UTC (permalink / raw)
To: Nathanael Hoyle; +Cc: Linux Kernel Mailing List

On Friday 2009-01-30 06:49, Nathanael Hoyle wrote:
>
>I have done a bit of research on how the kernel scheduler works, and
>why I am seeing this behavior. I had previously, apparently
>ignorantly, equated 'nice 19' with being akin to Microsoft Windows'
>'idle' thread priority, and assumed it would never steal CPU cycles
>from a process with a higher (lower, depending on nomenclature)
>priority. [...]
>
>One[...] is to alter the semantics of nice 19 such that it does not
>boost. Since this would break existing assumptions and code, I do
>not think it is feasible. [...] Finally, new scheduling classes
>could be introduced[...]

Surprise. There is already SCHED_BATCH (intended for computing tasks
as I gathered) and SCHED_IDLE (for idle stuff).

>Please make the obvious substitution to my email address in order to
>bypass the spam-killer.

(Obviously this is not obvious... there are no 'nospam' keywords or
similar in it that could be removed.)

^ permalink raw reply	[flat|nested] 31+ messages in thread
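Jan's pointer is easy to verify from userspace. A minimal sketch (assumes Linux and a Python new enough to expose os.sched_setscheduler, i.e. 3.3+, which long postdates this thread); switching to SCHED_BATCH or SCHED_IDLE needs no privileges, since it only lowers the caller's standing:

```python
import os

# Moving ourselves to SCHED_BATCH: allowed unprivileged, since it can
# only hurt us relative to SCHED_OTHER tasks.
os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
assert os.sched_getscheduler(0) == os.SCHED_BATCH

# SCHED_IDLE likewise (note: before 2.6.39 an unprivileged process could
# not switch back out of SCHED_IDLE).
os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
print(os.sched_getscheduler(0) == os.SCHED_IDLE)
```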
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 6:16 ` Jan Engelhardt
@ 2009-01-30 6:40   ` Nathanael Hoyle
  2009-01-30 7:21     ` Jan Engelhardt
  0 siblings, 1 reply; 31+ messages in thread
From: Nathanael Hoyle @ 2009-01-30 6:40 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote:
> On Friday 2009-01-30 06:49, Nathanael Hoyle wrote:
> >
> >I have done a bit of research on how the kernel scheduler works, and
> >why I am seeing this behavior. I had previously, apparently
> >ignorantly, equated 'nice 19' with being akin to Microsoft Windows'
> >'idle' thread priority, and assumed it would never steal CPU cycles
> >from a process with a higher (lower, depending on nomenclature)
> >priority. [...]
> >
> >One[...] is to alter the semantics of nice 19 such that it does not
> >boost. Since this would break existing assumptions and code, I do
> >not think it is feasible. [...] Finally, new scheduling classes
> >could be introduced[...]
>
> Surprise. There is already SCHED_BATCH (intended for computing tasks
> as I gathered) and SCHED_IDLE (for idle stuff).
>

The one discussion I saw referencing SCHED_BATCH seemed to imply that it
was a non-standard kernel patch by Con Kolivas in one of his -ck
variants that never made it into mainline and is not being maintained.
Is this inaccurate?

I was unfamiliar with SCHED_IDLE. Having done a little Googling now, I
finally find reference to the man page for sched_setscheduler(2). This
appears that it is likely what I wanted.

I think the information I had been able to find was somewhat out of
date. It had indicated that the only static priority levels were the
realtime ones.

Is there currently a standardized userspace tool to use to run a command
in order to alter its scheduling class? Obviously writing one would be
trivial, but didn't know if something like:

$ runidle ./foldingathome

would be available.

Thanks for your helpful reply.

> >Please make the obvious substitution to my email address in order to
> >bypass the spam-killer.
>
> (Obviously this is not obvious... there are no 'nospam' keywords or
> similar in it that could be removed.)

I made a failed attempt to post earlier in the evening, which included
the address 'nhoyle@no-damn-spam.hoyletech.com'. When that one didn't
make it to the list (though I'm unsure it had to do with the address I
used) I retried with the clean address. I forgot to remove the note at
the bottom of the posting.

Sincerely,
Nathanael Hoyle

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 6:40   ` Nathanael Hoyle
@ 2009-01-30 7:21     ` Jan Engelhardt
  2009-01-30 7:59       ` Nathanael Hoyle
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Engelhardt @ 2009-01-30 7:21 UTC (permalink / raw)
To: Nathanael Hoyle; +Cc: Linux Kernel Mailing List

On Friday 2009-01-30 07:40, Nathanael Hoyle wrote:
>On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote:
>
>The one discussion I saw referencing SCHED_BATCH seemed to imply that it
>was a non-standard kernel patch by Con Kolivas in one of his -ck
>variants that never made it into mainline and is not being maintained.
>Is this inaccurate?

The presence of SCHED_BATCH in linux/sched.h tells me it is available
(on the other hand, SCHED_ISO, also from -ck, is only listed as a
comment.)

>I was unfamiliar with SCHED_IDLE. Having done a little Googling now, I
>finally find reference to the man page for sched_setscheduler(2). This
>appears that it is likely what I wanted.
>
>I think the information I had been able to find was somewhat out of
>date.

The manpage does say it, but if your local distro does not
mention SCHED_BATCH/SCHED_IDLE, then that's a pretty sad distro.

The doc in sched_setscheduler seems complete to me as of man-pages 3.13.

>Is there currently a standardized userspace tool to use to run a command
>in order to alter its scheduling class? Obviously writing one would be
>trivial, but didn't know if something like:

man chrt

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 7:21     ` Jan Engelhardt
@ 2009-01-30 7:59       ` Nathanael Hoyle
  2009-01-30 8:07         ` Mike Galbraith
  ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Nathanael Hoyle @ 2009-01-30 7:59 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Fri, 2009-01-30 at 08:21 +0100, Jan Engelhardt wrote:
> On Friday 2009-01-30 07:40, Nathanael Hoyle wrote:
> >On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote:
> >
> >The one discussion I saw referencing SCHED_BATCH seemed to imply that it
> >was a non-standard kernel patch by Con Kolivas in one of his -ck
> >variants that never made it into mainline and is not being maintained.
> >Is this inaccurate?
>
> The presence of SCHED_BATCH in linux/sched.h tells me it is available
> (on the other hand, SCHED_ISO, also from -ck, is only listed as a comment.)
>

Fair enough. I should have re-checked recent sources after your mention
rather than going on the old thread I found.

> >I was unfamiliar with SCHED_IDLE. Having done a little Googling now, I
> >finally find reference to the man page for sched_setscheduler(2). This
> >appears that it is likely what I wanted.
> >
> >I think the information I had been able to find was somewhat out of
> >date.
>
> The manpage does say it, but if your local distro does not
> mention SCHED_BATCH/SCHED_IDLE, then that's a pretty sad distro.
>
> The doc in sched_setscheduler seems complete to me as of man-pages 3.13.
>
> >Is there currently a standardized userspace tool to use to run a command
> >in order to alter its scheduling class? Obviously writing one would be
> >trivial, but didn't know if something like:
>
> man chrt

The latest version of man chrt that I can find implies that it handles
SCHED_BATCH but not SCHED_IDLE.

To that end, if anyone else is interested, I have thrown together the
above-suggested 'runidle' which will invoke the passed command using the
SCHED_IDLE scheduler; it's nothing fancy. I am running foldingathome
under it at the moment, and it seems to be improving the situation
somewhat, but I still need/want to test with Mike's referenced patches.

runidle.c:

/* runidle.c -- run a command under the SCHED_IDLE policy. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>

#ifndef SCHED_IDLE
#define SCHED_IDLE 5	/* not yet exported by every libc <sched.h> */
#endif

extern char **environ;

int main(int argc, char **argv)
{
	struct sched_param param;

	if (argc < 2) {
		fprintf(stderr, "usage: %s command [args...]\n"
			"Runs 'command' under the SCHED_IDLE scheduling "
			"policy.\n", argv[0]);
		return EXIT_FAILURE;
	}

	/* SCHED_IDLE ignores the priority, but the kernel rejects a
	 * NULL param, so pass a zeroed one and check the result. */
	memset(&param, 0, sizeof(param));
	if (sched_setscheduler(0, SCHED_IDLE, &param) == -1)
		perror("sched_setscheduler");

	/* argv[argc] is NULL, so argv+1 is a valid argument vector. */
	execve(argv[1], argv + 1, environ);
	perror("Failed to execve target!");
	return EXIT_FAILURE;
}

-Nathanael

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 7:59       ` Nathanael Hoyle
@ 2009-01-30 8:07         ` Mike Galbraith
  2009-01-30 8:55           ` Nathanael Hoyle
  2009-01-30 22:12          ` Brian Rogers
  2009-01-30 8:16         ` Nathanael Hoyle
  2009-01-30 14:15        ` Jan Engelhardt
  2 siblings, 2 replies; 31+ messages in thread
From: Mike Galbraith @ 2009-01-30 8:07 UTC (permalink / raw)
To: Nathanael Hoyle; +Cc: Jan Engelhardt, Linux Kernel Mailing List

On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:

> I am running foldingathome under it at the moment, and it seems to be
> improving the situation somewhat, but I still need/want to test with
> Mike's referenced patches.

You will most definitely encounter evilness running SCHED_IDLE tasks in
a kernel without the SCHED_IDLE fixes.

	-Mike

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 8:07         ` Mike Galbraith
@ 2009-01-30 8:55           ` Nathanael Hoyle
  2009-01-30 9:29             ` Mike Galbraith
  0 siblings, 1 reply; 31+ messages in thread
From: Nathanael Hoyle @ 2009-01-30 8:55 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Linux Kernel Mailing List

On Fri, 2009-01-30 at 09:07 +0100, Mike Galbraith wrote:
> On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
>
> > I am running foldingathome under it at the moment, and it seems to be
> > improving the situation somewhat, but I still need/want to test with
> > Mike's referenced patches.
>
> You will most definitely encounter evilness running SCHED_IDLE tasks in
> a kernel without the SCHED_IDLE fixes.
>
> -Mike
>

Mike,

Any reason not to apply this fairly simple patch against the 2.6.27
series kernel I'm running now? Are there other relevant changes you're
aware of in the later kernel revs for this problem?

Thanks,
-Nathanael

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 8:55           ` Nathanael Hoyle
@ 2009-01-30 9:29             ` Mike Galbraith
  0 siblings, 0 replies; 31+ messages in thread
From: Mike Galbraith @ 2009-01-30 9:29 UTC (permalink / raw)
To: Nathanael Hoyle; +Cc: Linux Kernel Mailing List

On Fri, 2009-01-30 at 03:55 -0500, Nathanael Hoyle wrote:

> Any reason not to apply this fairly simple patch against the 2.6.27
> series kernel I'm running now? Are there other relevant changes you're
> aware of in the later kernel revs for this problem?

You'd have to dig out a few other changes in order to apply it to 27.
Yes, there are other relevant changes, but unless you're familiar with
the source, you'll be better off just trying 28.stable or the latest rc.
You'll have to find, extract and back-port otherwise.

	-Mike

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 8:07         ` Mike Galbraith
  2009-01-30 8:55           ` Nathanael Hoyle
@ 2009-01-30 22:12          ` Brian Rogers
  2009-01-31 5:38             ` Mike Galbraith
  1 sibling, 1 reply; 31+ messages in thread
From: Brian Rogers @ 2009-01-30 22:12 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Nathanael Hoyle, Jan Engelhardt, Linux Kernel Mailing List

Mike Galbraith wrote:
> On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
>
>> I am running foldingathome under it at the moment, and it seems to be
>> improving the situation somewhat, but I still need/want to test with
>> Mike's referenced patches.
>>
> You will most definitely encounter evilness running SCHED_IDLE tasks in
> a kernel without the SCHED_IDLE fixes.
>
Speaking of SCHED_IDLE fixes, is
6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next
stable 2.6.28 release? Without it on 2.6.28.2, I can still produce
minutes-long freezes with BOINC or other idle processes.

With the above commit on top of 2.6.28.2 and also
cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though
I assume cce7ad isn't actually required to fix that, and I can test that
if desired.

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30 22:12          ` Brian Rogers
@ 2009-01-31 5:38             ` Mike Galbraith
  2009-01-31 9:08               ` Mike Galbraith
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2009-01-31 5:38 UTC (permalink / raw)
To: Brian Rogers
Cc: Nathanael Hoyle, Jan Engelhardt, Linux Kernel Mailing List,
	Ingo Molnar, Peter Zijlstra, stable

On Fri, 2009-01-30 at 14:12 -0800, Brian Rogers wrote:
> Mike Galbraith wrote:
> > On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
> >
> >> I am running foldingathome under it at the moment, and it seems to be
> >> improving the situation somewhat, but I still need/want to test with
> >> Mike's referenced patches.
> >>
> > You will most definitely encounter evilness running SCHED_IDLE tasks in
> > a kernel without the SCHED_IDLE fixes.
> >
> Speaking of SCHED_IDLE fixes, is
> 6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next
> stable 2.6.28 release? Without it on 2.6.28.2, I can still produce
> minutes-long freezes with BOINC or other idle processes.
>
> With the above commit on top of 2.6.28.2 and also
> cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though
> I assume cce7ad isn't actually required to fix that, and I can test that
> if desired.

I think they both should go to stable, but dunno if they're headed that
direction or not.

One way to find out, CCs added.

	-Mike

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-31 5:38             ` Mike Galbraith
@ 2009-01-31 9:08               ` Mike Galbraith
  2009-02-02 23:57                 ` [stable] " Greg KH
  0 siblings, 1 reply; 31+ messages in thread
From: Mike Galbraith @ 2009-01-31 9:08 UTC (permalink / raw)
To: Brian Rogers
Cc: Nathanael Hoyle, Jan Engelhardt, Linux Kernel Mailing List,
	Ingo Molnar, Peter Zijlstra, stable

On Sat, 2009-01-31 at 06:38 +0100, Mike Galbraith wrote:
> On Fri, 2009-01-30 at 14:12 -0800, Brian Rogers wrote:
> > Mike Galbraith wrote:
> > > On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
> > >
> > >> I am running foldingathome under it at the moment, and it seems to be
> > >> improving the situation somewhat, but I still need/want to test with
> > >> Mike's referenced patches.
> > >>
> > > You will most definitely encounter evilness running SCHED_IDLE tasks in
> > > a kernel without the SCHED_IDLE fixes.
> > >
> > Speaking of SCHED_IDLE fixes, is
> > 6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next
> > stable 2.6.28 release? Without it on 2.6.28.2, I can still produce
> > minutes-long freezes with BOINC or other idle processes.
> >
> > With the above commit on top of 2.6.28.2 and also
> > cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though
> > I assume cce7ad isn't actually required to fix that, and I can test that
> > if desired.
>
> I think they both should go to stable, but dunno if they're headed that
> direction or not.
>
> One way to find out, CCs added.

For those who may want to run SCHED_IDLE tasks in .27, I've integrated
and lightly tested the fixes required to do so. One additional commit
was needed to get SCHED_IDLE vs nice 19 working right, namely f9c0b09.
Without that, SCHED_IDLE tasks received more CPU than nice 19 tasks.

Since .27 is in long-term maintenance, I'd integrate into stable, but
that's not my decision.
Anyone who applies the below to their stable kernel gets to keep all the
pieces should something break ;-)

commit f9c0b0950d5fd8c8c5af39bc061f27ea8fddcac3
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Fri Oct 17 19:27:04 2008 +0200

    sched: revert back to per-rq vruntime

    Vatsa rightly points out that having the runqueue weight in the
    vruntime calculations can cause unfairness in the face of task
    joins/leaves.

    Suppose: dv = dt * rw / w

    Then take 10 tasks t_n, each of similar weight. If the first will
    run 1 then its vruntime will increase by 10. Now, if the next 8
    tasks leave after having run their 1, then the last task will get
    a vruntime increase of 2 after having run 1.

    Which will leave us with 2 tasks of equal weight and equal runtime,
    of which one will not be scheduled for 8/2=4 units of time.

    Ergo, we cannot do that and must use: dv = dt / w.

    This means we cannot have a global vruntime based on effective
    priority, but must instead go back to the vruntime per rq model we
    started out with.

    This patch was lightly tested by doing starting while loops on each
    nice level and observing their execution time, and a simple group
    scenario of 1:2:3 pinned to a single cpu.
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

Index: linux-2.6.27/kernel/sched_fair.c
===================================================================
--- linux-2.6.27.orig/kernel/sched_fair.c
+++ linux-2.6.27/kernel/sched_fair.c
@@ -334,7 +334,7 @@ int sched_nr_latency_handler(struct ctl_
 #endif
 
 /*
- * delta *= w / rw
+ * delta *= P[w / rw]
  */
 static inline unsigned long
 calc_delta_weight(unsigned long delta, struct sched_entity *se)
@@ -348,15 +348,13 @@ calc_delta_weight(unsigned long delta, s
 }
 
 /*
- * delta *= rw / w
+ * delta /= w
  */
 static inline unsigned long
 calc_delta_fair(unsigned long delta, struct sched_entity *se)
 {
-	for_each_sched_entity(se) {
-		delta = calc_delta_mine(delta,
-				cfs_rq_of(se)->load.weight, &se->load);
-	}
+	if (unlikely(se->load.weight != NICE_0_LOAD))
+		delta = calc_delta_mine(delta, NICE_0_LOAD, &se->load);
 
 	return delta;
 }
@@ -386,26 +384,26 @@ static u64 __sched_period(unsigned long
  * We calculate the wall-time slice from the period by taking a part
  * proportional to the weight.
  *
- * s = p*w/rw
+ * s = p*P[w/rw]
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	return calc_delta_weight(__sched_period(cfs_rq->nr_running), se);
+	unsigned long nr_running = cfs_rq->nr_running;
+
+	if (unlikely(!se->on_rq))
+		nr_running++;
+
+	return calc_delta_weight(__sched_period(nr_running), se);
 }
 
 /*
  * We calculate the vruntime slice of a to be inserted task
  *
- * vs = s*rw/w = p
+ * vs = s/w
  */
-static u64 sched_vslice_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	unsigned long nr_running = cfs_rq->nr_running;
-
-	if (!se->on_rq)
-		nr_running++;
-
-	return __sched_period(nr_running);
+	return calc_delta_fair(sched_slice(cfs_rq, se), se);
 }
 
 /*
@@ -683,7 +681,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
 	 * stays open at the end.
 	 */
 	if (initial && sched_feat(START_DEBIT))
-		vruntime += sched_vslice_add(cfs_rq, se);
+		vruntime += sched_vslice(cfs_rq, se);
 
 	if (!initial) {
 		/* sleeps upto a single latency don't count. */

commit 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Fri Oct 24 11:06:13 2008 +0200

    sched: more accurate min_vruntime accounting

    Mike noticed the current min_vruntime tracking can go wrong and skip
    the current task. If the only remaining task in the tree is a nice 19
    task with huge vruntime, new tasks will be inserted too far to the
    right too, causing some interactibity issues.

    min_vruntime can only change due to the leftmost entry disappearing
    (dequeue_entity()), or by the leftmost entry being incremented past
    the next entry, which elects a new leftmost (__update_curr())

    Due to the current entry not being part of the actual tree, we have
    to compare the leftmost tree entry with the current entry, and take
    the leftmost of these two.
    So create a update_min_vruntime() function that takes computes the
    leftmost vruntime in the system (either tree of current) and increases
    the cfs_rq->min_vruntime if the computed value is larger than the
    previously found min_vruntime. And call this from the two sites we've
    identified that can change min_vruntime.

    Reported-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Acked-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   49 +++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

Index: linux-2.6.27/kernel/sched_fair.c
===================================================================
--- linux-2.6.27.orig/kernel/sched_fair.c
+++ linux-2.6.27/kernel/sched_fair.c
@@ -221,6 +221,27 @@ static inline s64 entity_key(struct cfs_
 	return se->vruntime - cfs_rq->min_vruntime;
 }
 
+static void update_min_vruntime(struct cfs_rq *cfs_rq)
+{
+	u64 vruntime = cfs_rq->min_vruntime;
+
+	if (cfs_rq->curr)
+		vruntime = cfs_rq->curr->vruntime;
+
+	if (cfs_rq->rb_leftmost) {
+		struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
+						   struct sched_entity,
+						   run_node);
+
+		if (vruntime == cfs_rq->min_vruntime)
+			vruntime = se->vruntime;
+		else
+			vruntime = min_vruntime(vruntime, se->vruntime);
+	}
+
+	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+}
+
 /*
  * Enqueue an entity into the rb-tree:
  */
@@ -254,15 +275,8 @@ static void __enqueue_entity(struct cfs_
 	 * Maintain a cache of leftmost tree entries (it is frequently
 	 * used):
 	 */
-	if (leftmost) {
+	if (leftmost)
 		cfs_rq->rb_leftmost = &se->run_node;
-		/*
-		 * maintain cfs_rq->min_vruntime to be a monotonic increasing
-		 * value tracking the leftmost vruntime in the tree.
-		 */
-		cfs_rq->min_vruntime =
-			max_vruntime(cfs_rq->min_vruntime, se->vruntime);
-	}
 
 	rb_link_node(&se->run_node, parent, link);
 	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
@@ -272,18 +286,9 @@ static void __dequeue_entity(struct cfs_
 {
 	if (cfs_rq->rb_leftmost == &se->run_node) {
 		struct rb_node *next_node;
-		struct sched_entity *next;
 
 		next_node = rb_next(&se->run_node);
 		cfs_rq->rb_leftmost = next_node;
-
-		if (next_node) {
-			next = rb_entry(next_node,
-					struct sched_entity, run_node);
-			cfs_rq->min_vruntime =
-				max_vruntime(cfs_rq->min_vruntime,
-					     next->vruntime);
-		}
 	}
 
 	if (cfs_rq->next == se)
@@ -480,6 +485,7 @@ __update_curr(struct cfs_rq *cfs_rq, str
 	schedstat_add(cfs_rq, exec_clock, delta_exec);
 	delta_exec_weighted = calc_delta_fair(delta_exec, curr);
 	curr->vruntime += delta_exec_weighted;
+	update_min_vruntime(cfs_rq);
 }
 
 static void update_curr(struct cfs_rq *cfs_rq)
@@ -666,13 +672,7 @@ static void check_spread(struct cfs_rq *
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
-	u64 vruntime;
-
-	if (first_fair(cfs_rq)) {
-		vruntime = min_vruntime(cfs_rq->min_vruntime,
-				__pick_next_entity(cfs_rq)->vruntime);
-	} else
-		vruntime = cfs_rq->min_vruntime;
+	u64 vruntime = cfs_rq->min_vruntime;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -749,6 +749,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 	if (se != cfs_rq->curr)
 		__dequeue_entity(cfs_rq, se);
 	account_entity_dequeue(cfs_rq, se);
+	update_min_vruntime(cfs_rq);
 }
 
 /*

commit e17036dac189dd034c092a91df56aa740db7146d
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Thu Jan 15 14:53:39 2009 +0100

    sched: fix update_min_vruntime

    Impact: fix SCHED_IDLE latency problems

    OK, so we have 1 running task A (which is obviously curr and the tree
    is equally obviously empty).

    'A' nicely chugs along, doing its thing, carrying min_vruntime along
    as it goes.
    Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP
    balancing, which is very likely far right, in that case

      update_curr
        update_min_vruntime
          cfs_rq->rb_leftmost := true (the crazy task sitting in a tree)
            vruntime = se->vruntime

    and voila, min_vruntime is waaay right of where it ought to be.

    OK, so why did I write it like that to begin with...

    Aah, yes. Say we've just dequeued current

      schedule
        deactivate_task(prev)
          dequeue_entity
            update_min_vruntime

    Then we'll set

      vruntime = cfs_rq->min_vruntime;

    we find !cfs_rq->curr, but do find someone in the tree. Then we _must_
    do vruntime = se->vruntime, because

      vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)

    will not advance vruntime, and cause lags the other way around (which
    we fixed with that initial patch:
    1af5f730fc1bf7c62ec9fb2d307206e18bf40a69 (sched: more accurate
    min_vruntime accounting).

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Tested-by: Mike Galbraith <efault@gmx.de>
    Acked-by: Mike Galbraith <efault@gmx.de>
    Cc: <stable@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.27/kernel/sched_fair.c
===================================================================
--- linux-2.6.27.orig/kernel/sched_fair.c
+++ linux-2.6.27/kernel/sched_fair.c
@@ -233,7 +233,7 @@ static void update_min_vruntime(struct c
 					   struct sched_entity,
 					   run_node);
 
-		if (vruntime == cfs_rq->min_vruntime)
+		if (!cfs_rq->curr)
 			vruntime = se->vruntime;
 		else
 			vruntime = min_vruntime(vruntime, se->vruntime);

commit 6bc912b71b6f33b041cfde93ca3f019cbaa852bc
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Thu Jan 15 14:53:38 2009 +0100

    sched: SCHED_OTHER vs SCHED_IDLE isolation

    Stronger SCHED_IDLE isolation:

     - no SCHED_IDLE buddies
     - never let SCHED_IDLE preempt on wakeup
     - always preempt SCHED_IDLE on wakeup
     - limit SLEEPER fairness for SCHED_IDLE.
Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/sched_fair.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) Index: linux-2.6.27/kernel/sched_fair.c =================================================================== --- linux-2.6.27.orig/kernel/sched_fair.c +++ linux-2.6.27/kernel/sched_fair.c @@ -689,9 +689,13 @@ place_entity(struct cfs_rq *cfs_rq, stru unsigned long thresh = sysctl_sched_latency; /* - * convert the sleeper threshold into virtual time + * Convert the sleeper threshold into virtual time. + * SCHED_IDLE is a special sub-class. We care about + * fairness only relative to other SCHED_IDLE tasks, + * all of which have the same weight. */ - if (sched_feat(NORMALIZED_SLEEPER)) + if (sched_feat(NORMALIZED_SLEEPER) && + task_of(se)->policy != SCHED_IDLE) thresh = calc_delta_fair(thresh, se); vruntime -= thresh; @@ -1347,15 +1351,22 @@ static void check_preempt_wakeup(struct if (unlikely(se == pse)) return; - cfs_rq_of(pse)->next = pse; + if (likely(task_of(se)->policy != SCHED_IDLE)) + cfs_rq_of(pse)->next = pse; /* - * Batch tasks do not preempt (their preemption is driven by + * Batch and idle tasks do not preempt (their preemption is driven by * the tick): */ - if (unlikely(p->policy == SCHED_BATCH)) + if (unlikely(p->policy != SCHED_NORMAL)) return; + /* Idle tasks are by definition preempted by everybody. */ + if (unlikely(curr->policy == SCHED_IDLE)) { + resched_task(curr); + return; + } + if (!sched_feat(WAKEUP_PREEMPT)) return; ^ permalink raw reply [flat|nested] 31+ messages in thread
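[Editor's note] What the `!cfs_rq->curr` test in update_min_vruntime() buys can be sketched in plain userspace C. This is a hypothetical standalone model, not kernel code: the struct and field names below are simplified stand-ins for `struct cfs_rq` and its members. The invariants it demonstrates are the ones the commit message argues about: min_vruntime creeps forward monotonically, is not dragged far right by a freshly inserted SCHED_IDLE entity while a current task still runs, yet still advances to the leftmost entity once current has been dequeued.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* kernel-style comparisons via signed difference, safe across wraparound */
static u64 max_vruntime(u64 max_v, u64 v)
{
	return (int64_t)(v - max_v) > 0 ? v : max_v;
}

static u64 min_vruntime(u64 min_v, u64 v)
{
	return (int64_t)(v - min_v) < 0 ? v : min_v;
}

/* simplified stand-in for struct cfs_rq (hypothetical names) */
struct rq_model {
	u64 min_vr;       /* cfs_rq->min_vruntime */
	int has_curr;     /* cfs_rq->curr != NULL */
	int has_leftmost; /* rb-tree non-empty */
	u64 curr_vr;      /* curr->vruntime */
	u64 leftmost_vr;  /* leftmost entity's vruntime */
};

/* mirrors the logic of update_min_vruntime() after commit e17036da */
static void model_update_min_vruntime(struct rq_model *rq)
{
	u64 vruntime = rq->min_vr;

	if (rq->has_curr)
		vruntime = rq->curr_vr;

	if (rq->has_leftmost) {
		if (!rq->has_curr)
			vruntime = rq->leftmost_vr; /* current was dequeued: follow the tree */
		else
			vruntime = min_vruntime(vruntime, rq->leftmost_vr);
	}

	/* ensure we never move backward */
	rq->min_vr = max_vruntime(rq->min_vr, vruntime);
}
```

With a running task at vruntime 100 and a far-right SCHED_IDLE entity queued at 1000000, min_vruntime tracks the current task instead of jumping right; with current dequeued and only the tree populated, it advances to the leftmost vruntime, and it never regresses.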
* Re: [stable] scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-31 9:08 ` Mike Galbraith @ 2009-02-02 23:57 ` Greg KH 2009-02-09 15:19 ` Brian Rogers 0 siblings, 1 reply; 31+ messages in thread From: Greg KH @ 2009-02-02 23:57 UTC (permalink / raw) To: Mike Galbraith Cc: Brian Rogers, Peter Zijlstra, Linux Kernel Mailing List, Jan Engelhardt, Ingo Molnar, Nathanael Hoyle, stable On Sat, Jan 31, 2009 at 10:08:13AM +0100, Mike Galbraith wrote: > On Sat, 2009-01-31 at 06:38 +0100, Mike Galbraith wrote: > > On Fri, 2009-01-30 at 14:12 -0800, Brian Rogers wrote: > > > Mike Galbraith wrote: > > > > On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote: > > > > > > > >> I am running foldingathome under it at the moment, and it seems to be > > > >> improving the situation somewhat, but I still need/want to test with > > > >> Mike's referenced patches. > > > >> > > > > You will most definitely encounter evilness running SCHED_IDLE tasks in > > > > a kernel without the SCHED_IDLE fixes. > > > > > > > Speaking of SCHED_IDLE fixes, is > > > 6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next > > > stable 2.6.28 release? Without it on 2.6.28.2, I can still produce > > > minutes-long freezes with BOINC or other idle processes. > > > > > > With the above commit on top of 2.6.28.2 and also > > > cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though > > > I assume cce7ad isn't actually required to fix that, and I can test that > > > if desired. > > > > I think they both should go to stable, but dunno if they're headed that > > direction or not. > > > > One way to find out, CCs added. > > For those who may want to run SCHED_IDLE tasks in .27, I've integrated > and lightly tested the fixes required to do so. One additional commit > was needed to get SCHED_IDLE vs nice 19 working right, namely f9c0b09. > Without that, SCHED_IDLE tasks received more CPU than nice 19 tasks. 
> > Since .27 is in long-term maintenance, I'd integrate into stable, but > that's not my decision. Anyone who applies the below to their stable > kernel gets to keep all the pieces should something break ;-) I'm going to hold off and not do this, as it seems too risky. But thanks for the pointers, perhaps someone else will want to do this for their distro kernels if they have problems with this. thanks, greg k-h ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [stable] scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-02-02 23:57 ` [stable] " Greg KH @ 2009-02-09 15:19 ` Brian Rogers 2009-02-09 15:51 ` Greg KH 0 siblings, 1 reply; 31+ messages in thread From: Brian Rogers @ 2009-02-09 15:19 UTC (permalink / raw) To: Greg KH Cc: Mike Galbraith, Peter Zijlstra, Linux Kernel Mailing List, Jan Engelhardt, Ingo Molnar, Nathanael Hoyle, stable Greg KH wrote: > On Sat, Jan 31, 2009 at 10:08:13AM +0100, Mike Galbraith wrote: > >> On Sat, 2009-01-31 at 06:38 +0100, Mike Galbraith wrote: >> >>> On Fri, 2009-01-30 at 14:12 -0800, Brian Rogers wrote: >>> >>>> Mike Galbraith wrote: >>>> >>>>> On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote: >>>>> >>>>> >>>>>> I am running foldingathome under it at the moment, and it seems to be >>>>>> improving the situation somewhat, but I still need/want to test with >>>>>> Mike's referenced patches. >>>>>> >>>>>> >>>>> You will most definitely encounter evilness running SCHED_IDLE tasks in >>>>> a kernel without the SCHED_IDLE fixes. >>>>> >>>>> >>>> Speaking of SCHED_IDLE fixes, is >>>> 6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next >>>> stable 2.6.28 release? Without it on 2.6.28.2, I can still produce >>>> minutes-long freezes with BOINC or other idle processes. >>>> >>>> With the above commit on top of 2.6.28.2 and also >>>> cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though >>>> I assume cce7ad isn't actually required to fix that, and I can test that >>>> if desired. >>>> >>> I think they both should go to stable, but dunno if they're headed that >>> direction or not. >>> >>> One way to find out, CCs added. >>> >> For those who may want to run SCHED_IDLE tasks in .27, I've integrated >> and lightly tested the fixes required to do so. One additional commit >> was needed to get SCHED_IDLE vs nice 19 working right, namely f9c0b09. >> Without that, SCHED_IDLE tasks received more CPU than nice 19 tasks. 
>> >> Since .27 is in long-term maintenance, I'd integrate into stable, but >> that's not my decision. Anyone who applies the below to their stable >> kernel gets to keep all the pieces should something break ;-) >> > > I'm going to hold off and not do this, as it seems too risky. > > But thanks for the pointers, perhaps someone else will want to do this > for their distro kernels if they have problems with this. > Is this statement meant to apply to both 2.6.27 and 2.6.28, or just 2.6.27? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [stable] scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-02-09 15:19 ` Brian Rogers @ 2009-02-09 15:51 ` Greg KH 0 siblings, 0 replies; 31+ messages in thread From: Greg KH @ 2009-02-09 15:51 UTC (permalink / raw) To: Brian Rogers Cc: Mike Galbraith, Peter Zijlstra, Linux Kernel Mailing List, Jan Engelhardt, Ingo Molnar, Nathanael Hoyle, stable On Mon, Feb 09, 2009 at 07:19:36AM -0800, Brian Rogers wrote: > Greg KH wrote: >> On Sat, Jan 31, 2009 at 10:08:13AM +0100, Mike Galbraith wrote: >> >>> On Sat, 2009-01-31 at 06:38 +0100, Mike Galbraith wrote: >>> >>>> On Fri, 2009-01-30 at 14:12 -0800, Brian Rogers wrote: >>>> >>>>> Mike Galbraith wrote: >>>>> >>>>>> On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote: >>>>>> >>>>>>> I am running foldingathome under it at the moment, and it seems to be >>>>>>> improving the situation somewhat, but I still need/want to test with >>>>>>> Mike's referenced patches. >>>>>>> >>>>>> You will most definitely encounter evilness running SCHED_IDLE tasks >>>>>> in >>>>>> a kernel without the SCHED_IDLE fixes. >>>>>> >>>>> Speaking of SCHED_IDLE fixes, is >>>>> 6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next >>>>> stable 2.6.28 release? Without it on 2.6.28.2, I can still produce >>>>> minutes-long freezes with BOINC or other idle processes. >>>>> >>>>> With the above commit on top of 2.6.28.2 and also >>>>> cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though >>>>> I assume cce7ad isn't actually required to fix that, and I can test >>>>> that if desired. >>>>> >>>> I think they both should go to stable, but dunno if they're headed that >>>> direction or not. >>>> >>>> One way to find out, CCs added. >>>> >>> For those who may want to run SCHED_IDLE tasks in .27, I've integrated >>> and lightly tested the fixes required to do so. One additional commit >>> was needed to get SCHED_IDLE vs nice 19 working right, namely f9c0b09. 
>>> Without that, SCHED_IDLE tasks received more CPU than nice 19 tasks. >>> >>> Since .27 is in long-term maintenance, I'd integrate into stable, but >>> that's not my decision. Anyone who applies the below to their stable >>> kernel gets to keep all the pieces should something break ;-) >>> >> >> I'm going to hold off and not do this, as it seems too risky. >> >> But thanks for the pointers, perhaps someone else will want to do this >> for their distro kernels if they have problems with this. >> > > Is this statement meant to apply to both 2.6.27 and 2.6.28, or just 2.6.27? Both. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 7:59 ` Nathanael Hoyle 2009-01-30 8:07 ` Mike Galbraith @ 2009-01-30 8:16 ` Nathanael Hoyle 2009-01-30 13:56 ` Jan Engelhardt 2009-01-30 14:15 ` Jan Engelhardt 2 siblings, 1 reply; 31+ messages in thread From: Nathanael Hoyle @ 2009-01-30 8:16 UTC (permalink / raw) To: Jan Engelhardt; +Cc: Linux Kernel Mailing List On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote: > On Fri, 2009-01-30 at 08:21 +0100, Jan Engelhardt wrote: > > On Friday 2009-01-30 07:40, Nathanael Hoyle wrote: > > >On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote: > > > > > >The one discussion I saw referencing SCHED_BATCH seemed to imply that it > > >was a non-standard kernel patch by Con Kolivas in one of his -ck > > >variants that never made it into mainline and is not being maintained. > > >Is this inaccurate? > > > > The presence of SCHED_BATCH in linux/sched.h tells me it is available > > (on the other hand, SCHED_ISO, also from -ck, is only listed as a comment.) > > > > Fair enough. I should have re-checked recent sources after your mention > rather than going on the old thread I found. > > > >I was unfamiliar with SCHED_IDLE. Having done a little Googling now, I > > >finally find reference to the man page for sched_setscheduler(2). This > > >appears that it is likely what I wanted. > > > > > >I think the information I had been able to find was somehwat out of > > >date. > > > > The manpage does say it, but if your local distro does not > > mention SCHED_BATCH/SCHED_IDLE, then that's a pretty sad distro. > > > > The doc in sched_setschedule seems complete to me as of man-pages 3.13. > > > > >Is there currently a standardized userspace tool to use to run a command > > >in order to alter its scheduling class? 
Obviously writing one would be
> > >trivial, but didn't know if something like:
> >
> > man chrt
>
> The latest version of man chrt that I can find implies that it handles
> SCHED_BATCH but not SCHED_IDLE. To that end, if anyone else is
> interested, I have thrown together the above-suggested 'runidle' which
> will invoke the passed command using the SCHED_IDLE scheduler; it's
> nothing fancy.
>
> I am running foldingathome under it at the moment, and it seems to be
> improving the situation somewhat, but I still need/want to test with
> Mike's referenced patches.
>

<snipped old version, because of a fixed goof, and this has better
formatting for mail client>

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <linux/sched.h>

extern char **environ;

int main(int argc, char **argv)
{
	struct sched_param sp;
	sp.sched_priority = 0;

	if(argc<2) {
		perror("Must specify at least one argument: the path to " \
			"the program to execute. Additional arguments may be " \
			"specified which will be passed to the called program.");
		return EXIT_FAILURE;
	}

	if(sched_setscheduler(0, SCHED_IDLE, &sp) == -1) {
		perror("Failed to alter scheduling class!");
		return EXIT_FAILURE;
	}

	if(argc==2) {
		if(execve(argv[1], NULL, environ) == -1) {
			perror("Failed to execve target!");
		}
	} else {
		if(execve(argv[1], argv+1, environ) == -1) {
			perror("Failed to execve target!");
		}
	}

	/* should be unreachable */
	return EXIT_FAILURE;
}

> -Nathanael

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30  8:16 ` Nathanael Hoyle
@ 2009-01-30 13:56 ` Jan Engelhardt
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Engelhardt @ 2009-01-30 13:56 UTC (permalink / raw)
To: Nathanael Hoyle; +Cc: Linux Kernel Mailing List

On Friday 2009-01-30 09:16, Nathanael Hoyle wrote:
><snipped old version, because of a fixed goof, and this has better
>formatting for mail client>
>
>extern char **environ;
>[...]
>	if(argc==2) {
>		if(execve(argv[1], NULL, environ) == -1) {
>			perror("Failed to execve target!");
>		}
>	} else {
>		if(execve(argv[1], argv+1, environ) == -1) {
>			perror("Failed to execve target!");
>		}
>	}

Are you sure your first execve even works? I would have used

	if (execvp(argv[1], &argv[1]) < 0)
		...

just so

(a) I do not have to deal with the ugly 'extern char **environ'
    [such should have been in a libc header imho] or use
    int main(int argc, char **argv, char **envp);

(b) execvp so that it looks through $PATH, just as /bin/su
    (resp. the shell it starts) would do.

^ permalink raw reply	[flat|nested] 31+ messages in thread
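[Editor's note] Folding Jan's two suggestions back into the helper gives something like the sketch below. This is a hypothetical revision, not code posted in the thread: `execvp()` does the $PATH lookup, which removes both the `extern char **environ` and the `argc==2` special case, since `&argv[1]` is already a valid NULL-terminated argument vector. The `SCHED_IDLE` fallback define is for older libc headers that predate the constant.

```c
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#ifndef SCHED_IDLE
#define SCHED_IDLE 5	/* value from linux/sched.h; older libc headers lack it */
#endif

/* Put the calling process into the SCHED_IDLE class; returns 0 on success.
 * Unlike SCHED_FIFO/SCHED_RR, no privilege is needed to drop down to idle. */
int drop_to_idle(void)
{
	struct sched_param sp = { .sched_priority = 0 };

	return sched_setscheduler(0, SCHED_IDLE, &sp);
}

/* Exec argv[0] (searched via $PATH, like a shell would) under SCHED_IDLE. */
int run_idle(char **argv)
{
	if (drop_to_idle() == -1) {
		perror("sched_setscheduler");
		return -1;
	}
	execvp(argv[0], argv);	/* argv is already NULL-terminated */
	perror("execvp");	/* reached only if the exec failed */
	return -1;
}
```

main() then reduces to an argc check plus `return run_idle(&argv[1]) ? EXIT_FAILURE : EXIT_SUCCESS;`.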
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30  7:59 ` Nathanael Hoyle
  2009-01-30  8:07 ` Mike Galbraith
  2009-01-30  8:16 ` Nathanael Hoyle
@ 2009-01-30 14:15 ` Jan Engelhardt
  2 siblings, 0 replies; 31+ messages in thread
From: Jan Engelhardt @ 2009-01-30 14:15 UTC (permalink / raw)
To: Nathanael Hoyle; +Cc: Linux Kernel Mailing List

On Friday 2009-01-30 08:59, Nathanael Hoyle wrote:
>> >Is there currently a standardized userspace tool to use to run a command
>> >in order to alter its scheduling class? Obviously writing one would be
>> >trivial, but didn't know if something like:
>>
>> man chrt
>
>The latest version of man chrt that I can find implies that it handles
>SCHED_BATCH but not SCHED_IDLE. To that end, if anyone else is
>interested, I have thrown together the above-suggested 'runidle' which
>will invoke the passed command using the SCHED_IDLE scheduler; it's
>nothing fancy.

Should have added -i to chrt instead and submit ;-)

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 5:49 scheduler nice 19 versus 'idle' behavior / static low-priority scheduling Nathanael Hoyle 2009-01-30 6:16 ` Jan Engelhardt @ 2009-01-30 6:17 ` V.Radhakrishnan 2009-01-30 6:48 ` Nathanael Hoyle 2009-01-30 6:24 ` Mike Galbraith ` (2 subsequent siblings) 4 siblings, 1 reply; 31+ messages in thread From: V.Radhakrishnan @ 2009-01-30 6:17 UTC (permalink / raw) To: Nathanael Hoyle; +Cc: linux-kernel Clear description of a "problem" which may not be directly "solvable" unless one becomes philosophical. The linux scheduler by default is a "FAIR" scheduler which means that every runnable process ready to occupy CPU time will at some point RUN and never be denied cpu time. This is possible by boosting the dynamic priority of runnable processes so that one day they rule the roost. However, the kernel also supports SCHED_FIFO and SCHED_RR which supports Real Time capabilities, albeit as root. The DVD player is a soft real time application where the display gets jittery whenever the frame display rate is not achievable. If you wish 100% smooth display, you could make it run as SCHED_FIFO which means that your foldingathome would wait quietly for the movie to get completed fully. What's "wrong" with that aproach, which is essentially what you want ? The key question is whether YOU want the foldingathome application to run in parallel with the dvd player or not. Lets wait for more light from the gurus...!! V. Radhakrishnan On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > All (though perhaps of special interest to a few such as Ingo, Peter, > and David), > > I am posting regarding an issue I have been dealing with recently, > though this post is not really a request for troubleshooting. 
Instead > I'd like to ramble for just a moment about my understanding of the > current 2.6 scheduler, describe the behavior I'm seeing, and discuss a > couple of the architectural solutions I've considered, as well as pose > the question whether anyone else views this as a general-case problem > worthy of being addressed, or whether this is something that gets > ignored by and large. It is my hope that this is not too off-topic for > this group. > > First, let me explain the issue I encountered. I am running a relatively > powerful system for a home desktop, an Intel Core 2 Quad Q9450 with 4 GB > of RAM. If it matters for the discussion, it also has 4 drives in an > mdraid raid-5 array, and decent I/O throughput. In normal circumstances > it is quite responsive as a desktop (kde 3.5.4 atm). It is further a > very carefully configured kernel build, including only those things > which I truly need, and excluding everything else. I often use it to > watch DVD movies, and have had no trouble with performance in general. > > Recently I installed the Folding@Home client, which many of you may be > familiar with, intended to utilize spare CPU cycles to perform protein > folding simulations in order to further medical research. It is not a > multi-threaded client at this point, so it simply runs four instances on > my system, since it has four cores. It is configured to run at > nice-level 19. > > Because it is heavily optimized, and needs little external data to > perform its work, it spends almost all of its time cpu-bound, with > little to no io-wait or blocking on network calls, etc. I had been > using it for about a week with no real difficulty until I went to watch > another DVD and found that the video was slightly stuttery/jerky so long > as foldingathome was running in the background. Once I shut it down, > the video playback resumed its normal smooth form. 
> > There are a couple simple solutions to this: > > Substantially boosting the process priority of the mplayer process also > returns the video to smooth playback, but this is undesirable in that it > requires manual intervention each time, and root privileges. It fails to > achieve what I want, which is for the foldingathome computation to not > interfere with anything else I may try to do. I want my compiles to be > as *exactly* as fast as they were without it as possible, etc. > > Stopping foldingathome before I do something performance sensitive is > also possible, but again smacks of workaround rather than solution. The > scheduler should be able to resolve the goal without me stopping the > other work. > > I have done a bit of research on how the kernel scheduler works, and why > I am seeing this behavior. I had previously, apparently ignorantly, > equated 'nice 19' with being akin to Microsoft Windows' 'idle' thread > priority, and assumed it would never steal CPU cycles from a process > with a higher(lower, depending on nomenclature) priority. > > It is my current understanding that when mplayer is running (also > typically CPU bound, occassionally it becomes I/O bound briefly), one of > the instances of foldingathome, which is sharing the CPU (core) with > mplayer starts getting starved, and the scheduler dynamically rewards it > with up to four additional priority levels based on the time remaining > in its quantum which it was not allowed to execute for. > > At this point, when mplayer blocks for just a moment, say to page in the > data for the next video frame, foldingathome gets scheduled again, and > gets to run for at least MIN_TIMESLICE (plus, due to the lack of kernel > pre-emptibility, possibly longer). It appears that it takes too long to > switch back to mplayer and the result is the stuttering picture I > observe. > > I have tried adjusting CONFIG_HZ_xxx from 300 (where I had it) to 1000, > and noted some improvement, but not complete remedy. 
> > In my prior searching on this, I found only one poster with the same > essential problem (from 2004, and regarding distributed.net in the > background, which is essentially the same problem). The only technical > answer given him was to perhaps try tuning the MIN_TIMESLICE value > downward. It is my understanding that this parameter is relatively > important in order to avoid cache thrashing, and I do not wish to alter > it and have not so far. > > Given all of the above, I am unconvinced that I see a good overall > solution. However, one thing that seems to me a glaring weakness of the > scheduler is that only realtime priority threads can be given static > priorities. What I really want for foldingathome, and similar tasks, is > static, low priority. Something that would not boost up, no matter how > well behaved it was or how much it had been starved, or how close to the > same memory segments the needed code was. > > I think that there are probably (at least) three approaches here. One I > consider unnacceptable at the outset, which is to alter the semantics of > nice 19 such that it does not boost. Since this would break existing > assumptions and code, I do not think it is feasible. > > Secondly, one could add additional nice levels which would correspond to > new static priorities below the bottom of the current user ones. This > should not interfere with the O(1) scheduler implementation as I > understand it, because current I believe 5 32-bit words are used to flag > the queue usage, and 140 priorities leaves 20 more bits available for > new priorities. This has its own problems however, in that existing > tools which examine process priorities could break on priorities outside > the known 'nice' range of -20 to 19. > > Finally, new scheduling classes could be introduced, together with new > system calls so that applications could select a different scheduling > class at startup. 
In this way, applications could volunteer to use a > scheduling class which never received dynamic 'reward' boosts that would > raise their priorities. I believe Solaris has done this since Solaris > 9, with the 'FX' scheduling class. > > Stepping back: > > 1) Is my problem 'expected' based on others' understanding of the > current design of the scheduler, or do I have a one-off problem to > troubleshoot here? > > 2) Am I overlooking obvious alternative (but clean) fixes? > > 3) Does anyone else see the need for static, but low process priorities? > > 4) What is the view of introducing a new scheduler class to handle this? > > I welcome any further feedback on this. I will try to follow replies > on-list, but would appreciate being CC'd off-list as well. Please make > the obvious substitution to my email address in order to bypass the > spam-killer. > > Thanks, > Nathanael Hoyle > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 31+ messages in thread
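[Editor's note] The SCHED_FIFO suggestion above hinges on privilege: without CAP_SYS_NICE (or, on later kernels, a nonzero RLIMIT_RTPRIO), sched_setscheduler() refuses a realtime policy with EPERM. A minimal probe, with a hypothetical helper name, shows the check:

```c
#include <errno.h>
#include <sched.h>

/* Try to enter SCHED_FIFO at the lowest realtime priority.
 * Returns 0 on success, or the errno — typically EPERM when the
 * caller lacks CAP_SYS_NICE / a sufficient RLIMIT_RTPRIO. */
int try_fifo(void)
{
	struct sched_param sp = { .sched_priority = 1 };

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		return errno;
	return 0;
}
```

An unprivileged desktop user gets EPERM back, which is exactly why boosting mplayer per-invocation is a chore: each run would need root (or an rtprio rlimit) behind it.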
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 6:17 ` V.Radhakrishnan @ 2009-01-30 6:48 ` Nathanael Hoyle 2009-01-30 14:15 ` Jan Engelhardt 0 siblings, 1 reply; 31+ messages in thread From: Nathanael Hoyle @ 2009-01-30 6:48 UTC (permalink / raw) To: V.Radhakrishnan; +Cc: linux-kernel On Fri, 2009-01-30 at 11:47 +0530, V.Radhakrishnan wrote: > Clear description of a "problem" which may not be directly "solvable" > unless one becomes philosophical. > > The linux scheduler by default is a "FAIR" scheduler which means that > every runnable process ready to occupy CPU time will at some point RUN > and never be denied cpu time. This is possible by boosting the dynamic > priority of runnable processes so that one day they rule the roost. > > However, the kernel also supports SCHED_FIFO and SCHED_RR which supports > Real Time capabilities, albeit as root. > > The DVD player is a soft real time application where the display gets > jittery whenever the frame display rate is not achievable. > > If you wish 100% smooth display, you could make it run as SCHED_FIFO > which means that your foldingathome would wait quietly for the movie to > get completed fully. What's "wrong" with that aproach, which is > essentially what you want ? My view of what's "wrong" with that approach is that it requires root privileges to boost the scheduling priority of each and every process (although in this case, mplayer is the issue) which I want to not be affected by foldingathome's CPU usage. While I happen to be root on this system, since it is my desktop, I would imagine there are instances where the root user/administrator of a system wanted to be able to run items which had no impact on other users, including allowing them to run fast and responsive applications. Aside from that, it's a PITA to start mplayer playing, go renice -19 it and resume watching my movie every time. 
> > The key question is whether YOU want the foldingathome application to > run in parallel with the dvd player or not. > I want it to be able to run in the background without my having to intervene, but I want the core which shares mplayer to starve foldingathome until my movie is over :-). > Lets wait for more light from the gurus...!! > > V. Radhakrishnan > > > On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > > All (though perhaps of special interest to a few such as Ingo, Peter, > > and David), > > > > I am posting regarding an issue I have been dealing with recently, > > though this post is not really a request for troubleshooting. Instead > > I'd like to ramble for just a moment about my understanding of the > > current 2.6 scheduler, describe the behavior I'm seeing, and discuss a > > couple of the architectural solutions I've considered, as well as pose > > the question whether anyone else views this as a general-case problem > > worthy of being addressed, or whether this is something that gets > > ignored by and large. It is my hope that this is not too off-topic for > > this group. > > > > First, let me explain the issue I encountered. I am running a relatively > > powerful system for a home desktop, an Intel Core 2 Quad Q9450 with 4 GB > > of RAM. If it matters for the discussion, it also has 4 drives in an > > mdraid raid-5 array, and decent I/O throughput. In normal circumstances > > it is quite responsive as a desktop (kde 3.5.4 atm). It is further a > > very carefully configured kernel build, including only those things > > which I truly need, and excluding everything else. I often use it to > > watch DVD movies, and have had no trouble with performance in general. > > > > Recently I installed the Folding@Home client, which many of you may be > > familiar with, intended to utilize spare CPU cycles to perform protein > > folding simulations in order to further medical research. 
It is not a > > multi-threaded client at this point, so it simply runs four instances on > > my system, since it has four cores. It is configured to run at > > nice-level 19. > > > > Because it is heavily optimized, and needs little external data to > > perform its work, it spends almost all of its time cpu-bound, with > > little to no io-wait or blocking on network calls, etc. I had been > > using it for about a week with no real difficulty until I went to watch > > another DVD and found that the video was slightly stuttery/jerky so long > > as foldingathome was running in the background. Once I shut it down, > > the video playback resumed its normal smooth form. > > > > There are a couple simple solutions to this: > > > > Substantially boosting the process priority of the mplayer process also > > returns the video to smooth playback, but this is undesirable in that it > > requires manual intervention each time, and root privileges. It fails to > > achieve what I want, which is for the foldingathome computation to not > > interfere with anything else I may try to do. I want my compiles to be > > as *exactly* as fast as they were without it as possible, etc. > > > > Stopping foldingathome before I do something performance sensitive is > > also possible, but again smacks of workaround rather than solution. The > > scheduler should be able to resolve the goal without me stopping the > > other work. > > > > I have done a bit of research on how the kernel scheduler works, and why > > I am seeing this behavior. I had previously, apparently ignorantly, > > equated 'nice 19' with being akin to Microsoft Windows' 'idle' thread > > priority, and assumed it would never steal CPU cycles from a process > > with a higher(lower, depending on nomenclature) priority. 
> > > > It is my current understanding that when mplayer is running (also > > typically CPU bound, occassionally it becomes I/O bound briefly), one of > > the instances of foldingathome, which is sharing the CPU (core) with > > mplayer starts getting starved, and the scheduler dynamically rewards it > > with up to four additional priority levels based on the time remaining > > in its quantum which it was not allowed to execute for. > > > > At this point, when mplayer blocks for just a moment, say to page in the > > data for the next video frame, foldingathome gets scheduled again, and > > gets to run for at least MIN_TIMESLICE (plus, due to the lack of kernel > > pre-emptibility, possibly longer). It appears that it takes too long to > > switch back to mplayer and the result is the stuttering picture I > > observe. > > > > I have tried adjusting CONFIG_HZ_xxx from 300 (where I had it) to 1000, > > and noted some improvement, but not complete remedy. > > > > In my prior searching on this, I found only one poster with the same > > essential problem (from 2004, and regarding distributed.net in the > > background, which is essentially the same problem). The only technical > > answer given him was to perhaps try tuning the MIN_TIMESLICE value > > downward. It is my understanding that this parameter is relatively > > important in order to avoid cache thrashing, and I do not wish to alter > > it and have not so far. > > > > Given all of the above, I am unconvinced that I see a good overall > > solution. However, one thing that seems to me a glaring weakness of the > > scheduler is that only realtime priority threads can be given static > > priorities. What I really want for foldingathome, and similar tasks, is > > static, low priority. Something that would not boost up, no matter how > > well behaved it was or how much it had been starved, or how close to the > > same memory segments the needed code was. 
> > > > I think that there are probably (at least) three approaches here. One I > > consider unnacceptable at the outset, which is to alter the semantics of > > nice 19 such that it does not boost. Since this would break existing > > assumptions and code, I do not think it is feasible. > > > > Secondly, one could add additional nice levels which would correspond to > > new static priorities below the bottom of the current user ones. This > > should not interfere with the O(1) scheduler implementation as I > > understand it, because current I believe 5 32-bit words are used to flag > > the queue usage, and 140 priorities leaves 20 more bits available for > > new priorities. This has its own problems however, in that existing > > tools which examine process priorities could break on priorities outside > > the known 'nice' range of -20 to 19. > > > > Finally, new scheduling classes could be introduced, together with new > > system calls so that applications could select a different scheduling > > class at startup. In this way, applications could volunteer to use a > > scheduling class which never received dynamic 'reward' boosts that would > > raise their priorities. I believe Solaris has done this since Solaris > > 9, with the 'FX' scheduling class. > > > > Stepping back: > > > > 1) Is my problem 'expected' based on others' understanding of the > > current design of the scheduler, or do I have a one-off problem to > > troubleshoot here? > > > > 2) Am I overlooking obvious alternative (but clean) fixes? > > > > 3) Does anyone else see the need for static, but low process priorities? > > > > 4) What is the view of introducing a new scheduler class to handle this? > > > > I welcome any further feedback on this. I will try to follow replies > > on-list, but would appreciate being CC'd off-list as well. Please make > > the obvious substitution to my email address in order to bypass the > > spam-killer. 
> > > > Thanks, > > Nathanael Hoyle > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 31+ messages in thread
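The second approach quoted above leans on the O(1) scheduler's priority bitmap: 140 priorities flagged in five 32-bit words, leaving 20 spare bits. A toy userspace re-creation of that bookkeeping (illustrative only; the names and layout are simplified stand-ins, not the kernel's actual code):

```c
#include <stdint.h>

#define MAX_PRIO     140                      /* 100 realtime + 40 nice levels */
#define BITMAP_WORDS ((MAX_PRIO + 31) / 32)   /* = 5 32-bit words, as above */

/* Toy model of the O(1) runqueue bitmap: one bit per priority level. */
struct prio_bitmap {
    uint32_t bits[BITMAP_WORDS];
};

static void mark_runnable(struct prio_bitmap *b, int prio)
{
    b->bits[prio / 32] |= 1u << (prio % 32);
}

/* Find the highest-priority (lowest-numbered) non-empty queue, or -1. */
static int find_first_prio(const struct prio_bitmap *b)
{
    for (int w = 0; w < BITMAP_WORDS; w++)
        if (b->bits[w])
            return w * 32 + __builtin_ctz(b->bits[w]);
    return -1;
}
```

With 140 of the 160 bits used, the lookup stays constant-time regardless of how many extra static levels would be squeezed into the remaining 20 bits, which is the headroom the mail counts on.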
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling
  2009-01-30  6:48 ` Nathanael Hoyle
@ 2009-01-30 14:15 ` Jan Engelhardt
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Engelhardt @ 2009-01-30 14:15 UTC (permalink / raw)
  To: Nathanael Hoyle; +Cc: V.Radhakrishnan, linux-kernel

On Friday 2009-01-30 07:48, Nathanael Hoyle wrote:
>On Fri, 2009-01-30 at 11:47 +0530, V.Radhakrishnan wrote:
>>
>> However, the kernel also supports SCHED_FIFO and SCHED_RR, which support
>> Real Time capabilities, albeit as root.
>> [...]
>> If you wish 100% smooth display, you could make it run as SCHED_FIFO,
>> which means that your foldingathome would wait quietly for the movie to
>> get completed fully. What's "wrong" with that approach, which is
>> essentially what you want?
>
>My view of what's "wrong" with that approach is that it requires root
>privileges to boost the scheduling priority of each and every process
>(although in this case, mplayer is the issue) which I want to not be
>affected by foldingathome's CPU usage.

SCHED_FIFO is dangerous - it is easy to essentially lock up your box,
simply because the process in question (e.g. the video decoder) just
runs forever (e.g. a bug causing a busyloop), and/or other processes do
not get to run (nothing in the same priority class will preempt it). Or
they (X.org, for displaying your video and handling user input) run for
short amounts of time only, giving a borked responsiveness experience
to the user. It was about time SCHED_{BATCH,IDLE} came along ;-)

>While I happen to be root on
>this system, since it is my desktop, I would imagine there are instances
>where the root user/administrator of a system wanted to be able to run
>items which had no impact on other users, including allowing them to run
>fast and responsive applications. Aside from that, it's a PITA to start
>mplayer playing, go renice -19 it and resume watching my movie every
>time.
Even renicing mplayer to -19 will not cause FAH to get zero CPU time if
the Regular Desktop Processes Everybody Needs already max out the CPU.
^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 5:49 scheduler nice 19 versus 'idle' behavior / static low-priority scheduling Nathanael Hoyle 2009-01-30 6:16 ` Jan Engelhardt 2009-01-30 6:17 ` V.Radhakrishnan @ 2009-01-30 6:24 ` Mike Galbraith 2009-01-30 6:52 ` Nathanael Hoyle 2009-01-30 8:50 ` Peter Zijlstra 2009-02-02 17:23 ` Lennart Sorensen 4 siblings, 1 reply; 31+ messages in thread From: Mike Galbraith @ 2009-01-30 6:24 UTC (permalink / raw) To: Nathanael Hoyle; +Cc: linux-kernel On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > Recently I installed the Folding@Home client, which many of you may be > familiar with, intended to utilize spare CPU cycles to perform protein > folding simulations in order to further medical research. It is not a > multi-threaded client at this point, so it simply runs four instances on > my system, since it has four cores. It is configured to run at > nice-level 19. > > Because it is heavily optimized, and needs little external data to > perform its work, it spends almost all of its time cpu-bound, with > little to no io-wait or blocking on network calls, etc. I had been > using it for about a week with no real difficulty until I went to watch > another DVD and found that the video was slightly stuttery/jerky so long > as foldingathome was running in the background. Once I shut it down, > the video playback resumed its normal smooth form. Sounds like a problem was recently fixed. Can you try 2.6.29-rc3 or 2.6.28.2? -Mike ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 6:24 ` Mike Galbraith @ 2009-01-30 6:52 ` Nathanael Hoyle 2009-01-30 7:09 ` Mike Galbraith 0 siblings, 1 reply; 31+ messages in thread From: Nathanael Hoyle @ 2009-01-30 6:52 UTC (permalink / raw) To: Mike Galbraith; +Cc: linux-kernel On Fri, 2009-01-30 at 07:24 +0100, Mike Galbraith wrote: > On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > > > Recently I installed the Folding@Home client, which many of you may be > > familiar with, intended to utilize spare CPU cycles to perform protein > > folding simulations in order to further medical research. It is not a > > multi-threaded client at this point, so it simply runs four instances on > > my system, since it has four cores. It is configured to run at > > nice-level 19. > > > > Because it is heavily optimized, and needs little external data to > > perform its work, it spends almost all of its time cpu-bound, with > > little to no io-wait or blocking on network calls, etc. I had been > > using it for about a week with no real difficulty until I went to watch > > another DVD and found that the video was slightly stuttery/jerky so long > > as foldingathome was running in the background. Once I shut it down, > > the video playback resumed its normal smooth form. > > Sounds like a problem was recently fixed. Can you try 2.6.29-rc3 or > 2.6.28.2? > > -Mike > I will try to do so as soon as I get the chance. Do you have any specific info on the problem that you believe was fixed and/or the fix applied? Thanks, -Nathanael ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 6:52 ` Nathanael Hoyle @ 2009-01-30 7:09 ` Mike Galbraith 0 siblings, 0 replies; 31+ messages in thread From: Mike Galbraith @ 2009-01-30 7:09 UTC (permalink / raw) To: Nathanael Hoyle; +Cc: linux-kernel On Fri, 2009-01-30 at 01:52 -0500, Nathanael Hoyle wrote: > On Fri, 2009-01-30 at 07:24 +0100, Mike Galbraith wrote: > > Sounds like a problem was recently fixed. Can you try 2.6.29-rc3 or > > 2.6.28.2? > > > > I will try to do so as soon as I get the chance. Do you have any > specific info on the problem that you believe was fixed and/or the fix > applied. The commit text below also applies to +nice tasks. commit 046e7f77d734778a3b2e7d51ce63da3dbe7a8168 Author: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Thu Jan 15 14:53:39 2009 +0100 sched: fix update_min_vruntime commit e17036dac189dd034c092a91df56aa740db7146d upstream. Impact: fix SCHED_IDLE latency problems OK, so we have 1 running task A (which is obviously curr and the tree is equally obviously empty). 'A' nicely chugs along, doing its thing, carrying min_vruntime along as it goes. Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP balancing, which is very likely far right, in that case update_curr update_min_vruntime cfs_rq->rb_leftmost := true (the crazy task sitting in a tree) vruntime = se->vruntime and voila, min_vruntime is waaay right of where it ought to be. OK, so why did I write it like that to begin with... Aah, yes. Say we've just dequeued current schedule deactivate_task(prev) dequeue_entity update_min_vruntime Then we'll set vruntime = cfs_rq->min_vruntime; we find !cfs_rq->curr, but do find someone in the tree. 
Then we _must_ do vruntime = se->vruntime, because

  vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)

will not advance vruntime, and cause lags the other way around (which we
fixed with that initial patch: 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69
(sched: more accurate min_vruntime accounting)).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Mike Galbraith <efault@gmx.de>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 98345e4..06a68c4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -283,7 +283,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 			struct sched_entity, run_node);
 
-		if (vruntime == cfs_rq->min_vruntime)
+		if (!cfs_rq->curr)
 			vruntime = se->vruntime;
 		else
 			vruntime = min_vruntime(vruntime, se->vruntime);

^ permalink raw reply related	[flat|nested] 31+ messages in thread
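For context, the min_vruntime() helper used in the else branch of that hunk picks the earlier of two virtual runtimes with a wraparound-safe signed comparison. Roughly (a userspace re-creation for illustration, mirroring the kernel helper rather than copying it verbatim):

```c
#include <stdint.h>

/* Pick the earlier of two u64 virtual runtimes.  Casting the difference
 * to a signed type makes the comparison behave correctly even after the
 * counters wrap around 2^64. */
static uint64_t min_vruntime(uint64_t min_vruntime, uint64_t vruntime)
{
    int64_t delta = (int64_t)(vruntime - min_vruntime);

    if (delta < 0)   /* vruntime lies "before" min_vruntime */
        min_vruntime = vruntime;

    return min_vruntime;
}
```

With the fix, this clamping is only bypassed when there is no current task, so a far-right SCHED_IDLE entity sitting alone in the tree can no longer drag min_vruntime to the right.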
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 5:49 scheduler nice 19 versus 'idle' behavior / static low-priority scheduling Nathanael Hoyle ` (2 preceding siblings ...) 2009-01-30 6:24 ` Mike Galbraith @ 2009-01-30 8:50 ` Peter Zijlstra 2009-01-30 9:00 ` Nathanael Hoyle 2009-02-02 17:23 ` Lennart Sorensen 4 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2009-01-30 8:50 UTC (permalink / raw) To: Nathanael Hoyle; +Cc: linux-kernel On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > > 1) Is my problem 'expected' based on others' understanding of the > current design of the scheduler, or do I have a one-off problem to > troubleshoot here? What kernel are you running (or did my eye glance over that detail in your longish email) ? > 2) Am I overlooking obvious alternative (but clean) fixes? Maybe, we fixed a glaring bug in this department recently (or more even, if you're on older than .28). > 3) Does anyone else see the need for static, but low process priorities? Yep, its rather common. > 4) What is the view of introducing a new scheduler class to handle this? We should have plenty available, SCHED_IDLE should just work -- as should nice 19 for that matter. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 8:50 ` Peter Zijlstra @ 2009-01-30 9:00 ` Nathanael Hoyle 2009-01-30 9:03 ` Peter Zijlstra 0 siblings, 1 reply; 31+ messages in thread From: Nathanael Hoyle @ 2009-01-30 9:00 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel On Fri, 2009-01-30 at 09:50 +0100, Peter Zijlstra wrote: > On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > > > > 1) Is my problem 'expected' based on others' understanding of the > > current design of the scheduler, or do I have a one-off problem to > > troubleshoot here? > > What kernel are you running (or did my eye glance over that detail in > your longish email) ? > I didn't include it, I should have: $ uname -a Linux nightmare 2.6.27-gentoo-r7-nhoyle #2 SMP Wed Jan 28 19:04:37 EST 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel GNU/Linux > > 2) Am I overlooking obvious alternative (but clean) fixes? > > Maybe, we fixed a glaring bug in this department recently (or more even, > if you're on older than .28). > Yep, .27 atm. > > 3) Does anyone else see the need for static, but low process priorities? > > Yep, its rather common. > > > 4) What is the view of introducing a new scheduler class to handle this? > > We should have plenty available, SCHED_IDLE should just work -- as > should nice 19 for that matter. > Thanks! -Nathanael ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 9:00 ` Nathanael Hoyle @ 2009-01-30 9:03 ` Peter Zijlstra 2009-01-30 10:18 ` Nathanael Hoyle 0 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2009-01-30 9:03 UTC (permalink / raw) To: Nathanael Hoyle; +Cc: linux-kernel, Mike Galbraith On Fri, 2009-01-30 at 04:00 -0500, Nathanael Hoyle wrote: > On Fri, 2009-01-30 at 09:50 +0100, Peter Zijlstra wrote: > > On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > > > > > > 1) Is my problem 'expected' based on others' understanding of the > > > current design of the scheduler, or do I have a one-off problem to > > > troubleshoot here? > > > > What kernel are you running (or did my eye glance over that detail in > > your longish email) ? > > > > I didn't include it, I should have: > > $ uname -a > Linux nightmare 2.6.27-gentoo-r7-nhoyle #2 SMP Wed Jan 28 19:04:37 EST > 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel > GNU/Linux Ah, then please do as Mike suggested, try 28.2 or 29-rc3, if you still have trouble with those, please let us know. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 9:03 ` Peter Zijlstra @ 2009-01-30 10:18 ` Nathanael Hoyle 2009-01-30 10:31 ` Mike Galbraith 0 siblings, 1 reply; 31+ messages in thread From: Nathanael Hoyle @ 2009-01-30 10:18 UTC (permalink / raw) To: Peter Zijlstra, Mike Galbraith; +Cc: Linux Kernel Mailing List On Fri, 2009-01-30 at 10:03 +0100, Peter Zijlstra wrote: > On Fri, 2009-01-30 at 04:00 -0500, Nathanael Hoyle wrote: > > On Fri, 2009-01-30 at 09:50 +0100, Peter Zijlstra wrote: > > > On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote: > > > > > > > > 1) Is my problem 'expected' based on others' understanding of the > > > > current design of the scheduler, or do I have a one-off problem to > > > > troubleshoot here? > > > > > > What kernel are you running (or did my eye glance over that detail in > > > your longish email) ? > > > > > > > I didn't include it, I should have: > > > > $ uname -a > > Linux nightmare 2.6.27-gentoo-r7-nhoyle #2 SMP Wed Jan 28 19:04:37 EST > > 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel > > GNU/Linux > > Ah, then please do as Mike suggested, try 28.2 or 29-rc3, if you still > have trouble with those, please let us know. > Ok, I'm now running: Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel GNU/Linux Initial conclusion is that whatever defects were corrected (non SCHED_IDLE specific defects that is), the newer kernel version does the trick. Video playback is as smooth as ever when running foldingathome at simple nice 19 priority. I am not sure that I can perceive a difference so far in testing that versus using SCHED_IDLE. I will probably continue to use the latter anyhow, as that represents more accurately the semantics that I'm trying to achieve. 
I had previously tried upgrading my kernel version in case that would
fix it, but even the latest available kernels in the portage tree for
Gentoo are older than 2.6.28.2. The 'stable' ones were all still 2.6.27.
I have had some concerns about Gentoo as a distro for some time, but it
still allows me more freedom and performance optimization than do most
other distros. I'll leave that at that for now to avoid starting any
religious wars over distros. Once I downloaded and built the latest
vanilla 2.6.28.2 sources from kernel.org though, everything seems
improved, as mentioned above.

Thanks to each of you who responded for all the help. I will continue to
experiment over the next week or so and provide feedback if I see
anything further unusual, but so far things seem good.

-Nathanael
^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 10:18 ` Nathanael Hoyle @ 2009-01-30 10:31 ` Mike Galbraith 2009-01-30 10:40 ` Peter Zijlstra 0 siblings, 1 reply; 31+ messages in thread From: Mike Galbraith @ 2009-01-30 10:31 UTC (permalink / raw) To: Nathanael Hoyle; +Cc: Peter Zijlstra, Linux Kernel Mailing List On Fri, 2009-01-30 at 05:18 -0500, Nathanael Hoyle wrote: > Ok, I'm now running: > > Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009 > x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel > GNU/Linux > > Initial conclusion is that whatever defects were corrected (non > SCHED_IDLE specific defects that is), the newer kernel version does the > trick. Video playback is as smooth as ever when running foldingathome > at simple nice 19 priority. Good to hear, thanks for testing. Peter, since 27 is a long term maintenance kernel, do you think 1af5f73 and 046e7f7 (at least) are 27.stable candidates? -Mike ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 10:31 ` Mike Galbraith @ 2009-01-30 10:40 ` Peter Zijlstra 2009-01-30 10:50 ` Mike Galbraith 0 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2009-01-30 10:40 UTC (permalink / raw) To: Mike Galbraith; +Cc: Nathanael Hoyle, Linux Kernel Mailing List On Fri, 2009-01-30 at 11:31 +0100, Mike Galbraith wrote: > On Fri, 2009-01-30 at 05:18 -0500, Nathanael Hoyle wrote: > > > Ok, I'm now running: > > > > Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009 > > x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel > > GNU/Linux > > > > Initial conclusion is that whatever defects were corrected (non > > SCHED_IDLE specific defects that is), the newer kernel version does the > > trick. Video playback is as smooth as ever when running foldingathome > > at simple nice 19 priority. > > Good to hear, thanks for testing. > > Peter, since 27 is a long term maintenance kernel, do you think 1af5f73 > and 046e7f7 (at least) are 27.stable candidates? 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69 and e17036dac189dd034c092a91df56aa740db7146d you mean? I guess that makes sense. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 10:40 ` Peter Zijlstra @ 2009-01-30 10:50 ` Mike Galbraith 0 siblings, 0 replies; 31+ messages in thread From: Mike Galbraith @ 2009-01-30 10:50 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Nathanael Hoyle, Linux Kernel Mailing List, stable On Fri, 2009-01-30 at 11:40 +0100, Peter Zijlstra wrote: > On Fri, 2009-01-30 at 11:31 +0100, Mike Galbraith wrote: > > On Fri, 2009-01-30 at 05:18 -0500, Nathanael Hoyle wrote: > > > > > Ok, I'm now running: > > > > > > Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009 > > > x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel > > > GNU/Linux > > > > > > Initial conclusion is that whatever defects were corrected (non > > > SCHED_IDLE specific defects that is), the newer kernel version does the > > > trick. Video playback is as smooth as ever when running foldingathome > > > at simple nice 19 priority. > > > > Good to hear, thanks for testing. > > > > Peter, since 27 is a long term maintenance kernel, do you think 1af5f73 > > and 046e7f7 (at least) are 27.stable candidates? > > 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69 and > e17036dac189dd034c092a91df56aa740db7146d you mean? Yeah. > I guess that makes sense. (adds cc) -Mike ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling 2009-01-30 5:49 scheduler nice 19 versus 'idle' behavior / static low-priority scheduling Nathanael Hoyle ` (3 preceding siblings ...) 2009-01-30 8:50 ` Peter Zijlstra @ 2009-02-02 17:23 ` Lennart Sorensen 4 siblings, 0 replies; 31+ messages in thread From: Lennart Sorensen @ 2009-02-02 17:23 UTC (permalink / raw) To: Nathanael Hoyle; +Cc: linux-kernel On Fri, Jan 30, 2009 at 12:49:44AM -0500, Nathanael Hoyle wrote: > All (though perhaps of special interest to a few such as Ingo, Peter, > and David), > > I am posting regarding an issue I have been dealing with recently, > though this post is not really a request for troubleshooting. Instead > I'd like to ramble for just a moment about my understanding of the > current 2.6 scheduler, describe the behavior I'm seeing, and discuss a > couple of the architectural solutions I've considered, as well as pose > the question whether anyone else views this as a general-case problem > worthy of being addressed, or whether this is something that gets > ignored by and large. It is my hope that this is not too off-topic for > this group. > > First, let me explain the issue I encountered. I am running a relatively > powerful system for a home desktop, an Intel Core 2 Quad Q9450 with 4 GB > of RAM. If it matters for the discussion, it also has 4 drives in an > mdraid raid-5 array, and decent I/O throughput. In normal circumstances > it is quite responsive as a desktop (kde 3.5.4 atm). It is further a > very carefully configured kernel build, including only those things > which I truly need, and excluding everything else. I often use it to > watch DVD movies, and have had no trouble with performance in general. > > Recently I installed the Folding@Home client, which many of you may be > familiar with, intended to utilize spare CPU cycles to perform protein > folding simulations in order to further medical research. 
It is not a
> multi-threaded client at this point, so it simply runs four instances on
> my system, since it has four cores. It is configured to run at
> nice-level 19.

I too have seen this behaviour on my quad core Q6600 mythtv box, and I
too run folding@home on it and have a 4 drive raid5.

> Because it is heavily optimized, and needs little external data to
> perform its work, it spends almost all of its time cpu-bound, with
> little to no io-wait or blocking on network calls, etc. I had been
> using it for about a week with no real difficulty until I went to watch
> another DVD and found that the video was slightly stuttery/jerky so long
> as foldingathome was running in the background. Once I shut it down,
> the video playback resumed its normal smooth form.
>
> There are a couple simple solutions to this:
>
> Substantially boosting the process priority of the mplayer process also
> returns the video to smooth playback, but this is undesirable in that it
> requires manual intervention each time, and root privileges. It fails to
> achieve what I want, which is for the foldingathome computation to not
> interfere with anything else I may try to do. I want my compiles to be
> as *exactly* as fast as they were without it as possible, etc.
>
> Stopping foldingathome before I do something performance sensitive is
> also possible, but again smacks of workaround rather than solution. The
> scheduler should be able to resolve the goal without me stopping the
> other work.
>
> I have done a bit of research on how the kernel scheduler works, and why
> I am seeing this behavior. I had previously, apparently ignorantly,
> equated 'nice 19' with being akin to Microsoft Windows' 'idle' thread
> priority, and assumed it would never steal CPU cycles from a process
> with a higher (lower, depending on nomenclature) priority.
>
> It is my current understanding that when mplayer is running (also
> typically CPU bound, occasionally it becomes I/O bound briefly), one of
> the instances of foldingathome, which is sharing the CPU (core) with
> mplayer, starts getting starved, and the scheduler dynamically rewards it
> with up to four additional priority levels based on the time remaining
> in its quantum which it was not allowed to execute for.
>
> At this point, when mplayer blocks for just a moment, say to page in the
> data for the next video frame, foldingathome gets scheduled again, and
> gets to run for at least MIN_TIMESLICE (plus, due to the lack of kernel
> pre-emptibility, possibly longer). It appears that it takes too long to
> switch back to mplayer and the result is the stuttering picture I
> observe.
>
> I have tried adjusting CONFIG_HZ_xxx from 300 (where I had it) to 1000,
> and noted some improvement, but not complete remedy.
>
> In my prior searching on this, I found only one poster with the same
> essential problem (from 2004, and regarding distributed.net in the
> background, which is essentially the same problem). The only technical
> answer given him was to perhaps try tuning the MIN_TIMESLICE value
> downward. It is my understanding that this parameter is relatively
> important in order to avoid cache thrashing, and I do not wish to alter
> it and have not so far.
>
> Given all of the above, I am unconvinced that I see a good overall
> solution. However, one thing that seems to me a glaring weakness of the
> scheduler is that only realtime priority threads can be given static
> priorities. What I really want for foldingathome, and similar tasks, is
> static, low priority. Something that would not boost up, no matter how
> well behaved it was or how much it had been starved, or how close to the
> same memory segments the needed code was.
>
> I think that there are probably (at least) three approaches here. One I
> consider unacceptable at the outset, which is to alter the semantics of
> nice 19 such that it does not boost. Since this would break existing
> assumptions and code, I do not think it is feasible.
>
> Secondly, one could add additional nice levels which would correspond to
> new static priorities below the bottom of the current user ones. This
> should not interfere with the O(1) scheduler implementation as I
> understand it, because currently, I believe, 5 32-bit words are used to
> flag the queue usage, and 140 priorities leaves 20 more bits available
> for new priorities. This has its own problems, however, in that existing
> tools which examine process priorities could break on priorities outside
> the known 'nice' range of -20 to 19.
>
> Finally, new scheduling classes could be introduced, together with new
> system calls so that applications could select a different scheduling
> class at startup. In this way, applications could volunteer to use a
> scheduling class which never received dynamic 'reward' boosts that would
> raise their priorities. I believe Solaris has done this since Solaris
> 9, with the 'FX' scheduling class.
>
> Stepping back:
>
> 1) Is my problem 'expected' based on others' understanding of the
> current design of the scheduler, or do I have a one-off problem to
> troubleshoot here?
>
> 2) Am I overlooking obvious alternative (but clean) fixes?
>
> 3) Does anyone else see the need for static, but low process priorities?
>
> 4) What is the view of introducing a new scheduler class to handle this?
>
> I welcome any further feedback on this. I will try to follow replies
> on-list, but would appreciate being CC'd off-list as well. Please make
> the obvious substitution to my email address in order to bypass the
> spam-killer.

Well I haven't looked into it myself, but I can certainly confirm that
the current behaviour is downright awful with this particular mix of
processes.
-- Len Sorensen ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2009-02-09 16:12 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2009-01-30  5:49 scheduler nice 19 versus 'idle' behavior / static low-priority scheduling Nathanael Hoyle
2009-01-30  6:16 ` Jan Engelhardt
2009-01-30  6:40   ` Nathanael Hoyle
2009-01-30  7:21     ` Jan Engelhardt
2009-01-30  7:59       ` Nathanael Hoyle
2009-01-30  8:07         ` Mike Galbraith
2009-01-30  8:55           ` Nathanael Hoyle
2009-01-30  9:29             ` Mike Galbraith
2009-01-30 22:12               ` Brian Rogers
2009-01-31  5:38                 ` Mike Galbraith
2009-01-31  9:08                 ` Mike Galbraith
2009-02-02 23:57                 ` [stable] 
2009-02-09 15:19                 ` Brian Rogers
2009-02-09 15:51                 ` Greg KH
2009-01-30  8:16       ` Nathanael Hoyle
2009-01-30 13:56         ` Jan Engelhardt
2009-01-30 14:15           ` Jan Engelhardt
2009-01-30  6:17 ` V.Radhakrishnan
2009-01-30  6:48   ` Nathanael Hoyle
2009-01-30 14:15     ` Jan Engelhardt
2009-01-30  6:24 ` Mike Galbraith
2009-01-30  6:52   ` Nathanael Hoyle
2009-01-30  7:09     ` Mike Galbraith
2009-01-30  8:50 ` Peter Zijlstra
2009-01-30  9:00   ` Nathanael Hoyle
2009-01-30  9:03     ` Peter Zijlstra
2009-01-30 10:18       ` Nathanael Hoyle
2009-01-30 10:31         ` Mike Galbraith
2009-01-30 10:40           ` Peter Zijlstra
2009-01-30 10:50             ` Mike Galbraith
2009-02-02 17:23 ` Lennart Sorensen