* [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
@ 2004-05-28 4:52 Peter Williams
2004-05-28 9:05 ` Ingo Molnar
0 siblings, 1 reply; 15+ messages in thread
From: Peter Williams @ 2004-05-28 4:52 UTC (permalink / raw)
To: Linux Kernel Mailing List
This is a modification of the O(1) scheduler to replace the active and
expired priority arrays with a single priority array:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa>
This patch simplifies the scheduler code by:
- removing the need to manage two priority arrays per CPU
- removing the need to use variable time slices to implement "nice"
which means no more sharing of time slices between parents and children
at fork and exit.
Key features are:
-- uses the existing "interactive task" bonus scheme so there is no
change to interactive responsiveness
-- the number of priority slots has been increased by MAX_BONUS + 1 so
that there is no need to truncate the allocated priority after the
bonus is applied, and to allow the "idle" tasks to be placed on the
queues
-- at the end of each time slice (or when waking up) each task is given
a complete new time slice and, if class SCHED_NORMAL, is put in a
priority slot given by (static_prio + MAX_BONUS - interactive_bonus)
-- starvation is prevented by promoting all runnable tasks by one
priority slot at intervals determined by a base promotion interval
multiplied by the number of runnable tasks on the CPU in question
-- when there are fewer than 2 runnable processes on the CPU the
promotion is not performed
-- in order for the anti-starvation promotion mechanism to be O(1) the
"prio" field has been removed from the task structure
-- to enable the priority of the current task to be available for
various uses that previously used current->prio a record of the priority
slot for the current task is now kept in the runqueue struct
-- having the "idle" tasks on the queues allows the code for selecting
the "next" task in schedule() to be simplified
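In code terms, the slot and promotion rules above amount to something like the following. This is an illustrative sketch, not an excerpt from the patch; MAX_BONUS = 10 is assumed to match the mainline 2.6 value, and the function names are invented:

```c
#define MAX_BONUS 10	/* assumed to match the mainline 2.6 value */

/* Slot a SCHED_NORMAL task is queued in when it receives a new time
 * slice: a full interactive bonus lifts it back to its static priority;
 * no bonus leaves it MAX_BONUS slots lower (a higher slot number means
 * a lower priority). */
static int spa_slot(int static_prio, int interactive_bonus)
{
	return static_prio + MAX_BONUS - interactive_bonus;
}

/* Anti-starvation promotion period: a base interval scaled by the
 * number of runnable tasks on the CPU; promotion is skipped entirely
 * when fewer than 2 tasks are runnable. */
static long promotion_period(long base, unsigned long nr_running)
{
	return nr_running < 2 ? 0 : base * (long)nr_running;
}
```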
Affected files are:
init_task.h -- cope with removal of "prio" and "array" from task struct
sched.h -- remove "prio" and "array" from task struct
exit.c -- remove call to sched_exit()
sched.c -- the bulk of the change
Performance:
- no discernible change in interactive responsiveness
- slight improvements in results from tests such as kernbench e.g.
Average Half Load -j 3 Run:
Elapsed Time 573.582 (vs 576.196)
User Time 1576.36 (vs 1582.19)
System Time 100.866 (vs 102.266)
Percent CPU 291.8 (vs 291.8)
Context Switches 9970.2 (vs 10653.6)
Sleeps 23996 (vs 23736.2)
Average Optimal -j 16 Load Run:
Elapsed Time 439.88 (vs 443.21)
User Time 1605.09 (vs 1613.75)
System Time 104.618 (vs 106.99)
Percent CPU 388 (vs 387.6)
Context Switches 29816.4 (vs 35444.6)
Sleeps 25589.8 (vs 27509.4)
Average Maximal -j Load Run:
Elapsed Time 460.252 (vs 460.72)
User Time 1628.57 (vs 1633.04)
System Time 147.666 (vs 147.222)
Percent CPU 385.4 (vs 386.2)
Context Switches 78475.6 (vs 78242.4)
Sleeps 126011 (vs 113026)
In summary, this patch provides code simplification without cost.
--
Dr Peter Williams, Chief Scientist peterw@aurema.com
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-28 4:52 [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array Peter Williams
@ 2004-05-28 9:05 ` Ingo Molnar
2004-05-28 9:24 ` Peter Williams
0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2004-05-28 9:05 UTC (permalink / raw)
To: Peter Williams; +Cc: Linux Kernel Mailing List
* Peter Williams <peterw@aurema.com> wrote:
> -- at the end of each time slice (or when waking up) each task is
> given a complete new time slice and, if class SCHED_NORMAL, is put in
> a priority slot given by (static_prio + MAX_BONUS - interactive_bonus)
this is the Achilles' heel of approaches that try to get rid of the
active/expired array and/or try to get rid of timeslice tracking. A
CPU-bound task which schedules away for small amounts of time will get a
disproportionately larger share of the CPU than a CPU-bound task that
doesn't schedule at all.
just try it - run a task that runs 95% of the time and sleeps 5% of the
time, and run a (same prio) task that runs 100% of the time. With the
current scheduler the slightly-sleeping task gets 45% of the CPU, the
looping one gets 55% of the CPU. With your patch the slightly-sleeping
process can easily monopolize 90% of the CPU!
Ingo
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-28 9:05 ` Ingo Molnar
@ 2004-05-28 9:24 ` Peter Williams
2004-05-28 9:29 ` Ingo Molnar
2004-05-28 9:57 ` Con Kolivas
0 siblings, 2 replies; 15+ messages in thread
From: Peter Williams @ 2004-05-28 9:24 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux Kernel Mailing List
Ingo Molnar wrote:
> * Peter Williams <peterw@aurema.com> wrote:
>
>
>>-- at the end of each time slice (or when waking up) each task is
>>given a complete new time slice and, if class SCHED_NORMAL, is put in
>>a priority slot given by (static_prio + MAX_BONUS - interactive_bonus)
>
>
> this is the Achilles' heel of approaches that try to get rid of the
> active/expired array and/or try to get rid of timeslice tracking. A
> CPU-bound task which schedules away for small amounts of time will get a
> disproportionately larger share of the CPU than a CPU-bound task that
> doesn't schedule at all.
>
> just try it - run a task that runs 95% of the time and sleeps 5% of the
> time, and run a (same prio) task that runs 100% of the time. With the
> current scheduler the slightly-sleeping task gets 45% of the CPU, the
> looping one gets 55% of the CPU. With your patch the slightly-sleeping
> process can easily monopolize 90% of the CPU!
If these two tasks have the same nice value they should round robin
with each other in the same priority slot, which means that the one
taking smaller bites of CPU each time will in fact get less CPU than
the other, i.e. the outcome will be the opposite of what you claim.
This does not, of course, take into account the interactive bonus. If
the task doing the shorter CPU bursts manages to earn a larger
interactivity bonus than the other then it will get more CPU, but isn't
that the intention of the interactivity bonus?
Peter
--
Dr Peter Williams, Chief Scientist peterw@aurema.com
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-28 9:24 ` Peter Williams
@ 2004-05-28 9:29 ` Ingo Molnar
2004-05-28 9:57 ` Con Kolivas
1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2004-05-28 9:29 UTC (permalink / raw)
To: Peter Williams; +Cc: Linux Kernel Mailing List
* Peter Williams <peterw@aurema.com> wrote:
> >just try it - run a task that runs 95% of the time and sleeps 5% of the
> >time, and run a (same prio) task that runs 100% of the time. With the
> >current scheduler the slightly-sleeping task gets 45% of the CPU, the
> >looping one gets 55% of the CPU. With your patch the slightly-sleeping
> >process can easily monopolize 90% of the CPU!
>
> If these two tasks have the same nice value they should round robin
> with each other in the same priority slot and this means that the one
> doing the smaller bites of CPU each time will in fact get less CPU
> than the other i.e. the outcome will be the opposite of what you
> claim.
just try what i described with and without your patch and look at the
'top' output. You can do a simple loop plus short 10-20msec sleeps (via
nanosleep) to simulate a 95% busy task.
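For example, something like this (a sketch only: the spin count is an assumption and needs tuning per machine; only the loop-plus-nanosleep structure comes from the description above):

```c
#include <time.h>

/* Spin for a while, then nanosleep ~15 ms, and repeat: an approximation
 * of a task that is busy ~95% of the time.  Returns the number of
 * sleeps performed so the cycle count can be sanity-checked.  Run it
 * alongside a pure busy loop and compare their CPU shares in 'top'. */
static unsigned long mostly_busy(unsigned long cycles, unsigned long spin)
{
	struct timespec nap = { .tv_sec = 0, .tv_nsec = 15 * 1000 * 1000 };
	unsigned long done = 0;

	while (cycles--) {
		for (volatile unsigned long i = 0; i < spin; i++)
			;		/* burn CPU; tune spin for ~95% busy */
		nanosleep(&nap, NULL);	/* the short sleep suggested above */
		done++;
	}
	return done;
}
```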
Ingo
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-28 9:24 ` Peter Williams
2004-05-28 9:29 ` Ingo Molnar
@ 2004-05-28 9:57 ` Con Kolivas
1 sibling, 0 replies; 15+ messages in thread
From: Con Kolivas @ 2004-05-28 9:57 UTC (permalink / raw)
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
On Fri, 28 May 2004 19:24, Peter Williams wrote:
> Ingo Molnar wrote:
> > just try it - run a task that runs 95% of the time and sleeps 5% of the
> > time, and run a (same prio) task that runs 100% of the time. With the
> > current scheduler the slightly-sleeping task gets 45% of the CPU, the
> > looping one gets 55% of the CPU. With your patch the slightly-sleeping
> > process can easily monopolize 90% of the CPU!
> This does, of course, not take into account the interactive bonus. If
> the task doing the shorter CPU bursts manages to earn a larger
> interactivity bonus than the other then it will get more CPU but isn't
> that the intention of the interactivity bonus?
No. Ideally the interactivity bonus should decide what goes first every time
to decrease the latency of interactive tasks, but the cpu percentage should
remain close to the same for equal "nice" tasks. Interactive tasks need low
scheduling latency and short bursts of high cpu usage, not more cpu usage
overall. When the cpu percentage differs significantly from this, the logic
has failed.
Con
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
@ 2004-05-29 1:39 Peter Williams
0 siblings, 0 replies; 15+ messages in thread
From: Peter Williams @ 2004-05-29 1:39 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux Kernel Mailing List
> Ingo Molnar wrote:
>> Peter Williams <peterw@xxxxxxxxxx> wrote:
>> >just try it - run a task that runs 95% of the time and sleeps 5% of the
>> >time, and run a (same prio) task that runs 100% of the time. With the
>> >current scheduler the slightly-sleeping task gets 45% of the CPU, the
>> >looping one gets 55% of the CPU. With your patch the slightly-sleeping
>> >process can easily monopolize 90% of the CPU!
>>
>> If these two tasks have the same nice value they should round robin
>> with each other in the same priority slot and this means that the one
>> doing the smaller bites of CPU each time will in fact get less CPU
>> than the other i.e. the outcome will be the opposite of what you
>> claim.
>
> just try what i described with and without your patch and look at the
> 'top' output. You can do a simple loop plus short 10-20msec sleeps
> (via nanosleep) to simulate a 95% busy task.
OK. I've tried this experiment and an effect similar to what you
described (but not as severe) was observed. However, as I opined, this
was due to the interactive bonus mechanism being too generous to the
sleeper - sometimes giving it as many as 9 bonus points out of a
possible maximum of 10. The severity of the effect was variable and
proportional to the size of the bonus awarded at any given time. So the
problem is due to the bonus point scheme and not the absence of the
active and expired arrays.
Further investigation of this problem has revealed that the reason that
the interactive bonus scheme is so generous to this type of task is due
to the code in schedule() that allows tasks with "activated" greater than 1
to count time spent on the run queue as sleep time. If this code is
removed the effect disappears.
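The effect of that code can be sketched as follows. This is a simplified illustration of the mechanism just described, not the actual 2.6.6 schedule() code, and the names are invented:

```c
/* When a woken task finally runs, credit the time it spent waiting on
 * the run queue as if it had been asleep, but only for tasks whose
 * "activated" value marks them as interactively woken.  This extra
 * credit is what lets a 95%-busy sleeper look almost fully interactive
 * to the bonus estimator. */
static unsigned long long credited_sleep(int activated,
					 unsigned long long real_sleep,
					 unsigned long long runqueue_wait)
{
	if (activated > 1)
		return real_sleep + runqueue_wait;
	return real_sleep;
}
```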
Interestingly enough, removing that code has no effect (that I could
discern) on the interactive response even under heavy loads. This had
been observed previously and consideration was given to removing this
code as part of this patch but it was decided to make no significant
changes to the interactive bonus scheme for this patch. I.e.
modifying/improving the interactive bonus mechanism was considered to be
another issue.
Peter
PS Sorry about the change in e-mail address, but my ISP changed my IP
address, which has caused me to lose my SSH connection to my aurema.com
account, and as it's now Saturday here in Sydney I'm unlikely to get it
back until Monday.
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
@ 2004-05-29 5:27 Peter Williams
2004-05-29 11:17 ` Con Kolivas
0 siblings, 1 reply; 15+ messages in thread
From: Peter Williams @ 2004-05-29 5:27 UTC (permalink / raw)
To: Con Kolivas; +Cc: Ingo Molnar, Linux Kernel Mailing List
Con Kolivas wrote:
> On Fri, 28 May 2004 19:24, Peter Williams wrote:
> > Ingo Molnar wrote:
> > > just try it - run a task that runs 95% of the time and sleeps 5%
> > > of the time, and run a (same prio) task that runs 100% of the
> > > time. With the current scheduler the slightly-sleeping task gets
> > > 45% of the CPU, the looping one gets 55% of the CPU. With your
> > > patch the slightly-sleeping process can easily monopolize 90% of
> > > the CPU!
>
> > This does, of course, not take into account the interactive bonus.
> > If the task doing the shorter CPU bursts manages to earn a larger
> > interactivity bonus than the other then it will get more CPU but
> > isn't that the intention of the interactivity bonus?
>
> No. Ideally the interactivity bonus should decide what goes first
> every time to decrease the latency of interactive tasks, but the cpu
> percentage should remain close to the same for equal "nice" tasks.
There are at least two possible ways of viewing "nice": one of these is
that it is an indicator of the task's entitlement to CPU resource (which
is more or less the view you describe) and another that it is an
indicator of the task's priority with respect to access to CPU resources.
If you wish the system to take the first of these views then the
appropriate solution to the scheduling problem is to use an entitlement
based scheduler such as EBS (see
<http://sourceforge.net/projects/ebs-linux/>) which is also much simpler
than the current O(1) scheduler and has the advantage that it gives
pretty good interactive responsiveness without treating interactive
tasks specially (although some modification in this regard may be
desirable if very high loads are going to be encountered).
If you want the second of these then this proposed modification is a
simple way of getting it (with the added proviso that starvation be
avoided).
Of course, there can be other scheduling aims, such as maximising
throughput, where different scheduler paradigms need to be used. As a
matter of interest, these tend not to have very good interactive response.
If the system is an interactive system then all of these models (or at
least two of them) need to be modified to "break the rules" as far as
interactive tasks are concerned and give them higher priority in order
not to try human patience.
> Interactive tasks need low scheduling latency and short bursts of high
> cpu usage; not more cpu usage overall. When the cpu percentage differs
> significantly from this the logic has failed.
The only way this will happen is if the interactive bonus mechanism
misidentifies a CPU bound task as an interactive task and gives it a
large bonus. This seems to be the case as tasks with a 95% CPU demand
rate are being given a bonus of 9 (out of 10 possible) points.
Peter
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-29 5:27 Peter Williams
@ 2004-05-29 11:17 ` Con Kolivas
2004-05-30 0:19 ` Peter Williams
0 siblings, 1 reply; 15+ messages in thread
From: Con Kolivas @ 2004-05-29 11:17 UTC (permalink / raw)
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
On Sat, 29 May 2004 15:27, Peter Williams wrote:
> Con Kolivas wrote:
> > On Fri, 28 May 2004 19:24, Peter Williams wrote:
> > > Ingo Molnar wrote:
> > > > just try it - run a task that runs 95% of the time and sleeps 5%
> > > > of the time, and run a (same prio) task that runs 100% of the
> > > > time. With the current scheduler the slightly-sleeping task gets
> > > > 45% of the CPU, the looping one gets 55% of the CPU. With your
> > > > patch the slightly-sleeping process can easily monopolize 90% of
> > > > the CPU!
> > >
> > > This does, of course, not take into account the interactive bonus.
> > > If the task doing the shorter CPU bursts manages to earn a larger
> > > interactivity bonus than the other then it will get more CPU but
> > > isn't that the intention of the interactivity bonus?
> >
> > No. Ideally the interactivity bonus should decide what goes first
> > every time to decrease the latency of interactive tasks, but the cpu
> > percentage should remain close to the same for equal "nice" tasks.
>
> There are at least two possible ways of viewing "nice": one of these is
> that it is an indicator of the tasks entitlement to CPU resource (which
> is more or less the view you describe) and another that it is an
> indicator of the task's priority with respect to access to CPU resources.
>
> If you wish the system to take the first of these views then the
> appropriate solution to the scheduling problem is to use an entitlement
> based scheduler such as EBS (see
> <http://sourceforge.net/projects/ebs-linux/>) which is also much simpler
> than the current O(1) scheduler and has the advantage that it gives
> pretty good interactive responsiveness without treating interactive
> tasks specially (although some modification in this regard may be
> desirable if very high loads are going to be encountered).
>
> If you want the second of these then this proposed modification is a
> simple way of getting it (with the added proviso that starvation be
> avoided).
>
> Of course, there can be other scheduling aims such as maximising
> throughput where different scheduler paradigms need to be used. As a
> matter of interest these tend to have not very good interactive response.
>
> If the system is an interactive system then all of these models (or at
> least two of them) need to be modified to "break the rules" as far as
> interactive tasks are concerned and give them higher priority in order
> not to try human patience.
>
> > Interactive tasks need low scheduling latency and short bursts of high
> > cpu usage; not more cpu usage overall. When the cpu percentage
> > differs significantly from this the logic has failed.
>
> The only way this will happen is if the interactive bonus mechanism
> misidentifies a CPU bound task as an interactive task and gives it a
> large bonus. This seems to be the case as tasks with a 95% CPU demand
> rate are being given a bonus of 9 (out of 10 possible) points.
This is all a matter of semantics and I have no argument with it.
I think your aims of simplifying the scheduler are admirable but I hope you
don't suffer the quagmire that is manipulating the interactivity stuff.
Changing one value and saying it has no apparent effect is almost certainly
wrong; surely it was put there for a reason - or rather I put it there for a
reason.
Con
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-29 11:17 ` Con Kolivas
@ 2004-05-30 0:19 ` Peter Williams
2004-05-30 12:56 ` Con Kolivas
0 siblings, 1 reply; 15+ messages in thread
From: Peter Williams @ 2004-05-30 0:19 UTC (permalink / raw)
To: Con Kolivas; +Cc: Ingo Molnar, Linux Kernel Mailing List
Con Kolivas wrote:
> On Sat, 29 May 2004 15:27, Peter Williams wrote:
>
>>Con Kolivas wrote:
>> > On Fri, 28 May 2004 19:24, Peter Williams wrote:
>> > > Ingo Molnar wrote:
>> > > > just try it - run a task that runs 95% of the time and sleeps 5%
>> > > > of the time, and run a (same prio) task that runs 100% of the
>> > > > time. With the current scheduler the slightly-sleeping task gets
>> > > > 45% of the CPU, the looping one gets 55% of the CPU. With your
>> > > > patch the slightly-sleeping process can easily monopolize 90% of
>> > > > the CPU!
>> > >
>> > > This does, of course, not take into account the interactive bonus.
>> > > If the task doing the shorter CPU bursts manages to earn a larger
>> > > interactivity bonus than the other then it will get more CPU but
>> > > isn't that the intention of the interactivity bonus?
>> >
>> > No. Ideally the interactivity bonus should decide what goes first
>> > every time to decrease the latency of interactive tasks, but the cpu
>> > percentage should remain close to the same for equal "nice" tasks.
>>
>>There are at least two possible ways of viewing "nice": one of these is
>>that it is an indicator of the tasks entitlement to CPU resource (which
>>is more or less the view you describe) and another that it is an
>>indicator of the task's priority with respect to access to CPU resources.
>>
>>If you wish the system to take the first of these views then the
>>appropriate solution to the scheduling problem is to use an entitlement
>>based scheduler such as EBS (see
>><http://sourceforge.net/projects/ebs-linux/>) which is also much simpler
>>than the current O(1) scheduler and has the advantage that it gives
>>pretty good interactive responsiveness without treating interactive
>>tasks specially (although some modification in this regard may be
>>desirable if very high loads are going to be encountered).
>>
>>If you want the second of these then this proposed modification is a
>>simple way of getting it (with the added proviso that starvation be
>>avoided).
>>
>>Of course, there can be other scheduling aims such as maximising
>>throughput where different scheduler paradigms need to be used. As a
>>matter of interest these tend to have not very good interactive response.
>>
>>If the system is an interactive system then all of these models (or at
>>least two of them) need to be modified to "break the rules" as far as
>>interactive tasks are concerned and give them higher priority in order
>>not to try human patience.
>>
>> > Interactive tasks need low scheduling latency and short bursts of high
>> > cpu usage; not more cpu usage overall. When the cpu percentage
>> > differs significantly from this the logic has failed.
>>
>>The only way this will happen is if the interactive bonus mechanism
>>misidentifies a CPU bound task as an interactive task and gives it a
>>large bonus. This seems to be the case as tasks with a 95% CPU demand
>>rate are being given a bonus of 9 (out of 10 possible) points.
>
>
> This is all a matter of semantics and I have no argument with it.
>
> I think your aims of simplifying the scheduler are admirable but I hope you
> don't suffer the quagmire that is manipulating the interactivity stuff.
As you surmise, this patch is just a starting point and there are some
parts of it that may need to be fine-tuned.
For instance, the current time slice used is set at the average that the
current mechanism would have dispensed. Making this smaller would
lessen the severity of the anomaly under discussion but making it too
small would increase the context switch rate. There is evidence from
our kernbench results that we have room to decrease this value and still
keep the context switch rate below that of the current scheduler (at
least, for normal to moderately heavy loads). If possible I'd like to
get some statistics on the sleep/wake cycles of tasks on a typical
system to help make a judgment about what is the best value here.
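As a back-of-envelope check on that trade-off (a rough model only: it counts expiry-driven switches for CPU-bound tasks round-robining on one CPU, ignoring switches caused by sleeps and preemption):

```c
/* With CPU-bound tasks round-robining on a CPU, each slice expiry is
 * one context switch, so the expiry-driven switch rate is simply the
 * inverse of the slice length: halving the slice doubles the rate. */
static unsigned long expiry_switches_per_sec(unsigned long slice_ms)
{
	return slice_ms ? 1000 / slice_ms : 0;
}
```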
Another area that needs more consideration is the determination of the
promotion interval. At the moment, there's no promotion if there are
fewer than 2 runnable tasks on a CPU; otherwise the interval is a
constant multiplied by the number of runnable tasks.
Another area of investigation is (yet another) bonus intended to
increase system throughput by minimizing (or at least attempting to
minimize) the time tasks spend on the run queues. The principal
difficulty here is making sure that this doesn't adversely affect
interactive responsiveness, as it's an unfortunate fact of life that
what's good for interactive response isn't necessarily (and usually
isn't) good for maximizing throughput and vice versa.
Then, the interactive bonus mechanism might be examined but this is of
low priority as the current one seems to do a reasonable job.
Lastly, with the simplification of the scheduler I believe that it would
be possible to make both the interactive response and throughput bonuses
optional. An example of why this MIGHT BE desirable is that the
interactive response bonus adversely affects throughput, and turning it
off on servers where there are no interactive users may be worthwhile.
> Changing one value and saying it has no apparent effect is almost certainly
> wrong; surely it was put there for a reason - or rather I put it there for a
> reason.
Out of interest, what was the reason? What problem were you addressing?
Peter
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-30 0:19 ` Peter Williams
@ 2004-05-30 12:56 ` Con Kolivas
2004-05-31 0:04 ` Peter Williams
0 siblings, 1 reply; 15+ messages in thread
From: Con Kolivas @ 2004-05-30 12:56 UTC (permalink / raw)
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
On Sun, 30 May 2004 10:19, Peter Williams wrote:
> Out of interest, what was the reason? What problem were you addressing?
The interactive credit?
There was a problem with difficulty elevating back to interactive state if an
interactive task had used too long a burst of cpu (i.e. XFree86), which was
addressed by making the bonus pseudo-exponentially curved for rapid recovery
and slow decay - in fact this is probably the most important part of
addressing the interactive tasks and had the best effect. The problem was
that giving this to all tasks meant that cpu bound tasks that had, as a
property of their behaviour, long waits on say pipes or i/o would also get
this rapid recovery to interactive state, and as soon as they became fully
bound to cpu again they would cause noticeable stalls. The standard example is
the increasing number of jobs in a make, where each job waits longer for i/o
as the job numbers increase. However there were much worse examples at even
normal to low loads, such as mpeg or divx encoding, where the encoder would
buffer say 250 frames sleeping and then do them in a burst (this is the
maximum spacing between key [I] frames, or intervals). The interactive credit
prevented those tasks that would have long but only infrequent sleeps from
getting the curved bonus/penalty.
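The shape described (rapid, curved recovery on sleep; slow decay under CPU use) can be illustrated by a toy estimator like the one below. This is purely illustrative and is not the actual formula; the names and constants are invented:

```c
#define TOY_MAX_BONUS 10

/* Halve the remaining distance to the full bonus on each qualifying
 * sleep (fast, exponential-style recovery), but shed only one point per
 * tick of CPU use (slow decay).  The interactive credit described above
 * gates which tasks are allowed this curved recovery at all. */
static int toy_bonus(int bonus, int slept)
{
	if (slept)
		return bonus + (TOY_MAX_BONUS - bonus) / 2;
	return bonus > 0 ? bonus - 1 : 0;
}
```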
Hmm... if this is black magic I guess I'm breaking the magician's cardinal
rules and revealing my tricks ;-)
Con
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-30 12:56 ` Con Kolivas
@ 2004-05-31 0:04 ` Peter Williams
2004-05-30 23:13 ` Con Kolivas
0 siblings, 1 reply; 15+ messages in thread
From: Peter Williams @ 2004-05-31 0:04 UTC (permalink / raw)
To: Con Kolivas; +Cc: Ingo Molnar, Linux Kernel Mailing List
Con Kolivas wrote:
> On Sun, 30 May 2004 10:19, Peter Williams wrote:
>
>>Out of interest, what was the reason? What problem were you addressing?
>
>
> The interactive credit?
No. I was asking about the mechanism in schedule() that, based on the
value of "activated", allows some tasks to count their time on the run
queue as sleep time.
>
> There was a problem with difficulty elevating back to interactive state if an
> interactive task had used too long a burst of cpu (ie Xfree) which was
> addressed by making the bonus pseudo-exponentially curved for rapid recovery
> and slow decay - in fact this is probably the most important part of
> addressing the interactive tasks and had the best effect.
But this probably answers my question anyway.
> The problem was
> that giving this to all tasks meant that cpu bound tasks that had, as a
> property of their behaviour, long waits on say pipes or i/o would also get
> this rapid recovery to interactive state and as soon as they became fully
> bound to cpu again they would cause noticable stalls. The standard example is
> the increasing number of jobs in a make, where each job waits longer for i/o
> as the job numbers increase. However there were much worse examples at even
> normal - low loads, such as mpeg or divx encoding where the encoder would
> buffer say 250 frames sleeping and then do them in a burst. (this is the
> maximum space between key [I] frame or intervals). The interactive credit
> prevented those tasks that would have long but only infrequent sleeps from
> getting the curved bonus/penalty.
In the near future, I'll be proposing another patch that goes on top of
the single priority array (SPA) patch that has the express purpose of
reducing the time that all tasks spend on run queues. I think that one
of the side effects of this mechanism is that it will alleviate the
problem you've described above.
I also think that there is a category of task that may be automatically
detected and that needs special attention: tasks that need very regular
access to small bursts of CPU. I suspect that the tasks doing the mpeg
and divx encoding/decoding that you refer to above fall into this
category. The key to identifying these tasks would be that the variance
(or standard deviation) of their sleeps would be close to zero, and as
they probably do much the same amount of work each CPU burst the same
would be true of the variance of the length of the CPU bursts.
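A sketch of how such detection might work, tracking a running mean and variance of sleep (or burst) lengths per task with Welford's online algorithm; all of the names here are hypothetical, not from the patch:

```c
struct burst_stats {
	unsigned long n;
	double mean;
	double m2;	/* running sum of squared deviations */
};

/* Fold one observed sleep/burst length into the running statistics. */
static void burst_record(struct burst_stats *s, double len)
{
	double delta = len - s->mean;

	s->n++;
	s->mean += delta / s->n;
	s->m2 += delta * (len - s->mean);
}

/* Population variance so far; a task whose sleep and burst variances
 * both stay near zero is a candidate "regular small-burst" task. */
static double burst_variance(const struct burst_stats *s)
{
	return s->n > 1 ? s->m2 / s->n : 0.0;
}
```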
Currently, the scheduler relies on the interactive bonus to make sure
that these tasks get CPU when they need it but I suspect that to make
this happen the interactive bonus has to be larger than it might
otherwise need to be. So if these tasks can be identified and treated
specially (e.g. reserve the MAX_RT_PRIO slot for them) the interactive
bonus could be reduced and this would improve overall system throughput.
>
> Hmm... if this is black magic I guess I'm breaking the magician's cardinal
> rules and revealing my tricks ;-)
>
Peter
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
2004-05-31 0:04 ` Peter Williams
@ 2004-05-30 23:13 ` Con Kolivas
0 siblings, 0 replies; 15+ messages in thread
From: Con Kolivas @ 2004-05-30 23:13 UTC (permalink / raw)
To: Peter Williams; +Cc: Ingo Molnar, Linux Kernel Mailing List
Quoting Peter Williams <pwil3058@bigpond.net.au>:
> Con Kolivas wrote:
> > The interactive credit?
>
> No. I was asking about the mechanism in schedule() that, based on the
> value of "activated", allows some tasks to count their time on the run
> queue as sleep time.
No, this is different again. When there is significant load and an interactive
task (like X) is using bursts of cpu it will not get a chance to sleep. It will
either run out of timeslice or be preempted and thus will be constantly on the
runqueue. In that time if only sleep time is counted it is progressively seen
as a cpu bound task - in fact this was a major problem before where X would
basically die under load because you'd move a window and it would be constantly
waiting for more cpu time rather than ever sleeping.
> I also think that their is a category of task that may be automatically
> detected and that needs special attention and that is tasks that need
> very regular access to small bursts of CPU. I suspect that the tasks
> doing the mpeg and divx encoding/decoding that you refer to above fall
> into this category. The key to identifying these tasks would be that
> the variance (or standard deviation) of their sleeps would be close to
> zero and as they probably do much the same amount of work each CPU burst
> the same would be true of the variance of the length of the CPU bursts.
I've already tried that experiment and I personally failed to make the pattern
of sleep/burst correspond with interactivity. There is a huge variation in the
duration of sleep/run for all different things, and it changes with load. Of
course your modelling may be better than mine.
> Dr Peter Williams pwil3058@bigpond.net.au
Oooh I've got one of those too but I normally don't use it on lkml.
So just for this once...
Dr Con Kolivas
Thread overview: 15+ messages
2004-05-28 4:52 [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array Peter Williams
2004-05-28 9:05 ` Ingo Molnar
2004-05-28 9:24 ` Peter Williams
2004-05-28 9:29 ` Ingo Molnar
2004-05-28 9:57 ` Con Kolivas
-- strict thread matches above, loose matches on Subject: below --
2004-05-29 1:39 Peter Williams
2004-05-29 5:27 Peter Williams
2004-05-29 11:17 ` Con Kolivas
2004-05-30 0:19 ` Peter Williams
2004-05-30 12:56 ` Con Kolivas
2004-05-31 0:04 ` Peter Williams
2004-05-30 23:13 ` Con Kolivas
[not found] <214A1-6NK-7@gated-at.bofh.it>
[not found] ` <21acm-2GN-1@gated-at.bofh.it>
2004-05-29 12:24 ` Andi Kleen
2004-05-29 12:38 ` Con Kolivas
2004-06-04 7:40 ` Peter Williams