From: Peter Williams <pwil3058@bigpond.net.au>
To: Andi Kleen <ak@muc.de>
Cc: Con Kolivas <kernel@kolivas.org>,
linux-kernel@vger.kernel.org, kernel.linux@lists.sw.oz.au
Subject: Re: [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array
Date: Fri, 04 Jun 2004 17:40:33 +1000 [thread overview]
Message-ID: <40C02771.1080803@bigpond.net.au> (raw)
In-Reply-To: <m37juvpgjc.fsf@averell.firstfloor.org>
Andi Kleen wrote:
> Con Kolivas <kernel@kolivas.org> writes:
>
>>I think your aims of simplifying the scheduler are admirable but I hope you
>>don't suffer the quagmire that is manipulating the interactivity stuff.
>>Changing one value and saying it has no apparent effect is almost certainly
>>wrong; surely it was put there for a reason - or rather I put it there for a
>>reason.
>
>
> But that doesn't mean that the reason cannot be reevaluated later.
> If Peter can up with a simpler scheduler and nobody can break it significantly
> it would be great, and i'm hope such simplifications could be merged
> after testing. Certainly the current one does far too much black magic.
As a result of your encouragement I have implemented a simplified
version of the interactive bonus scheme and an extension whose aim is to
improve general system throughput on to of my SPA patch (to the 2.6.6
kernel) an updated version of which is available at:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa-v2>
Interactive Bonus (SPA_IA_BONUS) patch is available at:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa_ia_bonus>
This patch approaches the problem from a control systems perspective and
the statistics are calculated for each task (using extremely simple and
efficient Kalman filters):
A_c which is the running average number of nanoseconds of CPU time that
it spends on the CPU during a scheduling cycle (SC) which is defined to
be the period from one activation (or wake up) to the next.
A_s which is the running average number of nanoseconds that it spends
sleeping during the SC
A_d which is the running average number of nanoseconds that it spends on
a run queue waiting for CPU access
These statistics are then used to make the following tests about the task:
1. is it showing interactive behaviour, and/or
2. is it showing CPU bound behaviour.
The first test consists of testing that A_s / (A_s + A_c) (where A_s is
the average sleep time per cycle and A_c is the average CPU time per
cycle) is greater than a threshold (currently 90%) (NB this is
equivalent to A_c / (A_c + A_s) being less than 10%) and also that the
average sleep time isn't greater than a threshold (currently set at 15
minutes). This test is applied at the end of each scheduling cycle when
the task is woken and put back on the run queue. If the task passes
this test its interactive bonus is increased asymptotically towards the
maximum bonus (or, at least, the maximum multiplied by their average A_s
/ (A_s + A_c)).
The second test is applied at the end of a cycle if the above test fails
and also at the end of each time slice (if the task stays active for
more than one time slice). This task consists of testing whether the
ratio A_c / (A_s + A_c + A_d) (where A_d is the average delay on run
queues waiting for CPU access per cycle) is greater a threshold
(currently 50%) or A_s is zero or very large (currently greater than 2
hours) and if the task passes this test it is considered to be CPU bound
or a chronic idler and NOT interactive and its interactive bonus is
decreased asymptotically towards zero. The reason that A_c / (A_s + A_c
+ A_d) instead of the equivalent A_c / (A_c + A_s) form the first test
is so that interactive tasks are not mistakenly classified as CPU bound
tasks when the system is very busy and their wait time becomes squeezed
and turns into run queue delay time. Similarly, the reason that A_c /
(A_c + A_s + A_d) isn't used in the first test is that it could lead to
CPU bound tasks being wrongly classed as interactive tasks when the
system is very busy.
In the description of the first test, I mention that the interactive
bonus of a tasks asymptotically approaches the product of the maximum
bonus and their average A_s / (A_s + A_c). This has the important side
effect that the interactive bonus of a fairly busy task such as the X
server doesn't get as big bonus as that of those low CPU usage stream
servers such as xmms and they work quite happily under extremely heavy
loads including heavy X server activity.
Although this may sound complex and expensive it's actually quite cheap
as estimating the averages is very simple and the more complex bits
(involving 64 bit division to calculate the ratios) occur relatively
infrequently.
Throughput Bonus (SPA_TPT_BONUS) patch is available at:
<http://users.bigpond.net.au/Peter-Williams/patch-2.6.6-spa_tpt_bonus>
This bonus is also based on the statistics described above and awards
bonus points to tasks that are spending a disproportionate amount of
time (compared to the CPU time they are using) on run queues. The
maximum amount of this bonus is half that of the maximum interactive
bonus so it should not cause tasks receiving it to interfere with them.
The bonus awarded is proportional to the equation: (A_d / (A_d + A_c *
N)) where N is the number of tasks running on the CPU in question. This
latter correction is necessary to stop all tasks getting the maximum
bonus when the system is busy. Unlike the interactive bonus this bonus
is not persistent and is recalculated every time the task is to be
requeued. When the system is not heavily loaded this bonus has the
tendency to reduce the overall amount of queuing on the system and
improve throughput. When the system is heavily loaded it tends to
spread the pain evenly among the non interactive tasks (subject, of
course, to niceness).
Hopefully my explanations above don't fall into the "black magic"
category and the mechanics of the scheduler are easy to understand? :-)
If not and you have any questions don't hesitate to ask
Peter
PS Patches for 2.6.7-rc2 should be available next week.
--
Dr Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
next prev parent reply other threads:[~2004-06-04 7:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <214A1-6NK-7@gated-at.bofh.it>
[not found] ` <21acm-2GN-1@gated-at.bofh.it>
2004-05-29 12:24 ` [RFC][PATCH][2.6.6] Replacing CPU scheduler active and expired with a single array Andi Kleen
2004-05-29 12:38 ` Con Kolivas
2004-06-04 7:40 ` Peter Williams [this message]
2004-05-29 5:27 Peter Williams
2004-05-29 11:17 ` Con Kolivas
2004-05-30 0:19 ` Peter Williams
2004-05-30 12:56 ` Con Kolivas
2004-05-31 0:04 ` Peter Williams
2004-05-30 23:13 ` Con Kolivas
-- strict thread matches above, loose matches on Subject: below --
2004-05-29 1:39 Peter Williams
2004-05-28 4:52 Peter Williams
2004-05-28 9:05 ` Ingo Molnar
2004-05-28 9:24 ` Peter Williams
2004-05-28 9:29 ` Ingo Molnar
2004-05-28 9:57 ` Con Kolivas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=40C02771.1080803@bigpond.net.au \
--to=pwil3058@bigpond.net.au \
--cc=ak@muc.de \
--cc=kernel.linux@lists.sw.oz.au \
--cc=kernel@kolivas.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox