Re: [SCHED] Totally WRONG prority calculation with specific test-case (since 2.6.10-bk12)

From: Peter Williams <pwil3058@bigpond.net.au>
To: Paolo Ornati <ornati@fastwebnet.it>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Con Kolivas <kernel@kolivas.org>, Ingo Molnar <mingo@elte.hu>
Subject: Re: [SCHED] Totally WRONG prority calculation with specific test-case (since 2.6.10-bk12)
Date: Thu, 29 Dec 2005 00:38:08 +1100
Message-ID: <43B29540.1030904@bigpond.net.au>
In-Reply-To: <20051228112058.2c0c1137@localhost>

Paolo Ornati wrote:
> On Wed, 28 Dec 2005 10:59:13 +1100
> Peter Williams <pwil3058@bigpond.net.au> wrote:
> 
> 
>>Any chance of you applying the PlugSched patches and seeing how the 
>>other schedulers that it contains handle this situation?
>>
>>The patch at:
>>
>><http://prdownloads.sourceforge.net/cpuse/plugsched-6.1.6-for-2.6.15-rc5.patch?download>
>>
>>should apply without problems to the 2.6.15-rc7 kernel.
>>
>>Very Brief Documentation:
>>
>>You can select a default scheduler at kernel build time.  If you wish to
>>boot with a scheduler other than the default it can be selected at boot
>>time by adding:
>>
>>cpusched=<scheduler>
>>
>>to the boot command line where <scheduler> is one of: ingosched,
>>nicksched, staircase, spa_no_frills, spa_ws, spa_svr or zaphod.  If you
>>don't change the default when you build the kernel the default scheduler
>>will be ingosched (which is the normal scheduler).
> 
> 
> 
> First of all, this is the "pstree" structure of transcode and friends:
> 
>      |-kdesktop---perl---sh---transcode-+-2*[sh-+-tccat]
>      |                                  |       |-tcdecode]
>      |                                  |       |-tcdemux]
>      |                                  |       `-tcextract]
>      |                                  `-transcode---5*[transcode]
> 
> 
> Results with various schedulers:

First, thanks for doing this.

> 
> ------------------------------------------------------------------------
> 
> 	1) nicksched: perfect! This is the behaviour I want.
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5562 paolo     40   0  115m  18m 2428 R 82.2  3.7   0:22.16 transcode
>  5576 paolo     26   0 50348 4516 1912 S  9.5  0.9   0:02.43 tcdecode
>  5566 paolo     23   0  115m  18m 2428 S  4.6  3.7   0:01.24 transcode
>  5573 paolo     21   0  115m  18m 2428 S  0.9  3.7   0:00.22 transcode
>  5577 paolo     27   0 20356 1140  920 S  0.9  0.2   0:00.21 tcdemux
>  5295 root      20   0  167m  17m 3624 S  0.6  3.5   0:11.02 X
>  5579 paolo     20   0 47308 2540 1996 S  0.5  0.5   0:00.14 tcdecode
>  5574 paolo     20   0 20356 1144  920 S  0.4  0.2   0:00.11 tcdemux
> ...
> 
> transcode gets recognized for what it is, and I/O-bound processes
> don't even notice that it is running :)

Interesting.  This one's more or less a dead scheduler and hasn't had 
any development work done on it for some time.  I just keep porting the 
original version to new kernels.

> 
> 
> 	2) staircase: bad, as you can see:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5582 paolo     26   0  115m  18m 2428 R 82.7  3.7   0:47.63 transcode
>  5599 paolo     39   0 50352 4516 1912 R  9.6  0.9   0:05.21 tcdecode
>  5586 paolo      0   0  115m  18m 2428 S  4.5  3.7   0:02.61 transcode
>  5622 paolo     39   0  4948 1520  412 R  1.1  0.3   0:00.15 dd
>  5591 paolo      0   0  115m  18m 2428 S  0.6  3.7   0:00.36 transcode
>  5575 paolo      0   0 98476  37m 9392 S  0.4  7.5   0:01.44 perl
>  5597 paolo     27   0 20356 1144  920 S  0.4  0.2   0:00.21 tcdemux
>  5475 paolo      0   0 86556  22m  15m S  0.2  4.5   0:01.24 konsole
>  5388 root       0   0  167m  17m 3208 S  0.1  3.4   0:03.16 X
>  5587 paolo      0   0  115m  18m 2428 S  0.1  3.7   0:00.03 transcode
>  5595 paolo     20   0 47312 2540 1996 S  0.1  0.5   0:00.14 tcdecode
>  5596 paolo     26   0 22672 1268 1020 S  0.1  0.2   0:00.03 tccat
>  5598 paolo     28   0 22364 1436  932 S  0.1  0.3   0:00.04 tcextract
> 
> 
> And "DD" is affected badly:
> 
> paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile of=/dev/null bs=1M count=128; umount space/
> 128+0 records in
> 128+0 records out
> 
> real    0m6.341s
> user    0m0.002s
> sys     0m0.229s
> 
> While transcoding:
> 
> paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile of=/dev/null bs=1M count=256; umount space/
> 256+0 records in
> 256+0 records out
> 
> real    0m15.793s
> user    0m0.001s
> sys     0m0.374s
> 
> 
> 	3) spa_no_frills: bad, but this is OK since it is Round Robin :)
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5356 paolo     20   0  115m  18m 2428 R 81.1  3.7   0:27.61 transcode
>  5371 paolo     20   0 50348 4516 1912 R  8.9  0.9   0:02.97 tcdecode
>  5360 paolo     20   0  115m  18m 2428 S  4.1  3.7   0:01.54 transcode
>  5378 paolo     20   0  4948 1520  412 D  1.4  0.3   0:00.29 dd
>  5364 paolo     20   0 20352 1144  920 S  0.9  0.2   0:00.20 tcdemux
>  5373 paolo     20   0  115m  18m 2428 S  0.7  3.7   0:00.32 transcode
>  5369 paolo     20   0 20356 1144  920 S  0.5  0.2   0:00.14 tcdemux
>  5205 root      20   0  165m  15m 2584 R  0.2  3.2   0:01.86 X
> 

Yes, no surprises there.

> 
> 	4) spa_ws: bad
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5334 paolo     32   0  115m  18m 2428 R 82.7  3.7   0:18.77 transcode
>  5349 paolo     32   0 50348 4516 1912 R  8.9  0.9   0:02.00 tcdecode
>  5338 paolo     21   0  115m  18m 2428 S  4.6  3.7   0:01.08 transcode
>  5356 paolo     32   0  4948 1520  412 D  1.1  0.3   0:00.12 dd
>  5351 paolo     32   0  115m  18m 2428 S  1.0  3.7   0:00.20 transcode
>  5199 root      21   0  165m  15m 2584 S  0.4  3.2   0:01.68 X
>  5347 paolo     32   0 20356 1140  920 S  0.4  0.2   0:00.08 tcdemux
>  5296 paolo     22   0 98472  37m 9392 S  0.2  7.5   0:01.47 perl
>  5299 paolo     21   0 86556  22m  15m S  0.2  4.4   0:00.75 konsole
>  5344 paolo     32   0 47308 2540 1996 S  0.2  0.5   0:00.07 tcdecode
>  5339 paolo     21   0  115m  18m 2428 S  0.1  3.7   0:00.01 transcode
> 
> paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile of=/dev/null bs=1M count=256; umount space/
> 256+0 records in
> 256+0 records out
> 
> real    0m8.112s
> user    0m0.001s
> sys     0m0.444s
> 
> paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile of=/dev/null bs=1M count=256; umount space/
> 256+0 records in
> 256+0 records out
> 
> real    0m29.222s
> user    0m0.000s
> sys     0m0.400s

This one is aimed purely at good interactive responsiveness (i.e. 
keyboard, mouse, X server and media players such as rhythmbox/xmms) so no 
real surprises here either.
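
To put the spa_ws dd timings quoted above in perspective, here is a small 
sketch that turns them into throughput and a slowdown factor.  It assumes 
(the quoted text does not label the runs) that the 8.112s run was taken 
on an otherwise idle system and the 29.222s run while transcode was 
active.  The function and constant names are made up for illustration.

```python
def throughput_mib_s(size_mib: float, seconds: float) -> float:
    """Average read throughput of a dd run in MiB/s."""
    return size_mib / seconds


def slowdown(loaded_seconds: float, idle_seconds: float) -> float:
    """How many times longer the loaded run took than the idle run."""
    return loaded_seconds / idle_seconds


# dd was run with bs=1M count=256, i.e. 256 MiB read in total.
SIZE_MIB = 256

# "real" times from the spa_ws runs quoted above.
SPA_WS_IDLE_S = 8.112     # assumption: no transcode running
SPA_WS_LOADED_S = 29.222  # assumption: transcode running

if __name__ == "__main__":
    idle = throughput_mib_s(SIZE_MIB, SPA_WS_IDLE_S)
    loaded = throughput_mib_s(SIZE_MIB, SPA_WS_LOADED_S)
    factor = slowdown(SPA_WS_LOADED_S, SPA_WS_IDLE_S)
    print(f"idle:   {idle:.1f} MiB/s")
    print(f"loaded: {loaded:.1f} MiB/s")
    print(f"slowdown: {factor:.2f}x")
```

Under those assumptions dd takes roughly 3.6 times as long under spa_ws 
when transcode is running.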

> 
> 
> 	5) spa_svr: surprise, surprise! Not all that bad. At least DD
> gets better priority than transcode... and DD real time is only a bit
> affected (8s --> ~9s).
>

This will be the "throughput bonus" in action.  Its overall aim is to 
reduce the time tasks spend on the runqueue waiting for CPU access, 
a.k.a. delay.  It does this by using the system load and the average 
amount of CPU time that the task uses each scheduling cycle to estimate 
the expected delay for the task, and gives the task a bonus if the actual 
average delays being experienced are bigger than this value.

It's intended for server systems not interactive systems as reducing 
overall delay isn't necessarily good for interactive systems where the 
aim is to quell the user's impatience by giving good latency to the 
interactive tasks.  These aims aren't always compatible.
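
As a rough illustration of the throughput bonus idea (this is NOT the 
actual spa_svr/zaphod code), the sketch below compares a task's observed 
average delay with an expected delay derived from the load and the 
task's average CPU use per cycle.  The expected-delay formula, the bonus 
scaling, and all names (TaskStats, expected_delay, throughput_bonus, 
max_bonus) are assumptions chosen only to show the shape of the 
calculation.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskStats:
    avg_cpu_per_cycle: float    # average CPU time used per scheduling cycle (ms)
    avg_delay_per_cycle: float  # average time waiting on the runqueue per cycle (ms)


def expected_delay(load: float, task: TaskStats) -> float:
    """Hypothetical estimate of the delay a task should expect.

    Assumption: the task waits behind (load - 1) other runnable tasks,
    each for about as long as the task itself runs per cycle.
    """
    others = max(load - 1.0, 0.0)
    return others * task.avg_cpu_per_cycle


def throughput_bonus(load: float, task: TaskStats, max_bonus: int = 5) -> int:
    """Return a priority bonus in the range 0..max_bonus.

    No bonus if the observed delay is within the expected delay;
    otherwise the bonus grows with the fraction of the observed delay
    that is in excess of the expectation (illustrative scaling only).
    """
    excess = task.avg_delay_per_cycle - expected_delay(load, task)
    if excess <= 0:
        return 0
    # excess > 0 implies avg_delay_per_cycle > 0, so the division is safe.
    fraction = excess / task.avg_delay_per_cycle
    return min(max_bonus, math.ceil(max_bonus * fraction))


if __name__ == "__main__":
    waiting_too_long = TaskStats(avg_cpu_per_cycle=10.0, avg_delay_per_cycle=40.0)
    print(throughput_bonus(load=2.0, task=waiting_too_long))
```

A task whose delay matches the estimate gets no bonus, so the mechanism 
only kicks in for tasks that are waiting longer than the load alone 
would explain.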

> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5334 paolo     33   0  115m  18m 2428 R 78.1  3.7   0:22.70 transcode
>  5349 paolo     28   0 50352 4516 1912 S  9.0  0.9   0:02.41 tcdecode
>  5338 paolo     25   0  115m  18m 2428 S  4.7  3.7   0:01.29 transcode
>  5363 paolo     27   0  4952 1520  412 R  4.7  0.3   0:00.25 dd
>  5342 paolo     33   0 20352 1140  920 S  1.6  0.2   0:00.21 tcdemux
>  5351 paolo     25   0  115m  18m 2428 S  0.8  3.7   0:00.23 transcode
>  5144 root      22   0  166m  16m 3120 S  0.4  3.3   0:01.85 X
>  5344 paolo     23   0 47308 2540 1996 S  0.4  0.5   0:00.13 tcdecode
>  5347 paolo     27   0 20356 1144  920 S  0.4  0.2   0:00.10 tcdemux
>  5231 paolo     22   0 86660  22m  15m S  0.2  4.5   0:00.95 konsole
>  5271 paolo     25   0 98476  37m 9396 S  0.2  7.5   0:01.54 perl
>  5341 paolo     23   0 22672 1268 1020 S  0.2  0.2   0:00.02 tccat
> 
> 
> 	6) zaphod: more or less like spa_svr

Zaphod includes the throughput bonus in its armoury, which is why it is 
similar in performance to spa_svr.

> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5308 paolo     34   0  115m  18m 2428 R 52.1  3.7   0:49.77 transcode
>  5323 paolo     32   0 50352 4516 1912 S  6.0  0.9   0:05.61 tcdecode
>  5356 paolo     28   0  4952 1520  412 D  3.5  0.3   0:00.28 dd
>  5312 paolo     28   0  115m  18m 2428 S  2.6  3.7   0:02.71 transcode
>  5325 paolo     31   0  115m  18m 2428 S  0.7  3.7   0:00.55 transcode
>  5316 paolo     37   0 20352 1140  920 S  0.4  0.2   0:00.33 tcdemux
>  5202 root      23   0  165m  15m 2584 S  0.2  3.1   0:01.57 X
>  5318 paolo     31   0 47312 2540 1996 S  0.2  0.5   0:00.28 tcdecode
>  5321 paolo     33   0 20356 1144  920 S  0.2  0.2   0:00.26 tcdemux
>  4760 messageb  25   0 13248 1068  848 S  0.1  0.2   0:00.07 dbus-daemon-1
>  5264 paolo     24   0 93920  17m  10m S  0.1  3.5   0:00.38 kded
>  5282 paolo     23   0 92712  19m  12m S  0.1  3.9   0:00.36 kdesktop
> 
> 
> 	7) ingosched: bad, as already said in the original post
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5209 paolo     16   0  115m  18m 2428 R 72.0  3.7   0:22.13 transcode
>  5224 paolo     22   0 50348 4516 1912 R  8.4  0.9   0:02.44 tcdecode
>  5213 paolo     15   0  115m  18m 2428 S  4.2  3.7   0:01.24 transcode
>  5243 paolo     18   0  4948 1520  412 R  1.8  0.3   0:00.14 dd
>  5217 paolo     19   0 20356 1144  920 R  0.8  0.2   0:00.19 tcdemux
>  5108 root      15   0  165m  15m 2584 S  0.6  3.1   0:01.44 X
>  5226 paolo     15   0  115m  18m 2428 S  0.6  3.7   0:00.20 transcode
>  5216 paolo     18   0 22676 1268 1020 S  0.4  0.2   0:00.03 tccat
>  5219 paolo     18   0 47312 2540 1996 R  0.4  0.5   0:00.12 tcdecode
>  5222 paolo     18   0 20356 1144  920 S  0.4  0.2   0:00.10 tcdemux
>  5195 paolo     16   0 98488  37m 9392 S  0.2  7.5   0:01.41 perl
>  5198 paolo     16   0 86552  22m  15m R  0.2  4.4   0:00.66 konsole
> 
> paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile of=/dev/null bs=1M count=256; umount space/
> 256+0 records in
> 256+0 records out
> 
> real    0m23.393s	(instead of 8s)
> user    0m0.001s
> sys     0m0.418s
> 
> ------------------------------------------------------------------------
> 
> 
> So the winner for manifest superiority is "nicksched", it looks to me
> even better than 2.6.10-bk12 (ingosched) with
> "remove_interactive_credit" reverted.

Thanks for this data.  It will enable me to make some mods to the 
spa_xxx and zaphod schedulers.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce
