From: Peter Williams <pwil3058@bigpond.net.au>
To: Paolo Ornati <ornati@fastwebnet.it>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Chris Han <xiphux@gmail.com>, Con Kolivas <kernel@kolivas.org>,
William Lee Irwin III <wli@holomorphy.com>,
Jake Moilanen <moilanen@austin.ibm.com>
Subject: Re: [ANNOUNCE][RFC] PlugSched-6.2 for 2.6.16-rc1 and 2.6.16-rc1-mm1
Date: Sun, 29 Jan 2006 10:44:17 +1100 [thread overview]
Message-ID: <43DC01D1.9040902@bigpond.net.au> (raw)
In-Reply-To: <43D94E62.6080603@bigpond.net.au>
[-- Attachment #1: Type: text/plain, Size: 3425 bytes --]
Peter Williams wrote:
> Paolo Ornati wrote:
>
>> On Thu, 26 Jan 2006 12:09:53 +1100
>> Peter Williams <pwil3058@bigpond.net.au> wrote:
>>
>>
>>> I know that I've said this before but I've found the problem.
>>> Embarrassingly, it was a basic book keeping error (recently
>>> introduced and equivalent to getting nr_running wrong for each CPU)
>>> in the gathering of the statistics that I use. :-(
>>>
>>> The attached patch (applied on top of the PlugSched patch) should fix
>>> things. Could you test it please?
>>
>>
>>
>> Ok, this one make a difference:
>>
>> (transcode)
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 5774 paolo 34 0 116m 18m 2432 R 86.2 3.7 0:11.65 transcode
>> 5788 paolo 32 0 51000 4472 1872 S 7.5 0.9 0:01.13 tcdecode
>> 5797 paolo 29 0 4948 1468 372 D 3.2 0.3 0:00.30 dd
>> 5781 paolo 33 0 19844 1092 880 S 1.0 0.2 0:00.10 tcdemux
>> 5783 paolo 31 0 47964 2496 1956 S 0.7 0.5 0:00.08 tcdecode
>> 5786 paolo 34 0 19840 1088 880 R 0.5 0.2 0:00.06 tcdemux
>>
>> (sched_fooler)
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 5804 paolo 34 0 2396 292 228 R 35.7 0.1 0:12.84 a.out
>> 5803 paolo 34 0 2392 288 228 R 30.5 0.1 0:11.49 a.out
>> 5805 paolo 34 0 2392 288 228 R 30.2 0.1 0:10.70 a.out
>> 5815 paolo 29 0 4948 1468 372 D 3.7 0.3 0:00.29 dd
>> 5458 paolo 28 0 86656 21m 15m S 0.2 4.4 0:02.18 konsole
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 5804 paolo 34 0 2396 292 228 R 36.5 0.1 0:38.19 a.out
>> 5803 paolo 34 0 2392 288 228 R 30.5 0.1 0:34.27 a.out
>> 5805 paolo 34 0 2392 288 228 R 29.2 0.1 0:32.39 a.out
>> 5829 paolo 34 0 4952 1472 372 R 3.2 0.3 0:00.35 dd
>>
>> DD_TEST + sched_fooler: 512 MB --- ~20s instead of 16.6s
>>
>> This is a clear improvement... however I wonder why DD priority
>> fluctuate going up even to 34 (the range is something like 29 <--->
>> 34).
>>
>
> It's because the "fairness" bonus is still being done as a one shot
> bonus when a task's delay time become unfairly large. I mentioned this
> before as possibly needing to be changed to a more persistent model but
> after I discovered the accounting bug I deferred doing anything about it
> in the hope that fixing the bug would have been sufficient.
>
> I'll now try a model whereby a task's fairness bonus is increased
> whenever it has unfair delays and decreased when it doesn't. Hopefully,
> with the right rates of increase/decrease, this can result in a system
> where a task has a fairly persistent bonus which is sufficient to give
> it its fair share. One reason that I've been avoiding this method is
> that it introduces double smoothing: once in the calculation of the
> average delay time and then again in the determination of the bonus; and
> I'm concerned this may make it slow to react to change. Any way I'll
> give it a try and see what happens.
Attached is a patch which makes the fairness bonuses more persistent. I
should be applied on top of the last patch that I sent. Could you test
it please?
Thanks
Peter
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
[-- Attachment #2: spa_ws-persistent-fairness --]
[-- Type: text/plain, Size: 3568 bytes --]
Index: MM-2.6.16/kernel/sched_spa_ws.c
===================================================================
--- MM-2.6.16.orig/kernel/sched_spa_ws.c 2006-01-26 12:21:50.000000000 +1100
+++ MM-2.6.16/kernel/sched_spa_ws.c 2006-01-29 10:00:21.000000000 +1100
@@ -45,12 +45,20 @@ static unsigned int initial_ia_bonus = D
#define LSHARES_AVG_ALPHA ((1 << LSHARES_AVG_OFFSET) - 2)
#define LSHARES_AVG_INCR(a) ((a) << 1)
#define LSHARES_AVG_REAL(s) ((s) << LSHARES_AVG_OFFSET)
-#define LSHARES_AVG_ONE LSAHRES_AVG_REAL(1UL)
+#define LSHARES_AVG_ONE LSHARES_AVG_REAL(1UL)
#define LSHARES_AVG_MUL(a, b) (((a) * (b)) >> LSHARES_AVG_OFFSET)
static unsigned int max_fairness_bonus = DEF_MAX_FAIRNESS_BONUS;
-#define FAIRNESS_BONUS_OFFSET 8
+#define FAIRNESS_BONUS_OFFSET 5
+#define FAIRNESS_ALPHA ((1UL << FAIRNESS_BONUS_OFFSET) - 2)
+#define FAIRNESS_ALPHA_COMPL 2
+
+static inline int fairness_bonus(const struct task_struct *p)
+{
+ return (p->sdu.spa.auxilary_bonus * max_fairness_bonus) >>
+ FAIRNESS_BONUS_OFFSET;
+}
static DEFINE_PER_CPU(unsigned long, rq_avg_lshares);
@@ -124,7 +132,7 @@ static inline void zero_interactive_bonu
static inline int bonuses(const struct task_struct *p)
{
- return current_ia_bonus_rnd(p) + p->sdu.spa.auxilary_bonus;
+ return current_ia_bonus_rnd(p) + fairness_bonus(p);
}
static int spa_ws_effective_prio(const struct task_struct *p)
@@ -161,65 +169,22 @@ static void spa_ws_fork(struct task_stru
p->sdu.spa.interactive_bonus <<= IA_BONUS_OFFSET;
}
-static inline unsigned int map_ratio(unsigned long long a,
- unsigned long long b,
- unsigned int range)
-{
- a *= range;
-
-#if BITS_PER_LONG < 64
- /*
- * Assume that there's no 64 bit divide available
- */
- if (a < b)
- return 0;
- /*
- * Scale down until b less than 32 bits so that we can do
- * a divide using do_div()
- */
- while (b > ULONG_MAX) { a >>= 1; b >>= 1; }
-
- (void)do_div(a, (unsigned long)b);
-
- return a;
-#else
- return a / b;
-#endif
-}
-
static void spa_ws_reassess_fairness_bonus(struct task_struct *p)
{
- unsigned long long expected_delay, adjusted_delay;
- unsigned long long avg_lshares;
- unsigned long pshares;
-
- p->sdu.spa.auxilary_bonus = 0;
- if (max_fairness_bonus == 0)
- return;
+ unsigned long long expected_delay;
+ unsigned long long wanr; /* weighted average number running */
- pshares = LSHARES_AVG_REAL(p->sdu.spa.eb_shares);
- avg_lshares = per_cpu(rq_avg_lshares, task_cpu(p));
- if (avg_lshares <= pshares)
+ wanr = per_cpu(rq_avg_lshares, task_cpu(p)) / p->sdu.spa.eb_shares;
+ if (wanr <= LSHARES_AVG_ONE)
expected_delay = 0;
- else {
- expected_delay = p->sdu.spa.avg_cpu_per_cycle *
- (avg_lshares - pshares);
- (void)do_div(expected_delay, pshares);
- }
-
- /*
- * No delay means no bonus, but
- * NB this test also avoids a possible divide by zero error if
- * cpu is also zero and negative bonuses
- */
- if (p->sdu.spa.avg_delay_per_cycle <= expected_delay)
- return;
-
- adjusted_delay = p->sdu.spa.avg_delay_per_cycle - expected_delay;
- p->sdu.spa.auxilary_bonus =
- map_ratio(adjusted_delay,
- adjusted_delay + p->sdu.spa.avg_cpu_per_cycle,
- max_fairness_bonus);
+ else
+ expected_delay = LSHARES_AVG_MUL(p->sdu.spa.avg_cpu_per_cycle,
+ (wanr - LSHARES_AVG_ONE));
+
+ p->sdu.spa.auxilary_bonus *= FAIRNESS_ALPHA;
+ p->sdu.spa.auxilary_bonus >>= FAIRNESS_BONUS_OFFSET;
+ if (p->sdu.spa.avg_delay_per_cycle > expected_delay)
+ p->sdu.spa.auxilary_bonus += FAIRNESS_ALPHA_COMPL;
}
static inline int spa_ws_eligible(struct task_struct *p)
next prev parent reply other threads:[~2006-01-28 23:44 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-01-19 21:45 [ANNOUNCE][RFC] PlugSched-6.2 for 2.6.16-rc1 and 2.6.16-rc1-mm1 Peter Williams
2006-01-21 6:48 ` Peter Williams
2006-01-21 10:46 ` Paolo Ornati
2006-01-21 23:06 ` Peter Williams
2006-01-22 22:47 ` Peter Williams
2006-01-23 0:49 ` Peter Williams
2006-01-23 20:21 ` Paolo Ornati
2006-01-24 0:00 ` Peter Williams
2006-01-26 1:09 ` Peter Williams
2006-01-26 8:11 ` Paolo Ornati
2006-01-26 22:34 ` Peter Williams
2006-01-28 23:44 ` Peter Williams [this message]
2006-01-31 17:44 ` Paolo Ornati
2006-01-23 20:09 ` Paolo Ornati
2006-01-23 20:25 ` Lee Revell
2006-01-23 20:52 ` Paolo Ornati
2006-01-23 20:59 ` Lee Revell
2006-01-23 21:10 ` Paolo Ornati
2006-01-23 21:11 ` Lee Revell
2006-01-23 23:32 ` Peter Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=43DC01D1.9040902@bigpond.net.au \
--to=pwil3058@bigpond.net.au \
--cc=kernel@kolivas.org \
--cc=linux-kernel@vger.kernel.org \
--cc=moilanen@austin.ibm.com \
--cc=ornati@fastwebnet.it \
--cc=wli@holomorphy.com \
--cc=xiphux@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.