public inbox for linux-kernel@vger.kernel.org
From: Peter Williams <pwil3058@bigpond.net.au>
To: Paolo Ornati <ornati@fastwebnet.it>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Chris Han <xiphux@gmail.com>, Con Kolivas <kernel@kolivas.org>,
	William Lee Irwin III <wli@holomorphy.com>,
	Jake Moilanen <moilanen@austin.ibm.com>
Subject: Re: [ANNOUNCE][RFC] PlugSched-6.2 for  2.6.16-rc1 and 2.6.16-rc1-mm1
Date: Sun, 29 Jan 2006 10:44:17 +1100	[thread overview]
Message-ID: <43DC01D1.9040902@bigpond.net.au> (raw)
In-Reply-To: <43D94E62.6080603@bigpond.net.au>

[-- Attachment #1: Type: text/plain, Size: 3425 bytes --]

Peter Williams wrote:
> Paolo Ornati wrote:
> 
>> On Thu, 26 Jan 2006 12:09:53 +1100
>> Peter Williams <pwil3058@bigpond.net.au> wrote:
>>
>>
>>> I know that I've said this before, but I've found the problem. 
>>> Embarrassingly, it was a basic bookkeeping error (recently 
>>> introduced, and equivalent to getting nr_running wrong for each CPU) 
>>> in the gathering of the statistics that I use. :-(
>>>
>>> The attached patch (applied on top of the PlugSched patch) should fix 
>>> things.  Could you test it please?
>>
>>
>>
>> Ok, this one makes a difference:
>>
>> (transcode)
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  5774 paolo     34   0  116m  18m 2432 R 86.2  3.7   0:11.65 transcode
>>  5788 paolo     32   0 51000 4472 1872 S  7.5  0.9   0:01.13 tcdecode
>>  5797 paolo     29   0  4948 1468  372 D  3.2  0.3   0:00.30 dd
>>  5781 paolo     33   0 19844 1092  880 S  1.0  0.2   0:00.10 tcdemux
>>  5783 paolo     31   0 47964 2496 1956 S  0.7  0.5   0:00.08 tcdecode
>>  5786 paolo     34   0 19840 1088  880 R  0.5  0.2   0:00.06 tcdemux
>>
>> (sched_fooler)
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  5804 paolo     34   0  2396  292  228 R 35.7  0.1   0:12.84 a.out
>>  5803 paolo     34   0  2392  288  228 R 30.5  0.1   0:11.49 a.out
>>  5805 paolo     34   0  2392  288  228 R 30.2  0.1   0:10.70 a.out
>>  5815 paolo     29   0  4948 1468  372 D  3.7  0.3   0:00.29 dd
>>  5458 paolo     28   0 86656  21m  15m S  0.2  4.4   0:02.18 konsole
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  5804 paolo     34   0  2396  292  228 R 36.5  0.1   0:38.19 a.out
>>  5803 paolo     34   0  2392  288  228 R 30.5  0.1   0:34.27 a.out
>>  5805 paolo     34   0  2392  288  228 R 29.2  0.1   0:32.39 a.out
>>  5829 paolo     34   0  4952 1472  372 R  3.2  0.3   0:00.35 dd
>>
>> DD_TEST + sched_fooler: 512 MB --- ~20s instead of 16.6s
>>
>> This is a clear improvement... however I wonder why dd's priority
>> fluctuates, going up even to 34 (the range is something like 29 <--->
>> 34).
>>
> 
> It's because the "fairness" bonus is still being applied as a one-shot 
> bonus when a task's delay time becomes unfairly large.  I mentioned 
> before that this might need to be changed to a more persistent model, 
> but after I discovered the accounting bug I deferred doing anything 
> about it in the hope that fixing the bug would be sufficient.
> 
> I'll now try a model whereby a task's fairness bonus is increased 
> whenever it suffers unfair delays and decreased when it doesn't.  
> Hopefully, with the right rates of increase and decrease, this will 
> result in a task holding a fairly persistent bonus that is sufficient 
> to give it its fair share.  One reason I've been avoiding this method 
> is that it introduces double smoothing: once in the calculation of the 
> average delay time and then again in the determination of the bonus; 
> I'm concerned this may make it slow to react to change.  Anyway, I'll 
> give it a try and see what happens.

Attached is a patch which makes the fairness bonuses more persistent.  
It should be applied on top of the last patch that I sent.  Could you 
test it please?

Thanks
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: spa_ws-persistent-fairness --]
[-- Type: text/plain, Size: 3568 bytes --]

Index: MM-2.6.16/kernel/sched_spa_ws.c
===================================================================
--- MM-2.6.16.orig/kernel/sched_spa_ws.c	2006-01-26 12:21:50.000000000 +1100
+++ MM-2.6.16/kernel/sched_spa_ws.c	2006-01-29 10:00:21.000000000 +1100
@@ -45,12 +45,20 @@ static unsigned int initial_ia_bonus = D
 #define LSHARES_AVG_ALPHA ((1 << LSHARES_AVG_OFFSET) - 2)
 #define LSHARES_AVG_INCR(a) ((a) << 1)
 #define LSHARES_AVG_REAL(s) ((s) << LSHARES_AVG_OFFSET)
-#define LSHARES_AVG_ONE LSAHRES_AVG_REAL(1UL)
+#define LSHARES_AVG_ONE LSHARES_AVG_REAL(1UL)
 #define LSHARES_AVG_MUL(a, b) (((a) * (b)) >> LSHARES_AVG_OFFSET)
 
 static unsigned int max_fairness_bonus = DEF_MAX_FAIRNESS_BONUS;
 
-#define FAIRNESS_BONUS_OFFSET	8
+#define FAIRNESS_BONUS_OFFSET	5
+#define FAIRNESS_ALPHA		((1UL << FAIRNESS_BONUS_OFFSET) - 2)
+#define FAIRNESS_ALPHA_COMPL	2
+
+static inline int fairness_bonus(const struct task_struct *p)
+{
+	return (p->sdu.spa.auxilary_bonus * max_fairness_bonus) >>
+		FAIRNESS_BONUS_OFFSET;
+}
 
 static DEFINE_PER_CPU(unsigned long, rq_avg_lshares);
 
@@ -124,7 +132,7 @@ static inline void zero_interactive_bonu
 
 static inline int bonuses(const struct task_struct *p)
 {
-	return current_ia_bonus_rnd(p) + p->sdu.spa.auxilary_bonus;
+	return current_ia_bonus_rnd(p) + fairness_bonus(p);
 }
 
 static int spa_ws_effective_prio(const struct task_struct *p)
@@ -161,65 +169,22 @@ static void spa_ws_fork(struct task_stru
 	p->sdu.spa.interactive_bonus <<= IA_BONUS_OFFSET;
 }
 
-static inline unsigned int map_ratio(unsigned long long a,
-				     unsigned long long b,
-				     unsigned int range)
-{
-	a *= range;
-
-#if BITS_PER_LONG < 64
-	/*
-	 * Assume that there's no 64 bit divide available
-	 */
-	if (a < b)
-		return 0;
-	/*
-	 * Scale down until b less than 32 bits so that we can do
-	 * a divide using do_div()
-	 */
-	while (b > ULONG_MAX) { a >>= 1; b >>= 1; }
-
-	(void)do_div(a, (unsigned long)b);
-
-	return a;
-#else
-	return a / b;
-#endif
-}
-
 static void spa_ws_reassess_fairness_bonus(struct task_struct *p)
 {
-	unsigned long long expected_delay, adjusted_delay;
-	unsigned long long avg_lshares;
-	unsigned long pshares;
-
-	p->sdu.spa.auxilary_bonus = 0;
-	if (max_fairness_bonus == 0)
-		return;
+	unsigned long long expected_delay;
+	unsigned long long wanr; /* weighted average number running */
 
-	pshares = LSHARES_AVG_REAL(p->sdu.spa.eb_shares);
-	avg_lshares = per_cpu(rq_avg_lshares, task_cpu(p));
-	if (avg_lshares <= pshares)
+	wanr = per_cpu(rq_avg_lshares, task_cpu(p)) / p->sdu.spa.eb_shares;
+	if (wanr <= LSHARES_AVG_ONE)
 		expected_delay = 0;
-	else {
-		expected_delay = p->sdu.spa.avg_cpu_per_cycle *
-			(avg_lshares - pshares);
-		(void)do_div(expected_delay, pshares);
-	}
-
-	/*
-	 * No delay means no bonus, but
-	 * NB this test also avoids a possible divide by zero error if
-	 * cpu is also zero and negative bonuses
-	 */
-	if (p->sdu.spa.avg_delay_per_cycle <= expected_delay)
-		return;
-
-	adjusted_delay = p->sdu.spa.avg_delay_per_cycle - expected_delay;
-	p->sdu.spa.auxilary_bonus =
-		map_ratio(adjusted_delay,
-			  adjusted_delay + p->sdu.spa.avg_cpu_per_cycle,
-			  max_fairness_bonus);
+	else
+		expected_delay = LSHARES_AVG_MUL(p->sdu.spa.avg_cpu_per_cycle,
+						(wanr - LSHARES_AVG_ONE));
+
+	p->sdu.spa.auxilary_bonus *= FAIRNESS_ALPHA;
+	p->sdu.spa.auxilary_bonus >>= FAIRNESS_BONUS_OFFSET;
+	if (p->sdu.spa.avg_delay_per_cycle > expected_delay)
+		p->sdu.spa.auxilary_bonus += FAIRNESS_ALPHA_COMPL;
 }
 
 static inline int spa_ws_eligible(struct task_struct *p)

Thread overview: 20+ messages
2006-01-19 21:45 [ANNOUNCE][RFC] PlugSched-6.2 for 2.6.16-rc1 and 2.6.16-rc1-mm1 Peter Williams
2006-01-21  6:48 ` Peter Williams
2006-01-21 10:46 ` Paolo Ornati
2006-01-21 23:06   ` Peter Williams
2006-01-22 22:47     ` Peter Williams
2006-01-23  0:49       ` Peter Williams
2006-01-23 20:21         ` Paolo Ornati
2006-01-24  0:00           ` Peter Williams
2006-01-26  1:09           ` Peter Williams
2006-01-26  8:11             ` Paolo Ornati
2006-01-26 22:34               ` Peter Williams
2006-01-28 23:44                 ` Peter Williams [this message]
2006-01-31 17:44                   ` Paolo Ornati
2006-01-23 20:09     ` Paolo Ornati
2006-01-23 20:25       ` Lee Revell
2006-01-23 20:52         ` Paolo Ornati
2006-01-23 20:59           ` Lee Revell
2006-01-23 21:10             ` Paolo Ornati
2006-01-23 21:11               ` Lee Revell
2006-01-23 23:32       ` Peter Williams
