Re: [ANNOUNCE][RFC] PlugSched-6.2 for 2.6.16-rc1 and 2.6.16-rc1-mm1

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Williams <pwil3058@bigpond.net.au>
To: Paolo Ornati <ornati@fastwebnet.it>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Chris Han <xiphux@gmail.com>, Con Kolivas <kernel@kolivas.org>,
	William Lee Irwin III <wli@holomorphy.com>,
	Jake Moilanen <moilanen@austin.ibm.com>
Subject: Re: [ANNOUNCE][RFC] PlugSched-6.2 for  2.6.16-rc1 and 2.6.16-rc1-mm1
Date: Mon, 23 Jan 2006 11:49:33 +1100	[thread overview]
Message-ID: <43D4281D.10009@bigpond.net.au> (raw)
In-Reply-To: <43D40B96.3060705@bigpond.net.au>

[-- Attachment #1: Type: text/plain, Size: 4023 bytes --]

Peter Williams wrote:
> Peter Williams wrote:
> 
>> Paolo Ornati wrote:
>>
>>> On Fri, 20 Jan 2006 08:45:43 +1100
>>> Peter Williams <pwil3058@bigpond.net.au> wrote:
>>>
>>>
>>>> Modifications have been made to spa_ws to (hopefully) address the 
>>>> issues raised by Paolo Ornati recently and a new entitlement based 
>>>> interpretation of "nice" scheduler, spa_ebs, which is a cut down 
>>>> version of the Zaphod schedulers "eb" mode has been added as this 
>>>> mode of Zaphod performed will for Paolo's problem when he tried it 
>>>> at my request. Paolo, could you please give these a test drive on 
>>>> your problem?
>>>
>>>
>>>
>>>
>>> ---- spa_ws: the problem is still here
>>>
>>> (sched_fooler)
>>> ./a.out 3000 & ./a.out 4307 &
>>>
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  5573 paolo     34   0  2396  292  228 R 59.0  0.1   0:24.51 a.out
>>>  5572 paolo     34   0  2392  288  228 R 40.7  0.1   0:16.94 a.out
>>>  5580 paolo     35   0  4948 1468  372 R  0.3  0.3   0:00.04 dd
>>>
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  5573 paolo     34   0  2396  292  228 R 59.3  0.1   0:59.65 a.out
>>>  5572 paolo     33   0  2392  288  228 R 40.3  0.1   0:41.32 a.out
>>>  5440 paolo     28   0 86652  21m  15m S  0.3  4.4   0:03.34 konsole
>>>  5580 paolo     37   0  4948 1468  372 R  0.3  0.3   0:00.10 dd
>>>
>>>
>>> (real life - transcode)
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  5585 paolo     33   0  115m  18m 2432 S 90.0  3.7   0:38.04 transcode
>>>  5599 paolo     37   0 50996 4472 1872 R  9.1  0.9   0:04.03 tcdecode
>>>  5610 paolo     37   0  4948 1468  372 R  0.6  0.3   0:00.19 dd
>>>
>>>
>>> DD test takes ages in both cases.
>>>
>>> What exactly have you done to spa_ws?
>>
>>
>>
>> I added a "nice aware" version of the throughput bonuses from spa_svr 
>> and renamed them fairness bonus.  They don't appear to be working :-(
>>
>> 34 is the priority value that ordinary tasks should end up with i.e. 
>> if they don't look like interactive tasks or CPU hogs.  If they look 
>> like interactive tasks they should get a lower one via the interactive 
>> bonus mechanism and if they look like CPU hogs they should get a 
>> higher one via the same mechanism.  In addition to this tasks will get 
>> bonuses if they seem to be being treated unfairly i.e. spending too 
>> much time on run queues waiting for CPU access.
>>
>> Looking at your numbers the transcode task has the priority that I'd 
>> expect it to have but tcdecode and dd seem to have had their 
>> priorities adjusted in the wrong direction.   It's almost like they'd 
>> been (incorrectly, obviously) identified as CPU hogs :-(.  I'll look 
>> into this.
> 
> 
> I forgot that I'd also made changes to the "CPU hog" component of the 
> interactive response as the one I had was useless on heavily loaded 
> systems.  It appears that I made a mistake (I used interactive 
> sleepiness instead of ordinary sleepiness for detecting CPU hogs) during 
> these changes which means that tasks that do no interactive sleeping 
> (such as your dd) get classified as CPU hogs.  The transcode task 
> escapes this because, although its sleeps aren't really interactive, 
> they're classified as such.  More widespread us of TASK_NONINTERACTIVE 
> would fix this but would need to be done carefully as it would risk 
> breaking the normal scheduler.
> 
> However, in spite of the above, the fairness mechanism should have been 
> able to generate enough bonus points to get dd's priority back to less 
> than 34.  I'm still investigating why this didn't happen.

Problem solved.  It was a scaling issue during the calculation of 
expected delay.  The attached patch should fix both the CPU hog problem 
and the fairness problem.  Could you give it a try?

Thanks,
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: fix-spa_ws-scheduler --]
[-- Type: text/plain, Size: 4729 bytes --]

Index: MM-2.6.16/kernel/sched_spa_ws.c
===================================================================
--- MM-2.6.16.orig/kernel/sched_spa_ws.c	2006-01-21 16:42:45.000000000 +1100
+++ MM-2.6.16/kernel/sched_spa_ws.c	2006-01-23 11:42:32.000000000 +1100
@@ -44,7 +44,8 @@ static unsigned int initial_ia_bonus = D
 #define LSHARES_AVG_OFFSET 7
 #define LSHARES_AVG_ALPHA ((1 << LSHARES_AVG_OFFSET) - 2)
 #define LSHARES_AVG_INCR(a) ((a) << 1)
-#define LSHARES_AVG_ONE (1UL << LSHARES_AVG_OFFSET)
+#define LSHARES_AVG_REAL(s) ((s) << LSHARES_AVG_OFFSET)
+#define LSHARES_AVG_ONE LSAHRES_AVG_REAL(1UL)
 #define LSHARES_AVG_MUL(a, b) (((a) * (b)) >> LSHARES_AVG_OFFSET)
 
 static unsigned int max_fairness_bonus = DEF_MAX_FAIRNESS_BONUS;
@@ -121,32 +122,9 @@ static inline void zero_interactive_bonu
 	p->sdu.spa.interactive_bonus = 0;
 }
 
-static inline int current_fairness_bonus(const struct task_struct *p)
-{
-	return p->sdu.spa.auxilary_bonus >> FAIRNESS_BONUS_OFFSET;
-}
-
-static inline int current_fairness_bonus_rnd(const struct task_struct *p)
-{
-	return (p->sdu.spa.auxilary_bonus + (1UL << (FAIRNESS_BONUS_OFFSET - 1)))
-		>> FAIRNESS_BONUS_OFFSET;
-}
-
-static inline void decr_fairness_bonus(struct task_struct *p)
-{
-	p->sdu.spa.auxilary_bonus *= ((1UL << FAIRNESS_BONUS_OFFSET) - 2);
-	p->sdu.spa.auxilary_bonus >>= FAIRNESS_BONUS_OFFSET;
-}
-
-static inline void incr_fairness_bonus(struct task_struct *p)
-{
-	decr_fairness_bonus(p);
-	p->sdu.spa.auxilary_bonus += (max_fairness_bonus << 1);
-}
-
 static inline int bonuses(const struct task_struct *p)
 {
-	return current_ia_bonus_rnd(p) + current_fairness_bonus_rnd(p);
+	return current_ia_bonus_rnd(p) + p->sdu.spa.auxilary_bonus;
 }
 
 static int spa_ws_effective_prio(const struct task_struct *p)
@@ -211,43 +189,36 @@ static inline unsigned int map_ratio(uns
 
 static void spa_ws_reassess_fairness_bonus(struct task_struct *p)
 {
-	unsigned long long expected_delay;
+	unsigned long long expected_delay, adjusted_delay;
 	unsigned long long avg_lshares;
+	unsigned long pshares = LSHARES_AVG_REAL(p->sdu.spa.eb_shares);
 
-#if 0
 	p->sdu.spa.auxilary_bonus = 0;
 	if (max_fairness_bonus == 0)
 		return;
-#endif
 
 	avg_lshares = per_cpu(rq_avg_lshares, task_cpu(p));
-	if (avg_lshares <= p->sdu.spa.eb_shares)
+	if (avg_lshares <= pshares)
 		expected_delay = 0;
 	else {
 		expected_delay = LSHARES_AVG_MUL(p->sdu.spa.avg_cpu_per_cycle,
-				      (avg_lshares - p->sdu.spa.eb_shares));
-		(void)do_div(expected_delay, p->sdu.spa.eb_shares);
+				      (avg_lshares - pshares));
+		(void)do_div(expected_delay, pshares);
 	}
-#if 1
-	if (p->sdu.spa.avg_delay_per_cycle > expected_delay)
-		incr_fairness_bonus(p);
-	else
-		decr_fairness_bonus(p);
-#else
+
 	/*
 	 * No delay means no bonus, but
 	 * NB this test also avoids a possible divide by zero error if
 	 * cpu is also zero and negative bonuses
 	 */
-	lhs = p->sdu.spa.avg_delay_per_cycle;
-	if (lhs <= rhs)
+	if (p->sdu.spa.avg_delay_per_cycle <= expected_delay)
 		return;
 
-	lhs  -= rhs;
+	adjusted_delay = p->sdu.spa.avg_delay_per_cycle - expected_delay;
 	p->sdu.spa.auxilary_bonus =
-		map_ratio(lhs, lhs + p->sdu.spa.avg_cpu_per_cycle,
+		map_ratio(adjusted_delay,
+			  adjusted_delay + p->sdu.spa.avg_cpu_per_cycle,
 			  max_fairness_bonus);
-#endif
 }
 
 static inline int spa_ws_eligible(struct task_struct *p)
@@ -255,6 +226,15 @@ static inline int spa_ws_eligible(struct
 	return p->sdu.spa.avg_sleep_per_cycle < WS_BIG_SLEEP;
 }
 
+static inline int spa_sleepiness_exceeds_ppt(const struct task_struct *p,
+					    unsigned int ppt)
+{
+	return RATIO_EXCEEDS_PPT(p->sdu.spa.avg_sleep_per_cycle,
+				 p->sdu.spa.avg_sleep_per_cycle +
+				 p->sdu.spa.avg_cpu_per_cycle,
+				 ppt);
+}
+
 static void spa_ws_reassess_at_activation(struct task_struct *p)
 {
 	spa_ws_reassess_fairness_bonus(p);
@@ -264,7 +244,7 @@ static void spa_ws_reassess_at_activatio
 		else
 			partial_incr_interactive_bonus(p);
 	}
-	else if (!spa_ia_sleepiness_exceeds_ppt(p, iab_decr_threshold))
+	else if (!spa_sleepiness_exceeds_ppt(p, iab_decr_threshold))
 		decr_interactive_bonus(p);
 	else if (!spa_ia_sleepiness_exceeds_ppt(p, (iab_decr_threshold + iab_incr_threshold) / 2))
 		partial_decr_interactive_bonus(p);
@@ -284,7 +264,7 @@ static void spa_ws_reassess_at_end_of_ts
 	/* Don't punish tasks that have done a lot of sleeping for the
 	 * occasional run of short sleeps unless they become a cpu hog.
 	 */
-	if (!spa_ia_sleepiness_exceeds_ppt(p, iab_decr_threshold))
+	if (!spa_sleepiness_exceeds_ppt(p, iab_decr_threshold))
 		decr_interactive_bonus(p);
 	else if (!spa_ia_sleepiness_exceeds_ppt(p, (iab_decr_threshold + iab_incr_threshold) / 2))
 		partial_decr_interactive_bonus(p);

next prev parent reply	other threads:[~2006-01-23  0:49 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-19 21:45 [ANNOUNCE][RFC] PlugSched-6.2 for 2.6.16-rc1 and 2.6.16-rc1-mm1 Peter Williams
2006-01-21  6:48 ` Peter Williams
2006-01-21 10:46 ` Paolo Ornati
2006-01-21 23:06   ` Peter Williams
2006-01-22 22:47     ` Peter Williams
2006-01-23  0:49       ` Peter Williams [this message]
2006-01-23 20:21         ` Paolo Ornati
2006-01-24  0:00           ` Peter Williams
2006-01-26  1:09           ` Peter Williams
2006-01-26  8:11             ` Paolo Ornati
2006-01-26 22:34               ` Peter Williams
2006-01-28 23:44                 ` Peter Williams
2006-01-31 17:44                   ` Paolo Ornati
2006-01-23 20:09     ` Paolo Ornati
2006-01-23 20:25       ` Lee Revell
2006-01-23 20:52         ` Paolo Ornati
2006-01-23 20:59           ` Lee Revell
2006-01-23 21:10             ` Paolo Ornati
2006-01-23 21:11               ` Lee Revell
2006-01-23 23:32       ` Peter Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43D4281D.10009@bigpond.net.au \
    --to=pwil3058@bigpond.net.au \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=moilanen@austin.ibm.com \
    --cc=ornati@fastwebnet.it \
    --cc=wli@holomorphy.com \
    --cc=xiphux@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.