All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Williams <pwil3058@bigpond.net.au>
To: Peter Williams <pwil3058@bigpond.net.au>
Cc: Andrew Morton <akpm@osdl.org>,
	suresh.b.siddha@intel.com, kernel@kolivas.org, npiggin@suse.de,
	mingo@elte.hu, rostedt@goodmis.org, linux-kernel@vger.kernel.org,
	torvalds@osdl.org
Subject: Re: [rfc][patch] sched: remove smpnice
Date: Mon, 13 Feb 2006 10:10:23 +1100	[thread overview]
Message-ID: <43EFC05F.6010204@bigpond.net.au> (raw)
In-Reply-To: <43EE8BAD.9040601@bigpond.net.au>

Peter Williams wrote:
> Andrew Morton wrote:
> 
>> Peter Williams <pwil3058@bigpond.net.au> wrote:
>>
>>> I don't think either of these issues warrant abandoning smpnice.  The 
>>> first is highly unlikely to occur on real systems and the second is 
>>> just an example of the patch doing its job (maybe too officiously).  
>>> I don't think users would notice either on real systems.
>>>
>>> Even if you pull it from 2.6.16 rather than upgrading it with my 
>>> patch can you please leave both in -mm?
>>
>>
>>
>> Yes, I have done that.  I currently have:
>>
>> sched-restore-smpnice.patch
>> sched-modified-nice-support-for-smp-load-balancing.patch
>> sched-cleanup_task_activated.patch
>> sched-alter_uninterruptible_sleep_interactivity.patch
>> sched-make_task_noninteractive_use_sleep_type.patch
>> sched-dont_decrease_idle_sleep_avg.patch
>> sched-include_noninteractive_sleep_in_idle_detect.patch
>> sched-new-sched-domain-for-representing-multi-core.patch
>> sched-fix-group-power-for-allnodes_domains.patch
> 
> 
> OK.  Having slept on these problems I am now of the opinion that the 
> problems are caused by the use of NICE_TO_BIAS_PRIO(0) to set *imbalance 
> inside the (*imbalance < SCHED_LOAD_SCALE) if statement in 
> find_busiest_group().  What is happening here is that even though the 
> imbalance is less than one (average) task sometimes the decision is made 
> to move a task anyway but with the current version this decision can be 
> subverted in two ways: 1) if the task to be moved has a nice value less 
> than zero the value of *imbalance that is set will be too small for 
> move_tasks() to move it; and 2) if there are a number of tasks with nice 
> values greater than zero on the "busiest" more than one of them may be 
> moved as the value of *imbalance that is set may be big enough to 
> include more than one of these tasks.
> 
> The fix for this problem is to replace NICE_TO_BIAS_PRIO(0) with the 
> "average bias prio per runnable task" on "busiest".  This will 
> (generally) result in a larger value for *imbalance in case 1. above and 
> a smaller one in case 2. and alleviate both problems.  A patch to apply 
> this fix is attached.
> 
> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
> 
> Could you please add this patch to -mm so that it can be tested?
> 
> Thanks
> Peter
> 
> 
> ------------------------------------------------------------------------
> 
> Index: MM-2.6.X/kernel/sched.c
> ===================================================================
> --- MM-2.6.X.orig/kernel/sched.c	2006-02-12 11:24:48.000000000 +1100
> +++ MM-2.6.X/kernel/sched.c	2006-02-12 11:35:40.000000000 +1100
> @@ -735,6 +735,19 @@ static inline unsigned long biased_load(
>  {
>  	return (wload * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
>  }
> +
> +/* get the average biased load per runnable task for a run queue */
> +static inline unsigned long avg_biased_load(runqueue_t *rq)
> +{
> +	/*
> +	 * When I'm convinced that this won't be called with a zero nr_running
> +	 * and that it can't change during the call this can be simplified.
> +	 * For the time being and for proof of concept let's paly it safe.
> +	 */
> +	unsigned long n = rq->nr_running;
> +
> +	return n ? rq->prio_bias / n : 0;
> +}
>  #else
>  static inline void set_bias_prio(task_t *p)
>  {
> @@ -2116,7 +2129,7 @@ find_busiest_group(struct sched_domain *
>  		unsigned long tmp;
>  
>  		if (max_load - this_load >= SCHED_LOAD_SCALE*2) {
> -			*imbalance = NICE_TO_BIAS_PRIO(0);
> +			*imbalance = avg_biased_load(busiest);
>  			return busiest;
>  		}
>  
> @@ -2149,7 +2162,7 @@ find_busiest_group(struct sched_domain *
>  		if (pwr_move <= pwr_now)
>  			goto out_balanced;
>  
> -		*imbalance = NICE_TO_BIAS_PRIO(0);
> +		*imbalance = avg_biased_load(busiest);
>  		return busiest;
>  	}
>  

Can we pull this one, please?  I've mistakenly assumed that busiest was 
a run queue when it's actually a sched group. :-(

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

  reply	other threads:[~2006-02-12 23:10 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-07 14:28 [rfc][patch] sched: remove smpnice Nick Piggin
2006-02-07 14:57 ` Con Kolivas
2006-02-07 15:05   ` Nick Piggin
2006-02-07 22:15   ` Andrew Morton
2006-02-07 23:11     ` Con Kolivas
2006-02-07 23:36       ` Andrew Morton
2006-02-08  3:28         ` Nick Piggin
2006-02-08 14:56         ` Ingo Molnar
2006-02-10  7:01         ` Siddha, Suresh B
2006-02-10  7:17           ` Andrew Morton
2006-02-10  7:23             ` Con Kolivas
2006-02-10  9:06             ` Ingo Molnar
2006-02-11  1:27             ` Peter Williams
2006-02-11  2:00               ` Andrew Morton
2006-02-12  1:13                 ` Peter Williams
2006-02-12 23:10                   ` Peter Williams [this message]
2006-02-13  1:06                     ` Peter Williams
2006-02-14  0:37                       ` Peter Williams
2006-02-14  8:53                         ` Siddha, Suresh B
2006-02-11  3:36               ` Peter Williams
2006-02-11  4:04               ` Peter Williams
2006-02-14  9:07               ` Siddha, Suresh B
2006-02-14 22:40                 ` Peter Williams
2006-02-14 23:44                   ` Paul Jackson
2006-02-15  0:09                     ` Peter Williams
2006-02-15  1:00                       ` Paul Jackson
2006-02-15  7:07                   ` Siddha, Suresh B
2006-02-15 22:36                     ` Peter Williams
2006-02-15 23:29                       ` Peter Williams
2006-02-13 14:12           ` Con Kolivas
2006-02-07 23:20     ` Peter Williams
2006-02-07 23:29       ` Con Kolivas
2006-02-07 23:36       ` Martin Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43EFC05F.6010204@bigpond.net.au \
    --to=pwil3058@bigpond.net.au \
    --cc=akpm@osdl.org \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=rostedt@goodmis.org \
    --cc=suresh.b.siddha@intel.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.