From: Peter Williams <pwil3058@bigpond.net.au>
To: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Andrew Morton <akpm@osdl.org>, Mike Galbraith <efault@gmx.de>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Ingo Molnar <mingo@elte.hu>, Con Kolivas <kernel@kolivas.org>,
	"Chen, Kenneth W" <kenneth.w.chen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched: smpnice work around for active_load_balance()
Date: Thu, 30 Mar 2006 12:14:57 +1100	[thread overview]
Message-ID: <442B3111.5030808@bigpond.net.au> (raw)
In-Reply-To: <20060329165052.C11376@unix-os.sc.intel.com>

Siddha, Suresh B wrote:
> On Thu, Mar 30, 2006 at 10:40:24AM +1100, Peter Williams wrote:
>> Siddha, Suresh B wrote:
>>> On Wed, Mar 29, 2006 at 02:42:45PM +1100, Peter Williams wrote:
>>>> I meant that it doesn't explicitly address your problem.  What it does 
>>>> is ASSUME that failure of load balancing to move tasks is because there 
>>>> was exactly one task on the source run queue and that this makes it a 
>>>> suitable candidate to have that single task moved elsewhere in the blind 
>>>> hope that it may fix an HT/MC imbalance that may or may not exist.  In 
>>>> my mind this is very close to random.  
>>> That so-called assumption happens only when load balancing has
>>> failed more than the domain-specific cache_nice_tries times. The only
>>> reasons it can fail that many times are that all tasks are pinned or that
>>> only a single task is running on that particular CPU. The load balancing
>>> code takes care of both these scenarios..
>>>
>>> sched groups cpu_power controls the mechanism of implementing HT/MC
>>> optimizations in addition to active balance code... There is no randomness
>>> in this.
>> The above explanation just increases my belief in the randomness of this 
>> solution.  This code is mostly done without locks and is therefore very 
>> racy and any assumptions made based on the number of times load 
>> balancing has failed etc. are highly speculative.
> 
> Isn't it the same case with regular cpu load calculations during load
> balance?

Yes.  Which is why move_tasks() is designed to cope.

But this doesn't affect the argument w.r.t. your code.

> 
>> And even if there is only one task on the CPU there's no guarantee that
>> that CPU is in a package that meets the other requirements to make the 
>> move desirable.  So there's a good probability that you'll be moving 
>> tasks unnecessarily.
> 
> sched groups cpu_power and domain topology information cleanly
> encapsulates the imbalance identification and source/destination groups
> to fix the imbalance.

But you don't look at the rest of the queues in the package to see if 
the move is REALLY required.

> 
>> It's a poor solution and it's being inflicted on architectures that 
>> don't need it.  Even if cache_nice_tries is used to suppress this 
>> behaviour on architectures that don't need it they have to carry the 
>> code in their kernel.
> 
> We can clearly throw CONFIG_SCHED_MC/SMT around that code.. Nick/Ingo
> do you see any issue?

That just makes it a poor solution and ugly. :-)

> 
>>>
>>>> Also back to front and inefficient.
>>> HT/MC imbalance is detected in a normal way.. A lightly loaded group
>>> finds an imbalance and tries to pull some load from a busy group (which
>>> is inline with normal load balance)... pull fails because the only task
>>> on that cpu is busy running and needs to go off the cpu (which is triggered
>>> by active load balance)... Scheduler load balancing is generally done by a 
>>> pull mechanism and here (HT/MC) it is still a pull mechanism (triggering a 
>>> final push only because of the single running task) 
>>>
>>> If you have any better generic and simple method, please let us know.
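[The pull-then-push sequence described above can be modelled as below. A rough illustration only: the struct and function names are invented for clarity, not taken from the kernel.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the flow described: an idle group attempts a
 * normal pull; when the busiest CPU's only task is currently on the CPU
 * and cannot be pulled, the puller flags that CPU for active balance,
 * and that CPU's migration thread later pushes the task away. */
struct cpu_sketch {
	unsigned int nr_running;	/* tasks on this CPU's runqueue */
	bool task_is_running;		/* its single task is on the CPU now */
	bool active_balance;		/* flag set by the failed puller */
};

/* Attempted pull by an idle CPU.  Returns the number of tasks moved. */
static unsigned int try_pull(struct cpu_sketch *busiest)
{
	if (busiest->nr_running == 1 && busiest->task_is_running) {
		/* Can't pull a currently-running task: request a push. */
		busiest->active_balance = true;
		return 0;
	}
	if (busiest->nr_running > 0) {
		busiest->nr_running--;
		return 1;
	}
	return 0;
}

/* Later, the busiest CPU's migration thread performs the final push. */
static bool migration_thread_push(struct cpu_sketch *cpu)
{
	if (!cpu->active_balance)
		return false;
	cpu->active_balance = false;
	cpu->nr_running--;	/* the single task is pushed elsewhere */
	return true;
}
```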
>> I gave an example in a previous e-mail.  Basically, at the end of 
>> scheduler_tick() if rebalance_tick() doesn't move any tasks (it would be 
>> foolish to contemplate moving tasks off the queue just after you've moved 
>> some there) and the run queue has exactly one running task and it's time 
>> for a HT/MC rebalance check on the package that this run queue belongs 
>> to, then check that package to see if it meets the rest of the criteria 
>> for needing to lose some tasks.  If it does look for a package that is a 
>> suitable recipient for the moved task and if you find one then mark this 
>> run queue as needing active load balancing and arrange for its migration 
>> thread to be started.
>>
>> Simple, direct and amenable to being only built on architectures that 
>> need the functionality.
> 
> First of all we will be doing unnecessary checks to see if there is
> an imbalance.. Current code triggers the checks and movement only when
> it is necessary.. And second, finding the correct destination cpu in the 
> presence of SMT and MC is really complicated.. Look at different examples
> in the OLS paper.. Domain topology provides all this info with no added
> complexity...
> 
>> Another (more complex) solution that would also allow improvements to 
>> other HT related code (e.g. the sleeping dependent code) would be to 
>> modify the load balancing code so that all CPUs in a package share a run 
>> queue and load balancing is then done between packages.  As long as the 
>> number of CPUs in a package is small this shouldn't have scalability 
>> issues.  The big disadvantage of this approach is its complexity which 
>> is probably too great to contemplate doing it in 2.6.X kernels.
> 
> Presence of SMT and MC, implementation of power-savings scheduler
> policy will present more challenges...

And I would recommend a similar approach to what I've suggested above. 
They could probably be combined into a single neat well encapsulated 
solution.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

