public inbox for linux-kernel@vger.kernel.org
From: Mel Gorman <mgorman@techsingularity.net>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Hillf Danton <hdanton@sina.com>, Rik van Riel <riel@surriel.com>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Parth Shah <parth@linux.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v3
Date: Tue, 7 Jan 2020 10:16:46 +0000	[thread overview]
Message-ID: <20200107101646.GG3466@techsingularity.net> (raw)
In-Reply-To: <CAKfTPtAMeta=jtTkTewdFN1UyqT+iyRhm9pfNDjkydfJQjaorQ@mail.gmail.com>

On Tue, Jan 07, 2020 at 10:43:08AM +0100, Vincent Guittot wrote:
> > > > > It's not directly related to the number of CPUs in the node. Are you
> > > > > thinking of busiest->group_weight?
> > > >
> > > > I am, because as it is right now that if condition
> > > > looks like it might never be true for imbalance_pct 115.
> > > >
> > > > Presumably you put that check there for a reason, and
> > > > would like it to trigger when the amount by which a node
> > > > is busy is less than 2 * (imbalance_pct - 100).
> > >
> > >
> > > If three per cent makes any sense as a low-utilisation threshold,
> > > then the busy load has to meet
> > >
> > >       busiest->sum_nr_running < max(3, cpus in the node / 32);
> > >
> >
> > Why 3% and why would the low utilisation cut-off depend on the number of
> 
> But in the same way, why only 6 tasks, which is the value with the
> default imbalance_pct?

I laid this out in another mail that was sent around the same time, so I
won't repeat myself other than to say it's predictable across machines.

> I expect a machine with 128 CPUs to have more bandwidth than a machine
> with only 32 CPUs and as a result to allow more imbalance
> 

I would expect so too, with the caveat that a node can have more memory
channels, so positioning does matter, but we can't take everything into
account without creating a convoluted mess. Worse, we have no decent
method for estimating bandwidth as it depends on the reference pattern,
and scheduler domains do not currently take memory channels into account.
Maybe they should, but that's a whole different discussion that we do not
want to get into right now.

> Maybe the number of running tasks (or idle cpus) is not the right
> metric to choose if we can allow a small degree of imbalance, because
> this doesn't take into account whether the tasks are long running or
> short running ones
> 

I think running tasks is at least the least bad metric. Idle CPUs get
caught up in corner cases with CPU bindings, and util_avg can be skewed
by outliers. Running tasks is a sensible starting point until there is a
concrete use case that shows it is unworkable. Let's see what you think
of the other untested patch I posted that takes the group weight and
child domain weight into account.

-- 
Mel Gorman
SUSE Labs


Thread overview: 10+ messages
2020-01-06 14:42 [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v3 Mel Gorman
2020-01-06 15:47 ` Rik van Riel
2020-01-06 16:33   ` Mel Gorman
2020-01-06 16:44     ` Rik van Riel
2020-01-06 17:19       ` Mel Gorman
     [not found]     ` <20200107015111.4836-1-hdanton@sina.com>
2020-01-07  8:44       ` Vincent Guittot
2020-01-07  9:12       ` Mel Gorman
2020-01-07  9:43         ` Vincent Guittot
2020-01-07 10:16           ` Mel Gorman [this message]
2020-01-08 15:49             ` Valentin Schneider
