From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>,
Matt Fleming <matt@codeblueprint.co.uk>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] sched/fair: Consider SD_NUMA when selecting the most idle group to schedule on
Date: Tue, 13 Feb 2018 11:45:41 +0100 [thread overview]
Message-ID: <20180213104541.GG25201@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20180212171131.26139-2-mgorman@techsingularity.net>
On Mon, Feb 12, 2018 at 05:11:30PM +0000, Mel Gorman wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 50442697b455..0192448e43a2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5917,6 +5917,18 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
> if (!idlest)
> return NULL;
>
> + /*
> + * When comparing groups across NUMA domains, it's possible for the
> + * local domain to be very lightly loaded relative to the remote
> + * domains but "imbalance" skews the comparison making remote CPUs
> + * look much more favourable. When considering cross-domain, add
> + * imbalance to the runnable load on the remote node and consider
> + * staying local.
> + */
> + if ((sd->flags & SD_NUMA) &&
> + min_runnable_load + imbalance >= this_runnable_load)
> + return NULL;
> +
> if (min_runnable_load > (this_runnable_load + imbalance))
> return NULL;
So this is basically a spread vs group decision, which we typically do
using SD_PREFER_SIBLING. Now that flag is a bit awkward in that it's set
on the child domain.
Now, we set it for SD_SHARE_PKG_RESOURCES (aka LLC), which means that for
our typical modern NUMA system we indicate we want to spread across the
lowest NUMA level. And regular load balancing will do so.
Now you modify the idlest code for initial placement to go against the
stable behaviour, which is unfortunate.
However, if we have NUMA balancing enabled, that will counteract
the normal spreading across nodes, so in that regard it makes sense, but
the above code is not conditional on NUMA balancing.
I'm torn and confused...
Thread overview: 11+ messages
2018-02-12 17:11 [PATCH 0/2] Stop wake_affine fighting with automatic NUMA balancing Mel Gorman
2018-02-12 17:11 ` [PATCH 1/2] sched/fair: Consider SD_NUMA when selecting the most idle group to schedule on Mel Gorman
2018-02-13 10:45 ` Peter Zijlstra [this message]
2018-02-13 11:35 ` Mel Gorman
2018-02-13 13:04 ` Peter Zijlstra
2018-02-13 13:29 ` Mel Gorman
2018-02-12 17:11 ` [PATCH 2/2] sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine Mel Gorman
2018-02-12 17:34 ` Peter Zijlstra
2018-02-12 17:52 ` Mel Gorman
2018-02-12 17:37 ` Peter Zijlstra
2018-02-12 18:11 ` Mel Gorman