From: Ingo Molnar <mingo@kernel.org>
To: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
James Hartsock <hartsjc@redhat.com>,
Rik van Riel <riel@redhat.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Kirill Tkhai <ktkhai@parallels.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] sched: unused cpu in affine workload
Date: Mon, 4 Apr 2016 11:19:51 +0200
Message-ID: <20160404091951.GA10360@gmail.com>
In-Reply-To: <20160404085944.GA3030@gmail.com>
* Ingo Molnar <mingo@kernel.org> wrote:
> - if you want to come up with a 'complete' solution then please don't put it into
> hot paths such as wakeup or context switching, or any of the hardirq methods,
> but try to integrate it with the NUMA scheduling slow path.
>
> The NUMA balancing slow path is softirq driven and runs at a reasonably low
> frequency, so it does not cause many performance problems.
>
> The two problems (NUMA affinity and user affinity) are also loosely related on a
> conceptual level: the NUMA affinity optimization problem can be considered as a
> workload determined, arbitrary 'NUMA mask' being optimized from first
> principles.
>
> There's one ABI detail: this is true only as long as SMP affinity masks follow
> node boundaries - the current NUMA balancing code is very much node granular, so
> the two can only be merged if the ->cpus_allowed mask follows node boundaries as
> well.
>
> A third approach would be to extend the NUMA balancing code to be CPU granular
> (without changing any task placement behavior of the current NUMA balancing code
> of course), with node granular being a special case. This would fit the cgroups
> (and virtualization) use cases, but that would be a major change.
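The ABI detail in the quoted text (the two balancers can only be merged while ->cpus_allowed follows node boundaries) boils down to a simple predicate. A minimal sketch, assuming a toy 8-CPU, 2-node topology and plain 64-bit bitmasks in place of the kernel's cpumask_t; cpu_to_node[], node_cpus() and mask_follows_node_boundaries() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy topology: 8 CPUs, 2 nodes of 4 CPUs each. */
#define NR_CPUS  8
#define NR_NODES 2

typedef uint64_t cpumask_bits;	/* bit i set => CPU i is in the mask */

static const int cpu_to_node[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Mask of all CPUs that belong to @node. */
static cpumask_bits node_cpus(int node)
{
	cpumask_bits m = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_to_node[cpu] == node)
			m |= (cpumask_bits)1 << cpu;
	return m;
}

/*
 * True if @mask is a union of whole nodes: every node is either
 * entirely inside the mask or entirely outside it.
 */
static bool mask_follows_node_boundaries(cpumask_bits mask)
{
	for (int node = 0; node < NR_NODES; node++) {
		cpumask_bits n = node_cpus(node);
		cpumask_bits overlap = mask & n;

		if (overlap != 0 && overlap != n)
			return false;	/* node is only partially covered */
	}
	return true;
}
```

With this toy topology, CPUs 0-3 (0x0F) form a whole node and pass, while CPUs 2-5 (0x3C) straddle both nodes and fail. Masks of the second kind are what a node-granular balancer cannot express.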
So my thinking here is: if the NUMA balancing code (which is node granular at the
moment and uses node masks, etc.) is extended to be CPU granular (which is a big
task in itself), then the two problems can be 'unified':
- the NUMA balancing code feeds arbitrary CPU (node) affinity masks from the
MM code into the scheduler.
- the scheduler syscall ABI (and other configuration sources) feeds arbitrary
CPU affinity masks into the scheduler.
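If the balancer worked on CPU masks, a node-granular NUMA preference would just be a CPU mask built from whole nodes, which is the "special case" mentioned above. A minimal sketch of that expansion, assuming the same hypothetical 8-CPU, 2-node toy topology and plain bitmasks rather than the kernel's nodemask_t/cpumask_t; nodemask_to_cpumask() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy topology: 8 CPUs, 2 nodes of 4 CPUs each. */
#define NR_CPUS 8

typedef uint64_t cpumask_bits;	/* bit i set => CPU i */
typedef uint64_t nodemask_bits;	/* bit n set => node n */

static const int cpu_to_node[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Expand a node-granular mask into the equivalent CPU-granular mask,
 * so that both inputs reach the balancer in the same form:
 *  - NUMA placement hints, produced as node masks and expanded here
 *  - user/cgroup affinity, which already arrives as a CPU mask
 */
static cpumask_bits nodemask_to_cpumask(nodemask_bits nodes)
{
	cpumask_bits cpus = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (nodes & ((nodemask_bits)1 << cpu_to_node[cpu]))
			cpus |= (cpumask_bits)1 << cpu;
	return cpus;
}
```

Going the other way is not always possible: an arbitrary CPU mask such as CPUs 2-5 has no node-mask equivalent, which is why the unification requires the balancer itself to become CPU granular.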
It's a similar problem, with two (minor-looking) complications:
- the NUMA code right now is 'statistical', while ->cpus_allowed masks are hard
constraints that must never be violated. So there always has to be a final
layer to implement the hard constraint - which does not exist in the NUMA
balancing case. This should be relatively easy I think, as we already do it
with the regular balancer.
- the balancing slow path would have to be activated on non-NUMA systems as well,
so that it can handle ->cpus_allowed balancing.
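The "final layer" described above can be sketched as follows: the NUMA-derived mask is treated as a soft preference, and ->cpus_allowed is applied last as a hard filter. This is a hypothetical userspace model, not scheduler code; the plain bitmasks, the per-CPU load[] array and the function names are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical toy system size */

typedef uint64_t cpumask_bits;	/* bit i set => CPU i */

/* Least-loaded CPU in @mask, or -1 if @mask is empty. */
static int least_loaded(cpumask_bits mask, const unsigned int load[NR_CPUS])
{
	int best = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & ((cpumask_bits)1 << cpu)))
			continue;
		if (best < 0 || load[cpu] < load[best])
			best = cpu;
	}
	return best;
}

/*
 * @preferred: soft hint, e.g. from NUMA statistics (may be overridden)
 * @allowed:   hard constraint, e.g. ->cpus_allowed (never violated)
 * Picks the least-loaded CPU in (preferred & allowed); if the hint and
 * the constraint do not overlap, the hint is dropped and the least-loaded
 * CPU in @allowed is used. Returns -1 only if @allowed is empty.
 */
static int select_target_cpu(cpumask_bits preferred, cpumask_bits allowed,
			     const unsigned int load[NR_CPUS])
{
	cpumask_bits candidates = preferred & allowed;

	if (candidates == 0)
		candidates = allowed;	/* the hint loses, the hard mask wins */
	return least_loaded(candidates, load);
}
```

Because the fallback only ever widens to @allowed and never beyond it, a statistically wrong NUMA hint can cost performance but cannot break the affinity guarantee.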
Once all that is solved, I can see several advantages in unifying the NUMA
balancing and SMP affinity balancing code:
- the NUMA balancer would improve: cpus_allowed isolation is used more
frequently, so fixes from those workloads would benefit the NUMA balancing case
as well.
- testing the NUMA balancer would become easier: we'd simply set cpus_allowed and
watch how it balances. No need to coax workloads into actual MM NUMA
usage patterns to set up interesting scenarios.
- our existing half-hearted ways to deal with cpus_allowed balancing could be
outsourced to the NUMA slow path, which would simplify the SMP balancing fast
path.
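On the testing point: such a test could drive the balancer from userspace through the existing sched_setaffinity(2) and sched_getcpu(3) interfaces. A minimal sketch of the building blocks, assuming glibc on Linux; the helper names are hypothetical, and a real test would pin several threads to overlapping masks and log their placement over time rather than pin a single thread:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Lowest-numbered CPU the calling thread may currently run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Restrict the calling thread's cpus_allowed to the single CPU @cpu. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Print which CPU the scheduler has us on, @samples times. */
static void report_placement(int samples)
{
	for (int i = 0; i < samples; i++)
		printf("sample %d: running on CPU %d\n", i, sched_getcpu());
}
```

After pin_to_cpu() succeeds, sched_getcpu() should report the pinned CPU, since the kernel must move the thread onto its new allowed set; a balancer test would instead set multi-CPU masks and compare the sampled placement against the expected spread.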
But it's a major piece of work, and I might be missing implementation details.
It would be the biggest new scheduler feature since NUMA balancing for sure.
Thanks,
Ingo