From: Ingo Molnar <mingo@kernel.org>
To: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
James Hartsock <hartsjc@redhat.com>,
Rik van Riel <riel@redhat.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Kirill Tkhai <ktkhai@parallels.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] sched: unused cpu in affine workload
Date: Mon, 4 Apr 2016 11:19:51 +0200
Message-ID: <20160404091951.GA10360@gmail.com>
In-Reply-To: <20160404085944.GA3030@gmail.com>
* Ingo Molnar <mingo@kernel.org> wrote:
> - if you want to come up with a 'complete' solution then please don't put it into
> hot paths such as wakeup or context switching, or any of the hardirq methods,
> but try to integrate it with the NUMA scheduling slow path.
>
> The NUMA balancing slow path: that is softirq driven and reasonably low freq to
> not cause many performance problems.
>
> The two problems (NUMA affinity and user affinity) are also loosely related on a
> conceptual level: the NUMA affinity optimization problem can be considered as a
> workload determined, arbitrary 'NUMA mask' being optimized from first
> principles.
>
> There's one ABI detail: this is true only as long as SMP affinity masks follow
> node boundaries - the current NUMA balancing code is very much node granular, so
> the two can only be merged if the ->cpus_allowed mask follows node boundaries as
> well.
>
> A third approach would be to extend the NUMA balancing code to be CPU granular
> (without changing any task placement behavior of the current NUMA balancing code
> of course), with node granular being a special case. This would fit the cgroups
> (and virtualization) usecases, but that would be a major change.
So my thinking here is: if the NUMA balancing code (which is node granular at the
moment and uses node masks, etc.) is extended to be CPU granular (which is a big
task in itself), then the two problems can be 'unified':
- the NUMA balancing code inputs arbitrary CPU (node) affinity masks from the
MM code into the scheduler.
- the scheduler syscall ABI (and other configuration sources) inputs arbitrary
CPU affinity masks into the scheduler.
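The node-granular versus CPU-granular distinction can be made concrete with a small check. The sketch below is an illustration only, not kernel code: it assumes CPUs are bits in a 64-bit word, and both mask_follows_node_boundaries() and the node_masks[] table are hypothetical names. A mask "follows node boundaries" when every node is either entirely inside it or entirely outside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only: CPUs are bits in a 64-bit word and node_masks[] is a
 * hypothetical table listing the CPUs that belong to each node. Bits of
 * 'allowed' that belong to no node are ignored by this check.
 */
bool mask_follows_node_boundaries(uint64_t allowed,
                                  const uint64_t *node_masks,
                                  size_t nr_nodes)
{
    assert(node_masks != NULL || nr_nodes == 0);

    for (size_t i = 0; i < nr_nodes; i++) {
        uint64_t overlap = allowed & node_masks[i];

        /* A node must be either fully inside or fully outside the mask. */
        if (overlap != 0 && overlap != node_masks[i])
            return false;
    }
    return true;
}
```

With two nodes of four CPUs each (0x0f and 0xf0), a mask of 0x0f or 0xff passes, while 0x03 or 0x1f splits a node and would need CPU-granular handling.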
It's a similar problem, with two (minor looking) complications:
- the NUMA code right now is 'statistical', while ->cpus_allowed are hard
constraints that must never be violated. So there always has to be a final
layer to implement the hard constraint - which does not exist in the NUMA
balancing case. This should be relatively easy I think as we already do it
with the regular balancer.
- the balancing slowpath would have to be activated on non-NUMA systems as well,
so that it can handle ->cpus_allowed balancing.
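A minimal sketch of what such a final layer might look like, under simplifying assumptions: 'preferred' stands in for a statistical placement hint of the kind NUMA balancing produces, 'allowed' for the hard ->cpus_allowed constraint, and both are plain 64-bit CPU bitmasks. The function name apply_hard_constraint() is made up for illustration and does not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: combine an advisory placement hint with a hard
 * affinity constraint. The result is always a non-empty subset of 'allowed'.
 */
uint64_t apply_hard_constraint(uint64_t preferred, uint64_t allowed)
{
    uint64_t candidates;

    /* An empty allowed mask would leave the task nowhere to run. */
    assert(allowed != 0);

    candidates = preferred & allowed;

    /* The hint is advisory: if it conflicts with the constraint, drop it. */
    return candidates ? candidates : allowed;
}
```

The point of the sketch is the asymmetry: the statistical hint can be ignored when it conflicts, the hard constraint never can.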
... once all that is solved, I can see several advantages from unifying the NUMA
balancing and SMP affinity balancing code:
- the NUMA balancer would improve: cpus_allowed isolation is used more
frequently, so fixes from those workloads would benefit the NUMA balancing case
as well.
- testing the NUMA balancer would become easier: we'd simply set cpus_allowed and
would watch how it balances. No need to coax workloads into actual MM NUMA
usage patterns to set up interesting scenarios.
- our existing half-hearted ways to deal with cpus_allowed balancing could be
outsourced to the NUMA slow path, which would simplify the SMP balancing fast
path.
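For reference, this is roughly how a test program sets its affinity mask from user space today, using the existing sched_setaffinity(2) and sched_getaffinity(2) interfaces from glibc. The helper names (first_allowed_cpu(), pin_self_to_cpu(), pinned_only_to()) are hypothetical, and the sketch assumes Linux with fewer than CPU_SETSIZE CPUs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU; returns 0 on success. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    /* Passing an out-of-range CPU number is a caller bug. */
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the calling thread's affinity mask contains exactly 'cpu'. */
int pinned_only_to(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

A test harness would call pin_self_to_cpu() from each worker thread with different masks and then observe where the scheduler actually runs them.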
But it's a major piece of work, and I might be missing implementation details.
It would be the biggest new scheduler feature since NUMA balancing for sure.
Thanks,
Ingo