From: Rik van Riel <riel@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, jhladky@redhat.com,
	mingo@kernel.org, mgorman@suse.de
Subject: Re: [PATCH 4/4] sched,fair: remove effective_load
Date: Tue, 27 Jun 2017 10:55:58 -0400
Message-ID: <1498575358.20270.114.camel@redhat.com>
In-Reply-To: <20170627053906.GA7287@worktop>
On Tue, 2017-06-27 at 07:39 +0200, Peter Zijlstra wrote:
> On Mon, Jun 26, 2017 at 03:34:49PM -0400, Rik van Riel wrote:
> > On Mon, 2017-06-26 at 18:12 +0200, Peter Zijlstra wrote:
> > > On Mon, Jun 26, 2017 at 11:20:54AM -0400, Rik van Riel wrote:
> > >
> > > > Oh, indeed. I guess in wake_affine() we should test
> > > > whether the CPUs are in the same NUMA node, rather than
> > > > doing cpus_share_cache() ?
> > >
> > > Well, since select_idle_sibling() is on LLC; the early test on
> > > cpus_share_cache(prev,this) seems to actually make sense.
> > >
> > > But then cutting out all the other bits seems wrong. Not in the least
> > > because !NUMA_BALANCING should also still keep working.
> >
> > Even when !NUMA_BALANCING, I suspect it makes little sense
> > to compare the loads just on the cores in question, since
> > select_idle_sibling() will likely move the task somewhere
> > else.
> >
> > I suspect we want to compare the load on the whole LLC
> > for that reason, even with NUMA_BALANCING disabled.
>
> But we don't have that data around :/ One thing we could do is try and
> keep a copy of the last s*_lb_stats around in the sched_domain_shared
> stuff or something and try and use that.
>
> That way we can strictly keep things at the LLC level and not confuse
> things with NUMA.
>
> Similarly, we could use that same data to then avoid re-computing things
> for the NUMA domain as well and do away with numa_stats.

That does seem like a useful optimization, though
I guess we would have to invalidate the cached data
every time we actually move a task?

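To make the invalidation question concrete, here is a minimal userspace sketch of cached per-LLC totals that are dropped whenever a task moves. Everything in it is an assumption for illustration: struct llc_stats_cache and the llc_cache_*() helpers are hypothetical names, not existing kernel code, and locking and memory ordering are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cached totals for one LLC; names are illustrative only. */
struct llc_stats_cache {
	unsigned long load;      /* summed load of all CPUs in the LLC */
	unsigned long capacity;  /* summed capacity of those CPUs */
	bool valid;              /* false once a task has been moved */
};

/* Record freshly computed totals, e.g. after a load balance pass. */
static void llc_cache_store(struct llc_stats_cache *c,
			    unsigned long load, unsigned long capacity)
{
	c->load = load;
	c->capacity = capacity;
	c->valid = true;
}

/* Would be called whenever a task is moved into or out of the LLC. */
static void llc_cache_invalidate(struct llc_stats_cache *c)
{
	c->valid = false;
}

/* Returns true and fills the outputs only if the cached totals are usable. */
static bool llc_cache_read(const struct llc_stats_cache *c,
			   unsigned long *load, unsigned long *capacity)
{
	assert(c && load && capacity);
	if (!c->valid)
		return false;
	*load = c->load;
	*capacity = c->capacity;
	return true;
}
```

A caller that gets false back from llc_cache_read() would have to fall back to recomputing the totals, so the benefit depends on how often tasks actually move between reads.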
The current code simply walks all the CPUs in the
cpumask_t, and adds up capacity and load. Doing
that appears to be better than poor task placement
(Jirka's numbers speak for themselves), but optimizing
this code path does seem like a worthwhile goal.

I'll look into it.

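For readers following along, the walk described above can be sketched in userspace C roughly as follows. This is a simplified illustration, not the kernel code: a 64-bit integer stands in for cpumask_t, and struct cpu_sample, struct group_stats and sum_group_stats() are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CPUS 64	/* hypothetical limit: one bit per CPU in a uint64_t */

/* Hypothetical per-CPU snapshot; the kernel would read these from the runqueue. */
struct cpu_sample {
	unsigned long load;
	unsigned long capacity;
};

/* Totals for the group of CPUs selected by the mask. */
struct group_stats {
	unsigned long load;
	unsigned long capacity;
	unsigned int nr_cpus;
};

/* Walk every CPU whose bit is set in mask and add up its load and capacity. */
static struct group_stats sum_group_stats(uint64_t mask,
					  const struct cpu_sample *cpus,
					  size_t ncpus)
{
	struct group_stats s = { 0, 0, 0 };

	assert(ncpus <= SKETCH_MAX_CPUS);
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue;	/* CPU is not part of this group */
		s.load += cpus[cpu].load;
		s.capacity += cpus[cpu].capacity;
		s.nr_cpus++;
	}
	return s;
}
```

The cost is linear in the number of CPUs in the mask on every call, which is the part that caching could avoid.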
> > > > Or, alternatively, have an update_numa_stats() variant
> > > > for numa_wake_affine() that works on the LLC level?
> > >
> > > I think we want to retain the existing behaviour for everything
> > > larger than LLC, and when NUMA_BALANCING, smaller than NUMA.
> >
> > What do you mean by this, exactly?
>
> As you noted, when prev and this are in the same LLC, it doesn't matter
> and select_idle_sibling() will do its thing. So anything smaller than
> the LLC need not do anything.
>
> When NUMA_BALANCING we have the numa_stats thing and we can, as you
> propose use that.
>
> If LLC < NUMA or !NUMA_BALANCING we have a region that needs to do
> _something_.

Agreed. I will fix this. Given that this is a bit
of a corner case, I guess I can fix this with follow-up
patches, to be merged into -tip before the whole series
is sent on to Linus?

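The three regions Peter describes can be written down as a small decision function. This is only a sketch of the case split as stated in the quoted text, not a proposed patch: the enum, classify_wakeup() and the integer LLC and node ids are all hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical labels for the three regions discussed above. */
enum wake_case {
	WAKE_SAME_LLC,		/* select_idle_sibling() handles it, nothing to do */
	WAKE_USE_NUMA_STATS,	/* cross-node with NUMA_BALANCING: use numa_stats */
	WAKE_NEEDS_LOAD_CHECK,	/* LLC < NUMA or !NUMA_BALANCING: needs _something_ */
};

/*
 * Classify a wakeup by where the previous CPU and the waking CPU sit.
 * The ids are plain integers standing in for real topology lookups.
 */
static enum wake_case classify_wakeup(int prev_llc, int this_llc,
				      int prev_node, int this_node,
				      bool numa_balancing)
{
	if (prev_llc == this_llc)
		return WAKE_SAME_LLC;
	if (numa_balancing && prev_node != this_node)
		return WAKE_USE_NUMA_STATS;
	/* Different LLCs on one node, or NUMA balancing is disabled. */
	return WAKE_NEEDS_LOAD_CHECK;
}
```

The last return is the corner case under discussion: two LLCs on one node, or any cross-LLC wakeup when NUMA balancing is off.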
> > > Also note that your use of task_h_load() in the new numa thing suffers
> > > from exactly the problem effective_load() is trying to solve.
> >
> > Are you saying task_h_load is wrong in task_numa_compare()
> > too, then? Should both use effective_load()?
>
> I need more than the few minutes I currently have, but probably. The
> question is of course, how much does it matter and how painful will it
> be to do it better.

I suspect it does not matter at all currently, since the
load balancing code does not use effective_load, and
having the wake_affine logic calculate things differently
from the load balancer is likely to result in both pieces
of code fighting against each other.

I suspect we should either use task_h_load everywhere,
or effective_load everywhere, but not have a mix and
match situation where one is used in some places, and
the other in others.

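As a rough illustration of what hierarchical load scaling means here, the sketch below scales a task's load by its group's share of the load at the top level. It is a simplified model under stated assumptions, not the kernel's task_h_load(): scaled_task_load() and its parameters are hypothetical, and it says nothing about how effective_load() differs.

```c
/*
 * Simplified model: a task's contribution at the top level is its own
 * load scaled by the fraction of the top-level load its group carries.
 * The "+ 1" in the divisor only guards against division by zero.
 */
static unsigned long scaled_task_load(unsigned long task_load,
				      unsigned long group_h_load,
				      unsigned long group_load)
{
	unsigned long long scaled = (unsigned long long)task_load * group_h_load;

	return (unsigned long)(scaled / ((unsigned long long)group_load + 1));
}
```

Whichever estimate is chosen, the argument above is that wake_affine and the load balancer should both compute it the same way.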
--
All rights reversed
Thread overview: 27+ messages
2017-06-23 16:55 [PATCH 0/4] NUMA improvements with task wakeup and load balancing riel
2017-06-23 16:55 ` [PATCH 1/4] sched,numa: override part of migrate_degrades_locality when idle balancing riel
2017-06-24 6:58 ` Ingo Molnar
2017-06-24 23:45 ` Rik van Riel
2017-06-24 7:22 ` [tip:sched/core] sched/numa: Override part of migrate_degrades_locality() " tip-bot for Rik van Riel
2017-06-23 16:55 ` [PATCH 2/4] sched: simplify wake_affine for single socket case riel
2017-06-24 7:22 ` [tip:sched/core] sched/fair: Simplify wake_affine() for the " tip-bot for Rik van Riel
2017-06-23 16:55 ` [PATCH 3/4] sched,numa: implement numa node level wake_affine riel
2017-06-24 7:23 ` [tip:sched/core] sched/numa: Implement NUMA node level wake_affine() tip-bot for Rik van Riel
2017-06-26 14:43 ` [PATCH 3/4] sched,numa: implement numa node level wake_affine Peter Zijlstra
2017-06-23 16:55 ` [PATCH 4/4] sched,fair: remove effective_load riel
2017-06-24 7:23 ` [tip:sched/core] sched/fair: Remove effective_load() tip-bot for Rik van Riel
2017-06-26 14:44 ` [PATCH 4/4] sched,fair: remove effective_load Peter Zijlstra
2017-06-26 14:46 ` Peter Zijlstra
2017-06-26 14:55 ` Rik van Riel
2017-06-26 15:04 ` Peter Zijlstra
2017-06-26 15:20 ` Rik van Riel
2017-06-26 16:12 ` Peter Zijlstra
2017-06-26 19:34 ` Rik van Riel
2017-06-27 5:39 ` Peter Zijlstra
2017-06-27 14:55 ` Rik van Riel [this message]
2017-08-01 12:19 ` [PATCH] sched/fair: Fix wake_affine() for !NUMA_BALANCING Peter Zijlstra
2017-08-01 19:26 ` Josef Bacik
2017-08-01 21:43 ` Peter Zijlstra
2017-08-24 22:29 ` Chris Wilson
2017-08-25 15:46 ` Chris Wilson
2017-06-27 18:27 ` [PATCH 4/4] sched,fair: remove effective_load Rik van Riel