Re: [PATCH 4/4] sched,fair: remove effective_load

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Rik van Riel <riel@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, jhladky@redhat.com,
	mingo@kernel.org, mgorman@suse.de
Subject: Re: [PATCH 4/4] sched,fair: remove effective_load
Date: Tue, 27 Jun 2017 10:55:58 -0400	[thread overview]
Message-ID: <1498575358.20270.114.camel@redhat.com> (raw)
In-Reply-To: <20170627053906.GA7287@worktop>

[-- Attachment #1: Type: text/plain, Size: 3743 bytes --]

On Tue, 2017-06-27 at 07:39 +0200, Peter Zijlstra wrote:
> On Mon, Jun 26, 2017 at 03:34:49PM -0400, Rik van Riel wrote:
> > On Mon, 2017-06-26 at 18:12 +0200, Peter Zijlstra wrote:
> > > On Mon, Jun 26, 2017 at 11:20:54AM -0400, Rik van Riel wrote:
> > > 
> > > > Oh, indeed.  I guess in wake_affine() we should test
> > > > whether the CPUs are in the same NUMA node, rather than
> > > > doing cpus_share_cache() ?
> > > 
> > > Well, since select_idle_sibling() is on LLC; the early test on
> > > cpus_share_cache(prev,this) seems to actually make sense.
> > > 
> > > But then cutting out all the other bits seems wrong. Not in the
> > > least
> > > because !NUMA_BALACING should also still keep working.
> > 
> > Even when !NUMA_BALANCING, I suspect it makes little sense
> > to compare the loads just one the cores in question, since
> > select_idle_sibling() will likely move the task somewhere
> > else.
> > 
> > I suspect we want to compare the load on the whole LLC
> > for that reason, even with NUMA_BALANCING disabled.
> 
> But we don't have that data around :/ One thing we could do is try
> and
> keep a copy of the last s*_lb_stats around in the sched_domain_shared
> stuff or something and try and use that.
> 
> That way we can strictly keep things at the LLC level and not confuse
> things with NUMA.
> 
> Similarly, we could use that same data to then avoid re-computing
> things
> for the NUMA domain as well and do away with numa_stats.

That does seem like a useful optimization, though
I guess we would have to invalidate the cached data
every time we actually move a task?

The current code simply walks all the CPUs in the
cpumask_t, and adds up capacity and load. Doing
that appears to be better than poor task placement
(Jirka's numbers speak for themselves), but optimizing
this code path does seem like a worthwhile goal.

I'll look into it.

> > > > Or, alternatively, have an update_numa_stats() variant
> > > > for numa_wake_affine() that works on the LLC level?
> > > 
> > > I think we want to retain the existing behaviour for everything
> > > larger than LLC, and when NUMA_BALANCING, smaller than NUMA.
> > 
> > What do you mean by this, exactly?
> 
> As you noted, when prev and this are in the same LLC, it doesn't
> matter
> and select_idle_sibling() will do its thing. So anything smaller than
> the LLC need not do anything.
> 
> When NUMA_BALANCING we have the numa_stats thing and we can, as you
> propose use that.
> 
> If LLC < NUMA or !NUMA_BALANCING we have a region that needs to do
> _something_.

Agreed. I will fix this. Given that this is a bit
of a corner case, I guess I can fix this with follow-up
patches, to be merged into -tip before the whole series
is sent on to Linus?

> > > Also note that your use of task_h_load() in the new numa thing
> > > suffers
> > > from exactly the problem effective_load() is trying to solve.
> > 
> > Are you saying task_h_load is wrong in task_numa_compare()
> > too, then?  Should both use effective_load()?
> 
> I need more than the few minutes I currently have, but probably. The
> question is of course, how much does it matter and how painful will
> it
> be to do it better.

I suspect it does not matter at all currenly, since the 
load balancing code does not use effective_load, and
having the wake_affine logic calculate things differently
from the load balancer is likely to result in both pieces
of code fighting against each other.

I suspect we should either use task_h_load everywhere,
or effective_load everywhere, but not have a mix and
match situation where one is used in some places, and
the other in others.

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

next prev parent reply	other threads:[~2017-06-27 14:56 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-23 16:55 [PATCH 0/4] NUMA improvements with task wakeup and load balancing riel
2017-06-23 16:55 ` [PATCH 1/4] sched,numa: override part of migrate_degrades_locality when idle balancing riel
2017-06-24  6:58   ` Ingo Molnar
2017-06-24 23:45     ` Rik van Riel
2017-06-24  7:22   ` [tip:sched/core] sched/numa: Override part of migrate_degrades_locality() " tip-bot for Rik van Riel
2017-06-23 16:55 ` [PATCH 2/4] sched: simplify wake_affine for single socket case riel
2017-06-24  7:22   ` [tip:sched/core] sched/fair: Simplify wake_affine() for the " tip-bot for Rik van Riel
2017-06-23 16:55 ` [PATCH 3/4] sched,numa: implement numa node level wake_affine riel
2017-06-24  7:23   ` [tip:sched/core] sched/numa: Implement NUMA node level wake_affine() tip-bot for Rik van Riel
2017-06-26 14:43   ` [PATCH 3/4] sched,numa: implement numa node level wake_affine Peter Zijlstra
2017-06-23 16:55 ` [PATCH 4/4] sched,fair: remove effective_load riel
2017-06-24  7:23   ` [tip:sched/core] sched/fair: Remove effective_load() tip-bot for Rik van Riel
2017-06-26 14:44   ` [PATCH 4/4] sched,fair: remove effective_load Peter Zijlstra
2017-06-26 14:46     ` Peter Zijlstra
2017-06-26 14:55       ` Rik van Riel
2017-06-26 15:04         ` Peter Zijlstra
2017-06-26 15:20           ` Rik van Riel
2017-06-26 16:12             ` Peter Zijlstra
2017-06-26 19:34               ` Rik van Riel
2017-06-27  5:39                 ` Peter Zijlstra
2017-06-27 14:55                   ` Rik van Riel [this message]
2017-08-01 12:19                     ` [PATCH] sched/fair: Fix wake_affine() for !NUMA_BALANCING Peter Zijlstra
2017-08-01 19:26                       ` Josef Bacik
2017-08-01 21:43                         ` Peter Zijlstra
2017-08-24 22:29                           ` Chris Wilson
2017-08-25 15:46                           ` Chris Wilson
2017-06-27 18:27               ` [PATCH 4/4] sched,fair: remove effective_load Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1498575358.20270.114.camel@redhat.com \
    --to=riel@redhat.com \
    --cc=jhladky@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).