From: Rik van Riel <riel@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	chegu_vinod@hp.com, mgorman@suse.de
Subject: Re: [PATCH 3/3] sched,numa: do not set preferred_node on migration to a second choice node
Date: Tue, 15 Apr 2014 10:35:29 -0400
Message-ID: <534D43B1.9020405@redhat.com>
In-Reply-To: <20140414125635.GE11182@twins.programming.kicks-ass.net>

On 04/14/2014 08:56 AM, Peter Zijlstra wrote:
> On Fri, Apr 11, 2014 at 01:00:29PM -0400, riel@redhat.com wrote:
>> From: Rik van Riel <riel@redhat.com>
>>
>> Setting the numa_preferred_node for a task in task_numa_migrate
>> does nothing on a 2-node system. Either we migrate to the node
>> that already was our preferred node, or we stay where we were.
>>
>> On a 4-node system, it can slightly decrease overhead, by not
>> calling the NUMA code as much. Since every node tends to be
>> directly connected to every other node, running on the wrong
>> node for a while does not do much damage.
>>
>> However, on an 8 node system, there are far more bad nodes
>> than there are good ones, and pretending that a second choice
>> is actually the preferred node can greatly delay, or even
>> prevent, a workload from converging.
>>
>> The only time we can safely pretend that a second choice
>> node is the preferred node is when the task is part of a
>> workload that spans multiple NUMA nodes.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> Tested-by: Vinod Chegu <chegu_vinod@hp.com>
>> ---
>> kernel/sched/fair.c | 11 ++++++++++-
>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index babd316..302facf 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1301,7 +1301,16 @@ static int task_numa_migrate(struct task_struct *p)
>>  	if (env.best_cpu == -1)
>>  		return -EAGAIN;
>>
>> -	sched_setnuma(p, env.dst_nid);
>> +	/*
>> +	 * If the task is part of a workload that spans multiple NUMA nodes,
>> +	 * and is migrating into one of the workload's active nodes, remember
>
> I read 'into' as:
> !node_isset(env.src_nid, ...) && node_isset(env.dst_nid, ...)
>
> The code doesn't seem to do this.

s/into/to/ makes the comment and the code match again :)

>> +	 * this node as the task's preferred numa node, so the workload can
>> +	 * settle down.
>> +	 * A task that migrated to a second choice node will be better off
>> +	 * trying for a better one later. Do not set the preferred node here.
>> +	 */
>> +	if (p->numa_group && node_isset(env.dst_nid, p->numa_group->active_nodes))
>> +		sched_setnuma(p, env.dst_nid);
>
> OK, so I was totally confused on this one.
>
> What I missed was that we set the primary choice over in
> task_numa_placement().
>
> I'm not really happy with the changelog; but I'm also struggling to
> identify what exactly is missing. Or rather, the thing makes me
> confused, and not feel like it actually explains it proper.
>
> That said; I tend to more or less agree with the actual change, but..

I have looked at the comment and the changelog some
more, and it is not clear to me what you are missing,
or what I could be explaining better...
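
To make the "into" versus "to" distinction discussed above concrete, here is a
minimal user-space sketch in C. It is not kernel code: nodemask_sketch_t,
node_isset_sketch() and the three predicate functions are hypothetical names
invented for illustration, and a plain bitmask stands in for the kernel's
nodemask_t. It only assumes what the quoted hunk shows, namely that the
preferred node is set when the task has a numa_group and env.dst_nid is in
that group's active_nodes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nodemask_t: one bit per node. */
typedef unsigned long nodemask_sketch_t;

/* Hypothetical stand-in for node_isset(); nid must be below the bit width. */
static bool node_isset_sketch(int nid, nodemask_sketch_t mask)
{
	return (mask >> nid) & 1UL;
}

/* "to": only the destination has to be one of the active nodes. */
static bool migrating_to_active(int dst_nid, nodemask_sketch_t active)
{
	return node_isset_sketch(dst_nid, active);
}

/* "into": the source is outside the active set, the destination inside. */
static bool migrating_into_active(int src_nid, int dst_nid,
				  nodemask_sketch_t active)
{
	return !node_isset_sketch(src_nid, active) &&
	       node_isset_sketch(dst_nid, active);
}

/* Shape of the condition in the quoted hunk: group present, "to" check. */
static bool should_set_preferred(bool has_numa_group, int dst_nid,
				 nodemask_sketch_t active)
{
	return has_numa_group && migrating_to_active(dst_nid, active);
}

/* Small self-check: nodes 0 and 1 are active, nodes 2 and 3 are not. */
static void sketch_self_check(void)
{
	nodemask_sketch_t active = (1UL << 0) | (1UL << 1);

	/* Moving between two active nodes satisfies "to" but not "into". */
	assert(migrating_to_active(1, active));
	assert(!migrating_into_active(0, 1, active));

	/* A task without a group never records the second choice node. */
	assert(!should_set_preferred(false, 1, active));
}
```

The two readings differ only when the source node is itself active: the "to"
check accepts a move between two active nodes, while the "into" check rejects
it. Calling sketch_self_check() from a test harness exercises exactly that
case.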

Thread overview: 19+ messages
2014-04-11 17:00 [PATCH 0/3] sched,numa: reduce page migrations with pseudo-interleaving riel
2014-04-11 17:00 ` [PATCH 1/3] sched,numa: count pages on active node as local riel
2014-04-11 17:34 ` Joe Perches
2014-04-11 17:41 ` Rik van Riel
2014-04-11 18:01 ` Joe Perches
2014-04-25 9:04 ` Mel Gorman
2014-05-08 10:42 ` [tip:sched/core] sched/numa: Count " tip-bot for Rik van Riel
2014-04-11 17:00 ` [PATCH 2/3] sched,numa: retry placement more frequently when misplaced riel
2014-04-11 17:46 ` Joe Perches
2014-04-11 18:03 ` Rik van Riel
2014-04-14 8:19 ` Ingo Molnar
2014-04-25 9:05 ` Mel Gorman
2014-05-08 10:42 ` [tip:sched/core] sched/numa: Retry " tip-bot for Rik van Riel
2014-04-11 17:00 ` [PATCH 3/3] sched,numa: do not set preferred_node on migration to a second choice node riel
2014-04-14 12:56 ` Peter Zijlstra
2014-04-15 14:35 ` Rik van Riel [this message]
2014-04-15 16:51 ` Peter Zijlstra
2014-04-25 9:09 ` Mel Gorman
2014-05-08 10:43 ` [tip:sched/core] sched/numa: Do " tip-bot for Rik van Riel