From: Peter Zijlstra <peterz@infradead.org>
To: riel@redhat.com
Cc: linux-kernel@vger.kernel.org, mgorman@suse.de,
chegu_vinod@hp.com, mingo@kernel.org, efault@gmx.de,
vincent.guittot@linaro.org
Subject: Re: [PATCH RFC 4/5] sched,numa: calculate node scores in complex NUMA topologies
Date: Sun, 12 Oct 2014 16:53:58 +0200 [thread overview]
Message-ID: <20141012145358.GH3015@worktop> (raw)
In-Reply-To: <1412797050-8903-5-git-send-email-riel@redhat.com>
On Wed, Oct 08, 2014 at 03:37:29PM -0400, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
>
> In order to do task placement on systems with complex NUMA topologies,
> it is necessary to count the faults on nodes nearby the node that is
> being examined for a potential move.
>
> In case of a system with a backplane interconnect, we are dealing with
> groups of NUMA nodes; each of the nodes within a group is the same number
> of hops away from nodes in other groups in the system. Optimal placement
> on this topology is achieved by counting all nearby nodes equally. When
> comparing nodes A and B at distance N, nearby nodes are those at distances
> smaller than N from nodes A or B.
>
> Placement strategy on a system with a glueless mesh NUMA topology needs
> to be different, because there are no natural groups of nodes determined
> by the hardware. Instead, when dealing with two nodes A and B at distance
> N, N >= 2, there will be intermediate nodes at distance < N from both nodes
> A and B. Good placement can be achieved by right shifting the faults on
> nearby nodes by the number of hops from the node being scored. In this
> context, a nearby node is any node less than the maximum distance in the
> system away from the node. Those nodes are skipped for efficiency reasons,
> there is no real policy reason to do so.
> +/* Handle placement on systems where not all nodes are directly connected. */
> +static unsigned long score_nearby_nodes(struct task_struct *p, int nid,
> + int hoplimit, bool task)
> +{
> + unsigned long score = 0;
> + int node;
> +
> + /*
> + * All nodes are directly connected, and the same distance
> + * from each other. No need for fancy placement algorithms.
> + */
> + if (sched_numa_topology_type == NUMA_DIRECT)
> + return 0;
> +
> + for_each_online_node(node) {
> + }
> +
> + return score;
> +}
> @@ -944,6 +1003,8 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
> return 0;
>
> faults = task_faults(p, nid);
> + faults += score_nearby_nodes(p, nid, hops, true);
> +
> return 1000 * faults / total_faults;
> }
> @@ -961,6 +1022,8 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
> return 0;
>
> faults = group_faults(p, nid);
> + faults += score_nearby_nodes(p, nid, hops, false);
> +
> return 1000 * faults / total_faults;
> }
So this makes {task,group}_weight() O(nr_nodes), and we call these
function from O(nr_nodes) loops, giving a total of O(nr_nodes^2)
computational complexity, right?
Seems important to mention; I realize this is only for !DIRECT, but
still, I bet the real large people (those same 512 nodes we had
previous) would not really appreciate this.
next prev parent reply other threads:[~2014-10-12 14:54 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-08 19:37 [PATCH RFC 0/5] sched,numa: task placement with complex NUMA topologies riel
2014-10-08 19:37 ` [PATCH RFC 1/5] sched,numa: build table of node hop distance riel
2014-10-12 13:17 ` Peter Zijlstra
2014-10-12 13:28 ` Rik van Riel
2014-10-14 6:47 ` Peter Zijlstra
2014-10-14 7:49 ` Rik van Riel
2014-10-08 19:37 ` [PATCH RFC 2/5] sched,numa: classify the NUMA topology of a system riel
2014-10-12 14:30 ` Peter Zijlstra
2014-10-13 7:12 ` Rik van Riel
2014-10-08 19:37 ` [PATCH RFC 3/5] sched,numa: preparations for complex topology placement riel
2014-10-12 14:37 ` Peter Zijlstra
2014-10-13 7:12 ` Rik van Riel
2014-10-08 19:37 ` [PATCH RFC 4/5] sched,numa: calculate node scores in complex NUMA topologies riel
2014-10-12 14:53 ` Peter Zijlstra [this message]
2014-10-13 7:15 ` Rik van Riel
2014-10-08 19:37 ` [PATCH RFC 5/5] sched,numa: find the preferred nid with complex NUMA topology riel
2014-10-12 14:56 ` Peter Zijlstra
2014-10-13 7:17 ` Rik van Riel
[not found] ` <4168C988EBDF2141B4E0B6475B6A73D126F58E4F@G6W2504.americas.hpqcorp.net>
[not found] ` <54367446.3020603@redhat.com>
2014-10-10 18:44 ` [PATCH RFC 0/5] sched,numa: task placement with complex NUMA topologies Vinod, Chegu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141012145358.GH3015@worktop \
--to=peterz@infradead.org \
--cc=chegu_vinod@hp.com \
--cc=efault@gmx.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=riel@redhat.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox