From: Andrea Righi <arighi@nvidia.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] sched/topology: introduce for_each_numa_hop_node() / sched_numa_hop_node()
Date: Thu, 19 Dec 2024 20:43:52 +0100 [thread overview]
Message-ID: <Z2R3eKPwBhzKU4y3@gpd3> (raw)
In-Reply-To: <Z2Rlc8eljxSF0I0Z@yury-ThinkPad>
On Thu, Dec 19, 2024 at 10:26:59AM -0800, Yury Norov wrote:
> On Wed, Dec 18, 2024 at 06:04:53AM -1000, Tejun Heo wrote:
> > Hello,
> >
> > On Wed, Dec 18, 2024 at 11:23:40AM +0100, Andrea Righi wrote:
> > ...
> > > > So, this would work but given that there is nothing dynamic about this
> > > > ordering, would it make more sense to build the ordering and store it
> > > > per-node? Then, the iteration just becomes walking that array.
> > >
> > > I've also considered doing that. I don't know if it'd work with offline
> > > nodes, but maybe we can just check node_online(node) at each iteration and
> > > skip those that are not online.
>
> for_each_numa_hop_mask() only traverses N_CPU nodes, and N_CPU nodes have
> proper distances.
>
> I think that for_each_numa_hop_node() should match for_each_numa_hop_mask().
> It would be good to cross-test them to ensure that they generate the same
> order at least for N_CPU nodes.
It'd be nice to have a kunit, I can take a look at this (in a separate
patch, I think we can add this later).
>
> If you think that for_each_numa_hop_node() should traverse non-N_CPU nodes,
> you need a 'node_state' parameter. This will allow to make sure that at
> least N_CPU portion works correctly.
>
> > Yeah, there can be e.g. for_each_possible_node_by_dist() wheke nodes with
> > unknown distances (offline ones?) are put at the end and then there's also
> > for_each_online_node_by_dist() which filters out offline ones, and the
> > ordering can be updated from a CPU hotplug callback.
>
> We can assign UINT_MAX for those nodes I guess?
>
> > The ordering can be
> > probably put in an rcu protected array? I'm not sure what's the
> > synchronization convention around node on/offlining. Is that protected
> > together with CPU on/offlining?
>
> The machinery is already there, we just need another array of nodemasks -
> sched_domains_numa_nodes in addition to sched_domains_numa_nodes. The
> last one is already protected by RCU, and we need to update new array every
> time when sched_domains_numa_nodes updated.
>
> > Given that there usually aren't that many nodes, the current implementation
> > is probably fine too, so please feel free to ignore this suggestion for now
> > too.
>
> I agree. The number of nodes on typical system is 1 or 2. Even if
> it's 8, the Andrea's bubble sort will be still acceptable. So, I'm
> OK with O(N^2) if you guys OK with it. I only would like to have
> this choice explained in commit message.
Good point, I'll add a comment about that.
Thanks,
-Andrea
next prev parent reply other threads:[~2024-12-19 19:44 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-17 9:32 [PATCHSET v7 sched_ext/for-6.14] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
2024-12-17 9:32 ` [PATCH 1/6] sched/topology: introduce for_each_numa_hop_node() / sched_numa_hop_node() Andrea Righi
2024-12-17 21:57 ` Tejun Heo
2024-12-18 10:23 ` Andrea Righi
2024-12-18 16:04 ` Tejun Heo
2024-12-19 18:26 ` Yury Norov
2024-12-19 19:43 ` Andrea Righi [this message]
2024-12-19 19:52 ` Peter Zijlstra
2024-12-19 21:16 ` Andrea Righi
2024-12-17 9:32 ` [PATCH 2/6] sched_ext: Introduce SCX_OPS_NODE_BUILTIN_IDLE Andrea Righi
2024-12-17 9:32 ` [PATCH 3/6] sched_ext: Introduce per-node idle cpumasks Andrea Righi
2024-12-17 23:22 ` Tejun Heo
2024-12-18 10:21 ` Andrea Righi
2024-12-18 16:10 ` Tejun Heo
2024-12-18 16:18 ` Andrea Righi
2024-12-17 23:23 ` Tejun Heo
2024-12-20 16:48 ` Yury Norov
2024-12-20 17:52 ` Andrea Righi
2024-12-17 9:32 ` [PATCH 4/6] sched_ext: Get rid of the scx_selcpu_topo_numa logic Andrea Righi
2024-12-17 9:32 ` [PATCH 5/6] sched_ext: Introduce NUMA aware idle cpu kfunc helpers Andrea Righi
2024-12-17 9:32 ` [PATCH 6/6] sched_ext: Move built-in idle CPU selection policy to a separate file Andrea Righi
2024-12-20 14:53 ` Yury Norov
2024-12-20 14:58 ` Andrea Righi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z2R3eKPwBhzKU4y3@gpd3 \
--to=arighi@nvidia.com \
--cc=bsegall@google.com \
--cc=changwoo@igalia.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=void@manifault.com \
--cc=vschneid@redhat.com \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.