public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/6] sched/topology: introduce for_each_numa_hop_node() / sched_numa_hop_node()
Date: Thu, 19 Dec 2024 20:43:52 +0100	[thread overview]
Message-ID: <Z2R3eKPwBhzKU4y3@gpd3> (raw)
In-Reply-To: <Z2Rlc8eljxSF0I0Z@yury-ThinkPad>

On Thu, Dec 19, 2024 at 10:26:59AM -0800, Yury Norov wrote:
> On Wed, Dec 18, 2024 at 06:04:53AM -1000, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Dec 18, 2024 at 11:23:40AM +0100, Andrea Righi wrote:
> > ...
> > > > So, this would work but given that there is nothing dynamic about this
> > > > ordering, would it make more sense to build the ordering and store it
> > > > per-node? Then, the iteration just becomes walking that array.
> > > 
> > > I've also considered doing that. I don't know if it'd work with offline
> > > nodes, but maybe we can just check node_online(node) at each iteration and
> > > skip those that are not online.
> 
> for_each_numa_hop_mask() only traverses N_CPU nodes, and N_CPU nodes have
> proper distances.
> 
> I think that for_each_numa_hop_node() should match for_each_numa_hop_mask().
> It would be good to cross-test them to ensure that they generate the same
> order at least for N_CPU nodes.

It'd be nice to have a kunit test; I can take a look at that (in a separate
patch, I think we can add it later).

> 
> If you think that for_each_numa_hop_node() should traverse non-N_CPU nodes,
> you need a 'node_state' parameter. That will allow us to make sure that at
> least the N_CPU portion works correctly.
> 
> > Yeah, there can be e.g. for_each_possible_node_by_dist() where nodes with
> > unknown distances (offline ones?) are put at the end and then there's also
> > for_each_online_node_by_dist() which filters out offline ones, and the
> > ordering can be updated from a CPU hotplug callback.
> 
> We can assign UINT_MAX to those nodes, I guess?
> 
> > The ordering can be
> > probably put in an rcu protected array? I'm not sure what's the
> > synchronization convention around node on/offlining. Is that protected
> > together with CPU on/offlining?
> 
> The machinery is already there, we just need another array of nodemasks -
> sched_domains_numa_nodes in addition to sched_domains_numa_masks. The
> latter is already protected by RCU, and we need to update the new array
> every time sched_domains_numa_masks is updated.
>  
> > Given that there usually aren't that many nodes, the current implementation
> > is probably fine too, so please feel free to ignore this suggestion for now
> > too.
> 
> I agree. The number of nodes on a typical system is 1 or 2. Even if
> it's 8, Andrea's bubble sort will still be acceptable. So I'm
> OK with O(N^2) if you guys are OK with it. I'd just like to have
> this choice explained in the commit message.

Good point, I'll add a comment about that.

Thanks,
-Andrea

  reply	other threads:[~2024-12-19 19:44 UTC|newest]

Thread overview: 23+ messages
2024-12-17  9:32 [PATCHSET v7 sched_ext/for-6.14] sched_ext: split global idle cpumask into per-NUMA cpumasks Andrea Righi
2024-12-17  9:32 ` [PATCH 1/6] sched/topology: introduce for_each_numa_hop_node() / sched_numa_hop_node() Andrea Righi
2024-12-17 21:57   ` Tejun Heo
2024-12-18 10:23     ` Andrea Righi
2024-12-18 16:04       ` Tejun Heo
2024-12-19 18:26         ` Yury Norov
2024-12-19 19:43           ` Andrea Righi [this message]
2024-12-19 19:52           ` Peter Zijlstra
2024-12-19 21:16             ` Andrea Righi
2024-12-17  9:32 ` [PATCH 2/6] sched_ext: Introduce SCX_OPS_NODE_BUILTIN_IDLE Andrea Righi
2024-12-17  9:32 ` [PATCH 3/6] sched_ext: Introduce per-node idle cpumasks Andrea Righi
2024-12-17 23:22   ` Tejun Heo
2024-12-18 10:21     ` Andrea Righi
2024-12-18 16:10       ` Tejun Heo
2024-12-18 16:18         ` Andrea Righi
2024-12-17 23:23   ` Tejun Heo
2024-12-20 16:48   ` Yury Norov
2024-12-20 17:52     ` Andrea Righi
2024-12-17  9:32 ` [PATCH 4/6] sched_ext: Get rid of the scx_selcpu_topo_numa logic Andrea Righi
2024-12-17  9:32 ` [PATCH 5/6] sched_ext: Introduce NUMA aware idle cpu kfunc helpers Andrea Righi
2024-12-17  9:32 ` [PATCH 6/6] sched_ext: Move built-in idle CPU selection policy to a separate file Andrea Righi
2024-12-20 14:53   ` Yury Norov
2024-12-20 14:58     ` Andrea Righi
