From: Yury Norov <yury.norov@gmail.com>
To: Andrea Righi <arighi@nvidia.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Joel Fernandes <joel@joelfernandes.org>,
Ian May <ianm@nvidia.com>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 6/7] sched_ext: idle: Per-node idle cpumasks
Date: Thu, 13 Feb 2025 13:03:22 -0500
Message-ID: <Z64z6jIXz-MCSlv1@thinkpad>
In-Reply-To: <20250212165006.490130-7-arighi@nvidia.com>
On Wed, Feb 12, 2025 at 05:48:13PM +0100, Andrea Righi wrote:
> @@ -90,6 +131,78 @@ s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
> goto retry;
> }
>
> +static s32 pick_idle_cpu_from_other_nodes(const struct cpumask *cpus_allowed, int node, u64 flags)
'From other node' sounds a bit vague
> +{
> + static DEFINE_PER_CPU(nodemask_t, per_cpu_unvisited);
> + nodemask_t *unvisited = this_cpu_ptr(&per_cpu_unvisited);
> + s32 cpu = -EBUSY;
> +
> + preempt_disable();
> + unvisited = this_cpu_ptr(&per_cpu_unvisited);
> +
> + /*
> + * Restrict the search to the online nodes, excluding the current
> + * one.
> + */
> + nodes_clear(*unvisited);
> + nodes_or(*unvisited, *unvisited, node_states[N_ONLINE]);
nodes_clear() + nodes_or() == nodes_copy()
Yes, the nodemask API is missing nodes_copy(). The attached patch adds
it. Can you consider taking it into your series?
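To illustrate the equivalence, here is a minimal userspace sketch. It is not the kernel API: the model_* names, the fixed MAX_NUMNODES value and the plain array-of-longs layout are assumptions made only for this example. It shows that clearing a mask and then OR-ing a source into it gives the same result as a plain copy, whatever the destination held before.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Assumption: a small fixed node count, just for this userspace model. */
#define MAX_NUMNODES	64
#define MODEL_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_NODE_LONGS \
	((MAX_NUMNODES + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* Hypothetical stand-in for nodemask_t. */
typedef struct {
	unsigned long bits[MODEL_NODE_LONGS];
} model_nodemask_t;

static void model_nodes_clear(model_nodemask_t *dst)
{
	memset(dst->bits, 0, sizeof(dst->bits));
}

static void model_nodes_or(model_nodemask_t *dst, const model_nodemask_t *a,
			   const model_nodemask_t *b)
{
	for (size_t i = 0; i < MODEL_NODE_LONGS; i++)
		dst->bits[i] = a->bits[i] | b->bits[i];
}

static void model_nodes_copy(model_nodemask_t *dst, const model_nodemask_t *src)
{
	memcpy(dst->bits, src->bits, sizeof(dst->bits));
}

static void model_node_set(unsigned int node, model_nodemask_t *mask)
{
	assert(node < MAX_NUMNODES);
	mask->bits[node / MODEL_BITS_PER_LONG] |=
		1UL << (node % MODEL_BITS_PER_LONG);
}

static int model_nodes_equal(const model_nodemask_t *a,
			     const model_nodemask_t *b)
{
	return memcmp(a->bits, b->bits, sizeof(a->bits)) == 0;
}

/* Return 1 if clear + or yields the same mask as a plain copy. */
static int copy_matches_clear_or(const model_nodemask_t *src)
{
	model_nodemask_t a, b;

	/* Start from stale contents to show the old value does not matter. */
	memset(&a, 0xff, sizeof(a));
	memset(&b, 0xff, sizeof(b));

	model_nodes_clear(&a);
	model_nodes_or(&a, &a, src);

	model_nodes_copy(&b, src);

	return model_nodes_equal(&a, &b);
}
```

The copy variant is a single pass over the bitmap instead of two, and it states the intent directly.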
> + node_clear(node, *unvisited);
> +
> + /*
> + * Traverse all nodes in order of increasing distance, starting
> + * from @node.
> + *
> + * This loop is O(N^2), with N being the amount of NUMA nodes,
> + * which might be quite expensive in large NUMA systems. However,
> + * this complexity comes into play only when a scheduler enables
> + * SCX_OPS_BUILTIN_IDLE_PER_NODE and it's requesting an idle CPU
> + * without specifying a target NUMA node, so it shouldn't be a
> + * bottleneck in most cases.
> + *
> + * As a future optimization we may want to cache the list of nodes
> + * in a per-node array, instead of actually traversing them every
> + * time.
> + */
> + for_each_node_numadist(node, *unvisited) {
> + cpu = pick_idle_cpu_in_node(cpus_allowed, node, flags);
> + if (cpu >= 0)
> + break;
> + }
> + preempt_enable();
> +
> + return cpu;
> +}
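For readers following along, the O(N^2) cost mentioned in the comment above can be seen in a small userspace sketch of a nearest-first traversal. This is only a model under stated assumptions: the distance table, the 4-node size and every model_* name are made up for the example, and it does not claim to reproduce how for_each_node_numadist() is implemented. Each step scans all remaining nodes to find the nearest one, and there are up to N steps.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: a tiny 4-node system with a made-up symmetric distance table. */
#define MODEL_NODES	4

static const int model_distance[MODEL_NODES][MODEL_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 40, 30 },
	{ 30, 40, 10, 20 },
	{ 40, 30, 20, 10 },
};

/*
 * Return the unvisited node closest to @start, or MODEL_NODES if the
 * unvisited set is empty. One call is O(N).
 */
static int model_nearest_node(int start, unsigned int unvisited)
{
	int best = MODEL_NODES;
	int best_dist = INT_MAX;

	for (int n = 0; n < MODEL_NODES; n++) {
		if (!(unvisited & (1u << n)))
			continue;
		if (model_distance[start][n] < best_dist) {
			best_dist = model_distance[start][n];
			best = n;
		}
	}
	return best;
}

/*
 * Record the nodes in @unvisited in order of increasing distance from
 * @start. Up to N iterations, each doing an O(N) scan, hence O(N^2).
 * Returns the number of nodes written to @order.
 */
static int model_visit_order(int start, unsigned int unvisited, int *order)
{
	int count = 0;

	assert(start >= 0 && start < MODEL_NODES);

	for (int n = model_nearest_node(start, unvisited);
	     n < MODEL_NODES;
	     n = model_nearest_node(start, unvisited)) {
		order[count++] = n;
		unvisited &= ~(1u << n);	/* mark the node as visited */
	}
	return count;
}
```

Starting from node 1 with nodes 0, 2 and 3 unvisited, this model visits them by distance 20, 30, 40. A cached per-node ordering, as the comment suggests for the future, would replace the repeated scans with a lookup.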
> +
> +/*
> + * Find an idle CPU in the system, starting from @node.
> + */
> +s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags)
> +{
> + s32 cpu;
> +
> + /*
> + * Always search in the starting node first (this is an
> + * optimization that can save some cycles even when the search is
> + * not limited to a single node).
> + */
> + cpu = pick_idle_cpu_in_node(cpus_allowed, node, flags);
> + if (cpu >= 0)
> + return cpu;
> +
> + /*
> + * Stop the search if we are using only a single global cpumask
> + * (NUMA_NO_NODE) or if the search is restricted to the first node
> + * only.
> + */
> + if (node == NUMA_NO_NODE || flags & SCX_PICK_IDLE_IN_NODE)
> + return -EBUSY;
> +
> + /*
> + * Extend the search to the other nodes.
> + */
> + return pick_idle_cpu_from_other_nodes(cpus_allowed, node, flags);
> +}
From d69294cba9bffc05924dc3351a88601937c24213 Mon Sep 17 00:00:00 2001
From: Yury Norov <yury.norov@gmail.com>
Date: Thu, 13 Feb 2025 11:21:08 -0500
Subject: [PATCH] nodemask: add nodes_copy()
The nodemask API lacks a plain nodes_copy(), which is required in this series.
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
---
include/linux/nodemask.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 9fd7a0ce9c1a..41cf43c4e70f 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -191,6 +191,13 @@ static __always_inline void __nodes_andnot(nodemask_t *dstp, const nodemask_t *s
bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits);
}
+#define nodes_copy(dst, src) __nodes_copy(&(dst), &(src), MAX_NUMNODES)
+static __always_inline void __nodes_copy(nodemask_t *dstp,
+ const nodemask_t *srcp, unsigned int nbits)
+{
+ bitmap_copy(dstp->bits, srcp->bits, nbits);
+}
+
#define nodes_complement(dst, src) \
__nodes_complement(&(dst), &(src), MAX_NUMNODES)
static __always_inline void __nodes_complement(nodemask_t *dstp,
--
2.43.0