[PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro


From: Yury Norov <yury.norov@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: Yury Norov <yury.norov@gmail.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Pawel Chmielewski <pawel.chmielewski@intel.com>,
	Leon Romanovsky <leon@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Tariq Toukan <tariqt@nvidia.com>, Gal Pressman <gal@nvidia.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Barry Song <baohua@kernel.org>
Subject: [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro
Date: Wed, 19 Apr 2023 22:19:41 -0700
Message-ID: <20230420051946.7463-4-yury.norov@gmail.com>
In-Reply-To: <20230420051946.7463-1-yury.norov@gmail.com>

for_each_cpu() is widely used in the kernel, and it's beneficial to
create a NUMA-aware version of the macro.

The recently added for_each_numa_hop_mask() works, but switching the
existing codebase over to it is not easy.

The new for_each_numa_cpu() is designed to be similar to for_each_cpu().
Converting existing code to be NUMA-aware is as simple as adding a hop
iterator variable and passing it to the new macro. for_each_numa_cpu()
takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Mellanox's in-tree driver, and the other is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

	for_each_numa_hop_mask(cpus, prev, node) {
		for_each_cpu_andnot(cpu, cpus, prev) {
			if (cnt++ == max_num)
				goto out;
			do_something(cpu);
		}
		prev = cpus;
	}

With the new macro, it would look like this:

	for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
	}

Straight conversion of the existing for_each_cpu() codebase to a
NUMA-aware version with for_each_numa_hop_mask() is difficult because
the latter doesn't take a user-provided cpu mask, so the caller
eventually ends up with an open-coded double loop. With
for_each_numa_cpu() it shouldn't be a brainteaser.
Consider the NUMA-ignorant example:

	cpumask_t cpus = get_mask();
	int cnt = 0, cpu;

	for_each_cpu(cpu, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
	}

Converting it to a NUMA-aware version would be as simple as:

	cpumask_t cpus = get_mask();
	int node = get_node();
	int cnt = 0, hop, cpu;

	for_each_numa_cpu(cpu, hop, node, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
	}

The latter is slightly more verbose than the plain for_each_cpu() loop,
but it avoids open-coding that annoying double loop. Another advantage is
that it works with a 'hop' parameter that has the clear meaning of NUMA
distance, so people not familiar with the enumerator internals don't have
to bother with the current and previous masks machinery.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
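Note (not part of the commit log): the visiting order described above may
be easier to see in a small userspace model. The sketch below is an
illustration only, not kernel code. The names model_find_next_cpu(),
model_for_each_numa_cpu(), collect_numa_order() and the hop_masks[] table
are hypothetical, and the 8-CPU, 2-node layout is made up. It assumes that
sched_numa_find_next_cpu() from patch 2/8 returns the next CPU of the mask
within the current hop and moves on to the next hop once the current one
is exhausted.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS	8	/* hypothetical machine: 8 CPUs */
#define NR_NODES	2	/* hypothetical machine: 2 NUMA nodes */
#define NUMA_LEVELS	2	/* hop 0: local node, hop 1: every node */

/* hop_masks[hop][node]: CPUs within 'hop' hops of 'node' (cumulative). */
static const uint64_t hop_masks[NUMA_LEVELS][NR_NODES] = {
	{ 0x0f, 0xf0 },	/* hop 0: node 0 has CPUs 0-3, node 1 has CPUs 4-7 */
	{ 0xff, 0xff },	/* hop 1: all CPUs */
};

/* Return the first set bit at or above 'start', or NR_CPU_IDS if none. */
static int find_next_bit64(uint64_t bits, int start)
{
	for (int i = start; i < NR_CPU_IDS; i++)
		if (bits & (UINT64_C(1) << i))
			return i;
	return NR_CPU_IDS;
}

/*
 * Model of the assumed sched_numa_find_next_cpu() behaviour: search the
 * CPUs that belong to the current hop but not to the previous one, and
 * advance *hop (restarting from CPU 0) when the current hop is exhausted.
 */
static int model_find_next_cpu(uint64_t mask, int cpu, int node, int *hop)
{
	assert(node >= 0 && node < NR_NODES);

	while (*hop < NUMA_LEVELS) {
		uint64_t cur = hop_masks[*hop][node];
		uint64_t prev = *hop ? hop_masks[*hop - 1][node] : 0;
		int ret = find_next_bit64(mask & cur & ~prev, cpu);

		if (ret < NR_CPU_IDS)
			return ret;
		(*hop)++;
		cpu = 0;
	}
	return NR_CPU_IDS;
}

/* Same shape as the macro in this patch, built on the model helper. */
#define model_for_each_numa_cpu(cpu, hop, node, mask)			\
	for ((cpu) = 0, (hop) = 0;					\
	     (cpu) = model_find_next_cpu((mask), (cpu), (node), &(hop)),\
	     (cpu) < NR_CPU_IDS;					\
	     (cpu)++)

/* Store the visiting order in out[]; return the number of CPUs visited. */
static int collect_numa_order(uint64_t mask, int node, int max_num, int *out)
{
	int cnt = 0, hop, cpu;

	model_for_each_numa_cpu(cpu, hop, node, mask) {
		if (cnt == max_num)
			break;
		out[cnt++] = cpu;
	}
	return cnt;
}
```

In this model, with all eight CPUs in the mask and node 1 as the starting
point, collect_numa_order() fills out[] with 4, 5, 6, 7, 0, 1, 2, 3: the
local CPUs first, then the remote ones.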
 include/linux/topology.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 13209095d6e2..01fb3a55d7ce 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -286,4 +286,20 @@ sched_numa_hop_mask(unsigned int node, unsigned int hops)
 	     !IS_ERR_OR_NULL(mask);					       \
 	     __hops++)

+/**
+ * for_each_numa_cpu - iterate over cpus in increasing order taking into account
+ *		       NUMA distances from a given node.
+ * @cpu: the (optionally unsigned) integer iterator
+ * @hop: the iterator variable, must be initialized to a desired minimal hop.
+ * @node: the NUMA node to start the search from.
+ * @mask: the cpumask pointer
+ *
+ * Requires rcu_lock to be held.
+ */
+#define for_each_numa_cpu(cpu, hop, node, mask)					\
+	for ((cpu) = 0, (hop) = 0;						\
+		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
+		(cpu) < nr_cpu_ids;						\
+		(cpu)++)
+
 #endif /* _LINUX_TOPOLOGY_H */
-- 
2.34.1
