From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Valentin Schneider <valentin.schneider@arm.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
dann frazier <dann.frazier@canonical.com>
Subject: [PATCH 4.19 17/57] sched/topology: Make sched_init_numa() use a set for the deduplicating sort
Date: Mon, 21 Mar 2022 14:51:58 +0100 [thread overview]
Message-ID: <20220321133222.485936257@linuxfoundation.org> (raw)
In-Reply-To: <20220321133221.984120927@linuxfoundation.org>
From: Valentin Schneider <valentin.schneider@arm.com>
commit 620a6dc40754dc218f5b6389b5d335e9a107fd29 upstream.
The deduplicating sort in sched_init_numa() assumes that the first line in
the distance table contains all unique values in the entire table. I've
been trying to pen what this exactly means for the topology, but it's not
straightforward. For instance, topology.c uses this example:
node 0 1 2 3
0: 10 20 20 30
1: 20 10 20 20
2: 20 20 10 20
3: 30 20 20 10
0 ----- 1
| / |
| / |
| / |
2 ----- 3
Which works out just fine. However, if we swap nodes 0 and 1:
1 ----- 0
| / |
| / |
| / |
2 ----- 3
we get this distance table:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 30
2: 20 20 10 20
3: 20 30 20 10
Which breaks the deduplicating sort (non-representative first line). In
this case this would just be a renumbering exercise, but it so happens that
we can have a deduplicating sort that goes through the whole table in O(n²)
at the extra cost of a temporary memory allocation (i.e. any form of set).
The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following
this, implement the set as a 256-bits bitmap. Should this not be
satisfactory (i.e. we want to support 32-bit values), then we'll have to go
for some other sparse set implementation.
This has the added benefit of letting us allocate just the right amount of
memory for sched_domains_numa_distance[], rather than an arbitrary
(nr_node_ids + 1).
Note: DT binding equivalent (distance-map) decodes distances as 32-bit
values.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/topology.h | 1
kernel/sched/topology.c | 99 ++++++++++++++++++++++-------------------------
2 files changed, 49 insertions(+), 51 deletions(-)
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -47,6 +47,7 @@ int arch_update_cpu_topology(void);
/* Conform to ACPI 2.0 SLIT distance definitions */
#define LOCAL_DISTANCE 10
#define REMOTE_DISTANCE 20
+#define DISTANCE_BITS 8
#ifndef node_distance
#define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
#endif
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1322,66 +1322,58 @@ static void init_numa_topology_type(void
}
}
+
+#define NR_DISTANCE_VALUES (1 << DISTANCE_BITS)
+
void sched_init_numa(void)
{
- int next_distance, curr_distance = node_distance(0, 0);
struct sched_domain_topology_level *tl;
- int level = 0;
- int i, j, k;
-
- sched_domains_numa_distance = kzalloc(sizeof(int) * (nr_node_ids + 1), GFP_KERNEL);
- if (!sched_domains_numa_distance)
- return;
-
- /* Includes NUMA identity node at level 0. */
- sched_domains_numa_distance[level++] = curr_distance;
- sched_domains_numa_levels = level;
+ unsigned long *distance_map;
+ int nr_levels = 0;
+ int i, j;
/*
* O(nr_nodes^2) deduplicating selection sort -- in order to find the
* unique distances in the node_distance() table.
- *
- * Assumes node_distance(0,j) includes all distances in
- * node_distance(i,j) in order to avoid cubic time.
*/
- next_distance = curr_distance;
+ distance_map = bitmap_alloc(NR_DISTANCE_VALUES, GFP_KERNEL);
+ if (!distance_map)
+ return;
+
+ bitmap_zero(distance_map, NR_DISTANCE_VALUES);
for (i = 0; i < nr_node_ids; i++) {
for (j = 0; j < nr_node_ids; j++) {
- for (k = 0; k < nr_node_ids; k++) {
- int distance = node_distance(i, k);
+ int distance = node_distance(i, j);
- if (distance > curr_distance &&
- (distance < next_distance ||
- next_distance == curr_distance))
- next_distance = distance;
-
- /*
- * While not a strong assumption it would be nice to know
- * about cases where if node A is connected to B, B is not
- * equally connected to A.
- */
- if (sched_debug() && node_distance(k, i) != distance)
- sched_numa_warn("Node-distance not symmetric");
-
- if (sched_debug() && i && !find_numa_distance(distance))
- sched_numa_warn("Node-0 not representative");
+ if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
+ sched_numa_warn("Invalid distance value range");
+ return;
}
- if (next_distance != curr_distance) {
- sched_domains_numa_distance[level++] = next_distance;
- sched_domains_numa_levels = level;
- curr_distance = next_distance;
- } else break;
+
+ bitmap_set(distance_map, distance, 1);
}
+ }
+ /*
+ * We can now figure out how many unique distance values there are and
+ * allocate memory accordingly.
+ */
+ nr_levels = bitmap_weight(distance_map, NR_DISTANCE_VALUES);
- /*
- * In case of sched_debug() we verify the above assumption.
- */
- if (!sched_debug())
- break;
+ sched_domains_numa_distance = kcalloc(nr_levels, sizeof(int), GFP_KERNEL);
+ if (!sched_domains_numa_distance) {
+ bitmap_free(distance_map);
+ return;
}
+ for (i = 0, j = 0; i < nr_levels; i++, j++) {
+ j = find_next_bit(distance_map, NR_DISTANCE_VALUES, j);
+ sched_domains_numa_distance[i] = j;
+ }
+
+ bitmap_free(distance_map);
+
/*
- * 'level' contains the number of unique distances
+ * 'nr_levels' contains the number of unique distances
*
* The sched_domains_numa_distance[] array includes the actual distance
* numbers.
@@ -1390,15 +1382,15 @@ void sched_init_numa(void)
/*
* Here, we should temporarily reset sched_domains_numa_levels to 0.
* If it fails to allocate memory for array sched_domains_numa_masks[][],
- * the array will contain less then 'level' members. This could be
+ * the array will contain less then 'nr_levels' members. This could be
* dangerous when we use it to iterate array sched_domains_numa_masks[][]
* in other functions.
*
- * We reset it to 'level' at the end of this function.
+ * We reset it to 'nr_levels' at the end of this function.
*/
sched_domains_numa_levels = 0;
- sched_domains_numa_masks = kzalloc(sizeof(void *) * level, GFP_KERNEL);
+ sched_domains_numa_masks = kzalloc(sizeof(void *) * nr_levels, GFP_KERNEL);
if (!sched_domains_numa_masks)
return;
@@ -1406,7 +1398,7 @@ void sched_init_numa(void)
* Now for each level, construct a mask per node which contains all
* CPUs of nodes that are that many hops away from us.
*/
- for (i = 0; i < level; i++) {
+ for (i = 0; i < nr_levels; i++) {
sched_domains_numa_masks[i] =
kzalloc(nr_node_ids * sizeof(void *), GFP_KERNEL);
if (!sched_domains_numa_masks[i])
@@ -1414,12 +1406,17 @@ void sched_init_numa(void)
for (j = 0; j < nr_node_ids; j++) {
struct cpumask *mask = kzalloc(cpumask_size(), GFP_KERNEL);
+ int k;
+
if (!mask)
return;
sched_domains_numa_masks[i][j] = mask;
for_each_node(k) {
+ if (sched_debug() && (node_distance(j, k) != node_distance(k, j)))
+ sched_numa_warn("Node-distance not symmetric");
+
if (node_distance(j, k) > sched_domains_numa_distance[i])
continue;
@@ -1431,7 +1428,7 @@ void sched_init_numa(void)
/* Compute default topology size */
for (i = 0; sched_domain_topology[i].mask; i++);
- tl = kzalloc((i + level + 1) *
+ tl = kzalloc((i + nr_levels) *
sizeof(struct sched_domain_topology_level), GFP_KERNEL);
if (!tl)
return;
@@ -1454,7 +1451,7 @@ void sched_init_numa(void)
/*
* .. and append 'j' levels of NUMA goodness.
*/
- for (j = 1; j < level; i++, j++) {
+ for (j = 1; j < nr_levels; i++, j++) {
tl[i] = (struct sched_domain_topology_level){
.mask = sd_numa_mask,
.sd_flags = cpu_numa_flags,
@@ -1466,8 +1463,8 @@ void sched_init_numa(void)
sched_domain_topology = tl;
- sched_domains_numa_levels = level;
- sched_max_numa_distance = sched_domains_numa_distance[level - 1];
+ sched_domains_numa_levels = nr_levels;
+ sched_max_numa_distance = sched_domains_numa_distance[nr_levels - 1];
init_numa_topology_type();
}
next prev parent reply other threads:[~2022-03-21 14:03 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-21 13:51 [PATCH 4.19 00/57] 4.19.236-rc1 review Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 01/57] Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 02/57] sctp: fix the processing for INIT chunk Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 03/57] sctp: fix the processing for INIT_ACK chunk Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 04/57] xfrm: Check if_id in xfrm_migrate Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 05/57] xfrm: Fix xfrm migrate issues when address family changes Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 06/57] arm64: dts: rockchip: fix rk3399-puma eMMC HS400 signal integrity Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 07/57] arm64: dts: rockchip: reorder rk3399 hdmi clocks Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 08/57] ARM: dts: rockchip: fix a typo on rk3288 crypto-controller Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 09/57] MIPS: smp: fill in sibling and core maps earlier Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 10/57] ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 11/57] can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 12/57] atm: firestream: check the return value of ioremap() in fs_init() Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 13/57] nl80211: Update bss channel on channel switch for P2P_CLIENT Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 14/57] tcp: make tcp_read_sock() more robust Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 15/57] sfc: extend the locking on mcdi->seqno Greg Kroah-Hartman
2022-03-21 13:51 ` [PATCH 4.19 16/57] kselftest/vm: fix tests build with old libc Greg Kroah-Hartman
2022-03-21 13:51 ` Greg Kroah-Hartman [this message]
2022-03-21 13:51 ` [PATCH 4.19 18/57] sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa() Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 19/57] ia64: ensure proper NUMA distance and possible map initialization Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 20/57] cpuset: Fix unsafe lock order between cpuset lock and cpuslock Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 21/57] mm: fix dereference a null pointer in migrate[_huge]_page_move_mapping() Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 22/57] fs: sysfs_emit: Remove PAGE_SIZE alignment check Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 23/57] arm64: Add part number for Arm Cortex-A77 Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 24/57] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 25/57] arm64: Add Cortex-X2 " Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 26/57] arm64: entry.S: Add ventry overflow sanity checks Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 27/57] arm64: entry: Make the trampoline cleanup optional Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 28/57] arm64: entry: Free up another register on kptis tramp_exit path Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 29/57] arm64: entry: Move the trampoline data page before the text page Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 30/57] arm64: entry: Allow tramp_alias to access symbols after the 4K boundary Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 31/57] arm64: entry: Dont assume tramp_vectors is the start of the vectors Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 32/57] arm64: entry: Move trampoline macros out of ifdefd section Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 33/57] arm64: entry: Make the kpti trampolines kpti sequence optional Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 34/57] arm64: entry: Allow the trampoline text to occupy multiple pages Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 35/57] arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 36/57] arm64: entry: Add vectors that have the bhb mitigation sequences Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 37/57] arm64: entry: Add macro for reading symbol addresses from the trampoline Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 38/57] arm64: Add percpu vectors for EL1 Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 39/57] arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2 Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 40/57] KVM: arm64: Add templates for BHB mitigation sequences Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 41/57] arm64: Mitigate spectre style branch history side channels Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 42/57] KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 43/57] arm64: add ID_AA64ISAR2_EL1 sys register Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 44/57] arm64: Use the clearbhb instruction in mitigations Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 45/57] crypto: qcom-rng - ensure buffer for generate is completely filled Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 46/57] ocfs2: fix crash when initialize filecheck kobj fails Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 47/57] efi: fix return value of __setup handlers Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 48/57] net/packet: fix slab-out-of-bounds access in packet_recvmsg() Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 49/57] atm: eni: Add check for dma_map_single Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 50/57] hv_netvsc: Add check for kvmalloc_array Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 51/57] drm/panel: simple: Fix Innolux G070Y2-L01 BPP settings Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 52/57] net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit() Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 53/57] net: dsa: Add missing of_node_put() in dsa_port_parse_of Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 54/57] usb: gadget: rndis: prevent integer overflow in rndis_set_response() Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 55/57] usb: gadget: Fix use-after-free bug by not setting udc->dev.driver Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 56/57] Input: aiptek - properly check endpoint type Greg Kroah-Hartman
2022-03-21 13:52 ` [PATCH 4.19 57/57] perf symbols: Fix symbol size calculation condition Greg Kroah-Hartman
2022-03-21 18:00 ` [PATCH 4.19 00/57] 4.19.236-rc1 review Pavel Machek
2022-03-21 23:24 ` Shuah Khan
2022-03-22 1:59 ` Guenter Roeck
2022-03-22 12:54 ` Sudip Mukherjee
2022-03-22 15:23 ` Naresh Kamboju
2022-03-23 0:55 ` Samuel Zou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220321133222.485936257@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=dann.frazier@canonical.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=valentin.schneider@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox