From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Valentin Schneider <valentin.schneider@arm.com>,
Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.14 28/47] sched/topology: Skip updating masks for non-online nodes
Date: Sun, 5 Sep 2021 21:19:32 -0400
Message-ID: <20210906011951.928679-28-sashal@kernel.org>
In-Reply-To: <20210906011951.928679-1-sashal@kernel.org>
From: Valentin Schneider <valentin.schneider@arm.com>

[ Upstream commit 0083242c93759dde353a963a90cb351c5c283379 ]

The scheduler currently expects NUMA node distances to be stable from
init onwards, and as a consequence builds the related data structures
once-and-for-all at init (see sched_init_numa()).

Unfortunately, on some architectures node distance is unreliable for
offline nodes and may very well change upon onlining.

Skip over offline nodes during sched_init_numa(). Track nodes that have
been onlined at least once, and trigger a build of a node's NUMA masks
when it is first onlined post-init.

Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210818074333.48645-1-srikar@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
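
The approach in the commit message can be tried out with a small
userspace model. This is only a sketch under stated assumptions, not
the kernel code: every toy_* name, the 3-node distance table and the
use of a uint32_t as a CPU mask are made up for illustration, and the
model assumes a node is marked online before its first CPU comes up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_NODES  3
#define NR_LEVELS 3

/* Hypothetical topology: node 1 sits between nodes 0 and 2. */
static const int toy_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 40 },
	{ 20, 10, 20 },
	{ 40, 20, 10 },
};
/* Unique distance values, sorted; level 0 means "local". */
static const int toy_level_distance[NR_LEVELS] = { 10, 20, 40 };

static bool     toy_online[NR_NODES];
static uint32_t toy_onlined_once;               /* bit n: node n was online at least once */
static uint32_t toy_masks[NR_LEVELS][NR_NODES]; /* CPU bitmask per (level, node) */

/* Init: record which nodes are online now; all masks start empty. */
void toy_init(uint32_t online_nodes)
{
	memset(toy_masks, 0, sizeof(toy_masks));
	toy_onlined_once = online_nodes;
	for (int n = 0; n < NR_NODES; n++)
		toy_online[n] = (online_nodes >> n) & 1u;
}

/* First bringup of a node: pull the CPUs of nearby online nodes into its masks. */
static void toy_node_first_online(int node)
{
	if (toy_onlined_once & (1u << node))
		return;
	toy_onlined_once |= 1u << node;

	for (int i = 0; i < NR_LEVELS; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			if (!toy_online[j] || j == node)
				continue;
			if (toy_distance[j][node] > toy_level_distance[i])
				continue;
			/* Level-0 mask of node j holds exactly j's local CPUs. */
			toy_masks[i][node] |= toy_masks[0][j];
		}
	}
}

/* CPU bringup: build the node's masks if needed, then add the CPU to
 * the masks of every online node within range at each level. */
void toy_cpu_online(int cpu, int node)
{
	assert(cpu >= 0 && cpu < 32 && node >= 0 && node < NR_NODES);
	toy_online[node] = true; /* assumption: node is online before its CPU */
	toy_node_first_online(node);

	for (int i = 0; i < NR_LEVELS; i++)
		for (int j = 0; j < NR_NODES; j++)
			if (toy_online[j] &&
			    toy_distance[j][node] <= toy_level_distance[i])
				toy_masks[i][j] |= 1u << cpu;
}

uint32_t toy_mask(int level, int node)
{
	return toy_masks[level][node];
}
```

With toy_init(0x3) (nodes 0 and 1 online) followed by
toy_cpu_online(0, 0), toy_cpu_online(1, 1) and toy_cpu_online(2, 2),
node 2's level-1 mask ends up holding CPUs 1 and 2. If the first-online
step were dropped, that mask would hold CPU 2 only, which is the gap
this patch closes.
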
kernel/sched/topology.c | 65 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b77ad49dc14f..4e8698e62f07 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1482,6 +1482,8 @@ int sched_max_numa_distance;
 static int *sched_domains_numa_distance;
 static struct cpumask ***sched_domains_numa_masks;
 int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE;
+
+static unsigned long __read_mostly *sched_numa_onlined_nodes;
 #endif

 /*
@@ -1833,6 +1835,16 @@ void sched_init_numa(void)
 			sched_domains_numa_masks[i][j] = mask;

 			for_each_node(k) {
+				/*
+				 * Distance information can be unreliable for
+				 * offline nodes, defer building the node
+				 * masks to its bringup.
+				 * This relies on all unique distance values
+				 * still being visible at init time.
+				 */
+				if (!node_online(j))
+					continue;
+
 				if (sched_debug() && (node_distance(j, k) != node_distance(k, j)))
 					sched_numa_warn("Node-distance not symmetric");

@@ -1886,6 +1898,53 @@ void sched_init_numa(void)
 	sched_max_numa_distance = sched_domains_numa_distance[nr_levels - 1];

 	init_numa_topology_type();
+
+	sched_numa_onlined_nodes = bitmap_alloc(nr_node_ids, GFP_KERNEL);
+	if (!sched_numa_onlined_nodes)
+		return;
+
+	bitmap_zero(sched_numa_onlined_nodes, nr_node_ids);
+	for_each_online_node(i)
+		bitmap_set(sched_numa_onlined_nodes, i, 1);
+}
+
+static void __sched_domains_numa_masks_set(unsigned int node)
+{
+	int i, j;
+
+	/*
+	 * NUMA masks are not built for offline nodes in sched_init_numa().
+	 * Thus, when a CPU of a never-onlined-before node gets plugged in,
+	 * adding that new CPU to the right NUMA masks is not sufficient: the
+	 * masks of that CPU's node must also be updated.
+	 */
+	if (test_bit(node, sched_numa_onlined_nodes))
+		return;
+
+	bitmap_set(sched_numa_onlined_nodes, node, 1);
+
+	for (i = 0; i < sched_domains_numa_levels; i++) {
+		for (j = 0; j < nr_node_ids; j++) {
+			if (!node_online(j) || node == j)
+				continue;
+
+			if (node_distance(j, node) > sched_domains_numa_distance[i])
+				continue;
+
+			/* Add remote nodes in our masks */
+			cpumask_or(sched_domains_numa_masks[i][node],
+				   sched_domains_numa_masks[i][node],
+				   sched_domains_numa_masks[0][j]);
+		}
+	}
+
+	/*
+	 * A new node has been brought up, potentially changing the topology
+	 * classification.
+	 *
+	 * Note that this is racy vs any use of sched_numa_topology_type :/
+	 */
+	init_numa_topology_type();
 }

 void sched_domains_numa_masks_set(unsigned int cpu)
@@ -1893,8 +1952,14 @@ void sched_domains_numa_masks_set(unsigned int cpu)
 	int node = cpu_to_node(cpu);
 	int i, j;

+	__sched_domains_numa_masks_set(node);
+
 	for (i = 0; i < sched_domains_numa_levels; i++) {
 		for (j = 0; j < nr_node_ids; j++) {
+			if (!node_online(j))
+				continue;
+
+			/* Set ourselves in the remote node's masks */
 			if (node_distance(j, node) <= sched_domains_numa_distance[i])
 				cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]);
 		}
--
2.30.2