From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Scott Cheloha <cheloha@linux.ibm.com>,
Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Laurent Dufour <ldufour@linux.vnet.ibm.com>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
Date: Fri, 9 Apr 2021 15:44:09 +0530 [thread overview]
Message-ID: <20210409101409.GA2633526@linux.vnet.ibm.com> (raw)
In-Reply-To: <87czvdbova.fsf@linux.ibm.com>
Hey Nathan,
>
> Regardless of your change: at boot time, this set of calls to
> set_numa_node() and set_numa_mem() is redundant, right? Because
> smp_prepare_cpus() has:
>
> for_each_possible_cpu(cpu) {
> ...
> if (cpu_present(cpu)) {
> set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
> set_cpu_numa_mem(cpu,
> local_memory_node(numa_cpu_lookup_table[cpu]));
> }
>
> I would rather that, when onlining a CPU that happens to have been
> dynamically added after boot, we enter start_secondary() with conditions
> equivalent to those at boot time. Or as close to that as is practical.
>
> So I'd suggest that pseries_add_processor() be made to update
> these things when the CPUs are marked present, before onlining them.
I did try updating at pseries_add_processor when things were being marked as
present. But looks like the zonelists may not be updated at that time.
I saw couple of crashes in local_memory_node() when dplar adding CPUs.
(Appending the patch that causes these crash to this mail for your reference)
[ 293.669901] Kernel attempted to read user page (1508) - exploit attempt? (uid: 0) [1309/17174]
[ 293.669925] BUG: Kernel NULL pointer dereference on read at 0x00001508
[ 293.669935] Faulting instruction address: 0xc000000000484ae0
[ 293.669947] Oops: Kernel access of bad area, sig: 11 [#1]
[ 293.669956] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 293.669969] Modules linked in: nft_counter nft_compat rpadlpar_io rpaphp
mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag
af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_ fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set
nf_tables nfnetlink dm_multipath pseries_rng xts vmx_c rypto uio_pdrv_genirq
uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth
scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod fuse
[ 293.670101] CPU: 17 PID: 3183 Comm: drmgr Not tainted 5.12.0-rc5-master+ #2
[ 293.670113] NIP: c000000000484ae0 LR: c00000000010a548 CTR: 00000000006dd130
[ 293.670121] REGS: c0000025a9997160 TRAP: 0300 Not tainted (5.12.0-rc5-master+)
[ 293.670131] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48008422 XER: 00000008
[ 293.670155] CFAR: c00000000010a544 DAR: 0000000000001508 DSISR: 40000000 IRQMASK: 0
[ 293.670155] GPR00: c00000000010a548 c0000025a9997400 c000000001f55500 0000000000000000
[ 293.670155] GPR04: c000000001a3c598 0000000000000006 0000000000000027 c0000035873b8018
[ 293.670155] GPR08: 0000000000000023 c0000000020464f8 00000035894d0000 c000003584c8ffe8
[ 293.670155] GPR12: 0000000028008424 c0000035bf22ce80 0000000000000000 0000000000000000
[ 293.670155] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 293.670155] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000001fbc160
[ 293.670155] GPR24: 0000000000000002 0000000000000200 c000000001fc04c8 0000000000000001
[ 293.670155] GPR28: c0000000010c6fc8 c000000001fc08a0 c000002f8fb99e60 0000000000000040
[ 293.670263] NIP [c000000000484ae0] local_memory_node+0x20/0x80
[ 293.670281] LR [c00000000010a548] pseries_add_processor+0x368/0x3b0
[ 293.670292] Call Trace:
[ 293.670297] [c0000025a9997400] [c00000000010a524] pseries_add_processor+0x344/0x3b0 (unreliable)
[ 293.670311] [c0000025a99976c0] [c00000000010a790] pseries_smp_notifier+0x200/0x2a0
[ 293.670322] [c0000025a9997780] [c000000000197288] blocking_notifier_call_chain+0xa8/0x110
[ 293.670335] [c0000025a99977d0] [c000000000b27010] of_attach_node+0xc0/0x110
[ 293.670347] [c0000025a9997830] [c0000000001032a0] dlpar_attach_node+0x30/0x70
[ 293.670358] [c0000025a99978a0] [c000000000109ac0] dlpar_cpu_add+0x1d0/0x510
[ 293.670368] [c0000025a9997980] [c00000000010a898] dlpar_cpu+0x68/0x6e0
[ 293.670377] [c0000025a9997a80] [c0000000001036b8] handle_dlpar_errorlog+0xf8/0x190
[ 293.670388] [c0000025a9997af0] [c000000000103928] dlpar_store+0x178/0x4a0
[ 293.670396] [c0000025a9997bb0] [c0000000007df050] kobj_attr_store+0x30/0x50
[ 293.670408] [c0000025a9997bd0] [c00000000062f0b0] sysfs_kf_write+0x70/0xb0
[ 293.670421] [c0000025a9997c10] [c00000000062d4e0] kernfs_fop_write_iter+0x1d0/0x280
[ 293.670432] [c0000025a9997c60] [c00000000051673c] new_sync_write+0x14c/0x1d0
[ 293.670445] [c0000025a9997d00] [c000000000519df4] vfs_write+0x264/0x380
[ 293.670455] [c0000025a9997d60] [c00000000051a0ec] ksys_write+0x7c/0x140
[ 293.670464] [c0000025a9997db0] [c000000000032af0] system_call_exception+0x150/0x290
[ 293.670475] [c0000025a9997e10] [c00000000000d45c] system_call_common+0xec/0x278
[ 293.670486] --- interrupt: c00 at 0x20000025bd74
That leaves us with just 2 options for now.
1. Update numa_mem later and only update numa_node here.
- Over a longer period of time, this would be more confusing since we
may lose track of why we are splitting the set of numa_node and numa_mem.
or
2. Use my earlier patch.
My choice would be to go with my earlier patch.
Please do let me know your thoughts on the same.
--
Thanks and Regards
Srikar Dronamraju
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 3beeb030cd78..1cdd83703f93 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -44,6 +44,7 @@ extern void __init dump_numa_cpu_topology(void);
extern int sysfs_add_device_to_node(struct device *dev, int nid);
extern void sysfs_remove_device_from_node(struct device *dev, int nid);
+extern int numa_setup_cpu(unsigned long cpu);
static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node)
{
@@ -81,6 +82,11 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
{
}
+static int numa_setup_cpu(unsigned long cpu)
+{
+ return first_online_node;
+}
+
static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) {}
static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5a4d59a1070d..0d0c71ba4672 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1539,9 +1539,6 @@ void start_secondary(void *unused)
shared_caches = true;
}
- set_numa_node(numa_cpu_lookup_table[cpu]);
- set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
-
smp_wmb();
notify_cpu_starting(cpu);
set_cpu_online(cpu, true);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..11914a28db67 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -501,7 +501,7 @@ static int vphn_get_nid(long unused)
* Figure out to which domain a cpu belongs and stick it there.
* Return the id of the domain used.
*/
-static int numa_setup_cpu(unsigned long lcpu)
+int numa_setup_cpu(unsigned long lcpu)
{
struct device_node *cpu;
int fcpu = cpu_first_thread_sibling(lcpu);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..311fbe916d5b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -198,9 +198,14 @@ static int pseries_add_processor(struct device_node *np)
}
for_each_cpu(cpu, tmp) {
+ int nid;
+
BUG_ON(cpu_present(cpu));
set_cpu_present(cpu, true);
set_hard_smp_processor_id(cpu, be32_to_cpu(*intserv++));
+ nid = numa_setup_cpu(cpu);
+ set_cpu_numa_node(cpu, nid);
+ set_cpu_numa_mem(cpu, local_memory_node(nid));
}
err = 0;
out_unlock:
next prev parent reply other threads:[~2021-04-09 10:15 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-01 15:42 [PATCH 1/1] powerpc/smp: Set numa node before updating mask Srikar Dronamraju
2021-04-01 22:51 ` Nathan Lynch
2021-04-02 3:18 ` Srikar Dronamraju
2021-04-07 12:19 ` Nathan Lynch
2021-04-07 16:49 ` Srikar Dronamraju
2021-04-07 19:46 ` Nathan Lynch
2021-04-08 11:11 ` Srikar Dronamraju
2021-04-08 20:00 ` Nathan Lynch
2021-04-09 10:14 ` Srikar Dronamraju [this message]
2021-04-13 12:23 ` Nathan Lynch
2021-04-13 12:25 ` Nathan Lynch
2021-04-19 4:00 ` Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210409101409.GA2633526@linux.vnet.ibm.com \
--to=srikar@linux.vnet.ibm.com \
--cc=Geetika.Moolchandani1@ibm.com \
--cc=cheloha@linux.ibm.com \
--cc=ego@linux.vnet.ibm.com \
--cc=ldufour@linux.vnet.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mingo@kernel.org \
--cc=nathanl@linux.ibm.com \
--cc=peterz@infradead.org \
--cc=valentin.schneider@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).