From: Michael Bringmann <mwb@linux.vnet.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH V7 3/3] hotplug/cpu: Fix crash with memoryless nodes
Date: Mon, 27 Nov 2017 14:36:01 -0600 [thread overview]
Message-ID: <73c6417b-5120-7915-cfac-2c187543e965@linux.vnet.ibm.com> (raw)
In-Reply-To: <08960615-15b4-1a14-a10b-9f0053c8163e@linux.vnet.ibm.com>
See below.
On 11/20/2017 10:50 AM, Nathan Fontenot wrote:
> On 11/16/2017 11:28 AM, Michael Bringmann wrote:
>> On powerpc systems with shared configurations of CPUs and memory and
>> memoryless nodes at boot, an event ordering problem was observed on
>> a SLES12 build platforms with the hot-add of CPUs to the memoryless
>> nodes.
>>
>> * The most common error occurred when the memory SLAB driver attempted
>> to reference the memoryless node to which a CPU was being added
>> before the kernel had finished initializing all of the data structures
>> for the CPU and exited 'device_online' under DLPAR/hot-add.
>>
>> Normally the memoryless node would be initialized through the call
>> path device_online ... arch_update_cpu_topology ... find_cpu_nid
>> ... try_online_node. This patch ensures that the powerpc node will
>> be initialized as early as possible, even if it was memoryless and
>> CPU-less at the point when we are trying to hot-add a new CPU to it.
>>
>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>> ---
>> Changes in V7:
>> -- Make function find_cpu_nid() externally visible/usable so that
>> it may be used from hotplug-cpu.c
>> ---
>> arch/powerpc/mm/numa.c | 3 ++-
>> arch/powerpc/platforms/pseries/hotplug-cpu.c | 3 +++
>> 2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 163f4cc..d6d4f7c 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -1310,7 +1310,7 @@ static long vphn_get_associativity(unsigned long cpu,
>> return rc;
>> }
>>
>> -static inline int find_cpu_nid(int cpu)
>> +int find_cpu_nid(int cpu)
>> {
>> __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
>> int new_nid;
>> @@ -1343,6 +1343,7 @@ static inline int find_cpu_nid(int cpu)
>> #endif
>> }
>>
>> + printk(KERN_INFO "%s:%d cpu %d nid %d\n", __FUNCTION__, __LINE__, cpu, new_nid);
>
> This seems like a more likely pr_debug statement.
Yes. Changed.
>> return new_nid;
>> }
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index a7d14aa7..df8c732 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -340,6 +340,8 @@ static void pseries_remove_processor(struct device_node *np)
>> cpu_maps_update_done();
>> }
>>
>> +extern int find_cpu_nid(int cpu);
>> +
>> static int dlpar_online_cpu(struct device_node *dn)
>> {
>> int rc = 0;
>> @@ -364,6 +366,7 @@ static int dlpar_online_cpu(struct device_node *dn)
>> != CPU_STATE_OFFLINE);
>> cpu_maps_update_done();
>> timed_topology_update(1);
>> + find_cpu_nid(cpu);
>
> We don't use the returned node from this call, so I'm not sure why it gets
> called. Perhaps its the possible call to try_online_node() that may get called
> in find_cpu_nid(), if so perhpas naming the routine something slightly
> different would be good, like find_and_online_cpu_nid?
The function find_cpu_nid() gets called here for its operations to ensure that
the node is online and initialized during the 'device_online()' operation for
the new CPU that immediately follows. As I tried to explain in the patch
description, I ran into a case during the SLES12 SP3 backport test where failure
to initialize/online the node early on led to a crash in the memory SLAB driver.
I don't know why that driver was being called so early, but ensuring that the
node was setup as early as possible provided the simplest and smallest patch.
As all of the functionality had been placed in the function 'find_cpu_nid' in
'numa.c' during a previous patch, I reused it here. Though I did not need
the actual node Id during the call dlpar_online_cpu().
Yes, I can provide a separate calling interface named 'find_and_online_cpu_nid'
for the call from 'dlpar_online_cpu'. They would both do the same operations.
Or I could just use the new name for the current implementation of 'find_cpu_nid',
instead, and migrate that name to the previous patches.
If I don't hear any further remarks on this point, I will implement/migrate
the function name 'find_and_online_cpu_nid' in the next version of this patch.
>
> -Nathan
Michael
>> rc = device_online(get_cpu_device(cpu));
>> if (rc)
>> goto out;
>>
>
>
--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@linux.vnet.ibm.com
prev parent reply other threads:[~2017-11-27 20:36 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-16 17:24 [PATCH V7 0/2] powerpc/nodes: Fix issues with memoryless nodes Michael Bringmann
2017-11-16 17:24 ` [PATCH V7 1/3] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
2017-11-20 16:33 ` Nathan Fontenot
2017-11-22 11:17 ` Michael Ellerman
2017-11-27 20:02 ` Michael Bringmann
2017-11-27 20:02 ` Michael Bringmann
2017-11-16 17:24 ` [PATCH V7 2/3] poserpc/initnodes: Ensure nodes initialized for hotplug Michael Bringmann
2017-11-16 17:27 ` RESEND " Michael Bringmann
2017-11-20 16:45 ` Nathan Fontenot
2017-11-27 20:16 ` Michael Bringmann
2017-11-16 17:28 ` [PATCH V7 3/3] hotplug/cpu: Fix crash with memoryless nodes Michael Bringmann
2017-11-20 16:50 ` Nathan Fontenot
2017-11-27 20:36 ` Michael Bringmann [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=73c6417b-5120-7915-cfac-2c187543e965@linux.vnet.ibm.com \
--to=mwb@linux.vnet.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).