linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH V8 0/3] powerpc/nodes: Fix issues with memoryless nodes
@ 2017-11-28 22:58 Michael Bringmann
  2017-11-28 22:58 ` [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Michael Bringmann @ 2017-11-28 22:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michael Bringmann, Nathan Fontenot

powerpc/nodes: Ensure enough nodes avail for operations

powerpc/initnodes: Ensure nodes initialized for hotplug

hotplug/cpu: Fix crash with memoryless nodes

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>

Michael Bringmann (3):
  powerpc/nodes: Ensure enough nodes avail for operations
  powerpc/initnodes: Ensure nodes initialized for hotplug
  hotplug/cpu: Fix crash with memoryless nodes
---
Changes in V8:
  -- Remove unneeded pr_info() statement
  -- Clarify 'resources' as 'CPUs' in patch description regarding
     VPHN call.  Add another clause to statement mentioning that
     shared CPUs start in node 0, and are finally assigned per
     VPHN information.
  -- Change a 'printk(KERN_INFO ...)' statement to be a pr_debug()
     statement.
  -- Rename 'find_cpu_nid' to 'find_and_online_cpu_nid' for better
     clarity of its function.
  -- Restore '__init' tag to definition of 'setup_node_data'


* [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations
  2017-11-28 22:58 [PATCH V8 0/3] powerpc/nodes: Fix issues with memoryless nodes Michael Bringmann
@ 2017-11-28 22:58 ` Michael Bringmann
  2018-01-08 19:13   ` Nathan Fontenot
  2018-01-29  4:13   ` [V8,1/3] " Michael Ellerman
  2017-11-28 22:58 ` [PATCH V8 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug Michael Bringmann
  2017-11-28 22:58 ` [PATCH V8 3/3] hotplug/cpu: Fix crash with memoryless nodes Michael Bringmann
  2 siblings, 2 replies; 8+ messages in thread
From: Michael Bringmann @ 2017-11-28 22:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michael Bringmann, Nathan Fontenot

On powerpc systems which allow 'hot-add' of CPU or memory resources,
it may occur that the new resources are to be inserted into nodes
that were not used for these resources at bootup.  In the kernel,
any node that is used must be defined and initialized.  These empty
nodes may occur in the following situations:

* Dedicated vs. shared resources.  Shared resources require
  information such as the VPHN hcall for CPU assignment to nodes.
  Associativity decisions made based on dedicated resource rules,
  such as associativity properties in the device tree, may vary
  from decisions made using the values returned by the VPHN hcall.
* memoryless nodes at boot.  Nodes need to be defined as 'possible'
  at boot for operation with other code modules.  Previously, the
  powerpc code would limit the set of possible nodes to those which
  have memory assigned at boot, and were thus online.  Subsequent
  add/remove of CPUs or memory would only work with this subset of
  possible nodes.
* memoryless nodes with CPUs at boot.  Due to the previous restriction
  on nodes, nodes that had CPUs but no memory were being collapsed
  into other nodes that did have memory at boot.  In practice this
  meant that the node assignment presented by the runtime kernel
  differed from the affinity and associativity attributes presented
  by the device tree or VPHN hcalls.  Nodes that might be known to
  the pHyp were not 'possible' in the runtime kernel because they did
  not have memory at boot.

This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot.  This
patch set fixes a couple of problems.

* Nodes known to powerpc to be memoryless at boot, but to have
  CPUs in them are allowed to be 'possible' and 'online'.  Memory
  allocations for those nodes are taken from another node that does
  have memory until and if memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc.  Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting to one of the set of nodes that
  were known to have memory at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.  This new setting will
override the instruction,

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the "ibm,max-associativity-domains" property is not present at boot,
no additional nodes are defined or enabled, and the set of possible
nodes remains as computed by the above 'nodes_and()'.
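
As an illustration only, the order of operations described above can be sketched in user space with a plain bitmask. The type and function names below (nodemask_sketch_t, sketch_possible_nodes) and the have_property flag are hypothetical stand-ins, not kernel interfaces; the sketch assumes at most 64 nodes.

```c
#include <stdint.h>

/* Hypothetical stand-in for a nodemask: bit n set means node n is in the set. */
typedef uint64_t nodemask_sketch_t;

/*
 * Mirror the sequence in initmem_init() after this patch:
 * intersect 'possible' with 'online' (the nodes_and() step), then
 * mark nodes 0..numnodes-1 as possible if the property was found.
 * have_property == 0 models a missing "ibm,max-associativity-domains".
 */
static nodemask_sketch_t sketch_possible_nodes(nodemask_sketch_t possible,
					       nodemask_sketch_t online,
					       int have_property,
					       unsigned int numnodes)
{
	unsigned int i;

	possible &= online;	/* nodes_and(possible, possible, online) */

	if (!have_property)
		return possible;	/* nothing added; the intersection stands */

	for (i = 0; i < numnodes && i < 64; i++)
		possible |= (nodemask_sketch_t)1 << i;	/* node_set(i, ...) */

	return possible;
}
```

With only nodes 0 and 1 online and a property value of 4, the sketch yields nodes 0-3 as possible; without the property it yields nodes 0 and 1 only.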

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
Changes in V8:
  -- Remove unneeded pr_info() statement
---
 arch/powerpc/mm/numa.c |   37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index adb6364f..735e3fd 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,34 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init find_possible_nodes(void)
+{
+	struct device_node *rtas;
+	u32 numnodes, i;
+
+	if (min_common_depth <= 0)
+		return;
+
+	rtas = of_find_node_by_path("/rtas");
+	if (!rtas)
+		return;
+
+	if (of_property_read_u32_index(rtas,
+				"ibm,max-associativity-domains",
+				min_common_depth, &numnodes))
+		goto out;
+
+	for (i = 0; i < numnodes; i++) {
+		if (!node_possible(i)) {
+			setup_node_data(i, 0, 0);
+			node_set(i, node_possible_map);
+		}
+	}
+
+out:
+	of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
 	int nid, cpu;
@@ -905,12 +933,15 @@ void __init initmem_init(void)
 	memblock_dump_all();
 
 	/*
-	 * Reduce the possible NUMA nodes to the online NUMA nodes,
-	 * since we do not support node hotplug. This ensures that  we
-	 * lower the maximum NUMA node ID to what is actually present.
+	 * Modify the set of possible NUMA nodes to reflect information
+	 * available about the set of online nodes, and the set of nodes
+	 * that we expect to make use of for this platform's affinity
+	 * calculations.
 	 */
 	nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+	find_possible_nodes();
+
 	for_each_online_node(nid) {
 		unsigned long start_pfn, end_pfn;
 


* [PATCH V8 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug
  2017-11-28 22:58 [PATCH V8 0/3] powerpc/nodes: Fix issues with memoryless nodes Michael Bringmann
  2017-11-28 22:58 ` [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
@ 2017-11-28 22:58 ` Michael Bringmann
  2018-01-08 19:14   ` Nathan Fontenot
  2017-11-28 22:58 ` [PATCH V8 3/3] hotplug/cpu: Fix crash with memoryless nodes Michael Bringmann
  2 siblings, 1 reply; 8+ messages in thread
From: Michael Bringmann @ 2017-11-28 22:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michael Bringmann, Nathan Fontenot

On powerpc systems which allow 'hot-add' of CPU, it may occur that
the new resources are to be inserted into nodes that were not used
for memory resources at bootup.  Many different configurations of
PowerPC resources may need to be supported depending upon the
environment.  Important characteristics of the nodes and operating
environment include:

* Dedicated vs. shared CPUs.  Shared CPUs require information such
  as the VPHN hcall for CPU assignment to nodes, since shared CPUs
  have their affinity set to node 0 at boot and when hot-added.
  Associativity decisions made based on dedicated resource rules,
  such as associativity properties in the device tree, may vary from
  decisions made using the values returned by the VPHN hcall.
* memoryless nodes at boot.  Nodes need to be defined as 'possible'
  at boot for operation with other code modules.  Previously, the
  powerpc code would limit the set of possible nodes to those which
  have memory assigned at boot, and were thus online.  Subsequent
  add/remove of CPUs or memory would only work with this subset of
  possible nodes.
* memoryless nodes with CPUs at boot.  Due to the previous restriction
  on nodes, nodes that had CPUs but no memory were being collapsed
  into other nodes that did have memory at boot.  In practice this
  meant that the node assignment presented by the runtime kernel
  differed from the affinity and associativity attributes presented
  by the device tree or VPHN hcalls.  Nodes that might be known to
  the pHyp were not 'possible' in the runtime kernel because they did
  not have memory at boot.

This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot.
The problems of interest include,

* Nodes known to powerpc to be memoryless at boot, but to have
  CPUs in them are allowed to be 'possible' and 'online'.  Memory
  allocations for those nodes are taken from another node that does
  have memory until and if memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc.  Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting the references to one of the set
  of nodes known to have memory at boot.

Note that this software operates under the context of CPU hotplug.
We are not doing memory hotplug in this code, but rather updating
the kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology).  We are initializing a node that may be
used by CPUs or memory before it can be referenced as invalid by a
CPU hotplug operation.  CPU hotplug operations are protected by a
range of APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected
by mem_hotplug_begin/mem_hotplug_done, device locks, and more.  In
the case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap.  Using HMC hot-add/hot-remove operations, we have been able
to add and remove CPUs to any possible node without failures.  HMC
operations involve a degree of self-serialization, though.
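
For readers following the node-selection rules, here is a minimal user-space sketch of the fallback decisions made when choosing a node for a CPU. It is not the kernel code: struct sketch_node, sketch_pick_nid() and the memory_hotplug flag are hypothetical names that model node_possible(), NODE_DATA(), try_online_node() and CONFIG_MEMORY_HOTPLUG respectively, and locking is left out entirely.

```c
#define SKETCH_MAX_NODES 8

/* Hypothetical per-node state for illustration. */
struct sketch_node {
	int possible;	/* models node_possible(nid) */
	int has_data;	/* models NODE_DATA(nid) != NULL */
	int can_online;	/* models try_online_node(nid) succeeding */
};

/*
 * Pick the node for a CPU whose associativity maps to 'nid'.
 * Falls back to 'first_online' when the node is invalid, not
 * possible, or cannot be initialized.
 */
static int sketch_pick_nid(struct sketch_node *nodes, int nid,
			   int first_online, int memory_hotplug)
{
	if (nid < 0 || nid >= SKETCH_MAX_NODES || !nodes[nid].possible)
		nid = first_online;

	if (!nodes[nid].has_data) {
		if (memory_hotplug && nodes[nid].can_online)
			nodes[nid].has_data = 1;	/* node brought online */
		else
			nid = first_online;	/* fall back to a usable node */
	}

	return nid;
}
```

The point of the sketch is that a possible but uninitialized node is onlined when that is supported, and is replaced by the first online node otherwise.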

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
Changes in V8:
  -- Clarify 'resources' as CPUs in patch description regarding
     VPHN call.  Add another clause to statement mentioning that
     shared CPUs start in node 0, and are finally assigned per
     VPHN information.
  -- Rename 'find_cpu_nid' to 'find_and_online_cpu_nid' for better
     clarity of its function.
  -- Restore '__init' tag to definition of 'setup_node_data'
---
 arch/powerpc/mm/numa.c |   49 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 735e3fd..6b08dd8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
 	nid = of_node_to_nid_single(cpu);
 
 out_present:
-	if (nid < 0 || !node_online(nid))
+	if (nid < 0 || !node_possible(nid))
 		nid = first_online_node;
 
 	map_cpu_to_node(lcpu, nid);
@@ -910,10 +910,8 @@ static void __init find_possible_nodes(void)
 		goto out;
 
 	for (i = 0; i < numnodes; i++) {
-		if (!node_possible(i)) {
-			setup_node_data(i, 0, 0);
+		if (!node_possible(i))
 			node_set(i, node_possible_map);
-		}
 	}
 
 out:
@@ -1309,6 +1307,42 @@ static long vphn_get_associativity(unsigned long cpu,
 	return rc;
 }
 
+static inline int find_and_online_cpu_nid(int cpu)
+{
+	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+	int new_nid;
+
+	/* Use associativity from first thread for all siblings */
+	vphn_get_associativity(cpu, associativity);
+	new_nid = associativity_to_nid(associativity);
+	if (new_nid < 0 || !node_possible(new_nid))
+		new_nid = first_online_node;
+
+	if (NODE_DATA(new_nid) == NULL) {
+#ifdef CONFIG_MEMORY_HOTPLUG
+		/*
+		 * Need to ensure that NODE_DATA is initialized
+		 * for a node from available memory (see
+		 * memblock_alloc_try_nid).  If unable to init
+		 * the node, then default to nearest node that
+		 * has memory installed.
+		 */
+		if (try_online_node(new_nid))
+			new_nid = first_online_node;
+#else
+		/*
+		 * Default to using the nearest node that has
+		 * memory installed.  Otherwise, it would be 
+		 * necessary to patch the kernel MM code to deal
+		 * with more memoryless-node error conditions.
+		 */
+		new_nid = first_online_node;
+#endif
+	}
+
+	return new_nid;
+}
+
 /*
  * Update the CPU maps and sysfs entries for a single CPU when its NUMA
  * characteristics change. This function doesn't perform any locking and is
@@ -1376,7 +1410,6 @@ int numa_update_cpu_topology(bool cpus_locked)
 {
 	unsigned int cpu, sibling, changed = 0;
 	struct topology_update_data *updates, *ud;
-	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	cpumask_t updated_cpus;
 	struct device *dev;
 	int weight, new_nid, i = 0;
@@ -1414,11 +1447,7 @@ int numa_update_cpu_topology(bool cpus_locked)
 			continue;
 		}
 
-		/* Use associativity from first thread for all siblings */
-		vphn_get_associativity(cpu, associativity);
-		new_nid = associativity_to_nid(associativity);
-		if (new_nid < 0 || !node_online(new_nid))
-			new_nid = first_online_node;
+		new_nid = find_and_online_cpu_nid(cpu);
 
 		if (new_nid == numa_cpu_lookup_table[cpu]) {
 			cpumask_andnot(&cpu_associativity_changes_mask,


* [PATCH V8 3/3] hotplug/cpu: Fix crash with memoryless nodes
  2017-11-28 22:58 [PATCH V8 0/3] powerpc/nodes: Fix issues with memoryless nodes Michael Bringmann
  2017-11-28 22:58 ` [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
  2017-11-28 22:58 ` [PATCH V8 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug Michael Bringmann
@ 2017-11-28 22:58 ` Michael Bringmann
  2018-01-08 19:15   ` Nathan Fontenot
  2 siblings, 1 reply; 8+ messages in thread
From: Michael Bringmann @ 2017-11-28 22:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michael Bringmann, Nathan Fontenot

On powerpc systems with shared configurations of CPUs and memory and
memoryless nodes at boot, an event ordering problem was observed on
SLES12 build platforms with the hot-add of CPUs to the memoryless
nodes.

* The most common error occurred when the memory SLAB driver attempted
  to reference the memoryless node to which a CPU was being added
  before the kernel had finished initializing all of the data structures
  for the CPU and exited 'device_online' under DLPAR/hot-add.

  Normally the memoryless node would be initialized through the call
  path device_online ... arch_update_cpu_topology ...
  find_and_online_cpu_nid ... try_online_node.  This patch ensures that
  the powerpc node will be initialized as early as possible, even if it
  was memoryless and CPU-less at the point when we are trying to hot-add
  a new CPU to it.
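
The ordering problem can be pictured with a toy user-space model. Everything here is hypothetical (struct sketch_state, sketch_device_online(), sketch_hot_add()); it only assumes, as described above, that something inside the device-online step touches the node before the topology update has initialized it.

```c
/* Hypothetical state for one hot-add operation. */
struct sketch_state {
	int node_ready;	/* node data structures initialized */
	int faults;	/* references made to an uninitialized node */
};

/* Models onlining the node ahead of time. */
static void sketch_online_node(struct sketch_state *s)
{
	s->node_ready = 1;
}

/*
 * Models device_online(): an allocator touches the node first,
 * and only afterwards does the topology update initialize it.
 */
static void sketch_device_online(struct sketch_state *s)
{
	if (!s->node_ready)
		s->faults++;	/* early reference to an uninitialized node */
	s->node_ready = 1;	/* late initialization via topology update */
}

/* Returns the number of early references seen during one hot-add. */
static int sketch_hot_add(int online_node_first)
{
	struct sketch_state s = { 0, 0 };

	if (online_node_first)
		sketch_online_node(&s);
	sketch_device_online(&s);

	return s.faults;
}
```

Calling sketch_hot_add(0) models the old ordering and records one early reference; sketch_hot_add(1) models onlining the node before the device-online step and records none.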

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
Changes in V8:
  -- Change a 'printk(KERN_INFO ...)' statement to be a pr_debug()
     statement.
  -- Rename 'find_cpu_nid' to 'find_and_online_cpu_nid' for better
     clarity of its function.
---
 arch/powerpc/mm/numa.c                       |    4 +++-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 6b08dd8..a182f9e 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1307,7 +1307,7 @@ static long vphn_get_associativity(unsigned long cpu,
 	return rc;
 }
 
-static inline int find_and_online_cpu_nid(int cpu)
+int find_and_online_cpu_nid(int cpu)
 {
 	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	int new_nid;
@@ -1340,6 +1340,8 @@ static inline int find_and_online_cpu_nid(int cpu)
 #endif
 	}
 
+	pr_debug("%s:%d cpu %d nid %d\n", __FUNCTION__, __LINE__,
+		cpu, new_nid);
 	return new_nid;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a7d14aa7..dceb514 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -340,6 +340,8 @@ static void pseries_remove_processor(struct device_node *np)
 	cpu_maps_update_done();
 }
 
+extern int find_and_online_cpu_nid(int cpu);
+
 static int dlpar_online_cpu(struct device_node *dn)
 {
 	int rc = 0;
@@ -364,6 +366,7 @@ static int dlpar_online_cpu(struct device_node *dn)
 					!= CPU_STATE_OFFLINE);
 			cpu_maps_update_done();
 			timed_topology_update(1);
+			find_and_online_cpu_nid(cpu);
 			rc = device_online(get_cpu_device(cpu));
 			if (rc)
 				goto out;


* Re: [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations
  2017-11-28 22:58 ` [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
@ 2018-01-08 19:13   ` Nathan Fontenot
  2018-01-29  4:13   ` [V8,1/3] " Michael Ellerman
  1 sibling, 0 replies; 8+ messages in thread
From: Nathan Fontenot @ 2018-01-08 19:13 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev

On 11/28/2017 04:58 PM, Michael Bringmann wrote:
> On powerpc systems which allow 'hot-add' of CPU or memory resources,
> it may occur that the new resources are to be inserted into nodes
> that were not used for these resources at bootup.  In the kernel,
> any node that is used must be defined and initialized.  These empty
> nodes may occur in the following situations:
> 
> * Dedicated vs. shared resources.  Shared resources require
>   information such as the VPHN hcall for CPU assignment to nodes.
>   Associativity decisions made based on dedicated resource rules,
>   such as associativity properties in the device tree, may vary
>   from decisions made using the values returned by the VPHN hcall.
> * memoryless nodes at boot.  Nodes need to be defined as 'possible'
>   at boot for operation with other code modules.  Previously, the
>   powerpc code would limit the set of possible nodes to those which
>   have memory assigned at boot, and were thus online.  Subsequent
>   add/remove of CPUs or memory would only work with this subset of
>   possible nodes.
> * memoryless nodes with CPUs at boot.  Due to the previous restriction
>   on nodes, nodes that had CPUs but no memory were being collapsed
>   into other nodes that did have memory at boot.  In practice this
>   meant that the node assignment presented by the runtime kernel
>   differed from the affinity and associativity attributes presented
>   by the device tree or VPHN hcalls.  Nodes that might be known to
>   the pHyp were not 'possible' in the runtime kernel because they did
>   not have memory at boot.
> 
> This patch ensures that sufficient nodes are defined to support
> configuration requirements after boot, as well as at boot.  This
> patch set fixes a couple of problems.
> 
> * Nodes known to powerpc to be memoryless at boot, but to have
>   CPUs in them are allowed to be 'possible' and 'online'.  Memory
>   allocations for those nodes are taken from another node that does
>   have memory until and if memory is hot-added to the node.
> * Nodes which have no resources assigned at boot, but which may still
>   be referenced subsequently by affinity or associativity attributes,
>   are kept in the list of 'possible' nodes for powerpc.  Hot-add of
>   memory or CPUs to the system can reference these nodes and bring
>   them online instead of redirecting to one of the set of nodes that
>   were known to have memory at boot.
> 
> This patch extracts the value of the lowest domain level (number of
> allocable resources) from the device tree property
> "ibm,max-associativity-domains" to use as the maximum number of nodes
> to setup as possibly available in the system.  This new setting will
> override the instruction,
> 
>     nodes_and(node_possible_map, node_possible_map, node_online_map);
> 
> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
> 
> If the "ibm,max-associativity-domains" property is not present at boot,
> no additional nodes are defined or enabled, and the set of possible
> nodes remains as computed by the above 'nodes_and()'.
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>

Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
 
> ---
> Changes in V8:
>   -- Remove unneeded pr_info() statement
> ---
>  arch/powerpc/mm/numa.c |   37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index adb6364f..735e3fd 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -892,6 +892,34 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>  }
> 
> +static void __init find_possible_nodes(void)
> +{
> +	struct device_node *rtas;
> +	u32 numnodes, i;
> +
> +	if (min_common_depth <= 0)
> +		return;
> +
> +	rtas = of_find_node_by_path("/rtas");
> +	if (!rtas)
> +		return;
> +
> +	if (of_property_read_u32_index(rtas,
> +				"ibm,max-associativity-domains",
> +				min_common_depth, &numnodes))
> +		goto out;
> +
> +	for (i = 0; i < numnodes; i++) {
> +		if (!node_possible(i)) {
> +			setup_node_data(i, 0, 0);
> +			node_set(i, node_possible_map);
> +		}
> +	}
> +
> +out:
> +	of_node_put(rtas);
> +}
> +
>  void __init initmem_init(void)
>  {
>  	int nid, cpu;
> @@ -905,12 +933,15 @@ void __init initmem_init(void)
>  	memblock_dump_all();
> 
>  	/*
> -	 * Reduce the possible NUMA nodes to the online NUMA nodes,
> -	 * since we do not support node hotplug. This ensures that  we
> -	 * lower the maximum NUMA node ID to what is actually present.
> +	 * Modify the set of possible NUMA nodes to reflect information
> +	 * available about the set of online nodes, and the set of nodes
> +	 * that we expect to make use of for this platform's affinity
> +	 * calculations.
>  	 */
>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
> 
> +	find_possible_nodes();
> +
>  	for_each_online_node(nid) {
>  		unsigned long start_pfn, end_pfn;
> 


* Re: [PATCH V8 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug
  2017-11-28 22:58 ` [PATCH V8 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug Michael Bringmann
@ 2018-01-08 19:14   ` Nathan Fontenot
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Fontenot @ 2018-01-08 19:14 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev

On 11/28/2017 04:58 PM, Michael Bringmann wrote:
> On powerpc systems which allow 'hot-add' of CPU, it may occur that
> the new resources are to be inserted into nodes that were not used
> for memory resources at bootup.  Many different configurations of
> PowerPC resources may need to be supported depending upon the
> environment.  Important characteristics of the nodes and operating
> environment include:
> 
> * Dedicated vs. shared CPUs.  Shared CPUs require information such
>   as the VPHN hcall for CPU assignment to nodes, since shared CPUs
>   have their affinity set to node 0 at boot and when hot-added.
>   Associativity decisions made based on dedicated resource rules,
>   such as associativity properties in the device tree, may vary from
>   decisions made using the values returned by the VPHN hcall.
> * memoryless nodes at boot.  Nodes need to be defined as 'possible'
>   at boot for operation with other code modules.  Previously, the
>   powerpc code would limit the set of possible nodes to those which
>   have memory assigned at boot, and were thus online.  Subsequent
>   add/remove of CPUs or memory would only work with this subset of
>   possible nodes.
> * memoryless nodes with CPUs at boot.  Due to the previous restriction
>   on nodes, nodes that had CPUs but no memory were being collapsed
>   into other nodes that did have memory at boot.  In practice this
>   meant that the node assignment presented by the runtime kernel
>   differed from the affinity and associativity attributes presented
>   by the device tree or VPHN hcalls.  Nodes that might be known to
>   the pHyp were not 'possible' in the runtime kernel because they did
>   not have memory at boot.
> 
> This patch fixes some problems encountered at runtime with
> configurations that support memory-less nodes, or that hot-add CPUs
> into nodes that are memoryless during system execution after boot.
> The problems of interest include,
> 
> * Nodes known to powerpc to be memoryless at boot, but to have
>   CPUs in them are allowed to be 'possible' and 'online'.  Memory
>   allocations for those nodes are taken from another node that does
>   have memory until and if memory is hot-added to the node.
> * Nodes which have no resources assigned at boot, but which may still
>   be referenced subsequently by affinity or associativity attributes,
>   are kept in the list of 'possible' nodes for powerpc.  Hot-add of
>   memory or CPUs to the system can reference these nodes and bring
>   them online instead of redirecting the references to one of the set
>   of nodes known to have memory at boot.
> 
> Note that this software operates under the context of CPU hotplug.
> We are not doing memory hotplug in this code, but rather updating
> the kernel's CPU topology (i.e. arch_update_cpu_topology /
> numa_update_cpu_topology).  We are initializing a node that may be
> used by CPUs or memory before it can be referenced as invalid by a
> CPU hotplug operation.  CPU hotplug operations are protected by a
> range of APIs including cpu_maps_update_begin/cpu_maps_update_done,
> cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
> Memory hotplug operations, including try_online_node, are protected
> by mem_hotplug_begin/mem_hotplug_done, device locks, and more.  In
> the case of CPUs being hot-added to a previously memoryless node, the
> try_online_node operation occurs wholly within the CPU locks with no
> overlap.  Using HMC hot-add/hot-remove operations, we have been able
> to add and remove CPUs to any possible node without failures.  HMC
> operations involve a degree of self-serialization, though.
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>

Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

> ---
> Changes in V8:
>   -- Clarify 'resources' as CPUs in patch description regarding
>      VPHN call.  Add another clause to statement mentioning that
>      shared CPUs start in node 0, and are finally assigned per
>      VPHN information.
>   -- Rename 'find_cpu_nid' to 'find_and_online_cpu_nid' for better
>      clarity of its function.
>   -- Restore '__init' tag to definition of 'setup_node_data'
> ---
>  arch/powerpc/mm/numa.c |   49 ++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 39 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 735e3fd..6b08dd8 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>  	nid = of_node_to_nid_single(cpu);
> 
>  out_present:
> -	if (nid < 0 || !node_online(nid))
> +	if (nid < 0 || !node_possible(nid))
>  		nid = first_online_node;
> 
>  	map_cpu_to_node(lcpu, nid);
> @@ -910,10 +910,8 @@ static void __init find_possible_nodes(void)
>  		goto out;
> 
>  	for (i = 0; i < numnodes; i++) {
> -		if (!node_possible(i)) {
> -			setup_node_data(i, 0, 0);
> +		if (!node_possible(i))
>  			node_set(i, node_possible_map);
> -		}
>  	}
> 
>  out:
> @@ -1309,6 +1307,42 @@ static long vphn_get_associativity(unsigned long cpu,
>  	return rc;
>  }
> 
> +static inline int find_and_online_cpu_nid(int cpu)
> +{
> +	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> +	int new_nid;
> +
> +	/* Use associativity from first thread for all siblings */
> +	vphn_get_associativity(cpu, associativity);
> +	new_nid = associativity_to_nid(associativity);
> +	if (new_nid < 0 || !node_possible(new_nid))
> +		new_nid = first_online_node;
> +
> +	if (NODE_DATA(new_nid) == NULL) {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +		/*
> +		 * Need to ensure that NODE_DATA is initialized
> +		 * for a node from available memory (see
> +		 * memblock_alloc_try_nid).  If unable to init
> +		 * the node, then default to nearest node that
> +		 * has memory installed.
> +		 */
> +		if (try_online_node(new_nid))
> +			new_nid = first_online_node;
> +#else
> +		/*
> +		 * Default to using the nearest node that has
> +		 * memory installed.  Otherwise, it would be 
> +		 * necessary to patch the kernel MM code to deal
> +		 * with more memoryless-node error conditions.
> +		 */
> +		new_nid = first_online_node;
> +#endif
> +	}
> +
> +	return new_nid;
> +}
> +
>  /*
>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>   * characteristics change. This function doesn't perform any locking and is
> @@ -1376,7 +1410,6 @@ int numa_update_cpu_topology(bool cpus_locked)
>  {
>  	unsigned int cpu, sibling, changed = 0;
>  	struct topology_update_data *updates, *ud;
> -	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
>  	cpumask_t updated_cpus;
>  	struct device *dev;
>  	int weight, new_nid, i = 0;
> @@ -1414,11 +1447,7 @@ int numa_update_cpu_topology(bool cpus_locked)
>  			continue;
>  		}
> 
> -		/* Use associativity from first thread for all siblings */
> -		vphn_get_associativity(cpu, associativity);
> -		new_nid = associativity_to_nid(associativity);
> -		if (new_nid < 0 || !node_online(new_nid))
> -			new_nid = first_online_node;
> +		new_nid = find_and_online_cpu_nid(cpu);
> 
>  		if (new_nid == numa_cpu_lookup_table[cpu]) {
>  			cpumask_andnot(&cpu_associativity_changes_mask,
> 


* Re: [PATCH V8 3/3] hotplug/cpu: Fix crash with memoryless nodes
  2017-11-28 22:58 ` [PATCH V8 3/3] hotplug/cpu: Fix crash with memoryless nodes Michael Bringmann
@ 2018-01-08 19:15   ` Nathan Fontenot
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Fontenot @ 2018-01-08 19:15 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev

On 11/28/2017 04:58 PM, Michael Bringmann wrote:
> On powerpc systems with shared configurations of CPUs and memory and
> memoryless nodes at boot, an event ordering problem was observed on
> SLES12 build platforms with the hot-add of CPUs to the memoryless
> nodes.
> 
> * The most common error occurred when the memory SLAB driver attempted
>   to reference the memoryless node to which a CPU was being added
>   before the kernel had finished initializing all of the data structures
>   for the CPU and exited 'device_online' under DLPAR/hot-add.
> 
>   Normally the memoryless node would be initialized through the call
>   path device_online ... arch_update_cpu_topology ...
>   find_and_online_cpu_nid ... try_online_node.  This patch ensures that
>   the powerpc node will be initialized as early as possible, even if it
>   was memoryless and CPU-less at the point when we are trying to hot-add
>   a new CPU to it.
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>

Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

> ---
> Changes in V8:
>   -- Change a 'printk(KERN_INFO ...)' statement to be a pr_debug()
>      statement.
>   -- Rename 'find_cpu_nid' to 'find_and_online_cpu_nid' for better
>      clarity of its function.
> ---
>  arch/powerpc/mm/numa.c                       |    4 +++-
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |    3 +++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 6b08dd8..a182f9e 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1307,7 +1307,7 @@ static long vphn_get_associativity(unsigned long cpu,
>  	return rc;
>  }
> 
> -static inline int find_and_online_cpu_nid(int cpu)
> +int find_and_online_cpu_nid(int cpu)
>  {
>  	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
>  	int new_nid;
> @@ -1340,6 +1340,8 @@ static inline int find_and_online_cpu_nid(int cpu)
>  #endif
>  	}
> 
> +	pr_debug("%s:%d cpu %d nid %d\n", __FUNCTION__, __LINE__,
> +		cpu, new_nid);
>  	return new_nid;
>  }
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index a7d14aa7..dceb514 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -340,6 +340,8 @@ static void pseries_remove_processor(struct device_node *np)
>  	cpu_maps_update_done();
>  }
> 
> +extern int find_and_online_cpu_nid(int cpu);
> +
>  static int dlpar_online_cpu(struct device_node *dn)
>  {
>  	int rc = 0;
> @@ -364,6 +366,7 @@ static int dlpar_online_cpu(struct device_node *dn)
>  					!= CPU_STATE_OFFLINE);
>  			cpu_maps_update_done();
>  			timed_topology_update(1);
> +			find_and_online_cpu_nid(cpu);
>  			rc = device_online(get_cpu_device(cpu));
>  			if (rc)
>  				goto out;
> 
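
The node-selection fallback that this series moves into
find_and_online_cpu_nid() can be sketched in user-space C as below. This is
only an illustration under stated assumptions, not the kernel code: the
64-bit masks stand in for the kernel's nodemask_t, and every sketch_* name
is hypothetical. It assumes the function first accepts a node that is
already online, then tries to bring a possible node online, and only then
falls back to the first online node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's online/possible node masks. */
struct sketch_nodes {
	uint64_t online;
	uint64_t possible;
};

/* True if nid is a valid bit index and is set in mask. */
static bool sketch_node_test(uint64_t mask, int nid)
{
	return nid >= 0 && nid < 64 && ((mask >> nid) & 1ULL);
}

/* Stand-in for try_online_node(): only 'possible' nodes can come online. */
static bool sketch_try_online(struct sketch_nodes *n, int nid)
{
	if (!sketch_node_test(n->possible, nid))
		return false;
	n->online |= 1ULL << nid;
	return true;
}

/* Lowest-numbered online node, or -1 if no node is online. */
static int sketch_first_online(const struct sketch_nodes *n)
{
	for (int nid = 0; nid < 64; nid++)
		if (sketch_node_test(n->online, nid))
			return nid;
	return -1;
}

/*
 * assoc_nid is the node derived from the CPU's associativity data
 * (negative if none could be derived).
 */
static int sketch_find_and_online_nid(struct sketch_nodes *n, int assoc_nid)
{
	assert(n != NULL);
	if (sketch_node_test(n->online, assoc_nid))
		return assoc_nid;
	if (sketch_try_online(n, assoc_nid))
		return assoc_nid;
	return sketch_first_online(n);
}
```

Calling the sketch before the device is onlined, as dlpar_online_cpu() does
with the real function in the hunk above, means the target node is already
online by the time anything else looks it up.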


* Re: [V8,1/3] powerpc/nodes: Ensure enough nodes avail for operations
  2017-11-28 22:58 ` [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
  2018-01-08 19:13   ` Nathan Fontenot
@ 2018-01-29  4:13   ` Michael Ellerman
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-01-29  4:13 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev; +Cc: Nathan Fontenot, Michael Bringmann

On Tue, 2017-11-28 at 22:58:36 UTC, Michael Bringmann wrote:
> On powerpc systems which allow 'hot-add' of CPU or memory resources,
> it may occur that the new resources are to be inserted into nodes
> that were not used for these resources at bootup.  In the kernel,
> any node that is used must be defined and initialized.  These empty
> nodes may occur in the following cases:
> 
> * Dedicated vs. shared resources.  Shared resources require
>   information such as the VPHN hcall for CPU assignment to nodes.
>   Associativity decisions made based on dedicated resource rules,
>   such as associativity properties in the device tree, may vary
>   from decisions made using the values returned by the VPHN hcall.
> * memoryless nodes at boot.  Nodes need to be defined as 'possible'
>   at boot for operation with other code modules.  Previously, the
>   powerpc code would limit the set of possible nodes to those which
>   have memory assigned at boot, and were thus online.  Subsequent
>   add/remove of CPUs or memory would only work with this subset of
>   possible nodes.
> * memoryless nodes with CPUs at boot.  Due to the previous restriction
>   on nodes, nodes that had CPUs but no memory were being collapsed
>   into other nodes that did have memory at boot.  In practice this
>   meant that the node assignment presented by the runtime kernel
>   differed from the affinity and associativity attributes presented
>   by the device tree or VPHN hcalls.  Nodes that might be known to
>   the pHyp were not 'possible' in the runtime kernel because they did
>   not have memory at boot.
> 
> This patch ensures that sufficient nodes are defined to support
> configuration requirements after boot, as well as at boot.  This
> patch set fixes a couple of problems.
> 
> * Nodes known to powerpc to be memoryless at boot, but to have
>   CPUs in them are allowed to be 'possible' and 'online'.  Memory
>   allocations for those nodes are taken from another node that does
>   have memory unless and until memory is hot-added to the node.
> * Nodes which have no resources assigned at boot, but which may still
>   be referenced subsequently by affinity or associativity attributes,
>   are kept in the list of 'possible' nodes for powerpc.  Hot-add of
>   memory or CPUs to the system can reference these nodes and bring
>   them online instead of redirecting to one of the set of nodes that
>   were known to have memory at boot.
> 
> This patch extracts the value of the lowest domain level (number of
> allocable resources) from the device tree property
> "ibm,max-associativity-domains" to use as the maximum number of nodes
> to set up as possibly available in the system.  This new setting will
> override the instruction,
> 
>     nodes_and(node_possible_map, node_possible_map, node_online_map);
> 
> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
> 
> If the "ibm,max-associativity-domains" property is not present at boot,
> no operation will be performed to define or enable additional nodes, or
> enable the above 'nodes_and()'.
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a346137e9142b039fd13af2e59696e

cheers
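
For readers following the patch 1/3 description quoted above, the change to
the set of 'possible' nodes can be sketched in user-space C as below. This
is a rough illustration under stated assumptions, not the code that was
applied: a 64-bit mask stands in for nodemask_t, the sketch_* names are
hypothetical, and the "property absent" branch reflects one reading of the
description (the possible map is left as it was).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 64u

/* Hypothetical stand-in for the kernel's nodemask_t. */
typedef uint64_t sketch_nodemask_t;

/*
 * Previous behavior, modelled on the quoted
 * nodes_and(node_possible_map, node_possible_map, node_online_map):
 * only nodes that are online at boot remain possible.
 */
static sketch_nodemask_t sketch_possible_old(sketch_nodemask_t possible,
					     sketch_nodemask_t online)
{
	return possible & online;
}

/*
 * Behavior described by the patch: when the
 * "ibm,max-associativity-domains" property is present, nodes
 * 0..max_nodes-1 are marked possible regardless of boot-time memory.
 * Assumption: when the property is absent, the map is unchanged.
 */
static sketch_nodemask_t sketch_possible_new(sketch_nodemask_t possible,
					     bool have_prop,
					     unsigned int max_nodes)
{
	assert(max_nodes <= SKETCH_MAX_NODES);
	if (!have_prop)
		return possible;
	if (max_nodes == SKETCH_MAX_NODES)
		return UINT64_MAX;	/* avoid an undefined 64-bit shift */
	return possible | ((1ULL << max_nodes) - 1ULL);
}
```

With possible nodes {0,1,2,3} and only {0,1} online at boot, the old rule
leaves {0,1}; with the property reporting four domains, the new rule keeps
all four available for later hot-add.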

Thread overview: 8+ messages
2017-11-28 22:58 [PATCH V8 0/3] powerpc/nodes: Fix issues with memoryless nodes Michael Bringmann
2017-11-28 22:58 ` [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann
2018-01-08 19:13   ` Nathan Fontenot
2018-01-29  4:13   ` [V8,1/3] " Michael Ellerman
2017-11-28 22:58 ` [PATCH V8 2/3] powerpc/initnodes: Ensure nodes initialized for hotplug Michael Bringmann
2018-01-08 19:14   ` Nathan Fontenot
2017-11-28 22:58 ` [PATCH V8 3/3] hotplug/cpu: Fix crash with memoryless nodes Michael Bringmann
2018-01-08 19:15   ` Nathan Fontenot
