* [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes @ 2017-09-18 18:28 Michael Bringmann 2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann 2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann 0 siblings, 2 replies; 10+ messages in thread From: Michael Bringmann @ 2017-09-18 18:28 UTC (permalink / raw) To: linuxppc-dev, linux-kernel Cc: Michael Bringmann, Nathan Fontenot, Michael Ellerman, John Allen powerpc/nodes: Ensure enough nodes avail for operations. On systems like PowerPC which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized at boot. This patch extracts the value of the lowest domain level (number of allocable resources) from the "rtas" device tree property "ibm,current-associativity-domains" or the device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to setup as possibly available in the system. powerpc/hotplug: Fix CPU-only node bringup bug On systems like PowerPC which allow 'hot-add' of CPU, it may occur that the new resources are to be inserted into nodes that were not used for memory resources at bootup. Many different configurations of PowerPC resources may need to be supported depending upon the environment. This patch fixes some problems encountered at runtime with configurations that support memory-less nodes, but which allow CPUs to be added at and after boot. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Michael Bringmann (2): powerpc/vphn: Ensure enough nodes avail for operations powerpc/hotplug: Fix CPU-only node bringup bug ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations 2017-09-18 18:28 [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes Michael Bringmann @ 2017-09-18 18:28 ` Michael Bringmann 2017-10-16 12:33 ` Michael Ellerman 2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann 1 sibling, 1 reply; 10+ messages in thread From: Michael Bringmann @ 2017-09-18 18:28 UTC (permalink / raw) To: linuxppc-dev, linux-kernel Cc: Michael Ellerman, Michael Bringmann, John Allen, Nathan Fontenot powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized at boot. This patch extracts the value of the lowest domain level (number of allocable resources) from the "rtas" device tree property "ibm,current-associativity-domains" or the device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to setup as possibly available in the system. This new setting will override the instruction, nodes_and(node_possible_map, node_possible_map, node_online_map); presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If the property is not present at boot, no operation will be performed to define or enable additional nodes. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> --- arch/powerpc/mm/numa.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index ec098b3..b385cd0 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) NODE_DATA(nid)->node_spanned_pages = spanned_pages; } +static void __init node_associativity_setup(void) +{ + struct device_node *rtas; + + rtas = of_find_node_by_path("/rtas"); + if (rtas) { + const __be32 *prop; + u32 len, entries, numnodes, i; + + prop = of_get_property(rtas, + "ibm,current-associativity-domains", &len); + if (!prop || len < sizeof(unsigned int)) { + prop = of_get_property(rtas, + "ibm,max-associativity-domains", &len); + goto endit; + } + + entries = of_read_number(prop++, 1); + + if (len < (entries * sizeof(unsigned int))) + goto endit; + + if ((0 <= min_common_depth) && (min_common_depth <= (entries-1))) + entries = min_common_depth; + else + entries -= 1; + + numnodes = of_read_number(&prop[entries], 1); + + printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes, + min_common_depth); + + for (i = 0; i < numnodes; i++) { + if (!node_possible(i)) { + setup_node_data(i, 0, 0); + node_set(i, node_possible_map); + } + } + } + +endit: + if (rtas) + of_node_put(rtas); +} + void __init initmem_init(void) { int nid, cpu; @@ -911,6 +956,8 @@ void __init initmem_init(void) */ nodes_and(node_possible_map, node_possible_map, node_online_map); + node_associativity_setup(); + for_each_online_node(nid) { unsigned long start_pfn, end_pfn; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations 2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann @ 2017-10-16 12:33 ` Michael Ellerman 2017-10-17 16:14 ` Michael Bringmann 0 siblings, 1 reply; 10+ messages in thread From: Michael Ellerman @ 2017-10-16 12:33 UTC (permalink / raw) To: Michael Bringmann, linuxppc-dev, linux-kernel Cc: Michael Bringmann, John Allen, Nathan Fontenot Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU This is a powerpc-only patch, so saying "systems like PowerPC" is confusing. What you should be saying is "On pseries systems". > or memory resources, it may occur that the new resources are to be > inserted into nodes that were not used for these resources at bootup. > In the kernel, any node that is used must be defined and initialized > at boot. > > This patch extracts the value of the lowest domain level (number of > allocable resources) from the "rtas" device tree property > "ibm,current-associativity-domains" or the device tree property What is current associativity domains? I've not heard of it, where is it documented, and what does it mean. Why would use the "current" set vs the "max"? I thought the whole point was to discover the maximum possible set of nodes that could be hotplugged. > "ibm,max-associativity-domains" to use as the maximum number of nodes > to setup as possibly available in the system. This new setting will > override the instruction, > > nodes_and(node_possible_map, node_possible_map, node_online_map); > > presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). > > If the property is not present at boot, no operation will be performed > to define or enable additional nodes. > > Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> > --- > arch/powerpc/mm/numa.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index ec098b3..b385cd0 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) > NODE_DATA(nid)->node_spanned_pages = spanned_pages; > } > > +static void __init node_associativity_setup(void) This should really be called "find_possible_nodes()" or something more descriptive. > +{ > + struct device_node *rtas; > + > + rtas = of_find_node_by_path("/rtas"); > + if (rtas) { If you just short-circuit that return the whole function body can be deintented, making it significantly more readable. ie: + rtas = of_find_node_by_path("/rtas"); + if (!rtas) + return; > + const __be32 *prop; > + u32 len, entries, numnodes, i; > + > + prop = of_get_property(rtas, > + "ibm,current-associativity-domains", &len); Please don't use of_get_property() in new code, we have much better accessors these days, which do better error checking and handle the endian conversions for you. In this case you'd use eg: u32 entries; rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries); > + if (!prop || len < sizeof(unsigned int)) { > + prop = of_get_property(rtas, > + "ibm,max-associativity-domains", &len); > + goto endit; > + } > + > + entries = of_read_number(prop++, 1); > + > + if (len < (entries * sizeof(unsigned int))) > + goto endit; > + > + if ((0 <= min_common_depth) && (min_common_depth <= (entries-1))) > + entries = min_common_depth; > + else > + entries -= 1; ^ You can't just guess that will be the right entry. If min_common_depth is < 0 the function should have just returned immediately at the top. If min_common_depth is outside the range of the property that's a buggy device tree, you should print a warning and return. > + numnodes = of_read_number(&prop[entries], 1); u32 num_nodes; rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes); > + > + printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes, > + min_common_depth); > + > + for (i = 0; i < numnodes; i++) { > + if (!node_possible(i)) { > + setup_node_data(i, 0, 0); Do we actually need to setup the NODE_DATA() yet? Doing it now ensures it will not be allocated node local, which sucks. > + node_set(i, node_possible_map); > + } > + } > + } > + > +endit: "out" would be the normal name. > + if (rtas) > + of_node_put(rtas); > +} > + > void __init initmem_init(void) > { > int nid, cpu; > @@ -911,6 +956,8 @@ void __init initmem_init(void) > */ You need to update the comment above here which is contradicted by the new function you're adding. > nodes_and(node_possible_map, node_possible_map, node_online_map); > > + node_associativity_setup(); > + > for_each_online_node(nid) { > unsigned long start_pfn, end_pfn; > cheers ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations 2017-10-16 12:33 ` Michael Ellerman @ 2017-10-17 16:14 ` Michael Bringmann 2017-10-17 17:02 ` Nathan Fontenot 0 siblings, 1 reply; 10+ messages in thread From: Michael Bringmann @ 2017-10-17 16:14 UTC (permalink / raw) To: Michael Ellerman, linuxppc-dev, linux-kernel Cc: John Allen, Nathan Fontenot, Michael Bringmann from Kernel Team See below. On 10/16/2017 07:33 AM, Michael Ellerman wrote: > Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > >> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU > > This is a powerpc-only patch, so saying "systems like PowerPC" is > confusing. What you should be saying is "On pseries systems". > >> or memory resources, it may occur that the new resources are to be >> inserted into nodes that were not used for these resources at bootup. >> In the kernel, any node that is used must be defined and initialized >> at boot. >> >> This patch extracts the value of the lowest domain level (number of >> allocable resources) from the "rtas" device tree property >> "ibm,current-associativity-domains" or the device tree property > > What is current associativity domains? I've not heard of it, where is it > documented, and what does it mean. > > Why would use the "current" set vs the "max"? I thought the whole point > was to discover the maximum possible set of nodes that could be > hotplugged. > >> "ibm,max-associativity-domains" to use as the maximum number of nodes >> to setup as possibly available in the system. This new setting will >> override the instruction, >> >> nodes_and(node_possible_map, node_possible_map, node_online_map); >> >> presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). >> >> If the property is not present at boot, no operation will be performed >> to define or enable additional nodes. >> >> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >> --- >> arch/powerpc/mm/numa.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 47 insertions(+) >> >> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >> index ec098b3..b385cd0 100644 >> --- a/arch/powerpc/mm/numa.c >> +++ b/arch/powerpc/mm/numa.c >> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) >> NODE_DATA(nid)->node_spanned_pages = spanned_pages; >> } >> >> +static void __init node_associativity_setup(void) > > This should really be called "find_possible_nodes()" or something more > descriptive. Okay. > >> +{ >> + struct device_node *rtas; >> + >> + rtas = of_find_node_by_path("/rtas"); >> + if (rtas) { > > If you just short-circuit that return the whole function body can be > deintented, making it significantly more readable. > > ie: > + rtas = of_find_node_by_path("/rtas"); > + if (!rtas) > + return; Okay. > >> + const __be32 *prop; >> + u32 len, entries, numnodes, i; >> + >> + prop = of_get_property(rtas, >> + "ibm,current-associativity-domains", &len); > > Please don't use of_get_property() in new code, we have much better > accessors these days, which do better error checking and handle the > endian conversions for you. > > In this case you'd use eg: > > u32 entries; > rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries); The property 'ibm,current-associativity-domains' has the same format as the property 'ibm,max-associativity-domains' i.e. it is an integer array. The accessor of_property_read_32, however, expects it to be an integer singleton value. Instead, it needs: > >> + if (!prop || len < sizeof(unsigned int)) { >> + prop = of_get_property(rtas, >> + "ibm,max-associativity-domains", &len); if (!prop || len < sizeof(unsigned int)) >> + goto endit; >> + } >> + >> + entries = of_read_number(prop++, 1); >> + >> + if (len < (entries * sizeof(unsigned int))) >> + goto endit; >> + >> + if ((0 <= min_common_depth) && (min_common_depth <= (entries-1))) >> + entries = min_common_depth; >> + else >> + entries -= 1; > ^ > You can't just guess that will be the right entry. > > If min_common_depth is < 0 the function should have just returned > immediately at the top. Okay. > > If min_common_depth is outside the range of the property that's a buggy > device tree, you should print a warning and return. > >> + numnodes = of_read_number(&prop[entries], 1); > > u32 num_nodes; > rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes); >> + >> + printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes, >> + min_common_depth); >> + >> + for (i = 0; i < numnodes; i++) { >> + if (!node_possible(i)) { >> + setup_node_data(i, 0, 0); > > Do we actually need to setup the NODE_DATA() yet? Doing it now ensures > it will not be allocated node local, which sucks. Okay. > >> + node_set(i, node_possible_map); >> + } >> + } >> + } >> + >> +endit: > > "out" would be the normal name. Okay. > >> + if (rtas) >> + of_node_put(rtas); >> +} >> + >> void __init initmem_init(void) >> { >> int nid, cpu; >> @@ -911,6 +956,8 @@ void __init initmem_init(void) >> */ > > You need to update the comment above here which is contradicted by the > new function you're adding. Okay. > >> nodes_and(node_possible_map, node_possible_map, node_online_map); >> >> + node_associativity_setup(); >> + >> for_each_online_node(nid) { >> unsigned long start_pfn, end_pfn; >> > > cheers > > -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations 2017-10-17 16:14 ` Michael Bringmann @ 2017-10-17 17:02 ` Nathan Fontenot 2017-10-17 17:22 ` Michael Bringmann 0 siblings, 1 reply; 10+ messages in thread From: Nathan Fontenot @ 2017-10-17 17:02 UTC (permalink / raw) To: Michael Bringmann, Michael Ellerman, linuxppc-dev, linux-kernel Cc: John Allen, Michael Bringmann from Kernel Team On 10/17/2017 11:14 AM, Michael Bringmann wrote: > See below. > > On 10/16/2017 07:33 AM, Michael Ellerman wrote: >> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >> >>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU >> >> This is a powerpc-only patch, so saying "systems like PowerPC" is >> confusing. What you should be saying is "On pseries systems". >> >>> or memory resources, it may occur that the new resources are to be >>> inserted into nodes that were not used for these resources at bootup. >>> In the kernel, any node that is used must be defined and initialized >>> at boot. >>> >>> This patch extracts the value of the lowest domain level (number of >>> allocable resources) from the "rtas" device tree property >>> "ibm,current-associativity-domains" or the device tree property >> >> What is current associativity domains? I've not heard of it, where is it >> documented, and what does it mean. >> >> Why would use the "current" set vs the "max"? I thought the whole point >> was to discover the maximum possible set of nodes that could be >> hotplugged. >> >>> "ibm,max-associativity-domains" to use as the maximum number of nodes >>> to setup as possibly available in the system. This new setting will >>> override the instruction, >>> >>> nodes_and(node_possible_map, node_possible_map, node_online_map); >>> >>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). >>> >>> If the property is not present at boot, no operation will be performed >>> to define or enable additional nodes. >>> >>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >>> --- >>> arch/powerpc/mm/numa.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 47 insertions(+) >>> >>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >>> index ec098b3..b385cd0 100644 >>> --- a/arch/powerpc/mm/numa.c >>> +++ b/arch/powerpc/mm/numa.c >>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) >>> NODE_DATA(nid)->node_spanned_pages = spanned_pages; >>> } >>> >>> +static void __init node_associativity_setup(void) >> >> This should really be called "find_possible_nodes()" or something more >> descriptive. > > Okay. >> >>> +{ >>> + struct device_node *rtas; >>> + >>> + rtas = of_find_node_by_path("/rtas"); >>> + if (rtas) { >> >> If you just short-circuit that return the whole function body can be >> deintented, making it significantly more readable. >> >> ie: >> + rtas = of_find_node_by_path("/rtas"); >> + if (!rtas) >> + return; > > Okay. > >> >>> + const __be32 *prop; >>> + u32 len, entries, numnodes, i; >>> + >>> + prop = of_get_property(rtas, >>> + "ibm,current-associativity-domains", &len); >> >> Please don't use of_get_property() in new code, we have much better >> accessors these days, which do better error checking and handle the >> endian conversions for you. >> >> In this case you'd use eg: >> >> u32 entries; >> rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries); > > The property 'ibm,current-associativity-domains' has the same format as the property > 'ibm,max-associativity-domains' i.e. it is an integer array. The accessor of_property_read_32, > however, expects it to be an integer singleton value. Instead, it needs: I think for this case where the property is an array of values you could use of_property_count_elems_of_size() to get the number of elements in the array and then use of_property_read_u32_array() to read the array. -Nathan > >> >>> + if (!prop || len < sizeof(unsigned int)) { >>> + prop = of_get_property(rtas, >>> + "ibm,max-associativity-domains", &len); > if (!prop || len < sizeof(unsigned int)) >>> + goto endit; >>> + } >>> + >>> + entries = of_read_number(prop++, 1); >>> + >>> + if (len < (entries * sizeof(unsigned int))) >>> + goto endit; >>> + >>> + if ((0 <= min_common_depth) && (min_common_depth <= (entries-1))) >>> + entries = min_common_depth; >>> + else >>> + entries -= 1; >> ^ >> You can't just guess that will be the right entry. >> >> If min_common_depth is < 0 the function should have just returned >> immediately at the top. > > Okay. > >> >> If min_common_depth is outside the range of the property that's a buggy >> device tree, you should print a warning and return. >> >>> + numnodes = of_read_number(&prop[entries], 1); >> >> u32 num_nodes; >> rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes); >>> + >>> + printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes, >>> + min_common_depth); >>> + >>> + for (i = 0; i < numnodes; i++) { >>> + if (!node_possible(i)) { >>> + setup_node_data(i, 0, 0); >> >> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures >> it will not be allocated node local, which sucks. > > Okay. > >> >>> + node_set(i, node_possible_map); >>> + } >>> + } >>> + } >>> + >>> +endit: >> >> "out" would be the normal name. > > Okay. > >> >>> + if (rtas) >>> + of_node_put(rtas); >>> +} >>> + >>> void __init initmem_init(void) >>> { >>> int nid, cpu; >>> @@ -911,6 +956,8 @@ void __init initmem_init(void) >>> */ >> >> You need to update the comment above here which is contradicted by the >> new function you're adding. > > Okay. > >> >>> nodes_and(node_possible_map, node_possible_map, node_online_map); >>> >>> + node_associativity_setup(); >>> + >>> for_each_online_node(nid) { >>> unsigned long start_pfn, end_pfn; >>> >> >> cheers >> >> > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations 2017-10-17 17:02 ` Nathan Fontenot @ 2017-10-17 17:22 ` Michael Bringmann 2017-10-17 17:41 ` Nathan Fontenot 0 siblings, 1 reply; 10+ messages in thread From: Michael Bringmann @ 2017-10-17 17:22 UTC (permalink / raw) To: Nathan Fontenot, Michael Ellerman, linuxppc-dev, linux-kernel Cc: John Allen, Michael Bringmann from Kernel Team On 10/17/2017 12:02 PM, Nathan Fontenot wrote: > On 10/17/2017 11:14 AM, Michael Bringmann wrote: >> See below. >> >> On 10/16/2017 07:33 AM, Michael Ellerman wrote: >>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>> >>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU >>> >>> This is a powerpc-only patch, so saying "systems like PowerPC" is >>> confusing. What you should be saying is "On pseries systems". >>> >>>> or memory resources, it may occur that the new resources are to be >>>> inserted into nodes that were not used for these resources at bootup. >>>> In the kernel, any node that is used must be defined and initialized >>>> at boot. >>>> >>>> This patch extracts the value of the lowest domain level (number of >>>> allocable resources) from the "rtas" device tree property >>>> "ibm,current-associativity-domains" or the device tree property >>> >>> What is current associativity domains? I've not heard of it, where is it >>> documented, and what does it mean. >>> >>> Why would use the "current" set vs the "max"? I thought the whole point >>> was to discover the maximum possible set of nodes that could be >>> hotplugged. >>> >>>> "ibm,max-associativity-domains" to use as the maximum number of nodes >>>> to setup as possibly available in the system. This new setting will >>>> override the instruction, >>>> >>>> nodes_and(node_possible_map, node_possible_map, node_online_map); >>>> >>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). >>>> >>>> If the property is not present at boot, no operation will be performed >>>> to define or enable additional nodes. >>>> >>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >>>> --- >>>> arch/powerpc/mm/numa.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >>>> 1 file changed, 47 insertions(+) >>>> >>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >>>> index ec098b3..b385cd0 100644 >>>> --- a/arch/powerpc/mm/numa.c >>>> +++ b/arch/powerpc/mm/numa.c >>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) >>>> NODE_DATA(nid)->node_spanned_pages = spanned_pages; >>>> } >>>> >>>> +static void __init node_associativity_setup(void) >>> >>> This should really be called "find_possible_nodes()" or something more >>> descriptive. >> >> Okay. >>> >>>> +{ >>>> + struct device_node *rtas; >>>> + >>>> + rtas = of_find_node_by_path("/rtas"); >>>> + if (rtas) { >>> >>> If you just short-circuit that return the whole function body can be >>> deintented, making it significantly more readable. >>> >>> ie: >>> + rtas = of_find_node_by_path("/rtas"); >>> + if (!rtas) >>> + return; >> >> Okay. >> >>> >>>> + const __be32 *prop; >>>> + u32 len, entries, numnodes, i; >>>> + >>>> + prop = of_get_property(rtas, >>>> + "ibm,current-associativity-domains", &len); >>> >>> Please don't use of_get_property() in new code, we have much better >>> accessors these days, which do better error checking and handle the >>> endian conversions for you. >>> >>> In this case you'd use eg: >>> >>> u32 entries; >>> rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries); >> >> The property 'ibm,current-associativity-domains' has the same format as the property >> 'ibm,max-associativity-domains' i.e. it is an integer array. The accessor of_property_read_32, >> however, expects it to be an integer singleton value. Instead, it needs: > > I think for this case where the property is an array of values you could use > of_property_count_elems_of_size() to get the number of elements in the array > and then use of_property_read_u32_array() to read the array. > > -Nathan We only need one value from the array which is why I am using, >>>> + numnodes = of_read_number(&prop[min_common_depth], 1); With this implementation I do not need to allocate memory for an array, nor execute code to read all elements of the array. Michael > >> >>> >>>> + if (!prop || len < sizeof(unsigned int)) { >>>> + prop = of_get_property(rtas, >>>> + "ibm,max-associativity-domains", &len); >> if (!prop || len < sizeof(unsigned int)) >>>> + goto endit; >>>> + } >>>> + >>>> + entries = of_read_number(prop++, 1); >>>> + >>>> + if (len < (entries * sizeof(unsigned int))) >>>> + goto endit; >>>> + >>>> + if ((0 <= min_common_depth) && (min_common_depth <= (entries-1))) >>>> + entries = min_common_depth; >>>> + else >>>> + entries -= 1; >>> ^ >>> You can't just guess that will be the right entry. >>> >>> If min_common_depth is < 0 the function should have just returned >>> immediately at the top. >> >> Okay. >> >>> >>> If min_common_depth is outside the range of the property that's a buggy >>> device tree, you should print a warning and return. >>> >>>> + numnodes = of_read_number(&prop[entries], 1); >>> >>> u32 num_nodes; >>> rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes); >>>> + >>>> + printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes, >>>> + min_common_depth); >>>> + >>>> + for (i = 0; i < numnodes; i++) { >>>> + if (!node_possible(i)) { >>>> + setup_node_data(i, 0, 0); >>> >>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures >>> it will not be allocated node local, which sucks. >> >> Okay. >> >>> >>>> + node_set(i, node_possible_map); >>>> + } >>>> + } >>>> + } >>>> + >>>> +endit: >>> >>> "out" would be the normal name. >> >> Okay. >> >>> >>>> + if (rtas) >>>> + of_node_put(rtas); >>>> +} >>>> + >>>> void __init initmem_init(void) >>>> { >>>> int nid, cpu; >>>> @@ -911,6 +956,8 @@ void __init initmem_init(void) >>>> */ >>> >>> You need to update the comment above here which is contradicted by the >>> new function you're adding. >> >> Okay. >> >>> >>>> nodes_and(node_possible_map, node_possible_map, node_online_map); >>>> >>>> + node_associativity_setup(); >>>> + >>>> for_each_online_node(nid) { >>>> unsigned long start_pfn, end_pfn; >>>> >>> >>> cheers >>> >>> >> > > -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations 2017-10-17 17:22 ` Michael Bringmann @ 2017-10-17 17:41 ` Nathan Fontenot 0 siblings, 0 replies; 10+ messages in thread From: Nathan Fontenot @ 2017-10-17 17:41 UTC (permalink / raw) To: Michael Bringmann, Michael Ellerman, linuxppc-dev, linux-kernel Cc: John Allen, Michael Bringmann from Kernel Team On 10/17/2017 12:22 PM, Michael Bringmann wrote: > > > On 10/17/2017 12:02 PM, Nathan Fontenot wrote: >> On 10/17/2017 11:14 AM, Michael Bringmann wrote: >>> See below. >>> >>> On 10/16/2017 07:33 AM, Michael Ellerman wrote: >>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes: >>>> >>>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU >>>> >>>> This is a powerpc-only patch, so saying "systems like PowerPC" is >>>> confusing. What you should be saying is "On pseries systems". >>>> >>>>> or memory resources, it may occur that the new resources are to be >>>>> inserted into nodes that were not used for these resources at bootup. >>>>> In the kernel, any node that is used must be defined and initialized >>>>> at boot. >>>>> >>>>> This patch extracts the value of the lowest domain level (number of >>>>> allocable resources) from the "rtas" device tree property >>>>> "ibm,current-associativity-domains" or the device tree property >>>> >>>> What is current associativity domains? I've not heard of it, where is it >>>> documented, and what does it mean. >>>> >>>> Why would use the "current" set vs the "max"? I thought the whole point >>>> was to discover the maximum possible set of nodes that could be >>>> hotplugged. >>>> >>>>> "ibm,max-associativity-domains" to use as the maximum number of nodes >>>>> to setup as possibly available in the system. This new setting will >>>>> override the instruction, >>>>> >>>>> nodes_and(node_possible_map, node_possible_map, node_online_map); >>>>> >>>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). >>>>> >>>>> If the property is not present at boot, no operation will be performed >>>>> to define or enable additional nodes. >>>>> >>>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >>>>> --- >>>>> arch/powerpc/mm/numa.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 47 insertions(+) >>>>> >>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >>>>> index ec098b3..b385cd0 100644 >>>>> --- a/arch/powerpc/mm/numa.c >>>>> +++ b/arch/powerpc/mm/numa.c >>>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) >>>>> NODE_DATA(nid)->node_spanned_pages = spanned_pages; >>>>> } >>>>> >>>>> +static void __init node_associativity_setup(void) >>>> >>>> This should really be called "find_possible_nodes()" or something more >>>> descriptive. >>> >>> Okay. >>>> >>>>> +{ >>>>> + struct device_node *rtas; >>>>> + >>>>> + rtas = of_find_node_by_path("/rtas"); >>>>> + if (rtas) { >>>> >>>> If you just short-circuit that return the whole function body can be >>>> deintented, making it significantly more readable. >>>> >>>> ie: >>>> + rtas = of_find_node_by_path("/rtas"); >>>> + if (!rtas) >>>> + return; >>> >>> Okay. >>> >>>> >>>>> + const __be32 *prop; >>>>> + u32 len, entries, numnodes, i; >>>>> + >>>>> + prop = of_get_property(rtas, >>>>> + "ibm,current-associativity-domains", &len); >>>> >>>> Please don't use of_get_property() in new code, we have much better >>>> accessors these days, which do better error checking and handle the >>>> endian conversions for you. >>>> >>>> In this case you'd use eg: >>>> >>>> u32 entries; >>>> rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries); >>> >>> The property 'ibm,current-associativity-domains' has the same format as the property >>> 'ibm,max-associativity-domains' i.e. it is an integer array. The accessor of_property_read_32, >>> however, expects it to be an integer singleton value. Instead, it needs: >> >> I think for this case where the property is an array of values you could use >> of_property_count_elems_of_size() to get the number of elements in the array >> and then use of_property_read_u32_array() to read the array. >> >> -Nathan > > We only need one value from the array which is why I am using, > >>>>> + numnodes = of_read_number(&prop[min_common_depth], 1); > > With this implementation I do not need to allocate memory for > an array, nor execute code to read all elements of the array. > > Michael OK, I didn't see that you just needed a single value from the array. In this case you could do of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &numnodes); -Nathan > >> >>> >>>> >>>>> + if (!prop || len < sizeof(unsigned int)) { >>>>> + prop = of_get_property(rtas, >>>>> + "ibm,max-associativity-domains", &len); >>> if (!prop || len < sizeof(unsigned int)) >>>>> + goto endit; >>>>> + } >>>>> + >>>>> + entries = of_read_number(prop++, 1); >>>>> + >>>>> + if (len < (entries * sizeof(unsigned int))) >>>>> + goto endit; >>>>> + >>>>> + if ((0 <= min_common_depth) && (min_common_depth <= (entries-1))) >>>>> + entries = min_common_depth; >>>>> + else >>>>> + entries -= 1; >>>> ^ >>>> You can't just guess that will be the right entry. >>>> >>>> If min_common_depth is < 0 the function should have just returned >>>> immediately at the top. >>> >>> Okay. >>> >>>> >>>> If min_common_depth is outside the range of the property that's a buggy >>>> device tree, you should print a warning and return. >>>> >>>>> + numnodes = of_read_number(&prop[entries], 1); >>>> >>>> u32 num_nodes; >>>> rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes); >>>>> + >>>>> + printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes, >>>>> + min_common_depth); >>>>> + >>>>> + for (i = 0; i < numnodes; i++) { >>>>> + if (!node_possible(i)) { >>>>> + setup_node_data(i, 0, 0); >>>> >>>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures >>>> it will not be allocated node local, which sucks. >>> >>> Okay. >>> >>>> >>>>> + node_set(i, node_possible_map); >>>>> + } >>>>> + } >>>>> + } >>>>> + >>>>> +endit: >>>> >>>> "out" would be the normal name. >>> >>> Okay. >>> >>>> >>>>> + if (rtas) >>>>> + of_node_put(rtas); >>>>> +} >>>>> + >>>>> void __init initmem_init(void) >>>>> { >>>>> int nid, cpu; >>>>> @@ -911,6 +956,8 @@ void __init initmem_init(void) >>>>> */ >>>> >>>> You need to update the comment above here which is contradicted by the >>>> new function you're adding. >>> >>> Okay. >>> >>>> >>>>> nodes_and(node_possible_map, node_possible_map, node_online_map); >>>>> >>>>> + node_associativity_setup(); >>>>> + >>>>> for_each_online_node(nid) { >>>>> unsigned long start_pfn, end_pfn; >>>>> >>>> >>>> cheers >>>> >>>> >>> >> >> > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug 2017-09-18 18:28 [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes Michael Bringmann 2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann @ 2017-09-18 18:28 ` Michael Bringmann 2017-10-16 12:54 ` Michael Ellerman 1 sibling, 1 reply; 10+ messages in thread From: Michael Bringmann @ 2017-09-18 18:28 UTC (permalink / raw) To: linuxppc-dev, linux-kernel Cc: Michael Ellerman, Michael Bringmann, John Allen, Nathan Fontenot powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU, it may occur that the new resources are to be inserted into nodes that were not used for memory resources at bootup. Many different configurations of PowerPC resources may need to be supported depending upon the environment. This patch fixes some problems encountered at runtime with configurations that support memory-less nodes, but which allow CPUs to be added at and after boot. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> --- arch/powerpc/mm/numa.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index b385cd0..e811dd1 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu) nid = of_node_to_nid_single(cpu); out_present: - if (nid < 0 || !node_online(nid)) + if (nid < 0 || !node_possible(nid)) nid = first_online_node; map_cpu_to_node(lcpu, nid); @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu, return rc; } +static int verify_node_preparation(int nid) +{ + if ((NODE_DATA(nid) == NULL) || + (NODE_DATA(nid)->node_spanned_pages == 0)) { + if (try_online_node(nid)) + return first_online_node; + } + + return nid; +} + /* * Update the CPU maps and sysfs entries for a single CPU when its NUMA * characteristics change. This function doesn't perform any locking and is @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked) /* Use associativity from first thread for all siblings */ vphn_get_associativity(cpu, associativity); new_nid = associativity_to_nid(associativity); - if (new_nid < 0 || !node_online(new_nid)) + if (new_nid < 0 || !node_possible(new_nid)) new_nid = first_online_node; + new_nid = verify_node_preparation(new_nid); + if (new_nid == numa_cpu_lookup_table[cpu]) { cpumask_andnot(&cpu_associativity_changes_mask, &cpu_associativity_changes_mask, ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug 2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann @ 2017-10-16 12:54 ` Michael Ellerman 2017-10-17 15:08 ` Michael Bringmann 0 siblings, 1 reply; 10+ messages in thread From: Michael Ellerman @ 2017-10-16 12:54 UTC (permalink / raw) To: Michael Bringmann, linuxppc-dev, linux-kernel Cc: Michael Bringmann, John Allen, Nathan Fontenot Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU, > it may occur that the new resources are to be inserted into nodes > that were not used for memory resources at bootup. Many different > configurations of PowerPC resources may need to be supported depending > upon the environment. Give me some detail please?! > This patch fixes some problems encountered at What problems? > runtime with configurations that support memory-less nodes, but which > allow CPUs to be added at and after boot. How does it fix those problems? > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index b385cd0..e811dd1 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu, > return rc; > } > > +static int verify_node_preparation(int nid) > +{ I would not expect a function called "verify" ... > + if ((NODE_DATA(nid) == NULL) || > + (NODE_DATA(nid)->node_spanned_pages == 0)) { > + if (try_online_node(nid)) .. to do something like online a node. > + return first_online_node; > + } > + > + return nid; > +} > + > /* > * Update the CPU maps and sysfs entries for a single CPU when its NUMA > * characteristics change. This function doesn't perform any locking and is > @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked) > /* Use associativity from first thread for all siblings */ > vphn_get_associativity(cpu, associativity); > new_nid = associativity_to_nid(associativity); > - if (new_nid < 0 || !node_online(new_nid)) > + if (new_nid < 0 || !node_possible(new_nid)) > new_nid = first_online_node; > > + new_nid = verify_node_preparation(new_nid); You're being called part-way through CPU hotplug here, are we sure it's safe to go and do memory hotplug from there? What's the locking situation? cheers ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug 2017-10-16 12:54 ` Michael Ellerman @ 2017-10-17 15:08 ` Michael Bringmann 0 siblings, 0 replies; 10+ messages in thread From: Michael Bringmann @ 2017-10-17 15:08 UTC (permalink / raw) To: Michael Ellerman, linuxppc-dev, linux-kernel; +Cc: John Allen, Nathan Fontenot On 10/16/2017 07:54 AM, Michael Ellerman wrote: > Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > >> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU, >> it may occur that the new resources are to be inserted into nodes >> that were not used for memory resources at bootup. Many different >> configurations of PowerPC resources may need to be supported depending >> upon the environment. > > Give me some detail please?! Configurations that demonstrated problems included 'memoryless' nodes that possessed only CPUs at boot, and nodes that contained neither CPUs nor memory at boot. The calculations in the kernel resulted in a different node use layout on many SAP HANA configured systems. > >> This patch fixes some problems encountered at > > What problems? The previous implementation collapsed all node assignments after affinity calculations to use only those nodes that had memory at boot. This resulted in calculation and configuration differences between the FSP code and the Linux kernel. > >> runtime with configurations that support memory-less nodes, but which >> allow CPUs to be added at and after boot. > > How does it fix those problems? The change involves completing the initialization of nodes that were not used at boot, but which were selected by VPHN affinity calculations during subsequent hotplug operations. Michael > >> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >> index b385cd0..e811dd1 100644 >> --- a/arch/powerpc/mm/numa.c >> +++ b/arch/powerpc/mm/numa.c >> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu, >> return rc; >> } >> >> +static int verify_node_preparation(int nid) >> +{ > > I would not expect a function called "verify" ... > >> + if ((NODE_DATA(nid) == NULL) || >> + (NODE_DATA(nid)->node_spanned_pages == 0)) { >> + if (try_online_node(nid)) > > .. to do something like online a node. > >> + return first_online_node; >> + } >> + >> + return nid; >> +} >> + >> /* >> * Update the CPU maps and sysfs entries for a single CPU when its NUMA >> * characteristics change. This function doesn't perform any locking and is >> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked) >> /* Use associativity from first thread for all siblings */ >> vphn_get_associativity(cpu, associativity); >> new_nid = associativity_to_nid(associativity); >> - if (new_nid < 0 || !node_online(new_nid)) >> + if (new_nid < 0 || !node_possible(new_nid)) >> new_nid = first_online_node; >> >> + new_nid = verify_node_preparation(new_nid); > > You're being called part-way through CPU hotplug here, are we sure it's > safe to go and do memory hotplug from there? What's the locking > situation? > > cheers > > -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mwb@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-10-17 17:51 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-09-18 18:28 [PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes Michael Bringmann 2017-09-18 18:28 ` [PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations Michael Bringmann 2017-10-16 12:33 ` Michael Ellerman 2017-10-17 16:14 ` Michael Bringmann 2017-10-17 17:02 ` Nathan Fontenot 2017-10-17 17:22 ` Michael Bringmann 2017-10-17 17:41 ` Nathan Fontenot 2017-09-18 18:28 ` [PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug Michael Bringmann 2017-10-16 12:54 ` Michael Ellerman 2017-10-17 15:08 ` Michael Bringmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox