* [PATCH 0/3] Early node associativity
@ 2019-08-22 14:42 Srikar Dronamraju
2019-08-22 14:42 ` [PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Srikar Dronamraju @ 2019-08-22 14:42 UTC (permalink / raw)
To: linuxppc-dev
Cc: Nathan Lynch, Srikar Dronamraju, Nicholas Piggin, Abdul Haleem,
Satheesh Rajendran
Abdul reported a warning on a shared LPAR:
"WARNING: workqueue cpumask: online intersect > possible intersect".
This happens because the per-node workqueue possible mask is set very
early in the boot process, even before the system has queried the home
node associativity. However, the per-node workqueue online cpumask gets
updated dynamically. Hence there is a chance that the per-node workqueue
online cpumask ends up being a superset of the per-node workqueue
possible mask.
The patches below fix this problem.
Reported at: https://github.com/linuxppc/issues/issues/167
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Srikar Dronamraju (3):
powerpc/vphn: Check for error from hcall_vphn
powerpc/numa: Early request for home node associativity
powerpc/numa: Remove late request for home node associativity
arch/powerpc/include/asm/topology.h | 4 ---
arch/powerpc/kernel/setup-common.c | 5 ++--
arch/powerpc/kernel/smp.c | 5 ----
arch/powerpc/mm/numa.c | 53 ++++++++++++++++++++++++++---------
arch/powerpc/platforms/pseries/vphn.c | 3 +-
5 files changed, 45 insertions(+), 25 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn
2019-08-22 14:42 [PATCH 0/3] Early node associativity Srikar Dronamraju
@ 2019-08-22 14:42 ` Srikar Dronamraju
2019-08-22 16:41 ` Nathan Lynch
2019-08-22 14:42 ` [PATCH 2/3] powerpc/numa: Early request for home node associativity Srikar Dronamraju
2019-08-22 14:42 ` [PATCH 3/3] powerpc/numa: Remove late " Srikar Dronamraju
2 siblings, 1 reply; 10+ messages in thread
From: Srikar Dronamraju @ 2019-08-22 14:42 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch, Srikar Dronamraju, Nicholas Piggin
There is no point in unpacking the associativity if the
H_HOME_NODE_ASSOCIATIVITY hcall has returned an error.
Also add error messages for H_PARAMETER and the default case in
vphn_get_associativity.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
arch/powerpc/mm/numa.c | 16 +++++++++++++---
arch/powerpc/platforms/pseries/vphn.c | 3 ++-
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 50d68d2..88b5157 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1191,6 +1191,10 @@ static long vphn_get_associativity(unsigned long cpu,
VPHN_FLAG_VCPU, associativity);
switch (rc) {
+ case H_SUCCESS:
+ dbg("VPHN hcall succeeded. Reset polling...\n");
+ timed_topology_update(0);
+ break;
case H_FUNCTION:
printk_once(KERN_INFO
"VPHN is not supported. Disabling polling...\n");
@@ -1202,9 +1206,15 @@ static long vphn_get_associativity(unsigned long cpu,
"preventing VPHN. Disabling polling...\n");
stop_topology_update();
break;
- case H_SUCCESS:
- dbg("VPHN hcall succeeded. Reset polling...\n");
- timed_topology_update(0);
+ case H_PARAMETER:
+ printk(KERN_ERR
+ "hcall_vphn() was passed an invalid parameter."
+ "Disabling polling...\n");
+ break;
+ default:
+ printk(KERN_ERR
+ "hcall_vphn() returned %ld. Disabling polling \n", rc);
+ stop_topology_update();
break;
}
diff --git a/arch/powerpc/platforms/pseries/vphn.c b/arch/powerpc/platforms/pseries/vphn.c
index 3f07bf6..cca474a 100644
--- a/arch/powerpc/platforms/pseries/vphn.c
+++ b/arch/powerpc/platforms/pseries/vphn.c
@@ -82,7 +82,8 @@ long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity)
long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, cpu);
- vphn_unpack_associativity(retbuf, associativity);
+ if (rc == H_SUCCESS)
+ vphn_unpack_associativity(retbuf, associativity);
return rc;
}
--
1.8.3.1
* [PATCH 2/3] powerpc/numa: Early request for home node associativity
2019-08-22 14:42 [PATCH 0/3] Early node associativity Srikar Dronamraju
2019-08-22 14:42 ` [PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
@ 2019-08-22 14:42 ` Srikar Dronamraju
2019-08-22 17:17 ` Nathan Lynch
2019-08-23 7:16 ` Satheesh Rajendran
2019-08-22 14:42 ` [PATCH 3/3] powerpc/numa: Remove late " Srikar Dronamraju
2 siblings, 2 replies; 10+ messages in thread
From: Srikar Dronamraju @ 2019-08-22 14:42 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch, Srikar Dronamraju, Nicholas Piggin
Currently the kernel detects whether it is running on a shared LPAR
platform and requests home node associativity before the scheduler
sched_domains are set up. However, between the time NUMA setup is
initialized and the request for home node associativity, the workqueue
initializes its per-node cpumask. The per-node workqueue possible
cpumask may turn invalid after the home node associativity is applied,
resulting in weird situations like the workqueue possible cpumask being
a subset of the workqueue online cpumask.
This can be fixed by requesting home node associativity earlier, just
before NUMA setup. However, at NUMA setup time the kernel may not be in
a position to detect whether it is running on a shared LPAR platform. So
request home node associativity unconditionally and, if the request
fails, fall back on the device tree property.
However, home node associativity requires the cpu's hwid, which is set
in smp_setup_pacas(). Hence call smp_setup_pacas() before
numa_setup_cpus().
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
arch/powerpc/kernel/setup-common.c | 5 +++--
arch/powerpc/mm/numa.c | 28 +++++++++++++++++++++++++++-
2 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1f8db66..9135dba 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -888,6 +888,9 @@ void __init setup_arch(char **cmdline_p)
/* Check the SMT related command line arguments (ppc64). */
check_smt_enabled();
+#ifdef CONFIG_SMP
+ smp_setup_pacas();
+#endif
/* Parse memory topology */
mem_topology_setup();
@@ -899,8 +902,6 @@ void __init setup_arch(char **cmdline_p)
* so smp_release_cpus() does nothing for them.
*/
#ifdef CONFIG_SMP
- smp_setup_pacas();
-
/* On BookE, setup per-core TLB data structures. */
setup_tlb_core_data();
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 88b5157..7965d3b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -461,6 +461,21 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
return nid;
}
+static int vphn_get_nid(unsigned long cpu)
+{
+ __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+ long rc;
+
+ /* Use associativity from first thread for all siblings */
+ rc = hcall_vphn(get_hard_smp_processor_id(cpu),
+ VPHN_FLAG_VCPU, associativity);
+
+ if (rc == H_SUCCESS)
+ return associativity_to_nid(associativity);
+
+ return NUMA_NO_NODE;
+}
+
/*
* Figure out to which domain a cpu belongs and stick it there.
* Return the id of the domain used.
@@ -490,7 +505,18 @@ static int numa_setup_cpu(unsigned long lcpu)
goto out;
}
- nid = of_node_to_nid_single(cpu);
+ /*
+ * On a shared lpar, the device tree might not have the correct node
+ * associativity. At this time lppaca, or its __old_status field
+ * may not be updated. Hence request an explicit associativity
+ * irrespective of whether the lpar is shared or dedicated. Use the
+ * device tree property as a fallback.
+ */
+ if (firmware_has_feature(FW_FEATURE_VPHN))
+ nid = vphn_get_nid(lcpu);
+
+ if (nid == NUMA_NO_NODE)
+ nid = of_node_to_nid_single(cpu);
out_present:
if (nid < 0 || !node_possible(nid))
--
1.8.3.1
* [PATCH 3/3] powerpc/numa: Remove late request for home node associativity
2019-08-22 14:42 [PATCH 0/3] Early node associativity Srikar Dronamraju
2019-08-22 14:42 ` [PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
2019-08-22 14:42 ` [PATCH 2/3] powerpc/numa: Early request for home node associativity Srikar Dronamraju
@ 2019-08-22 14:42 ` Srikar Dronamraju
2 siblings, 0 replies; 10+ messages in thread
From: Srikar Dronamraju @ 2019-08-22 14:42 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nathan Lynch, Srikar Dronamraju, Nicholas Piggin
With the previous commit ("powerpc/numa: Early request for home node
associativity"), commit 2ea626306810 ("powerpc/topology: Get topology
for shared processors at boot"), which requested home node associativity
late in boot, becomes redundant.
Hence remove the late request for home node associativity.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/topology.h | 4 ----
arch/powerpc/kernel/smp.c | 5 -----
arch/powerpc/mm/numa.c | 9 ---------
3 files changed, 18 deletions(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 2f7e1ea..9bd396f 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -98,7 +98,6 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
extern int prrn_is_enabled(void);
extern int find_and_online_cpu_nid(int cpu);
extern int timed_topology_update(int nsecs);
-extern void __init shared_proc_topology_init(void);
#else
static inline int start_topology_update(void)
{
@@ -121,9 +120,6 @@ static inline int timed_topology_update(int nsecs)
return 0;
}
-#ifdef CONFIG_SMP
-static inline void shared_proc_topology_init(void) {}
-#endif
#endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
#include <asm-generic/topology.h>
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ea6adbf..cdd39a0 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1359,11 +1359,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
if (smp_ops && smp_ops->bringup_done)
smp_ops->bringup_done();
- /*
- * On a shared LPAR, associativity needs to be requested.
- * Hence, get numa topology before dumping cpu topology
- */
- shared_proc_topology_init();
dump_numa_cpu_topology();
#ifdef CONFIG_SCHED_SMT
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 7965d3b..2efeac8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1604,15 +1604,6 @@ int prrn_is_enabled(void)
return prrn_enabled;
}
-void __init shared_proc_topology_init(void)
-{
- if (lppaca_shared_proc(get_lppaca())) {
- bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
- nr_cpumask_bits);
- numa_update_cpu_topology(false);
- }
-}
-
static int topology_read(struct seq_file *file, void *v)
{
if (vphn_enabled || prrn_enabled)
--
1.8.3.1
* Re: [PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn
2019-08-22 14:42 ` [PATCH 1/3] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
@ 2019-08-22 16:41 ` Nathan Lynch
0 siblings, 0 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-22 16:41 UTC (permalink / raw)
To: Srikar Dronamraju, linuxppc-dev; +Cc: Srikar Dronamraju, Nicholas Piggin
Hi Srikar,
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> There is no point in unpacking associativity, if
> H_HOME_NODE_ASSOCIATIVITY hcall has returned an error.
>
> Also added error messages for H_PARAMETER and default case in
> vphn_get_associativity.
These are two logical changes and should be separated IMO.
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 50d68d2..88b5157 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1191,6 +1191,10 @@ static long vphn_get_associativity(unsigned long cpu,
> VPHN_FLAG_VCPU, associativity);
>
> switch (rc) {
> + case H_SUCCESS:
> + dbg("VPHN hcall succeeded. Reset polling...\n");
> + timed_topology_update(0);
> + break;
> case H_FUNCTION:
> printk_once(KERN_INFO
> "VPHN is not supported. Disabling polling...\n");
> @@ -1202,9 +1206,15 @@ static long vphn_get_associativity(unsigned long cpu,
> "preventing VPHN. Disabling polling...\n");
> stop_topology_update();
> break;
> - case H_SUCCESS:
> - dbg("VPHN hcall succeeded. Reset polling...\n");
> - timed_topology_update(0);
> + case H_PARAMETER:
> + printk(KERN_ERR
> + "hcall_vphn() was passed an invalid parameter."
> + "Disabling polling...\n");
This will come out as:
hcall_vphn() was passed an invalid parameter.Disabling polling...
^
And it's misleading to say VPHN polling is being disabled when this case
does not invoke stop_topology_update().
> + break;
> + default:
> + printk(KERN_ERR
> + "hcall_vphn() returned %ld. Disabling polling \n", rc);
> + stop_topology_update();
> break;
Any added prints in this routine must be _once or _ratelimited to avoid
log floods. Also use the pr_ APIs instead of printk please.
* Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity
2019-08-22 14:42 ` [PATCH 2/3] powerpc/numa: Early request for home node associativity Srikar Dronamraju
@ 2019-08-22 17:17 ` Nathan Lynch
2019-08-22 17:40 ` Srikar Dronamraju
2019-08-23 7:16 ` Satheesh Rajendran
1 sibling, 1 reply; 10+ messages in thread
From: Nathan Lynch @ 2019-08-22 17:17 UTC (permalink / raw)
To: Srikar Dronamraju, linuxppc-dev; +Cc: Srikar Dronamraju, Nicholas Piggin
Hi Srikar,
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> Currently the kernel detects if its running on a shared lpar platform
> and requests home node associativity before the scheduler sched_domains
> are setup. However between the time NUMA setup is initialized and the
> request for home node associativity, workqueue initializes its per node
> cpumask. The per node workqueue possible cpumask may turn invalid
> after home node associativity resulting in weird situations like
> workqueue possible cpumask being a subset of workqueue online cpumask.
>
> This can be fixed by requesting home node associativity earlier just
> before NUMA setup. However at the NUMA setup time, kernel may not be in
> a position to detect if its running on a shared lpar platform. So
> request for home node associativity and if the request fails, fallback
> on the device tree property.
I think this is generally sound at the conceptual level.
> However home node associativity requires cpu's hwid which is set in
> smp_setup_pacas. Hence call smp_setup_pacas before numa_setup_cpus.
But this seems like it would negatively affect pacas' NUMA placements?
Would it be less risky to figure out a way to do "early" VPHN hcalls
before mem_topology_setup, getting the hwids from the cpu_to_phys_id
array perhaps?
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 88b5157..7965d3b 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -461,6 +461,21 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
> return nid;
> }
>
> +static int vphn_get_nid(unsigned long cpu)
> +{
> + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> + long rc;
> +
> + /* Use associativity from first thread for all siblings */
I don't understand how this comment corresponds to the code it
accompanies.
> + rc = hcall_vphn(get_hard_smp_processor_id(cpu),
> + VPHN_FLAG_VCPU, associativity);
> +
> + if (rc == H_SUCCESS)
> + return associativity_to_nid(associativity);
^^ extra space
> @@ -490,7 +505,18 @@ static int numa_setup_cpu(unsigned long lcpu)
> goto out;
> }
>
> - nid = of_node_to_nid_single(cpu);
> + /*
> + * On a shared lpar, the device tree might not have the correct node
> + * associativity. At this time lppaca, or its __old_status field
Sorry but I'm going to quibble with this phrasing a bit. On SPLPAR the
CPU nodes have no affinity information in the device tree at all. This
comment implies that they may have incorrect information, which is
AFAIK not the case.
* Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity
2019-08-22 17:17 ` Nathan Lynch
@ 2019-08-22 17:40 ` Srikar Dronamraju
2019-08-22 18:33 ` Nathan Lynch
0 siblings, 1 reply; 10+ messages in thread
From: Srikar Dronamraju @ 2019-08-22 17:40 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev, Nicholas Piggin
* Nathan Lynch <nathanl@linux.ibm.com> [2019-08-22 12:17:48]:
> Hi Srikar,
Thanks Nathan for the review.
>
> > However home node associativity requires cpu's hwid which is set in
> > smp_setup_pacas. Hence call smp_setup_pacas before numa_setup_cpus.
>
> But this seems like it would negatively affect pacas' NUMA placements?
>
> Would it be less risky to figure out a way to do "early" VPHN hcalls
> before mem_topology_setup, getting the hwids from the cpu_to_phys_id
> array perhaps?
>
Do you mean that for calls from mem_topology_setup() we use
cpu_to_phys_id, but for calls from ppc_numa_cpu_prepare() we use
get_hard_smp_processor_id()?
That's doable.
>
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 88b5157..7965d3b 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -461,6 +461,21 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
> > return nid;
> > }
> >
> > +static int vphn_get_nid(unsigned long cpu)
> > +{
> > + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> > + long rc;
> > +
> > + /* Use associativity from first thread for all siblings */
>
> I don't understand how this comment corresponds to the code it
> accompanies.
Okay will rephrase
>
>
> > + rc = hcall_vphn(get_hard_smp_processor_id(cpu),
> > + VPHN_FLAG_VCPU, associativity);
> > +
> > + if (rc == H_SUCCESS)
> > + return associativity_to_nid(associativity);
> ^^ extra space
>
> > @@ -490,7 +505,18 @@ static int numa_setup_cpu(unsigned long lcpu)
> > goto out;
> > }
> >
> > - nid = of_node_to_nid_single(cpu);
> > + /*
> > + * On a shared lpar, the device tree might not have the correct node
> > + * associativity. At this time lppaca, or its __old_status field
>
> Sorry but I'm going to quibble with this phrasing a bit. On SPLPAR the
> CPU nodes have no affinity information in the device tree at all. This
> comment implies that they may have incorrect information, which is
> AFAIK not the case.
>
Okay will clarify.
--
Thanks and Regards
Srikar Dronamraju
* Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity
2019-08-22 17:40 ` Srikar Dronamraju
@ 2019-08-22 18:33 ` Nathan Lynch
0 siblings, 0 replies; 10+ messages in thread
From: Nathan Lynch @ 2019-08-22 18:33 UTC (permalink / raw)
To: Srikar Dronamraju; +Cc: linuxppc-dev, Nicholas Piggin
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> * Nathan Lynch <nathanl@linux.ibm.com> [2019-08-22 12:17:48]:
>> > However home node associativity requires cpu's hwid which is set in
>> > smp_setup_pacas. Hence call smp_setup_pacas before numa_setup_cpus.
>>
>> But this seems like it would negatively affect pacas' NUMA placements?
>>
>> Would it be less risky to figure out a way to do "early" VPHN hcalls
>> before mem_topology_setup, getting the hwids from the cpu_to_phys_id
>> array perhaps?
>>
>
> Do you mean for calls from mem_topology_setup(), stuff we use cpu_to_phys_id
> but for the calls from ppc_numa_cpu_prepare() we use the
> get_hard_smp_processor_id()?
Yes, something like that, I think. Although numa_setup_cpu() is used in
both contexts.
* Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity
2019-08-22 14:42 ` [PATCH 2/3] powerpc/numa: Early request for home node associativity Srikar Dronamraju
2019-08-22 17:17 ` Nathan Lynch
@ 2019-08-23 7:16 ` Satheesh Rajendran
2019-08-27 6:57 ` Srikar Dronamraju
1 sibling, 1 reply; 10+ messages in thread
From: Satheesh Rajendran @ 2019-08-23 7:16 UTC (permalink / raw)
To: Srikar Dronamraju; +Cc: Nathan Lynch, linuxppc-dev, Nicholas Piggin
On Thu, Aug 22, 2019 at 08:12:34PM +0530, Srikar Dronamraju wrote:
> Currently the kernel detects if its running on a shared lpar platform
> and requests home node associativity before the scheduler sched_domains
> are setup. However between the time NUMA setup is initialized and the
> request for home node associativity, workqueue initializes its per node
> cpumask. The per node workqueue possible cpumask may turn invalid
> after home node associativity resulting in weird situations like
> workqueue possible cpumask being a subset of workqueue online cpumask.
I tested this series on a Power KVM guest and expected it to fix
https://github.com/linuxppc/issues/issues/167, but I am still able to
see the warning below while doing vcpu hotplug with NUMA nodes. Please
advise if I am missing anything or if this is not the series intended to
fix the above issue.
Env:
HW: Power8
Host/Guest Kernel: 5.3.0-rc5-00172-g13e3f1076e29 (linux master + this series)
Qemu: 4.0.90 (v4.1.0-rc3)
Guest Config:
..
<vcpu placement='static' current='2'>4</vcpu>
...
<kernel>/home/kvmci/linux/vmlinux</kernel>
<cmdline>root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init initcall_debug numa=debug crashkernel=1024M selinux=0</cmdline>
...
<topology sockets='1' cores='2' threads='2'/>
<numa>
<cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
<cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
</numa>
Event:
vcpu hotplug
[root@atest-guest ~]# [ 41.447170] random: crng init done
[ 41.448153] random: 7 urandom warning(s) missed due to ratelimiting
[ 51.727256] VPHN hcall succeeded. Reset polling...
[ 51.826301] adding cpu 2 to node 1
[ 51.856238] WARNING: workqueue cpumask: online intersect > possible intersect
[ 51.916297] VPHN hcall succeeded. Reset polling...
[ 52.036272] adding cpu 3 to node 1
Regards,
-Satheesh.
* Re: [PATCH 2/3] powerpc/numa: Early request for home node associativity
2019-08-23 7:16 ` Satheesh Rajendran
@ 2019-08-27 6:57 ` Srikar Dronamraju
0 siblings, 0 replies; 10+ messages in thread
From: Srikar Dronamraju @ 2019-08-27 6:57 UTC (permalink / raw)
To: Satheesh Rajendran; +Cc: Nathan Lynch, linuxppc-dev, Nicholas Piggin
Hi Satheesh,
> > Currently the kernel detects if its running on a shared lpar platform
> > and requests home node associativity before the scheduler sched_domains
> > are setup. However between the time NUMA setup is initialized and the
> > request for home node associativity, workqueue initializes its per node
> > cpumask. The per node workqueue possible cpumask may turn invalid
> > after home node associativity resulting in weird situations like
> > workqueue possible cpumask being a subset of workqueue online cpumask.
>
> Env:
> HW: Power8
> Host/Guest Kernel: 5.3.0-rc5-00172-g13e3f1076e29 (linux master + this series)
> Qemu: 4.0.90 (v4.1.0-rc3)
>
> Guest Config:
> ..
> <vcpu placement='static' current='2'>4</vcpu>
> ...
> <kernel>/home/kvmci/linux/vmlinux</kernel>
> <cmdline>root=/dev/sda2 rw console=tty0 console=ttyS0,115200 init=/sbin/init initcall_debug numa=debug crashkernel=1024M selinux=0</cmdline>
> ...
> <topology sockets='1' cores='2' threads='2'/>
> <numa>
> <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
> <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
> </numa>
>
> Event:
> vcpu hotplug
>
> [root@atest-guest ~]# [ 41.447170] random: crng init done
> [ 41.448153] random: 7 urandom warning(s) missed due to ratelimiting
> [ 51.727256] VPHN hcall succeeded. Reset polling...
> [ 51.826301] adding cpu 2 to node 1
> [ 51.856238] WARNING: workqueue cpumask: online intersect > possible intersect
> [ 51.916297] VPHN hcall succeeded. Reset polling...
> [ 52.036272] adding cpu 3 to node 1
>
Thanks for testing.
This patch series makes sure the per-node workqueue possible cpumask is
updated correctly at boot. However, node hotplug on KVM guests and DLPAR
on PowerVM LPARs aren't covered by this series. On systems that support
shared processors, the associativity of the possible cpus is not known
at boot time. Hence we are not able to update the per-node workqueue
possible cpumask in those cases.
--
Thanks and Regards
Srikar Dronamraju