* [PATCH 5/6] perf: Optimise topology iteration
@ 2011-02-20 16:57 Lin Ming
2011-02-20 21:15 ` Andi Kleen
0 siblings, 1 reply; 5+ messages in thread
From: Lin Ming @ 2011-02-20 16:57 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Stephane Eranian, Andi Kleen; +Cc: linux-kernel
Currently we iterate the full machine looking for a matching core_id/nb
for the percore and the amd northbridge stuff , using a smaller topology
mask makes sense.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
---
arch/x86/kernel/cpu/perf_event_amd.c | 3 ++-
arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 461f62b..7217d84 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -320,6 +320,7 @@ static void amd_pmu_cpu_starting(int cpu)
{
struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
struct amd_nb *nb;
+ const struct cpumask *mask = cpu_coregroup_mask(cpu);
int i, nb_id;
if (boot_cpu_data.x86_max_cores < 2)
@@ -328,7 +329,7 @@ static void amd_pmu_cpu_starting(int cpu)
nb_id = amd_get_nb_id(cpu);
WARN_ON_ONCE(nb_id == BAD_APICID);
- for_each_online_cpu(i) {
+ for_each_cpu(i, mask) {
nb = per_cpu(cpu_hw_events, i).amd_nb;
if (WARN_ON_ONCE(!nb))
continue;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 059c0ab..5540d35 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1123,7 +1123,7 @@ static void intel_pmu_cpu_starting(int cpu)
if (!ht_capable())
return;
- for_each_online_cpu(i) {
+ for_each_cpu(i, topology_thread_cpumask(cpu)) {
struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
if (pc && pc->core_id == core_id) {
--
1.7.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration
2011-02-20 16:57 [PATCH 5/6] perf: Optimise topology iteration Lin Ming
@ 2011-02-20 21:15 ` Andi Kleen
2011-02-21 3:29 ` Lin Ming
0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2011-02-20 21:15 UTC (permalink / raw)
To: Lin Ming
Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, Andi Kleen,
linux-kernel
On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> Currently we iterate the full machine looking for a matching core_id/nb
> for the percore and the amd northbridge stuff , using a smaller topology
> mask makes sense.
This is still wrong for CPU hotplug. The CPU "owning" the per core
does not necessarily need to be online anymore.
Please drop this patch.
-Andi
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration
2011-02-20 21:15 ` Andi Kleen
@ 2011-02-21 3:29 ` Lin Ming
2011-02-21 3:32 ` Andi Kleen
0 siblings, 1 reply; 5+ messages in thread
From: Lin Ming @ 2011-02-21 3:29 UTC (permalink / raw)
To: Andi Kleen; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, linux-kernel
On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote:
> On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> > Currently we iterate the full machine looking for a matching core_id/nb
> > for the percore and the amd northbridge stuff , using a smaller topology
> > mask makes sense.
>
> This is still wrong for CPU hotplug. The CPU "owning" the per core
> does not necessarily need to be online anymore.
This is remain issue for hotplug case, no matter we use
for_each_online_cpu or topology_thread_cpumask.
> Please drop this patch.
Re-look at the code, I think for_each_online_cpu is wrong for percore,
we should use topology_thread_cpumask instead.
for_each_online_cpu(i) {
struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
if (pc && pc->core_id == core_id) {
kfree(cpuc->per_core);
cpuc->per_core = pc;
break;
}
}
Assume 2 sockets,
//socket 0
cpu 0: core_id 0
cpu 1: core_id 0
//socket 1
cpu 2: core_id 0
cpu 3: core_id 0
If for_each_online_cpu is used, apparently 4 logical cpus will share the
same percore. This is wrong.
If topology_thread_cpumask is used, then cpu0 and cpu1 share one percore
and cpu2 and cpu3 share another percore. This is what we want.
Lin Ming
>
> -Andi
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration
2011-02-21 3:29 ` Lin Ming
@ 2011-02-21 3:32 ` Andi Kleen
2011-02-21 5:01 ` Lin Ming
0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2011-02-21 3:32 UTC (permalink / raw)
To: Lin Ming
Cc: Andi Kleen, Peter Zijlstra, Ingo Molnar, Stephane Eranian,
linux-kernel
On Mon, Feb 21, 2011 at 11:29:24AM +0800, Lin Ming wrote:
> On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote:
> > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> > > Currently we iterate the full machine looking for a matching core_id/nb
> > > for the percore and the amd northbridge stuff , using a smaller topology
> > > mask makes sense.
> >
> > This is still wrong for CPU hotplug. The CPU "owning" the per core
> > does not necessarily need to be online anymore.
>
> This is remain issue for hotplug case, no matter we use
> for_each_online_cpu or topology_thread_cpumask.
The original code I submitted used for_each_possible_cpu which
is correct.
>
> > Please drop this patch.
>
> Re-look at the code, I think for_each_online_cpu is wrong for percore,
> we should use topology_thread_cpumask instead.
No, that's also cleared on unplug. You really need the possible map
and nothing else.
-Andi
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration
2011-02-21 3:32 ` Andi Kleen
@ 2011-02-21 5:01 ` Lin Ming
0 siblings, 0 replies; 5+ messages in thread
From: Lin Ming @ 2011-02-21 5:01 UTC (permalink / raw)
To: Andi Kleen; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, linux-kernel
On Mon, 2011-02-21 at 11:32 +0800, Andi Kleen wrote:
> On Mon, Feb 21, 2011 at 11:29:24AM +0800, Lin Ming wrote:
> > On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote:
> > > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> > > > Currently we iterate the full machine looking for a matching core_id/nb
> > > > for the percore and the amd northbridge stuff , using a smaller topology
> > > > mask makes sense.
> > >
> > > This is still wrong for CPU hotplug. The CPU "owning" the per core
> > > does not necessarily need to be online anymore.
> >
> > This is remain issue for hotplug case, no matter we use
> > for_each_online_cpu or topology_thread_cpumask.
>
> The original code I submitted used for_each_possible_cpu which
> is correct.
>
> >
> > > Please drop this patch.
> >
> > Re-look at the code, I think for_each_online_cpu is wrong for percore,
> > we should use topology_thread_cpumask instead.
>
> No, that's also cleared on unplug. You really need the possible map
> and nothing else.
That's wrong for kernel initialization, not related to hotplug.
I wrote a simple debug patch,
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f152930..913a8a5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1123,7 +1123,7 @@ static void intel_pmu_cpu_starting(int cpu)
if (!ht_capable())
return;
- for_each_cpu(i, topology_thread_cpumask(cpu)) {
+ for_each_possible_cpu(i) {
struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
if (pc && pc->core_id == core_id) {
@@ -1135,6 +1135,9 @@ static void intel_pmu_cpu_starting(int cpu)
cpuc->per_core->core_id = core_id;
cpuc->per_core->refcnt++;
+
+ printk("DEBUG: cpu%d, per_core %p, core_id: %d, ref_count: %d\n",
+ cpu, cpuc->per_core, cpuc->per_core->core_id, cpuc->per_core->refcnt);
}
static void intel_pmu_cpu_dying(int cpu)
The output as below,
DEBUG: cpu0, per_core ffff8801bec32600, core_id: 0, ref_count: 1
DEBUG: cpu1, per_core ffff8801bec32600, core_id: 0, ref_count: 2
DEBUG: cpu2, per_core ffff8801bec32a20, core_id: 1, ref_count: 1
DEBUG: cpu3, per_core ffff8801bec32a20, core_id: 1, ref_count: 2
DEBUG: cpu4, per_core ffff8801bec32de0, core_id: 2, ref_count: 1
DEBUG: cpu5, per_core ffff8801bec32de0, core_id: 2, ref_count: 2
DEBUG: cpu6, per_core ffff8801becfc120, core_id: 3, ref_count: 1
DEBUG: cpu7, per_core ffff8801becfc120, core_id: 3, ref_count: 2
DEBUG: cpu8, per_core ffff8801bec32600, core_id: 0, ref_count: 3
DEBUG: cpu9, per_core ffff8801bec32600, core_id: 0, ref_count: 4
DEBUG: cpu10, per_core ffff8801bec32a20, core_id: 1, ref_count: 3
DEBUG: cpu11, per_core ffff8801bec32a20, core_id: 1, ref_count: 4
DEBUG: cpu12, per_core ffff8801bec32de0, core_id: 2, ref_count: 3
DEBUG: cpu13, per_core ffff8801bec32de0, core_id: 2, ref_count: 4
DEBUG: cpu14, per_core ffff8801becfc120, core_id: 3, ref_count: 3
DEBUG: cpu15, per_core ffff8801becfc120, core_id: 3, ref_count: 4
As you can see, cpu0, cpu1, cpu8 and cpu9 share the same per_core(ffff8801bec32600).
This is wrong.
cpu0 and cpu8 should share one pef_core, cpu1 and cpu9 share another per_core.
>
> -Andi
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-02-21 5:00 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-20 16:57 [PATCH 5/6] perf: Optimise topology iteration Lin Ming
2011-02-20 21:15 ` Andi Kleen
2011-02-21 3:29 ` Lin Ming
2011-02-21 3:32 ` Andi Kleen
2011-02-21 5:01 ` Lin Ming
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox