* [PATCH 5/6] perf: Optimise topology iteration @ 2011-02-20 16:57 Lin Ming 2011-02-20 21:15 ` Andi Kleen 0 siblings, 1 reply; 5+ messages in thread From: Lin Ming @ 2011-02-20 16:57 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Stephane Eranian, Andi Kleen; +Cc: linux-kernel Currently we iterate the full machine looking for a matching core_id/nb for the percore and the amd northbridge stuff , using a smaller topology mask makes sense. Signed-off-by: Lin Ming <ming.m.lin@intel.com> --- arch/x86/kernel/cpu/perf_event_amd.c | 3 ++- arch/x86/kernel/cpu/perf_event_intel.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c index 461f62b..7217d84 100644 --- a/arch/x86/kernel/cpu/perf_event_amd.c +++ b/arch/x86/kernel/cpu/perf_event_amd.c @@ -320,6 +320,7 @@ static void amd_pmu_cpu_starting(int cpu) { struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu); struct amd_nb *nb; + const struct cpumask *mask = cpu_coregroup_mask(cpu); int i, nb_id; if (boot_cpu_data.x86_max_cores < 2) @@ -328,7 +329,7 @@ static void amd_pmu_cpu_starting(int cpu) nb_id = amd_get_nb_id(cpu); WARN_ON_ONCE(nb_id == BAD_APICID); - for_each_online_cpu(i) { + for_each_cpu(i, mask) { nb = per_cpu(cpu_hw_events, i).amd_nb; if (WARN_ON_ONCE(!nb)) continue; diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 059c0ab..5540d35 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -1123,7 +1123,7 @@ static void intel_pmu_cpu_starting(int cpu) if (!ht_capable()) return; - for_each_online_cpu(i) { + for_each_cpu(i, topology_thread_cpumask(cpu)) { struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core; if (pc && pc->core_id == core_id) { -- 1.7.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration 2011-02-20 16:57 [PATCH 5/6] perf: Optimise topology iteration Lin Ming @ 2011-02-20 21:15 ` Andi Kleen 2011-02-21 3:29 ` Lin Ming 0 siblings, 1 reply; 5+ messages in thread From: Andi Kleen @ 2011-02-20 21:15 UTC (permalink / raw) To: Lin Ming Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, Andi Kleen, linux-kernel On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote: > Currently we iterate the full machine looking for a matching core_id/nb > for the percore and the amd northbridge stuff , using a smaller topology > mask makes sense. This is still wrong for CPU hotplug. The CPU "owning" the per core does not necessarily need to be online anymore. Please drop this patch. -Andi ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration 2011-02-20 21:15 ` Andi Kleen @ 2011-02-21 3:29 ` Lin Ming 2011-02-21 3:32 ` Andi Kleen 0 siblings, 1 reply; 5+ messages in thread From: Lin Ming @ 2011-02-21 3:29 UTC (permalink / raw) To: Andi Kleen; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, linux-kernel On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote: > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote: > > Currently we iterate the full machine looking for a matching core_id/nb > > for the percore and the amd northbridge stuff , using a smaller topology > > mask makes sense. > > This is still wrong for CPU hotplug. The CPU "owning" the per core > does not necessarily need to be online anymore. This is remain issue for hotplug case, no matter we use for_each_online_cpu or topology_thread_cpumask. > Please drop this patch. Re-look at the code, I think for_each_online_cpu is wrong for percore, we should use topology_thread_cpumask instead. for_each_online_cpu(i) { struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core; if (pc && pc->core_id == core_id) { kfree(cpuc->per_core); cpuc->per_core = pc; break; } } Assume 2 sockets, //socket 0 cpu 0: core_id 0 cpu 1: core_id 0 //socket 1 cpu 2: core_id 0 cpu 3: core_id 0 If for_each_online_cpu is used, apparently 4 logical cpus will share the same percore. This is wrong. If topology_thread_cpumask is used, then cpu0 and cpu1 share one percore and cpu2 and cpu3 share another percore. This is what we want. Lin Ming > > -Andi ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration 2011-02-21 3:29 ` Lin Ming @ 2011-02-21 3:32 ` Andi Kleen 2011-02-21 5:01 ` Lin Ming 0 siblings, 1 reply; 5+ messages in thread From: Andi Kleen @ 2011-02-21 3:32 UTC (permalink / raw) To: Lin Ming Cc: Andi Kleen, Peter Zijlstra, Ingo Molnar, Stephane Eranian, linux-kernel On Mon, Feb 21, 2011 at 11:29:24AM +0800, Lin Ming wrote: > On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote: > > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote: > > > Currently we iterate the full machine looking for a matching core_id/nb > > > for the percore and the amd northbridge stuff , using a smaller topology > > > mask makes sense. > > > > This is still wrong for CPU hotplug. The CPU "owning" the per core > > does not necessarily need to be online anymore. > > This is remain issue for hotplug case, no matter we use > for_each_online_cpu or topology_thread_cpumask. The original code I submitted used for_each_possible_cpu which is correct. > > > Please drop this patch. > > Re-look at the code, I think for_each_online_cpu is wrong for percore, > we should use topology_thread_cpumask instead. No, that's also cleared on unplug. You really need the possible map and nothing else. -Andi ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 5/6] perf: Optimise topology iteration 2011-02-21 3:32 ` Andi Kleen @ 2011-02-21 5:01 ` Lin Ming 0 siblings, 0 replies; 5+ messages in thread From: Lin Ming @ 2011-02-21 5:01 UTC (permalink / raw) To: Andi Kleen; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, linux-kernel On Mon, 2011-02-21 at 11:32 +0800, Andi Kleen wrote: > On Mon, Feb 21, 2011 at 11:29:24AM +0800, Lin Ming wrote: > > On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote: > > > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote: > > > > Currently we iterate the full machine looking for a matching core_id/nb > > > > for the percore and the amd northbridge stuff , using a smaller topology > > > > mask makes sense. > > > > > > This is still wrong for CPU hotplug. The CPU "owning" the per core > > > does not necessarily need to be online anymore. > > > > This is remain issue for hotplug case, no matter we use > > for_each_online_cpu or topology_thread_cpumask. > > The original code I submitted used for_each_possible_cpu which > is correct. > > > > > > Please drop this patch. > > > > Re-look at the code, I think for_each_online_cpu is wrong for percore, > > we should use topology_thread_cpumask instead. > > No, that's also cleared on unplug. You really need the possible map > and nothing else. That's wrong for kernel initialization, not related to hotplug. I wrote a simple debug patch, diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index f152930..913a8a5 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -1123,7 +1123,7 @@ static void intel_pmu_cpu_starting(int cpu) if (!ht_capable()) return; - for_each_cpu(i, topology_thread_cpumask(cpu)) { + for_each_possible_cpu(i) { struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core; if (pc && pc->core_id == core_id) { @@ -1135,6 +1135,9 @@ static void intel_pmu_cpu_starting(int cpu) cpuc->per_core->core_id = core_id; cpuc->per_core->refcnt++; + + printk("DEBUG: cpu%d, per_core %p, core_id: %d, ref_count: %d\n", + cpu, cpuc->per_core, cpuc->per_core->core_id, cpuc->per_core->refcnt); } static void intel_pmu_cpu_dying(int cpu) The output as below, DEBUG: cpu0, per_core ffff8801bec32600, core_id: 0, ref_count: 1 DEBUG: cpu1, per_core ffff8801bec32600, core_id: 0, ref_count: 2 DEBUG: cpu2, per_core ffff8801bec32a20, core_id: 1, ref_count: 1 DEBUG: cpu3, per_core ffff8801bec32a20, core_id: 1, ref_count: 2 DEBUG: cpu4, per_core ffff8801bec32de0, core_id: 2, ref_count: 1 DEBUG: cpu5, per_core ffff8801bec32de0, core_id: 2, ref_count: 2 DEBUG: cpu6, per_core ffff8801becfc120, core_id: 3, ref_count: 1 DEBUG: cpu7, per_core ffff8801becfc120, core_id: 3, ref_count: 2 DEBUG: cpu8, per_core ffff8801bec32600, core_id: 0, ref_count: 3 DEBUG: cpu9, per_core ffff8801bec32600, core_id: 0, ref_count: 4 DEBUG: cpu10, per_core ffff8801bec32a20, core_id: 1, ref_count: 3 DEBUG: cpu11, per_core ffff8801bec32a20, core_id: 1, ref_count: 4 DEBUG: cpu12, per_core ffff8801bec32de0, core_id: 2, ref_count: 3 DEBUG: cpu13, per_core ffff8801bec32de0, core_id: 2, ref_count: 4 DEBUG: cpu14, per_core ffff8801becfc120, core_id: 3, ref_count: 3 DEBUG: cpu15, per_core ffff8801becfc120, core_id: 3, ref_count: 4 As you can see, cpu0, cpu1, cpu8 and cpu9 share the same per_core(ffff8801bec32600). This is wrong. cpu0 and cpu8 should share one pef_core, cpu1 and cpu9 share another per_core. > > -Andi ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-02-21 5:00 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-20 16:57 [PATCH 5/6] perf: Optimise topology iteration Lin Ming 2011-02-20 21:15 ` Andi Kleen 2011-02-21 3:29 ` Lin Ming 2011-02-21 3:32 ` Andi Kleen 2011-02-21 5:01 ` Lin Ming
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox