[PATCH 5/6] perf: Optimise topology iteration

* [PATCH 5/6] perf: Optimise topology iteration
@ 2011-02-20 16:57 Lin Ming
  2011-02-20 21:15 ` Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Lin Ming @ 2011-02-20 16:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Stephane Eranian, Andi Kleen; +Cc: linux-kernel

Currently we iterate over the full machine looking for a matching
core_id/nb for the percore and the AMD northbridge code. Using a smaller
topology mask makes sense.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c   |    3 ++-
 arch/x86/kernel/cpu/perf_event_intel.c |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 461f62b..7217d84 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -320,6 +320,7 @@ static void amd_pmu_cpu_starting(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	struct amd_nb *nb;
+	const struct cpumask *mask = cpu_coregroup_mask(cpu);
 	int i, nb_id;
 
 	if (boot_cpu_data.x86_max_cores < 2)
@@ -328,7 +329,7 @@ static void amd_pmu_cpu_starting(int cpu)
 	nb_id = amd_get_nb_id(cpu);
 	WARN_ON_ONCE(nb_id == BAD_APICID);
 
-	for_each_online_cpu(i) {
+	for_each_cpu(i, mask) {
 		nb = per_cpu(cpu_hw_events, i).amd_nb;
 		if (WARN_ON_ONCE(!nb))
 			continue;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 059c0ab..5540d35 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1123,7 +1123,7 @@ static void intel_pmu_cpu_starting(int cpu)
 	if (!ht_capable())
 		return;
 
-	for_each_online_cpu(i) {
+	for_each_cpu(i, topology_thread_cpumask(cpu)) {
 		struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
 
 		if (pc && pc->core_id == core_id) {
-- 
1.7.3





* Re: [PATCH 5/6] perf: Optimise topology iteration
  2011-02-20 16:57 [PATCH 5/6] perf: Optimise topology iteration Lin Ming
@ 2011-02-20 21:15 ` Andi Kleen
  2011-02-21  3:29   ` Lin Ming
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2011-02-20 21:15 UTC (permalink / raw)
  To: Lin Ming
  Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, Andi Kleen,
	linux-kernel

On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> Currently we iterate the full machine looking for a matching core_id/nb
> for the percore and the amd northbridge stuff , using a smaller topology
> mask makes sense.

This is still wrong for CPU hotplug. The CPU "owning" the per core
does not necessarily need to be online anymore.
Please drop this patch.

-Andi


* Re: [PATCH 5/6] perf: Optimise topology iteration
  2011-02-20 21:15 ` Andi Kleen
@ 2011-02-21  3:29   ` Lin Ming
  2011-02-21  3:32     ` Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Lin Ming @ 2011-02-21  3:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, linux-kernel

On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote:
> On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> > Currently we iterate the full machine looking for a matching core_id/nb
> > for the percore and the amd northbridge stuff , using a smaller topology
> > mask makes sense.
> 
> This is still wrong for CPU hotplug. The CPU "owning" the per core
> does not necessarily need to be online anymore.

This remains an issue for the hotplug case, no matter whether we use
for_each_online_cpu or topology_thread_cpumask.

> Please drop this patch.

Looking at the code again, I think for_each_online_cpu is wrong for
percore; we should use topology_thread_cpumask instead.

for_each_online_cpu(i) {
	struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;

	if (pc && pc->core_id == core_id) {
		kfree(cpuc->per_core);
		cpuc->per_core = pc;
		break;
	}
}

Assume 2 sockets,

//socket 0
cpu 0: core_id 0
cpu 1: core_id 0

//socket 1
cpu 2: core_id 0
cpu 3: core_id 0

If for_each_online_cpu is used, all 4 logical cpus will share the same
percore, because core_id is only unique within a socket. This is wrong.

If topology_thread_cpumask is used, then cpu0 and cpu1 share one percore
and cpu2 and cpu3 share another percore. This is what we want.
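
The two-socket example above can be checked with a small userspace model. This is only a sketch, not the kernel code: `struct percore`, `sibling_mask`, `cpu_in_mask()` and `bring_up_all()` are hypothetical stand-ins, and the sibling bitmasks merely imitate what topology_thread_cpumask() would return for this toy topology.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS  4
#define ALL_CPUS 0xfu

/* Hypothetical stand-in for struct intel_percore. */
struct percore {
	int core_id;
	int refcnt;
};

/* Toy topology from the example above: every cpu reports core_id 0. */
static const int cpu_core_id[NR_CPUS] = { 0, 0, 0, 0 };

/* Hyperthread siblings per cpu, imitating topology_thread_cpumask(). */
static const unsigned int sibling_mask[NR_CPUS] = { 0x3, 0x3, 0xc, 0xc };

static struct percore storage[NR_CPUS];
static struct percore *per_core[NR_CPUS];

static int cpu_in_mask(unsigned int mask, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (mask >> cpu) & 1u;
}

/*
 * Bring up cpus 0..NR_CPUS-1 in order. Each cpu first gets its own
 * structure, then adopts an existing one with the same core_id if the
 * search finds it. Returns the number of distinct structures in use.
 */
static int bring_up_all(int siblings_only)
{
	int distinct = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		storage[cpu].core_id = -1;
		storage[cpu].refcnt = 0;
		per_core[cpu] = NULL;
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		unsigned int mask = siblings_only ? sibling_mask[cpu] : ALL_CPUS;

		per_core[cpu] = &storage[cpu];
		for (int i = 0; i < NR_CPUS; i++) {
			struct percore *pc = per_core[i];

			if (!cpu_in_mask(mask, i))
				continue;
			if (pc && pc->core_id == cpu_core_id[cpu]) {
				per_core[cpu] = pc;
				break;
			}
		}

		per_core[cpu]->core_id = cpu_core_id[cpu];
		if (++per_core[cpu]->refcnt == 1)
			distinct++;
	}
	return distinct;
}
```

In this model, bring_up_all(0) searches every cpu and ends with a single structure shared by all four cpus, while bring_up_all(1) searches only the siblings and ends with two structures, one per socket.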

Lin Ming

> 
> -Andi




* Re: [PATCH 5/6] perf: Optimise topology iteration
  2011-02-21  3:29   ` Lin Ming
@ 2011-02-21  3:32     ` Andi Kleen
  2011-02-21  5:01       ` Lin Ming
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2011-02-21  3:32 UTC (permalink / raw)
  To: Lin Ming
  Cc: Andi Kleen, Peter Zijlstra, Ingo Molnar, Stephane Eranian,
	linux-kernel

On Mon, Feb 21, 2011 at 11:29:24AM +0800, Lin Ming wrote:
> On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote:
> > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> > > Currently we iterate the full machine looking for a matching core_id/nb
> > > for the percore and the amd northbridge stuff , using a smaller topology
> > > mask makes sense.
> > 
> > This is still wrong for CPU hotplug. The CPU "owning" the per core
> > does not necessarily need to be online anymore.
> 
> This is remain issue for hotplug case, no matter we use
> for_each_online_cpu or topology_thread_cpumask.

The original code I submitted used for_each_possible_cpu which
is correct.

> 
> > Please drop this patch.
> 
> Re-look at the code, I think for_each_online_cpu is wrong for percore,
> we should use topology_thread_cpumask instead.

No, that's also cleared on unplug. You really need the possible map
and nothing else.
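
The point about the masks can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not kernel code: the three bitmask variables and the helpers `mask_weight()` and `cpu_unplug()` are hypothetical, and the model only mirrors the behaviour described above, where unplugging a cpu clears it from the online and sibling masks while the possible map stays unchanged.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical bitmask stand-ins for the kernel cpumasks. */
static unsigned int possible_mask = 0xf;
static unsigned int online_mask = 0xf;
static unsigned int sibling_mask[NR_CPUS] = { 0x3, 0x3, 0xc, 0xc };

/* Number of cpus a for_each_cpu()-style loop over this mask would visit. */
static int mask_weight(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Model of an unplug: the cpu leaves the online and sibling masks only. */
static void cpu_unplug(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	online_mask &= ~(1u << cpu);
	for (int i = 0; i < NR_CPUS; i++)
		sibling_mask[i] &= ~(1u << cpu);
	sibling_mask[cpu] = 0;
}
```

After cpu_unplug(0) in this model, a loop over the online mask or over cpu1's sibling mask no longer visits cpu0, whereas a loop over the possible map still does.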

-Andi


* Re: [PATCH 5/6] perf: Optimise topology iteration
  2011-02-21  3:32     ` Andi Kleen
@ 2011-02-21  5:01       ` Lin Ming
  0 siblings, 0 replies; 5+ messages in thread
From: Lin Ming @ 2011-02-21  5:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, linux-kernel

On Mon, 2011-02-21 at 11:32 +0800, Andi Kleen wrote:
> On Mon, Feb 21, 2011 at 11:29:24AM +0800, Lin Ming wrote:
> > On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote:
> > > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> > > > Currently we iterate the full machine looking for a matching core_id/nb
> > > > for the percore and the amd northbridge stuff , using a smaller topology
> > > > mask makes sense.
> > > 
> > > This is still wrong for CPU hotplug. The CPU "owning" the per core
> > > does not necessarily need to be online anymore.
> > 
> > This is remain issue for hotplug case, no matter we use
> > for_each_online_cpu or topology_thread_cpumask.
> 
> The original code I submitted used for_each_possible_cpu which
> is correct.
> 
> > 
> > > Please drop this patch.
> > 
> > Re-look at the code, I think for_each_online_cpu is wrong for percore,
> > we should use topology_thread_cpumask instead.
> 
> No, that's also cleared on unplug. You really need the possible map
> and nothing else.

Using the possible map is wrong at kernel initialization, independent of hotplug.

I wrote a simple debug patch,

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f152930..913a8a5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1123,7 +1123,7 @@ static void intel_pmu_cpu_starting(int cpu)
 	if (!ht_capable())
 		return;
 
-	for_each_cpu(i, topology_thread_cpumask(cpu)) {
+	for_each_possible_cpu(i) {
 		struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
 
 		if (pc && pc->core_id == core_id) {
@@ -1135,6 +1135,9 @@ static void intel_pmu_cpu_starting(int cpu)
 
 	cpuc->per_core->core_id = core_id;
 	cpuc->per_core->refcnt++;
+
+	printk("DEBUG: cpu%d, per_core %p, core_id: %d, ref_count: %d\n",
+		cpu, cpuc->per_core, cpuc->per_core->core_id, cpuc->per_core->refcnt);
 }
 
 static void intel_pmu_cpu_dying(int cpu)

The output is as below:

DEBUG: cpu0, per_core ffff8801bec32600, core_id: 0, ref_count: 1
DEBUG: cpu1, per_core ffff8801bec32600, core_id: 0, ref_count: 2
DEBUG: cpu2, per_core ffff8801bec32a20, core_id: 1, ref_count: 1
DEBUG: cpu3, per_core ffff8801bec32a20, core_id: 1, ref_count: 2
DEBUG: cpu4, per_core ffff8801bec32de0, core_id: 2, ref_count: 1
DEBUG: cpu5, per_core ffff8801bec32de0, core_id: 2, ref_count: 2
DEBUG: cpu6, per_core ffff8801becfc120, core_id: 3, ref_count: 1
DEBUG: cpu7, per_core ffff8801becfc120, core_id: 3, ref_count: 2
DEBUG: cpu8, per_core ffff8801bec32600, core_id: 0, ref_count: 3
DEBUG: cpu9, per_core ffff8801bec32600, core_id: 0, ref_count: 4
DEBUG: cpu10, per_core ffff8801bec32a20, core_id: 1, ref_count: 3
DEBUG: cpu11, per_core ffff8801bec32a20, core_id: 1, ref_count: 4
DEBUG: cpu12, per_core ffff8801bec32de0, core_id: 2, ref_count: 3
DEBUG: cpu13, per_core ffff8801bec32de0, core_id: 2, ref_count: 4
DEBUG: cpu14, per_core ffff8801becfc120, core_id: 3, ref_count: 3
DEBUG: cpu15, per_core ffff8801becfc120, core_id: 3, ref_count: 4

As you can see, cpu0, cpu1, cpu8 and cpu9 share the same per_core (ffff8801bec32600).
This is wrong.

cpu0 and cpu8 should share one per_core, and cpu1 and cpu9 should share another.
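
The sharing pattern in the debug output can be reproduced with a small userspace model. This is a sketch, not the kernel code and not a proposed patch. The core_id layout is read off the output above; the package layout (even cpus on one package, odd cpus on the other) is an assumption inferred from the statement that cpu0/cpu8 and cpu1/cpu9 are the real sibling pairs. `struct percore`, `pkg_id_of()`, `core_id_of()` and `bring_up_all()` are hypothetical names. The `match_pkg` mode shows one hypothetical way to scan every possible cpu and still keep sockets apart, by also comparing a package id.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 16

/* Hypothetical stand-in for struct intel_percore, plus a package id. */
struct percore {
	int pkg_id;
	int core_id;
	int refcnt;
};

static struct percore storage[NR_CPUS];
static struct percore *per_core[NR_CPUS];

/* core_id layout taken from the debug output: 0,0,1,1,2,2,3,3,0,0,... */
static int core_id_of(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (cpu / 2) % 4;
}

/* Assumed package layout: even cpus on package 0, odd cpus on package 1. */
static int pkg_id_of(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu % 2;
}

/*
 * Bring up all cpus in order, scanning every possible cpu for a structure
 * to share. With match_pkg == 0 only core_id is compared, as in the debug
 * patch. Returns the number of distinct structures in use.
 */
static int bring_up_all(int match_pkg)
{
	int distinct = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		storage[cpu] = (struct percore){ -1, -1, 0 };
		per_core[cpu] = NULL;
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		per_core[cpu] = &storage[cpu];
		for (int i = 0; i < NR_CPUS; i++) {
			struct percore *pc = per_core[i];

			if (!pc || pc->core_id != core_id_of(cpu))
				continue;
			if (match_pkg && pc->pkg_id != pkg_id_of(cpu))
				continue;
			per_core[cpu] = pc;
			break;
		}

		per_core[cpu]->pkg_id = pkg_id_of(cpu);
		per_core[cpu]->core_id = core_id_of(cpu);
		if (++per_core[cpu]->refcnt == 1)
			distinct++;
	}
	return distinct;
}
```

In this model, bring_up_all(0) ends with 4 structures whose refcnt reaches 4, matching the debug output, while bring_up_all(1) ends with 8 structures of refcnt 2, with cpu0/cpu8 and cpu1/cpu9 paired as expected.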

> 
> -Andi



