public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
@ 2008-11-13 20:57 Jiri Kosina
  2008-11-13 21:24 ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Kosina @ 2008-11-13 20:57 UTC (permalink / raw)
  To: Andi Kleen, Robert Richter
  Cc: oprofile-list, Jiri Benc, Vilem Marsik, Eric Dumazet, Ingo Molnar,
	Pekka Enberg, linux-kernel

Hi,

there has been a bug [1] recently which caused NMIs to be stuck on certain 
Intel CPUs, resultung in oprofile not working.

This has been fixed by 

	commit 7c64ade53a6f977d73f16243865c42ceae999aea
	Author: Andi Kleen <andi@firstfloor.org>
	Date:   Fri Nov 7 14:02:49 2008 +0100

		oprofile: Fix p6 counter overflow check

but we are seeing the very same symptoms still on the following CPU, even 
with 2.6.28-rc4 (which includes 7c64ade5):

	processor       : 0
	vendor_id       : AuthenticAMD
	cpu family      : 15
	model           : 47
	model name      : AMD Athlon(tm) 64 Processor 3000+
	stepping        : 0

After starting oprofile, no NMI is delivered at all:

	vulture:~ # opcontrol --start
	Using 2.6+ OProfile kernel interface.
	Reading module info.
	Using log file /var/lib/oprofile/samples/oprofiled.log
	Daemon started.
	Profiler running.
	vulture:~ # grep NMI /proc/interrupts
	NMI:          0   Non-maskable interrupts
	vulture:~ # grep NMI /proc/interrupts
	NMI:          0   Non-maskable interrupts

on a quite similar CPU, everything seems to be working smoothly though 
with the very same kernel -- NMIs are being delivered, and oprofile 
performs proper profiling:

	processor       : 0
	vendor_id       : AuthenticAMD
	cpu family      : 15
	model           : 4
	model name      : AMD Athlon(tm) 64 Processor 3000+
	stepping        : 8

I haven't yet found a time to start bisecting this.

[1] http://lkml.org/lkml/2008/10/30/319

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
  2008-11-13 20:57 Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU Jiri Kosina
@ 2008-11-13 21:24 ` Ingo Molnar
  2008-11-13 21:27   ` Jiri Kosina
  2008-11-13 21:33   ` Maynard Johnson
  0 siblings, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2008-11-13 21:24 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andi Kleen, Robert Richter, oprofile-list, Jiri Benc,
	Vilem Marsik, Eric Dumazet, Pekka Enberg, linux-kernel


* Jiri Kosina <jkosina@suse.cz> wrote:

> I haven't yet found a time to start bisecting this.

Would be nice to identify a commit to revert - in case we run out of 
time fixing it.

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
  2008-11-13 21:24 ` Ingo Molnar
@ 2008-11-13 21:27   ` Jiri Kosina
  2008-11-13 21:37     ` Ingo Molnar
  2008-11-14 10:09     ` Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU Robert Richter
  2008-11-13 21:33   ` Maynard Johnson
  1 sibling, 2 replies; 19+ messages in thread
From: Jiri Kosina @ 2008-11-13 21:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Robert Richter, oprofile-list, Jiri Benc,
	Vilem Marsik, Eric Dumazet, Pekka Enberg, linux-kernel

On Thu, 13 Nov 2008, Ingo Molnar wrote:

> > I haven't yet found a time to start bisecting this.
> Would be nice to identify a commit to revert - in case we run out of 
> time fixing it.

Yup, I first wanted to make this known to the public in hope that it will 
ring a bell somewhere.

If noone sees an obvous reason for this, I will do my best to bisect this 
tomorrow.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
  2008-11-13 21:24 ` Ingo Molnar
  2008-11-13 21:27   ` Jiri Kosina
@ 2008-11-13 21:33   ` Maynard Johnson
  1 sibling, 0 replies; 19+ messages in thread
From: Maynard Johnson @ 2008-11-13 21:33 UTC (permalink / raw)
  To: Ingo Molnar, Yeh, Jason, Robert Richter
  Cc: Jiri Kosina, Pekka Enberg, linux-kernel, Eric Dumazet, Jiri Benc,
	Andi Kleen, oprofile-list, Vilem Marsik

Ingo Molnar wrote:
> * Jiri Kosina <jkosina@suse.cz> wrote:
> 
>> I haven't yet found a time to start bisecting this.
> 
> Would be nice to identify a commit to revert - in case we run out of 
> time fixing it.
Robert, Jason,
Can one of you take a look at this?  It appears to be an AMD issue right now. 
The first message in the thread can be found at:
http://lkml.org/lkml/2008/11/13/336

Thanks.
-Maynard
> 
> 	Ingo
> 
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> oprofile-list mailing list
> oprofile-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oprofile-list


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
  2008-11-13 21:27   ` Jiri Kosina
@ 2008-11-13 21:37     ` Ingo Molnar
  2008-11-13 21:49       ` Eric Dumazet
  2008-11-14 10:09     ` Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU Robert Richter
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-11-13 21:37 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andi Kleen, Robert Richter, oprofile-list, Jiri Benc,
	Vilem Marsik, Eric Dumazet, Pekka Enberg, linux-kernel


* Jiri Kosina <jkosina@suse.cz> wrote:

> On Thu, 13 Nov 2008, Ingo Molnar wrote:
> 
> > > I haven't yet found a time to start bisecting this.
> > Would be nice to identify a commit to revert - in case we run out of 
> > time fixing it.
> 
> Yup, I first wanted to make this known to the public in hope that it 
> will ring a bell somewhere.
> 
> If noone sees an obvous reason for this, I will do my best to bisect 
> this tomorrow.

We've got the one patch below pending, but that's not for AMD cpus so 
it shouldnt impact your case.

But ... some change made it all much more fragile. I'm curious why 
things became more fragile.

	Ingo

--------------->
Subject: oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 11 Nov 2008 09:32:12 +0100

While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU),
I noticed that one CPU after the other could not get anymore NMI.

After a while, all cores where blocked (ie not generating events for oprofile)
I tried all major linux versions and all where affected by this freeze.

I found that we have to un-mask APIC *before* writing to MSR counter
when we get event notification, because we use APIC_LVTPC in edge triggered mode.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/oprofile/op_model_ppro.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: tip/arch/x86/oprofile/op_model_ppro.c
===================================================================
--- tip.orig/arch/x86/oprofile/op_model_ppro.c
+++ tip/arch/x86/oprofile/op_model_ppro.c
@@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_reg
 	u64 val;
 	int i;
 
+	/*
+	 * We need to unmask the apic vector *before* writing reset_value
+	 * to msr counter, because we use edge trigger
+	 */
+	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+
 	for (i = 0 ; i < num_counters; ++i) {
 		if (!reset_value[i])
 			continue;
@@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_reg
 		}
 	}
 
-	/* Only P6 based Pentium M need to re-unmask the apic vector but it
-	 * doesn't hurt other P6 variant */
-	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
-
 	/* We can't work out if we really handled an interrupt. We
 	 * might have caught a *second* counter just after overflowing
 	 * the interrupt for this counter then arrives

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
  2008-11-13 21:37     ` Ingo Molnar
@ 2008-11-13 21:49       ` Eric Dumazet
  2008-11-14 15:25         ` Oprofile : need to adjust PC by 16 bytes Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2008-11-13 21:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Andi Kleen, Robert Richter, oprofile-list, Jiri Benc,
	Vilem Marsik, Pekka Enberg, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3372 bytes --]

Ingo Molnar a écrit :
> * Jiri Kosina <jkosina@suse.cz> wrote:
> 
>> On Thu, 13 Nov 2008, Ingo Molnar wrote:
>>
>>>> I haven't yet found a time to start bisecting this.
>>> Would be nice to identify a commit to revert - in case we run out of 
>>> time fixing it.
>> Yup, I first wanted to make this known to the public in hope that it 
>> will ring a bell somewhere.
>>
>> If noone sees an obvous reason for this, I will do my best to bisect 
>> this tomorrow.
> 
> We've got the one patch below pending, but that's not for AMD cpus so 
> it shouldnt impact your case.
> 
> But ... some change made it all much more fragile. I'm curious why 
> things became more fragile.
> 
> 	Ingo
> 
> --------------->
> Subject: oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Tue, 11 Nov 2008 09:32:12 +0100
> 
> While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU),
> I noticed that one CPU after the other could not get anymore NMI.
> 
> After a while, all cores where blocked (ie not generating events for oprofile)
> I tried all major linux versions and all where affected by this freeze.
> 
> I found that we have to un-mask APIC *before* writing to MSR counter
> when we get event notification, because we use APIC_LVTPC in edge triggered mode.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/oprofile/op_model_ppro.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> Index: tip/arch/x86/oprofile/op_model_ppro.c
> ===================================================================
> --- tip.orig/arch/x86/oprofile/op_model_ppro.c
> +++ tip/arch/x86/oprofile/op_model_ppro.c
> @@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_reg
>  	u64 val;
>  	int i;
>  
> +	/*
> +	 * We need to unmask the apic vector *before* writing reset_value
> +	 * to msr counter, because we use edge trigger
> +	 */
> +	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> +
>  	for (i = 0 ; i < num_counters; ++i) {
>  		if (!reset_value[i])
>  			continue;
> @@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_reg
>  		}
>  	}
>  
> -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
> -	 * doesn't hurt other P6 variant */
> -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> -
>  	/* We can't work out if we really handled an interrupt. We
>  	 * might have caught a *second* counter just after overflowing
>  	 * the interrupt for this counter then arrives

Just to clarify, I found this patch necessary for previous linux versions as well.

Maybe new CPUS from intel triggers a software bug, I dont know.


Also, I posted a patch about the kmalloc() of reset_value, I am not sure patch was pushed.

This one is a real bug.

[PATCH] oprofile: fix an overflow in ppro code

reset_value was changed from long to u64 in commit b99170288421c79f0c2efa8b33e26e65f4bb7fb8
(oprofile: Implement Intel architectural perfmon support)

But dynamic allocation of this array use a wrong type (long instead of u64)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
arch/x86/oprofile/op_model_ppro.c |    2 +-
1 files changed, 1 insertion(+), 1 deletion(-)


[-- Attachment #2: oprofile_ppro.patch --]
[-- Type: text/plain, Size: 481 bytes --]

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 3f1b81a..716d26f 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -69,7 +69,7 @@ static void ppro_setup_ctrs(struct op_msrs const * const msrs)
 	int i;
 
 	if (!reset_value) {
-		reset_value = kmalloc(sizeof(unsigned) * num_counters,
+		reset_value = kmalloc(sizeof(reset_value[0]) * num_counters,
 					GFP_ATOMIC);
 		if (!reset_value)
 			return;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
  2008-11-13 21:27   ` Jiri Kosina
  2008-11-13 21:37     ` Ingo Molnar
@ 2008-11-14 10:09     ` Robert Richter
  2008-11-14 11:12       ` Jiri Kosina
  1 sibling, 1 reply; 19+ messages in thread
From: Robert Richter @ 2008-11-14 10:09 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Ingo Molnar, Andi Kleen, oprofile-list, Jiri Benc, Vilem Marsik,
	Eric Dumazet, Pekka Enberg, linux-kernel

On 13.11.08 22:27:52, Jiri Kosina wrote:
> On Thu, 13 Nov 2008, Ingo Molnar wrote:
> 
> > > I haven't yet found a time to start bisecting this.
> > Would be nice to identify a commit to revert - in case we run out of 
> > time fixing it.
> 
> Yup, I first wanted to make this known to the public in hope that it will 
> ring a bell somewhere.
> 
> If noone sees an obvous reason for this, I will do my best to bisect this 
> tomorrow.

Could you at least test with v2.6.27? I could not reproduce it on the
hardware I am using.

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
  2008-11-14 10:09     ` Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU Robert Richter
@ 2008-11-14 11:12       ` Jiri Kosina
  0 siblings, 0 replies; 19+ messages in thread
From: Jiri Kosina @ 2008-11-14 11:12 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Andi Kleen, oprofile-list, Jiri Benc, Vilem Marsik,
	Eric Dumazet, Pekka Enberg, linux-kernel

On Fri, 14 Nov 2008, Robert Richter wrote:

> Could you at least test with v2.6.27? 

Oprofile in 2.6.27 doesn't work on this system, as far as I know.

> I could not reproduce it on the hardware I am using.

Yes, it seems to be very system-specific. As I have said in my original 
mail, the machine we are seeing this problem on is

        vendor_id       : AuthenticAMD
        cpu family      : 15
        model           : 47
        model name      : AMD Athlon(tm) 64 Processor 3000+
        stepping        : 0

but on the other hand on the following CPU, the oprofile works with 2.6.27 
and 2.6.28-rc4 without any problems, NMIs are coming flawlessly

        vendor_id       : AuthenticAMD
        cpu family      : 15
        model           : 4
        model name      : AMD Athlon(tm) 64 Processor 3000+
        stepping        : 8


-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Oprofile : need to adjust PC by 16 bytes
  2008-11-13 21:49       ` Eric Dumazet
@ 2008-11-14 15:25         ` Eric Dumazet
  2008-11-14 15:59           ` Andi Kleen
  2008-11-14 17:01           ` Mikael Pettersson
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2008-11-14 15:25 UTC (permalink / raw)
  To: Andi Kleen, Robert Richter, oprofile-list
  Cc: Ingo Molnar, Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg,
	linux-kernel

Very strange results on my Intel E5450 CPU

I found that I have to bias regs->ip by 16 bytes in order to have
correct profiling.

Does it rings a bell to anyone ?

opcontrol --version
opcontrol: oprofile 0.9.4 compiled on Nov  3 2008 06:45:57

Thank you

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 3f1b81a..051fb16 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -131,7 +131,9 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
                        continue;
                rdmsrl(msrs->counters[i].addr, val);
                if (CTR_OVERFLOWED(val)) {
+                       regs->ip -= 0x10;
                        oprofile_add_sample(regs, i);
+                       regs->ip += 0x10;
                        wrmsrl(msrs->counters[i].addr, -reset_value[i]);
                }
        }


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-14 15:25         ` Oprofile : need to adjust PC by 16 bytes Eric Dumazet
@ 2008-11-14 15:59           ` Andi Kleen
  2008-11-14 16:02             ` Eric Dumazet
  2008-11-14 17:01           ` Mikael Pettersson
  1 sibling, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-11-14 15:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Robert Richter, oprofile-list, Ingo Molnar,
	Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg, linux-kernel

On Fri, Nov 14, 2008 at 04:25:51PM +0100, Eric Dumazet wrote:
> Very strange results on my Intel E5450 CPU
> 
> I found that I have to bias regs->ip by 16 bytes in order to have
> correct profiling.

Can you expand a bit how you determined if the profile is right or wrong?

-Andi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-14 15:59           ` Andi Kleen
@ 2008-11-14 16:02             ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2008-11-14 16:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Robert Richter, oprofile-list, Ingo Molnar, Jiri Kosina,
	Jiri Benc, Vilem Marsik, Pekka Enberg, linux-kernel

Andi Kleen a écrit :
> On Fri, Nov 14, 2008 at 04:25:51PM +0100, Eric Dumazet wrote:
>> Very strange results on my Intel E5450 CPU
>>
>> I found that I have to bias regs->ip by 16 bytes in order to have
>> correct profiling.
> 
> Can you expand a bit how you determined if the profile is right or wrong?
> 
> -Andi
> 
> 

opannotate -a vmlinux > FILE

And check results.

With the patch I get :

c041d270 <dst_release>: /* dst_release total:  16523  1.2978 */
   252  0.0198 :c041d270:       push   %ebp
   654  0.0514 :c041d271:       test   %eax,%eax
               :c041d273:       mov    %esp,%ebp
    47  0.0037 :c041d275:       mov    %eax,%edx
    42  0.0033 :c041d277:       je     c041d289 <dst_release+0x19>
               :c041d279:       mov    $0xffffffff,%eax
     3 2.4e-04 :c041d27e:       lock xadd %eax,0x80(%edx)
 14766  1.1598 :c041d286:       dec    %eax
     1 7.9e-05 :c041d287:       js     c041d28b <dst_release+0x1b>
    53  0.0042 :c041d289:       pop    %ebp
   705  0.0554 :c041d28a:       ret
               :c041d28b:       mov    $0x10e,%edx
               :c041d290:       mov    $0xc0561ed5,%eax
               :c041d295:       call   c022d1a0 <warn_on_slowpath>
               :c041d29a:       pop    %ebp
               :c041d29b:       ret
               :c041d29c:       lea    0x0(%esi),%esi

This results please me : The cost is right after the "lock xadd" as expected.

Without patch, all addresses were off by 16 bytes.

On dst_release() it was on the call on warn_on_slowpath(),
and I am pretty sure it is not possible at all this WARN_ON is hit




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-14 15:25         ` Oprofile : need to adjust PC by 16 bytes Eric Dumazet
  2008-11-14 15:59           ` Andi Kleen
@ 2008-11-14 17:01           ` Mikael Pettersson
  2008-11-14 17:21             ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Mikael Pettersson @ 2008-11-14 17:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Robert Richter, oprofile-list, Ingo Molnar,
	Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg, linux-kernel

Eric Dumazet writes:
 > Very strange results on my Intel E5450 CPU
 > 
 > I found that I have to bias regs->ip by 16 bytes in order to have
 > correct profiling.
 > 
 > Does it rings a bell to anyone ?

The term is "skid". All OO processors are susceptible to it,
as was known already to the Alpha "continous profiling"
performance counter folks.

And no, blindly subtracting 16 from IP is not a fix.

 > 
 > opcontrol --version
 > opcontrol: oprofile 0.9.4 compiled on Nov  3 2008 06:45:57
 > 
 > Thank you
 > 
 > diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
 > index 3f1b81a..051fb16 100644
 > --- a/arch/x86/oprofile/op_model_ppro.c
 > +++ b/arch/x86/oprofile/op_model_ppro.c
 > @@ -131,7 +131,9 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 >                         continue;
 >                 rdmsrl(msrs->counters[i].addr, val);
 >                 if (CTR_OVERFLOWED(val)) {
 > +                       regs->ip -= 0x10;
 >                         oprofile_add_sample(regs, i);
 > +                       regs->ip += 0x10;
 >                         wrmsrl(msrs->counters[i].addr, -reset_value[i]);
 >                 }
 >         }
 > 
 > --
 > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 > the body of a message to majordomo@vger.kernel.org
 > More majordomo info at  http://vger.kernel.org/majordomo-info.html
 > Please read the FAQ at  http://www.tux.org/lkml/
 > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-14 17:01           ` Mikael Pettersson
@ 2008-11-14 17:21             ` Eric Dumazet
  2008-11-14 17:50               ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2008-11-14 17:21 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Andi Kleen, Robert Richter, oprofile-list, Ingo Molnar,
	Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg, linux-kernel

Mikael Pettersson a écrit :
> Eric Dumazet writes:
>  > Very strange results on my Intel E5450 CPU
>  > 
>  > I found that I have to bias regs->ip by 16 bytes in order to have
>  > correct profiling.
>  > 
>  > Does it rings a bell to anyone ?
> 
> The term is "skid". All OO processors are susceptible to it,
> as was known already to the Alpha "continous profiling"
> performance counter folks.

Cool. Maybe this can explain the other bug I have with oprofile.

> 
> And no, blindly subtracting 16 from IP is not a fix.

Who mentioned a fix ? I am only giving more fuel to Intel guys so they
hopefully can give us a working oprofile.


> 
>  > 
>  > opcontrol --version
>  > opcontrol: oprofile 0.9.4 compiled on Nov  3 2008 06:45:57
>  > 
>  > Thank you
>  > 
>  > diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
>  > index 3f1b81a..051fb16 100644
>  > --- a/arch/x86/oprofile/op_model_ppro.c
>  > +++ b/arch/x86/oprofile/op_model_ppro.c
>  > @@ -131,7 +131,9 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
>  >                         continue;
>  >                 rdmsrl(msrs->counters[i].addr, val);
>  >                 if (CTR_OVERFLOWED(val)) {
>  > +                       regs->ip -= 0x10;
>  >                         oprofile_add_sample(regs, i);
>  > +                       regs->ip += 0x10;
>  >                         wrmsrl(msrs->counters[i].addr, -reset_value[i]);
>  >                 }
>  >         }
>  > 
>  > --



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-14 17:21             ` Eric Dumazet
@ 2008-11-14 17:50               ` Andi Kleen
  2008-11-15 16:30                 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-11-14 17:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mikael Pettersson, Andi Kleen, Robert Richter, oprofile-list,
	Ingo Molnar, Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg,
	linux-kernel

> >And no, blindly subtracting 16 from IP is not a fix.
> 
> Who mentioned a fix ? I am only giving more fuel to Intel guys so they
> hopefully can give us a working oprofile.

You would need to implement PEBS support to avoid that problem. But it's a big
task. perfmon2 implements it already.

-Andi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-14 17:50               ` Andi Kleen
@ 2008-11-15 16:30                 ` Eric Dumazet
  2008-11-15 18:36                   ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2008-11-15 16:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mikael Pettersson, Robert Richter, oprofile-list, Ingo Molnar,
	Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg, linux-kernel

Andi Kleen a écrit :
>>> And no, blindly subtracting 16 from IP is not a fix.
>> Who mentioned a fix ? I am only giving more fuel to Intel guys so they
>> hopefully can give us a working oprofile.
> 
> You would need to implement PEBS support to avoid that problem. But it's a big
> task. perfmon2 implements it already.
> 

Thanks for the information.

Hum, so I grabbed perfmon2 git tree, installed various tools...

I am quite new to pfmon and tried :

# pfmon --system-wide
sizeof=64 44
<press ENTER to stop session>

Then started "tbench 8", and got a kernel panic after 6 seconds.


I was using oprofile like this

opcontrol --vmlinux=/path/vmlinux --start
// doing some benchmarking...
opreport -l vmlinux | head -n 40


What would be a working equivalent for perfmon2 based tools ?

Thanks



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-15 16:30                 ` Eric Dumazet
@ 2008-11-15 18:36                   ` Andi Kleen
  2008-11-17 15:02                     ` stephane eranian
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-11-15 18:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Mikael Pettersson, Robert Richter, oprofile-list,
	Ingo Molnar, Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg,
	linux-kernel, eranian

On Sat, Nov 15, 2008 at 05:30:58PM +0100, Eric Dumazet wrote:
> Andi Kleen a écrit :
> >>>And no, blindly subtracting 16 from IP is not a fix.
> >>Who mentioned a fix ? I am only giving more fuel to Intel guys so they
> >>hopefully can give us a working oprofile.
> >
> >You would need to implement PEBS support to avoid that problem. But it's a 
> >big
> >task. perfmon2 implements it already.
> >
> 
> Thanks for the information.
> 
> Hum, so I grabbed perfmon2 git tree, installed various tools...
> 
> I am quite new to pfmon and tried :
> 
> # pfmon --system-wide
> sizeof=64 44
> <press ENTER to stop session>
> 
> Then started "tbench 8", and got a kernel panic after 6 seconds.
> 
> 
> I was using oprofile like this
> 
> opcontrol --vmlinux=/path/vmlinux --start
> // doing some benchmarking...
> opreport -l vmlinux | head -n 40
> 
> 
> What would be a working equivalent for perfmon2 based tools ?

Probably getting a perfmon tree that works. I guess Stephane
can help (cc'ed).  Or just deal with imprecise events for now.

-Andi
-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-15 18:36                   ` Andi Kleen
@ 2008-11-17 15:02                     ` stephane eranian
  2008-11-17 15:16                       ` Eric Dumazet
  2008-11-17 15:24                       ` Andi Kleen
  0 siblings, 2 replies; 19+ messages in thread
From: stephane eranian @ 2008-11-17 15:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric Dumazet, Mikael Pettersson, Robert Richter, oprofile-list,
	Ingo Molnar, Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg,
	linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 3991 bytes --]

Hello,
I have not seen the beginning of that discussion so my comments may beslightly off.It seems Eric has problems with accuracy of instruction addresses whensampling with the PMU.
This is an inherent limitation of the PMU. It can be mitigated but notcompletely eliminated. The coreissue is that it takes several cycles between the moment a counteroverflows and posting of the PMUinterrupt. During that time, the CPU keeps on executing instructions.The interrupt IP you get, reflectsthe place you were when it triggered. That can be far away from whereit was posted and where thecounter actually overflowed. Of course, if you are stalled thatdistance is usually 0 or off by a smallnumber of instructions. But it can be very large when overflow happensduring a kernel critical sectionwhere interrupts are off. There is nothing SW can do about all of this.
Andi mentioned PEBS. I don't know if you are familiar with what itdoes. Let me summarize. This isa hardware/microcode feature which implements a hardware-managedbuffer where samples arestored. The OS points the CPU to a memory region where PMU samples arestored. No PMUinterrupt is generated until the buffer becomes full. That partaddresses some of the overheadassociated with interrupt-based sampling. Unfortunately, PEBS  doesnot point to the instruction wherethe counter overflowed, it will still be a few instructions off. Butthis time, you get the machine state at thelast retired instruction. Furthermore, PEBS can record samples whilein kernel critical sections. A limitationof PEBS is that it does not work with all the PMU events. Only ahandful are available.
As for perfmon, if you pull from the perfmon2 GIT tree, this shouldwork. Don't know what happen inyou case.
Perfmon and the pfmon can do simple counting or also collect profiles.
$ pfmon date
Counts cycles at the user level only for the process date
$ pfmon --system-wide -t10
Counts elapsed cycles at user level for all CPU for 10s. Results are per-cpu
$ pfmon --long-smpl-periods=240000 date
Collect a flat profile of process date. Period is 240,000 elapsed user cycle
$ pfmon --system-wide --long-smpl-periods=240000 -t 10
Collect a flat profile on each online CPU during 10s. Period is240,000 user elapsed cycles. Results are per-cpu
You have a lot  more examples on the perfmon web site, Following thedocumentation and pfmon users' guide.
Perfmon/pfmon can use PEBS on Intel Core processors. First step is toinsert the kernel module for it:   # modprobe perfmon_pebs_core_smpl
Then use pfmon, we use instruction_retired because elapsed cycles doesnot support PEBS:
   $ pfmon --smpl-module=pebs -einstructions_retired--long-smpl-periods=120000 date
Hope this helps.

On Sat, Nov 15, 2008 at 7:36 PM, Andi Kleen <andi@firstfloor.org> wrote:> On Sat, Nov 15, 2008 at 05:30:58PM +0100, Eric Dumazet wrote:>> Andi Kleen a écrit :>> >>>And no, blindly subtracting 16 from IP is not a fix.>> >>Who mentioned a fix ? I am only giving more fuel to Intel guys so they>> >>hopefully can give us a working oprofile.>> >>> >You would need to implement PEBS support to avoid that problem. But it's a>> >big>> >task. perfmon2 implements it already.>> >>>>> Thanks for the information.>>>> Hum, so I grabbed perfmon2 git tree, installed various tools...>>>> I am quite new to pfmon and tried :>>>> # pfmon --system-wide>> sizeof=64 44>> <press ENTER to stop session>>>>> Then started "tbench 8", and got a kernel panic after 6 seconds.>>>>>> I was using oprofile like this>>>> opcontrol --vmlinux=/path/vmlinux --start>> // doing some benchmarking...>> opreport -l vmlinux | head -n 40>>>>>> What would be a working equivalent for perfmon2 based tools ?>> Probably getting a perfmon tree that works. I guess Stephane> can help (cc'ed).  Or just deal with imprecise events for now.>> -Andi> --> ak@linux.intel.com>ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-17 15:02                     ` stephane eranian
@ 2008-11-17 15:16                       ` Eric Dumazet
  2008-11-17 15:24                       ` Andi Kleen
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2008-11-17 15:16 UTC (permalink / raw)
  To: eranian
  Cc: Andi Kleen, Mikael Pettersson, Robert Richter, oprofile-list,
	Ingo Molnar, Jiri Kosina, Jiri Benc, Vilem Marsik, Pekka Enberg,
	linux-kernel

stephane eranian a écrit :
> Hello,
> 
> I have not seen the beginning of that discussion so my comments may be
> slightly off.
> It seems Eric has problems with accuracy of instruction addresses when
> sampling with the PMU.
> 
> This is an inherent limitation of the PMU. It can be mitigated but not
> completely eliminated. The core
> issue is that it takes several cycles between the moment a counter
> overflows and posting of the PMU
> interrupt. During that time, the CPU keeps on executing instructions.
> The interrupt IP you get, reflects
> the place you were when it triggered. That can be far away from where
> it was posted and where the
> counter actually overflowed. Of course, if you are stalled that
> distance is usually 0 or off by a small
> number of instructions. But it can be very large when overflow happens
> during a kernel critical section
> where interrupts are off. There is nothing SW can do about all of this.
> 
> Andi mentioned PEBS. I don't know if you are familiar with what it
> does. Let me summarize. This is
> a hardware/microcode feature which implements a hardware-managed
> buffer where samples are
> stored. The OS points the CPU to a memory region where PMU samples are
> stored. No PMU
> interrupt is generated until the buffer becomes full. That part
> addresses some of the overhead
> associated with interrupt-based sampling. Unfortunately, PEBS  does
> not point to the instruction where
> the counter overflowed, it will still be a few instructions off. But
> this time, you get the machine state at the
> last retired instruction. Furthermore, PEBS can record samples while
> in kernel critical sections. A limitation
> of PEBS is that it does not work with all the PMU events. Only a
> handful are available.
> 
> As for perfmon, if you pull from the perfmon2 GIT tree, this should
> work. Don't know what happen in
> you case.
> 
> Perfmon and the pfmon can do simple counting or also collect profiles.
> 
> $ pfmon date
> 
> Counts cycles at the user level only for the process date
> 
> $ pfmon --system-wide -t10
> 
> Counts elapsed cycles at user level for all CPU for 10s. Results are per-cpu
> 
> $ pfmon --long-smpl-periods=240000 date
> 
> Collect a flat profile of process date. Period is 240,000 elapsed user cycle
> 
> $ pfmon --system-wide --long-smpl-periods=240000 -t 10
> 
> Collect a flat profile on each online CPU during 10s. Period is
> 240,000 user elapsed cycles. Results are per-cpu
> 
> You have a lot  more examples on the perfmon web site, Following the
> documentation and pfmon users' guide.
> 
> Perfmon/pfmon can use PEBS on Intel Core processors. First step is to
> insert the kernel module for it:
>    # modprobe perfmon_pebs_core_smpl
> 
> Then use pfmon, we use instruction_retired because elapsed cycles does
> not support PEBS:
> 
>    $ pfmon --smpl-module=pebs -einstructions_retired
> --long-smpl-periods=120000 date
> 
> Hope this helps.
> 

Sure it does ! Thanks a lot Stephane, I am going to try all this stuff.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oprofile : need to adjust PC by 16 bytes
  2008-11-17 15:02                     ` stephane eranian
  2008-11-17 15:16                       ` Eric Dumazet
@ 2008-11-17 15:24                       ` Andi Kleen
  1 sibling, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-11-17 15:24 UTC (permalink / raw)
  To: eranian
  Cc: Andi Kleen, Eric Dumazet, Mikael Pettersson, Robert Richter,
	oprofile-list, Ingo Molnar, Jiri Kosina, Jiri Benc, Vilem Marsik,
	Pekka Enberg, linux-kernel

On Mon, Nov 17, 2008 at 04:02:19PM +0100, stephane eranian wrote:

Thanks for explaining it better than I could.

Just a minor correction.

> But it can be very large when overflow happens
> during a kernel critical section
> where interrupts are off.

Actually oprofile uses the NMI to avoid that particular problem.
It still has limits, e.g. no profiling of SMM sections.

> There is nothing SW can do about all of this.

Yes profiling on a out of order machine is hard.

> Furthermore, PEBS can record samples while
> in kernel critical sections. 

Again with oprofile that works too without PEBS.

-Andi

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-11-17 15:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-13 20:57 Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU Jiri Kosina
2008-11-13 21:24 ` Ingo Molnar
2008-11-13 21:27   ` Jiri Kosina
2008-11-13 21:37     ` Ingo Molnar
2008-11-13 21:49       ` Eric Dumazet
2008-11-14 15:25         ` Oprofile : need to adjust PC by 16 bytes Eric Dumazet
2008-11-14 15:59           ` Andi Kleen
2008-11-14 16:02             ` Eric Dumazet
2008-11-14 17:01           ` Mikael Pettersson
2008-11-14 17:21             ` Eric Dumazet
2008-11-14 17:50               ` Andi Kleen
2008-11-15 16:30                 ` Eric Dumazet
2008-11-15 18:36                   ` Andi Kleen
2008-11-17 15:02                     ` stephane eranian
2008-11-17 15:16                       ` Eric Dumazet
2008-11-17 15:24                       ` Andi Kleen
2008-11-14 10:09     ` Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU Robert Richter
2008-11-14 11:12       ` Jiri Kosina
2008-11-13 21:33   ` Maynard Johnson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox