public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [git pull] OProfile fixes for v2.6.28
@ 2008-11-07 17:13 Robert Richter
  2008-11-07 18:22 ` Ingo Molnar
  2008-11-10  8:05 ` Eric Dumazet
  0 siblings, 2 replies; 21+ messages in thread
From: Robert Richter @ 2008-11-07 17:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

Hi Ingo,

please pull oprofile fixes for 2.6.28 for tip regression:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git oprofile-for-tip

Thanks,

-Robert

Andi Kleen (1):
      oprofile: Fix p6 counter overflow check

Carl Love (2):
      Cell OProfile: Incorrect local array size in activate spu profiling function
      Cell OProfile: Incorrect local array size in activate spu profiling function

Jesper Dangaard Brouer (1):
      Change UTF8 chars in Kconfig help text about Oprofile AMD barcelona

Nick Piggin (1):
      oprofile: fix memory ordering

Robert Richter (1):
      Revert "Cell OProfile: Incorrect local array size in activate spu profiling function"

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [git pull] OProfile fixes for v2.6.28
  2008-11-07 17:13 [git pull] OProfile fixes for v2.6.28 Robert Richter
@ 2008-11-07 18:22 ` Ingo Molnar
  2008-11-10  8:05 ` Eric Dumazet
  1 sibling, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2008-11-07 18:22 UTC (permalink / raw)
  To: Robert Richter; +Cc: LKML


* Robert Richter <robert.richter@amd.com> wrote:

> Hi Ingo,
> 
> please pull oprofile fixes for 2.6.28 for tip regression:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git oprofile-for-tip
> 
> Thanks,

pulled, thanks Robert!

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [git pull] OProfile fixes for v2.6.28
  2008-11-07 17:13 [git pull] OProfile fixes for v2.6.28 Robert Richter
  2008-11-07 18:22 ` Ingo Molnar
@ 2008-11-10  8:05 ` Eric Dumazet
  2008-11-10  8:43   ` Andi Kleen
  2008-11-17 17:57   ` [git pull] OProfile fixes for v2.6.28 Robert Richter
  1 sibling, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2008-11-10  8:05 UTC (permalink / raw)
  To: Robert Richter; +Cc: Ingo Molnar, LKML, Andi Kleen

[-- Attachment #1: Type: text/plain, Size: 1542 bytes --]

Robert Richter a écrit :
> Hi Ingo,
> 
> please pull oprofile fixes for 2.6.28 for tip regression:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git oprofile-for-tip
> 
> Thanks,
> 
> -Robert
> 
> Andi Kleen (1):
>       oprofile: Fix p6 counter overflow check
> 
> Carl Love (2):
>       Cell OProfile: Incorrect local array size in activate spu profiling function
>       Cell OProfile: Incorrect local array size in activate spu profiling function
> 
> Jesper Dangaard Brouer (1):
>       Change UTF8 chars in Kconfig help text about Oprofile AMD barcelona
> 
> Nick Piggin (1):
>       oprofile: fix memory ordering
> 
> Robert Richter (1):
>       Revert "Cell OProfile: Incorrect local array size in activate spu profiling function"
> 

Hi Robert

I am still trying to find why oprofile stops after some millions of samples on my machine.

(All versions of linux supporting oprofile/NMI on this machine have this problem)
Note : I am using a 32bit kernel.

While doing code review I found this bug.

Impact: 32bit kernels, memory corruption

[PATCH] oprofile: fix an overflow in ppro code

reset_value was changed from long to u64 in commit b99170288421c79f0c2efa8b33e26e65f4bb7fb8
(oprofile: Implement Intel architectural perfmon support)

But dynamic allocation of this array use a wrong type (long instead of u64)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 arch/x86/oprofile/op_model_ppro.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)


[-- Attachment #2: oprofile_ppro.patch --]
[-- Type: text/plain, Size: 481 bytes --]

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 3f1b81a..716d26f 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -69,7 +69,7 @@ static void ppro_setup_ctrs(struct op_msrs const * const msrs)
 	int i;
 
 	if (!reset_value) {
-		reset_value = kmalloc(sizeof(unsigned) * num_counters,
+		reset_value = kmalloc(sizeof(reset_value[0]) * num_counters,
 					GFP_ATOMIC);
 		if (!reset_value)
 			return;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [git pull] OProfile fixes for v2.6.28
  2008-11-10  8:05 ` Eric Dumazet
@ 2008-11-10  8:43   ` Andi Kleen
  2008-11-10  9:01     ` Eric Dumazet
  2008-11-10 14:23     ` [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs() Eric Dumazet
  2008-11-17 17:57   ` [git pull] OProfile fixes for v2.6.28 Robert Richter
  1 sibling, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2008-11-10  8:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Robert Richter, Ingo Molnar, LKML

Eric Dumazet <dada1@cosmosbay.com> writes:

> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
> index 3f1b81a..716d26f 100644
> --- a/arch/x86/oprofile/op_model_ppro.c
> +++ b/arch/x86/oprofile/op_model_ppro.c
> @@ -69,7 +69,7 @@ static void ppro_setup_ctrs(struct op_msrs const * const msrs)
>  	int i;
>  
>  	if (!reset_value) {
> -		reset_value = kmalloc(sizeof(unsigned) * num_counters,
> +		reset_value = kmalloc(sizeof(reset_value[0]) * num_counters,

Thanks for tracking this down.

But that still doesn't explain why 2.6.27 fails too?

-Andi

-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [git pull] OProfile fixes for v2.6.28
  2008-11-10  8:43   ` Andi Kleen
@ 2008-11-10  9:01     ` Eric Dumazet
  2008-11-10 14:23     ` [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs() Eric Dumazet
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2008-11-10  9:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Robert Richter, Ingo Molnar, LKML

Andi Kleen a écrit :
> Eric Dumazet <dada1@cosmosbay.com> writes:
> 
>> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
>> index 3f1b81a..716d26f 100644
>> --- a/arch/x86/oprofile/op_model_ppro.c
>> +++ b/arch/x86/oprofile/op_model_ppro.c
>> @@ -69,7 +69,7 @@ static void ppro_setup_ctrs(struct op_msrs const * const msrs)
>>  	int i;
>>  
>>  	if (!reset_value) {
>> -		reset_value = kmalloc(sizeof(unsigned) * num_counters,
>> +		reset_value = kmalloc(sizeof(reset_value[0]) * num_counters,
> 
> Thanks for tracking this down.
> 
> But that still doesn't explain why 2.6.27 fails too?

No :(

In fact 2.6.24 fails too. Oprofile never worked correctly.

Still investigating...


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10  8:43   ` Andi Kleen
  2008-11-10  9:01     ` Eric Dumazet
@ 2008-11-10 14:23     ` Eric Dumazet
  2008-11-10 15:49       ` Andi Kleen
  2008-11-10 16:11       ` Cyrill Gorcunov
  1 sibling, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2008-11-10 14:23 UTC (permalink / raw)
  To: Andi Kleen, Robert Richter, Ingo Molnar; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]

Andi Kleen a écrit :
> Eric Dumazet <dada1@cosmosbay.com> writes:
> 
>> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
>> index 3f1b81a..716d26f 100644
>> --- a/arch/x86/oprofile/op_model_ppro.c
>> +++ b/arch/x86/oprofile/op_model_ppro.c
>> @@ -69,7 +69,7 @@ static void ppro_setup_ctrs(struct op_msrs const * const msrs)
>>  	int i;
>>  
>>  	if (!reset_value) {
>> -		reset_value = kmalloc(sizeof(unsigned) * num_counters,
>> +		reset_value = kmalloc(sizeof(reset_value[0]) * num_counters,
> 
> Thanks for tracking this down.
> 
> But that still doesn't explain why 2.6.27 fails too?

Desesperatly Seeking Oprofile, next round.

I know *nothing* about APIC but spent few hours to try several tricks
and finally found something.

It solved my problem : oprofile can run several hours without
any freeze of NMI on any core.

# grep NMI /proc/interrupts
NMI:   10902884    9635871   10333815    8372989    7971483    8298373    8877495   10206963   Non-maskable interrupts
...
# grep NMI /proc/interrupts
NMI:   15518834   14340713   15038694   13078235   12676585   13003394   13582115   14912146   Non-maskable interrupts


Can anybody understand and explain what is happening ?

Is it a software or hardware problem ?

[PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()

While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU),
I noticed that one CPU after the other could not get anymore NMI.

After a while, all cores where blocked (ie not generating events for oprofile)
I tried all major linux versions and all where affected by this freeze.

I found that we have to re-arm APIC_DM_NMI *before* writing to MSR counter
when we get event notification.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
 arch/x86/oprofile/op_model_ppro.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

[-- Attachment #2: oprofile_msr.patch --]
[-- Type: text/plain, Size: 891 bytes --]

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 3f1b81a..7b142da 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -132,13 +132,15 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 		rdmsrl(msrs->counters[i].addr, val);
 		if (CTR_OVERFLOWED(val)) {
 			oprofile_add_sample(regs, i);
+			/*
+			 * We need to unmask the apic vector *before*
+			 * writing reset_value to msr counter
+			 */
+			apic_write(APIC_LVTPC, APIC_DM_NMI);
 			wrmsrl(msrs->counters[i].addr, -reset_value[i]);
 		}
 	}
 
-	/* Only P6 based Pentium M need to re-unmask the apic vector but it
-	 * doesn't hurt other P6 variant */
-	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
 
 	/* We can't work out if we really handled an interrupt. We
 	 * might have caught a *second* counter just after overflowing

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10 14:23     ` [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs() Eric Dumazet
@ 2008-11-10 15:49       ` Andi Kleen
  2008-11-10 15:50         ` Eric Dumazet
  2008-11-11  8:32         ` Eric Dumazet
  2008-11-10 16:11       ` Cyrill Gorcunov
  1 sibling, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2008-11-10 15:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, Robert Richter, Ingo Molnar, LKML

>  			oprofile_add_sample(regs, i);
> +			/*
> +			 * We need to unmask the apic vector *before*
> +			 * writing reset_value to msr counter
> +			 */
> +			apic_write(APIC_LVTPC, APIC_DM_NMI);
>  			wrmsrl(msrs->counters[i].addr, -reset_value[i]);
>  		}
>  	}
>  
> -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
> -	 * doesn't hurt other P6 variant */
> -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);

Did you also test if it really needs to be inside the if () or 
just before the wrmsrl? 

-Andi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10 15:49       ` Andi Kleen
@ 2008-11-10 15:50         ` Eric Dumazet
  2008-11-10 17:46           ` Andi Kleen
  2008-11-11  8:32         ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2008-11-10 15:50 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Robert Richter, Ingo Molnar, LKML

Andi Kleen a écrit :
>>  			oprofile_add_sample(regs, i);
>> +			/*
>> +			 * We need to unmask the apic vector *before*
>> +			 * writing reset_value to msr counter
>> +			 */
>> +			apic_write(APIC_LVTPC, APIC_DM_NMI);
>>  			wrmsrl(msrs->counters[i].addr, -reset_value[i]);
>>  		}
>>  	}
>>  
>> -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
>> -	 * doesn't hurt other P6 variant */
>> -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> 
> Did you also test if it really needs to be inside the if () or 
> just before the wrmsrl? 


In my testings, the "if (CTR_OVERFLOWED())" condition is always true.

(I am profiling one event only)

I felt uncomfortable issuing the apic_write() before oprofile_add_sample()

Do you mean doing the apic_write() right at the beginning of the ppro_check_ctrs()
function ?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10 14:23     ` [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs() Eric Dumazet
  2008-11-10 15:49       ` Andi Kleen
@ 2008-11-10 16:11       ` Cyrill Gorcunov
  2008-11-10 16:19         ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2008-11-10 16:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, Robert Richter, Ingo Molnar, LKML

[Eric Dumazet - Mon, Nov 10, 2008 at 03:23:00PM +0100]
> Andi Kleen a écrit :
>> Eric Dumazet <dada1@cosmosbay.com> writes:
>>
>>> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
>>> index 3f1b81a..716d26f 100644
>>> --- a/arch/x86/oprofile/op_model_ppro.c
>>> +++ b/arch/x86/oprofile/op_model_ppro.c
>>> @@ -69,7 +69,7 @@ static void ppro_setup_ctrs(struct op_msrs const * const msrs)
>>>  	int i;
>>>   	if (!reset_value) {
>>> -		reset_value = kmalloc(sizeof(unsigned) * num_counters,
>>> +		reset_value = kmalloc(sizeof(reset_value[0]) * num_counters,
>>
>> Thanks for tracking this down.
>>
>> But that still doesn't explain why 2.6.27 fails too?
>
> Desesperatly Seeking Oprofile, next round.
>
> I know *nothing* about APIC but spent few hours to try several tricks
> and finally found something.
>
> It solved my problem : oprofile can run several hours without
> any freeze of NMI on any core.
>
> # grep NMI /proc/interrupts
> NMI:   10902884    9635871   10333815    8372989    7971483    8298373    8877495   10206963   Non-maskable interrupts
> ...
> # grep NMI /proc/interrupts
> NMI:   15518834   14340713   15038694   13078235   12676585   13003394   13582115   14912146   Non-maskable interrupts
>
>
> Can anybody understand and explain what is happening ?
>
> Is it a software or hardware problem ?
>
> [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
>
> While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU),
> I noticed that one CPU after the other could not get anymore NMI.
>
> After a while, all cores where blocked (ie not generating events for oprofile)
> I tried all major linux versions and all where affected by this freeze.
>
> I found that we have to re-arm APIC_DM_NMI *before* writing to MSR counter
> when we get event notification.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> arch/x86/oprofile/op_model_ppro.c |    8 +++++---
> 1 files changed, 5 insertions(+), 3 deletions(-)

| diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
| index 3f1b81a..7b142da 100644
| --- a/arch/x86/oprofile/op_model_ppro.c
| +++ b/arch/x86/oprofile/op_model_ppro.c
| @@ -132,13 +132,15 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
|  		rdmsrl(msrs->counters[i].addr, val);
|  		if (CTR_OVERFLOWED(val)) {
|  			oprofile_add_sample(regs, i);
| +			/*
| +			 * We need to unmask the apic vector *before*
| +			 * writing reset_value to msr counter
| +			 */
| +			apic_write(APIC_LVTPC, APIC_DM_NMI);
|  			wrmsrl(msrs->counters[i].addr, -reset_value[i]);
|  		}
|  	}
|  
| -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
| -	 * doesn't hurt other P6 variant */
| -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
|  
|  	/* We can't work out if we really handled an interrupt. We
|  	 * might have caught a *second* counter just after overflowing

Hi Eric,

for the record

	apic_write(APIC_LVTPC, APIC_DM_NMI);

is not just 'unmask' but also *zeroify* (not sure if I wrote this
word right :) all fields when the origianl code was just 'unmasking'
TPC register

	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);

that is why apic_read() was in former.

		- Cyrill -

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10 16:11       ` Cyrill Gorcunov
@ 2008-11-10 16:19         ` Eric Dumazet
  2008-11-10 16:31           ` Cyrill Gorcunov
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2008-11-10 16:19 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Andi Kleen, Robert Richter, Ingo Molnar, LKML

Cyrill Gorcunov a écrit :
> [Eric Dumazet - Mon, Nov 10, 2008 at 03:23:00PM +0100]
>> Andi Kleen a écrit :
>>> Eric Dumazet <dada1@cosmosbay.com> writes:
>>>
>>>> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
>>>> index 3f1b81a..716d26f 100644
>>>> --- a/arch/x86/oprofile/op_model_ppro.c
>>>> +++ b/arch/x86/oprofile/op_model_ppro.c
>>>> @@ -69,7 +69,7 @@ static void ppro_setup_ctrs(struct op_msrs const * const msrs)
>>>>  	int i;
>>>>   	if (!reset_value) {
>>>> -		reset_value = kmalloc(sizeof(unsigned) * num_counters,
>>>> +		reset_value = kmalloc(sizeof(reset_value[0]) * num_counters,
>>> Thanks for tracking this down.
>>>
>>> But that still doesn't explain why 2.6.27 fails too?
>> Desesperatly Seeking Oprofile, next round.
>>
>> I know *nothing* about APIC but spent few hours to try several tricks
>> and finally found something.
>>
>> It solved my problem : oprofile can run several hours without
>> any freeze of NMI on any core.
>>
>> # grep NMI /proc/interrupts
>> NMI:   10902884    9635871   10333815    8372989    7971483    8298373    8877495   10206963   Non-maskable interrupts
>> ...
>> # grep NMI /proc/interrupts
>> NMI:   15518834   14340713   15038694   13078235   12676585   13003394   13582115   14912146   Non-maskable interrupts
>>
>>
>> Can anybody understand and explain what is happening ?
>>
>> Is it a software or hardware problem ?
>>
>> [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
>>
>> While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU),
>> I noticed that one CPU after the other could not get anymore NMI.
>>
>> After a while, all cores where blocked (ie not generating events for oprofile)
>> I tried all major linux versions and all where affected by this freeze.
>>
>> I found that we have to re-arm APIC_DM_NMI *before* writing to MSR counter
>> when we get event notification.
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>> arch/x86/oprofile/op_model_ppro.c |    8 +++++---
>> 1 files changed, 5 insertions(+), 3 deletions(-)
> 
> | diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
> | index 3f1b81a..7b142da 100644
> | --- a/arch/x86/oprofile/op_model_ppro.c
> | +++ b/arch/x86/oprofile/op_model_ppro.c
> | @@ -132,13 +132,15 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
> |  		rdmsrl(msrs->counters[i].addr, val);
> |  		if (CTR_OVERFLOWED(val)) {
> |  			oprofile_add_sample(regs, i);
> | +			/*
> | +			 * We need to unmask the apic vector *before*
> | +			 * writing reset_value to msr counter
> | +			 */
> | +			apic_write(APIC_LVTPC, APIC_DM_NMI);
> |  			wrmsrl(msrs->counters[i].addr, -reset_value[i]);
> |  		}
> |  	}
> |  
> | -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
> | -	 * doesn't hurt other P6 variant */
> | -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> |  
> |  	/* We can't work out if we really handled an interrupt. We
> |  	 * might have caught a *second* counter just after overflowing
> 
> Hi Eric,
> 
> for the record
> 
> 	apic_write(APIC_LVTPC, APIC_DM_NMI);
> 
> is not just 'unmask' but also *zeroify* (not sure if I wrote this
> word right :) all fields when the origianl code was just 'unmasking'
> TPC register
> 
> 	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> 
> that is why apic_read() was in former.
> 

Well, given that APIC_LVTPC is initialized by oprofile init to value APIC_DM_NMI,
I avoid an apic_read() and just write APIC_DM_NMI again...

Presumably, apic_read(APIC_LVTPC) should return APIC_DM_NMI or APIC_DM_NMI|APIC_LVT_MASKED

Thanks


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10 16:19         ` Eric Dumazet
@ 2008-11-10 16:31           ` Cyrill Gorcunov
  0 siblings, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2008-11-10 16:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, Robert Richter, Ingo Molnar, LKML

[Eric Dumazet - Mon, Nov 10, 2008 at 05:19:20PM +0100]
...
>>
>> Hi Eric,
>>
>> for the record
>>
>> 	apic_write(APIC_LVTPC, APIC_DM_NMI);
>>
>> is not just 'unmask' but also *zeroify* (not sure if I wrote this
>> word right :) all fields when the origianl code was just 'unmasking'
>> TPC register
>>
>> 	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
>>
>> that is why apic_read() was in former.
>>
>
> Well, given that APIC_LVTPC is initialized by oprofile init to value APIC_DM_NMI,
> I avoid an apic_read() and just write APIC_DM_NMI again...
>
> Presumably, apic_read(APIC_LVTPC) should return APIC_DM_NMI or APIC_DM_NMI|APIC_LVT_MASKED
>
> Thanks
>

Yes, just grepped the sources -- it seems nobody else touching
this register indeed.
		- Cyrill -

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10 15:50         ` Eric Dumazet
@ 2008-11-10 17:46           ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2008-11-10 17:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, Robert Richter, Ingo Molnar, LKML

On Mon, Nov 10, 2008 at 04:50:00PM +0100, Eric Dumazet wrote:
> Andi Kleen a écrit 


Can you send the /proc/cpuinfo output privately please?
I can ask around here.

> >> 			oprofile_add_sample(regs, i);
> >>+			/*
> >>+			 * We need to unmask the apic vector *before*
> >>+			 * writing reset_value to msr counter
> >>+			 */
> >>+			apic_write(APIC_LVTPC, APIC_DM_NMI);
> >> 			wrmsrl(msrs->counters[i].addr, -reset_value[i]);
> >> 		}
> >> 	}
> >> 
> >>-	/* Only P6 based Pentium M need to re-unmask the apic vector but it
> >>-	 * doesn't hurt other P6 variant */
> >>-	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> >
> >Did you also test if it really needs to be inside the if () or 
> >just before the wrmsrl? 
> 
> 
> In my testings, the "if (CTR_OVERFLOWED())" condition is always true.
> 
> (I am profiling one event only)

There can be other valid (but rare) cases of NMIs. The code handles this
by not having any overflow events.

It might be that the case where your NMI gets stuck is such
a stray NMI?

That is in theory it should be safe to unmask  that vector
multiple times (the other NMIs should come from other vectors),
but local APIC is an fragile area.

> 
> I felt uncomfortable issuing the apic_write() before oprofile_add_sample()
> 
> Do you mean doing the apic_write() right at the beginning of the 
> ppro_check_ctrs()
> function ?

Better once, otherwise you'll do it multiple times worst case
(multiple counters can overflow)

-Andi

-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-10 15:49       ` Andi Kleen
  2008-11-10 15:50         ` Eric Dumazet
@ 2008-11-11  8:32         ` Eric Dumazet
  2008-11-17 17:33           ` Robert Richter
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2008-11-11  8:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Robert Richter, Ingo Molnar, LKML

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

Andi Kleen a écrit :
>>  			oprofile_add_sample(regs, i);
>> +			/*
>> +			 * We need to unmask the apic vector *before*
>> +			 * writing reset_value to msr counter
>> +			 */
>> +			apic_write(APIC_LVTPC, APIC_DM_NMI);
>>  			wrmsrl(msrs->counters[i].addr, -reset_value[i]);
>>  		}
>>  	}
>>  
>> -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
>> -	 * doesn't hurt other P6 variant */
>> -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> 
> Did you also test if it really needs to be inside the if () or 
> just before the wrmsrl? 
> 

Andi, the following patch also works well for me. The key is we must
unmask APIC before the wrmsrl. after/before the rdmsrl has no impact.

I wonder then if the final comment in ppro_check_ctrs() is still applicable.


PATCH] oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()

While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU),
I noticed that one CPU after the other could not get anymore NMI.

After a while, all cores where blocked (ie not generating events for oprofile)
I tried all major linux versions and all where affected by this freeze.

I found that we have to un-mask APIC *before* writing to MSR counter
when we get event notification, because we use APIC_LVTPC in edge triggered mode.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 arch/x86/oprofile/op_model_ppro.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

[-- Attachment #2: oprofile_msr.patch --]
[-- Type: text/plain, Size: 1009 bytes --]

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 3f1b81a..8484528 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 	u64 val;
 	int i;
 
+	/*
+	 * We need to unmask the apic vector *before* writing reset_value
+	 * to msr counter, because we use edge trigger
+	 */
+	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+
 	for (i = 0 ; i < num_counters; ++i) {
 		if (!reset_value[i])
 			continue;
@@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
 		}
 	}
 
-	/* Only P6 based Pentium M need to re-unmask the apic vector but it
-	 * doesn't hurt other P6 variant */
-	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
-
 	/* We can't work out if we really handled an interrupt. We
 	 * might have caught a *second* counter just after overflowing
 	 * the interrupt for this counter then arrives

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-11  8:32         ` Eric Dumazet
@ 2008-11-17 17:33           ` Robert Richter
  2008-11-17 18:25             ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Robert Richter @ 2008-11-17 17:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, Ingo Molnar, LKML

On 11.11.08 09:32:12, Eric Dumazet wrote:

[...]

> PATCH] oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()

Is this the patch you all prefer? If so, I would like to send it
upstream.

-Robert

>
> While using oprofile on my HP BL460c G1, (two quad core intel E5450 CPU),
> I noticed that one CPU after the other could not get anymore NMI.
>
> After a while, all cores where blocked (ie not generating events for 
> oprofile)
> I tried all major linux versions and all where affected by this freeze.
>
> I found that we have to un-mask APIC *before* writing to MSR counter
> when we get event notification, because we use APIC_LVTPC in edge triggered 
> mode.
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> arch/x86/oprofile/op_model_ppro.c |   10 ++++++----
> 1 files changed, 6 insertions(+), 4 deletions(-)

> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
> index 3f1b81a..8484528 100644
> --- a/arch/x86/oprofile/op_model_ppro.c
> +++ b/arch/x86/oprofile/op_model_ppro.c
> @@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
>  	u64 val;
>  	int i;
>  
> +	/*
> +	 * We need to unmask the apic vector *before* writing reset_value
> +	 * to msr counter, because we use edge trigger
> +	 */
> +	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> +
>  	for (i = 0 ; i < num_counters; ++i) {
>  		if (!reset_value[i])
>  			continue;
> @@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_regs * const regs,
>  		}
>  	}
>  
> -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
> -	 * doesn't hurt other P6 variant */
> -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> -
>  	/* We can't work out if we really handled an interrupt. We
>  	 * might have caught a *second* counter just after overflowing
>  	 * the interrupt for this counter then arrives


-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [git pull] OProfile fixes for v2.6.28
  2008-11-10  8:05 ` Eric Dumazet
  2008-11-10  8:43   ` Andi Kleen
@ 2008-11-17 17:57   ` Robert Richter
  1 sibling, 0 replies; 21+ messages in thread
From: Robert Richter @ 2008-11-17 17:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ingo Molnar, LKML, Andi Kleen

On 10.11.08 09:05:37, Eric Dumazet wrote:
> [PATCH] oprofile: fix an overflow in ppro code
>
> reset_value was changed from long to u64 in commit 
> b99170288421c79f0c2efa8b33e26e65f4bb7fb8
> (oprofile: Implement Intel architectural perfmon support)
>
> But dynamic allocation of this array use a wrong type (long instead of u64)
>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied to oprofile/oprofile-for-tip. Thanks Eric.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-17 17:33           ` Robert Richter
@ 2008-11-17 18:25             ` Andi Kleen
  2008-11-18  8:57               ` Robert Richter
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2008-11-17 18:25 UTC (permalink / raw)
  To: Robert Richter; +Cc: Eric Dumazet, Andi Kleen, Ingo Molnar, LKML

On Mon, Nov 17, 2008 at 06:33:59PM +0100, Robert Richter wrote:
> On 11.11.08 09:32:12, Eric Dumazet wrote:
> 
> [...]
> 
> > PATCH] oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()
> 
> Is this the patch you all prefer? If so, I would like to send it
> upstream.

It would be better to root cause the problem first.

I worked with Eric privately and tried a few things, but didn't help
so far. Most likely the workaround is also not complete before
it is fully understood. Not sure how long this will take though.

It's probably ok as a workaround for a release, but shouldn't 
be considered a final fix.

-Andi


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()
  2008-11-17 18:25             ` Andi Kleen
@ 2008-11-18  8:57               ` Robert Richter
  0 siblings, 0 replies; 21+ messages in thread
From: Robert Richter @ 2008-11-18  8:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Eric Dumazet, Ingo Molnar, LKML

On 17.11.08 19:25:27, Andi Kleen wrote:
> On Mon, Nov 17, 2008 at 06:33:59PM +0100, Robert Richter wrote:
> > Is this the patch you all prefer? If so, I would like to send it
> > upstream.
> 
> It would be better to root cause the problem first.
> 
> I worked with Eric privately and tried a few things, but didn't help
> so far. Most likely the workaround is also not complete before
> it is fully understood. Not sure how long this will take though.
> 
> It's probably ok as a workaround for a release, but shouldn't 
> be considered a final fix.

Ok, I will take no action. Let me know if there is a new solution
available or if this workaround should go in.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [git pull] OProfile fixes for v2.6.28
@ 2008-11-23 11:06 Robert Richter
  2008-11-23 11:18 ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Robert Richter @ 2008-11-23 11:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

Hi Ingo,

please pull oprofile fixes for 2.6.28 for tip regression:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git oprofile-for-tip

There is only one fix:

Eric Dumazet (1):
      oprofile: fix an overflow in ppro code

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [git pull] OProfile fixes for v2.6.28
  2008-11-23 11:06 Robert Richter
@ 2008-11-23 11:18 ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2008-11-23 11:18 UTC (permalink / raw)
  To: Robert Richter; +Cc: LKML


* Robert Richter <robert.richter@amd.com> wrote:

> Hi Ingo,
> 
> please pull oprofile fixes for 2.6.28 for tip regression:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git oprofile-for-tip
> 
> There is only one fix:
> 
> Eric Dumazet (1):
>       oprofile: fix an overflow in ppro code
> 
> Thanks,

Pulled into tip/x86/urgent, thanks Robert!

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [git pull] OProfile fixes for v2.6.28
@ 2008-12-03 17:16 Robert Richter
  2008-12-03 17:53 ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Robert Richter @ 2008-12-03 17:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

Hi Ingo,

please pull oprofile fixes for 2.6.28 for tip regression:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git oprofile-for-tip

There are these fixes:

Eric Dumazet (1):
      oprofile: fix CPU unplug panic in ppro_stop()

William Cohen (1):
      x86/oprofile: fix Intel cpu family 6 detection

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [git pull] OProfile fixes for v2.6.28
  2008-12-03 17:16 Robert Richter
@ 2008-12-03 17:53 ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2008-12-03 17:53 UTC (permalink / raw)
  To: Robert Richter; +Cc: LKML


* Robert Richter <robert.richter@amd.com> wrote:

> Hi Ingo,
> 
> please pull oprofile fixes for 2.6.28 for tip regression:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git oprofile-for-tip
> 
> There are these fixes:
> 
> Eric Dumazet (1):
>       oprofile: fix CPU unplug panic in ppro_stop()
> 
> William Cohen (1):
>       x86/oprofile: fix Intel cpu family 6 detection

pulled into tip/x86/urgent, thanks Robert!

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2008-12-03 17:54 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-07 17:13 [git pull] OProfile fixes for v2.6.28 Robert Richter
2008-11-07 18:22 ` Ingo Molnar
2008-11-10  8:05 ` Eric Dumazet
2008-11-10  8:43   ` Andi Kleen
2008-11-10  9:01     ` Eric Dumazet
2008-11-10 14:23     ` [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs() Eric Dumazet
2008-11-10 15:49       ` Andi Kleen
2008-11-10 15:50         ` Eric Dumazet
2008-11-10 17:46           ` Andi Kleen
2008-11-11  8:32         ` Eric Dumazet
2008-11-17 17:33           ` Robert Richter
2008-11-17 18:25             ` Andi Kleen
2008-11-18  8:57               ` Robert Richter
2008-11-10 16:11       ` Cyrill Gorcunov
2008-11-10 16:19         ` Eric Dumazet
2008-11-10 16:31           ` Cyrill Gorcunov
2008-11-17 17:57   ` [git pull] OProfile fixes for v2.6.28 Robert Richter
  -- strict thread matches above, loose matches on Subject: below --
2008-11-23 11:06 Robert Richter
2008-11-23 11:18 ` Ingo Molnar
2008-12-03 17:16 Robert Richter
2008-12-03 17:53 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox