[PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-01  2:56 [PATCH 0/4] nmi perf fixes Don Zickus
@ 2010-09-01  2:56 ` Don Zickus
  0 siblings, 0 replies; 15+ messages in thread
From: Don Zickus @ 2010-09-01  2:56 UTC (permalink / raw)
  To: mingo
  Cc: peterz, robert.richter, gorcunov, fweisbec, linux-kernel,
	ying.huang, ming.m.lin, yinghai, andi, Don Zickus

During testing of a patch to stop having the perf subsytem swallow nmis,
it was uncovered that Nehalem boxes were randomly getting unknown nmis
when using the perf tool.

Moving the ack'ing of the PMI closer to when we get the status allows
the hardware to properly re-set the PMU bit signaling another PMI was
triggered during the processing of the first PMI.  This allows the new
logic for dealing with the shortcomings of multiple PMIs to handle the
extra NMI by 'eat'ing it later.

Now one can wonder why are we getting a second PMI when we disable all
the PMUs in the begining of the NMI handler to prevent such a case, for
that I do not know.  But I know the fix below helps deal with this quirk.

Tested on multiple Nehalems where the problem was occuring.  With the
patch, the code now loops a second time to handle the second PMI (whereas
before it was not).

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 4539b4b..ee05c90 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -712,7 +712,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	struct perf_sample_data data;
 	struct cpu_hw_events *cpuc;
 	int bit, loops;
-	u64 ack, status;
+	u64 status;
 	int handled = 0;

 	perf_sample_data_init(&data, 0);
@@ -729,6 +729,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)

 	loops = 0;
 again:
+	intel_pmu_ack_status(status);
 	if (++loops > 100) {
 		WARN_ONCE(1, "perfevents: irq loop stuck!\n");
 		perf_event_print_debug();
@@ -737,7 +738,6 @@ again:
 	}

 	inc_irq_stat(apic_perf_irqs);
-	ack = status;

 	intel_pmu_lbr_read();

@@ -766,8 +766,6 @@ again:
 			x86_pmu_stop(event);
 	}

-	intel_pmu_ack_status(ack);
-
 	/*
 	 * Repeat if there is more work to be done:
 	 */
-- 
1.7.2.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
@ 2010-09-01 13:04 Stephane Eranian
  2010-09-01 14:57 ` Robert Richter
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Eranian @ 2010-09-01 13:04 UTC (permalink / raw)
  To: Don Zickus; +Cc: linux-kernel, mingo, Peter Zijlstra, Robert Richter

Don,

Found your patch on LKML (I am not on it).

In your changelog you said:

> During testing of a patch to stop having the perf subsytem swallow nmis,
> it was uncovered that Nehalem boxes were randomly getting unknown nmis
> when using the perf tool.
>
> Moving the ack'ing of the PMI closer to when we get the status allows
> the hardware to properly re-set the PMU bit signaling another PMI was
> triggered during the processing of the first PMI.  This allows the new
> logic for dealing with the shortcomings of multiple PMIs to handle the
> extra NMI by 'eat'ing it later.

> Now one can wonder why are we getting a second PMI when we disable all
> the PMUs in the beginning of the NMI handler to prevent such a case, for
> that I do not know.  But I know the fix below helps deal with this quirk.
>

I am assuming you're talking about back-to-back NMIs here, not nested NMIs.
I don't quite understand the scenario here. Is it the case that you handled 1
overflow, and then right as you return from the interrupt, you get a second
PMI with a ovfl_status=0 ?

What events did you measure? Which counters did you use?
Did you have HT turned on?

> Tested on multiple Nehalems where the problem was occuring.  With the
> patch, the code now loops a second time to handle the second PMI (whereas
> before it was not).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-01 13:04 [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter Stephane Eranian
@ 2010-09-01 14:57 ` Robert Richter
  2010-09-02  8:13   ` Stephane Eranian
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Richter @ 2010-09-01 14:57 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Don Zickus, linux-kernel@vger.kernel.org, mingo@elte.hu,
	Peter Zijlstra

On 01.09.10 09:04:45, Stephane Eranian wrote:
> Don,
> 
> Found your patch on LKML (I am not on it).
> 
> In your changelog you said:
> 
> > During testing of a patch to stop having the perf subsytem swallow nmis,
> > it was uncovered that Nehalem boxes were randomly getting unknown nmis
> > when using the perf tool.
> >
> > Moving the ack'ing of the PMI closer to when we get the status allows
> > the hardware to properly re-set the PMU bit signaling another PMI was
> > triggered during the processing of the first PMI.  This allows the new
> > logic for dealing with the shortcomings of multiple PMIs to handle the
> > extra NMI by 'eat'ing it later.
> 
> > Now one can wonder why are we getting a second PMI when we disable all
> > the PMUs in the beginning of the NMI handler to prevent such a case, for
> > that I do not know.  But I know the fix below helps deal with this quirk.
> >
> 
> I am assuming you're talking about back-to-back NMIs here, not nested NMIs.
> I don't quite understand the scenario here. Is it the case that you handled 1
> overflow, and then right as you return from the interrupt, you get a second
> PMI with a ovfl_status=0 ?
> 
> What events did you measure? Which counters did you use?
> Did you have HT turned on?

It is related to this thread:

 http://lkml.org/lkml/2010/8/25/124

Not acking the status immediately triggered an nmi, but the status was
0. Acking after reading and before processing the counters results in
a non-zero status and thus, no empty nmi.

-Robert

> 
> > Tested on multiple Nehalems where the problem was occuring.  With the
> > patch, the code now loops a second time to handle the second PMI (whereas
> > before it was not).
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-01 14:57 ` Robert Richter
@ 2010-09-02  8:13   ` Stephane Eranian
  2010-09-02 13:11     ` Robert Richter
  2010-09-02 14:19     ` Don Zickus
  0 siblings, 2 replies; 15+ messages in thread
From: Stephane Eranian @ 2010-09-02  8:13 UTC (permalink / raw)
  To: Robert Richter
  Cc: Don Zickus, linux-kernel@vger.kernel.org, mingo@elte.hu,
	Peter Zijlstra

Robert,

Do you have the test program you used to test this?
I believe the NHM hack does not solve the problem, it
just makes it harder to appear.

I suspect the real issue is that the GLOBAL_STATUS
bitmask cannot be trusted. I'd like to verify this.

Has the problem appear only on Nehalem or also on
Westmere?


On Wed, Sep 1, 2010 at 4:57 PM, Robert Richter <robert.richter@amd.com> wrote:
> On 01.09.10 09:04:45, Stephane Eranian wrote:
>> Don,
>>
>> Found your patch on LKML (I am not on it).
>>
>> In your changelog you said:
>>
>> > During testing of a patch to stop having the perf subsytem swallow nmis,
>> > it was uncovered that Nehalem boxes were randomly getting unknown nmis
>> > when using the perf tool.
>> >
>> > Moving the ack'ing of the PMI closer to when we get the status allows
>> > the hardware to properly re-set the PMU bit signaling another PMI was
>> > triggered during the processing of the first PMI.  This allows the new
>> > logic for dealing with the shortcomings of multiple PMIs to handle the
>> > extra NMI by 'eat'ing it later.
>>
>> > Now one can wonder why are we getting a second PMI when we disable all
>> > the PMUs in the beginning of the NMI handler to prevent such a case, for
>> > that I do not know.  But I know the fix below helps deal with this quirk.
>> >
>>
>> I am assuming you're talking about back-to-back NMIs here, not nested NMIs.
>> I don't quite understand the scenario here. Is it the case that you handled 1
>> overflow, and then right as you return from the interrupt, you get a second
>> PMI with a ovfl_status=0 ?
>>
>> What events did you measure? Which counters did you use?
>> Did you have HT turned on?
>
> It is related to this thread:
>
>  http://lkml.org/lkml/2010/8/25/124
>
> Not acking the status immediately triggered an nmi, but the status was
> 0. Acking after reading and before processing the counters results in
> a non-zero status and thus, no empty nmi.
>
> -Robert
>
>>
>> > Tested on multiple Nehalems where the problem was occuring.  With the
>> > patch, the code now loops a second time to handle the second PMI (whereas
>> > before it was not).
>>
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-02  8:13   ` Stephane Eranian
@ 2010-09-02 13:11     ` Robert Richter
  2010-09-02 14:19     ` Don Zickus
  1 sibling, 0 replies; 15+ messages in thread
From: Robert Richter @ 2010-09-02 13:11 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Don Zickus, linux-kernel@vger.kernel.org, mingo@elte.hu,
	Peter Zijlstra

On 02.09.10 04:13:19, Stephane Eranian wrote:
> Robert,
> 
> Do you have the test program you used to test this?
> I believe the NHM hack does not solve the problem, it
> just makes it harder to appear.

For testing back-to-back nmis I have used:

 perf record -e cycles -e instructions -e cache-references
       -e cache-misses -e branch-misses -a -- sleep 10

with load on all cpus. But I couldn't reproduce this particular
problem as I do not have such a system available. I think it might
trigger also with only one counter running. What the observed from the
status bits, only one counter was involved.

> 
> I suspect the real issue is that the GLOBAL_STATUS
> bitmask cannot be trusted. I'd like to verify this.

So yes, it looks like it is a cpu bug with a race then clearing the
status. I didn't check the errata list, maybe it is already known.

> 
> Has the problem appear only on Nehalem or also on
> Westmere?

I don't know.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-02  8:13   ` Stephane Eranian
  2010-09-02 13:11     ` Robert Richter
@ 2010-09-02 14:19     ` Don Zickus
  2010-09-02 14:39       ` Stephane Eranian
  1 sibling, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-09-02 14:19 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Robert Richter, linux-kernel@vger.kernel.org, mingo@elte.hu,
	Peter Zijlstra

On Thu, Sep 02, 2010 at 10:13:19AM +0200, Stephane Eranian wrote:
> Robert,
> 
> Do you have the test program you used to test this?
> I believe the NHM hack does not solve the problem, it
> just makes it harder to appear.

Could be.

> 
> I suspect the real issue is that the GLOBAL_STATUS
> bitmask cannot be trusted. I'd like to verify this.
> 
> Has the problem appear only on Nehalem or also on
> Westmere?

I was able to duplicate on

Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz

with just running 'perf top' for about 60 seconds.

You would need the first three patches to expose the problem.

Reading the code, it seemed like the perf counters should be disabled and
this patch should be unecessary, but after playing around with the code
for a few hours, I came up with this patch to trap the issue.

I read through the cpu errata and could not find anything related but I
might have missed something.

I am willing to help test if you have a more targeted patch.

Cheers,
Don

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-02 14:19     ` Don Zickus
@ 2010-09-02 14:39       ` Stephane Eranian
  2010-09-02 15:47         ` Don Zickus
  2010-09-03  8:33         ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Stephane Eranian @ 2010-09-02 14:39 UTC (permalink / raw)
  To: Don Zickus
  Cc: Robert Richter, linux-kernel@vger.kernel.org, mingo@elte.hu,
	Peter Zijlstra

Don,

On Thu, Sep 2, 2010 at 4:19 PM, Don Zickus <dzickus@redhat.com> wrote:
> On Thu, Sep 02, 2010 at 10:13:19AM +0200, Stephane Eranian wrote:
>> Robert,
>>
>> Do you have the test program you used to test this?
>> I believe the NHM hack does not solve the problem, it
>> just makes it harder to appear.
>
> Could be.
>
>>
>> I suspect the real issue is that the GLOBAL_STATUS
>> bitmask cannot be trusted. I'd like to verify this.
>>
>> Has the problem appear only on Nehalem or also on
>> Westmere?
>
> I was able to duplicate on
>
> Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
> Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz
>
I managed to reproduce on core i7 860 (without patch4).
Looking at the code again, I am dubious you ever execute
the retry goto. If the PMU is disabled and you've just
cleared the OVF_STAT, then I don't see where the new
overflows would come from. But that's a separate problem.

One thing I did is to compare status obtained via OVFL_STATUS
with one that I build manually by inspecting each individual
counter. The two returned bitmasks should always be identical
(with PEBS disabled).  When I got the spurious NMI, it did not
trip my status validation. So the OVFL_STATUS is valid.

I found something else that looked fishy. I am experimenting
with it. I will report back.


> with just running 'perf top' for about 60 seconds.
>
> You would need the first three patches to expose the problem.
>
> Reading the code, it seemed like the perf counters should be disabled and
> this patch should be unecessary, but after playing around with the code
> for a few hours, I came up with this patch to trap the issue.
>
> I read through the cpu errata and could not find anything related but I
> might have missed something.
>
> I am willing to help test if you have a more targeted patch.
>
Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-02 14:39       ` Stephane Eranian
@ 2010-09-02 15:47         ` Don Zickus
  2010-09-02 16:18           ` Stephane Eranian
  2010-09-03  8:33         ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-09-02 15:47 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Robert Richter, linux-kernel@vger.kernel.org, mingo@elte.hu,
	Peter Zijlstra

On Thu, Sep 02, 2010 at 04:39:02PM +0200, Stephane Eranian wrote:
> Don,
> 
> On Thu, Sep 2, 2010 at 4:19 PM, Don Zickus <dzickus@redhat.com> wrote:
> > On Thu, Sep 02, 2010 at 10:13:19AM +0200, Stephane Eranian wrote:
> >> Robert,
> >>
> >> Do you have the test program you used to test this?
> >> I believe the NHM hack does not solve the problem, it
> >> just makes it harder to appear.
> >
> > Could be.
> >
> >>
> >> I suspect the real issue is that the GLOBAL_STATUS
> >> bitmask cannot be trusted. I'd like to verify this.
> >>
> >> Has the problem appear only on Nehalem or also on
> >> Westmere?
> >
> > I was able to duplicate on
> >
> > Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
> > Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz
> >
> I managed to reproduce on core i7 860 (without patch4).
> Looking at the code again, I am dubious you ever execute
> the retry goto. If the PMU is disabled and you've just
> cleared the OVF_STAT, then I don't see where the new
> overflows would come from. But that's a separate problem.

I agree with you, but a printk before the goto proved otherwise! :-)

And a printk of the status bit that triggered the goto happened to be the
same one that we initially cleared.  Like I said when I initially posted
the patch, I am not sure why it works but it does do something to stem the
NMI.

There is probably a deeper problem here, I was just trying to get the
external/unknown nmis working again.

Cheers,
Don

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-02 15:47         ` Don Zickus
@ 2010-09-02 16:18           ` Stephane Eranian
  0 siblings, 0 replies; 15+ messages in thread
From: Stephane Eranian @ 2010-09-02 16:18 UTC (permalink / raw)
  To: Don Zickus
  Cc: Robert Richter, linux-kernel@vger.kernel.org, mingo@elte.hu,
	Peter Zijlstra

On Thu, Sep 2, 2010 at 5:47 PM, Don Zickus <dzickus@redhat.com> wrote:
> On Thu, Sep 02, 2010 at 04:39:02PM +0200, Stephane Eranian wrote:
>> Don,
>>
>> On Thu, Sep 2, 2010 at 4:19 PM, Don Zickus <dzickus@redhat.com> wrote:
>> > On Thu, Sep 02, 2010 at 10:13:19AM +0200, Stephane Eranian wrote:
>> >> Robert,
>> >>
>> >> Do you have the test program you used to test this?
>> >> I believe the NHM hack does not solve the problem, it
>> >> just makes it harder to appear.
>> >
>> > Could be.
>> >
>> >>
>> >> I suspect the real issue is that the GLOBAL_STATUS
>> >> bitmask cannot be trusted. I'd like to verify this.
>> >>
>> >> Has the problem appear only on Nehalem or also on
>> >> Westmere?
>> >
>> > I was able to duplicate on
>> >
>> > Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
>> > Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz
>> >
>> I managed to reproduce on core i7 860 (without patch4).
>> Looking at the code again, I am dubious you ever execute
>> the retry goto. If the PMU is disabled and you've just
>> cleared the OVF_STAT, then I don't see where the new
>> overflows would come from. But that's a separate problem.
>
> I agree with you, but a printk before the goto proved otherwise! :-)
>

Well, I was wrong about my status validation. I had my test wrong.
I am seeing discrepancies between OVF_STATUS (status)
and manual inspection of each counter (status2) compared to
active_mask (act):

[  489.338564] CPU2 status=0x6 status2=0x4 act=0x7
[  501.484235] CPU2 status=0x6 status2=0x4 act=0x7
[  541.686414] CPU2 status=0x6 status2=0x4 act=0x7
[  561.819575] CPU2 status=0x6 status2=0x4 act=0x7

Note that I have the NMI watchdog on, and running perf record
with 2 events only.

OVF_STATUS reports more than what has actually occurred.
I think that is fine because in the loop there is a check for against
active_mask. Nevertheless, I think there is a problem right there.

I am trying to narrow down when it happens. It seems to occur
only when I start a new perf record session. And possibly when
perf runs on a CPU that is not monitored:

taskset -c 4   perf record -e cycles -e instructions -F 2000 -a  -C 2
-o /dev/null -- sleep 600

> And a printk of the status bit that triggered the goto happened to be the
> same one that we initially cleared.  Like I said when I initially posted
> the patch, I am not sure why it works but it does do something to stem the
> NMI.

Yes, I suspect the clearing does not work well. Let's try to do it
multiple times
in a loop.

>
> There is probably a deeper problem here, I was just trying to get the
> external/unknown nmis working again.
>
I suspect that if we were not to use OVF_STATUS, then we would not need
the back-to-back nmi logic. NMi cannot be nested. So just looking at
the counters
should be reliable (excl. PEBS for now).


> Cheers,
> Don
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-02 14:39       ` Stephane Eranian
  2010-09-02 15:47         ` Don Zickus
@ 2010-09-03  8:33         ` Peter Zijlstra
  2010-09-03 11:02           ` Stephane Eranian
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-09-03  8:33 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Don Zickus, Robert Richter, linux-kernel@vger.kernel.org,
	mingo@elte.hu

On Thu, 2010-09-02 at 16:39 +0200, Stephane Eranian wrote:
> I managed to reproduce on core i7 860 (without patch4).
> Looking at the code again, I am dubious you ever execute
> the retry goto. If the PMU is disabled and you've just
> cleared the OVF_STAT, then I don't see where the new
> overflows would come from. But that's a separate problem.
> 
> One thing I did is to compare status obtained via OVFL_STATUS
> with one that I build manually by inspecting each individual
> counter. The two returned bitmasks should always be identical
> (with PEBS disabled).  When I got the spurious NMI, it did not
> trip my status validation. So the OVFL_STATUS is valid.
> 
> I found something else that looked fishy. I am experimenting
> with it. I will report back. 

One thing we still need to do is on init detect if the BIOS is using one
of the PMCs and simply disable all of perf and print a nice big message
to the user to request a new BIOS from their vendor.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-03  8:33         ` Peter Zijlstra
@ 2010-09-03 11:02           ` Stephane Eranian
  2010-09-03 11:11             ` Peter Zijlstra
  2010-09-03 14:03             ` Don Zickus
  0 siblings, 2 replies; 15+ messages in thread
From: Stephane Eranian @ 2010-09-03 11:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Don Zickus, Robert Richter, linux-kernel@vger.kernel.org,
	mingo@elte.hu

On Fri, Sep 3, 2010 at 10:33 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-09-02 at 16:39 +0200, Stephane Eranian wrote:
>> I managed to reproduce on core i7 860 (without patch4).
>> Looking at the code again, I am dubious you ever execute
>> the retry goto. If the PMU is disabled and you've just
>> cleared the OVF_STAT, then I don't see where the new
>> overflows would come from. But that's a separate problem.
>>
>> One thing I did is to compare status obtained via OVFL_STATUS
>> with one that I build manually by inspecting each individual
>> counter. The two returned bitmasks should always be identical
>> (with PEBS disabled).  When I got the spurious NMI, it did not
>> trip my status validation. So the OVFL_STATUS is valid.
>>
>> I found something else that looked fishy. I am experimenting
>> with it. I will report back.
>
> One thing we still need to do is on init detect if the BIOS is using one
> of the PMCs and simply disable all of perf and print a nice big message
> to the user to request a new BIOS from their vendor.
>
Given then way perf_events operate, that is your only choice at this point.

But I am sure neither my system nor yours is subject to this particular issue
yet there is some unexplained errors with OVF_STATUS.

Here is an example of what I gathered on a Westmere:

This is coming into the interrupt handler:
- status   = overflow status coming from GLOBAL_OVF_STATUS
- status2 = inspection of the counters
- act = cpuc->active_mask[0]

In case both status don't match, I dump the state of the active events
incl. the counter values(val).

[  822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
[  822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
[  822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
[  822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9

Here only counter2 has overflowed, yet the handler will also process counter1
which is wrong.

The other thing I noticed is that in intel_pmu_disable_event(), the event
stopped sometimes has overflowed. Looks like OVF_STAUS is stale.
Maybe OVF_STATUS is not cleared properly somewhere, possibly when
an event gets disabled.

I have a busy system, with the NMI watchdog running (0x13003c) where I do:

perf record -a -C 1 -e cycles:k -ecycles:u -F 10 -- sleep 10

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-03 11:02           ` Stephane Eranian
@ 2010-09-03 11:11             ` Peter Zijlstra
  2010-09-03 11:52               ` Stephane Eranian
  2010-09-03 14:03             ` Don Zickus
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-09-03 11:11 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Don Zickus, Robert Richter, linux-kernel@vger.kernel.org,
	mingo@elte.hu

On Fri, 2010-09-03 at 13:02 +0200, Stephane Eranian wrote:
> 
> > One thing we still need to do is on init detect if the BIOS is using one
> > of the PMCs and simply disable all of perf and print a nice big message
> > to the user to request a new BIOS from their vendor.
> >
> Given then way perf_events operate, that is your only choice at this point.

Well, it wouldn't be too hard to cure that, but the BIOS should simply
keep its grubby paws of the PMU -- I'm really not interested in
co-operating on that point.

> But I am sure neither my system nor yours is subject to this particular issue

Sure, worth checking though, not sure Don did on his machine.

> yet there is some unexplained errors with OVF_STATUS.

Right.

> Here is an example of what I gathered on a Westmere:
> 
> This is coming into the interrupt handler:
> - status   = overflow status coming from GLOBAL_OVF_STATUS
> - status2 = inspection of the counters
> - act = cpuc->active_mask[0]
> 
> In case both status don't match, I dump the state of the active events
> incl. the counter values(val).
> 
> [  822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
> [  822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
> [  822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
> [  822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9
> 
> Here only counter2 has overflowed, yet the handler will also process counter1
> which is wrong.

Right, we could easily revert to scanning all counters like we do for
all other interrupt handlers.

> The other thing I noticed is that in intel_pmu_disable_event(), the event
> stopped sometimes has overflowed. Looks like OVF_STAUS is stale.
> Maybe OVF_STATUS is not cleared properly somewhere, possibly when
> an event gets disabled.

Right, the code pretty much assumes that if it overflows a PMI will be
generated. So you're saying a pending PMI might get canceled when we
clear the EN bit? Most icky.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-03 11:11             ` Peter Zijlstra
@ 2010-09-03 11:52               ` Stephane Eranian
  0 siblings, 0 replies; 15+ messages in thread
From: Stephane Eranian @ 2010-09-03 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Don Zickus, Robert Richter, linux-kernel@vger.kernel.org,
	mingo@elte.hu

On Fri, Sep 3, 2010 at 1:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2010-09-03 at 13:02 +0200, Stephane Eranian wrote:
>>
>> > One thing we still need to do is on init detect if the BIOS is using one
>> > of the PMCs and simply disable all of perf and print a nice big message
>> > to the user to request a new BIOS from their vendor.
>> >
>> Given then way perf_events operate, that is your only choice at this point.
>
> Well, it wouldn't be too hard to cure that, but the BIOS should simply
> keep its grubby paws of the PMU -- I'm really not interested in
> co-operating on that point.
>
>> But I am sure neither my system nor yours is subject to this particular issue
>
> Sure, worth checking though, not sure Don did on his machine.
>
>> yet there is some unexplained errors with OVF_STATUS.
>
> Right.
>
>> Here is an example of what I gathered on a Westmere:
>>
>> This is coming into the interrupt handler:
>> - status   = overflow status coming from GLOBAL_OVF_STATUS
>> - status2 = inspection of the counters
>> - act = cpuc->active_mask[0]
>>
>> In case both status don't match, I dump the state of the active events
>> incl. the counter values(val).
>>
>> [  822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
>> [  822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
>> [  822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
>> [  822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9
>>
>> Here only counter2 has overflowed, yet the handler will also process counter1
>> which is wrong.
>
> Right, we could easily revert to scanning all counters like we do for
> all other interrupt handlers.
>
Well, that's the question! Looks like this may be more reliable, yet more
costly. And also you'd have to deal with PEBS separately, though using
OVF_STATUS for that may be sufficient.

>> The other thing I noticed is that in intel_pmu_disable_event(), the event
>> stopped sometimes has overflowed. Looks like OVF_STAUS is stale.
>> Maybe OVF_STATUS is not cleared properly somewhere, possibly when
>> an event gets disabled.
>
> Right, the code pretty much assumes that if it overflows a PMI will be
> generated. So you're saying a pending PMI might get canceled when we
> clear the EN bit? Most icky.
>
No, I don't think that cancels it. But that may be a reason why there are
back-to-back NMIs, with nothing to process sometimes (event not in the
active_mask anymore).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-03 11:02           ` Stephane Eranian
  2010-09-03 11:11             ` Peter Zijlstra
@ 2010-09-03 14:03             ` Don Zickus
  2010-09-03 14:28               ` Stephane Eranian
  1 sibling, 1 reply; 15+ messages in thread
From: Don Zickus @ 2010-09-03 14:03 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Robert Richter, linux-kernel@vger.kernel.org,
	mingo@elte.hu

On Fri, Sep 03, 2010 at 01:02:49PM +0200, Stephane Eranian wrote:
> Here is an example of what I gathered on a Westmere:
> 
> This is coming into the interrupt handler:
> - status   = overflow status coming from GLOBAL_OVF_STATUS
> - status2 = inspection of the counters
> - act = cpuc->active_mask[0]
> 
> In case both status don't match, I dump the state of the active events
> incl. the counter values(val).
> 
> [  822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
> [  822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
> [  822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
> [  822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9
> 
> Here only counter2 has overflowed, yet the handler will also process counter1
> which is wrong.

Hmm, the test case I used was 'perf top'.  This was only using perf
counter0.  So I am not sure, at least in my case, it was a stale event.
Though I don't think I remember checking the status immediately after
acking it just to verify the ack worked.

I'll poke some more on my machine today.

Cheers,
Don

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter
  2010-09-03 14:03             ` Don Zickus
@ 2010-09-03 14:28               ` Stephane Eranian
  0 siblings, 0 replies; 15+ messages in thread
From: Stephane Eranian @ 2010-09-03 14:28 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, Robert Richter, linux-kernel@vger.kernel.org,
	mingo@elte.hu

On Fri, Sep 3, 2010 at 4:03 PM, Don Zickus <dzickus@redhat.com> wrote:
> On Fri, Sep 03, 2010 at 01:02:49PM +0200, Stephane Eranian wrote:
>> Here is an example of what I gathered on a Westmere:
>>
>> This is coming into the interrupt handler:
>> - status   = overflow status coming from GLOBAL_OVF_STATUS
>> - status2 = inspection of the counters
>> - act = cpuc->active_mask[0]
>>
>> In case both status don't match, I dump the state of the active events
>> incl. the counter values(val).
>>
>> [  822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
>> [  822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
>> [  822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
>> [  822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9
>>
>> Here only counter2 has overflowed, yet the handler will also process counter1
>> which is wrong.
>
> Hmm, the test case I used was 'perf top'.  This was only using perf
> counter0.  So I am not sure, at least in my case, it was a stale event.
> Though I don't think I remember checking the status immediately after
> acking it just to verify the ack worked.

I am not acking it right away, this is still without patch 4/4. I am trying to
understand how we can get to this situation.

>
> I'll poke some more on my machine today.
>
> Cheers,
> Don
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-09-03 14:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-01 13:04 [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter Stephane Eranian
2010-09-01 14:57 ` Robert Richter
2010-09-02  8:13   ` Stephane Eranian
2010-09-02 13:11     ` Robert Richter
2010-09-02 14:19     ` Don Zickus
2010-09-02 14:39       ` Stephane Eranian
2010-09-02 15:47         ` Don Zickus
2010-09-02 16:18           ` Stephane Eranian
2010-09-03  8:33         ` Peter Zijlstra
2010-09-03 11:02           ` Stephane Eranian
2010-09-03 11:11             ` Peter Zijlstra
2010-09-03 11:52               ` Stephane Eranian
2010-09-03 14:03             ` Don Zickus
2010-09-03 14:28               ` Stephane Eranian
  -- strict thread matches above, loose matches on Subject: below --
2010-09-01  2:56 [PATCH 0/4] nmi perf fixes Don Zickus
2010-09-01  2:56 ` [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter Don Zickus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox