public inbox for linux-kernel@vger.kernel.org
* [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups
@ 2014-01-29 19:37 Don Zickus
  2014-01-29 20:06 ` Cyrill Gorcunov
  2014-02-10 13:29 ` [tip:perf/core] perf/x86/p4: Fix counter " tip-bot for Don Zickus
  0 siblings, 2 replies; 6+ messages in thread
From: Don Zickus @ 2014-01-29 19:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Don Zickus, Cyrill Gorcunov, Peter Zijlstra

On a P4 box stressing perf with

./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform led me to notice cross-CPU counter
corruption.

The P4 machine is special in that it has 18 counters, half are used for cpu0
and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0-specific counters was being used by
cpu1 and caused all sorts of random unknown NMIs.

I am not entirely sure on the corruption path, but what happens is:

o perf schedules a group with p4_pmu_schedule_events()
o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
  but for a different cpu, so it 'swaps' the config bits and returns the
  updated 'assign' array with a _new_ index.
o perf schedules another group with p4_pmu_schedule_events()
o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
  (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
  'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
  an earlier part of the 'assign' array (and hasn't been committed yet).
o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf-like breakage, by assigning hwc->idx to -1
on swap to tell follow-up group scheduling to find a new index.

Of course, if the transaction fails, rolling this back will be difficult, but
that is no different from how the current code works. :-)  And I wasn't sure
how much effort I should put into cleaning up the code for a platform that is
almost 10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/kernel/cpu/perf_event_p4.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 075f18c..dda02a1 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1257,7 +1257,24 @@ again:
 			pass++;
 			goto again;
 		}
-
+		/*
+		 * Perf does test runs to see if a whole group can be assigned
+		 * together successfully.  There can be multiple rounds of this.
+		 * Unfortunately, p4_pmu_swap_config_ts touches the hwc->config
+		 * bits, such that the next round of group assignments will
+		 * cause the above p4_should_swap_ts to pass instead of fail.
+		 * This leads to counters exclusive to thread0 being used by
+		 * thread1.
+		 *
+		 * Solve this with a cheap hack, reset the idx back to -1 to
+		 * force a new lookup (p4_next_cntr) to get the right counter
+		 * for the right thread.
+		 *
+		 * This probably doesn't comply with the general spirit of how
+		 * perf wants to work, but P4 is special. :-(
+		 */
+		if (p4_should_swap_ts(hwc->config, cpu))
+			hwc->idx = -1;
 		p4_pmu_swap_config_ts(hwc, cpu);
 		if (assign)
 			assign[i] = cntr_idx;
-- 
1.7.1



* Re: [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups
  2014-01-29 19:37 [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups Don Zickus
@ 2014-01-29 20:06 ` Cyrill Gorcunov
  2014-01-29 20:17   ` Don Zickus
  2014-02-10 13:29 ` [tip:perf/core] perf/x86/p4: Fix counter " tip-bot for Don Zickus
  1 sibling, 1 reply; 6+ messages in thread
From: Cyrill Gorcunov @ 2014-01-29 20:06 UTC (permalink / raw)
  To: Don Zickus; +Cc: Peter Zijlstra, LKML

On Wed, Jan 29, 2014 at 02:37:50PM -0500, Don Zickus wrote:
> On a P4 box stressing perf with
> 
> ./perf record -o perf.data ./perf stat -v ./perf bench all
> 
> it was noticed that a slew of unknown NMIs would pop out rather quickly.
> 
> Painfully debugging this ancient platform led me to notice cross-CPU counter
> corruption.
> 
> The P4 machine is special in that it has 18 counters, half are used for cpu0
> and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
> the splitting of the counters has to be actively managed by the software.
> 
> In this particular bug, one of the cpu0-specific counters was being used by
> cpu1 and caused all sorts of random unknown NMIs.
> 
> I am not entirely sure on the corruption path, but what happens is:
> 
> o perf schedules a group with p4_pmu_schedule_events()
> o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
>   but for a different cpu, so it 'swaps' the config bits and returns the
>   updated 'assign' array with a _new_ index.
> o perf schedules another group with p4_pmu_schedule_events()
> o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
>   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
>   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
>   an earlier part of the 'assign' array (and hasn't been committed yet).
> o perf commits the transaction using the wrong index and corrupts the other cpu

Thanks for the fix, Don! I fear I won't be able to look at it closely
tonight, so could it wait until tomorrow? (If it's critical, sure, such
a fix should do the trick.)


* Re: [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups
  2014-01-29 20:06 ` Cyrill Gorcunov
@ 2014-01-29 20:17   ` Don Zickus
  2014-02-03  6:19     ` Cyrill Gorcunov
  0 siblings, 1 reply; 6+ messages in thread
From: Don Zickus @ 2014-01-29 20:17 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Peter Zijlstra, LKML

On Thu, Jan 30, 2014 at 12:06:57AM +0400, Cyrill Gorcunov wrote:
> On Wed, Jan 29, 2014 at 02:37:50PM -0500, Don Zickus wrote:
> > On a P4 box stressing perf with
> > 
> > ./perf record -o perf.data ./perf stat -v ./perf bench all
> > 
> > it was noticed that a slew of unknown NMIs would pop out rather quickly.
> > 
> > Painfully debugging this ancient platform led me to notice cross-CPU counter
> > corruption.
> > 
> > The P4 machine is special in that it has 18 counters, half are used for cpu0
> > and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
> > the splitting of the counters has to be actively managed by the software.
> > 
> > In this particular bug, one of the cpu0-specific counters was being used by
> > cpu1 and caused all sorts of random unknown NMIs.
> > 
> > I am not entirely sure on the corruption path, but what happens is:
> > 
> > o perf schedules a group with p4_pmu_schedule_events()
> > o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> >   but for a different cpu, so it 'swaps' the config bits and returns the
> >   updated 'assign' array with a _new_ index.
> > o perf schedules another group with p4_pmu_schedule_events()
> > o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> >   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
> >   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
> >   an earlier part of the 'assign' array (and hasn't been committed yet).
> > o perf commits the transaction using the wrong index and corrupts the other cpu
> 
> Thanks for the fix, Don! I fear I won't be able to look at it closely
> tonight, so could it wait until tomorrow? (If it's critical, sure, such
> a fix should do the trick.)

There is no rush.  Early next week is fine too. :-)

Cheers,
Don



* Re: [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups
  2014-01-29 20:17   ` Don Zickus
@ 2014-02-03  6:19     ` Cyrill Gorcunov
  2014-02-03 16:35       ` Don Zickus
  0 siblings, 1 reply; 6+ messages in thread
From: Cyrill Gorcunov @ 2014-02-03  6:19 UTC (permalink / raw)
  To: Don Zickus; +Cc: Peter Zijlstra, LKML

On Wed, Jan 29, 2014 at 03:17:17PM -0500, Don Zickus wrote:
> > > I am not entirely sure on the corruption path, but what happens is:
> > > 
> > > o perf schedules a group with p4_pmu_schedule_events()
> > > o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> > >   but for a different cpu, so it 'swaps' the config bits and returns the
> > >   updated 'assign' array with a _new_ index.
> > > o perf schedules another group with p4_pmu_schedule_events()
> > > o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> > >   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
> > >   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
> > >   an earlier part of the 'assign' array (and hasn't been committed yet).
> > > o perf commits the transaction using the wrong index and corrupts the other cpu
> > 
> > Thanks for the fix, Don! I fear I won't be able to look at it closely
> > tonight, so could it wait until tomorrow? (If it's critical, sure, such
> > a fix should do the trick.)
> 
> There is no rush.  Early next week is fine too. :-)

Hi Don, sorry for the delay. I thought maybe extending match_prev_assignment()
would be better (i.e. to figure out whether the previous event can run without
reprogramming the counter), but that only makes the code harder (and, what is
worse, having no physical access to a P4 machine leaves no chance to test
changes). So eventually I think your patch does the same thing I had in mind,
but in a different way. Thus

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>

thanks a lot!


* Re: [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups
  2014-02-03  6:19     ` Cyrill Gorcunov
@ 2014-02-03 16:35       ` Don Zickus
  0 siblings, 0 replies; 6+ messages in thread
From: Don Zickus @ 2014-02-03 16:35 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Peter Zijlstra, LKML

On Mon, Feb 03, 2014 at 10:19:29AM +0400, Cyrill Gorcunov wrote:
> On Wed, Jan 29, 2014 at 03:17:17PM -0500, Don Zickus wrote:
> > > > I am not entirely sure on the corruption path, but what happens is:
> > > > 
> > > > o perf schedules a group with p4_pmu_schedule_events()
> > > > o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> > > >   but for a different cpu, so it 'swaps' the config bits and returns the
> > > >   updated 'assign' array with a _new_ index.
> > > > o perf schedules another group with p4_pmu_schedule_events()
> > > > o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> > > >   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
> > > >   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
> > > >   an earlier part of the 'assign' array (and hasn't been committed yet).
> > > > o perf commits the transaction using the wrong index and corrupts the other cpu
> > > 
> > > Thanks for the fix, Don! I fear I won't be able to look at it closely
> > > tonight, so could it wait until tomorrow? (If it's critical, sure, such
> > > a fix should do the trick.)
> > 
> > There is no rush.  Early next week is fine too. :-)
> 
> Hi Don, sorry for the delay. I thought maybe extending match_prev_assignment()
> would be better (i.e. to figure out whether the previous event can run without
> reprogramming the counter), but that only makes the code harder (and, what is
> worse, having no physical access to a P4 machine leaves no chance to test
> changes). So eventually I think your patch does the same thing I had in mind,
> but in a different way. Thus
> 
> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
> 
> thanks a lot!

thanks!

Cheers,
Don



* [tip:perf/core] perf/x86/p4: Fix counter corruption when using lots of perf groups
  2014-01-29 19:37 [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups Don Zickus
  2014-01-29 20:06 ` Cyrill Gorcunov
@ 2014-02-10 13:29 ` tip-bot for Don Zickus
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Don Zickus @ 2014-02-10 13:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, gorcunov, peterz, tglx, dzickus

Commit-ID:  13beacee817d27a40ffc6f065ea0042685611dd5
Gitweb:     http://git.kernel.org/tip/13beacee817d27a40ffc6f065ea0042685611dd5
Author:     Don Zickus <dzickus@redhat.com>
AuthorDate: Wed, 29 Jan 2014 14:37:50 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 9 Feb 2014 13:17:23 +0100

perf/x86/p4: Fix counter corruption when using lots of perf groups

On a P4 box stressing perf with:

   ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform led me to notice cross-CPU counter
corruption.

The P4 machine is special in that it has 18 counters, half are used for cpu0
and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0-specific counters was being used by
cpu1 and caused all sorts of random unknown NMIs.

I am not entirely sure on the corruption path, but what happens is:

 o perf schedules a group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   but for a different cpu, so it 'swaps' the config bits and returns the
   updated 'assign' array with a _new_ index.
 o perf schedules another group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
   an earlier part of the 'assign' array (and hasn't been committed yet).
 o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf-like breakage, by assigning hwc->idx to -1
on swap to tell follow-up group scheduling to find a new index.

Of course, if the transaction fails, rolling this back will be difficult, but
that is no different from how the current code works. :-)  And I wasn't sure
how much effort I should put into cleaning up the code for a platform that is
almost 10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_p4.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 3486e66..f44c34d 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1257,7 +1257,24 @@ again:
 			pass++;
 			goto again;
 		}
-
+		/*
+		 * Perf does test runs to see if a whole group can be assigned
+		 * together successfully.  There can be multiple rounds of this.
+		 * Unfortunately, p4_pmu_swap_config_ts touches the hwc->config
+		 * bits, such that the next round of group assignments will
+		 * cause the above p4_should_swap_ts to pass instead of fail.
+		 * This leads to counters exclusive to thread0 being used by
+		 * thread1.
+		 *
+		 * Solve this with a cheap hack, reset the idx back to -1 to
+		 * force a new lookup (p4_next_cntr) to get the right counter
+		 * for the right thread.
+		 *
+		 * This probably doesn't comply with the general spirit of how
+		 * perf wants to work, but P4 is special. :-(
+		 */
+		if (p4_should_swap_ts(hwc->config, cpu))
+			hwc->idx = -1;
 		p4_pmu_swap_config_ts(hwc, cpu);
 		if (assign)
 			assign[i] = cntr_idx;

