* [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Cyrill Gorcunov @ 2011-02-04 12:17 UTC
To: Ingo Molnar, Don Zickus
Cc: George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml

[-- Attachment #1: Type: text/plain, Size: 215 bytes --]

Please apply it, sorry for non-inlined patch (have a web access only at moment).

Note that I've tested the patch on non-HT machine so if someone have HT'ed one
-- it would be great to test the patch there.

Cyrill

[-- Attachment #2: x86-perf-unflagged-nmi --]
[-- Type: application/octet-stream, Size: 2023 bytes --]

From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

A couple of people have reported an unknown NMI issue on p4 pmu.
This patch should fix it.

Reported-by: George Spelvin <linux@horizon.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Don Zickus <dzickus@redhat.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/include/asm/perf_event_p4.h |    1 +
 arch/x86/kernel/cpu/perf_event_p4.c  |   11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
===================================================================
--- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
@@ -22,6 +22,7 @@

 #define ARCH_P4_CNTRVAL_BITS	(40)
 #define ARCH_P4_CNTRVAL_MASK	((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
+#define ARCH_P4_UNFLAGGED_BIT	((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))

 #define P4_ESCR_EVENT_MASK	0x7e000000U
 #define P4_ESCR_EVENT_SHIFT	25
Index: linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.tip.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
@@ -770,9 +770,14 @@ static inline int p4_pmu_clear_cccr_ovf(
 		return 1;
 	}

-	/* it might be unflagged overflow */
-	rdmsrl(hwc->event_base + hwc->idx, v);
-	if (!(v & ARCH_P4_CNTRVAL_MASK))
+	/*
+	 * at some circumstances the overflow might issue NMI but did
+	 * not set P4_CCCR_OVF bit so since a counter holds a negative value
+	 * we simply check for high bit being set, if it's cleared it means
+	 * the counter has reached zero value and continued counting before
+	 * real NMI signal was received
+	 */
+	if (!(v & ARCH_P4_UNFLAGGED_BIT))
 		return 1;

 	return 0;

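For readers following the diff, the difference between the removed test and the added one can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: the three constants are copied from the patch, but the helper names `old_unflagged_test`, `new_unflagged_test` and `arm_counter` are hypothetical, and the counter value is passed in as a plain argument instead of being read from an MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as they appear in the patch to perf_event_p4.h. */
#define ARCH_P4_CNTRVAL_BITS	(40)
#define ARCH_P4_CNTRVAL_MASK	((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
#define ARCH_P4_UNFLAGGED_BIT	((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))

/* Hypothetical helper: the removed test, true only when the 40-bit value is exactly zero. */
static int old_unflagged_test(uint64_t counter)
{
	return !(counter & ARCH_P4_CNTRVAL_MASK);
}

/* Hypothetical helper: the added test, true as soon as bit 39 (the sign bit) is clear. */
static int new_unflagged_test(uint64_t counter)
{
	return !(counter & ARCH_P4_UNFLAGGED_BIT);
}

/* Hypothetical helper: 40-bit value of -period, i.e. a counter armed to overflow after `period` events. */
static uint64_t arm_counter(uint64_t period)
{
	/* Assumption of this sketch: the period fits in the negative half of the 40-bit range. */
	assert(period > 0 && period <= ARCH_P4_UNFLAGGED_BIT);
	return (0 - period) & ARCH_P4_CNTRVAL_MASK;
}
```

With these helpers, a freshly armed counter fails both tests, a counter sitting exactly at zero passes both, and a counter that has already moved a few events past zero passes only the new test.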
* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Don Zickus @ 2011-02-04 16:59 UTC
To: Cyrill Gorcunov
Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml

On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
> Please apply it, sorry for non-inlined patch (have a web access only at moment).
>
> Note that I've tested the patch on non-HT machine so if someone have HT'ed one
> -- it would be great to test the patch there.

Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
boot with hardlockup issues. It seems like the code is swallowing the
NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
Xeon box (p4 w/HT).

Cheers,
Don

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Cyrill Gorcunov @ 2011-02-04 17:32 UTC
To: Don Zickus
Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml, Jason Wessel

On 02/04/2011 07:59 PM, Don Zickus wrote:
> On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
>> Please apply it, sorry for non-inlined patch (have a web access only at moment).
>>
>> Note that I've tested the patch on non-HT machine so if someone have HT'ed one
>> -- it would be great to test the patch there.
>
> Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
> boot with hardlockup issues. It seems like the code is swallowing the
> NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
> Xeon box (p4 w/HT).
>
> Cheers,
> Don

Interesting, it seems the old kgdb issue is back. The former unknown NMI
problem is due to commit 047a3772feaae8e43d81d790f3d3f80dae8ae676, which
assumed that the counter stays at zero when an unflagged overflow happens,
but it seems this is not what happens at the hardware level. I noticed that
at the moment of the NMI the counter had already reached some positive
value, so the new patch simply checks whether the negative (sign) bit is
still set.

I must admit I forgot to test with the kgdb testsuite at bootup time, and
I'll be able to test this on Monday at best. In the meantime I'll try to
figure out what might be happening by reading the code (the only idea that
comes to mind is that an NMI from kgdb gets mixed up with one issued by
perf).

--
    Cyrill

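The timeline described in this message (counter armed with a negative value, overflow at zero, a few more events counted before the NMI is handled) can be modelled with a small userspace sketch. It rests on assumptions: the 40-bit width comes from the patch, while every name below (`CNTR_BITS`, `counter_advance`, `counter_at_nmi`, `zero_test`, `sign_test`) is hypothetical and exists only for illustration; no real hardware or kernel interface is involved.

```c
#include <stdint.h>

/* 40-bit counter width, as in the patch; the macro names here are hypothetical. */
#define CNTR_BITS	40
#define CNTR_MASK	((1ULL << CNTR_BITS) - 1)
#define CNTR_SIGN_BIT	(1ULL << (CNTR_BITS - 1))

/* Advance a 40-bit counter by n events, wrapping at 2^40. */
static uint64_t counter_advance(uint64_t counter, uint64_t n)
{
	return (counter + n) & CNTR_MASK;
}

/*
 * Arm the counter at -period, let it overflow, then count `late` more
 * events before the handler gets to read the value.
 * Assumes period and late are small compared with 2^39.
 */
static uint64_t counter_at_nmi(uint64_t period, uint64_t late)
{
	uint64_t armed = (0 - period) & CNTR_MASK;

	return counter_advance(armed, period + late);
}

/* Test that assumes the counter stays at zero after an unflagged overflow. */
static int zero_test(uint64_t counter)
{
	return counter == 0;
}

/* Test that only asks whether the sign bit has been cleared. */
static int sign_test(uint64_t counter)
{
	return !(counter & CNTR_SIGN_BIT);
}
```

In this model `counter_at_nmi(100000, 3)` yields 3: the zero-only test misses the overflow, while the sign-bit test still detects it.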
* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Cyrill Gorcunov @ 2011-02-06 19:21 UTC
To: Don Zickus
Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml

On 02/04/2011 07:59 PM, Don Zickus wrote:
> On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
>> Please apply it, sorry for non-inlined patch (have a web access only at moment).
>>
>> Note that I've tested the patch on non-HT machine so if someone have HT'ed one
>> -- it would be great to test the patch there.
>
> Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
> boot with hardlockup issues. It seems like the code is swallowing the
> NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
> Xeon box (p4 w/HT).
>
> Cheers,
> Don

Don, I hope to get access to p4 machine tomorrow and investigate this issue
(didn't manage to read kgdb code this weekend). Sorry for delay.

--
    Cyrill

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Cyrill Gorcunov @ 2011-02-07 17:22 UTC
To: Don Zickus
Cc: Ingo Molnar, George Spelvin, Meelis Roos, Lin Ming, Peter Zijlstra, lkml, Jason Wessel

On 02/06/2011 10:21 PM, Cyrill Gorcunov wrote:
> On 02/04/2011 07:59 PM, Don Zickus wrote:
>> On Fri, Feb 04, 2011 at 03:17:28PM +0300, Cyrill Gorcunov wrote:
>>> Please apply it, sorry for non-inlined patch (have a web access only at moment).
>>>
>>> Note that I've tested the patch on non-HT machine so if someone have HT'ed one
>>> -- it would be great to test the patch there.
>>
>> Hmm. For some reason, when I enable the kgdb testsuite, the box fails to
>> boot with hardlockup issues. It seems like the code is swallowing the
>> NMIs? I basically applied this patch on top of 2.6.38-rc3 and ran it on my
>> Xeon box (p4 w/HT).
>>
>> Cheers,
>> Don
>
> Don, I hope to get access to p4 machine tomorrow and investigate this issue
> (didn't manage to read kgdb code this weekend). Sorry for delay.
>

Just for info: I've tested the patch on a p4 machine with the kgdb bootup
tests and the results are somewhat strange. If I disable nmi-watchdog, the
tests pass fine and I'm able to run perf top (or anything related). At the
same time, if I leave nmi-watchdog enabled (the default), the borrowed event
is reported and the kgdb tests pass, but nmi-watchdog never fires (i.e. I see
the NMI irq counter remain zero). So I added debug prints and found that the
counter reaches positive values without issuing an NMI at all.

All in all, I'm still investigating this issue. Unfortunately the kernel
build sometimes takes hours on this machine (even with ccache enabled), so
it's going way slower than I expected :(

--
    Cyrill

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: George Spelvin @ 2011-02-08 14:26 UTC
To: dzickus, gorcunov
Cc: a.p.zijlstra, jason.wessel, linux-kernel, linux, ming.m.lin, mingo, mroos

I don't use kgdb, so I can't comment on that, but patch #2
("perf, x86: P4 PMU -- Fix unflagged overflows test")
has given me no problems in 3.5 days of uptime.

Thank you very much!

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Cyrill Gorcunov @ 2011-02-08 14:38 UTC
To: George Spelvin
Cc: dzickus, a.p.zijlstra, jason.wessel, linux-kernel, ming.m.lin, mingo, mroos

On 02/08/2011 05:26 PM, George Spelvin wrote:
> I don't use kgdb, so I can't comment on that, but patch #2
> ("perf, x86: P4 PMU -- Fix unflagged overflows test")
> has given me no problems in 3.5 days of uptime.
>
> Thank you very much!

Thanks for testing, George! I'm still investigating this issue (I didn't
manage to spend more time on it today, but I have obtained some results that
I need to analyze first).

--
    Cyrill

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: George Spelvin @ 2011-02-05 2:28 UTC
To: dzickus, gorcunov, mingo
Cc: a.p.zijlstra, linux-kernel, linux, ming.m.lin, mroos

The earlier patch didn't fix things after all. I'm rebooting with this one now.

(2.6.38-rc3 with x86_pmu_start patch.)
Feb 2 01:00:03: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 2 01:00:03: Do you have a strange power saving mode enabled?
Feb 2 01:00:03: Dazed and confused, but trying to continue
Feb 2 01:01:04: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 2 01:01:04: Do you have a strange power saving mode enabled?
Feb 2 01:01:04: Dazed and confused, but trying to continue
Feb 2 04:22:20: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 2 04:22:20: Do you have a strange power saving mode enabled?
Feb 2 04:22:20: Dazed and confused, but trying to continue
Feb 2 06:28:17: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 2 06:28:17: Do you have a strange power saving mode enabled?
Feb 2 06:28:17: Dazed and confused, but trying to continue
Feb 2 08:39:01: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 2 08:39:01: Do you have a strange power saving mode enabled?
Feb 2 08:39:01: Dazed and confused, but trying to continue
Feb 2 16:00:04: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 2 16:00:04: Do you have a strange power saving mode enabled?
Feb 2 16:00:04: Dazed and confused, but trying to continue
Feb 2 20:04:41: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 2 20:04:41: Do you have a strange power saving mode enabled?
Feb 2 20:04:41: Dazed and confused, but trying to continue
Feb 2 22:21:27: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 2 22:21:27: Do you have a strange power saving mode enabled?
Feb 2 22:21:27: Dazed and confused, but trying to continue
Feb 3 00:09:11: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 3 00:09:11: Do you have a strange power saving mode enabled?
Feb 3 00:09:11: Dazed and confused, but trying to continue
Feb 3 00:24:02: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 00:24:02: Do you have a strange power saving mode enabled?
Feb 3 00:24:02: Dazed and confused, but trying to continue
Feb 3 01:00:12: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 3 01:00:12: Do you have a strange power saving mode enabled?
Feb 3 01:00:12: Dazed and confused, but trying to continue
Feb 3 01:27:50: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 01:27:50: Do you have a strange power saving mode enabled?
Feb 3 01:27:50: Dazed and confused, but trying to continue
Feb 3 06:27:55: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 06:27:55: Do you have a strange power saving mode enabled?
Feb 3 06:27:55: Dazed and confused, but trying to continue
Feb 3 09:21:06: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 09:21:06: Do you have a strange power saving mode enabled?
Feb 3 09:21:06: Dazed and confused, but trying to continue
Feb 3 11:35:23: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 3 11:35:23: Do you have a strange power saving mode enabled?
Feb 3 11:35:23: Dazed and confused, but trying to continue
Feb 3 17:52:05: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 17:52:05: Do you have a strange power saving mode enabled?
Feb 3 17:52:05: Dazed and confused, but trying to continue
Feb 3 18:01:44: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 18:01:44: Do you have a strange power saving mode enabled?
Feb 3 18:01:44: Dazed and confused, but trying to continue
Feb 3 18:11:23: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 18:11:23: Do you have a strange power saving mode enabled?
Feb 3 18:11:23: Dazed and confused, but trying to continue
Feb 3 20:02:43: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 20:02:43: Do you have a strange power saving mode enabled?
Feb 3 20:02:43: Dazed and confused, but trying to continue
Feb 3 22:38:43: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 3 22:38:43: Do you have a strange power saving mode enabled?
Feb 3 22:38:43: Dazed and confused, but trying to continue
Feb 4 01:00:41: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 4 01:00:41: Do you have a strange power saving mode enabled?
Feb 4 01:00:41: Dazed and confused, but trying to continue
Feb 4 05:00:02: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 4 05:00:02: Do you have a strange power saving mode enabled?
Feb 4 05:00:02: Dazed and confused, but trying to continue
Feb 4 06:28:42: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 4 06:28:42: Do you have a strange power saving mode enabled?
Feb 4 06:28:42: Dazed and confused, but trying to continue
Feb 4 13:35:29: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 4 13:35:29: Do you have a strange power saving mode enabled?
Feb 4 13:35:29: Dazed and confused, but trying to continue
Feb 4 20:00:32: Uhhuh. NMI received for unknown reason 3d on CPU 0.
Feb 4 20:00:32: Do you have a strange power saving mode enabled?
Feb 4 20:00:32: Dazed and confused, but trying to continue
Feb 4 21:12:52: Uhhuh. NMI received for unknown reason 2d on CPU 0.
Feb 4 21:12:52: Do you have a strange power saving mode enabled?
Feb 4 21:12:52: Dazed and confused, but trying to continue

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Cyrill Gorcunov @ 2011-02-05 8:40 UTC
To: George Spelvin
Cc: dzickus, mingo, a.p.zijlstra, linux-kernel, ming.m.lin, mroos

On 02/05/2011 05:28 AM, George Spelvin wrote:
> The earlier patch didn't fix things after all. I'm rebooting with this one now.
>
> (2.6.38-rc3 with x86_pmu_start patch.)
> Feb 2 01:00:03: Uhhuh. NMI received for unknown reason 2d on CPU 0.
> Feb 2 01:00:03: Do you have a strange power saving mode enabled?
...

Hi George, this log is when unflagged overflow patch applied or previous one?

--
    Cyrill

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: George Spelvin @ 2011-02-05 9:15 UTC
To: gorcunov, linux
Cc: a.p.zijlstra, dzickus, linux-kernel, ming.m.lin, mingo, mroos

> Hi George, this log is when unflagged overflow patch applied or previous one?

The earlier one. As I said, the patch to x86_pmu_start. The later one (no
complaints in 6 hours of uptime so far) was to p4_pmu_clear_cccr_ovf.

Thank you very much!

* Re: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

From: Cyrill Gorcunov @ 2011-02-05 9:22 UTC
To: George Spelvin
Cc: a.p.zijlstra, dzickus, linux-kernel, ming.m.lin, mingo, mroos

On 02/05/2011 12:15 PM, George Spelvin wrote:
>> Hi George, this log is when unflagged overflow patch applied or previous one?
>
> The earlier one. As I said, the patch to x86_pmu_start. The later one (no
> complaints in 6 hours of uptime so far) was to p4_pmu_clear_cccr_ovf.
>
> Thank you very much!

Thanks for testing, George! There is still a problem with the kgdb bootup
tests, which I haven't resolved yet. I'll ping as soon as I have any news ;)

--
    Cyrill
