From: Ingo Molnar <mingo@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@gmail.com, Linus Torvalds <torvalds@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12)
Date: Tue, 17 Sep 2013 09:00:49 +0200
Message-ID: <20130917070048.GB20661@gmail.com>
In-Reply-To: <20130916162926.GA12926@twins.programming.kicks-ass.net>

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Sep 16, 2013 at 05:41:46PM +0200, Ingo Molnar wrote:
> >
> > * Stephane Eranian <eranian@googlemail.com> wrote:
> >
> > > Hi,
> > >
> > > Some updates on this problem.
> > > I have been running tests all week-end long on my HSW.
> > > I can reproduce the problem. What I know:
> > >
> > > - It is not linked with callchain
> > > - The extra entries are valid
> > > - The reset values are still zeroes
> > > - The problem does not happen on SNB with the same test case
> > > - The PMU state looks sane when that happens.
> > > - The problem occurs even when restricting to one CPU/core (taskset -c 0-3)
> > >
> > > So it seems like the threshold is ignored. But I don't understand where
> > > the reset values are coming from. So it looks more like a bug in
> > > micro-code where under certain circumstances multiple entries get
> > > written.
> >
> > Either multiple entries are written, or the PMI/NMI is not asserted as it
> > should be?
>
> No, both :-)
>
> > > Something must be happening with the interrupt or HT. I will disable HT
> > > next and also disable the NMI watchdog.
> >
> > Yes, interaction with the NMI watchdog events might also be possible.
> >
> > If it's truly just the threshold that is broken occasionally in a
> > statistically insignificant manner then the bug is relatively benign and
> > we could work it around in the kernel by ignoring excess entries.
> >
> > In that case we should probably not annoy users with the scary kernel
> > warning and instead increase a debug count somewhere so that it's still
> > detectable.
>
> It's not just a broken threshold. When a PEBS event happens it can re-arm
> itself, but only if you program a RESET value !0. We don't do that, so
> each counter should only ever fire once.
>
> We must do this because PEBS is broken on NHM+ in that the
> pebs_record::status is a direct copy of the overflow status field at
> time of the assist and if you use the RESET thing nothing will clear the
> status bits and you cannot demux the PEBS events back to the event that
> generated them.
>
> Worse, since it's the overflow that arms the assist, and the assist
> happens some undefined number of cycles after this event, it is
> possible for another assist to happen first.
>
> That is, suppose both CNT0 and CNT1 have PEBS enabled and CNT0 overflows
> first it is possible to find the CNT1 entry first in the buffer with
> both of them having status := 0x03.
>
> Complete and utter trainwreck.
>
> This is why we have a threshold of 1 and use NMI for PMI even for pure
> PEBS, it minimizes the complete clusterfuck described above.
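
To make the demux problem described above concrete, here is a minimal
sketch. It is not the kernel's code: the struct and function names are
hypothetical, and the record is reduced to its status word. It only shows
that a record whose status is 0x03 cannot be attributed to either CNT0 or
CNT1, while a single-bit status can.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a PEBS record: status word only. */
struct pebs_rec_sketch {
	uint64_t status;	/* copy of the overflow status at assist time */
};

/*
 * Return the counter index if exactly one PEBS-enabled counter bit is set
 * in the record's status, or -1 if the record cannot be attributed to a
 * single counter.
 */
static int pebs_demux_sketch(const struct pebs_rec_sketch *rec,
			     uint64_t pebs_enabled)
{
	uint64_t bits = rec->status & pebs_enabled;
	int idx = 0;

	/* No bit set, or more than one bit set: ambiguous. */
	if (bits == 0 || (bits & (bits - 1)) != 0)
		return -1;

	/* Exactly one bit is set; find its position. */
	while (!(bits & 1)) {
		bits >>= 1;
		idx++;
	}
	return idx;
}
```

With both counters enabled (mask 0x03), a status of 0x01 maps to counter
0 and 0x02 to counter 1, but 0x03 yields -1: that is the case where the
record order in the buffer tells you nothing about which event fired.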

What I mean is that, as per observations, the problem seems to be
statistical: it happens only once every couple of million records. So, as
long as no memory is corrupted (the PEBS records don't go outside the DS
area), it could be ignored when it happens, and still produce a valid,
usable profile.
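
Roughly what I have in mind, as a sketch only. The names, the flat
address arguments and the global counter are all made up for
illustration; the real drain code and DS layout look different. The point
is just: bounds-check against the buffer, process whatever valid records
are there, and bump a debug count instead of warning.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug counter; real code would likely keep this per CPU. */
static uint64_t pebs_excess_records_sketch;

/*
 * Given the buffer base, the hardware-written index and the end of the
 * buffer (all byte addresses), return how many whole records are safe to
 * process. Seeing more than 'expected' records bumps the debug counter
 * instead of warning. An index outside the buffer yields 0 records.
 */
static size_t pebs_usable_records_sketch(uintptr_t base, uintptr_t index,
					 uintptr_t end, size_t rec_size,
					 size_t expected)
{
	size_t n;

	/* Index outside the buffer: trust nothing. */
	if (rec_size == 0 || index < base || index > end)
		return 0;

	n = (index - base) / rec_size;

	/* Benign excess: count it, do not warn. */
	if (n > expected)
		pebs_excess_records_sketch++;

	return n;
}
```

With a threshold of 1, 'expected' would be 1, so a drain that finds 10
records still returns all 10 for processing and only increments the
counter.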

( Btw., we might want to introduce an 'error' event passed to tools, which
  they could process in a soft, statistical manner: only warn the user if
  the erroneous events go beyond 1% or 5%, etc. Kernel warnings are really
  not the best fit for such purposes. )
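
On the tooling side the check could be as simple as the sketch below. No
such 'error' event exists today, so the function name and both counts are
assumptions; I also assume the total includes the erroneous events and
that the counts are small enough for the multiplications not to overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the share of erroneous events goes beyond 'max_percent'
 * (e.g. 1 or 5). Integer arithmetic only. No events means no warning.
 */
static bool should_warn_sketch(uint64_t error_events, uint64_t total_events,
			       unsigned int max_percent)
{
	if (total_events == 0)
		return false;

	/* error/total > pct/100  <=>  error * 100 > total * pct */
	return error_events * 100 > total_events * (uint64_t)max_percent;
}
```

One bad record in a couple of million stays far below a 1% limit, so the
user would never see a warning for the case discussed here.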

Thanks,

	Ingo