From: Don Zickus <dzickus@redhat.com>
To: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>,
Cyrill Gorcunov <gorcunov@gmail.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"hpa@zytor.com" <hpa@zytor.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"yinghai@kernel.org" <yinghai@kernel.org>,
"andi@firstfloor.org" <andi@firstfloor.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"fweisbec@gmail.com" <fweisbec@gmail.com>,
"ming.m.lin@intel.com" <ming.m.lin@intel.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@elte.hu" <mingo@elte.hu>
Subject: Re: [tip:perf/urgent] perf, x86: Catch spurious interrupts after disabling counters
Date: Thu, 30 Sep 2010 15:44:51 -0400
Message-ID: <20100930194451.GI26290@redhat.com>
In-Reply-To: <20100930091246.GV13563@erda.amd.com>

On Thu, Sep 30, 2010 at 11:12:46AM +0200, Robert Richter wrote:
> On 29.09.10 15:42:26, Stephane Eranian wrote:
> > On Wed, Sep 29, 2010 at 8:12 PM, Don Zickus <dzickus@redhat.com> wrote:
> > > I think you missed Stephane's point. Say for example, kgdb is being used
> > > while we are doing stuff with the perf counter (and say kgdb's handler is
> > > a lower priority than perf; which isn't true I know, but let's say):
> > >
> > Yes, exactly my point. The reality is you cannot afford to have false positive
> > because you may starve another subsystem from an important notification.
>
> As soon as you stop executing the chain, there are chances to miss an
> nmi for other parts of the system. Where is no way to avoid this. So
> your argument above is valid also for regular perf nmis and not only
> for catched-spurious or back-to-back nmis.

I don't agree with that. Most nmi handlers can do a check to see if their
subsystem triggered an nmi or not. Now we may not catch it in the right
order because one handler is higher in the chain than the other, but
ultimately the other handler will get its chance to execute because it
fired its own nmi (which hasn't been lost).

Whereas the problem Stephane is describing is that the heuristics of the
perf counters 'eat' an NMI, thus possibly starving another handler. With
back-to-back nmis we are at least polite, letting everyone have a chance to
process the nmi before we indulge ourselves and 'eat' it (if it is still
around to be eaten).

However in the case of the 'catched-spurious', we selfishly 'eat' the NMI
without really knowing if it was ours to be eaten. That was the
difference and the concern.
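
To make the difference concrete, here is a minimal user-space sketch in C.
It is a toy model, not kernel code: struct nmi_source, run_handler() and
deliver_nmi() are hypothetical names, and the per-source "pending" flag
stands in for whatever status bit a real handler would check. A handler
with can_check set only claims an NMI its own source raised; a handler
without it claims whatever arrives, which is roughly what the
'catched-spurious' heuristic does while it is armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one NMI source and its handler. */
struct nmi_source {
	const char *name;
	bool pending;	/* this source raised an NMI that is not yet handled */
	bool can_check;	/* handler can tell whether its source fired */
	int handled;	/* number of NMIs this handler claimed */
};

/* Returns true if the handler claims the NMI, which stops the chain. */
static bool run_handler(struct nmi_source *s)
{
	if (s->can_check) {
		if (!s->pending)
			return false;	/* not ours, let the next handler look */
		s->pending = false;
	}
	/* A handler that cannot check claims the NMI unconditionally. */
	s->handled++;
	return true;
}

/*
 * Walk the chain in priority order for one NMI delivery.
 * Returns the index of the claiming handler, or -1 for an unknown NMI.
 */
static int deliver_nmi(struct nmi_source *chain, size_t n)
{
	assert(chain != NULL);

	for (size_t i = 0; i < n; i++)
		if (run_handler(&chain[i]))
			return (int)i;
	return -1;
}
```

With two checking handlers that both have an NMI pending, two deliveries
end up serving both of them, whatever their order in the chain. Put a
non-checking handler first and a single delivery is claimed by it, while
the second source stays pending and unserved. The model ignores that real
NMIs are edge-triggered and can merge in hardware.
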
>
> > > Now I sent a patch last week that can prevent that extra NMI from being
> > > generated at the cost of another rdmsrl in the non-pmu_stop cases (which I
> > > will attach below again, obviously P4 would need something similar too).
>
> A rdmsrl() does not help, it only causes overhead. There is no bit to
> detect if a counter overflowed and triggered the interrupt, you only
> know the counter value is greater zero or not.

Well, the counters are programmed to trigger an NMI when they cross zero.
So if we delay reprogramming the counters until after we know if we are
going to issue a pmu_stop, then it should be impossible to trigger an
overflow (because the counters are going to keep counting above zero,
unless they wrap, which would be a different problem altogether).
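
The ordering argument can be sketched with a small simulation in C. This
is not the patch and not kernel code: struct counter, counter_arm(),
counter_tick() and handle_overflow() are hypothetical names, and the
48-bit counter width is an assumption made for illustration. The counter
is armed with the negated period, counts upward, and raises an NMI when it
crosses zero. The in_flight argument models events that still get counted
between handler entry and the point where the stop takes effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CNT_BITS 48	/* assumed counter width */
#define CNT_MASK ((UINT64_C(1) << CNT_BITS) - 1)

struct counter {
	uint64_t value;	/* raw CNT_BITS-wide value */
	bool enabled;
	int nmis;	/* overflow interrupts raised so far */
};

/* Arm the counter so that it crosses zero after 'period' events. */
static void counter_arm(struct counter *c, uint64_t period)
{
	c->value = (0 - period) & CNT_MASK;
}

/* Count 'events'; an enabled counter raises an NMI when it crosses zero. */
static void counter_tick(struct counter *c, uint64_t events)
{
	uint64_t sum;

	assert(events <= CNT_MASK);
	if (!c->enabled)
		return;
	sum = c->value + events;
	if (sum > CNT_MASK)
		c->nmis++;
	c->value = sum & CNT_MASK;
}

/*
 * Overflow handler. With reprogram_first the counter is re-armed before
 * the stop decision is known; otherwise re-arming is deferred until we
 * know the counter keeps running.
 */
static void handle_overflow(struct counter *c, uint64_t period, bool stop,
			    bool reprogram_first, uint64_t in_flight)
{
	if (reprogram_first)
		counter_arm(c, period);

	counter_tick(c, in_flight);	/* events counted before the stop lands */

	if (stop) {
		c->enabled = false;
		return;
	}
	if (!reprogram_first)
		counter_arm(c, period);
}
```

With a period of 4 and 10 in-flight events, re-arming first lets the
counter cross zero a second time before it is stopped, so one extra NMI
is raised for a counter we are about to disable. Deferring the re-arm
leaves the value just above zero, and it would have to wrap the whole
counter width to overflow again.
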
>
> We should take care the discussion becomes not academical and do not
> start to overengineer something. I always can imagine some really rare
> corner cases in which we may loss an nmi. This is because hardware is
> not built for it. But in 99% or so of the cases we get all nmis,
> instead of before where all nmis were eaten by the profiler.

I don't think this is over-engineering. Basically we haven't seen the
problem yet because the only really active nmi handler is the perf one and
it is designed to be last on the list. If we start fiddling with
priorities and re-arranging the list, the problem might be exposed quicker
than you think.

Trying to prevent a 'spurious' NMI in the back-to-back case might be a
case for over-engineering, I'll agree to that (I think I tried and
realized how foolish that was).

Cheers,
Don