linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Don Zickus <dzickus@redhat.com>
To: Robert Richter <robert.richter@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	"gorcunov@gmail.com" <gorcunov@gmail.com>,
	"fweisbec@gmail.com" <fweisbec@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"ming.m.lin@intel.com" <ming.m.lin@intel.com>,
	"yinghai@kernel.org" <yinghai@kernel.org>,
	"andi@firstfloor.org" <andi@firstfloor.org>,
	"eranian@google.com" <eranian@google.com>
Subject: Re: [PATCH] perf, x86: catch spurious interrupts after disabling counters
Date: Fri, 24 Sep 2010 14:11:43 -0400	[thread overview]
Message-ID: <20100924181143.GQ26290@redhat.com> (raw)
In-Reply-To: <20100924100345.GE13563@erda.amd.com>

On Fri, Sep 24, 2010 at 12:03:45PM +0200, Robert Richter wrote:
> On 23.09.10 23:18:34, Don Zickus wrote:
> 
> > > I was able to duplicate the problem and can confirm this patch fixes the
> > > issue for me.  I tried poking around (similar to things Robert probably
> > > did) and had no luck.  Something just doesn't make sense, but I guess for
> > > now this patch is good enough for me. :-)
> > 
> > Ah ha!  I figured out what the problem was, need to disable the pmu while
> > processing the nmi. :-)   Finally something simple in this crazy unknown
> > NMI spree.
> > 
> > Oh yeah and trace_printk is now my new favorite tool!
> > 
> > From: Don Zickus <dzickus@redhat.com>
> > Date: Thu, 23 Sep 2010 22:52:09 -0400
> > Subject: [PATCH] x86, perf: disable pmu from counting when processing irq
> > 
> > On certain AMD and Intel machines, the pmu was left enabled
> > while the counters were reset during handling of the NMI.
> > After the counters are reset, code continues to process an
> > overflow.  During this time another counter overflow interrupt
> > could happen because the counter is still ticking.  This leads to
> > an unknown NMI.
> 
> I don't like the approach of disabling all counters in the nmi
> handler. First, it stops counting and thus may falsify
> measurement. Second, it introduces much overhead doing a rd-/wrmsrl()
> for each counter.

Ok. good points.  Though we may have to re-visit the perf_event_intel.c
code as my patch was based on 'well they did it there, we can do it here'
approach.

> 
> Does your patch below solve something that my patch doesn't?

Well, two things I was trying to solve with your approach is the extra NMI
that is generated, and the fact that you may falsely eat an unknown NMI
(because you don't clear cpuc->running except in the special case).

Yeah, I know the heurestics from your patch a couple of weeks ago will
make the case of falsely eating an unknown NMI next to impossible.  But I
was trying to limit the number seen.

For example, the back-to-back nmi problem we dealt with a couple of weeks
ago, only had a 'special case' that had to be dealt with every once in a
while (depending on the level of perf activity).  Now I can see unknown
NMIs every time there is a pmu_stop() issued.  I was trying to figure out
a way to cut down on that noise.

I came up with a couple of approaches but both involve another rdmsrl in
the handle_irq path.  I thought I would toss these ideas out for
conversation.

the first approach gets rid of the extra nmi by waiting until the end to
re-write the counter, but adds a second rdmsrl to resync the period in the
case where pmu_stop is not called.


diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 48c6d8d..1642f48 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1175,11 +1175,22 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 		handled++;
 		data.period	= event->hw.last_period;
 
-		if (!x86_perf_event_set_period(event))
-			continue;
+		/*
+		 * if period is over, process the overflow
+		 * before reseting the counter, otherwise
+		 * a new overflow could occur before the 
+		 * event is stopped
+		 */
+		if (local64_read(&hwc->period_left) <= 0) {
+			if (perf_event_overflow(event, 1, &data, regs)) {
+				x86_pmu_stop(event, 0);
+				continue;
+			}
+			/* if the overflow doesn't stop the event, resync */
+			x86_perf_event_update(event);
+		}
 
-		if (perf_event_overflow(event, 1, &data, regs))
-			x86_pmu_stop(event, 0);
+		x86_perf_event_set_period(event);
 	}
 
 	if (handled)

and the second approach, accepts the fact that we will get another unknown
NMI but check for it first (after stopping) and clear the cpuc->running
bit if we didn't overflow yet.

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 18c8856..bfb05da 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1189,8 +1189,12 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
 		if (!x86_perf_event_set_period(event))
 			continue;
 
-		if (perf_event_overflow(event, 1, &data, regs))
+		if (perf_event_overflow(event, 1, &data, regs)) {
 			x86_pmu_stop(event, 0);
+			rdmsrl(hwc->event_base + idx, val);
+			if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
+				__clear_bit(idx, cpuc->running);
+		}
 	}
 
 	if (handled)


After dealing with the Intel folks on another thread about how they are
finding more ways to detect and report hardware errors with NMIs, that I was
getting nervous about generating so many false NMIs and accidentally
eating a real one.

Cheers,
Don


  parent reply	other threads:[~2010-09-24 18:12 UTC|newest]

Thread overview: 117+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-02 19:07 [PATCH 0/3 v2] nmi perf fixes Don Zickus
2010-09-02 19:07 ` [PATCH 1/3] perf, x86: Fix accidentally ack'ing a second event on intel perf counter Don Zickus
2010-09-02 19:26   ` Cyrill Gorcunov
2010-09-02 20:00     ` Don Zickus
2010-09-02 20:36       ` Cyrill Gorcunov
2010-09-03  7:10   ` [tip:perf/urgent] " tip-bot for Don Zickus
2010-09-03  7:39     ` Yinghai Lu
2010-09-03 15:00       ` Don Zickus
2010-09-03 17:15         ` Yinghai Lu
2010-09-03 18:35           ` Don Zickus
2010-09-03 19:24             ` Yinghai Lu
2010-09-03 20:10               ` Don Zickus
2010-10-04 23:24             ` Yinghai Lu
2010-10-11 20:25               ` Don Zickus
2010-09-02 19:07 ` [PATCH 2/3] perf, x86: Try to handle unknown nmis with an enabled PMU Don Zickus
2010-09-03  7:11   ` [tip:perf/urgent] " tip-bot for Robert Richter
2010-09-02 19:07 ` [PATCH 3/3] perf, x86: Fix handle_irq return values Don Zickus
2010-09-03  7:10   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2010-09-10 11:41 ` [PATCH 0/3 v2] nmi perf fixes Peter Zijlstra
2010-09-10 12:10   ` Stephane Eranian
2010-09-10 12:13     ` Stephane Eranian
2010-09-10 13:27   ` Don Zickus
2010-09-10 14:46     ` Ingo Molnar
2010-09-10 15:17       ` Robert Richter
2010-09-10 15:58         ` Peter Zijlstra
2010-09-10 16:41           ` Ingo Molnar
2010-09-10 16:42             ` Ingo Molnar
2010-09-10 16:37         ` Ingo Molnar
2010-09-10 16:51           ` Ingo Molnar
2010-09-10 15:56       ` [PATCH] x86: fix duplicate calls of the nmi handler Robert Richter
2010-09-10 16:15         ` Peter Zijlstra
2010-09-11  9:41         ` Ingo Molnar
2010-09-11 11:44           ` Robert Richter
2010-09-11 12:45             ` Ingo Molnar
2010-09-12  9:52               ` Robert Richter
2010-09-13 14:37                 ` Robert Richter
2010-09-14 17:41                   ` Robert Richter
2010-09-15 16:20                     ` [PATCH] perf, x86: catch spurious interrupts after disabling counters Robert Richter
2010-09-15 16:36                       ` Stephane Eranian
2010-09-15 17:00                         ` Robert Richter
2010-09-15 17:32                           ` Stephane Eranian
2010-09-15 18:44                             ` Robert Richter
2010-09-15 19:34                               ` Cyrill Gorcunov
2010-09-15 20:21                                 ` Stephane Eranian
2010-09-15 20:39                                   ` Cyrill Gorcunov
2010-09-15 22:27                                     ` Robert Richter
2010-09-16 14:51                                       ` Frederic Weisbecker
2010-09-15 16:46                       ` Cyrill Gorcunov
2010-09-15 16:47                         ` Stephane Eranian
2010-09-15 17:02                           ` Cyrill Gorcunov
2010-09-15 17:28                             ` Robert Richter
2010-09-15 17:40                               ` Cyrill Gorcunov
2010-09-15 22:10                                 ` Robert Richter
2010-09-16  6:53                                   ` Cyrill Gorcunov
2010-09-16 17:34                       ` Peter Zijlstra
2010-09-17  8:51                         ` Robert Richter
2010-09-17  9:14                           ` Peter Zijlstra
2010-09-17 13:06                       ` Stephane Eranian
2010-09-20  8:41                         ` Robert Richter
2010-09-24  0:02                       ` Don Zickus
2010-09-24  3:18                         ` Don Zickus
2010-09-24 10:03                           ` Robert Richter
2010-09-24 13:38                             ` Stephane Eranian
2010-09-30 12:33                               ` Peter Zijlstra
2010-09-24 18:11                             ` Don Zickus [this message]
2010-09-24 10:41                       ` [tip:perf/urgent] perf, x86: Catch " tip-bot for Robert Richter
2010-09-29 12:26                         ` Stephane Eranian
2010-09-29 12:53                           ` Robert Richter
2010-09-29 12:54                             ` Robert Richter
2010-09-29 13:13                               ` Stephane Eranian
2010-09-29 13:28                                 ` Stephane Eranian
2010-09-29 15:01                                   ` Robert Richter
2010-09-29 15:12                                     ` Robert Richter
2010-09-29 15:27                                       ` Cyrill Gorcunov
2010-09-29 15:33                                         ` Stephane Eranian
2010-09-29 15:45                                           ` Cyrill Gorcunov
2010-09-29 15:51                                             ` Cyrill Gorcunov
2010-09-29 16:32                                               ` Robert Richter
2010-09-29 16:48                                                 ` Cyrill Gorcunov
2010-09-29 16:00                                             ` Stephane Eranian
2010-09-29 17:09                                               ` Robert Richter
2010-09-29 17:41                                                 ` Cyrill Gorcunov
2010-09-29 18:12                                                 ` Don Zickus
2010-09-29 19:42                                                   ` Stephane Eranian
2010-09-29 20:03                                                     ` Don Zickus
2010-09-30  9:12                                                     ` Robert Richter
2010-09-30 19:44                                                       ` Don Zickus
2010-10-01  7:17                                                         ` Robert Richter
     [not found]                                                           ` <AANLkTimUyLaVaBigjm0-CwRsdh4UXWDiss2ffX53S+k_@mail.gmail.com>
2010-10-01 11:53                                                             ` Stephane Eranian
2010-10-02  9:35                                                               ` Robert Richter
2010-10-04  8:53                                                                 ` Stephane Eranian
2010-10-04  9:07                                                                   ` Andi Kleen
2010-10-04 17:28                                                                     ` Stephane Eranian
2010-09-29 16:31                                           ` Robert Richter
2010-09-29 16:22                                         ` Robert Richter
2010-09-29 19:01                                         ` Don Zickus
2010-09-29 13:39                                 ` Robert Richter
2010-09-29 13:56                                   ` Stephane Eranian
2010-09-29 14:00                                     ` Stephane Eranian
2010-10-02  9:50                                       ` Robert Richter
2010-10-02 17:40                                         ` Stephane Eranian
2010-09-29 15:02                                     ` Cyrill Gorcunov
2010-09-16 17:42         ` [PATCH] x86: fix duplicate calls of the nmi handler Peter Zijlstra
2010-09-16 20:18           ` Stephane Eranian
2010-09-17  7:09             ` Peter Zijlstra
2010-09-17  0:13           ` Huang Ying
2010-09-17  7:52             ` Peter Zijlstra
2010-09-17  8:13               ` Robert Richter
2010-09-17  8:37                 ` Cyrill Gorcunov
2010-09-17  8:47               ` Huang Ying
2010-09-10 13:34   ` [PATCH 0/3 v2] nmi perf fixes Peter Zijlstra
2010-09-10 13:52     ` Peter Zijlstra
2010-09-13  8:55       ` Cyrill Gorcunov
2010-09-13  9:54         ` Stephane Eranian
2010-09-13 10:07           ` Cyrill Gorcunov
2010-09-13 10:10             ` Stephane Eranian
2010-09-13 10:12               ` Cyrill Gorcunov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100924181143.GQ26290@redhat.com \
    --to=dzickus@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=eranian@google.com \
    --cc=fweisbec@gmail.com \
    --cc=gorcunov@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.m.lin@intel.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=robert.richter@amd.com \
    --cc=ying.huang@intel.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).