linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Robert Richter <robert.richter@amd.com>
Cc: Don Zickus <dzickus@redhat.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Lin Ming <ming.m.lin@intel.com>, Ingo Molnar <mingo@elte.hu>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	Yinghai Lu <yinghai@kernel.org>, Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
Date: Fri, 13 Aug 2010 06:25:36 +0200	[thread overview]
Message-ID: <20100813042533.GA9669@nowhere> (raw)
In-Reply-To: <20100811220058.GT26154@erda.amd.com>

On Thu, Aug 12, 2010 at 12:00:58AM +0200, Robert Richter wrote:
> I was debuging this a little more, see version 2 below.
> 
> -Robert
> 
> --
> 
> From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
> 
> When perfctrs are running it is valid to have unhandled nmis, two
> events could trigger 'simultaneously' raising two back-to-back
> NMIs. If the first NMI handles both, the latter will be empty and daze
> the CPU.
> 
> The solution to avoid an 'unknown nmi' massage in this case was simply
> to stop the nmi handler chain when perfctrs are runnning by stating
> the nmi was handled. This has the drawback that a) we can not detect
> unknown nmis anymore, and b) subsequent nmi handlers are not called.
> 
> This patch addresses this. Now, we check this unknown NMI if it could
> be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
> handle the unknown nmi.
> 
> This is a debug log:
> 
> cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
> cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
> cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
> cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
> cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
> cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
> cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
> cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
> cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
> cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
> cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
> cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
> cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
> cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
> cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
> cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
> Uhhuh. NMI received for unknown reason 00 on CPU 6.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
> 
> Deltas:
> 
> nmi #32334 340186
> nmi #32335 1327704
> nmi #32336 1819      <<<< back-to-back nmi [1]
> nmi #32337 85961
> nmi #32338 284507
> nmi #32339 1578809
> nmi #32340 217616
> nmi #32341 1798      <<<< back-to-back nmi [2]
> nmi #32342 240913
> nmi #32343 1512809
> nmi #32344 116672
> nmi #32345 412453
> nmi #32346 1462095   <<<< 1st nmi (standard) handling 2 counters
> nmi #32347 2046      <<<< 2nd nmi (back-to-back) handling one counter
> nmi #32348 1773      <<<< 3rd nmi (back-to-back) handling no counter! [3]
> 
> For  back-to-back nmi detection there are the following rules:
> 
> The perfctr nmi handler was handling more than one counter and no
> counter was handled in the subsequent nmi (see [1] and [2] above).
> 
> There is another case if there are two subsequent back-to-back nmis
> [3]. In this case we measure the time between the first and the
> 2nd. The 2nd is detected as back-to-back because the first handled
> more than one counter. The time between the 1st and the 2nd is used to
> calculate a range for which we assume a back-to-back nmi. Now, the 3rd
> nmi triggers, we measure again the time delta and compare it with the
> first delta from which we know it was a back-to-back nmi. If the 3rd
> nmi is within the range, it is also a back-to-back nmi and we drop it.
> 
> Signed-off-by: Robert Richter <robert.richter@amd.com>
> ---



That time based thing looks a bit complicated.

I'm still not sure why you don't want to use a simple flag:

After handled a perf NMI:

	if (handled more than one counter)
		__get_cpu_var(skip_unknown) = 1;

While handling an unknown NMI:

	if (__get_cpu_var(skip_unknown)) {
		__get_cpu_var(skip_unknow) = 0;
		return NOTIFY_STOP;
	}


  parent reply	other threads:[~2010-08-13  4:25 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-04  9:21 A question of perf NMI handler Lin Ming
2010-08-04  9:50 ` Peter Zijlstra
2010-08-04 10:01 ` Robert Richter
2010-08-04 10:24   ` Peter Zijlstra
2010-08-04 10:29     ` Robert Richter
2010-08-04 14:00   ` Don Zickus
2010-08-04 14:11     ` Peter Zijlstra
2010-08-04 14:52       ` Don Zickus
2010-08-04 15:02         ` Peter Zijlstra
2010-08-04 15:18           ` Cyrill Gorcunov
2010-08-04 15:50             ` Don Zickus
2010-08-04 16:10               ` Cyrill Gorcunov
2010-08-04 16:20                 ` Don Zickus
2010-08-04 16:39                   ` Cyrill Gorcunov
2010-08-04 18:48                     ` Robert Richter
2010-08-04 19:22                       ` Andi Kleen
2010-08-04 19:26                       ` Cyrill Gorcunov
2010-08-06  6:52                         ` Robert Richter
2010-08-06 14:21                           ` Don Zickus
2010-08-09 19:48                             ` [PATCH] perf, x86: try to handle unknown nmis with running perfctrs Robert Richter
2010-08-09 20:02                               ` Cyrill Gorcunov
2010-08-10  7:42                                 ` Robert Richter
2010-08-10 16:16                                   ` Cyrill Gorcunov
2010-08-10 16:41                                     ` Robert Richter
2010-08-10 17:24                                       ` Cyrill Gorcunov
2010-08-10 19:05                                         ` Robert Richter
2010-08-10 19:24                                           ` Cyrill Gorcunov
2010-08-12 13:24                                             ` Robert Richter
2010-08-12 14:31                                               ` Cyrill Gorcunov
2010-08-10 20:48                               ` Don Zickus
2010-08-11  2:44                                 ` Frederic Weisbecker
2010-08-11 11:10                                   ` Robert Richter
2010-08-11 12:44                                     ` Don Zickus
2010-08-11 14:03                                       ` Robert Richter
2010-08-11 14:32                                         ` Don Zickus
2010-08-13  4:37                                     ` Frederic Weisbecker
2010-08-13  8:22                                       ` Robert Richter
2010-08-14  1:28                                         ` Frederic Weisbecker
2010-08-14  2:29                                           ` Robert Richter
2010-08-11 12:39                                   ` Don Zickus
2010-08-11  3:19                                 ` Huang Ying
2010-08-11 12:36                                   ` Don Zickus
2010-08-16 14:37                                     ` Peter Zijlstra
2010-08-11 22:00                               ` [PATCH -v2] " Robert Richter
2010-08-12 13:10                                 ` Robert Richter
2010-08-12 18:21                                   ` Don Zickus
2010-08-16  7:37                                     ` Robert Richter
2010-08-12 13:52                                 ` Don Zickus
2010-08-13  4:25                                 ` Frederic Weisbecker [this message]
2010-08-16 14:48                                 ` Peter Zijlstra
2010-08-16 16:27                                   ` Cyrill Gorcunov
2010-08-16 17:16                                     ` Robert Richter
2010-08-16 19:06                                       ` Cyrill Gorcunov
2010-08-16 19:13                                         ` Peter Zijlstra
2010-08-16 19:18                                           ` Cyrill Gorcunov
2010-08-16 22:55                                         ` Robert Richter
2010-08-17 15:23                                           ` Cyrill Gorcunov
2010-08-17 15:22                               ` [PATCH -v3] " Robert Richter
2010-08-17 16:17                                 ` Cyrill Gorcunov
2010-08-19 10:45                                 ` Peter Zijlstra
2010-08-19 12:39                                   ` Robert Richter
2010-08-19 14:12                                   ` Don Zickus
2010-08-19 14:27                                     ` Peter Zijlstra
2010-08-19 15:20                                       ` Don Zickus
2010-08-19 17:43                                       ` Cyrill Gorcunov
2010-08-19 17:53                                         ` Peter Zijlstra
2010-08-19 21:58                                       ` Don Zickus
2010-08-20  8:50                                         ` Peter Zijlstra
2010-08-20  1:50                                       ` Don Zickus
2010-08-20  8:16                                         ` Ingo Molnar
2010-08-20 10:04                                           ` Peter Zijlstra
2010-08-20 10:30                                             ` Cyrill Gorcunov
2010-08-20 12:39                                             ` Don Zickus
2010-08-20 13:27                                               ` Ingo Molnar
2010-08-20 13:51                                                 ` Don Zickus
2010-08-20 14:17                                                   ` Ingo Molnar
2010-08-20 20:45                                                     ` Cyrill Gorcunov
2010-08-24 21:48                                                     ` Don Zickus
2010-08-20  8:36                                         ` Robert Richter
2010-08-20 14:17                                       ` [tip:perf/urgent] perf, x86: Fix handle_irq return values tip-bot for Peter Zijlstra
2010-08-20 14:17                                 ` [tip:perf/urgent] perf, x86: Try to handle unknown nmis with an enabled PMU tip-bot for Robert Richter
2010-08-06 15:35                   ` A question of perf NMI handler Andi Kleen
2010-08-04 15:45           ` Don Zickus
2010-08-06 15:37           ` Andi Kleen
2010-08-04 13:54 ` Don Zickus

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100813042533.GA9669@nowhere \
    --to=fweisbec@gmail.com \
    --cc=andi@firstfloor.org \
    --cc=dzickus@redhat.com \
    --cc=gorcunov@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.m.lin@intel.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=robert.richter@amd.com \
    --cc=ying.huang@intel.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).