From: Frederic Weisbecker <fweisbec@gmail.com>
To: Robert Richter <robert.richter@amd.com>
Cc: Don Zickus <dzickus@redhat.com>,
Cyrill Gorcunov <gorcunov@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Lin Ming <ming.m.lin@intel.com>, Ingo Molnar <mingo@elte.hu>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Huang, Ying" <ying.huang@intel.com>,
Yinghai Lu <yinghai@kernel.org>, Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH -v2] perf, x86: try to handle unknown nmis with running perfctrs
Date: Fri, 13 Aug 2010 06:25:36 +0200
Message-ID: <20100813042533.GA9669@nowhere>
In-Reply-To: <20100811220058.GT26154@erda.amd.com>

On Thu, Aug 12, 2010 at 12:00:58AM +0200, Robert Richter wrote:
> I was debugging this a little more, see version 2 below.
>
> -Robert
>
> --
>
> From 8bb831af56d118b85fc38e0ddc2e516f7504b9fb Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@amd.com>
> Date: Thu, 5 Aug 2010 16:19:59 +0200
> Subject: [PATCH] perf, x86: try to handle unknown nmis with running perfctrs
>
> When perfctrs are running it is valid to have unhandled nmis, two
> events could trigger 'simultaneously' raising two back-to-back
> NMIs. If the first NMI handles both, the latter will be empty and daze
> the CPU.
>
> The solution to avoid an 'unknown nmi' message in this case was simply
> to stop the nmi handler chain when perfctrs are running by stating
> the nmi was handled. This has the drawback that a) we can not detect
> unknown nmis anymore, and b) subsequent nmi handlers are not called.
>
> This patch addresses this. Now, we check this unknown NMI if it could
> be a perfctr back-to-back NMI. Otherwise we pass it and let the kernel
> handle the unknown nmi.
>
> This is a debug log:
>
> cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
> cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
> cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
> cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
> cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
> cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
> cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
> cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
> cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
> cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
> cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
> cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
> cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
> cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
> cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
> cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
> Uhhuh. NMI received for unknown reason 00 on CPU 6.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
>
> Deltas:
>
> nmi #32334 340186
> nmi #32335 1327704
> nmi #32336 1819 <<<< back-to-back nmi [1]
> nmi #32337 85961
> nmi #32338 284507
> nmi #32339 1578809
> nmi #32340 217616
> nmi #32341 1798 <<<< back-to-back nmi [2]
> nmi #32342 240913
> nmi #32343 1512809
> nmi #32344 116672
> nmi #32345 412453
> nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
> nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
> nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]
>
> For back-to-back nmi detection there are the following rules:
>
> The perfctr nmi handler was handling more than one counter and no
> counter was handled in the subsequent nmi (see [1] and [2] above).
>
> There is another case if there are two subsequent back-to-back nmis
> [3]. In this case we measure the time between the first and the
> 2nd. The 2nd is detected as back-to-back because the first handled
> more than one counter. The time between the 1st and the 2nd is used to
> calculate a range for which we assume a back-to-back nmi. Now, the 3rd
> nmi triggers, we measure again the time delta and compare it with the
> first delta from which we know it was a back-to-back nmi. If the 3rd
> nmi is within the range, it is also a back-to-back nmi and we drop it.
>
> Signed-off-by: Robert Richter <robert.richter@amd.com>
> ---
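
To make the time-based rules quoted above concrete, here is a rough
user-space sketch of how I read them. It is not the code from the patch:
the struct, the function name and in particular the range factor of 2 are
assumptions made for illustration, since the description does not say how
the range is calculated.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: the accepted range is twice the reference delta. */
#define B2B_RANGE_FACTOR 2

/* Hypothetical per-CPU bookkeeping; field names are not from the patch. */
struct b2b_state {
	uint64_t last_time;	/* timestamp of the previous NMI */
	uint64_t ref_delta;	/* gap measured for a known back-to-back NMI, 0 if none */
	int last_handled;	/* counters handled by the previous NMI */
};

/*
 * Returns true if an NMI that handled 'handled' counters at time 'now'
 * should be dropped as a perfctr back-to-back NMI.
 */
static bool nmi_should_drop(struct b2b_state *s, int handled, uint64_t now)
{
	uint64_t delta = now - s->last_time;
	uint64_t ref = s->ref_delta;	/* reference left by the previous NMI */
	bool drop = false;

	s->ref_delta = 0;
	if (handled == 0) {
		if (s->last_handled > 1)
			drop = true;	/* cases [1] and [2] */
		else if (ref && delta <= ref * B2B_RANGE_FACTOR)
			drop = true;	/* case [3]: as close as the known gap */
	} else if (s->last_handled > 1) {
		/* 2nd NMI, back-to-back but not empty: remember its gap. */
		s->ref_delta = delta;
	}

	s->last_handled = handled;
	s->last_time = now;
	return drop;
}
```

Fed with the timestamps from the debug log, this drops nmi #32336 by the
first rule and nmi #32348 by the second one (delta 1773 against a
reference of 2046), while an empty NMI arriving much later is passed on.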

That time based thing looks a bit complicated.
I'm still not sure why you don't want to use a simple flag:

After handling a perf NMI:

	if (handled more than one counter)
		__get_cpu_var(skip_unknown) = 1;

While handling an unknown NMI:

	if (__get_cpu_var(skip_unknown)) {
		__get_cpu_var(skip_unknown) = 0;
		return NOTIFY_STOP;
	}
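
The flag idea above can be tried out in user space with something like
the sketch below. A plain array stands in for the per-CPU variable, and
NR_CPUS, the two function names and the NOTIFY_* values are placeholders
chosen for the simulation, not the kernel's definitions.

```c
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical CPU count for the simulation */

/* Stand-ins for the notifier return codes (not the kernel's values). */
enum { NOTIFY_DONE, NOTIFY_STOP };

/* Simulated per-CPU flag, one slot per CPU. */
static bool skip_unknown[NR_CPUS];

/* Called after the perf NMI handler ran and serviced 'handled' counters. */
static void note_perf_nmi(int cpu, int handled)
{
	if (handled > 1)
		skip_unknown[cpu] = true;
}

/* Called when no handler in the chain claimed the NMI. */
static int handle_unknown_nmi(int cpu)
{
	if (skip_unknown[cpu]) {
		/* Swallow exactly one unknown NMI, then re-arm detection. */
		skip_unknown[cpu] = false;
		return NOTIFY_STOP;
	}
	return NOTIFY_DONE;	/* let the kernel report the unknown NMI */
}
```

With the sequence from the debug log on cpu #6 (handled = 2, then 1,
then 0), the empty third NMI returns NOTIFY_STOP because the flag is still
set, and a further unknown NMI returns NOTIFY_DONE. Note that in this
sketch the flag stays set until the next unknown NMI, however late that
one arrives.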