From: Robert Richter <robert.richter@amd.com>
To: Don Zickus <dzickus@redhat.com>
Cc: "x86@kernel.org" <x86@kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Peter Zijlstra <peterz@infradead.org>,
"ying.huang@intel.com" <ying.huang@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
"paulmck@linux.vnet.ibm.com" <paulmck@linux.vnet.ibm.com>
Subject: Re: [V3][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs
Date: Tue, 6 Sep 2011 18:18:42 +0200 [thread overview]
Message-ID: <20110906161842.GI14200@erda.amd.com> (raw)
In-Reply-To: <1314290748-23569-5-git-send-email-dzickus@redhat.com>
On 25.08.11 12:45:46, Don Zickus wrote:
> Previous patches allow the NMI subsystem to process multipe NMI events
> in one NMI. As previously discussed this can cause issues when an event
> triggered another NMI but is processed in the current NMI. This causes the
> next NMI to go unprocessed and become an 'unknown' NMI.
>
> Having this print 'unknown' NMI to the console would be inaccurate and
> scare users. As a result I have copied the 'skip unknown' NMI logic
> developed by Robert Richter (and simplfied a little because we can
> track _all_ NMIs better instead of just the perf ones) to the main
> NMI handling routine.
>
> It is fairly simple, if when processing an NMI, the nmi_handle routine returns
> more than one event handled, then set a flag for future use. This flag just
> says there might be a possible unknown NMI. If an unknown NMI does come in,
> then it is skipped (swallowed). Otherwise just clear the flag on the next NMI
> if it has events processed.
>
> The algorithm isn't 100% accurate but for most loads we have seen in perf it
> captures a large majority of unknown NMIs. Under high workloads there still
> is the chance that unknown NMIs can trigger because you can time it just right
> such that you are generating NMIs as fast as you can process them and go four
> or five NMIs before seeing the unknown NMI.
>
> Without involving the concept of time when tracking these 'possible' NMIs,
> we may never be 100% reliable. The idea with time being that back-to-back
> NMIs immediately follow each other. Anything more than a micro second or so
> on modern machines between when the first NMI finished to when the second one
> starts, probably indicates a completely new event.
>
> V2:
> - forgot to add the 'read' code for swallow_nmi (went into next patch)
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
> arch/x86/kernel/nmi.c | 29 +++++++++++++++++++++++++++--
> 1 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 45fcd82..435dc71 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -270,6 +270,8 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> pr_emerg("Dazed and confused, but trying to continue\n");
> }
>
> +DEFINE_PER_CPU(bool, swallow_nmi);
> +
> static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> {
> unsigned char reason = 0;
> @@ -281,8 +283,28 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> * NMI can not be detected/processed on other CPUs.
> */
> handled = nmi_handle(NMI_LOCAL, regs);
> - if (handled)
> + if (handled) {
> + /*
> + * When handling multiple NMI events, we are not
> + * sure if the second NMI was dropped (because of
> + * too many NMIs), piggy-backed on the same NMI
> + * (perf) or is queued right behind this NMI.
> + * In the last case, we may accidentally get an
> + * unknown NMI because the event is already handled.
> + * Flag for this condition and swallow it later.
> + *
> + * FIXME: This detection has holes in it mainly
> + * because we can't tell _when_ the next NMI comes
> + * in. A multi-handled NMI event followed by an
> + * unknown NMI a second later, clearly should not
> + * be swallowed.
> + */
> + if (handled > 1)
> + __this_cpu_write(swallow_nmi, true);
> + else
> + __this_cpu_write(swallow_nmi, false);
See my comment for patch 3/6.
> return;
> + }
>
> /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> raw_spin_lock(&nmi_reason_lock);
> @@ -305,7 +327,10 @@ static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
> }
> raw_spin_unlock(&nmi_reason_lock);
>
> - unknown_nmi_error(reason, regs);
> + if (!__this_cpu_read(swallow_nmi))
> + unknown_nmi_error(reason, regs);
> +
> + __this_cpu_write(swallow_nmi, false);
> }
>
> dotraplinkage notrace __kprobes void
> --
> 1.7.6
>
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
next prev parent reply other threads:[~2011-09-06 16:18 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-25 16:45 [V3][PATCH 0/6] x86, nmi: new NMI handling routines Don Zickus
2011-08-25 16:45 ` [V3][PATCH 1/6] x86, nmi: split out nmi from traps.c Don Zickus
2011-08-25 16:45 ` [V3][PATCH 2/6] x86, nmi: create new NMI handler routines Don Zickus
2011-08-26 9:44 ` Peter Zijlstra
2011-08-26 14:21 ` Don Zickus
2011-09-06 10:08 ` Robert Richter
2011-09-06 16:53 ` Don Zickus
2011-08-25 16:45 ` [V3][PATCH 3/6] x86, nmi: wire up NMI handlers to new routines Don Zickus
2011-09-06 16:15 ` Robert Richter
2011-09-06 16:52 ` Don Zickus
2011-09-07 9:03 ` Borislav Petkov
2011-09-06 17:20 ` Corey Minyard
2011-09-06 17:49 ` Don Zickus
2011-09-06 17:59 ` Corey Minyard
2011-08-25 16:45 ` [V3][PATCH 4/6] x86, nmi: add in logic to handle multiple events and unknown NMIs Don Zickus
2011-09-06 16:18 ` Robert Richter [this message]
2011-08-25 16:45 ` [V3][PATCH 5/6] x86, nmi: track NMI usage stats Don Zickus
2011-08-25 16:45 ` [V3][PATCH 6/6] x86, nmi: print out NMI stats in /proc/interrupts Don Zickus
2011-09-06 16:39 ` Robert Richter
2011-09-06 17:40 ` Don Zickus
2011-08-26 9:44 ` [V3][PATCH 0/6] x86, nmi: new NMI handling routines Peter Zijlstra
2011-08-26 14:39 ` Don Zickus
2011-08-26 14:51 ` Peter Zijlstra
2011-09-06 16:43 ` Robert Richter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110906161842.GI14200@erda.amd.com \
--to=robert.richter@amd.com \
--cc=andi@firstfloor.org \
--cc=dzickus@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=x86@kernel.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.