From: Ingo Molnar <mingo@kernel.org>
To: Mike Travis <travis@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>,
Dimitri Sivanich <sivanich@sgi.com>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
kgdb-bugreport@lists.sourceforge.net, x86@kernel.org,
linux-kernel@vger.kernel.org, Russ Anderson <rja@sgi.com>,
Alexander Gordeev <agordeev@redhat.com>,
Suresh Siddha <suresh.b.siddha@intel.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Steffen Persvold <sp@numascale.com>
Subject: Re: [PATCH 13/14] x86/UV: Update UV support for external NMI signals
Date: Thu, 21 Mar 2013 12:51:16 +0100 [thread overview]
Message-ID: <20130321115116.GB2659@gmail.com> (raw)
In-Reply-To: <5149538F.2080402@sgi.com>
* Mike Travis <travis@sgi.com> wrote:
>
>
> On 3/14/2013 12:20 AM, Ingo Molnar wrote:
> >
> > * Mike Travis <travis@sgi.com> wrote:
> >
> >>
> >> There is an exception where the NMI_LOCAL notifier chain is used. When
> >> the perf tools are in use, it's possible that our NMI was captured by
> >> some other NMI handler and then ignored. We set a per_cpu flag for
> >> those CPUs that ignored the initial NMI, and then send them an IPI NMI
> >> signal.
> >
> > "Other" NMI handlers should never lose NMIs - if they do then they should
> > be fixed I think.
> >
> > Thanks,
> >
> > Ingo
>
> Hi Ingo,
>
> I suspect that the other NMI handlers would not grab ours if we were
> on the NMI_LOCAL chain to claim them. The problem though is the UV
> Hub is not designed to have that amount of traffic reading the MMRs.
> This was handled in previous kernel versions by a.) putting us at the
> bottom of the chain; and b.) as soon as a handler claimed an NMI as
> it's own, the search would be stopped.
>
> Neither of these are true any more as all handlers are called for
> all NMIs. (I measured anywhere from .5M to 4M NMIs per second on a
> 64 socket, 1024 cpu thread system [not sure why the rate changes]).
> This was the primary motivation for placing the UV NMI handler on the
> NMI_UNKNOWN chain, so it would be called only if all other handlers
> "gave up", and thus not incur the overhead of the MMR reads on every
> NMI event.
That's a fair motivation.
> The good news is that I haven't yet encountered a case where the
> "missing" cpus were not called into the NMI loop. Even better news
> is that on the previous (3.0 vintage) kernels running two perf tops
> would almost always cause either tons of the infamous "dazed and
> confused" messages, or would lock up the system. Now it results in
> quite a few messages like:
>
> [ 961.119417] perf_event_intel: clearing PMU state on CPU#652
>
> followed by a dump of a number of cpu PMC registers. But the system
> remains responsive. (This was experienced in our Customer Training
> Lab where multiple system admins were in the class.)
I too can provoke those messages when pushing PMUs hard enough via
multiple perf users. I suspect there's still some PMU erratum that
seems to have been introduced at around Nehalem CPUs.
Clearing the PMU works it around, at the cost of a loss of a slight
amount of profiling data.
> The bad news is I'm not sure why the errant NMI interrupts are lost.
> I have noticed that restricting the 'perf tops' to separate and
> distinct cpusets seems to lessen this "stomping on each other's perf
> event handlers" effect, which might be more representative of actual
> customer usage.
>
> So in total the situation is vastly improved... :)
Okay. My main dislike is the linecount:
4 files changed, 648 insertions(+), 41 deletions(-)
... for something that should in theory work almost out of box, with
minimal glue!
As long as it stays in the UV platform code this isn't a NAK from me -
just wanted to inquire whether most of that complexity could be eliminated
by figuring out the root cause of the lost NMIs ...
Thanks,
Ingo
next prev parent reply other threads:[~2013-03-21 11:51 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-12 19:38 [PATCH 00/14] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
2013-03-12 19:38 ` [PATCH 01/14] KDB: fix the interrupt of the KDB btc command Mike Travis
2013-03-12 19:38 ` [PATCH 02/14] KDB: fix errant character in KDB show regs Mike Travis
2013-03-12 19:38 ` [PATCH 03/14] KDB: up the default LINES value Mike Travis
2013-03-12 19:38 ` [PATCH 04/14] KDB: allow KDB modules to be external modules Mike Travis
2013-03-12 19:38 ` [PATCH 05/14] KDB: add more exports for supporting KDB modules Mike Travis
2013-03-12 20:09 ` Eric W. Biederman
2013-03-12 22:03 ` Mike Travis
2013-03-12 22:13 ` Greg Kroah-Hartman
2013-03-12 22:26 ` Mike Travis
2013-03-12 22:39 ` Eric W. Biederman
2013-03-12 23:03 ` Mike Travis
2013-03-12 20:23 ` Greg Kroah-Hartman
2013-03-12 22:01 ` Thomas Gleixner
2013-03-12 22:08 ` Mike Travis
2013-03-12 19:38 ` [PATCH 06/14] KDB: consolidate KDB grep code Mike Travis
2013-03-12 19:38 ` [PATCH 07/14] KDB: clean up KDB grep code, add some options Mike Travis
2013-03-12 19:38 ` [PATCH 08/14] KDB: Restore call to kdump from KDB Mike Travis
2013-03-12 19:38 ` [PATCH 09/14] KDB: Add pshelp command Mike Travis
2013-03-12 19:38 ` [PATCH 10/14] KGDB/KDB: add support for external NMI handler to call KGDB/KDB Mike Travis
2013-03-12 19:38 ` [PATCH 11/14] x86/UV: Move NMI support Mike Travis
2013-03-12 19:38 ` [PATCH 12/14] x86/UV: Add uvtrace support Mike Travis
2013-03-12 19:38 ` [PATCH 13/14] x86/UV: Update UV support for external NMI signals Mike Travis
2013-03-14 7:20 ` Ingo Molnar
2013-03-20 6:13 ` Mike Travis
2013-03-21 11:51 ` Ingo Molnar [this message]
2013-03-12 19:38 ` [PATCH 14/14] x86/UV: Add call to KGDB/KDB from NMI handler Mike Travis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130321115116.GB2659@gmail.com \
--to=mingo@kernel.org \
--cc=agordeev@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hpa@zytor.com \
--cc=jason.wessel@windriver.com \
--cc=kgdb-bugreport@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mst@redhat.com \
--cc=rja@sgi.com \
--cc=sivanich@sgi.com \
--cc=sp@numascale.com \
--cc=suresh.b.siddha@intel.com \
--cc=tglx@linutronix.de \
--cc=travis@sgi.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox