From: Ingo Molnar <mingo@kernel.org>
To: Mike Travis <travis@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>,
	Dimitri Sivanich <sivanich@sgi.com>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	kgdb-bugreport@lists.sourceforge.net, x86@kernel.org,
	linux-kernel@vger.kernel.org, Russ Anderson <rja@sgi.com>,
	Alexander Gordeev <agordeev@redhat.com>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Steffen Persvold <sp@numascale.com>
Subject: Re: [PATCH 13/14] x86/UV: Update UV support for external NMI signals
Date: Thu, 21 Mar 2013 12:51:16 +0100
Message-ID: <20130321115116.GB2659@gmail.com>
In-Reply-To: <5149538F.2080402@sgi.com>


* Mike Travis <travis@sgi.com> wrote:

> 
> 
> On 3/14/2013 12:20 AM, Ingo Molnar wrote:
> > 
> > * Mike Travis <travis@sgi.com> wrote:
> > 
> >>
> >> There is an exception where the NMI_LOCAL notifier chain is used. When 
> >> the perf tools are in use, it's possible that our NMI was captured by 
> >> some other NMI handler and then ignored.  We set a per_cpu flag for 
> >> those CPUs that ignored the initial NMI, and then send them an IPI NMI 
> >> signal.
> > 
> > "Other" NMI handlers should never lose NMIs - if they do then they should 
> > be fixed I think.
> > 
> > Thanks,
> > 
> > 	Ingo
> 
> Hi Ingo,
> 
> I suspect that the other NMI handlers would not grab ours if we were
> on the NMI_LOCAL chain to claim them.  The problem though is the UV
> Hub is not designed to have that amount of traffic reading the MMRs.
> This was handled in previous kernel versions by a.) putting us at the
> bottom of the chain; and b.) as soon as a handler claimed an NMI as
> its own, the search would be stopped.
> 
> Neither of these is true any more as all handlers are called for
> all NMIs.  (I measured anywhere from 0.5M to 4M NMIs per second on a
> 64 socket, 1024 cpu thread system [not sure why the rate changes]).
> This was the primary motivation for placing the UV NMI handler on the
> NMI_UNKNOWN chain, so it would be called only if all other handlers
> "gave up", and thus not incur the overhead of the MMR reads on every
> NMI event.

That's a fair motivation.
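
The trade-off described in the quoted paragraph can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the names `uv_chain`, `sim_stats`, `perf_handler`, `uv_handler`, `dispatch_nmi` and `simulate` are all hypothetical, and the model assumes that every handler on the local chain runs for each NMI while the unknown chain runs only when nothing claimed it. It does not model lost NMIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum uv_chain { UV_ON_LOCAL, UV_ON_UNKNOWN };

struct sim_stats {
	unsigned long mmr_reads;  /* simulated UV hub MMR reads */
	unsigned long uv_handled; /* NMIs claimed by the simulated UV handler */
};

/* Simulated perf handler: claims the NMI only if the PMU raised it. */
static int perf_handler(bool from_pmu)
{
	return from_pmu ? 1 : 0;
}

/*
 * Simulated UV handler: it has to read an MMR before it can tell
 * whether the hub raised the NMI, so every call costs one read.
 */
static int uv_handler(bool from_uv, struct sim_stats *st)
{
	st->mmr_reads++;
	if (from_uv)
		st->uv_handled++;
	return from_uv ? 1 : 0;
}

/* Deliver one simulated NMI with the UV handler on the given chain. */
static void dispatch_nmi(enum uv_chain chain, bool from_pmu, bool from_uv,
			 struct sim_stats *st)
{
	int handled;

	assert(st);

	/* Every handler on the "local" chain runs, even after one claims the NMI. */
	handled = perf_handler(from_pmu);
	if (chain == UV_ON_LOCAL)
		handled += uv_handler(from_uv, st);

	/* The "unknown" chain runs only when no local handler claimed the NMI. */
	if (!handled && chain == UV_ON_UNKNOWN)
		uv_handler(from_uv, st);
}

/* Deliver pmu_nmis PMU-sourced NMIs followed by uv_nmis hub-sourced NMIs. */
static struct sim_stats simulate(enum uv_chain chain, unsigned long pmu_nmis,
				 unsigned long uv_nmis)
{
	struct sim_stats st = { 0, 0 };
	unsigned long i;

	for (i = 0; i < pmu_nmis; i++)
		dispatch_nmi(chain, true, false, &st);
	for (i = 0; i < uv_nmis; i++)
		dispatch_nmi(chain, false, true, &st);
	return st;
}
```

In this model, `simulate(UV_ON_LOCAL, 1000000, 1)` counts 1000001 MMR reads, while `simulate(UV_ON_UNKNOWN, 1000000, 1)` counts a single read; both claim the one hub-sourced NMI.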

> The good news is that I haven't yet encountered a case where the
> "missing" cpus were not called into the NMI loop.  Even better news
> is that on the previous (3.0 vintage) kernels running two perf tops
> would almost always cause either tons of the infamous "dazed and
> confused" messages, or would lock up the system.  Now it results in
> quite a few messages like:
> 
> [  961.119417] perf_event_intel: clearing PMU state on CPU#652
> 
> followed by a dump of a number of cpu PMC registers.  But the system
> remains responsive.  (This was experienced in our Customer Training
> Lab where multiple system admins were in the class.)

I too can provoke those messages when pushing PMUs hard enough via 
multiple perf users. I suspect there's still some PMU erratum that
seems to have been introduced around the Nehalem CPUs.

Clearing the PMU works around it, at the cost of losing a slight 
amount of profiling data.

> The bad news is I'm not sure why the errant NMI interrupts are lost.
> I have noticed that restricting the 'perf tops' to separate and
> distinct cpusets seems to lessen this "stomping on each other's perf
> event handlers" effect, which might be more representative of actual
> customer usage.
> 
> So in total the situation is vastly improved... :)

Okay. My main dislike is the linecount:

 4 files changed, 648 insertions(+), 41 deletions(-)

... for something that should in theory work almost out of the box, with 
minimal glue!

As long as it stays in the UV platform code this isn't a NAK from me - 
just wanted to inquire whether most of that complexity could be eliminated 
by figuring out the root cause of the lost NMIs ...

Thanks,

	Ingo


