public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Don Zickus <dzickus@redhat.com>
To: Jack Steiner <steiner@sgi.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>, Ingo Molnar <mingo@elte.hu>,
	tglx@linutronix.de, hpa@zytor.com, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH] x86, UV: Fix NMI handler for UV platforms
Date: Wed, 23 Mar 2011 13:53:20 -0400	[thread overview]
Message-ID: <20110323175320.GB9413@redhat.com> (raw)
In-Reply-To: <20110323163255.GA17178@sgi.com>

On Wed, Mar 23, 2011 at 11:32:55AM -0500, Jack Steiner wrote:
> > The first thing is in 'uv_handle_nmi' can you change that from
> > DIE_NMIUNKNOWN back to DIE_NMI.  Originally I set it to DIE_NMIUNKNOWN
> > because I didn't think you guys had the ability to determine if your BMC
> > generated the NMI or not.  Recently George B. said you guys add a register
> > bit to determine this, so I am wondering if by promoting this would fix
> > the missed UV NMI.  I am speculating this is being swallowed by the
> > hw_perf DIE_NMIUNKNOWN exception path.
> 
> Correct. I recently added a register that indicates the BMC sent an NMI.
> 
> Hmmm. Looks like I have been running with DIE_NMI. I think that came
> from porting the patch from RHEL6 to upstream.
> 
> However, neither DIE_NMIUNKNOWN  or DIE_NMI gives the desired behavior (2.6.38+).
> 
> 	- Using DIE_NMIUNKNOWN, I see many more "dazed" messages but no
> 	  perf top lockup. I see ~3 "dazed" messages per minute. UV NMIs are
> 	  being sent at a rate of 30/min, ie. ~10% failure rate.
> 
> 	- Using DIE_NMI, no "dazed" messages but perf top hangs about once a
> 	  minute (rough estimate).
> 
> 
> I wonder if we need a different approach to handling NMIs. Instead of using
> the die_notifier list, introduce a new notifier list reserved exclusively
> for NMIs. When an NMI occurs, all registered functions are unconditionally called.
> If any function accepts the NMI, the remaining functions are still called but
> the NMI is considered to have been valid (handled) & the "dazed" message
> is suppressed.
> 
> This is more-or-less functionally equivalent to the last patch I posted but
> may be cleaner. At a minimum, it is easier to understand the interactions
> between the various handlers.

This is the same approach I was realizing last night when I went to bed.
I think the more concurrent NMIs we have, the more tricky things get.  

I hacked up an ugly patch that might fix the 'dazed' message you are
seeing.  The original skip logic assumed the back-to-back nmis would stop
after 3 nmis.  Under load, those nmis could go on forever if the time it
takes to handle the nmi matches the period in which the nmi is being
generated (I assume all the stack dumping from the BMC nmi probably
lengthens the time it takes to handle the nmi?).

For example,  the first NMI might notice two perf counters triggered.  But
it doesn't know if it triggered under one or two NMIs, so it marks the
next NMI as a possible candidate to 'swallow' if no one claims it.

Once it is finished, it notices the next nmi came from perf too (reading
the status register).  Again we don't know if this is from the second NMI
that we have not 'swallowed' yet or from the third event (because the
second NMI was never generated).

Once that finishes, another nmi comes along.  The current code says that
one has to be the one we 'swallow' or if perf 'handles' it then assume
there are no 'extra' NMIs waiting to be swallowed.

This is where the problem is, as I have seen on my machine.  The
back-to-back nmis have gone up to 4 in-a-row before spitting out the extra
nmi the code was hoping to 'swallow'.

Let me know if the patch fixes that problem.  Then it will be one less
thing to worry about. :-)

Cheers,
Don


diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 19fbcad..f9dcd81 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1327,7 +1327,7 @@ perf_event_nmi_handler(struct notifier_block *self,
 	if ((handled > 1) ||
 		/* the next nmi could be a back-to-back nmi */
 	    ((__get_cpu_var(pmu_nmi).marked == this_nmi) &&
-	     (__get_cpu_var(pmu_nmi).handled > 1))) {
+	     (__get_cpu_var(pmu_nmi).handled > 0) && handled && this_nmi)) {
 		/*
 		 * We could have two subsequent back-to-back nmis: The
 		 * first handles more than one counter, the 2nd
@@ -1338,6 +1338,8 @@ perf_event_nmi_handler(struct notifier_block *self,
 		 * handling more than one counter. We will mark the
 		 * next (3rd) and then drop it if unhandled.
 		 */
+		//if ((__get_cpu_var(pmu_nmi).handled == 1) && (handled == 1))
+		//	trace_printk("!! fixed?\n");
 		__get_cpu_var(pmu_nmi).marked	= this_nmi + 1;
 		__get_cpu_var(pmu_nmi).handled	= handled;
 	}

  reply	other threads:[~2011-03-23 17:53 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-21 16:01 [PATCH] x86, UV: Fix NMI handler for UV platforms Jack Steiner
2011-03-21 16:14 ` Ingo Molnar
2011-03-21 16:26   ` Cyrill Gorcunov
2011-03-21 16:43     ` Cyrill Gorcunov
2011-03-21 17:00       ` Cyrill Gorcunov
2011-03-21 17:08         ` Jack Steiner
2011-03-21 17:19           ` Cyrill Gorcunov
2011-03-21 17:34             ` Jack Steiner
2011-03-21 17:48               ` Cyrill Gorcunov
2011-03-21 17:55                 ` Cyrill Gorcunov
2011-03-21 18:15           ` Cyrill Gorcunov
2011-03-21 18:24             ` Jack Steiner
2011-03-21 17:53       ` Don Zickus
2011-03-21 17:51     ` Don Zickus
2011-03-21 18:00       ` Cyrill Gorcunov
2011-03-21 18:22       ` Jack Steiner
2011-03-21 19:37         ` Don Zickus
2011-03-21 20:37           ` Jack Steiner
2011-03-22 17:11           ` Jack Steiner
2011-03-22 18:44             ` Don Zickus
2011-03-22 20:02               ` Jack Steiner
2011-03-22 21:25               ` Jack Steiner
2011-03-22 22:02                 ` Cyrill Gorcunov
2011-03-23 13:36                   ` Jack Steiner
2011-03-22 22:05                 ` Don Zickus
2011-03-23 16:32                   ` Jack Steiner
2011-03-23 17:53                     ` Don Zickus [this message]
2011-03-23 20:00                       ` Don Zickus
2011-03-23 20:41                         ` Cyrill Gorcunov
2011-03-23 20:45                         ` Cyrill Gorcunov
2011-03-23 21:22                           ` Don Zickus
2011-03-23 20:46                         ` Jack Steiner
2011-03-23 21:23                           ` Don Zickus
2011-03-24 17:09                             ` Jack Steiner
2011-03-24 18:43                               ` Don Zickus
2011-03-21 16:56   ` Jack Steiner
2011-03-21 18:05     ` Ingo Molnar
2011-03-21 19:23       ` [PATCH V2] " Jack Steiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110323175320.GB9413@redhat.com \
    --to=dzickus@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=gorcunov@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=steiner@sgi.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox