Re: [RFC] x86, perf: high volume of events produces a flood of unknown NMIs

From: Cyrill Gorcunov <gorcunov@gmail.com>
To: Don Zickus <dzickus@redhat.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	peterz@infradead.org, robert.richter@amd.com,
	andi@firstfloor.org, ming.m.lin@intel.com, eranian@google.com,
	ying.huang@intel.com, mathieu.desnoyers@efficios.com,
	acme@redhat.com
Subject: Re: [RFC] x86, perf: high volume of events produces a flood of unknown NMIs
Date: Wed, 20 Apr 2011 22:59:56 +0400
Message-ID: <4DAF2D2C.6060509@gmail.com>
In-Reply-To: <4DAF2A57.1010804@gmail.com>

On 04/20/2011 10:47 PM, Cyrill Gorcunov wrote:
> On 04/20/2011 10:26 PM, Don Zickus wrote:
>> Hi,
>>
>> Arnaldo pointed me at an NMI problem that happens when he tries to
>> generate a high volume of perf events.  He receives a flood of unknown
>> NMIs.
>>
>> I have been poking at the problem and came up with a patch, but it doesn't
>> always work.  I was hoping people who understood how the NMI works at a
>> low level might be able to help me.
>>
>> I was able to duplicate this on an AMD Phenom, Pentium 4, Xeon Core2quad,
>> and Nehalem.  The problem I think is the large generation of back-to-back
>> NMIs.  The perf nmi handler may accidentally handle some of those
>> extra/in-flight NMIs in its first pass, leaving the next NMI to be
>> unhandled and generating an unknown NMI message.
>>
>> Robert included logic to check for two back-to-back NMIs, but that falls
>> short when more than three are generated.  I modified his logic to account
>> for three back-to-back NMIs, but that didn't completely solve the problem.
>>
>> I took another approach at catching back-to-back NMIs that seemed to work
>> on all my machines except for the Xeon core2quad, but I am not entirely
>> sure if my approach is valid.
>>
>> The approach I took was based on the idea that if an NMI is being
>> generated while currently in an NMI handler, the current NMI when finished
>> won't continue executing the next instruction before the exception but
>> instead jump back into another NMI exception frame.
>>
>> As a result, the args passed in to the NMI handler should have the same ip
>> and sp as the previous NMI interrupt.  Otherwise one could assume that
>> some amount of time passed between interrupts (enough to return from the
>> exception and execute code).
>>
>> I thought this would allow me to trap an infinite number of back-to-back
>> NMIs.  Like I said, it seemed to work on a number of machines, except for
>> my Xeon core2quad.
>>
>> Does anyone know if my approach is a valid one?  Or is there a better way
>> to catch this condition?  Or maybe some other tips or tricks I can use to
>> help come up with a solution for this?
>>
>> Or perhaps we don't care about this because in the end perf can't even
>> capture the data without spitting out a CPU Overload message.
>>
>> Thoughts?
>>
> 
> Hi Don, just a thought -- since the PMI masks LVTPC, we could read it and check
> whether it is masked, though I fear that is quite a time-consuming operation
> compared with comparing frames :( (hmm, the Intel spec mentions only P4 and Xeon
> as masking LVTPC)

Something like

	if (apic_read(APIC_LVTPC) & APIC_LVT_MASKED)
		handle-perf
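
For comparison, the frame-based idea quoted above (same ip and sp as the
previous NMI means no code ran in between) could be modelled roughly as
follows. This is a minimal user-space sketch, not kernel code: the names
nmi_frame, nmi_tracker, nmi_is_back_to_back and nmi_should_swallow are
hypothetical, and a real implementation would take ip and sp from struct
pt_regs and keep the state per CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the interrupted context seen by one NMI. */
struct nmi_frame {
	uint64_t ip;	/* instruction pointer the NMI will return to */
	uint64_t sp;	/* stack pointer of the interrupted context */
};

/* Hypothetical tracking state; assumed to be per CPU in a real kernel. */
struct nmi_tracker {
	struct nmi_frame last;	/* frame of the previous NMI */
	bool have_last;		/* false until the first NMI is recorded */
	bool in_perf_burst;	/* current back-to-back run began with a perf NMI */
};

/*
 * Record the current frame and report whether it matches the previous one,
 * i.e. whether no instruction executed between the two NMIs.
 */
static bool nmi_is_back_to_back(struct nmi_tracker *t,
				const struct nmi_frame *cur)
{
	bool same = t->have_last &&
		    t->last.ip == cur->ip &&
		    t->last.sp == cur->sp;

	t->last = *cur;
	t->have_last = true;
	return same;
}

/*
 * Return true if an NMI that no handler claimed should be swallowed
 * silently rather than reported as unknown.
 */
static bool nmi_should_swallow(struct nmi_tracker *t,
			       const struct nmi_frame *cur, bool handled)
{
	bool b2b = nmi_is_back_to_back(t, cur);

	if (handled) {
		/* A perf NMI was handled; followers may be its leftovers. */
		t->in_perf_burst = true;
		return false;
	}
	if (b2b && t->in_perf_burst)
		return true;	/* burst stays open, so any run length is covered */

	/* Code ran in between, or no perf NMI preceded this one. */
	t->in_perf_burst = false;
	return false;
}
```

The burst flag is never cleared while frames keep matching, which is what
would let the check cover an arbitrary number of back-to-back NMIs instead
of a fixed two or three. The sketch only models the heuristic as described;
it says nothing about whether the heuristic holds on real hardware, which
is the open question here.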

-- 
    Cyrill

Thread overview: 8+ messages
2011-04-20 18:26 [RFC] x86, perf: high volume of events produces a flood of unknown NMIs Don Zickus
2011-04-20 18:47 ` Cyrill Gorcunov
2011-04-20 18:59   ` Cyrill Gorcunov [this message]
2011-04-20 19:12   ` Don Zickus
2011-04-20 19:26 ` Stephane Eranian
2011-04-20 21:01   ` Don Zickus
2011-04-21 13:28   ` Don Zickus
2011-04-21 13:40     ` Stephane Eranian
