Re: Yet more softlockups.


From: Dave Jones <davej@redhat.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Markus Trippelsdorf <markus@trippelsdorf.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Peter Anvin <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: Yet more softlockups.
Date: Fri, 12 Jul 2013 11:45:21 -0400
Message-ID: <20130712154521.GD1020@redhat.com>
In-Reply-To: <51E0230C.9010509@intel.com>

On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote:
 
 > The warning comes from calling perf_sample_event_took(), which is only
 > called from one place: perf_event_nmi_handler().
 > 
 > So we can be pretty sure that the perf NMI is firing, or at least that
 > this handler code is running.
 > 
 > nmi_handle() says:
 >         /*
 >          * NMIs are edge-triggered, which means if you have enough
 >          * of them concurrently, you can lose some because only one
 >          * can be latched at any given time.  Walk the whole list
 >          * to handle those situations.
 >          */
 > 
 > perf_event_nmi_handler() probably gets _called_ when the watchdog NMI
 > goes off.  But, it should hit this check:
 > 
 >         if (!atomic_read(&active_events))
 >                 return NMI_DONE;
 > 
 > and return quickly. This is before it has a chance to call
 > perf_sample_event_took().
 > 
 > Dave, for your case, my suspicion would be that it got turned on
 > inadvertently, or that we somehow have a bug which bumped up
 > perf_event.c's 'active_events' and we're running some perf code that we
 > don't have to.
 
What do you mean by 'inadvertently'? I see this during bootup every time.
Nothing should be touching perf that early, unless systemd or something has
started playing with it (which afaik it isn't).

 > But, I'm suspicious.  I was having all kinds of issues with perf and
 > NMIs taking hundreds of milliseconds.  I never isolated it to having a
 > real, single, cause.  I attributed it to my large NUMA system just being
 > slow.  Your description makes me wonder what I missed, though.

Here's a fun trick:

trinity -c perf_event_open -C4 -q -l off

Within about a minute, that brings any of my boxes to its knees.
The softlockup detector starts going nuts, and then the box wedges solid.

(You may need to bump -C depending on your CPU count. I've never seen it happen
 with a single process, but -C2 seems to be a minimum)

That *is* using perf though, so I kind of expect bad shit to happen when there are bugs.
The "during bootup" case is still a head-scratcher.
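
For reference, here is a rough user-space model of the early-return path
quoted above. It is only a sketch, not the kernel code: the names
nmi_handle, perf_event_nmi_handler, active_events and NMI_DONE come from
the quoted text, while the *_model functions, NMI_HANDLED as used here,
took_calls and sample_event_took() are hypothetical stand-ins made up for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Return codes for this toy model; not the kernel's definitions. */
enum { NMI_DONE = 0, NMI_HANDLED = 1 };

static atomic_int active_events;  /* models perf_event.c's 'active_events' */
static int took_calls;            /* hypothetical: counts timing-hook calls */

/* Hypothetical stand-in for perf_sample_event_took(). */
static void sample_event_took(void)
{
    took_calls++;
}

/* Models the check quoted above: bail out before any timing is done. */
static int perf_event_nmi_handler_model(void)
{
    if (!atomic_load(&active_events))
        return NMI_DONE;
    sample_event_took();
    return NMI_HANDLED;
}

/* Hypothetical watchdog handler that always claims the NMI. */
static int watchdog_nmi_handler_model(void)
{
    return NMI_HANDLED;
}

typedef int (*nmi_handler_t)(void);

/*
 * Walk the whole handler list rather than stopping at the first taker,
 * as the nmi_handle() comment describes. Returns how many handlers
 * claimed the NMI.
 */
static int nmi_handle_model(nmi_handler_t *handlers, int n)
{
    int handled = 0;

    assert(handlers != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        handled += handlers[i]();
    return handled;
}
```

With active_events at zero, the perf handler in this model is still
_called_ on every walk of the list but never reaches the timing hook,
which is why seeing the warning at boot implies the count is non-zero.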

	Dave
