public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Jones <davej@redhat.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Jack F Vogel <jfv@bluesong.net>, Andi Kleen <ak@muc.de>,
	notting@redhat.com
Subject: Re: [PATCH] check nmi watchdog is broken
Date: Wed, 11 May 2005 01:55:40 -0400	[thread overview]
Message-ID: <20050511055540.GA27052@redhat.com> (raw)
In-Reply-To: <200505011704.j41H4SUQ021132@hera.kernel.org>

On Sun, May 01, 2005 at 10:04:28AM -0700, Linux Kernel wrote:
 > tree 6adb8d33585f8eee20794827c79e40991aeeaee5
 > parent fd51f666fa591294bd7462447512666e61c56ea0
 > author Jack F Vogel <jfv@bluesong.net> Sun, 01 May 2005 22:58:48 -0700
 > committer Linus Torvalds <torvalds@ppc970.osdl.org> Sun, 01 May 2005 22:58:48 -0700
 > 
 > [PATCH] check nmi watchdog is broken
 > 
 > A bug against an xSeries system showed up recently noting that the
 > check_nmi_watchdog() test was failing.
 > 
 > I have been investigating it and discovered in both i386 and x86_64 the
 > recent change to the routine to use the cpu_callin_map has uncovered a
 > problem.  Prior to that change, on an SMP box, the test was trivally
 > passing because all cpu's were found to not yet be online, but now with the
 > callin_map they are discovered, it goes on to test the counter and they
 > have not yet begun to increment, so it announces a CPU is stuck and bails
 > out.
 > 
 > On all the systems I have access to test, the announcement of failure is
 > also bougs...  by the time you can login and check /proc/interrupts, the
 > NMI count is happily incrementing on all CPUs.  Its just that the test is
 > being done too early.
 > 
 > I have tried moving the call to the test around a bit, and it was always
 > too early.  I finally hit on this proposed solution, it delays the routine
 > via a late_initcall(), seems like the right solution to me.

My EM64T Dell Precision 470 seems to have problems both before
and after this change, though the behaviour changes.

With -rc3..
testing NMI watchdog ... CPU#1: NMI appears to be stuck (0)!

With -rc4...
Testing NMI watchdog ... CPU#4: NMI appears to be stuck (0)!

The CPU #'s are consistent across reboots (Ie, they're always the same).
Though the rc4 behaviour seems odd, as my CPUs are numbered 0-3.

Bill (Cc'd) also reports that with -rc4, he sees around
a 10 second delay when it does that 'Testing NMI watchdog'
message.

		Dave


       reply	other threads:[~2005-05-11  5:56 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200505011704.j41H4SUQ021132@hera.kernel.org>
2005-05-11  5:55 ` Dave Jones [this message]
2005-05-11 11:33   ` [PATCH] check nmi watchdog is broken Mikael Pettersson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050511055540.GA27052@redhat.com \
    --to=davej@redhat.com \
    --cc=ak@muc.de \
    --cc=jfv@bluesong.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=notting@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox