From: Dave Jones <davej@redhat.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Jack F Vogel <jfv@bluesong.net>, Andi Kleen <ak@muc.de>,
notting@redhat.com
Subject: Re: [PATCH] check nmi watchdog is broken
Date: Wed, 11 May 2005 01:55:40 -0400
Message-ID: <20050511055540.GA27052@redhat.com>
In-Reply-To: <200505011704.j41H4SUQ021132@hera.kernel.org>

On Sun, May 01, 2005 at 10:04:28AM -0700, Linux Kernel wrote:
> tree 6adb8d33585f8eee20794827c79e40991aeeaee5
> parent fd51f666fa591294bd7462447512666e61c56ea0
> author Jack F Vogel <jfv@bluesong.net> Sun, 01 May 2005 22:58:48 -0700
> committer Linus Torvalds <torvalds@ppc970.osdl.org> Sun, 01 May 2005 22:58:48 -0700
>
> [PATCH] check nmi watchdog is broken
>
> A bug against an xSeries system showed up recently noting that the
> check_nmi_watchdog() test was failing.
>
> I have been investigating it and discovered in both i386 and x86_64 the
> recent change to the routine to use the cpu_callin_map has uncovered a
> problem. Prior to that change, on an SMP box, the test was trivially
> passing because all CPUs were found to not yet be online, but now with the
> callin_map they are discovered, it goes on to test the counter and they
> have not yet begun to increment, so it announces a CPU is stuck and bails
> out.
>
> On all the systems I have access to test, the announcement of failure is
> also bogus... by the time you can login and check /proc/interrupts, the
> NMI count is happily incrementing on all CPUs. It's just that the test is
> being done too early.
>
> I have tried moving the call to the test around a bit, and it was always
> too early. I finally hit on this proposed solution, it delays the routine
> via a late_initcall(), seems like the right solution to me.
My EM64T Dell Precision 470 seems to have problems both before
and after this change, though the behaviour changes.

With -rc3:
testing NMI watchdog ... CPU#1: NMI appears to be stuck (0)!

With -rc4:
Testing NMI watchdog ... CPU#4: NMI appears to be stuck (0)!

The CPU numbers are consistent across reboots (i.e., they're always
the same), though the -rc4 behaviour seems odd, as my CPUs are
numbered 0-3.

Bill (Cc'd) also reports that with -rc4 he sees a delay of around
10 seconds at the 'Testing NMI watchdog' message.

Dave
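
For reference, the check described in the quoted commit message can be
sketched in user-space C roughly as below. This is only an illustration
under stated assumptions, not the kernel's code: the function name
first_stuck_cpu(), the called_in[] array standing in for cpu_callin_map,
and the threshold value are all hypothetical.

```c
#include <stddef.h>

/* Assumed threshold: a CPU whose NMI count grew by this many ticks or
 * fewer between the two samples is reported as stuck. */
#define NMI_STUCK_THRESHOLD 5

/*
 * Compare per-CPU NMI counts sampled before and after a wait.
 * called_in[cpu] is nonzero if that CPU has checked in (a stand-in for
 * cpu_callin_map); CPUs that have not checked in are skipped.
 * Returns the index of the first CPU that looks stuck, or -1 if none.
 */
static int first_stuck_cpu(const unsigned int *before,
                           const unsigned int *after,
                           const unsigned char *called_in,
                           size_t ncpus)
{
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (!called_in[cpu])
            continue;   /* skipped: this is how the test can pass trivially */
        if (after[cpu] - before[cpu] <= NMI_STUCK_THRESHOLD)
            return (int)cpu;
    }
    return -1;
}
```

With no CPU marked as checked in, the function returns -1 whatever the
counters say, which matches the "trivially passing" case in the commit
message. With every CPU checked in but the counters not yet moving, CPU 0
is reported as stuck, which is the "test is being done too early" case.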