From: Lee Howard <faxguy@howardsilvan.com>
To: Zan Lynx <zlynx@acm.org>
Cc: Ray Lee <ray-lk@madrabbit.org>, linux-kernel@vger.kernel.org
Subject: Re: troubleshooting/debugging hard locks
Date: Tue, 20 May 2008 13:47:26 -0700
Message-ID: <483338DE.8090904@howardsilvan.com>
In-Reply-To: <482BB151.4010109@howardsilvan.com>

Lee Howard wrote:
> Zan Lynx wrote:
>> On Wed, 2008-05-14 at 15:43 -0700, Ray Lee wrote:
>>
>>> On Wed, May 14, 2008 at 12:27 PM, Lee Howard <faxguy@howardsilvan.com> wrote:
>>>> But without kernel messages indicating where to look, what is the best
>>>> approach to start troubleshooting and debugging this condition? Is there
>>>> some general debug feature that can be enabled in the kernel that would
>>>> help home in on the culprit?
>>>
>>> There's something called the NMI watchdog that will print debugging
>>> messages if it finds the system has hard locked. The short version
>>> is that you should add "nmi_watchdog=1" (no quotes) to the line in
>>> GRUB that has the kernel options. That assumes you have an APIC on the
>>> system. If that's not the case (you're on a uniprocessor system with no
>>> APIC), then you can try nmi_watchdog=2 instead. That'll only work on some
>>> systems, though.
>>>
>>> Better docs (than my cheesy writeup) are in
>>> Documentation/nmi_watchdog.txt in the kernel source distribution.
>>>     
>>
>> I was once told to add these to the kernel command line as well when
>> using the NMI watchdog, and they do seem to help it trigger more reliably:
>> "idle=poll nohz=off"
>
> Thank you to both Ray and Zan.  This was very helpful, and I think 
> that it has gotten me what I needed.
>
> "serial8250: too much work for irq16"
>
> Interestingly, CTRL-SysRq-H now wakes it back up... things run
> normally afterwards, and the hard lock never occurs.

After further troubleshooting, it turns out that the message above was a
bit of a red herring. I moved the PCIe modem card from one PCIe slot to
another so that it would be on a different IRQ (one that wasn't shared
with the network, ATA, and USB controllers). The message above no longer
occurs; however, the hard lock is back, and this time with no kernel
messages at all.

So, I'm back to square one again... I've got:

  nmi_watchdog=1 idle=poll nohz=off

... and still I get the hard lockup and no kernel messages.  How do I 
troubleshoot/debug this?  How do I know where to look?

Thanks,

Lee.

Thread overview: 6+ messages
2008-05-14 19:27 troubleshooting/debugging hard locks Lee Howard
2008-05-14 22:43 ` Ray Lee
2008-05-14 23:42   ` Zan Lynx
2008-05-15  3:43     ` Lee Howard
2008-05-20 20:47       ` Lee Howard [this message]
2008-05-20 21:19         ` Ray Lee
