public inbox for linux-hams@vger.kernel.org
From: Dave Platt <dplatt@radagast.org>
To: kd1zd@rtcubed.org
Cc: linux-hams@vger.kernel.org
Subject: Re: Problems with system lockup
Date: Thu, 18 Nov 2010 10:42:52 -0800
Message-ID: <4CE573AC.1010607@radagast.org>
In-Reply-To: <555517178-1290101199-cardhu_decombobulator_blackberry.rim.net-278429640-@bda381.bisx.prod.on.blackberry>

kd1zd@rtcubed.org wrote:
> I've been having problems recently with my Linux system.  It randomly locks up, and I was wondering if anyone out there has experienced problems with a hung system.  
> 
> My system is CentOS 5.5, and is running UROnode and JNOS.  I have been having segmentation violation issues with JNOS, which I'm trying to figure out.  I have two serial ports which drive TNCs connected to radios.  
> 
> When I say the system is hung, I mean that the only thing that will liven it is a hard reboot or power cycle.  Anyone else troubleshoot these issues before?  I'm trying to run a 24/7 TCP/IP node and BBS, but these lockups make it very hard to do so.

A number of different problems can cause these sorts of
lockups... sometimes hardware, sometimes software,
sometimes an interaction of the two.  I've run into a bunch
of them over the years.

Some examples:

-  Hardware problems on the motherboard, pure and simple...
   bad DRAM, for example, or an overheating CPU due to a
   fan failure, or overly-aggressive overclocking.  It
   wouldn't hurt to install and run the stand-alone
   Memtest86+ check (let it run overnight, at least) to
   see if there are DRAM or timing problems.

   The fact that you're seeing segfaults in JNOS, as well
   as complete freezes, brings this possibility to the
   top of my UsualSuspects list.

-  Power-supply problems... momentary voltage sags can
   glitch a box pretty badly.

-  Problems with the power-management code, in the kernel
   or in the BIOS (e.g. Intel SpeedStep, or the AMD
   equivalent).  There have been a fair number of motherboards
   and CPUs which don't handle the switching between different
   processor clock speeds and voltages properly.

-  PCI (or other) cards not plugged securely into their slots,
   resulting in intermittent contacts that can cause all sorts
   of confusion.

-  Driver bugs.  I had a nasty periodic full freeze on my new
   firewall/server system at home, which turned out to be a bug
   in the driver for the USB Ethernet dongle I was using to add
   a third Ethernet port... it worked OK under light load but
   froze the system solid under some heavy-load conditions.  If
   you've got a USB dongle which uses the "kaweth" driver, get
   rid of it.
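
One quick way to see whether a suspect driver such as "kaweth" is
actually loaded is to look it up in /proc/modules.  The sketch below
is only an illustration: the function names are mine, and it assumes
the driver was built as a loadable module (a driver compiled into the
kernel will not show up in /proc/modules at all).

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by
    size, use count, dependencies, state and load address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def module_is_loaded(name, path="/proc/modules"):
    """Check whether `name` is currently loaded (hypothetical helper)."""
    with open(path) as f:
        return name in loaded_modules(f.read())
```

On the box in question, `module_is_loaded("kaweth")` returning True
would mean the dongle's driver is in use and worth eliminating as a
suspect.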

Something you may be able to do, as a very short term ugly
workaround, is to use a hardware-based watchdog to reboot
the system if it freezes.  A lot of motherboards these days
use a "Super IO" chip which incorporates such a watchdog, and
Linux has a driver and utility program to access it.  Start up
the watchdog program (in the "no exit allowed" mode), and if
the watchdog program doesn't wake up and successfully poke the
watchdog chip's registers every ten seconds or so, the chip will
yank the board's /RESET line and do a hard reboot.  Nasty, but
perhaps better than a day-long hang until you can get home
and push the Big Red Button yourself.  As the man said in
Young Frankenstein, "A riot is an ugly thing... and I think
it's about time we had one!!!"
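
For what it's worth, the "poke" itself is simple.  Under the standard
Linux watchdog interface, opening /dev/watchdog arms the timer and
any write to the device resets it.  Here is a minimal sketch of such
a keepalive loop, not a replacement for the real watchdog daemon.
The function names, the 5-second interval and the `rounds`/`sleep`
parameters are my own assumptions for illustration; the actual
timeout depends on the driver and how it is configured.

```python
import time

WATCHDOG_DEVICE = "/dev/watchdog"  # standard Linux watchdog device node


def pet_watchdog(dev, interval_s=5.0, rounds=None, sleep=time.sleep):
    """Write one keepalive byte to `dev` every `interval_s` seconds.

    `rounds=None` loops forever; a finite value is only there so the
    loop can be exercised without real hardware.
    Returns the number of keepalives written.
    """
    count = 0
    while rounds is None or count < rounds:
        dev.write(b"\0")   # any write resets the watchdog timer
        dev.flush()
        count += 1
        sleep(interval_s)
    return count


def main():
    # Opening the device arms the watchdog.  If this process (or the
    # whole machine) stops running, the writes stop and the chip
    # resets the board once its timeout expires.
    # buffering=0 so every write reaches the driver immediately.
    with open(WATCHDOG_DEVICE, "wb", buffering=0) as dev:
        pet_watchdog(dev)
```

Calling `main()` needs root and a loaded watchdog driver.  Whether
the timer keeps running after the device is closed depends on the
driver and on its "nowayout" setting, which is the "no exit allowed"
mode mentioned above.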

