From: "Johan Ekenberg" <johan@ekenberg.se>
To: <linux-kernel@vger.kernel.org>
Subject: Lockups with 2.4.14 and 2.4.16
Date: Wed, 12 Dec 2001 00:29:38 +0100
Message-ID: <000901c1829b$b38e1720$050010ac@FUTURE>

We recently upgraded 10 servers from 2.2.19 to 2.4.14/2.4.16. Since then,
several servers have experienced severe lockups that forced hardware resets.
The machines are dual Intel PIII SMP systems on Epox motherboards. Here are
the details:

## The Story:
 - Suddenly a machine gets a load average of about 500-1000.
 - It's not possible to log in either at the console or by SSH.
 - Some commands can still be run over SSH from a remote server, such as:
   "ssh badserver ps auxwf" or "ssh badserver free"
 - Despite a system load of 1000, commands like "free", "ps" and "uptime"
often respond quickly, with no "sluggishness".
 - The locked-up machine seems to use all available memory plus a good deal
of swap.
 - The process table gets bigger and bigger, mainly ipop3d processes from
users trying to fetch mail but getting no reply.
 - The processors seem to be mostly idle.
 - Killing processes doesn't work, not even with SIGKILL.
 - We haven't been able to find a time pattern for the lockups, or to
reproduce them at will.
 - No kernel error messages are written to the console or logs.
 - Ctrl-alt-delete produces a "Rebooting"-message on the console, but there
is no actual reboot. Power cycling is the only way out.
 - My not-so-professional guess is that the machine is locked up waiting for
some disk I/O that never completes, either to swap or to a normal filesystem.
But I might be all wrong.
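
The symptoms above (very high load, mostly idle CPUs, processes that ignore
SIGKILL) are consistent with many tasks sitting in uninterruptible sleep,
which ps shows as state "D"; on Linux such tasks count toward the load
average even though they use no CPU. The following is a minimal sketch, not
part of the original report, for tallying process states from captured ps
output. It assumes the output was collected with something like
"ssh badserver ps -eo stat,comm"; the function name and the sample text are
hypothetical.

```python
from collections import Counter


def count_states(ps_output):
    """Tally the first letter of the STAT column (R, S, D, Z, ...)."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)
        if fields:
            counts[fields[0][0]] += 1
    return counts


# Hypothetical sample for illustration only (not data from the report).
sample = """STAT COMMAND
S    init
D    ipop3d
D    ipop3d
R    ps
D    sendmail
"""

states = count_states(sample)
print("uninterruptible (D):", states["D"], "of", sum(states.values()))
```

If most of the growing process table turns out to be in state D, that would
support the guess about I/O that never completes.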

## Hardware:
 - Dual PIII 850 on Epox BXB-S and Epox KP6-BS
 - 1 GB RAM (4 x 256 MB)
 - Mylex AcceleRAID 352 PCI RAID controller,
   IBM disks, 3 x 36 GB RAID-5 mounted on /
   and 2 x 18 GB RAID-1 mounted on /var/spool
 - 1 x 20 GB IDE for /boot and swap (2 x 2 GB swap partitions)
 - 1 x 36 GB IDE for backups

## Kernel:
 - 2.4.14 and 2.4.16
 - Patched for reiserfs-quota with patches found at
   ftp://ftp.suse.com/pub/people/mason/patches/reiserfs/quota-2.4/
     ( * 50_quota-patch
       * dquota_deadlock
       * nesting
       * reiserfs-quota )
 - Complete kernel-config found here:
http://www.ekenberg.se/2.4-trouble/2.4.16-config
 - Boot parameters are: "ether=0,0,eth1 panic=60 noapic"

## Filesystems:
 - ReiserFS (3.6) except /boot which is ext2

## General:
 - The servers are used mainly for:
   * Apache/PHP with ~1000 VHosts
   * Mail (Sendmail, imap, pop3)
   * MySQL

## /etc/fstab:
/dev/rd/c0d0    /           reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/rd/c0d1    /var/spool  reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/hdb1       /hdb1       reiserfs    defaults,noatime,notail 0 0
/dev/hda1       /boot       ext2        defaults  1  1
/dev/hda2       swap        swap        defaults  0  0
/dev/hda3       swap        swap        defaults  0  0
none            /dev/pts    devpts      gid=5,mode=620  0   0
none            /proc       proc        defaults   0   0
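
Since the kernels carry the reiserfs-quota patches, it can help to see at a
glance which mounts actually enable quotas. Below is a small sketch, not part
of the original report, that lists the fstab entries mounted with the
"usrquota" option. The function name is hypothetical, and the sample text is
copied from three of the entries above.

```python
def quota_mounts(fstab_text):
    """Return (mountpoint, fstype) for entries mounted with usrquota."""
    result = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if "usrquota" in options.split(","):
            result.append((mountpoint, fstype))
    return result


sample_fstab = """
/dev/rd/c0d0    /           reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/rd/c0d1    /var/spool  reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/hdb1       /hdb1       reiserfs    defaults,noatime,notail 0 0
"""

for mountpoint, fstype in quota_mounts(sample_fstab):
    print(mountpoint, fstype)
```

For the entries shown, this reports / and /var/spool, both on ReiserFS.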

## lspci:
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:08.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:09.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:0a.0 PCI bridge: Intel Corporation: Unknown device 0964 (rev 02)
00:0a.1 RAID bus controller: Mylex Corporation: Unknown device 0050 (rev 02)
00:0c.0 SCSI storage controller: Adaptec AHA-2940U2/W / 7890
01:00.0 VGA compatible controller: S3 Inc. 86c368 [Trio 3D/2X] (rev 02)


This is my first post to LKML; please forgive me if I forgot some relevant
info.
Please Cc: me on replies, as I'm not subscribed to LKML.

Best regards,
/Johan Ekenberg


