public inbox for linux-kernel@vger.kernel.org
* Lockups with 2.4.14 and 2.4.16
@ 2001-12-11 23:29 Johan Ekenberg
  2001-12-11 23:47 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Johan Ekenberg @ 2001-12-11 23:29 UTC
  To: linux-kernel

We recently upgraded 10 servers from 2.2.19 to 2.4.14/2.4.16. Since then,
several servers have experienced severe lockups that forced hardware resets.
The machines are dual Intel PIII SMP systems on Epox motherboards. Here are
the details:

## The Story:
 - Without warning, a machine's load average climbs to about 500-1000.
 - It's impossible to log in, either at the console or via SSH.
 - Some commands can still be run over SSH from a remote server, e.g.
   "ssh badserver ps auxwf" or "ssh badserver free".
 - Despite a load average of 1000, commands like "free", "ps" and "uptime"
   often respond quickly, with no noticeable sluggishness.
 - The locked-up machine seems to use all available memory plus a good deal
   of swap.
 - The process table keeps growing, mainly with ipop3d processes from users
   who are trying to fetch mail but getting no reply.
 - The processors seem to be mostly idle.
 - Killing processes doesn't work, not even with SIGKILL.
 - We haven't been able to find a time pattern for the lockups, or to
   reproduce them at will.
 - No kernel error messages are written to the console or logs.
 - Ctrl-Alt-Delete prints a "Rebooting" message on the console, but the
   machine never actually reboots. Power cycling is the only way out.
 - My not-so-professional guess is that the machine is stuck waiting for
   disk I/O that never completes, either to swap or to a normal filesystem.
   But I might be completely wrong.
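One way to test that guess: on Linux the load average counts tasks in uninterruptible sleep (state D, typically blocked on disk I/O) as well as runnable ones, and a task in D state does not act on signals, SIGKILL included, until it wakes up. A load of ~1000 on idle CPUs with unkillable processes is therefore consistent with many tasks stuck in D state. "ps auxwf" already shows this letter in its STAT column; the minimal Python sketch below (an illustration, not a tool from this thread; the function names are hypothetical) simply tallies the state field of every /proc/<pid>/stat entry.

```python
import os
from collections import Counter


def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may itself
    contain spaces or ')', so split after the *last* ')' instead of on spaces.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def count_states(stat_lines):
    """Tally task states: R (running), S (sleeping), D (uninterruptible), ..."""
    return Counter(task_state(line) for line in stat_lines)


def read_proc_stats(proc="/proc"):
    """Yield the stat line of every process currently listed under /proc."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                yield f.read()
        except OSError:
            # The process exited between listdir() and open(); ignore it.
            continue


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        states = count_states(read_proc_stats())
        print("uninterruptible (D) tasks:", states.get("D", 0))
        print("all states:", dict(states))
```

If most of the piled-up ipop3d processes report D, they are waiting inside the kernel (disk I/O, or a lock held by something that is), not merely competing for CPU.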

## Hardware:
 - Dual PIII 850 on Epox BXB-S and Epox KP6-BS
 - 1 GB RAM (4x256 MB)
 - Mylex AcceleRAID 352 PCI RAID controller,
   IBM disks, 3x36 GB RAID-5 mounted on /
   and 2x18 GB RAID-1 mounted on /var/spool
 - 1x20 GB IDE for /boot and swap (2x2 GB swap partitions)
 - 1x36 GB IDE for backups

## Kernel:
 - 2.4.14 and 2.4.16
 - Patched for reiserfs-quota with patches found at
   ftp://ftp.suse.com/pub/people/mason/patches/reiserfs/quota-2.4/
     ( * 50_quota-patch
       * dquota_deadlock
       * nesting
       * reiserfs-quota )
 - Complete kernel config:
   http://www.ekenberg.se/2.4-trouble/2.4.16-config
 - Boot parameters are: "ether=0,0,eth1 panic=60 noapic"

## Filesystems:
 - ReiserFS (3.6) except /boot which is ext2

## General
 - The servers are used mainly for:
   * Apache/PHP with ~1000 VHosts
   * Mail (Sendmail, imap, pop3)
   * MySQL

## /etc/fstab:
/dev/rd/c0d0    /           reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/rd/c0d1    /var/spool  reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/hdb1       /hdb1       reiserfs    defaults,noatime,notail 0 0
/dev/hda1       /boot       ext2        defaults  1  1
/dev/hda2       swap        swap        defaults  0  0
/dev/hda3       swap        swap        defaults  0  0
none            /dev/pts    devpts      gid=5,mode=620  0   0
none            /proc       proc        defaults   0   0
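Each fstab entry above is a whitespace-separated record of six fields: device, mount point, filesystem type, comma-separated options, dump flag and fsck pass number. The hypothetical Python helper below is a sketch that parses these entries and lists the mounts carrying usrquota, i.e. the ReiserFS filesystems that rely on the quota patches listed under Kernel. It is deliberately stricter than fstab(5), where the last two fields may be omitted and default to 0: requiring all six means an entry split across two lines (as mail clients tend to do with long lines) is reported instead of silently misread.

```python
FSTAB = """\
/dev/rd/c0d0    /           reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/rd/c0d1    /var/spool  reiserfs    defaults,usrquota,noatime,notail   1 1
/dev/hdb1       /hdb1       reiserfs    defaults,noatime,notail 0 0
/dev/hda1       /boot       ext2        defaults  1  1
/dev/hda2       swap        swap        defaults  0  0
/dev/hda3       swap        swap        defaults  0  0
none            /dev/pts    devpts      gid=5,mode=620  0   0
none            /proc       proc        defaults   0   0
"""


def parse_fstab(text):
    """Parse fstab text into a list of dicts, one per mount entry.

    Blank lines and '#' comments are skipped. Every other line must have
    exactly six fields (stricter than fstab(5), to catch wrapped lines).
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        device, mountpoint, fstype, options, dump, passno = fields
        entries.append({
            "device": device,
            "mountpoint": mountpoint,
            "fstype": fstype,
            "options": options.split(","),
            "dump": int(dump),
            "passno": int(passno),
        })
    return entries


if __name__ == "__main__":
    quota_mounts = [e["mountpoint"] for e in parse_fstab(FSTAB)
                    if "usrquota" in e["options"]]
    print(quota_mounts)  # ['/', '/var/spool']
```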

## lspci:
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:08.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:09.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:0a.0 PCI bridge: Intel Corporation: Unknown device 0964 (rev 02)
00:0a.1 RAID bus controller: Mylex Corporation: Unknown device 0050 (rev 02)
00:0c.0 SCSI storage controller: Adaptec AHA-2940U2/W / 7890
01:00.0 VGA compatible controller: S3 Inc. 86c368 [Trio 3D/2X] (rev 02)


This is my first post to LKML, so please forgive me if I've left out any
relevant info.
Please Cc: me on replies, as I'm not subscribed to LKML.

Best regards,
/Johan Ekenberg




end of thread

Thread overview: 22+ messages
2001-12-11 23:29 Lockups with 2.4.14 and 2.4.16 Johan Ekenberg
2001-12-11 23:47 ` Alan Cox
2001-12-11 23:56   ` SV: " Johan Ekenberg
2001-12-12  0:36     ` Alan Cox
2001-12-14 16:49     ` Chris Mason
2001-12-14 17:26       ` Andrew Morton
2001-12-14 17:53         ` Chris Mason
2001-12-14 18:32           ` Andrea Arcangeli
2001-12-14 18:55             ` Chris Mason
2001-12-14 18:57             ` Andrew Morton
2001-12-14 19:16               ` Andrea Arcangeli
2001-12-20 13:29               ` Chris Mason
     [not found]               ` <1624652704.1008906979@tiny>
     [not found]                 ` <3C22CC54.D4F5B01@zip.com.au>
2001-12-21 13:29                   ` [PATCH] " Chris Mason
2001-12-14 19:26           ` Jan Kara
2001-12-14 19:21         ` Jan Kara
2001-12-12  0:56   ` SV: " Johan Ekenberg
2001-12-12  1:22     ` Alan Cox
2001-12-12  0:12 ` Brad Dameron
2001-12-12  0:47 ` Chris Mason
2001-12-12  1:01   ` SV: " Johan Ekenberg
2001-12-12  1:10     ` Hans Reiser
2001-12-12  1:15     ` Chris Mason
