From: "Johan Ekenberg" <johan@ekenberg.se>
To: <linux-kernel@vger.kernel.org>
Subject: Lockups with 2.4.14 and 2.4.16
Date: Wed, 12 Dec 2001 00:29:38 +0100
Message-ID: <000901c1829b$b38e1720$050010ac@FUTURE>
We recently upgraded 10 servers from 2.2.19 to 2.4.14/2.4.16. Since then,
several servers have experienced severe lockups that force a hardware reset.
The machines are dual Intel PIII SMP systems on Epox motherboards. Here are
the details:
## The Story:
- Suddenly a machine gets a load average of about 500-1000.
- It's not possible to log in either at the console or by SSH.
- Some commands can still be run over ssh from a remote server, such as:
"ssh badserver ps auxwf" or "ssh badserver free"
- Despite a system load of 1000, commands like "free", "ps" and "uptime"
often respond quickly, with no "sluggishness".
- The locked up machine seems to use all available memory plus a good deal
of swap
- The process table gets bigger and bigger, mainly ipop3d processes from
users trying to fetch mail but getting no reply.
- The processors seem to be mostly idle.
- Killing processes doesn't work, not even with SIGKILL.
- We haven't been able to find a time pattern for the lockups, or to
reproduce them at will.
- No kernel error messages are written to the console or logs.
- Ctrl-alt-delete produces a "Rebooting"-message on the console, but there
is no actual reboot. Power cycling is the only way out.
- My not-so-professional guess is that the machine is locked up waiting for
some disk I/O that never completes, either to swap or to a normal filesystem.
But I might be all wrong.
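On Linux, the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, and D-state tasks cannot be killed until they wake up, so a huge load with idle processors and unkillable processes is consistent with many tasks blocked inside the kernel. A minimal Python sketch for checking this, assuming the output of `ps axo stat,comm` can still be captured over ssh like the commands mentioned above. The function names and the sample data are hypothetical and are not output from the affected servers:

```python
from collections import Counter


def count_states(ps_output: str) -> Counter:
    """Count processes by the first letter of the STAT column."""
    states = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        # Skip blank lines and the "STAT COMMAND" header line.
        if not fields or fields[0] == "STAT":
            continue
        states[fields[0][0]] += 1
    return states


def blocked_commands(ps_output: str) -> Counter:
    """Count command names of processes in uninterruptible sleep (D)."""
    commands = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].startswith("D"):
            commands[fields[1].strip()] += 1
    return commands


# Illustrative sample only (assumption), in the format of `ps axo stat,comm`.
SAMPLE = """STAT COMMAND
S    init
D    ipop3d
D    ipop3d
R    ps
D    sendmail
"""

if __name__ == "__main__":
    print(count_states(SAMPLE))
    print(blocked_commands(SAMPLE).most_common())
```

If most of the process table turns out to be in state D, that would support the guess about stuck I/O; if not, the cause probably lies elsewhere.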
## Hardware:
- Dual PIII 850 on Epox BXB-S and Epox KP6-BS
- 1 GB RAM (4 x 256 MB)
- Mylex AcceleRAID 352 PCI RAID Controller,
IBM disks, 3 x 36 GB RAID-5 mounted on /
and 2 x 18 GB RAID-1 mounted on /var/spool
- 1 x 20 GB IDE for /boot and swap (2 x 2 GB swap partitions)
- 1 x 36 GB IDE for backups
## Kernel:
- 2.4.14 and 2.4.16
- Patched for reiserfs-quota with patches found at
ftp://ftp.suse.com/pub/people/mason/patches/reiserfs/quota-2.4/
( * 50_quota-patch
* dquota_deadlock
* nesting
* reiserfs-quota )
- Complete kernel-config found here:
http://www.ekenberg.se/2.4-trouble/2.4.16-config
- Boot parameters are: "ether=0,0,eth1 panic=60 noapic"
## Filesystems:
- ReiserFS (3.6) except /boot which is ext2
## General:
- The servers are used mainly for:
* Apache/PHP with ~1000 VHosts
* Mail (Sendmail, imap, pop3)
* MySQL
## /etc/fstab:
/dev/rd/c0d0 / reiserfs defaults,usrquota,noatime,notail 1 1
/dev/rd/c0d1 /var/spool reiserfs defaults,usrquota,noatime,notail 1 1
/dev/hdb1 /hdb1 reiserfs defaults,noatime,notail 0 0
/dev/hda1 /boot ext2 defaults 1 1
/dev/hda2 swap swap defaults 0 0
/dev/hda3 swap swap defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
none /proc proc defaults 0 0
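Since the kernel carries the reiserfs-quota patches, it may help to see at a glance which mounts actually have quota enabled. A small Python sketch, assuming the fstab entries are as listed in this post; the helper names `parse_fstab` and `quota_mounts` are made up for illustration:

```python
def parse_fstab(text: str) -> list:
    """Parse fstab text into a list of dicts, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mountpoint, fstype, options = fields[:4]
        entries.append({
            "device": device,
            "mountpoint": mountpoint,
            "fstype": fstype,
            "options": options.split(","),
        })
    return entries


def quota_mounts(entries: list) -> list:
    """Return mount points of ReiserFS filesystems mounted with usrquota."""
    return [
        e["mountpoint"]
        for e in entries
        if e["fstype"] == "reiserfs" and "usrquota" in e["options"]
    ]


# Entries copied from the /etc/fstab in this post.
FSTAB = """\
/dev/rd/c0d0 / reiserfs defaults,usrquota,noatime,notail 1 1
/dev/rd/c0d1 /var/spool reiserfs defaults,usrquota,noatime,notail 1 1
/dev/hdb1 /hdb1 reiserfs defaults,noatime,notail 0 0
/dev/hda1 /boot ext2 defaults 1 1
/dev/hda2 swap swap defaults 0 0
/dev/hda3 swap swap defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
none /proc proc defaults 0 0
"""

if __name__ == "__main__":
    print(quota_mounts(parse_fstab(FSTAB)))
```

With the entries above, the two Mylex RAID volumes (/ and /var/spool) are the only filesystems using quota; the backup disk on /hdb1 is ReiserFS without quota.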
## lspci:
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:08.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:09.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
00:0a.0 PCI bridge: Intel Corporation: Unknown device 0964 (rev 02)
00:0a.1 RAID bus controller: Mylex Corporation: Unknown device 0050 (rev 02)
00:0c.0 SCSI storage controller: Adaptec AHA-2940U2/W / 7890
01:00.0 VGA compatible controller: S3 Inc. 86c368 [Trio 3D/2X] (rev 02)
This is my first post to LKML, please forgive me if I forgot some relevant
info.
Please Cc: replies as I'm not subscribed to LKML.
Best regards,
/Johan Ekenberg