public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* CMD680, kernel 2.4.21, and heartache
@ 2003-10-03 11:23 Erik Bourget
  2003-10-03 11:59 ` John Bradford
  2003-10-04  1:57 ` jimbleferret
  0 siblings, 2 replies; 10+ messages in thread
From: Erik Bourget @ 2003-10-03 11:23 UTC (permalink / raw)
  To: linux-kernel


Hello,

I've got a Big Problem.

Day 0: 8 new NFS servers go online, they are P4-2.4GHz boxes with two each
120GB Samsung drives attached to CMD680/SiI680 IDE controllers.  They run
Debian stable on a 2.4.21 kernel, with SMP enabled though they are uniproc
boxes, running NFSv3-via-TCP and reiserfs.  CMD680/siimage support compiled
in, obviously.  Software RAID, mirroring drives.

Out of 8 boxes:  

*) One has crashed hard.  I'm about to drive to the datacenter to plug in a
   KVM and take a picture.
*) Three have had DMA turned off and have given extremely spooky errors.
   Read below.

Some factors that are definitely NOT a problem:
- Faulty run of drives.  This has also happened to Hitachi 80GB drives in the
  same configurations.

- Heat.  They're in a chilly room.  The cases haven't overheated.  We've had
  guys checking this every few hours after the first one went bonkers.

Possible problems -
- Simple software problem that somebody can fix and save the day. :)
- All Dell Poweredge 650 servers are broken.  :/

Days 1-6: Faithful service.

Day 7: 
Sep 29 09:06:42 mailstore2-1 -- MARK --
Sep 29 09:12:18 mailstore2-1 kernel: hdc: dma_timer_expiry: dma status == 0x20
Sep 29 09:12:18 mailstore2-1 kernel: hdc: status timeout: status=0xd0 { Busy }
Sep 29 09:12:18 mailstore2-1 kernel: 
Sep 29 09:12:18 mailstore2-1 kernel: ide1: reset: success
Sep 29 09:26:42 mailstore2-1 -- MARK --

Few more days of faithful service.

Little bit ago:
Oct  1 07:28:40 mailstore2-1 -- MARK --
Oct  1 07:47:47 mailstore2-1 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Oct  1 07:47:47 mailstore2-1 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=37694874, high=2, low=4140442, sector=35220864
Oct  1 07:47:47 mailstore2-1 kernel: end_request: I/O error, dev 03:03 (hda), sector 35220864
Oct  1 07:47:47 mailstore2-1 kernel: ^IOperation continuing on 1 devices
Oct  1 07:47:47 mailstore2-1 kernel: md: updating md0 RAID superblock on device
Oct  1 07:47:47 mailstore2-1 kernel: md: hdc3 [events: 00000004]<6>(write) hdc3's sb offset: 115949056
Oct  1 07:47:47 mailstore2-1 kernel: md: recovery thread got woken up ...
Oct  1 07:47:47 mailstore2-1 kernel: md: recovery thread finished ...
Oct  1 07:47:47 mailstore2-1 kernel: md: (skipping faulty hda3 )
Oct  1 08:08:41 mailstore2-1 -- MARK --

Oct  1 10:48:45 mailstore2-1 -- MARK --
Oct  1 10:50:44 mailstore2-1 kernel: hdc: dma_timer_expiry: dma status == 0x20
Oct  1 10:50:44 mailstore2-1 kernel: hdc: status timeout: status=0xd0 { Busy }
Oct  1 10:50:44 mailstore2-1 kernel: 
Oct  1 10:50:44 mailstore2-1 kernel: ide1: reset: success
Oct  1 11:08:46 mailstore2-1 -- MARK --

I'll post again when I've got the text of the kernel panic.

- Erik


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-10-04  1:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-10-03 11:23 CMD680, kernel 2.4.21, and heartache Erik Bourget
2003-10-03 11:59 ` John Bradford
2003-10-03 12:23   ` Erik Bourget
2003-10-03 12:40     ` John Bradford
2003-10-03 12:48     ` Erik Bourget
2003-10-03 13:11       ` John Bradford
2003-10-03 18:10       ` Tomasz Rola
2003-10-03 18:22         ` Erik Bourget
2003-10-03 18:47           ` John Bradford
2003-10-04  1:57 ` jimbleferret

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox