* Disks keep disappearing
@ 2003-05-09 18:53 brandon
  2003-05-10  0:03 ` Peter L. Ashford
  0 siblings, 1 reply; 7+ messages in thread
From: brandon @ 2003-05-09 18:53 UTC (permalink / raw)
  To: linux-raid

3 servers I work with are having issues I am so far unable to track
down.  All 3 are setup like this:

RedHat 7.3
2x Xeon w/HyperThreading disabled
2x WD 120GB drives RAID1, ext3
2GB RAM
Kernel 2.4.18-27.7.xsmp

fdisk -l:

Disk /dev/hdd: 255 heads, 63 sectors, 14589 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdd1   *         1         6     48163+  fd  Linux raid autodetect
/dev/hdd2             7       643   5116702+  fd  Linux raid autodetect
/dev/hdd3           644      1280   5116702+  fd  Linux raid autodetect
/dev/hdd4          1281     14589 106904542+   f  Win95 Ext'd (LBA)
/dev/hdd5          1281     14458 105852253+  fd  Linux raid autodetect
/dev/hdd6         14459     14589   1052226   82  Linux swap

Disk /dev/hda: 255 heads, 63 sectors, 14589 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         6     48163+  fd  Linux raid autodetect
/dev/hda2             7       643   5116702+  fd  Linux raid autodetect
/dev/hda3           644      1280   5116702+  fd  Linux raid autodetect
/dev/hda4          1281     14589 106904542+   f  Win95 Ext'd (LBA)
/dev/hda5          1281     14458 105852253+  fd  Linux raid autodetect
/dev/hda6         14459     14589   1052226   82  Linux swap

df -h:

Filesystem            Size  Used Avail Use% Mounted on
/dev/md1              4.8G  1.1G  3.4G  24% /
/dev/md0               45M   19M   24M  43% /boot
/dev/md3               99G   25G   70G  26% /home
none                 1008M     0 1008M   0% /dev/shm
/dev/md2              4.8G  795M  3.7G  18% /var

=-=-=-=-



All 3 of the servers run fine for about a week, then they crash.  When
they come back up, one of the drives is missing from the array.  Nothing
in the messages log that I can find is helpful.  Using sar, I can see
that the load average isn't too high before they crash.

When the first server crashed, I thought perhaps the "Win95 Ext'd (LBA)"
partition type might have been causing some problems.  I replaced the
failed drive, just in case it was hardware, partitioned the drive to be
the same, except I made the extended partition type "5  Extended".  I
rebuilt the array, but the crashes continued.

Any suggestions on how I should go about troubleshooting this? Anyone
know what might be causing this?
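
For reference, a first pass at checking a box like this might look roughly
like the sketch below (assuming the stock raidtools plus the sysstat and
smartmontools packages; device and md names come from the output above):

  # Degraded mirrors show as [U_] instead of [UU]
  cat /proc/mdstat

  # IDE-layer trouble (DMA timeouts, status errors) usually shows up before md gives up
  grep -iE 'hd[ad]|dma|i/o error' /var/log/messages | tail -50

  # Ask the drive itself what it thinks (smartmontools)
  smartctl -a /dev/hdd

  # Load and run-queue history around the crash window (sysstat; sa09 = the 9th)
  sar -q -f /var/log/sa/sa09

  # Once the cause is understood, re-add the kicked half of each mirror (raidtools)
  raidhotadd /dev/md3 /dev/hdd5

If the drive really is dropping off the bus, the IDE reset/timeout messages
and the SMART error log are usually where it shows up first.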




=-=-=-=-
Here is a snip of the dmesg output from the last one that crashed:


Real Time Clock Driver v1.10e
oprofile: can't get RTC I/O Ports
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev f9
PCI: Enabling device 00:1f.1 (0005 -> 0007)
PIIX4: chipset revision 2
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:DMA
hda: WDC WD1200BB-00DAA1, ATA DISK drive
hdc: SR244W, ATAPI CD/DVD-ROM drive
hdd: WDC WD1200BB-00DAA1, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
blk: queue c0415f44, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c0415f44, I/O limit 4095Mb (mask 0xffffffff)
hda: 234375000 sectors (120000 MB) w/2048KiB Cache, CHS=14589/255/63, UDMA(100)
blk: queue c04163e8, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c04163e8, I/O limit 4095Mb (mask 0xffffffff)
hdd: 234375000 sectors (120000 MB) w/2048KiB Cache, CHS=14589/255/63, UDMA(100)
ide-floppy driver 0.99.newide
Partition check:
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 >
 hdd: hdd1 hdd2 hdd3 hdd4 < hdd5 hdd6 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
 [events: 00000014]
 [events: 00000014]
 [events: 00000014]
 [events: 00000014]
 [events: 0000000e]
 [events: 0000000e]
 [events: 0000000e]
 [events: 0000000e]
md: autorun ...
md: considering hdd5 ...
md:  adding hdd5 ...
md:  adding hda5 ...
md: created md3
md: bind<hda5,1>
md: bind<hdd5,2>
md: running: <hdd5><hda5>
md: hdd5's event counter: 0000000e
md: hda5's event counter: 00000014
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda5
md: kicking non-fresh hdd5 from array!
md: unbind<hdd5,1>
md: export_rdev(hdd5)
md: md3: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md3 stopped.
md: unbind<hda5,0>
md: export_rdev(hda5)
md: considering hdd3 ...
md:  adding hdd3 ...
md:  adding hda3 ...
md: created md1
md: bind<hda3,1>
md: bind<hdd3,2>
md: running: <hdd3><hda3>
md: hdd3's event counter: 0000000e
md: hda3's event counter: 00000014
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda3
md: kicking non-fresh hdd3 from array!
md: unbind<hdd3,1>
md: export_rdev(hdd3)
md: md1: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md1 stopped.
md: unbind<hda3,0>
md: export_rdev(hda3)
md: considering hdd2 ...
md:  adding hdd2 ...
md:  adding hda2 ...
md: created md2
md: bind<hda2,1>
md: bind<hdd2,2>
md: running: <hdd2><hda2>
md: hdd2's event counter: 0000000e
md: hda2's event counter: 00000014
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda2
md: kicking non-fresh hdd2 from array!
md: unbind<hdd2,1>
md: export_rdev(hdd2)
md: md2: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md2 stopped.
md: unbind<hda2,0>
md: export_rdev(hda2)
md: considering hdd1 ...
md:  adding hdd1 ...
md:  adding hda1 ...
md: created md0
md: bind<hda1,1>
md: bind<hdd1,2>
md: running: <hdd1><hda1>
md: hdd1's event counter: 0000000e
md: hda1's event counter: 00000014
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda1
md: kicking non-fresh hdd1 from array!
md: unbind<hdd1,1>
md: export_rdev(hdd1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.
md: unbind<hda1,0>
md: export_rdev(hda1)
md: ... autorun DONE.
pci_hotplug: PCI Hot Plug PCI Core version: 0.4
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 16384 buckets, 128Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 133k freed
VFS: Mounted root (ext2 filesystem).
md: raid1 personality registered as nr 3
Journalled Block Device driver loaded
md: Autodetecting RAID arrays.
 [events: 0000000e]
 [events: 00000014]
 [events: 0000000e]
 [events: 00000014]
 [events: 0000000e]
 [events: 00000014]
 [events: 0000000e]
 [events: 00000014]
md: autorun ...
md: considering hda1 ...
md:  adding hda1 ...
md:  adding hdd1 ...
md: created md0
md: bind<hdd1,1>
md: bind<hda1,2>
md: running: <hda1><hdd1>
md: hda1's event counter: 00000014
md: hdd1's event counter: 0000000e
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda1
md: kicking non-fresh hdd1 from array!
md: unbind<hdd1,1>
md: export_rdev(hdd1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 508k
md0: 1 data-disks, max readahead per data-disk: 508k
raid1: device hda1 operational as mirror 0
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hda1 [events: 00000015]<6>(write) hda1's sb offset: 48064
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: considering hda2 ...
md:  adding hda2 ...
md:  adding hdd2 ...
md: created md2
md: bind<hdd2,1>
md: bind<hda2,2>
md: running: <hda2><hdd2>
md: hda2's event counter: 00000014
md: hdd2's event counter: 0000000e
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda2
md: kicking non-fresh hdd2 from array!
md: unbind<hdd2,1>
md: export_rdev(hdd2)
md: md2: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md2: max total readahead window set to 508k
md2: 1 data-disks, max readahead per data-disk: 508k
raid1: device hda2 operational as mirror 0
raid1: md2, not all disks are operational -- trying to recover array
raid1: raid set md2 active with 1 out of 2 mirrors
md: updating md2 RAID superblock on device
md: hda2 [events: 00000015]<6>(write) hda2's sb offset: 5116608
md: recovery thread got woken up ...
md2: no spare disk to reconstruct array! -- continuing in degraded mode
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: considering hda3 ...
md:  adding hda3 ...
md:  adding hdd3 ...
md: created md1
md: bind<hdd3,1>
md: bind<hda3,2>
md: running: <hda3><hdd3>
md: hda3's event counter: 00000014
md: hdd3's event counter: 0000000e
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda3
md: kicking non-fresh hdd3 from array!
md: unbind<hdd3,1>
md: export_rdev(hdd3)
md: md1: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md1: max total readahead window set to 508k
md1: 1 data-disks, max readahead per data-disk: 508k
raid1: device hda3 operational as mirror 0
raid1: md1, not all disks are operational -- trying to recover array
raid1: raid set md1 active with 1 out of 2 mirrors
md: updating md1 RAID superblock on device
md: hda3 [events: 00000015]<6>(write) hda3's sb offset: 5116608
md: recovery thread got woken up ...
md1: no spare disk to reconstruct array! -- continuing in degraded mode
md2: no spare disk to reconstruct array! -- continuing in degraded mode
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: considering hda5 ...
md:  adding hda5 ...
md:  adding hdd5 ...
md: created md3
md: bind<hdd5,1>
md: bind<hda5,2>
md: running: <hda5><hdd5>
md: hda5's event counter: 00000014
md: hdd5's event counter: 0000000e
md: superblock update time inconsistency -- using the most recent one
md: freshest: hda5
md: kicking non-fresh hdd5 from array!
md: unbind<hdd5,1>
md: export_rdev(hdd5)
md: md3: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md3: max total readahead window set to 508k
md3: 1 data-disks, max readahead per data-disk: 508k
raid1: device hda5 operational as mirror 0
raid1: md3, not all disks are operational -- trying to recover array
raid1: raid set md3 active with 1 out of 2 mirrors
md: updating md3 RAID superblock on device
md: hda5 [events: 00000015]<6>(write) hda5's sb offset: 105852160
md: recovery thread got woken up ...
md3: no spare disk to reconstruct array! -- continuing in degraded mode
md1: no spare disk to reconstruct array! -- continuing in degraded mode
md2: no spare disk to reconstruct array! -- continuing in degraded mode
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: md(9,1): orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 96928
ext3_orphan_cleanup: deleting unreferenced inode 257967
ext3_orphan_cleanup: deleting unreferenced inode 257958
ext3_orphan_cleanup: deleting unreferenced inode 96927
ext3_orphan_cleanup: deleting unreferenced inode 592773
ext3_orphan_cleanup: deleting unreferenced inode 257653
ext3_orphan_cleanup: deleting unreferenced inode 368804
ext3_orphan_cleanup: deleting unreferenced inode 193627
EXT3-fs: md(9,1): 8 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 188k freed
Adding Swap: 1052216k swap-space (priority -1)
Adding Swap: 1052216k swap-space (priority -2)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 06:15:20 Mar 14 2003
usb-uhci.c: High bandwidth mode enabled
PCI: Setting latency timer of device 00:1d.0 to 64
usb-uhci.c: USB UHCI at I/O 0xe800, IRQ 16
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
EXT3 FS 2.4-0.9.18, 14 May 2002 on md(9,1), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on md(9,0), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on md(9,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on md(9,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ide-floppy driver 0.99.newide
hdc: ATAPI 24X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.12















* Re: Disks keep disappearing
  2003-05-09 18:53 Disks keep disappearing brandon
@ 2003-05-10  0:03 ` Peter L. Ashford
  2003-05-10 16:47   ` Johan Schön
  0 siblings, 1 reply; 7+ messages in thread
From: Peter L. Ashford @ 2003-05-10  0:03 UTC (permalink / raw)
  To: brandon; +Cc: linux-raid

Brandon,

> 3 servers I work with are having issues I am so far unable to track
> down.  All 3 are setup like this:
>
> 2x WD 120GB drives RAID1, ext3


> All 3 of the servers run fine for about a week, then they crash.  When
> they come back up, one of the drives is missing from the array.  Nothing
> in the messages log that I can find is helpful.  Using sar, I can see
> that the load average isn't too high before they crash.
>
> When the first server crashed, I thought perhaps the "Win95 Ext'd (LBA)"
> partition type might have been causing some problems.  I replaced the
> failed drive, just in case it was hardware, partitioned the drive to be
> the same, except I made the extended partition type "5  Extended".  I
> rebuilt the array, but the crashes continued.
>
> Any suggestions on how I should go about troubleshooting this? Anyone
> know what might be causing this?

WD has had problems similar to this with many of their drives.  It just
decides to 'go away'.  There is a fix available on their web site for the
180GB and 200GB drives (and a better description of the problem), but the
problem is NOT limited to those drives.

If the description looks like it matches your problem, you should contact
the WD tech support people, and get the fix for your drives from them.
You'll have to go to level 2 support, and maybe even to a product manager,
to get the fix.

I don't know if this particular problem could cause the crashes you're
seeing, but it could cause the drives to disappear.  Perhaps the drives
are disappearing and causing the crashes.

Good luck.
				Peter Ashford



* Re: Disks keep disappearing
  2003-05-10  0:03 ` Peter L. Ashford
@ 2003-05-10 16:47   ` Johan Schön
  2003-05-10 17:01     ` Mads Peter Bach
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Johan Schön @ 2003-05-10 16:47 UTC (permalink / raw)
  To: Peter L. Ashford; +Cc: linux-raid

Peter L. Ashford wrote:
> WD has had problems similar to this with many of their drives.  It just
> decides to 'go away'.  There is a fix available on their web site for the
> 180GB and 200GB drives (and a better description of the problem), but the
> problem is NOT limited to those drives.

How do these problems appear in log files?

I have a machine with two Promise Ultra100 TX2 cards, and five
WD2000JB 200 GB drives in RAID-5.  In a month, I've had a few disk "failures"
that typically look like this in the logs:

|hdg: dma_intr: status=0x63 { DriveReady DeviceFault Index Error }
|hdg: dma_intr: error=0x04 { DriveStatusError }
|hdg: DMA disabled
|hdh: DMA disabled
|PDC202XX: Secondary channel reset.
|ide3: reset: success
|hdg: irq timeout: status=0xd2 { Busy }
|
|PDC202XX: Secondary channel reset.
|ide3: reset: success
|hdg: irq timeout: status=0xd2 { Busy }
|
|end_request: I/O error, dev 22:00 (hdg), sector 280277504
|raid5: Disk failure on hdg, disabling device. Operation continuing on 4 devices
|hdg: status timeout: status=0xd2 { Busy }
|
|PDC202XX: Secondary channel reset.
|hdg: drive not ready for command
|md: updating md0 RAID superblock on device
|md: hdh [events: 00000007]<6>(write) hdh's sb offset: 195360896
|md: recovery thread got woken up ...
|md0: no spare disk to reconstruct array! -- continuing in degraded mode
|ide3: reset: success
|md: (skipping faulty hdg )
|md: hdf [events: 00000007]<6>(write) hdf's sb offset: 195360896
|md: hde [events: 00000007]<6>(write) hde's sb offset: 195360896
|md: hdb [events: 00000007]<6>(write) hdb's sb offset: 195360896
|hdg: irq timeout: status=0xd2 { Busy }

The disk itself doesn't appear to know about any failures
(using smartctl), and it works again when hotadded to the raidset. I've
also had a multiple drive "failure" twice, both times with two drives
using the same IDE channel.
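
For reference, the hot-add cycle is roughly the sketch below with raidtools,
using hdg as the kicked member (mdadm --add /dev/md0 /dev/hdg is the
equivalent if you use mdadm instead):

  # The failed member is flagged (F) here until it is removed
  cat /proc/mdstat

  # Confirm the drive still answers and logs no SMART errors
  smartctl -a /dev/hdg

  # Drop the faulted member, then add it back; the raid5 resyncs onto it
  raidhotremove /dev/md0 /dev/hdg
  raidhotadd /dev/md0 /dev/hdg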

I'm not sure if these problems are caused by buggy Promise ATA drivers
in my kernel (RH9, 2.4.20) or the WDC problem with 180/200 GB drives.
 From WDC's description of the problem, I got the impression that it
only happened when the drives were connected to hardware RAID cards
like 3Ware IDE raid controllers.

Can anyone advise?

  // Johan

-- 
Johan Schön                             www.visiarc.com
VISIARC AB                         Cell: +46-708-343002



* Re: Disks keep disappearing
  2003-05-10 16:47   ` Johan Schön
@ 2003-05-10 17:01     ` Mads Peter Bach
  2003-05-11  4:09     ` Peter L. Ashford
  2003-05-12 17:31     ` Brandon Belshaw
  2 siblings, 0 replies; 7+ messages in thread
From: Mads Peter Bach @ 2003-05-10 17:01 UTC (permalink / raw)
  To: Johan Schön; +Cc: linux-raid

Johan Schön wrote:

> The disk itself doesn't appear to know about any failures
> (using smartctl), and it works again when hotadded to the raidset. I've
> also had a multiple drive "failure" twice, both times with two drives
> using the same IDE channel.

Using more than one drive on an IDE channel is considered risky by many. If a 
drive fails, it's likely to bring down the entire channel.
In any case, don't put more than one drive of an array on the same channel.
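
A quick way to see which drives share a channel on a 2.4 kernel is sketched
below (assuming /proc/ide support, which the stock Red Hat kernels have);
hda/hdb are the master/slave pair on ide0, hdc/hdd on ide1, hde/hdf on ide2,
and so on:

  # List every detected IDE drive with its position and model string
  for d in /proc/ide/hd?; do
      echo "$d -> $(cat $d/model 2>/dev/null)"
  done

  # The channel-to-port mapping is also in the boot log
  dmesg | grep '^ide[0-9]'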

-- 
Mads Peter Bach
Systemadministrator,  Det Humanistiske Fakultet, Aalborg Universitet
Kroghstræde 3 - 5.111, DK-9220 Aalborg Øst - (+45) 96358062
# whois MPB1-DK@whois.dk-hostmaster.dk



* Re: Disks keep disappearing
  2003-05-10 16:47   ` Johan Schön
  2003-05-10 17:01     ` Mads Peter Bach
@ 2003-05-11  4:09     ` Peter L. Ashford
  2003-05-12 17:31     ` Brandon Belshaw
  2 siblings, 0 replies; 7+ messages in thread
From: Peter L. Ashford @ 2003-05-11  4:09 UTC (permalink / raw)
  To: Johan Schön; +Cc: linux-raid

Johan,

> > WD has had problems similar to this with many of their drives.  It just
> > decides to 'go away'.  There is a fix available on their web site for the
> > 180GB and 200GB drives (and a better description of the problem), but the
> > problem is NOT limited to those drives.
>
> How do these problems appear in log files?
>
> I have a machine with two Promise Ultra100 TX2 cards, and five
> WD2000JB 200 GB drives in RAID-5.  In a month, I've had a few disk "failures"
> that typically look like this in the logs:

<SNIP LOG>

> The disk itself doesn't appear to know about any failures
> (using smartctl), and it works again when hotadded to the raidset. I've
> also had a multiple drive "failure" twice, both times with two drives
> using the same IDE channel.
>
> I'm not sure if these problems are caused by buggy Promise ATA drivers
> in my kernel (RH9, 2.4.20) or the WDC problem with 180/200 GB drives.
> From WDC's description of the problem, I got the impression that it
> only happened when the drives were connected to hardware RAID cards
> like 3Ware IDE raid controllers.

This appears to be the WD problem.  It is caused by some timing-related
irregularities in their microcode.  It occurs in a 'RAID environment'.
The article does not say the problem is limited to hardware RAID cards.

There is a fix package for 3Ware cards and a fix package for non-3Ware
cards.  Use the package for non-3Ware cards.

If you need more than 4 drives in an array with only 2 Ultra100 cards,
you might want to consider adding a Promise Ultra133 card.  You are
normally limited to 2 of each, for a total of 4 cards (8 channels total).
This would solve your double-failure problem.  FYI, I'm running the
reverse of this (2 Ultra133 + 1 Ultra100) in one of my file servers with
excellent results.

Good luck.
				Peter Ashford



* RE: Disks keep disappearing
  2003-05-10 16:47   ` Johan Schön
  2003-05-10 17:01     ` Mads Peter Bach
  2003-05-11  4:09     ` Peter L. Ashford
@ 2003-05-12 17:31     ` Brandon Belshaw
  2003-05-12 17:49       ` A.J.Dawson
  2 siblings, 1 reply; 7+ messages in thread
From: Brandon Belshaw @ 2003-05-12 17:31 UTC (permalink / raw)
  To: 'Johan Schön'; +Cc: linux-raid




> Peter L. Ashford wrote:
> > WD has had problems similar to this with many of their drives.  It
> > just decides to 'go away'.  There is a fix available on their web
> > site for the 180GB and 200GB drives (and a better description of the
> > problem), but the problem is NOT limited to those drives.
> 

> How do these problems appear in log files?

-= A server that lost one drive on Sunday had only this error:

 kernel: end_request: I/O error, dev 03:41 (hdb), sector 512


-= Another server that is having these problems has this in the logs:

May  1 03:01:28 virt10p kernel: end_request: I/O error, dev 16:42 (hdd), sector 16
May  1 03:01:28 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }
(repeats 10 times)
May  1 03:01:28 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }
May  1 03:01:28 virt10p kernel: end_request: I/O error, dev 16:42 (hdd), sector 108736
May  1 04:02:13 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }
May  1 04:02:13 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }



> 
> I have a machine with two Promise Ultra100 TX2 cards, and 
> five WD2000JB 200 GB drives in RAID-5. In a month, i've had a 
> few disk "failures" that typically looks like this in the logs:
> 
[snip log]

> The disk itself doesn't appear to know about any failures 
> (using smartctl), and it works again when hotadded to the 
> raidset. I've also had a multiple drive "failure" twice, both 
> times with two drives using the same IDE channel.

On the server with the most recent crash, I replaced the drive with a
WD1200JB (it was a WD1200BB), rebuilt the array, then formatted the drive
that wasn't replaced, checking it for bad blocks using the slower,
destructive read-write test (they aren't kidding about the slower part;
it took about 24 hours).
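
For reference, that kind of surface test is roughly the sketch below; the
device names are only illustrative, and -w destroys everything on the disk,
so it is only for a member that is going to be rebuilt anyway:

  # Destructive write-mode surface scan; figure on roughly a day for a 120GB IDE drive
  badblocks -wsv /dev/hdd

  # Or fold the check into the re-format of each partition (-c twice = slower read-write test)
  mke2fs -j -c -c /dev/hdd2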

Up until Sunday, I could re-add the disk to the array, but now the 2nd
hard drive doesn't even show up when doing fdisk -l.




> I'm not sure if these problems are caused by buggy Promise 
> ATA drivers in my kernel (RH9, 2.4.20) or the WDC problem 
> with 180/200 GB drives.  From WDC's description of the 
> problem, I got the impression that it only happened when the 
> drives were connected to hardware RAID cards like 3Ware IDE 
> raid controllers.

I've contacted WD's tech support to see how they can help.  When I'm
done with them I'll post the results.




* RE: Disks keep disappearing
  2003-05-12 17:31     ` Brandon Belshaw
@ 2003-05-12 17:49       ` A.J.Dawson
  0 siblings, 0 replies; 7+ messages in thread
From: A.J.Dawson @ 2003-05-12 17:49 UTC (permalink / raw)
  To: linux-raid


I have also experienced the problem with a set of WD 120GB disks attached
to an Adaptec 2400A RAID controller.  The drives are WD1200JB's and show
the same sort of behaviour as others on the list have described (and
incidentally as described in answer 913 of the WD knowledge base), i.e.
the array runs fine for a while, then suddenly for no apparent reason a
disk vanishes from the array.

In the 2400A's BIOS, the disk that has dropped from the array shows itself
typically as a 'missing component'.  Zapping the drive and/or writing
zeros to the drive does not allow the drive to be re-inserted in the
array (after doing a thorough check that in fact the disk *is* okay) - the
array has to be rebuilt from scratch every time!  The log shows that the
drive timed out when the RAID controller attempted to access it, and so the
controller takes the disk off-line assuming it is faulty.

I've followed the advice in the knowledge base article and updated the
firmware on each of the drives used (8 at the moment as we have two
servers using them) to remove the acoustic noise reduction stuff - I'll
let you know how I get on!
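
I don't know whether the WD utility does anything more than change the
drive's Automatic Acoustic Management setting, but a sufficiently recent
hdparm can at least read (and set) that from Linux, e.g.:

  # Show the current acoustic management value (128 = quiet/slow seeks, 254 = fast)
  hdparm -M /dev/hdd

  # Put seeks back to full speed, if the drive honours it
  hdparm -M 254 /dev/hdd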

Regards
Andy


On Mon, 12 May 2003, Brandon Belshaw wrote:

>
>
>
> > Peter L. Ashford wrote:
> > > WD has had problems similar to this with many of their drives.  It
> > > just decides to 'go away'.  There is a fix available on their web
> > > site for the 180GB and 200GB drives (and a better description of the
> > > problem), but the problem is NOT limited to those drives.
> >
>
> > How do these problems appear in log files?
>
> -= A server that lost one drive on Sunday had only this error:
>
>  kernel: end_request: I/O error, dev 03:41 (hdb), sector 512
>
>
> -= Another server that is having these problems has this in the logs:
>
> May  1 03:01:28 virt10p kernel: end_request: I/O error, dev 16:42 (hdd), sector 16
> May  1 03:01:28 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }
> (repeats 10 times)
> May  1 03:01:28 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }
> May  1 03:01:28 virt10p kernel: end_request: I/O error, dev 16:42 (hdd), sector 108736
> May  1 04:02:13 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }
> May  1 04:02:13 virt10p kernel: hdd: status error: status=0x10 { SeekComplete }
>
>
>
> >
> > I have a machine with two Promise Ultra100 TX2 cards, and
> > five WD2000JB 200 GB drives in RAID-5.  In a month, I've had a
> > few disk "failures" that typically look like this in the logs:
> >
> [snip log]
>
> > The disk itself doesn't appear to know about any failures
> > (using smartctl), and it works again when hotadded to the
> > raidset. I've also had a multiple drive "failure" twice, both
> > times with two drives using the same IDE channel.
>
> On the server with the most recent crash, I replaced the drive with a
> WD1200JB (it was a WD1200BB), rebuilt the array, then formatted the drive
> that wasn't replaced, checking it for bad blocks using the slower,
> destructive read-write test (they aren't kidding about the slower part;
> it took about 24 hours).
>
> Up until Sunday, I could re-add the disk to the array, but now the 2nd
> hard drive doesn't even show up when doing fdisk -l.
>
>
>
>
> > I'm not sure if these problems are caused by buggy Promise
> > ATA drivers in my kernel (RH9, 2.4.20) or the WDC problem
> > with 180/200 GB drives.  From WDC's description of the
> > problem, I got the impression that it only happened when the
> > drives were connected to hardware RAID cards like 3Ware IDE
> > raid controllers.
>
> I've contacted WD's tech support to see how they can help.  When I'm
> done with them I'll post the results.
>
>
>

Dr. Andy Dawson
A.J.Dawson@Bradford.ac.uk
http://www.mossie.org
http://www.museum-explorer.org.uk

 Never attribute to malice that which is adequately explained by stupidity.


