From: Michael Stumpf <mjstumpf@pobox.com>
To: Guy <bugzilla@watkins-home.com>, linux-raid@vger.kernel.org
Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
Date: Thu, 09 Dec 2004 11:22:26 -0600
Message-ID: <41B889D2.2070808@pobox.com>
In-Reply-To: <200412091642.iB9Ggv918601@www.watkins-home.com>
Ahhhhhhh... You're on to something here. In all my years of ghetto RAID,
one of the weakest links I've seen is the Y Molex power splitters. Do
you know where more solid ones can be found? I'm to the point where I'd
pay $10 or more for the bloody things if they didn't momentarily drop the
power connection when moved even a little bit.
I'll bet good money this is what happened. Maybe I need to break out
the soldering iron, but that's kind of an ugly, proprietary, and slow
solution.
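Once the splitter is swapped I'll presumably have to add the two dropped
disks back by hand before the rebuild can run. Roughly the following, I
think, assuming the array is still /dev/md1 and the dropped members are
sdi1 and sdj1 as in the log below; I haven't double-checked how mdadm
1.2.0 on 2.4 behaves, so treat it as a sketch:

mdadm /dev/md1 --add /dev/sdi1
mdadm /dev/md1 --add /dev/sdj1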
Guy wrote:
>Since they both went offline at the same time, check the power cables. Do
>they share a common power cable, or does each have a unique cable directly
>from the power supply?
>
>Switch power connections with another drive to see if the problem stays with
>the power connection.
>
>Guy
>
>-----Original Message-----
>From: linux-raid-owner@vger.kernel.org
>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>Sent: Thursday, December 09, 2004 9:45 AM
>To: Guy; linux-raid@vger.kernel.org
>Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
>
>All I see is this:
>
>Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready or
>command retry failed after host reset: host 1 channel 0 id 2 lun 0
>Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready or
>command retry failed after host reset: host 1 channel 0 id 3 lun 0
>Apr 14 22:03:56 drown kernel: md: updating md1 RAID superblock on device
>Apr 14 22:03:56 drown kernel: md: (skipping faulty sdj1 )
>Apr 14 22:03:56 drown kernel: md: (skipping faulty sdi1 )
>Apr 14 22:03:56 drown kernel: md: sdh1 [events: 000000b5]<6>(write)
>sdh1's sb offset: 117186944
>Apr 14 22:03:56 drown kernel: md: sdg1 [events: 000000b5]<6>(write)
>sdg1's sb offset: 117186944
>Apr 14 22:03:56 drown kernel: md: recovery thread got woken up ...
>Apr 14 22:03:56 drown kernel: md: recovery thread finished ...
>
>What the heck could that be? Can that possibly be related to the fact
>that there weren't proper block device nodes sitting in the filesystem?!
>
>I already ran WD's wonky tool to fix their "DMA timeout" problem, and
>one of the drives is a Maxtor. They're on separate ATA cables, and I've
>got about 5 drives per power supply. I checked heat, and it wasn't very
>high.
>
>Any other sources of information I could tap? Maybe an "MD debug"
>setting in the kernel with a recompile?
>
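Partly answering my own question: mdadm itself can probably tell me more
than syslog does. Something along these lines should dump the per-member
superblock state, including the event counters that got out of sync. The
device names are assumed from the log above, and I haven't checked the
exact output format of mdadm 1.2.0:

mdadm --detail /dev/md1
mdadm --examine /dev/sdi1
mdadm --examine /dev/sdj1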
>Guy wrote:
>
>
>
>>You should have some sort of md error in your logs. Try this command:
>>grep "md:" /var/log/messages*|more
>>
>>Yes, they don't play well together, so separate them! :)
>>
>>Guy
>>
>>-----Original Message-----
>>From: linux-raid-owner@vger.kernel.org
>>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>>Sent: Wednesday, December 08, 2004 11:46 PM
>>To: linux-raid@vger.kernel.org
>>Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>
>>No idea what failure is occurring. Your dd test, run from beginning to end
>>of each drive, completed fine. Smartd had no info to report.
>>
>>The fdisk weirdness was operator error; the /dev/sd* block nodes were
>>missing (a forgotten detail from an age-old upgrade). Fixed with mknod.
>>
>>So, I forced mdadm to assemble and it is reconstructing now.
>>Troublesome, though, that 2 drives would fail at once like this. I think I
>>should separate them into different RAID-5s, just in case.
>>
>>
>>
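For the archives, recreating the missing block nodes was just a few mknod
calls, roughly as below. This assumes the standard SCSI disk numbering
(major 8, 16 minors per disk, so sdi starts at minor 128 and sdj at 144);
check the numbers on your own box before copying:

mknod /dev/sdi b 8 128
mknod /dev/sdi1 b 8 129
mknod /dev/sdj b 8 144
mknod /dev/sdj1 b 8 145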
>>Guy wrote:
>>
>>
>>
>>
>>
>>>What failure are you getting? I assume a read error. md will fail a drive
>>>when it gets a read error from the drive. It is "normal" to have a read
>>>error once in a while, but more than 1 a year may indicate a drive going
>>>bad.
>>>
>>>I test my drives with this command:
>>>dd if=/dev/hdi of=/dev/null bs=64k
>>>
>>>You may look into using "smartd". It monitors and tests disks for
>>>problems.
>>>However, my dd test finds them first. smartd has never told me anything
>>>useful, but my drives are old, and are not smart enough for smartd.
>>>
>>>Guy
>>>
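Side note: Guy's dd test above is easy to run over every member in one go.
A rough loop, assuming the array disks are sdg through sdj (adjust the
glob for your own setup):

for d in /dev/sd[g-j]; do dd if=$d of=/dev/null bs=64k; done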
>>>-----Original Message-----
>>>From: linux-raid-owner@vger.kernel.org
>>>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>>>Sent: Wednesday, December 08, 2004 4:03 PM
>>>To: linux-raid@vger.kernel.org
>>>Subject: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>>
>>>
>>>I've got an LVM cobbled together from 2 RAID-5 md's. For the longest
>>>time I was running with 3 Promise cards and surviving everything,
>>>including the occasional drive failure; then suddenly I had double drive
>>>dropouts and the array would go into a degraded state.
>>>
>>>10 drives in the system, Linux 2.4.22, Slackware 9, mdadm v1.2.0 (13 Mar
>>>2003).
>>>
>>>I started to diagnose; fdisk -l /dev/hdi returned nothing for the two
>>>failed drives, but "dmesg" reported that the drives are happy, and that
>>>the md would have been auto-assembled if not for a mismatch on the event
>>>counters (of the 2 failed drives).
>>>
>>>I assumed that this had something to do with my semi-nonstandard
>>>application of a zillion (3) Promise cards in 1 system, but I never had
>>>this problem before. I ripped out the Promise cards and stuck in 3ware
>>>5700s, cleaning it up a bit and also putting a single drive per ATA
>>>channel. Two weeks later, the same problem cropped up again.
>>>
>>>The "problematic" drives are even mixed; 1 is WD, 1 is Maxtor (both
>>>
>>>
>>>
>>>
>>120gig).
>>
>>
>>
>>
>>>Is this a known bug in 2.4.22 or mdadm 1.2.0? Suggestions?
Thread overview: 9+ messages
2004-12-08 21:02 2 drive dropout (and raid 5), simultaneous, after 3 years Michael Stumpf
2004-12-08 22:07 ` Guy
2004-12-09 4:46 ` Michael Stumpf
2004-12-09 4:57 ` Guy
2004-12-09 14:44 ` Michael Stumpf
2004-12-09 16:42 ` Guy
2004-12-09 17:22 ` Michael Stumpf [this message]
2004-12-15 17:45 ` Doug Ledford
[not found] ` <41C10709.4050303@pobox.com>
2004-12-16 3:55 ` Michael Stumpf