From: Michael Stumpf <mjstumpf@pobox.com>
To: linux-raid@vger.kernel.org
Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
Date: Wed, 15 Dec 2004 21:55:43 -0600 [thread overview]
Message-ID: <41C1073F.2040003@pobox.com> (raw)
In-Reply-To: <41C10709.4050303@pobox.com>
Good advice. I was actually about to post something I just found--at
Fry's, Antec appears to be selling Y power connectors at the typical
price, $2 each (2 per packet, $4/packet). These, however, are different.
They are more solid and connect more "rigidly" than anything I've bought
in the last 4 years, and they are set up not so much as a "Y" but as an
"M", in a "daisy chaining" fashion ( male --------female--------female ).
>
> Absolutely ideal for the kind of nonsense we are pursuing.
>
>
>
> Doug Ledford wrote:
>
>> On Thu, 2004-12-09 at 11:22 -0600, Michael Stumpf wrote:
>>
>>
>>> Ahhhhhhh.. You're on to something here. In all my years of ghetto
>>> raid one of the weakest things I've seen is the
>>> Y-molex-power-splitters. Do you know where more solid ones can be
>>> found? I'm to the point where I'd pay $10 or more for the bloody
>>> things if they didn't blink the power connection when moved a little
>>> bit.
>>>
>>> I'll bet good money this is what happened. Maybe I need to break
>>> out the soldering iron, but that's kind of an ugly, proprietary, and
>>> slow solution.
>>>
>>
>>
>> Well, that is usually overkill anyway ;-) I've solved this problem in
>> the past by simply getting out a pair of thin needle-nose pliers
>> and crimping down on the connector's actual grip points. Once I
>> tightened up the grip spots on the Y connector, the problem went away.
>>
>>
>>
>>> Guy wrote:
>>>
>>>
>>>
>>>> Since they both went off line at the same time, check the power
>>>> cables.  Do they share a common power cable, or does each have a
>>>> unique cable directly from the power supply?
>>>
>>>
>>>
>>>> Switch power connections with another drive to see if the problem
>>>> stays with
>>>> the power connection.
>>>>
>>>> Guy
>>>>
>>>> -----Original Message-----
>>>> From: linux-raid-owner@vger.kernel.org
>>>> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>>>> Sent: Thursday, December 09, 2004 9:45 AM
>>>> To: Guy; linux-raid@vger.kernel.org
>>>> Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>>>
>>>> All I see is this:
>>>>
>>>> Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready
>>>> or command retry failed after host reset: host 1 channel 0 id 2 lun 0
>>>> Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready
>>>> or command retry failed after host reset: host 1 channel 0 id 3 lun 0
>>>> Apr 14 22:03:56 drown kernel: md: updating md1 RAID superblock on
>>>> device
>>>> Apr 14 22:03:56 drown kernel: md: (skipping faulty sdj1 )
>>>> Apr 14 22:03:56 drown kernel: md: (skipping faulty sdi1 )
>>>> Apr 14 22:03:56 drown kernel: md: sdh1 [events: 000000b5]<6>(write)
>>>> sdh1's sb offset: 117186944
>>>> Apr 14 22:03:56 drown kernel: md: sdg1 [events: 000000b5]<6>(write)
>>>> sdg1's sb offset: 117186944
>>>> Apr 14 22:03:56 drown kernel: md: recovery thread got woken up ...
>>>> Apr 14 22:03:56 drown kernel: md: recovery thread finished ...
>>>>
>>>> What the heck could that be? Can that possibly be related to the
>>>> fact that there weren't proper block device nodes sitting in the
>>>> filesystem?!
>>>>
>>>> I already ran WD's wonky tool to fix their "DMA timeout" problem,
>>>> and one of the drives is a maxtor. They're on separate ATA cables,
>>>> and I've got about 5 drives per power supply. I checked heat, and
>>>> it wasn't very high.
>>>>
>>>> Any other sources of information I could tap? Maybe an "MD debug"
>>>> setting in the kernel with a recompile?
>>>>
>>>> Guy wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> You should have some sort of md error in your logs. Try this
>>>>> command:
>>>>> grep "md:" /var/log/messages*|more
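Guy's grep can be tightened up a bit so that only the failure lines come back (a sketch; the keyword list is an assumption, extend it for whatever your kernel logs):

```shell
# md_errors: pull md-layer messages out of a syslog file and keep only
# the ones that look like trouble (faulty members, errors, offline devices).
md_errors() {
    grep "md:" "$1" | grep -Ei "faulty|error|offline"
}

# Typical use:
# md_errors /var/log/messages
```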
>>>>>
>>>>> Yes, they don't play well together, so separate them! :)
>>>>>
>>>>> Guy
>>>>>
>>>>> -----Original Message-----
>>>>> From: linux-raid-owner@vger.kernel.org
>>>>> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>>>>> Sent: Wednesday, December 08, 2004 11:46 PM
>>>>> To: linux-raid@vger.kernel.org
>>>>> Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3
>>>>> years
>>>>>
>>>>> No idea what failure is occurring.  Your dd test, run from
>>>>> beginning to end of each drive, completed fine.  Smartd had no
>>>>> info to report.
>>>>>
>>>>> The fdisk weirdness was operator error; the /dev/sd* block nodes
>>>>> were missing (forgotten detail on age old upgrade). Fixed with
>>>>> mknod.
>>>>>
>>>>> So, I forced mdadm to assemble and it is reconstructing now.
>>>>> Troublesome, though, that 2 drives fail at once like this.  I
>>>>> think I should separate them onto different raid-5s, just in case.
>>>>>
>>>>>
>>>>>
>>>>> Guy wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> What failure are you getting?  I assume a read error.  md will
>>>>>> fail a drive when it gets a read error from the drive.  It is
>>>>>> "normal" to have a read error once in a while, but more than 1 a
>>>>>> year may indicate a drive going bad.
>>>>>>
>>>>>> I test my drives with this command:
>>>>>> dd if=/dev/hdi of=/dev/null bs=64k
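The same dd test can be wrapped in a small loop so a whole set of drives gets scanned in one pass (a sketch; run it against real device nodes like /dev/hdi, and note that a failed read only tells you *that* a drive errored, not where):

```shell
# scan_drives: read every block of each named device with dd; a non-zero
# dd exit status means the kernel reported a read error somewhere on it.
scan_drives() {
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=64k 2>/dev/null; then
            echo "$dev: clean"
        else
            echo "$dev: READ ERROR"
        fi
    done
}

# Typical use:
# scan_drives /dev/hdi /dev/hdj
```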
>>>>>>
>>>>>> You may look into using "smartd".  It monitors and tests disks
>>>>>> for problems.  However, my dd test finds them first.  smartd has
>>>>>> never told me anything useful, but my drives are old, and are not
>>>>>> smart enough for smartd.
>>>>>>
>>>>>> Guy
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: linux-raid-owner@vger.kernel.org
>>>>>> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael
>>>>>> Stumpf
>>>>>> Sent: Wednesday, December 08, 2004 4:03 PM
>>>>>> To: linux-raid@vger.kernel.org
>>>>>> Subject: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>>>>>
>>>>>>
>>>>>> I've got an LVM cobbled together of 2 RAID-5 md's.  For the
>>>>>> longest time I was running with 3 promise cards and surviving
>>>>>> everything including the occasional drive failure, then suddenly
>>>>>> I had double drive dropouts and the array would go into a
>>>>>> degraded state.
>>>>>>
>>>>>> 10 drives in the system, Linux 2.4.22, Slackware 9, mdadm v1.2.0
>>>>>> (13 mar 2003)
>>>>>>
>>>>>> I started to diagnose; fdisk -l /dev/hdi returned nothing for
>>>>>> the two failed drives, but "dmesg" reports that the drives are
>>>>>> happy, and that the md would have been automounted if not for a
>>>>>> mismatch on the event counters (of the 2 failed drives).
>>>>>>
>>>>>> I assumed that this had something to do with my semi-nonstandard
>>>>>> application of a zillion (3) promise cards in 1 system, but I
>>>>>> never had this problem before. I ripped out the promise cards
>>>>>> and stuck in 3ware 5700s, cleaning it up a bit and also putting a
>>>>>> single drive per ATA channel. Two weeks later, the same problem
>>>>>> crops up again.
>>>>>>
>>>>>> The "problematic" drives are even mixed; 1 is WD, 1 is Maxtor
>>>>>> (both 120gig).
>>>>>>
>>>>>> Is this a known bug in 2.4.22 or mdadm 1.2.0?  Suggestions?
>>>>>>
>>>>>>
>>>>>> --------------------------------------------
>>>>>> My mailbox is spam-free with ChoiceMail, the leader in personal and
>>>>>> corporate anti-spam solutions. Download your free copy of
>>>>>> ChoiceMail from
>>>>>> www.choicemailfree.com
>>>>>>
>>>>>> -
>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>> linux-raid" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
Thread overview: 9+ messages
2004-12-08 21:02 2 drive dropout (and raid 5), simultaneous, after 3 years Michael Stumpf
2004-12-08 22:07 ` Guy
2004-12-09 4:46 ` Michael Stumpf
2004-12-09 4:57 ` Guy
2004-12-09 14:44 ` Michael Stumpf
2004-12-09 16:42 ` Guy
2004-12-09 17:22 ` Michael Stumpf
2004-12-15 17:45 ` Doug Ledford
[not found] ` <41C10709.4050303@pobox.com>
2004-12-16 3:55 ` Michael Stumpf [this message]