From: Michael Stumpf <mjstumpf@pobox.com>
To: linux-raid@vger.kernel.org
Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
Date: Wed, 15 Dec 2004 21:55:43 -0600 [thread overview]
Message-ID: <41C1073F.2040003@pobox.com> (raw)
In-Reply-To: <41C10709.4050303@pobox.com>
Good advice. I was actually about to post something I just found--at
Fry's, Antec appears to be selling Y power connectors at the typical
price, $2 each (2 per packet, $4/packet). These, however, are different.
They are more solid and connect more "rigidly" than anything I've bought
in the last 4 years, and they are set up not so much as a "Y" but as an
"M", in a "daisy chaining" fashion ( male --------female--------female ).
>
> Absolutely ideal for the kind of nonsense we are pursuing.
>
>
>
> Doug Ledford wrote:
>
>> On Thu, 2004-12-09 at 11:22 -0600, Michael Stumpf wrote:
>>
>>
>>> Ahhhhhhh.. You're on to something here. In all my years of ghetto
>>> raid one of the weakest things I've seen is the
>>> Y-molex-power-splitters. Do you know where more solid ones can be
>>> found? I'm to the point where I'd pay $10 or more for the bloody
>>> things if they didn't blink the power connection when moved a little
>>> bit.
>>>
>>> I'll bet good money this is what happened. Maybe I need to break
>>> out the soldering iron, but that's kind of an ugly, proprietary, and
>>> slow solution.
>>>
>>
>>
>> Well, that is usually overkill anyway ;-) I've solved this problem in
>> the past by simply getting out a pair of thin needle-nose pliers
>> and crimping down on the connector's actual grip points. Once I
>> tightened up the grip spots on the Y connector, the problem went away.
>>
>>
>>
>>> Guy wrote:
>>>
>>>
>>>
>>>> Since they both went off line at the same time, check the power
>>>> cables.  Do they share a common power cable, or does each have a
>>>> unique cable directly from the power supply?
>>>
>>>
>>>
>>>> Switch power connections with another drive to see if the problem
>>>> stays with
>>>> the power connection.
>>>>
>>>> Guy
>>>>
>>>> -----Original Message-----
>>>> From: linux-raid-owner@vger.kernel.org
>>>> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>>>> Sent: Thursday, December 09, 2004 9:45 AM
>>>> To: Guy; linux-raid@vger.kernel.org
>>>> Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>>>
>>>> All I see is this:
>>>>
>>>> Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready
>>>> or command retry failed after host reset: host 1 channel 0 id 2 lun 0
>>>> Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready
>>>> or command retry failed after host reset: host 1 channel 0 id 3 lun 0
>>>> Apr 14 22:03:56 drown kernel: md: updating md1 RAID superblock on
>>>> device
>>>> Apr 14 22:03:56 drown kernel: md: (skipping faulty sdj1 )
>>>> Apr 14 22:03:56 drown kernel: md: (skipping faulty sdi1 )
>>>> Apr 14 22:03:56 drown kernel: md: sdh1 [events: 000000b5]<6>(write)
>>>> sdh1's sb offset: 117186944
>>>> Apr 14 22:03:56 drown kernel: md: sdg1 [events: 000000b5]<6>(write)
>>>> sdg1's sb offset: 117186944
>>>> Apr 14 22:03:56 drown kernel: md: recovery thread got woken up ...
>>>> Apr 14 22:03:56 drown kernel: md: recovery thread finished ...
>>>>
>>>> What the heck could that be? Can that possibly be related to the
>>>> fact that there weren't proper block device nodes sitting in the
>>>> filesystem?!
>>>>
>>>> I already ran WD's wonky tool to fix their "DMA timeout" problem,
>>>> and one of the drives is a maxtor. They're on separate ATA cables,
>>>> and I've got about 5 drives per power supply. I checked heat, and
>>>> it wasn't very high.
>>>>
>>>> Any other sources of information I could tap? Maybe an "MD debug"
>>>> setting in the kernel with a recompile?
>>>>
>>>> Guy wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> You should have some sort of md error in your logs. Try this
>>>>> command:
>>>>> grep "md:" /var/log/messages*|more
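Guy's grep can be tightened up a bit so that only the failure lines come back (a sketch; the keyword list is an assumption, extend it for whatever your kernel logs):

```shell
# md_errors: pull md-layer messages out of a syslog file and keep only
# the ones that look like trouble (faulty members, errors, offline devices).
md_errors() {
    grep "md:" "$1" | grep -Ei "faulty|error|offline"
}

# Typical use:
# md_errors /var/log/messages
```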
>>>>>
>>>>> Yes, they don't play well together, so separate them! :)
>>>>>
>>>>> Guy
>>>>>
>>>>> -----Original Message-----
>>>>> From: linux-raid-owner@vger.kernel.org
>>>>> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>>>>> Sent: Wednesday, December 08, 2004 11:46 PM
>>>>> To: linux-raid@vger.kernel.org
>>>>> Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3
>>>>> years
>>>>>
>>>>> No idea what failure is occurring.  Your dd test, run from
>>>>> beginning to end of each drive, completed fine.  Smartd had no
>>>>> info to report.
>>>>>
>>>>> The fdisk weirdness was operator error; the /dev/sd* block nodes
>>>>> were missing (forgotten detail on age old upgrade). Fixed with
>>>>> mknod.
>>>>>
>>>>> So, I forced mdadm to assemble and it is reconstructing now.
>>>>> Troublesome, though, that 2 drives fail at once like this.  I
>>>>> think I should separate them onto different raid-5s, just in case.
>>>>>
>>>>>
>>>>>
>>>>> Guy wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> What failure are you getting?  I assume a read error.  md will
>>>>>> fail a drive when it gets a read error from the drive.  It is
>>>>>> "normal" to have a read error once in a while, but more than 1 a
>>>>>> year may indicate a drive going bad.
>>>>>>
>>>>>> I test my drives with this command:
>>>>>> dd if=/dev/hdi of=/dev/null bs=64k
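The same dd test can be wrapped in a small loop so a whole set of drives gets scanned in one pass (a sketch; run it against real device nodes like /dev/hdi, and note that a failed read only tells you *that* a drive errored, not where):

```shell
# scan_drives: read every block of each named device with dd; a non-zero
# dd exit status means the kernel reported a read error somewhere on it.
scan_drives() {
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=64k 2>/dev/null; then
            echo "$dev: clean"
        else
            echo "$dev: READ ERROR"
        fi
    done
}

# Typical use:
# scan_drives /dev/hdi /dev/hdj
```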
>>>>>>
>>>>>> You may look into using "smartd".  It monitors and tests disks
>>>>>> for problems.  However, my dd test finds them first.  smartd has
>>>>>> never told me anything useful, but my drives are old, and are not
>>>>>> smart enough for smartd.
>>>>>>
>>>>>> Guy
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: linux-raid-owner@vger.kernel.org
>>>>>> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael
>>>>>> Stumpf
>>>>>> Sent: Wednesday, December 08, 2004 4:03 PM
>>>>>> To: linux-raid@vger.kernel.org
>>>>>> Subject: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>>>>>
>>>>>>
>>>>>> I've got an LVM cobbled together of 2 RAID-5 md's.  For the
>>>>>> longest time I was running with 3 promise cards and surviving
>>>>>> everything including the occasional drive failure, then suddenly
>>>>>> I had double drive dropouts and the array would go into a
>>>>>> degraded state.
>>>>>>
>>>>>> 10 drives in the system, Linux 2.4.22, Slackware 9, mdadm v1.2.0
>>>>>> (13 mar 2003)
>>>>>>
>>>>>> I started to diagnose; fdisk -l /dev/hdi returned nothing for
>>>>>> the two failed drives, but "dmesg" reports that the drives are
>>>>>> happy, and that the md would have been automounted if not for a
>>>>>> mismatch on the event counters (of the 2 failed drives).
>>>>>>
>>>>>> I assumed that this had something to do with my semi-nonstandard
>>>>>> application of a zillion (3) promise cards in 1 system, but I
>>>>>> never had this problem before. I ripped out the promise cards
>>>>>> and stuck in 3ware 5700s, cleaning it up a bit and also putting a
>>>>>> single drive per ATA channel. Two weeks later, the same problem
>>>>>> crops up again.
>>>>>>
>>>>>> The "problematic" drives are even mixed; 1 is WD, 1 is Maxtor
>>>>>> (both 120gig).
>>>>>>
>>>>>> Is this a known bug in 2.4.22 or mdadm 1.2.0?  Suggestions?
>>>>>>
>>>>>>
>>>>>> --------------------------------------------
>>>>>> My mailbox is spam-free with ChoiceMail, the leader in personal and
>>>>>> corporate anti-spam solutions. Download your free copy of
>>>>>> ChoiceMail from
>>>>>> www.choicemailfree.com
>>>>>>
>>>>>> -
>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>> linux-raid" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
Thread overview: 9+ messages
2004-12-08 21:02 2 drive dropout (and raid 5), simultaneous, after 3 years Michael Stumpf
2004-12-08 22:07 ` Guy
2004-12-09 4:46 ` Michael Stumpf
2004-12-09 4:57 ` Guy
2004-12-09 14:44 ` Michael Stumpf
2004-12-09 16:42 ` Guy
2004-12-09 17:22 ` Michael Stumpf
2004-12-15 17:45 ` Doug Ledford
[not found] ` <41C10709.4050303@pobox.com>
2004-12-16 3:55 ` Michael Stumpf [this message]