linux-raid.vger.kernel.org archive mirror
From: Michael Stumpf <mjstumpf@pobox.com>
To: Guy <bugzilla@watkins-home.com>, linux-raid@vger.kernel.org
Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
Date: Thu, 09 Dec 2004 08:44:43 -0600
Message-ID: <41B864DB.3070105@pobox.com>
In-Reply-To: <200412090457.iB94vD916489@www.watkins-home.com>

All I see is this:

Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready or command retry failed after host reset: host 1 channel 0 id 2 lun 0
Apr 14 22:03:56 drown kernel: scsi: device set offline - not ready or command retry failed after host reset: host 1 channel 0 id 3 lun 0
Apr 14 22:03:56 drown kernel: md: updating md1 RAID superblock on device
Apr 14 22:03:56 drown kernel: md: (skipping faulty sdj1 )
Apr 14 22:03:56 drown kernel: md: (skipping faulty sdi1 )
Apr 14 22:03:56 drown kernel: md: sdh1 [events: 000000b5]<6>(write) sdh1's sb offset: 117186944
Apr 14 22:03:56 drown kernel: md: sdg1 [events: 000000b5]<6>(write) sdg1's sb offset: 117186944
Apr 14 22:03:56 drown kernel: md: recovery thread got woken up ...
Apr 14 22:03:56 drown kernel: md: recovery thread finished ...
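
Those per-disk event counters can be read straight off the member 
superblocks.  A rough sketch of comparing them, with device names 
assumed from the log above (mdadm 1.x syntax):

  # Print the event count and update time from each member's superblock;
  # a member whose count lags behind was kicked out of the array earlier.
  for d in /dev/sd[g-j]1; do
      echo "== $d"
      mdadm --examine $d | grep -i -e 'events' -e 'update time'
  done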

What the heck could those "device set offline" errors be?  Can they 
possibly be related to the fact that there weren't proper block device 
nodes sitting in the filesystem?!
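
The missing nodes were recreated with mknod; a minimal sketch for 
anyone following along, assuming the standard SCSI-disk block major of 
8 and 16 minors per disk (so sdi starts at minor 128, sdj at 144):

  mknod /dev/sdi b 8 128    # whole-disk nodes
  mknod /dev/sdj b 8 144
  mknod /dev/sdi1 b 8 129   # first-partition nodes, the ones md opens
  mknod /dev/sdj1 b 8 145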

I already ran WD's wonky tool to fix their "DMA timeout" problem, and 
one of the drives is a Maxtor.  They're on separate ATA cables, and I've 
got about 5 drives per power supply.  I checked heat, and it wasn't very 
high.

Any other sources of information I could tap?  Maybe an "MD debug" 
setting in the kernel with a recompile?
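
Short of a recompile, one thing worth trying is just capturing kernel 
messages at every priority, so nothing md or the SCSI layer prints gets 
dropped.  A sketch, assuming the stock sysklogd setup that ships with 
Slackware:

  # Log all kernel messages, debug level included, to a dedicated file
  echo 'kern.*          -/var/log/kernel.log' >> /etc/syslog.conf
  kill -HUP $(pidof syslogd)

  # Raise the console log level too, so nothing is filtered there
  dmesg -n 8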

Guy wrote:

>You should have some sort of md error in your logs.  Try this command:
>grep "md:" /var/log/messages*|more
>
>Yes, they don't play well together, so separate them!  :)
>
>Guy
>
>-----Original Message-----
>From: linux-raid-owner@vger.kernel.org
>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>Sent: Wednesday, December 08, 2004 11:46 PM
>To: linux-raid@vger.kernel.org
>Subject: Re: 2 drive dropout (and raid 5), simultaneous, after 3 years
>
>No idea what failure is occurring.  Your dd test, run from beginning to 
>end of each drive, completed fine.  Smartd had no info to report.
>
>The fdisk weirdness was operator error; the /dev/sd* block nodes were 
>missing (a detail forgotten during an age-old upgrade).  Fixed with mknod.
>
>So, I forced mdadm to assemble and it is reconstructing now.  
>Troublesome, though, that 2 drives fail at once like this.  I think I 
>should separate them onto different RAID-5s, just in case.
>
>
>
>Guy wrote:
>
>  
>
>>What failure are you getting?  I assume a read error.  md will fail a drive
>>when it gets a read error from the drive.  It is "normal" to have a read
>>error once in a while, but more than 1 a year may indicate a drive going
>>bad.
>>
>>I test my drives with this command:
>>dd if=/dev/hdi of=/dev/null bs=64k
>>
>>You may look into using "smartd".  It monitors and tests disks for 
>>problems.
>>However, my dd test finds them first.  smartd has never told me anything
>>useful, but my drives are old, and are not smart enough for smartd.
>>
>>Guy
>>
>>-----Original Message-----
>>From: linux-raid-owner@vger.kernel.org
>>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Michael Stumpf
>>Sent: Wednesday, December 08, 2004 4:03 PM
>>To: linux-raid@vger.kernel.org
>>Subject: 2 drive dropout (and raid 5), simultaneous, after 3 years
>>
>>
>>I've got an LVM cobbled together from 2 RAID-5 md's.  For the longest 
>>time I was running with 3 promise cards and surviving everything 
>>including the occasional drive failure.  Then suddenly I had double drive 
>>dropouts that put the array into a degraded state.
>>
>>10 drives in the system, Linux 2.4.22, Slackware 9, mdadm v1.2.0 (13 Mar 
>>2003).
>>
>>I started to diagnose; "fdisk -l /dev/hdi" returned nothing for the two 
>>failed drives, but "dmesg" reported that the drives were happy, and that 
>>the md would have been auto-started if not for a mismatch on the event 
>>counters (of the 2 failed drives).
>>
>>I assumed this had something to do with my semi-nonstandard 
>>application of a zillion (3) Promise cards in 1 system, but I never had 
>>this problem before.  I ripped out the Promise cards and stuck in 3ware 
>>5700s, cleaning it up a bit and also putting a single drive per ATA 
>>channel.  Two weeks later, the same problem crops up again.
>>
>>The "problematic" drives are even mixed; 1 is WD, 1 is Maxtor (both
>>    
>>
>120gig).
>  
>
>>Is this a known bug in 2.4.22 or mdadm 1.2.0?  Suggestions?

