Re: Busted disks caused healthy ones to fail

From: Michael Stumpf <mjstumpf@pobox.com>
To: comsatcat@earthlink.net, linux-raid@vger.kernel.org
Subject: Re: Busted disks caused healthy ones to fail
Date: Tue, 14 Dec 2004 08:11:43 -0600
Message-ID: <41BEF49F.5010907@pobox.com>
In-Reply-To: <1103012937.8162.23.camel@solaris.skunkware.org>

14 internal drives on a single power supply, plus the mb/cpu/etc?  Oy;
I've got 15 drives + a p2-400 spinning between two 550W power supplies,
and I'm worried it is getting overloaded.  I might be paranoid, but I had
some flakiness that was pretty much impossible to debug, so I took broad
steps and overestimated.  I figured that a heavily loaded supply could
hiccup under an unusual condition if too many drives were attached to
it.  While this is anecdotal, my once-a-month drive hiccup (which
required a re-add to the array, nothing else) did go away when I added a
power supply.
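
To put rough numbers on the overload worry, here is a minimal sketch of a
12 V rail budget at spin-up.  The per-drive current, the board/fan draw,
and the rail rating are all assumed placeholder figures (none of them come
from this thread); substitute values from the actual drive and supply
datasheets.

```python
# Rough 12 V rail budget for a multi-drive chassis at spin-up.
# Every figure passed in below is an assumed placeholder, not a measurement.

def rail_headroom(n_drives, amps_per_drive, other_amps, rail_limit_amps):
    """Return spare amps on the 12 V rail if all drives spin up at once."""
    demand = n_drives * amps_per_drive + other_amps
    return rail_limit_amps - demand

# Hypothetical inputs: 2.0 A spin-up per drive, 5 A for board and fans,
# and a supply rated for 25 A on its 12 V rail.
one_psu = rail_headroom(14, 2.0, 5.0, 25.0)   # all 14 drives on one supply
two_psus = rail_headroom(7, 2.0, 5.0, 25.0)   # same drives split across two

print(f"14 drives on one supply: {one_psu:+.1f} A headroom")
print(f"7 drives per supply:     {two_psus:+.1f} A headroom")
```

A negative result under these assumptions only says the supply could be
oversubscribed at spin-up; it does not show that power caused the failures
described here.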

comsatcat wrote:

>The two disks that were actually dead were both on a different bus.  The
>OS disk that died was on scsi0.
>
>Is there a way around this behavior (i.e., kernel params that can be
>adjusted, such as timeout values and queuing)?  It never really recovered
>correctly after the disks died; a manual reboot was required.
>Applications which were using the failed devices would hang forever (I'm
>assuming they were waiting for queued commands to complete).
>
>IDE: not in use
>Power: 14 internal drives, no external
>Temp: just fine
>Kids: Upstairs taking tech calls.
>
>
>Thanks,
>Ben
>
>
>On Tue, 2004-12-14 at 01:55 -0500, Guy wrote:
>  
>
>>Did the disks that failed have anything in common?
>>
>>SCSI:
>>If you have disks on 1 SCSI bus, a single failed disk can affect other
>>disks.  By removing the bad disk you correct the problems with the others.
>>
>>IDE:  (or what ever they call it today)
>>2 disks on 1 bus, 1 drive failure will cause the other to fail most of the
>>time.
>>
>>Power supply:
>>If you have external disks, they will have another power supply.  If you
>>have problems with this power supply, they all could be affected.  Even a
>>common power cable can cause multi drive failures.
>>
>>Temperature:
>>Disks getting too hot can cause failures.
>>
>>Kids:
>>Someone turned the disk cabinet off?
>>
>>I am sure this list is not complete.  But it may help.
>>
>>Guy
>>
>>-----Original Message-----
>>From: linux-raid-owner@vger.kernel.org
>>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of comsatcat
>>Sent: Tuesday, December 14, 2004 1:42 AM
>>To: linux-raid@vger.kernel.org
>>Subject: Busted disks caused healthy ones to fail
>>
>>An odd thing happened this weekend.  We were doing some heavy I/O when
>>one of our servers had two drives in two separate raid1 mirrors pop.
>>This was not odd, as these drives are old and the batch they are from
>>has been failing on other boxen as well.  What is odd is that our brand
>>new disks which the OS resides on (2 drives in raid 1) half busted.
>>
>>There are 4 md devices
>>
>>md/0  
>>md/1
>>md/2
>>md/3
>>
>>md3, md2, and md1 all lost the 2nd drive in the array (sdh3, sdh6, and
>>sdh5).  md0 however was fine with sdh1 being fine.  Why would losing
>>disks cause a seemingly healthy disk to go astray?
>>
>>P.S. I have pulled out tons of syslogs showing the two bad disks failing
>>if that would help.
>>
>>
>>Thanks,
>>Ben
>>
>>-
>>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>    
>>
>
>
>
>  
>
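
On Ben's question about timeout values and queuing: a minimal sketch for
inspecting the per-device settings, assuming a kernel that exposes SCSI
device attributes through sysfs (the kernel version in this thread is not
stated, so that is an assumption).  The helper name scsi_settings and its
sysfs_root parameter are made up for illustration.

```python
from pathlib import Path

def scsi_settings(disk, sysfs_root="/sys"):
    """Read a disk's SCSI command timeout (seconds) and queue depth.

    Any attribute the kernel does not expose is returned as None.
    """
    dev_dir = Path(sysfs_root) / "block" / disk / "device"
    result = {}
    for attr in ("timeout", "queue_depth"):
        try:
            result[attr] = int((dev_dir / attr).read_text().strip())
        except (OSError, ValueError):
            # Missing file, no permission, or unexpected content.
            result[attr] = None
    return result

if __name__ == "__main__":
    # "sdh" is the disk named in the original report; substitute your own.
    print(scsi_settings("sdh"))
```

Where these files exist, root can normally change a value by writing a new
number to the same file.  Whether a different timeout or queue depth would
have avoided the hang described above is not something this thread
establishes.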



