Re: Busted disks caused healthy ones to fail


From: Michael Stumpf <mjstumpf@pobox.com>
To: comsatcat@earthlink.net, linux-raid@vger.kernel.org
Subject: Re: Busted disks caused healthy ones to fail
Date: Tue, 14 Dec 2004 08:11:43 -0600
Message-ID: <41BEF49F.5010907@pobox.com>
In-Reply-To: <1103012937.8162.23.camel@solaris.skunkware.org>

14 internal drives on a single power supply, plus the mb/cpu/etc?  Oy.
I've got 15 plus a P2-400 spinning between two 550 W power supplies, and
I'm still worried that setup is getting overloaded.  I might be paranoid,
but I had some flakiness that was pretty much impossible to debug, so I
took broad steps and overestimated.  I figured that a heavily loaded
supply could hiccup under an unusual condition if too many drives were
attached to it.  And, while this is anecdotal, my once-a-month drive
hiccup (a drive needing a re-add to the array, nothing else) did go away
when I added a power supply.
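
To put rough numbers on the "too many drives on one supply" worry, here is a minimal back-of-the-envelope sketch. Every electrical figure in it (per-drive spin-up current, 12 V rail rating) is an assumption chosen for illustration, not a measurement from either machine in this thread, and the function name is made up. Substitute the values from your drive datasheets and the supply's label.

```python
# Rough 12 V power-budget sketch. All electrical figures below are
# illustrative assumptions, not values taken from this thread.

def rail_load_fraction(n_drives, amps_per_drive, rail_amps, other_amps=0.0):
    """Return the fraction of a 12 V rail's rated current that is in use.

    n_drives       -- number of drives fed from this supply
    amps_per_drive -- 12 V current drawn by one drive (assumed figure)
    rail_amps      -- rated 12 V output of the supply (assumed figure)
    other_amps     -- 12 V draw of everything else (fans, board, ...)
    """
    demand = n_drives * amps_per_drive + other_amps
    return demand / rail_amps

# Hypothetical numbers: drives pull the most current while spinning up.
SPINUP_AMPS = 2.0   # assumed 12 V draw per drive at spin-up
RAIL_AMPS = 20.0    # assumed 12 V rating of one supply

# 14 drives on one supply versus 15 drives split 8/7 across two supplies.
single_supply = rail_load_fraction(14, SPINUP_AMPS, RAIL_AMPS)
busier_of_two = rail_load_fraction(8, SPINUP_AMPS, RAIL_AMPS)

print(f"single supply, 14 drives: {single_supply:.0%} of 12 V rating")
print(f"two supplies, busier one: {busier_of_two:.0%} of 12 V rating")
```

A result above 100% under these assumptions would mean the rail is over its rating when all drives spin up together. Even a figure somewhat below that leaves little margin for the kind of unusual condition described above.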

comsatcat wrote:

>The two disks that were actually dead were both on a different bus.  The
>OS disk that died was on scsi0.
>
>Is there a way around this behavior (ie: kernel params that can be
>adjusted such as timeout values and queuing)?  It never really recovered
>correctly after the disks died; a manual reboot was required.
>Applications which were using the failed devices would hang forever (I'm
>assuming they were waiting for queued commands to complete).
>
>IDE: not in use
>Power: 14 internal drives, no external
>Temp: just fine
>Kids: Upstairs taking tech calls.
>
>
>Thanks,
>Ben
>
>
>On Tue, 2004-12-14 at 01:55 -0500, Guy wrote:
>  
>
>>Did the disks that failed have anything in common?
>>
>>SCSI:
>>If you have disks on 1 SCSI bus, a single failed disk can affect other
>>disks.  By removing the bad disk you correct the problems with the others.
>>
>>IDE:  (or whatever they call it today)
>>2 disks on 1 bus, 1 drive failure will cause the other to fail most of the
>>time.
>>
>>Power supply:
>>If you have external disks, they will have another power supply.  If you
>>have problems with this power supply, they all could be affected.  Even a
>>common power cable can cause multi drive failures.
>>
>>Temperature:
>>Disks getting too hot can cause failures.
>>
>>Kids:
>>Someone turned the disk cabinet off?
>>
>>I am sure this list is not complete.  But it may help.
>>
>>Guy
>>
>>-----Original Message-----
>>From: linux-raid-owner@vger.kernel.org
>>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of comsatcat
>>Sent: Tuesday, December 14, 2004 1:42 AM
>>To: linux-raid@vger.kernel.org
>>Subject: Busted disks caused healthy ones to fail
>>
>>An odd thing happened this weekend.  We were doing some heavy I/O when
>>one of our servers had two drives in two separate raid1 mirrors pop.
>>This was not odd, as these drives are old and the batch they are from
>>has been failing on other boxen as well.  What is odd is that our brand
>>new disks which the OS resides on (2 drives in raid 1) half busted.
>>
>>There are 4 md devices
>>
>>md/0  
>>md/1
>>md/2
>>md/3
>>
>>md3, md2, and md1 all lost the 2nd drive in the array (sdh3, sdh6, and
>>sdh5).  md0 however was fine with sdh1 being fine.  Why would losing
>>disks cause a seemingly healthy disk to go astray?
>>
>>P.S. I have pulled out tons of syslogs showing the two bad disks failing,
>>if that would help.
>>
>>
>>Thanks,
>>Ben
>>
>>-
>>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>


