From: Michael Stumpf <mjstumpf@pobox.com>
To: comsatcat@earthlink.net, linux-raid@vger.kernel.org
Subject: Re: Busted disks caused healthy ones to fail
Date: Tue, 14 Dec 2004 08:11:43 -0600
Message-ID: <41BEF49F.5010907@pobox.com>
In-Reply-To: <1103012937.8162.23.camel@solaris.skunkware.org>
14 internal drives on a single power supply plus the mb/cpu/etc? Oy;
I've got 15 drives plus a P2-400 spinning between two 550 W power supplies,
and I'm worried that is getting overloaded. I might be paranoid, but I had
some flakiness that was pretty much impossible to debug, so I took broad
steps and overestimated. I figured that a heavily loaded supply might
hiccup under an unusual condition if too many drives were attached to
it... and, while anecdotal, my once-a-month drive hiccup (requiring a
re-add to the array, nothing else) did go away when I added a power
supply.
comsatcat wrote:
>The two disks that were actually dead were both on a different bus. The
>OS disk that died was on scsi0.
>
>Is there a way around this behavior (i.e., kernel params that can be
>adjusted, such as timeout values and queuing)? It never really recovered
>correctly after the disks died; a manual reboot was required.
>Applications which were using the failed devices would hang forever (I'm
>assuming they were waiting for queued commands to complete).
>
>IDE: not in use
>Power: 14 internal drives, no external
>Temp: just fine
>Kids: Upstairs taking tech calls.
>
>
>Thanks,
>Ben
>
>
>On Tue, 2004-12-14 at 01:55 -0500, Guy wrote:
>
>
>>Did the disks that failed have anything in common?
>>
>>SCSI:
>>If you have disks on 1 SCSI bus, a single failed disk can affect other
>>disks. By removing the bad disk you correct the problems with the others.
>>
>>IDE: (or whatever they call it today)
>>With 2 disks on 1 bus, a failure of one drive will cause the other to fail
>>most of the time.
>>
>>Power supply:
>>If you have external disks, they will have another power supply. If you
>>have problems with this power supply, they all could be affected. Even a
>>common power cable can cause multi drive failures.
>>
>>Temperature:
>>Disks getting too hot can cause failures.
>>
>>Kids:
>>Someone turned the disk cabinet off?
>>
>>I am sure this list is not complete. But it may help.
>>
>>Guy
>>
>>-----Original Message-----
>>From: linux-raid-owner@vger.kernel.org
>>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of comsatcat
>>Sent: Tuesday, December 14, 2004 1:42 AM
>>To: linux-raid@vger.kernel.org
>>Subject: Busted disks caused healthy ones to fail
>>
>>An odd thing happened this weekend. We were doing some heavy I/O when
>>one of our servers had two drives in two separate raid1 mirrors pop.
>>This was not odd as these drives are old and the batch they are from
>>have been failing on other boxen as well. What is odd is that our brand
>>new disks which the OS resides on (2 drives in raid 1) half busted.
>>
>>There are 4 md devices
>>
>>md/0
>>md/1
>>md/2
>>md/3
>>
>>md3, md2, and md1 all lost the 2nd drive in the array (sdh3, sdh6, and
>>sdh5). md0, however, was fine; sdh1 stayed healthy. Why would losing
>>disks cause a seemingly healthy disk to go astray?
>>
>>P.S. I have pulled tons of syslogs showing the two bad disks failing,
>>if that would help.
>>
>>
>>Thanks,
>>Ben
>>
>>-
>>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>