From: Stan Hoeppner <stan@hardwarefreak.com>
To: stan@hardwarefreak.com
Cc: Barrett Lewis <barrett.lewis.mitsi@gmail.com>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Mdadm server eating drives
Date: Tue, 02 Jul 2013 14:54:01 -0500
Message-ID: <51D32FD9.3020906@hardwarefreak.com>
In-Reply-To: <51D32DBB.8030401@hardwarefreak.com>

Forgot to ask previously: this system is attached to a UPS, isn't it?
--
Stan
On 7/2/2013 2:44 PM, Stan Hoeppner wrote:
> On 7/2/2013 10:48 AM, Barrett Lewis wrote:
>> After sending the last email I went out and bought 2 new WD Reds and
>> a new motherboard. I came back, and in those 2 hours all but 1 of my
>> drives had failed to the point where their superblocks could not be
>> read, so it really seems like my array is finished.
>
> Those drives may be OK. They all may be.
>
>> On Mon, Jul 1, 2013 at 8:57 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>>> I noticed one drive was going up and down and determined that
>>>> the drive had actual physical damage to the power connector and
>>>> was losing and regaining power through vibration.
>>>
>>> This intermittent contact could have damaged the PSU. You've continued
>>> to have drive and lockup problems since replacing the drive with the
>>> bad connector.
>>
>> I hadn't thought of it until you said so, but I bet you are right about
>> the iffy connector. It certainly seemed as if I never had an issue
>> with the array for 8 months, and then suddenly everything got unstable
>> at once, and since then I've lost at least 6 hard drives.
>
> Your drives may not be toast. Don't toss them out, and don't throw up
> your hands yet.
>
>>> The pink elephant in the room is thermal failure due to insufficient
>>> airflow. The symptoms you describe sound like drives overheating. What
>>> chassis is this? Make/model please. If you've installed individual
>>> drive hot-swap cages, etc., it would be helpful if you snapped a photo or
>>> two and made those available.
>>
>> It is also possible that there were cooling issues. The case is an
>> NZXT H2. It has fans blowing directly on all the hard drives, but I
>> have to admit there were a few times I took the fans off to work on
>> things and forgot to put them back on for a few days, coming back to
>> find the drives very hot to the touch. I would have mentioned that
>> earlier, but a data recovery place told me that was unlikely to be
>> the culprit (after they had my money).
>
> I checked out the chassis on the NZXT site. With the front fans
> removed, you have only 2x120mm low-rpm, low-static-pressure, low-CFM
> exhaust fans: one in the PSU and one at the top rear. With 8 drives
> packed in such close proximity, and with other lower-resistance intake
> paths (the perforated chassis bottom), you won't get enough air through
> the front drive cage to cool those drives properly over a long period.
>
> However, running with the two front fans removed for a couple of days on
> an occasion or two shouldn't have overheated the drives to the point of
> permanent damage, assuming the ambient air temperature was ~75F or lower,
> and assuming you were not performing long array operations such as
> rebuilds or reshapes. If you were, the drives could have gotten hot
> enough, for long enough, to be permanently damaged.
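Rather than judging drive heat by touch, the drives' own sensors can be read through SMART. Below is a minimal sketch, assuming smartmontools is installed and the drive reports SMART attribute 194 (Temperature_Celsius) in the usual `smartctl -A` table layout. The 45C alert threshold is a hypothetical value chosen for illustration, not a vendor figure; take the real limit from the drive's datasheet.

```python
from typing import Optional

# HYPOTHETICAL alert threshold in degrees C; use the limit from the datasheet.
TEMP_LIMIT_C = 45


def drive_temp_c(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 194 (Temperature_Celsius).

    Expects the attribute table printed by `smartctl -A /dev/sdX`.
    Returns None if the drive does not report attribute 194.
    """
    for line in smartctl_output.splitlines():
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
        # WHEN_FAILED, RAW_VALUE. The raw value may carry extra text such
        # as "(Min/Max 20/45)", so split at most 9 times.
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[0] == "194":
            return int(parts[9].split()[0])
    return None


def too_hot(temp_c: Optional[int], limit_c: int = TEMP_LIMIT_C) -> bool:
    """True if a temperature was read and it exceeds the limit."""
    return temp_c is not None and temp_c > limit_c


if __name__ == "__main__":
    sample = (
        "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      "
        "UPDATED  WHEN_FAILED RAW_VALUE\n"
        "194 Temperature_Celsius     0x0022   104   093   000    Old_age   "
        "Always       -       46\n"
    )
    t = drive_temp_c(sample)
    print(t, "C, too hot:", too_hot(t))
```

On the server itself, the text could come from `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout` (usually run as root). Polling this during a rebuild or reshape would show whether the drives actually run hot with the front fans removed.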
>
>> Maybe that's all academic at this point. I guess I'll have to rebuild
>> my server from scratch, since all my disks seem destroyed and I can't
>> trust the mobo, CPU, or PSU.
>
> Don't start over. Not just yet. Leave everything as is for now.
> Simply replace the PSU. Fire it up and see what you can recover.
>
>> The PSU wasn't dirt cheap: a Thermaltake TR2 500W @ $58.
>
> The price isn't relevant. What matters is its quality and rail
> configuration, and whether it's been damaged. I checked the spec on your
> TR2-500 yesterday. It has dual +12V rails, one rated at 18A and one at
> 17A. I was unable to locate a wiring diagram for it. On paper it should
> have plenty of juice for your gear when in working order. My assumption
> here is that something internal to it may have failed.
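A rough way to put numbers on the +12V rail figures quoted above is sketched below. It assumes a peak spin-up draw of 2.0 A per 3.5" drive on +12V; that figure is an illustrative assumption, not from the TR2-500 or WD specs, so substitute your drive's datasheet value. Because no wiring diagram was found, the sketch also checks the worst case of all 8 drives hanging off the weaker rail.

```python
from typing import Dict

VOLTS_12V = 12.0
# +12V rail ratings for the Thermaltake TR2-500, as quoted in the message.
RAIL_AMPS: Dict[str, float] = {"12V1": 18.0, "12V2": 17.0}

# ASSUMPTION: peak +12V current of one 3.5" drive during spin-up.
# Substitute the figure from your drive's datasheet.
SPINUP_AMPS_PER_DRIVE = 2.0


def rail_watts(amps: float, volts: float = VOLTS_12V) -> float:
    """Power capacity of a rail: P = V * I."""
    return volts * amps


def headroom_amps(rail_amps: float, n_drives: int,
                  amps_per_drive: float = SPINUP_AMPS_PER_DRIVE) -> float:
    """Current left on one rail if n_drives all spin up on it at once.

    Ignores any other load (CPU, fans, board) sharing the rail.
    """
    return rail_amps - n_drives * amps_per_drive


if __name__ == "__main__":
    for name, amps in RAIL_AMPS.items():
        print(f"{name}: {amps:.0f} A x 12 V = {rail_watts(amps):.0f} W")
    # Worst case: all 8 drives on the weaker rail (the wiring is unknown).
    print("8 drives on the 17 A rail, headroom:",
          headroom_amps(RAIL_AMPS["12V2"], 8), "A")
```

Under these assumptions the rails work out to 216 W and 204 W. If the 8 drives are split 4 and 4, the headroom at spin-up is 10 A and 9 A, which fits the "plenty of juice" reading. If all of them land on the 17 A rail, only about 1 A remains before counting anything else on that rail. Treat the sum of the two rails as an upper bound only, since many multi-rail units also have a combined +12V limit.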
>
>> Should I buy all new
>> everything?
>
> I wouldn't. Most of your gear is probably fine. Get the PSU swapped
> out and see if that fixes it. You may still have to wipe the drives and
> build a new array. You should know pretty quickly whether the PSU swap
> fixed the problem: either the drives will stop dropping, or they won't.
> You already have a new mobo in hand, so if the PSU isn't the problem,
> swap the mobo. That's a good chassis design with good airflow, assuming
> you keep the front fans in it. Why you'd leave them out is beyond me.
>
>> If so, while I'm at it, can you suggest a set of consumer-level
>> hardware ideal for running a personal mdadm server? Powered but not
>> overpowered, reliable but not bleeding edge. If I need 6-8 SATA ports,
>> should I go onboard or get a controller?
>
> A new HBA shouldn't be necessary. But if you choose to go that route
> further down the road, I'd recommend an LSI 9211-8i.
>
>> I still have one backup, although I'm very nervous now since it's on a
>> 3-disk RAID0, just asking to implode (created in an emergency).
>
> I assume this resides on a different machine.
>
> Swap the PSU. Recover the array if possible. If not, blow it away and
> create a new one. If no drives drop out, you're probably golden and the
> PSU swap fixed the problem. If they drop, swap in the new mobo. At that
> point you'll have replaced everything that could be the source of the
> problem except the remaining original drives. They can't all be bad, if
> any are. Always run with those front fans installed.
>