From: Stan Hoeppner <stan@hardwarefreak.com>
To: Barrett Lewis <barrett.lewis.mitsi@gmail.com>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Mdadm server eating drives
Date: Tue, 02 Jul 2013 14:44:59 -0500
Message-ID: <51D32DBB.8030401@hardwarefreak.com>
In-Reply-To: <CAPSPcXjOPben0xGqcUKwyZn4pX403uFbp5f57f=LEV371ZuDjw@mail.gmail.com>

On 7/2/2013 10:48 AM, Barrett Lewis wrote:
> After sending the last email I went out and bought 2 new WD Reds and
> a new motherboard.  I came back and in those 2 hours all but 1 of my
> drives had failed to the point of being unable to read the superblock,
> so it really seems like my array is finished.

The drive may be ok.  They all may be.

> On Mon, Jul 1, 2013 at 8:57 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>> I noticed one drive was going up and down and determined that
>>> the drive had actual physical damage to the power connector and
>>> was losing and regaining power through vibration.
>>
>> This intermittent contact could have damaged the PSU.  You've continued
>> to have drive and lockup problems since replacing the drive with the bad
>> connector.
> 
> I hadn't thought of it until you said so but I bet you are right about
> the iffy connector.  It certainly seemed as if I never had an issue
> with the array for 8 months, and then suddenly everything got unstable
> at once, and since then I've lost at least 6 hard drives.

Your drives may not be toast.  Don't toss them out, and don't throw up
your hands yet.

>> The pink elephant in the room is thermal failure due to insufficient
>> airflow.  The symptoms you describe sound like drives overheating.  What
>> chassis is this?  Make/model please.  If you've installed individual
>> drive hot swap cages, etc, it would be helpful if you snapped a photo or
>> two and made those available.
>
> It is also possible that there were cooling issues.  The case is an
> NZXT H2.  It has some fans blowing directly on all the hard drives,
> but there were a few times I have to admit I took the fans off to work
> on things and forgot to put them back on for a few days, coming back
> to find them very hot to the touch.  I would have mentioned that
> earlier, but a data recovery place told me that it was unlikely that
> would be a culprit (after they had my money).

I checked out the chassis on the NZXT site.  With the front fans
removed, you have only 2x120mm low rpm, low static pressure, and low CFM
exhaust fans, one in the PSU, one top rear.  With 8 drives packed in
such close proximity and with other lower resistance intake paths (the
perforated chassis bottom), you won't get enough air through the front
drive cage to cool those drives properly over a long period.

However, running with the two front fans removed for a couple of days on
an occasion or two shouldn't have overheated the drives to the point of
permanent damage, assuming ambient air temp was ~75F or lower, and
assuming you were not performing long array operations such as rebuilds
or reshapes.  If you were, the drives could have gotten hot enough, for
long enough, to be permanently damaged.
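
One way to stop guessing about heat is to read the temperature each drive
reports through SMART.  The following is only a rough sketch, assuming
smartmontools is installed and the drive exposes attribute 194
(Temperature_Celsius); some drives report 190 (Airflow_Temperature_Cel)
instead.  The 45C warning threshold, the function names, and the sample
text are my own assumptions for illustration, not vendor figures or real
output from this machine.

```python
import re
import subprocess

WARN_CELSIUS = 45  # assumed warning threshold, not taken from any datasheet


def parse_temperature(smart_output):
    """Return the raw Temperature_Celsius value from `smartctl -A` text, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def read_drive_temperature(device):
    """Run `smartctl -A` on one device (needs root and smartmontools)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_temperature(result.stdout)


# Made-up sample in the usual attribute-table layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5523
194 Temperature_Celsius     0x0022   112   098   000    Old_age   Always       -       38 (Min/Max 20/55)
"""

if __name__ == "__main__":
    temp = parse_temperature(SAMPLE)
    status = "WARN" if temp is not None and temp >= WARN_CELSIUS else "ok"
    print(temp, status)
```

Calling read_drive_temperature("/dev/sdX") for each array member during a
rebuild, with and without the front fans, would show whether the drive
cage is actually getting enough air.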

> Maybe that's all academic at this point.  I guess I'll have to rebuild
> my server from scratch since all my disks seem destroyed and I can't
> trust the mobo, cpu, or psu.

Don't start over.  Not just yet.  Leave everything as is for now.
Simply replace the PSU.  Fire it up and see what you can recover.

> The PSU wasn't dirt cheap, Thermaltake TR2 500W @ $58.

The price isn't relevant.  What matters is the quality, the rail
configuration, and whether the unit has been damaged.  I checked the
spec on your TR2-500
yesterday.  It has dual +12V rails, one rated at 18A and one at 17A.  I
was unable to locate a wiring diagram for it.  On paper it should have
plenty of juice for your gear when in working order.  My assumption here
is that something internal to it may have failed.
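
For anyone who wants to see the on-paper arithmetic, here is a minimal
sketch.  The rail ratings are the ones quoted above and the drive count
is the 8 drives in the front cage.  The 2.0A per-drive spin-up draw on
+12V is purely an assumed round number for illustration, so check the
drive datasheets, and since I have no wiring diagram the code makes no
claim about which rail feeds the drives.

```python
RAIL_VOLTS = 12.0
RAIL_AMPS = [18.0, 17.0]  # +12V rail ratings from the TR2-500 spec

# Assumptions for illustration, not measurements:
DRIVE_COUNT = 8               # drives in the front cage
SPINUP_AMPS_PER_DRIVE = 2.0   # assumed peak +12V draw per drive at spin-up


def rail_watts(amps, volts=RAIL_VOLTS):
    """Power available on one rail at its rated current."""
    return amps * volts


def spinup_load_amps(drives, amps_per_drive):
    """Worst case: every drive spins up at the same moment."""
    return drives * amps_per_drive


if __name__ == "__main__":
    for index, amps in enumerate(RAIL_AMPS, start=1):
        print(f"+12V rail {index}: {amps} A = {rail_watts(amps)} W")
    load = spinup_load_amps(DRIVE_COUNT, SPINUP_AMPS_PER_DRIVE)
    print(f"assumed simultaneous spin-up load: {load} A")
    print(f"combined rail rating: {sum(RAIL_AMPS)} A")
```

This only describes a healthy unit.  It says nothing about a PSU with a
damaged rail, which is the reason to swap it rather than reason about it.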

> Should I buy all new
> everything?  

I wouldn't.  Most of your gear is probably fine.  Get the PSU swapped
out and see if that fixes it.  You may still have to wipe the drives and
build a new array.  You should know pretty quickly whether the PSU swap
fixed the problem: either the drives stop dropping or they don't.  You
already have a new mobo in hand, so if the PSU isn't the problem, swap
the mobo.  That's a good chassis design with good airflow assuming you
keep the front fans in it.  Why you'd leave them removed is beyond me.

> If so, while I'm at it, can you suggest a set of consumer-level
> hardware ideal for running a personal mdadm server?  Powered but not
> overpowered, reliable not bleeding edge.  If I need 6-8 SATA ports,
> should I do onboard or get a controller?

A new HBA shouldn't be necessary.  But if you choose to go that route
further down the road I'd recommend an LSI 9211-8i.

> I still have one backup, although I'm very nervous now since it's on a
> 3 disk RAID0, just asking to implode (created in an emergency).

I assume this resides on a different machine.

Swap the PSU.  Recover the array if possible.  If not, blow it away and
create a new one.  If no drives drop out you're probably golden and the PSU
fixed the problem.  If they drop, swap in the new mobo.  At that point
you'll have replaced everything that could be the source of the problem
but for the remaining original drives.  They can't all be bad, if any.
Always run with those front fans installed.
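
To watch for drives dropping after the swap without staring at the
console, something like the following could be run periodically.  It is
only a sketch: it assumes the usual /proc/mdstat layout, where each
array's status line carries a "[expected/active]" pair such as "[8/7]".
The function name and the sample text are made up for illustration and
are not output from your machine.

```python
import re


def degraded_arrays(mdstat_text):
    """Map array name -> (expected, active) for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None
    return degraded


# Made-up sample for illustration only.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb1[0] sdc1[1] sdd1[2] sde1[3] sdf1[4] sdg1[5] sdh1[6]
      11720294400 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/7] [UUUUUUU_]

unused devices: <none>
"""

if __name__ == "__main__":
    # On the real server, read the live file instead of SAMPLE:
    #   with open("/proc/mdstat") as handle:
    #       text = handle.read()
    print(degraded_arrays(SAMPLE))
```

An empty result over several days of normal use, including at least one
long operation such as a check or rebuild, is a good sign the PSU swap
did the job.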

-- 
Stan




