From: Phil Turmel <philip@turmel.org>
To: Johannes Truschnigg <johannes@truschnigg.info>
Cc: linux-raid@vger.kernel.org
Subject: Re: What just happened to my disks/RAID5 array?
Date: Tue, 13 Sep 2011 07:37:53 -0400
Message-ID: <4E6F4091.7050206@turmel.org>
In-Reply-To: <4E6F13F7.6070507@truschnigg.info>

Good Morning Johannes,

On 09/13/2011 04:27 AM, Johannes Truschnigg wrote:
> Dear list members,
> 
> my server at home just mailed in multiple FAIL events from members of
> the RAID5 array in it. I won't be able to get to the machine during
> the next ten or so hours, but I'd like to be prepared as best as I
> can when I face the disaster that apparently struck. I attached the
> relevant dmesg excerpt, as well as the current mdstat contents.
> Theories explaining what could have happened - and how to deal with
> such a scenario - are highly appreciated, as only some of the data on
> the array is actually backed up elsewhere. If you need any additional
> information about the system or its setup, please ask right away!
> 
> I do have SSH access to the box.

From a brief review of your dmesg, this all looks like a hardware problem.  Some possibilities come to mind:

1)  Controller failure.
2)  Power supply failure (possibly a partial failure of a multi-rail supply).
3)  Cooling failure.

Simultaneous failure of that many devices strains credulity, so I doubt you've lost your array.  One possible variant of #2 would be a failed drive that draws enough current to drop the supply voltage seen by its sibling drives.

Since some drives are still "alive", they'll have newer event counts than the devices that went offline, and mdadm will treat the lagging members as out of date.  Once you fix the root cause, you may need to use "mdadm --assemble --force" to get the array running again.
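Once the drives are reachable again, comparing the event counters from each member's "mdadm -E" output shows which ones fell behind.  A minimal Python sketch, assuming each member's output contains a line of the form "Events : N" (as printed for v1.x superblocks; other layouts, e.g. older 0.90 output, are not handled).  The device names in the example are hypothetical placeholders.

```python
from __future__ import annotations

import re

# Assumption: "mdadm -E" prints a line like "         Events : 1234".
# Only plain-integer counters are recognised; anything else raises ValueError.
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$", re.MULTILINE)


def event_count(examine_output: str) -> int:
    """Return the event counter parsed from one device's "mdadm -E" text."""
    match = EVENTS_RE.search(examine_output)
    if match is None:
        raise ValueError("no 'Events :' line found")
    return int(match.group(1))


def stale_members(examine_by_device: dict[str, str]) -> tuple[int, list[str]]:
    """Return the newest event count and the devices whose count lags it."""
    counts = {dev: event_count(text) for dev, text in examine_by_device.items()}
    newest = max(counts.values())
    stale = sorted(dev for dev, n in counts.items() if n < newest)
    return newest, stale


if __name__ == "__main__":
    # Hypothetical sample: two members kept running, one dropped out earlier.
    sample = {
        "/dev/sda1": "   Events : 5120\n",
        "/dev/sdb1": "   Events : 5120\n",
        "/dev/sdc1": "   Events : 5087\n",
    }
    print(stale_members(sample))  # (5120, ['/dev/sdc1'])
```

It only parses text, so it is safe to run against saved copies of the output; the devices it reports as lagging are the ones you'd expect mdadm to treat as out of date.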

The output of "lsdrv" [1] would be helpful in offering more specific advice, along with "mdadm -D" of the array and "mdadm -E" of all of its components (when you get them back).
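Gathering that report in one go can be scripted.  A minimal Python sketch, assuming it runs as root with "lsdrv" and "mdadm" on PATH; the array and member names in the example are hypothetical placeholders to be replaced with the real ones.

```python
from __future__ import annotations

import subprocess


def diagnostic_commands(array: str, members: list[str]) -> list[list[str]]:
    """Build the requested commands: lsdrv, mdadm -D on the array, mdadm -E per member."""
    commands = [["lsdrv"], ["mdadm", "-D", array]]
    commands += [["mdadm", "-E", dev] for dev in members]
    return commands


def collect_report(array: str, members: list[str]) -> str:
    """Run each command and join stdout/stderr into one pasteable report.

    Assumes root privileges; a missing binary is noted instead of aborting.
    """
    sections = []
    for cmd in diagnostic_commands(array, members):
        header = "$ " + " ".join(cmd)
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
            body = result.stdout + result.stderr
        except FileNotFoundError:
            body = f"{cmd[0]}: command not found\n"
        sections.append(header + "\n" + body)
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical names: substitute the real array and member partitions.
    print(collect_report("/dev/md0", ["/dev/sda1", "/dev/sdb1", "/dev/sdc1"]))
```

Keeping the command list separate from the runner makes it easy to check what will be executed before running anything on a degraded array.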

HTH,

Phil

[1] http://github.com/pturmel/lsdrv
