Re: RAID6 growing interrupted, array won't assemble or resume growing

From: Phil Turmel <philip@turmel.org>
To: Nic Wolfe <nic@wolfeden.ca>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 growing interrupted, array won't assemble or resume growing
Date: Thu, 06 Jun 2013 13:31:21 -0400
Message-ID: <51B0C769.1070009@turmel.org>
In-Reply-To: <CAGjXdAhMHEFFc1C0XUSY_K1ZxwaRG-x=NEw9H2Aiv+YtNat-tw@mail.gmail.com>

On 06/06/2013 02:41 AM, Nic Wolfe wrote:
> First a little bit of background about my setup and how I got into this state:

Very good report.

> I'm running an older version of ubuntu with a 2.6.24.5 kernel and
> mdadm 2.6.3. I had a 5x2TB raid6 array which I attempted to grow to a
> 6x2TB array. While it was growing I had some hardware problems and the
> disks in the array sporadically connected/disconnected. This put the
> array in a bad state.

The old kernel and mdadm concern me.  Patches go through the mailing
list pretty steadily, both for features and bugs.

> After fixing my hardware issues and getting the PC back up I had a
> problem where after booting mdadm would consume all my RAM trying to
> assemble my array (oom_killer started killing indiscriminately and I
> couldn't get on the PC to shut it down, had to power cycle it). I
> added some more memory (from 2GB to 4GB) and mdadm now only takes up
> about 70% before it exits with no results that I can tell. Below are
> the processes which run when I boot:

This sounds like a udev issue.  Probably not a problem on a stable
system, but your array is stuck in an intermediate state.

[trim /]

> So anyway now that I have the system stable and all 6 drives hooked up
> I would very much like to get the array working again.
> 
> I have the following in my mdadm.conf: ARRAY /dev/md1 level=raid6
> num-devices=5 UUID=4672ced4:81401dbc:52723fc8:3fe02f5a
> (it is currently commented out, note that it didn't get updated after
> growing to 6)

mdadm.conf is never updated automatically by the vanilla tools.  You get
to do that yourself.  Although you'd be fine to simply remove the level=
and num-devices= clauses.  (Remember to update your initramfs, too.)
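
A minimal sketch of that cleanup, assuming a POSIX shell with sed.  The
helper name strip_array_hints is hypothetical (it is not part of mdadm);
it only rewrites the text of an ARRAY line so that the UUID alone
identifies the array.

```shell
# Hypothetical helper: drop the level= and num-devices= clauses from
# mdadm.conf ARRAY lines read on stdin, leaving everything else intact.
strip_array_hints() {
    sed -E 's/[[:space:]]+(level|num-devices)=[^[:space:]]+//g'
}

# The ARRAY line quoted above, with the two clauses removed:
echo 'ARRAY /dev/md1 level=raid6 num-devices=5 UUID=4672ced4:81401dbc:52723fc8:3fe02f5a' \
    | strip_array_hints

# Assumption: on Ubuntu the initramfs is usually regenerated afterwards
# with "update-initramfs -u"; other distributions use different tools.
```

Review the result by eye before copying it into mdadm.conf.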

> Below is the --examine for all 6 drives:

Yes!  The most important data you could report.

> midgetspy@MidgetNAS:~$ sudo mdadm --examine /dev/sda
> mdadm: No md superblock detected on /dev/sda.
> midgetspy@MidgetNAS:~$ sudo mdadm --examine /dev/sdb
> /dev/sdb:
>           Magic : a92b4efc
>         Version : 00.91.00
                    ^^^^^^^^
This means a normally v0.90 array has a reshape in progress.  That
prevents really old kernels from mistakenly assembling it.
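
If you want to check every member for this marker, something like the
following sketch may help.  It assumes POSIX grep, and the function name
reshape_in_progress is hypothetical.  It reads "mdadm --examine" output
on stdin and looks only at the Version line; the pattern also accepts
the "0.91.00" spelling on the assumption that newer mdadm drops the
leading zero.

```shell
# Hypothetical helper: succeed if the --examine output on stdin carries
# a v0.91 superblock version, i.e. a v0.90 array with a reshape recorded.
reshape_in_progress() {
    grep -Eq '^[[:space:]]*Version : 0?0\.91\.'
}

# Example using the Version line from the report above:
if printf '        Version : 00.91.00\n' | reshape_in_progress; then
    echo "reshape in progress"
else
    echo "no reshape recorded"
fi
```

On a real system you would feed it with something like
"mdadm --examine /dev/sdb | reshape_in_progress".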

>            UUID : 4672ced4:81401dbc:52723fc8:3fe02f5a (local to host MidgetNAS)
>   Creation Time : Wed Jun  2 21:11:18 2010
>      Raid Level : raid6
>   Used Dev Size : 1953431488 (1862.94 GiB 2000.31 GB)
>      Array Size : 7813725952 (7451.75 GiB 8001.26 GB)
>    Raid Devices : 6
>   Total Devices : 6
> Preferred Minor : 1
> 
>   Reshape pos'n : 665856 (650.36 MiB 681.84 MB)
>   Delta Devices : 1 (5->6)

Your reshape has barely started.  Presumably you specified a
--backup-file option in the original --grow command.  You will need
that file.
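
To put a number on "barely": a rough sketch, assuming awk is available
and that both figures are in KiB as mdadm reports them.  The function
name reshape_percent is hypothetical, and comparing the reshape position
against the array size is only an approximation of progress.

```shell
# Hypothetical helper: reshape position as a percentage of array size.
# $1 = Reshape pos'n, $2 = Array Size, both in the same unit.
reshape_percent() {
    awk -v pos="$1" -v size="$2" 'BEGIN { printf "%.4f\n", 100 * pos / size }'
}

# Values taken from the --examine output above:
reshape_percent 665856 7813725952
```

That works out to well under a hundredth of one percent of the array.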

[trim /]

> How should I proceed? I'm far enough out of my depth that I'm hesitant
> to try anything for fear of causing more damage. Should I update my
> mdadm.conf to have num-devices=6 and see if it sorts itself out?

No.

> Try to force assemble the 5 drives with superblocks?

Yes, but see below.

> Create a "new" array out of them?

Absolutely not.

> Any input would be greatly appreciated.

Modern mdadm should be able to force assemble this and continue the
reshape without problems.  Rather than operate within a questionable
environment, I would strongly encourage you to perform the forced
assembly from a recent live CD.  I personally use "SystemRescueCD", and
I know it has the appropriate kernel support and tools.
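
A sketch of what the forced assembly could look like.  To stay safe it
only prints the command for review rather than running it.  The function
name print_force_assemble, the backup file path, and the /dev/sdX and
/dev/sdY member names are all placeholders; substitute the real backup
file and the real members that carry superblocks.

```shell
# Hypothetical helper: print (do not run) a forced-assembly command.
# $1 = md device, $2 = backup file from the original --grow,
# remaining arguments = member devices.
print_force_assemble() {
    md_dev=$1
    backup=$2
    shift 2
    printf 'mdadm --assemble %s --force --verbose --backup-file=%s' \
        "$md_dev" "$backup"
    for member in "$@"; do
        printf ' %s' "$member"
    done
    printf '\n'
}

# Placeholder path and member names, for illustration only:
print_force_assemble /dev/md1 /path/to/grow.backup /dev/sdb /dev/sdX /dev/sdY
```

Read the printed line carefully, and hold off on running it until the
hardware questions below are settled.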

But.  You need to share more information about your hardware problems.
Dmesg, etc.  There are commonly-encountered configuration problems that
appear to be mysterious drive failures.  If you know all about error
recovery control, please elaborate.  Otherwise, please share the output
of "smartctl -x /dev/sdX" for all of your member devices.
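
One way to gather that for every member, as a sketch: the hypothetical
helper smart_cmds below just prints the smartctl invocations so you can
review them and then pipe them to sh as root.  The second command per
device, "smartctl -l scterc", queries the drive's SCT error recovery
control setting; the device names in the example are placeholders.

```shell
# Hypothetical helper: print the smartctl commands for each device given.
smart_cmds() {
    for dev in "$@"; do
        printf 'smartctl -x %s\n' "$dev"          # full SMART report
        printf 'smartctl -l scterc %s\n' "$dev"   # error recovery control
    done
}

# Placeholder device names, for illustration only:
smart_cmds /dev/sdb /dev/sdX
```

For example, "smart_cmds /dev/sdb | sh > smart-sdb.txt 2>&1" would save
one drive's report to a file you can attach.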

Phil
