Re: RAID6 growing interrupted, array won't assemble or resume growing


From: Phil Turmel <philip@turmel.org>
To: Nic Wolfe <nic@wolfeden.ca>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 growing interrupted, array won't assemble or resume growing
Date: Thu, 06 Jun 2013 13:31:21 -0400
Message-ID: <51B0C769.1070009@turmel.org>
In-Reply-To: <CAGjXdAhMHEFFc1C0XUSY_K1ZxwaRG-x=NEw9H2Aiv+YtNat-tw@mail.gmail.com>

On 06/06/2013 02:41 AM, Nic Wolfe wrote:
> First a little bit of background about my setup and how I got into this state:

Very good report.

> I'm running an older version of ubuntu with a 2.6.24.5 kernel and
> mdadm 2.6.3. I had a 5x2TB raid6 array which I attempted to grow to a
> 6x2TB array. While it was growing I had some hardware problems and the
> disks in the array sporadically connected/disconnected. This put the
> array in a bad state.

The old kernel and mdadm concern me.  Patches for both features and bug
fixes go through this mailing list pretty steadily, and both of yours
are several years old.

> After fixing my hardware issues and getting the PC back up I had a
> problem where after booting mdadm would consume all my RAM trying to
> assemble my array (oom_killer started killing indiscriminately and I
> couldn't get on the PC to shut it down, had to power cycle it). I
> added some more memory (from 2GB to 4GB) and mdadm now only takes up
> about 70% before it exits with no results that I can tell. Below are
> the processes which run when I boot:

This sounds like a udev issue.  It's probably not a problem on a stable
system, but your array is in an intermediate (mid-reshape) state.

[trim /]

> So anyway now that I have the system stable and all 6 drives hooked up
> I would very much like to get the array working again.
> 
> I have the following in my mdadm.conf: ARRAY /dev/md1 level=raid6
> num-devices=5 UUID=4672ced4:81401dbc:52723fc8:3fe02f5a
> (it is currently commented out, note that it didn't get updated after
> growing to 6)

mdadm.conf is never updated automatically by the vanilla tools.  You get
to do that yourself.  That said, you'd be fine simply removing the level=
and num-devices= clauses.  (Remember to update your initramfs, too.)
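The edit suggested above can be sketched in Python as a small line filter. The helper name `strip_array_clauses` is hypothetical, and it assumes the simple whitespace-separated `key=value` form of the ARRAY line quoted above; it does not try to understand every mdadm.conf construct.

```python
def strip_array_clauses(line: str, drop=("level", "num-devices")) -> str:
    """Return an mdadm.conf ARRAY line without the given key=value clauses.

    HYPOTHETICAL helper: it only handles simple whitespace-separated tokens,
    such as the ARRAY line quoted in this thread.
    """
    tokens = line.split()
    # Keep every token whose key (the text before '=') is not being dropped.
    kept = [t for t in tokens if t.split("=", 1)[0] not in drop]
    return " ".join(kept)


old = ("ARRAY /dev/md1 level=raid6 num-devices=5 "
       "UUID=4672ced4:81401dbc:52723fc8:3fe02f5a")
print(strip_array_clauses(old))
```

With only the device and UUID left, the line stays valid however many members the array has after the grow. On Ubuntu the initramfs is typically regenerated afterwards with `update-initramfs -u`.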

> Below is the --examine for all 6 drives:

Yes!  The most important data you could report.

> midgetspy@MidgetNAS:~$ sudo mdadm --examine /dev/sda
> mdadm: No md superblock detected on /dev/sda.
> midgetspy@MidgetNAS:~$ sudo mdadm --examine /dev/sdb
> /dev/sdb:
>           Magic : a92b4efc
>         Version : 00.91.00
                    ^^^^^^^^
This means a v0.90 array (which would normally show 00.90.00) has a
reshape in progress.  The bumped version prevents really old kernels,
which can't handle a reshape, from mistakenly assembling it.

>            UUID : 4672ced4:81401dbc:52723fc8:3fe02f5a (local to host MidgetNAS)
>   Creation Time : Wed Jun  2 21:11:18 2010
>      Raid Level : raid6
>   Used Dev Size : 1953431488 (1862.94 GiB 2000.31 GB)
>      Array Size : 7813725952 (7451.75 GiB 8001.26 GB)
>    Raid Devices : 6
>   Total Devices : 6
> Preferred Minor : 1
> 
>   Reshape pos'n : 665856 (650.36 MiB 681.84 MB)
>   Delta Devices : 1 (5->6)

Your reshape has barely started.  Presumably you specified a
--backup-file option in the original --grow command.  You will need
that file.
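"Barely started" can be quantified from the --examine fields quoted above. The sketch below is a minimal parser for the `Key : value` lines (hypothetical helpers `parse_examine` and `leading_int`). It assumes that `Reshape pos'n` and `Array Size` are in the same unit (KiB, judging by the MB/GB figures mdadm prints beside them), and it treats the reshape position as an offset into the grown array. It also checks for the 00.91.00 version marker explained above.

```python
def parse_examine(text: str) -> dict:
    """Collect 'Key : value' pairs from mdadm --examine output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines that are not 'Key : value'
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value: str) -> int:
    """'665856 (650.36 MiB 681.84 MB)' -> 665856"""
    return int(value.split()[0])


# Excerpt of the /dev/sdb --examine output quoted in this thread.
EXAMPLE = """\
        Version : 00.91.00
     Array Size : 7813725952 (7451.75 GiB 8001.26 GB)
  Reshape pos'n : 665856 (650.36 MiB 681.84 MB)
  Delta Devices : 1 (5->6)
"""

fields = parse_examine(EXAMPLE)
# 00.91.00 marks a v0.90 superblock with a reshape in progress.
reshaping = fields["Version"].startswith("00.91")
progress = leading_int(fields["Reshape pos'n"]) / leading_int(fields["Array Size"])
print(f"reshape in progress: {reshaping}, about {progress:.4%} done")
```

That is well under a hundredth of a percent, so nearly all of the data is still in the old 5-device layout.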

[trim /]

> How should I proceed? I'm far enough out of my depth that I'm hesitant
> to try anything for fear of causing more damage. Should I update my
> mdadm.conf to have num-devices=6 and see if it sorts itself out?

No.

> Try to force assemble the 5 drives with superblocks?

Yes, but see below.

> Create a "new" array out of them?

Absolutely not.

> Any input would be greatly appreciated.

Modern mdadm should be able to force assemble this and continue without
problems.  Rather than operate within a questionable environment, I
would strongly encourage you to perform the forced assembly from a
recent live CD.  I personally use "SystemRescueCD", and I know it has
the appropriate kernel support and tools.
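A forced assembly of the kind described above could be sketched as the Python helper below. It only builds the mdadm argument list for review and never runs it, since Phil asks for hardware details before anything is touched. The device names and backup path are HYPOTHETICAL. The member list should contain only devices on which `mdadm --examine` finds a superblock, and `--backup-file` should name the file given to the original `--grow`.

```python
from typing import List, Optional


def build_forced_assemble(md_device: str, members: List[str],
                          backup_file: Optional[str] = None) -> List[str]:
    """Build (but do not run) an mdadm forced-assembly command line."""
    cmd = ["mdadm", "--assemble", "--force", "--verbose", md_device]
    if backup_file is not None:
        # Lets mdadm restore the reshape's critical section and resume it.
        cmd.append(f"--backup-file={backup_file}")
    cmd.extend(members)
    return cmd


# HYPOTHETICAL device names and backup path, for illustration only.
cmd = build_forced_assemble(
    "/dev/md1",
    ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"],
    backup_file="/root/md1-grow.backup",
)
print(" ".join(cmd))
```

Keeping the command as a list makes it easy to inspect before passing it to `subprocess.run` from the live CD.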

But first, you need to share more information about your hardware
problems: dmesg output, etc.  There are commonly-encountered
configuration problems that masquerade as mysterious drive failures.
If you already know all about error recovery control (ERC), please
elaborate on your setup.  Otherwise, please share the output of
"smartctl -x /dev/sdX" for all of your member devices.
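One well-known example of such a configuration problem is a timeout mismatch. A drive without ERC may spend longer retrying a bad sector than the kernel's default 30-second command timeout (`/sys/block/sdX/device/timeout`). The kernel then gives up on a drive that is still busy, and md can kick it from the array. The sketch below compares the two values. The function names are hypothetical, and the 120-second worst case for a drive without ERC is an ASSUMED illustrative figure, not a measured one. The inputs would come from `smartctl -l scterc /dev/sdX`, which reports ERC limits in deciseconds, and from the sysfs timeout file.

```python
from typing import Optional

# Default SCSI command timeout (seconds) in /sys/block/<dev>/device/timeout.
KERNEL_DEFAULT_TIMEOUT_S = 30

# ASSUMPTION: illustrative worst-case internal retry time for a drive without
# ERC; real desktop drives vary and may take longer.
ASSUMED_NO_ERC_RECOVERY_S = 120


def worst_case_recovery_s(scterc_ds: Optional[int]) -> float:
    """Seconds a drive may spend on one bad sector before reporting an error.

    scterc_ds is the SCT ERC read limit in deciseconds (70 means 7.0 s),
    or None when ERC is unsupported or disabled.
    """
    if scterc_ds is None:
        return ASSUMED_NO_ERC_RECOVERY_S
    return scterc_ds / 10


def timeout_mismatch(scterc_ds: Optional[int],
                     kernel_timeout_s: int = KERNEL_DEFAULT_TIMEOUT_S) -> bool:
    """True if the kernel may give up on the drive before the drive gives up."""
    return worst_case_recovery_s(scterc_ds) >= kernel_timeout_s


for erc, timeout in [(70, 30), (None, 30), (None, 180)]:
    print(f"ERC={erc} timeout={timeout}s mismatch={timeout_mismatch(erc, timeout)}")
```

The two usual remedies correspond to the two inputs. You can enable a short ERC limit (for example `smartctl -l scterc,70,70 /dev/sdX`) on drives that support it. On drives that don't, you can raise the kernel timeout well above the drive's worst case.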

Phil

Thread overview: 8+ messages
2013-06-06  6:41 RAID6 growing interrupted, array won't assemble or resume growing Nic Wolfe
2013-06-06 17:31 ` Phil Turmel [this message]
2013-06-07  4:15   ` Nic Wolfe
2013-06-07 12:43     ` Phil Turmel
2013-06-19  6:21       ` Nic Wolfe
2013-06-19 18:36         ` Phil Turmel
2013-06-19 23:52           ` Nic Wolfe
2013-06-21  4:17           ` Nic Wolfe
