linux-raid.vger.kernel.org archive mirror
From: Neil Brown <neilb@suse.de>
To: Mike Hartman <mike@hartmanipulation.com>
Cc: Jon@ehardcastle.com, linux-raid@vger.kernel.org
Subject: Re: Accidental grow before add
Date: Tue, 5 Oct 2010 17:24:25 +1100	[thread overview]
Message-ID: <20101005172425.4e6a6711@notabene> (raw)
In-Reply-To: <AANLkTikDgY_icdc54FiaNVbAwGZ2CvOYtkVeKfTzYvtN@mail.gmail.com>

On Thu, 30 Sep 2010 12:13:27 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> In the spirit of providing full updates for interested parties/future Googlers:
> 
> > I'm thinking it's going through the original
> > reshape I kicked off (transforming it from an intact 7 disk RAID 6 to
> > a degraded 8 disk RAID 6) and then when it gets to the end it will run
> > another reshape to pick up the new spare.
> 
> Well that "first" reshape finally finished and it looks like it
> actually did switch over to bringing in the new spare at some point in
> midstream. I only noticed it after the reshape completed, but here's
> the window where it happened.
> 
> 
> 23:02 (New spare still unused):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8](S) sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
>       [===============>.....]  reshape = 76.4% (1119168512/1464845568) finish=654.5min speed=8801K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> 23:03 (Spare flag is gone, although it's not marked as "Up" yet further down):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
>       [===============>.....]  recovery = 78.7% (1152999432/1464845568) finish=161.1min speed=32245K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>

This is really strange.  I cannot reproduce any behaviour like this.
What kernel are you using?
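
For completeness, the usual details to include with a report like this can be gathered with something along the lines of (generic commands, assuming the array is md0):

  uname -r
  mdadm --version
  mdadm --detail /dev/md0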

What should happen is that the reshape will continue to the end, and then a
recovery will start from the beginning of the array, incorporating the new
device.  This is what happens in my tests.
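
One rough way to watch that hand-off from userspace (just a sketch, assuming the array is md0):

  watch -n 60 'grep -A 3 "^md0" /proc/mdstat; cat /sys/block/md0/md/sync_action'

The sync_action value should only flip from "reshape" to "recover" once the reshape itself has run to completion.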

At about 84% the reshape should start going a lot faster as it no longer
needs to read data - it just writes zeros.  But there is nothing interesting
that can happen around 77%.
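
(Rough arithmetic for where that point falls, using the per-device size from the mdstat output above: the old 7-disk layout holds 5 data disks' worth of stripes, and the new 8-disk layout spreads that over 6 data disks, so the old data is exhausted at about

  5/6 * 1464845568 = 1220704640 of 1464845568 blocks, i.e. roughly 83-84%

of the per-device reshape position.)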





> 
> 
> 
> 14:57 (It seemed to stall at the percent complete above for about 16 hours):

This is also extremely odd.  I think you are saying that the 'speed' stayed
at a fairly normal level, but the 'recovery =' percent didn't change.
Looking at the code - that cannot happen!

Maybe there is a perfectly reasonable explanation - possibly dependent on the
particular kernel you are using - but I cannot see it.

I would certainly recommend a 'check' and a 'fsck' (if you haven't already).
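
A minimal sketch of both, assuming the array is md0 and the filesystem on it can be taken offline first:

  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt     # after the check completes
  fsck -f /dev/md0                       # or whatever device actually holds the filesystem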

NeilBrown




> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
>       [===============>.....]  recovery = 79.1% (1160057740/1464845568) finish=161.3min speed=31488K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> 
> 15:01 (And the leap forward):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/6] [UUUUUU__]
>       [==================>..]  recovery = 92.3% (1352535224/1464845568) finish=58.9min speed=31729K/sec
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> 
> 16:05 (Finishing clean, with only the drive that failed in mid-reshape
> still missing):
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md3p1[8] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       8789073408 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]
> 
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> 
> So it seemed to pause for about 16 hours to pull in the spare, but
> that's 4-5 times faster than it would normally take to grow the array
> onto a new one. I assume that's because I was already reshaping the
> array to fit across 8 disks (they just weren't all there) so when it
> saw the new one it only had to update the new disk. Hopefully it will
> go that fast when I replace the other disk that died.
> 
> Everything seems to have worked out ok - I just did a forced fsck on
> the filesystem and it didn't mention correcting anything. Mounted it
> and everything seems to be intact. Hopefully this whole thread will be
> useful for someone in a similar situation. Thanks to everyone for the
> help.
> 
> Mike

