From: Neil Brown <neilb@suse.de>
To: "Simon SÉHIER" <simon@sehier.fr>
Cc: linux-raid@vger.kernel.org
Subject: Re: reboot before reshape from raid 5 to raid 6 (was in state resync=DELAYED). Doesn't assemble anymore.
Date: Thu, 14 Oct 2010 07:24:57 +1100
Message-ID: <20101014072457.0e7205a8@notabene>
In-Reply-To: <20101013173212.GB25675@leontine.pompomgali.com>
Thanks for the extra details.
It would probably work to just add '-f' to the assemble line. Then it should
assemble the array, include the spare (sdb4 currently thinks it is a spare -
not sure why), and proceed with the reshape.
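A minimal sketch of that forced assemble, assuming the same backup-file path
as in the earlier command. The "run" helper and the DRY_RUN variable are made
up for illustration: commands are only printed until DRY_RUN=0 is set.
Stopping the array first with -S is also an assumption, based on md4 having
been left bound but inactive by the previous attempt.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): print the command when DRY_RUN
# is 1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# assemble_forced ARRAY BACKUP_FILE
# Stop any half-assembled array, then assemble with -f (--force).
assemble_forced() {
    run mdadm -S "$1"
    run mdadm -A -f -vv --backup-file="$2" "$1"
}

# Path taken from the earlier mail; it should live outside the array.
assemble_forced /dev/md4 /new-empty-md4backup-file
```

Review the printed lines first, then rerun with DRY_RUN=0 if they look right.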
The alternative is simply to re-create the array:
mdadm -C /dev/md4 -l5 -n5 -c 512 --layout ls /dev/sd{c,d,e,f,g}4 --assume-clean
Then fsck to make sure it looks OK - it should as long as the devices haven't
renamed themselves again.
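One way to check the device order before running the -C, as a minimal sketch:
read each member's slot from "mdadm -E" and compare it with the position the
device will have on the command line. The function names (device_role,
reshape_pos, check_order) are made up for this example, and the patterns
assume the "Device Role : Active device N" and "Reshape pos'n : N" lines look
like the ones in the -E output quoted below. The sketch also insists that the
reshape position is still 0, on the assumption that re-creating with the old
5-device raid5 geometry only makes sense while no data has been moved.

```shell
#!/bin/sh
# Print the slot number from "mdadm -E" output read on stdin.
# Prints nothing for a spare or for unrecognised output.
device_role() {
    sed -n 's/^[[:space:]]*Device Role[[:space:]]*:[[:space:]]*Active device \([0-9][0-9]*\).*/\1/p'
}

# Print the reshape position from "mdadm -E" output read on stdin.
# Prints nothing if the superblock reports no reshape.
reshape_pos() {
    sed -n "s/^[[:space:]]*Reshape pos'n[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# check_order DEV...
# Devices are given in the order planned for "mdadm -C". Fails on the first
# device whose recorded slot differs from its position, or whose reshape
# position is not 0. Only reads superblocks; changes nothing.
check_order() {
    slot=0
    for dev in "$@"; do
        info=$(mdadm -E "$dev") || return 1
        role=$(printf '%s\n' "$info" | device_role)
        pos=$(printf '%s\n' "$info" | reshape_pos)
        if [ "$role" != "$slot" ] || [ "$pos" != "0" ]; then
            echo "$dev: role='$role' (expected $slot), reshape pos='$pos'" >&2
            return 1
        fi
        slot=$((slot + 1))
    done
}
```

Usage would be something like: check_order /dev/sd[c-g]4 && echo "order matches"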
Then add sdb4 as a spare and try the 'grow' again.
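The follow-up steps could be sketched as below. This only prints the commands
for review; it does not run them. The add and grow lines are the ones from
the original report, "fsck -n" assumes the filesystem's checker supports a
no-changes mode, and plan_regrow and the backup path /root/md4-grow.backup are
hypothetical names. The one real requirement on the backup file is that it
sits on storage outside the array being reshaped.

```shell
#!/bin/sh
# plan_regrow ARRAY SPARE BACKUP_FILE
# Print (do not run) the steps that follow a successful re-create.
plan_regrow() {
    array=$1 spare=$2 backup=$3
    # Read-only check first; -n asks the checker to make no changes.
    echo "fsck -n $array"
    # Add the sixth device back as a spare.
    echo "mdadm $array -a $spare"
    # Retry the raid5 -> raid6 grow, with the backup file off the array.
    echo "mdadm -G --level 6 -n 6 $array --backup-file $backup"
}

plan_regrow /dev/md4 /dev/sdb4 /root/md4-grow.backup
```

Printing rather than executing keeps each step a deliberate choice: stop after
the fsck if it reports anything unexpected.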
I'd probably try the "--assemble -f" first. If it completely fails, try the
-C.
If it works - great.
If it seems to start, but doesn't progress properly (unlikely), don't try the
-C - show me the new "-E" output and we'll take it from there.
NeilBrown
On Wed, 13 Oct 2010 19:32:12 +0200
Simon SÉHIER <simon@sehier.fr> wrote:
> On Wed, Oct 13, 2010 at 07:37:59PM +1100, Neil Brown wrote:
> > On Wed, 13 Oct 2010 10:18:33 +0200
> > Simon SÉHIER <simon@sehier.fr> wrote:
> >
> > > On Wed, Oct 13, 2010 at 11:08:23AM +1100, Neil Brown wrote:
> > > > On Wed, 13 Oct 2010 00:59:52 +0200
> > > > Simon SÉHIER <simon@sehier.fr> wrote:
> > > >
> > > > > On 12 oct. 2010 22:46:12, Neil Brown wrote :
> > > > > > On Tue, 12 Oct 2010 16:27:53 +0200
> > > > > >
> > > > > > Simon S <simon@sehier.fr> wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I had a config with 5 disks and 3 raid 5 arrays:
> > > > > > >
> > > > > > > md2 : system root
> > > > > > > md3 : swap
> > > > > > > md4 : data
> > > > > > >
> > > > > > > I added a 6th disk with the intention of growing my raid5 into raid6.
> > > > > > >
> > > > > > > > The steps I used were:
> > > > > > >
> > > > > > > # mdadm /dev/mdX -a /dev/newdiskX
> > > > > > > # mdadm -G --level 6 -n 6 /dev/mdX --backup-file /mdXbackup
> > > > > > >
> > > > > > > (yes, with backup file on root partition md2...)
> > > > > >
> > > > > > Bad idea.. Very bad idea.
> > > > > >
> > > > > > > The md3 array reshaped without any problem.
> > > > > > > md2 seemed to reshape well until it reaches 50.4%, then the rebuild speed
> > > > > > > stalled at 14Kb/s.
> > > > > >
> > > > > > This is the expected consequence of that bad idea. Unfortunately it would
> > > > > > be hard to reliably get mdadm to complain about that, though I guess the
> > > > > > common cases are easy to protect against ... added to 'todo' list
> > > > > >
> > > > > > > md4 was still in the state "resync=DELAYED" then.
> > > > > > >
> > > > > > > > As the rebuild process seemed hung, I restarted the machine ... bad idea.
> > > > > >
> > > > > > Not really, nothing else would have worked.
> > > > > >
> > > > > > > Now mdadm refuses to assemble md2 and md4, and displays this message :
> > > > > > > mdadm: Failed to restore critical section for reshape, sorry.
> > > > > > >
> > > > > > > Possibly you needed to specify the --backup-file
> > > > > > >
> > > > > > > md2 is my linux installation, not very bad if I lose this one.
> > > > > > >
> > > > > > > md4 however contains valuable data.
> > > > > > >
> > > > > > > > Since md4 was still in the state resync=DELAYED before the shutdown, I
> > > > > > > > expect it has not been modified (much) and can be recovered.
> > > > > >
> > > > > > Very true.
> > > > > >
> > > > > > > Any idea on how I could safely do it ?
> > > > > > >
> > > > > > > > Should I try the hack "Get 'Grow_restart' to always return 0."
> > > > > > > > mentioned by Neil Brown on 22 April 2010 on this mailing list?
> > > > > >
> > > > > > That is your best bet. I plan to make that easier to do in mdadm-3.2 (no
> > > > > > recompile necessary).
> > > > > >
> > > > > > > Before you do, check "mdadm -E /dev/newdiskX" and make sure the "Reshape
> > > > > > > position" is 0. If it is, you should be fine.
> > > > > > >
> > > > > > > It won't be for md2 of course. So md2 will quite possibly have some
> > > > > > > corruption. Run fsck on it and it will probably be mostly OK, but there is
> > > > > > > a reasonable chance that some files will be corrupted. Whether and when
> > > > > > > you will notice is impossible to guess.
> > > > >
> > > > > Thanks for your answer Neil,
> > > > >
> > > > > I recompiled mdadm 3.1.4 with return 0 in the beginning of the function
> > > > > Grow_restart (mistake was made with 3.1.2). I have one more question :
> > > > >
> > > > > I first tried assembling the least valued array, md2. It starts reshaping from
> > > > > where it stops, in the first seconds around 1300 K/s, and rapidly above 10K/s.
> > > > >
> > > > > > My backup file for md4 (the array I care about) was also on md2. Should I
> > > > > > expect a problem assembling md4 with the modified version of mdadm, or
> > > > > > can I go ahead without worrying that md2 (rootfs) isn't assembled?
> > > >
> > > > > The backup file for md4 would have been essentially empty. It can be created
> > > > > anew elsewhere. I probably wouldn't risk using the original backup file
> > > > > even if you can access it, as it could be corrupted.
> > > > So when you assemble md4, give it a fresh backup file in some stable location,
> > > > and use the hacked mdadm.
> > > >
> > > > NeilBrown
> > > >
> > >
> > > I tried
> > >
> > > # mdadm -A --backup-file=/new-empty-md4backup-file /dev/md4
> > >
> > > but the array is now in "inactive" state with 6 spares :
> > >
> > > md4 : inactive sdc4[0](S) sdh4[6](S) sdg4[5](S) sdf4[3](S) sde4[2](S) sdd4[1](S)
> > > 1411288041 blocks super 1.2
> > >
> > > I'm a bit confused about what I could do now.
> >
> > That surprises me a little.
> > Try:
> > mdadm -S /dev/md4
> > mdadm -Avv --backup-file=/new-empty-md4backup-file /dev/md4
> > dmesg | tail -100
> > mdadm -E /dev/sd[cd]4
> >
> > and send all of the output.
> >
> > NeilBrown
> >
>
> # mdadm -S /dev/md4
>
> mdadm: stopped /dev/md4
>
>
> # mdadm -Avv --backup-file=/new-empty-md4backup-file /dev/md4
>
> mdadm: looking for devices for /dev/md4
> mdadm: no RAID superblock on /dev/md/3
> mdadm: /dev/md/3 has wrong uuid.
> mdadm: no RAID superblock on /dev/md1
> mdadm: /dev/md1 has wrong uuid.
> mdadm: cannot open device /dev/sdg3: Device or resource busy
> mdadm: /dev/sdg3 has wrong uuid.
> mdadm: /dev/sdg2 has wrong uuid.
> mdadm: no RAID superblock on /dev/sdg1
> mdadm: /dev/sdg1 has wrong uuid.
> mdadm: cannot open device /dev/sdg: Device or resource busy
> mdadm: /dev/sdg has wrong uuid.
> mdadm: cannot open device /dev/sdf3: Device or resource busy
> mdadm: /dev/sdf3 has wrong uuid.
> mdadm: /dev/sdf2 has wrong uuid.
> mdadm: no RAID superblock on /dev/sdf1
> mdadm: /dev/sdf1 has wrong uuid.
> mdadm: cannot open device /dev/sdf: Device or resource busy
> mdadm: /dev/sdf has wrong uuid.
> mdadm: cannot open device /dev/sde3: Device or resource busy
> mdadm: /dev/sde3 has wrong uuid.
> mdadm: /dev/sde2 has wrong uuid.
> mdadm: cannot open device /dev/sde1: Device or resource busy
> mdadm: /dev/sde1 has wrong uuid.
> mdadm: cannot open device /dev/sde: Device or resource busy
> mdadm: /dev/sde has wrong uuid.
> mdadm: cannot open device /dev/sdd3: Device or resource busy
> mdadm: /dev/sdd3 has wrong uuid.
> mdadm: /dev/sdd2 has wrong uuid.
> mdadm: cannot open device /dev/sdd1: Device or resource busy
> mdadm: /dev/sdd1 has wrong uuid.
> mdadm: cannot open device /dev/sdd: Device or resource busy
> mdadm: /dev/sdd has wrong uuid.
> mdadm: cannot open device /dev/sdc3: Device or resource busy
> mdadm: /dev/sdc3 has wrong uuid.
> mdadm: /dev/sdc2 has wrong uuid.
> mdadm: cannot open device /dev/sdc1: Device or resource busy
> mdadm: /dev/sdc1 has wrong uuid.
> mdadm: cannot open device /dev/sdc: Device or resource busy
> mdadm: /dev/sdc has wrong uuid.
> mdadm: cannot open device /dev/sdb3: Device or resource busy
> mdadm: /dev/sdb3 has wrong uuid.
> mdadm: /dev/sdb2 has wrong uuid.
> mdadm: cannot open device /dev/sdb1: Device or resource busy
> mdadm: /dev/sdb1 has wrong uuid.
> mdadm: cannot open device /dev/sdb: Device or resource busy
> mdadm: /dev/sdb has wrong uuid.
> mdadm: cannot open device /dev/sda5: Device or resource busy
> mdadm: /dev/sda5 has wrong uuid.
> mdadm: no RAID superblock on /dev/sda2
> mdadm: /dev/sda2 has wrong uuid.
> mdadm: cannot open device /dev/sda1: Device or resource busy
> mdadm: /dev/sda1 has wrong uuid.
> mdadm: cannot open device /dev/sda: Device or resource busy
> mdadm: /dev/sda has wrong uuid.
> mdadm: /dev/sdg4 is identified as a member of /dev/md4, slot 4.
> mdadm: /dev/sdf4 is identified as a member of /dev/md4, slot 3.
> mdadm: /dev/sde4 is identified as a member of /dev/md4, slot 2.
> mdadm: /dev/sdd4 is identified as a member of /dev/md4, slot 1.
> mdadm: /dev/sdc4 is identified as a member of /dev/md4, slot 0.
> mdadm: /dev/sdb4 is identified as a member of /dev/md4, slot -1.
> mdadm:/dev/md4 has an active reshape - checking if critical section needs to be restored
> mdadm: added /dev/sdd4 to /dev/md4 as 1
> mdadm: added /dev/sde4 to /dev/md4 as 2
> mdadm: added /dev/sdf4 to /dev/md4 as 3
> mdadm: added /dev/sdg4 to /dev/md4 as 4
> mdadm: no uptodate device for slot 5 of /dev/md4
> mdadm: added /dev/sdb4 to /dev/md4 as -1
> mdadm: added /dev/sdc4 to /dev/md4 as 0
> mdadm: /dev/md4 assembled from 5 drives and 1 spare - not enough to start the array while not clean - consider --force.
>
>
> # dmesg | tail -n100
>
> [ 11.127010] HDA Intel 0000:00:1b.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
> [ 11.127038] HDA Intel 0000:00:1b.0: setting latency timer to 64
> [ 11.196601] alloc irq_desc for 16 on node -1
> [ 11.196603] alloc kstat_irqs on node -1
> [ 11.196611] pci 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> [ 11.196614] pci 0000:00:02.0: setting latency timer to 64
> [ 11.203560] alloc irq_desc for 32 on node -1
> [ 11.203563] alloc kstat_irqs on node -1
> [ 11.203573] pci 0000:00:02.0: irq 32 for MSI/MSI-X
> [ 11.203602] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
> [ 11.230846] Error: Driver 'pcspkr' is already registered, aborting...
> [ 11.251260] hda_codec: ALC662 rev1: BIOS auto-probing.
> [ 11.252651] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:1b.0/input/input5
> [ 11.848246] md: md2 stopped.
> [ 11.852104] md: bind<sdd2>
> [ 11.852253] md: bind<sde2>
> [ 11.852386] md: bind<sdf2>
> [ 11.852636] md: bind<sdg2>
> [ 11.852845] md: bind<sdb2>
> [ 11.852932] md: bind<sdc2>
> [ 11.882369] raid5: reshape will continue
> [ 11.882378] raid5: device sdc2 operational as raid disk 0
> [ 11.882380] raid5: device sdg2 operational as raid disk 4
> [ 11.882382] raid5: device sdf2 operational as raid disk 3
> [ 11.882383] raid5: device sde2 operational as raid disk 2
> [ 11.882385] raid5: device sdd2 operational as raid disk 1
> [ 11.882767] raid5: allocated 6386kB for md2
> [ 11.882797] 0: w=1 pa=18 pr=6 m=2 a=2 r=6 op1=0 op2=0
> [ 11.882799] 5: w=1 pa=18 pr=6 m=2 a=2 r=6 op1=1 op2=0
> [ 11.882801] 4: w=2 pa=18 pr=6 m=2 a=2 r=6 op1=0 op2=0
> [ 11.882803] 3: w=3 pa=18 pr=6 m=2 a=2 r=6 op1=0 op2=0
> [ 11.882805] 2: w=4 pa=18 pr=6 m=2 a=2 r=6 op1=0 op2=0
> [ 11.882807] 1: w=5 pa=18 pr=6 m=2 a=2 r=6 op1=0 op2=0
> [ 11.882809] raid5: raid level 6 set md2 active with 5 out of 6 devices, algorithm 2
> [ 11.882849] RAID5 conf printout:
> [ 11.882851] --- rd:6 wd:5
> [ 11.882852] disk 0, o:1, dev:sdc2
> [ 11.882854] disk 1, o:1, dev:sdd2
> [ 11.882855] disk 2, o:1, dev:sde2
> [ 11.882856] disk 3, o:1, dev:sdf2
> [ 11.882858] disk 4, o:1, dev:sdg2
> [ 11.882859] disk 5, o:1, dev:sdb2
> [ 11.882860] ...ok start reshape thread
> [ 11.882905] md2: detected capacity change from 0 to 34376515584
> [ 11.882970] md: md2 switched to read-write mode.
> [ 11.883452] md: reshape of RAID array md2
> [ 11.883453] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> [ 11.883455] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> [ 11.883459] md: using 128k window, over a total of 8392704 blocks.
> [ 11.954838] md2: unknown partition table
> [ 12.939843] Adding 1648632k swap on /dev/sda5. Priority:-1 extents:1 across:1648632k
> [ 13.142646] EXT3 FS on sda1, internal journal
> [ 13.255820] loop: module loaded
> [ 100.224309] e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> [ 100.280126] e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> [ 100.280456] ADDRCONF(NETDEV_UP): eth2: link is not ready
> [ 104.940830] e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> [ 104.941081] ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> [ 143.313242] md: md_do_sync() got signal ... exiting
> [ 143.373125] md: md2 stopped.
> [ 143.373138] md: unbind<sdc2>
> [ 143.384082] md: export_rdev(sdc2)
> [ 143.384113] md: unbind<sdb2>
> [ 143.400081] md: export_rdev(sdb2)
> [ 143.400107] md: unbind<sdg2>
> [ 143.416081] md: export_rdev(sdg2)
> [ 143.416108] md: unbind<sdf2>
> [ 143.432080] md: export_rdev(sdf2)
> [ 143.432105] md: unbind<sde2>
> [ 143.448080] md: export_rdev(sde2)
> [ 143.448104] md: unbind<sdd2>
> [ 143.464081] md: export_rdev(sdd2)
> [ 143.464405] md2: detected capacity change from 34376515584 to 0
> [ 252.687538] md: md4 stopped.
> [ 252.690104] md: bind<sdd4>
> [ 252.690266] md: bind<sde4>
> [ 252.690415] md: bind<sdf4>
> [ 252.696210] md: bind<sdg4>
> [ 252.718353] md: bind<sdb4>
> [ 252.723594] md: bind<sdc4>
> [ 332.729180] md: md4 stopped.
> [ 332.729190] md: unbind<sdc4>
> [ 332.740090] md: export_rdev(sdc4)
> [ 332.740165] md: unbind<sdb4>
> [ 332.752030] md: export_rdev(sdb4)
> [ 332.752092] md: unbind<sdg4>
> [ 332.768081] md: export_rdev(sdg4)
> [ 332.768140] md: unbind<sdf4>
> [ 332.784081] md: export_rdev(sdf4)
> [ 332.784139] md: unbind<sde4>
> [ 332.800081] md: export_rdev(sde4)
> [ 332.800141] md: unbind<sdd4>
> [ 332.816089] md: export_rdev(sdd4)
> [ 556.983627] md: md4 stopped.
> [ 556.988921] md: bind<sdd4>
> [ 556.989094] md: bind<sde4>
> [ 556.989239] md: bind<sdf4>
> [ 556.989391] md: bind<sdg4>
> [ 556.989642] md: bind<sdb4>
> [ 556.989787] md: bind<sdc4>
>
>
> # mdadm -E /dev/sd[cd]4
>
> /dev/sdc4:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x4
> Array UUID : ec600d5d:00cc3fc7:862a4878:9d191184
> Name : citrouille:4
> Creation Time : Wed Sep 15 17:28:55 2010
> Raid Level : raid6
> Raid Devices : 6
>
> Avail Dev Size : 470429347 (224.32 GiB 240.86 GB)
> Array Size : 1881714688 (897.27 GiB 963.44 GB)
> Used Dev Size : 470428672 (224.32 GiB 240.86 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : e8ef9525:cfb44c96:b5e209ea:b307c619
>
> Reshape pos'n : 0
> New Layout : left-symmetric
>
> Update Time : Tue Oct 12 00:01:03 2010
> Checksum : 6dfe6f7b - correct
> Events : 97
>
> Layout : left-symmetric-6
> Chunk Size : 512K
>
> Device Role : Active device 0
> Array State : AAAAA. ('A' == active, '.' == missing)
> /dev/sdd4:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x4
> Array UUID : ec600d5d:00cc3fc7:862a4878:9d191184
> Name : citrouille:4
> Creation Time : Wed Sep 15 17:28:55 2010
> Raid Level : raid6
> Raid Devices : 6
>
> Avail Dev Size : 470429347 (224.32 GiB 240.86 GB)
> Array Size : 1881714688 (897.27 GiB 963.44 GB)
> Used Dev Size : 470428672 (224.32 GiB 240.86 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 08a661f3:eace7b1f:26fd20a8:ac0ae049
>
> Reshape pos'n : 0
> New Layout : left-symmetric
>
> Update Time : Tue Oct 12 00:01:03 2010
> Checksum : b32a5d21 - correct
> Events : 97
>
> Layout : left-symmetric-6
> Chunk Size : 512K
>
> Device Role : Active device 1
> Array State : AAAAA. ('A' == active, '.' == missing)
>
>
>
> # mdadm -V
> mdadm - v3.1.4 - 31st August 2010 - with Grow_restart always 0
> (hacked mdadm)
>
> # uname -a
> Linux citrouillerescue 2.6.32-5-amd64 #1 SMP Fri Sep 17 21:50:19 UTC 2010 x86_64 GNU/Linux
>
> # cat /etc/issue.net
> Debian GNU/Linux squeeze/sid
>
> Just in case, I posted the full output of mdadm -E /dev/sd?4 here : http://pastebin.com/zV6s2Npi
>
> Hope it helps.
>