* After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-03 6:49 UTC
To: linux-raid
Kernel: 2.6.24 i386
mdadm: 2.6.4
Hi,
I had a RAID5 array in the configuration 250/250/400/400 (so only
250/250/250/250 was actually being used).
After a partition rearrangement it became possible to increase the two
250G partitions to 400G. I did the following, once for each of the two
250G partitions:
mdadm --fail partition
mdadm --remove partition
cfdisk resize partition
mdadm --add partition
wait some hours for the rebuild to complete
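Concretely, for one of the two partitions that was a sequence along these
lines (a sketch; /dev/sda2 and /dev/md0 as in the --detail output below,
and mdadm --wait is just one way of waiting for the rebuild):
mdadm /dev/md0 --fail /dev/sda2
mdadm /dev/md0 --remove /dev/sda2
cfdisk /dev/sda         # grow partition 2, keeping the same start sector
mdadm /dev/md0 --add /dev/sda2
mdadm --wait /dev/md0   # or watch /proc/mdstat until the rebuild completes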
The new array was running fine. Here is its status:
champagne:/home/jules# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Tue Jan 30 21:28:07 2007
Raid Level : raid5
Array Size : 726732096 (693.07 GiB 744.17 GB)
Used Dev Size : 242244032 (231.02 GiB 248.06 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Jun 2 22:46:33 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 52252ae8:5d1fd858:31a51f4c:5ff55ddd
Events : 0.1638776
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 50 2 active sync /dev/sdd2
3 8 34 3 active sync /dev/sdc2
However, I was under the impression that --grow --size=max would grow it
up to the real limits of the partitions. That didn't work.
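(That is, an invocation of the form
mdadm --grow /dev/md0 --size=max
which should normally extend the per-device used size to fill the
underlying partitions.)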
Thinking something was cached in internal tables with incorrect
partition sizes, I rebooted the machine.
Bad idea :(
The RAID array failed to reconstruct. The boot messages said there
were only two working devices, not enough to start array.
/dev/sdd2 and /dev/sdc2 (which were the partitions I didn't touch) were
both there. /dev/sda2 and /dev/sdb2 didn't add.
I tried (mistake?) adding /dev/sda2 explicitly with --add but it added
as a spare, not as a proper member.
I tried assembling explicitly with --assemble /dev/sda2 /dev/sdb2
/dev/sdc2 /dev/sdd2 and it complained of no RAID superblock on
/dev/sdb2.
Help? What next ;) Is there enough information in /dev/sdd2 and
/dev/sdc2 to reconstruct the apparently missing superblocks on /dev/sda2
and /dev/sdb2? Do I need to try to resize my partitions back to their
old size so it can find the old superblock? Even if by adding /dev/sda2
as a spare I've corrupted its superblock entirely, sdb2 should still
have enough to save my array with 3 out of 4 devices?
Many thanks,
Jules
* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-03 21:19 UTC
To: linux-raid
Jules Bean wrote:
> Help? What next ;) Is there enough information in /dev/sdd2 and
> /dev/sdc2 to reconstruct the apparently missing superblocks on /dev/sda2
> and /dev/sdb2? Do I need to try to resize my partitions back to their
> old size so it can find the old superblock? Even if by adding /dev/sda2
> as a spare I've corrupted its superblock entirely, sdb2 should still
> have enough to save my array with 3 out of 4 devices?
I have become convinced (correct me if you think I'm wrong) that the
problem was cfdisk resizing the partitions but the kernel tables not
being updated.
Therefore although I thought the partitions were 400G, the kernel still
thought they were 250G, and presumably the raid subsystem used that figure.
So the raid subsystem probably recorded its superblock as if the
partitions were still only 250G long? So I ought to be able to find that
superblock again by resizing the partitions back?
Alas I didn't take precise notes of my old partition table (stupid
error). I have tried a couple of cylinder counts near 250G but no luck.
Is there any good way to 'search for' something which looks like a RAID
superblock?
Does the mdadm --detail output I pasted in my last message hold any clues?
Jules
* Re: After partition resize, RAID5 array does not assemble on boot
From: NeilBrown @ 2008-06-03 21:27 UTC
To: Jules Bean; +Cc: linux-raid
On Wed, June 4, 2008 7:19 am, Jules Bean wrote:
> Jules Bean wrote:
>> Help? What next ;) Is there enough information in /dev/sdd2 and
>> /dev/sdc2 to reconstruct the apparently missing superblocks on /dev/sda2
>> and /dev/sdb2? Do I need to try to resize my partitions back to their
>> old size so it can find the old superblock? Even if by adding /dev/sda2
>> as a spare I've corrupted its superblock entirely, sdb2 should still
>> have enough to save my array with 3 out of 4 devices?
>
> I have become convinced (correct me if you think I'm wrong) that the
> problem was cfdisk resizing the partitions but the kernel tables not
> being updated.
That sounds likely.... That really should get fixed one day!
>
> Therefore although I thought the partitions were 400G, the kernel still
> thought they were 250G, and presumably the raid subsystem used that
> figure.
>
> So the raid subsystem probably recorded its superblock as if the
> partitions were still only 250G long? So I ought to be able to find that
> superblock again by resizing the partitions back?
>
> Alas I didn't take precise notes of my old partition table (stupid
> error). I have tried a couple of cylinder counts near 250G but no luck.
> Is there any good way to 'search for' something which looks like a RAID
> superblock?
>
> Does the mdadm --detail output I pasted in my last message hold any clues?
>
Yes. Based on the "used device size", the smallest device was between
242244096K and 242244160K
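A sketch of where those bounds come from, given the 0.90 layout (used
size = device size rounded down to a 64K boundary, minus 64K for the
superblock):
  242244032 + 64  = 242244096K   smallest device that yields this used size
  242244032 + 128 = 242244160K   a device this large would yield a bigger used size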
Hopefully both of the smaller devices were the same size.
NeilBrown
* Re: After partition resize, RAID5 array does not assemble on boot
From: Peter Rabbitson @ 2008-06-03 21:46 UTC
To: Jules Bean; +Cc: linux-raid
Jules Bean wrote:
> Is there any good way to 'search for' something which looks like a RAID
> superblock?
>
The superblock itself starts with the "magic number" 0xA92B4EFC.
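One minimal way to look for it by hand (a sketch, assuming a 0.90
superblock and a little-endian machine, so the magic shows up on disk as
the bytes fc 4e 2b a9; /dev/sdb2 as the example component):
sectors=$(blockdev --getsz /dev/sdb2)   # partition size in 512-byte sectors
sb=$(( (sectors & ~127) - 128 ))        # 0.90 sb lives in the last 64K-aligned 64K block
dd if=/dev/sdb2 bs=512 skip=$sb count=1 2>/dev/null | od -An -N4 -tx1
Repeating that after each candidate resize would show whether the old
superblock has come back within reach.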
HTH
Peter
* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-04 6:31 UTC
To: NeilBrown; +Cc: linux-raid
NeilBrown wrote:
>> Does the mdadm --detail output I pasted in my last message hold any clues?
>>
>
> Yes. Based on the "used device size", the smallest device was between
> 242244096K and 242244160K
>
> Hopefully both of the smaller devices were the same size.
They were (the same size) but I couldn't find the superblock. I tried
several possible partition sizes around 242244032K (plus or minus a
cylinder or two) and no luck.
In the end I gritted my teeth and, following the advice in
http://joshuahayes.blogspot.com/2006/11/expand-existing-raid-5-array.html
I forced mdadm --create to recreate the array with the correct parameters:
mdadm --create /dev/md0 --chunk=64 --level=5 --layout=left-symmetric
--raid-devices=4 --size=242244032 /dev/sda2 /dev/sdb2 /dev/sdd2 /dev/sdc2
and after a reboot, everything came all the way up to multiuser (which
is significant, because /usr was on an LVM on this RAID partition) and
as far as I can see, everything is fine.
Phew!
As to where my superblock has gone, the only theory I have is that the
MD layer knew that my partitions were 400G large while the kernel was
convinced they were 250G large, so the md layer tried to write the
superblock at (approx) +400G, and the kernel refused to do that.
However, since the used dev size was only 242244032K, all my actual data
was safe, and recreating the superblocks with mdadm was all that was
needed.
The lesson here is always to reboot after changing partition sizes, unless
you have a tool which reliably flushes the kernel partition table cache
(partprobe?).
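For anyone hitting the same thing, a sketch of what that could look like
(using /dev/sda as the example disk; both commands can fail if any
partition on the disk is in use):
blockdev --rereadpt /dev/sda    # ask the kernel to re-read the partition table (BLKRRPART)
partprobe /dev/sda              # parted's equivalent
grep sda /proc/partitions       # check that the kernel's sizes now match the new table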
Thanks for the help,
Jules
* Re: After partition resize, RAID5 array does not assemble on boot
From: Peter Rabbitson @ 2008-06-04 6:36 UTC
To: Jules Bean; +Cc: NeilBrown, linux-raid
Jules Bean wrote:
> The lesson here is always to reboot after changing partition sizes, unless
> you have a tool which reliably flushes the kernel partition table cache
> (partprobe?).
>
hdparm -z ?
* Re: After partition resize, RAID5 array does not assemble on boot
From: David Greaves @ 2008-06-04 7:58 UTC
To: Jules Bean, NeilBrown; +Cc: linux-raid
Jules Bean wrote:
> As to where my superblock has gone, the only theory I have is that the
> MD layer knew that my partitions were 400G large while the kernel was
> convinced they were 250G large, so the md layer tried to write the
> superblock at (approx) +400G, and the kernel refused to do that.
I failed to do a similar grow operation recently and had to re-create.
I was using a 0.90 sb, which is stored at the end of the device.
I have no idea how this is supposed to work...
If I have sda1 at 250GB then the sb is at 250GB-d.
I'd like to stop the array, remove the partition, grow the partition to
400GB, and start the array.
This won't work because md won't find an sb at 400GB-d and so won't know
that it's an md component.
However, with a 1.1 or 1.2 sb I think it would work.
I tried using Michael Tokarev's mdsuper to pull the sb from the partition,
resize and then push it to the end of the new partition but that went wrong
somewhere.
I think the process should be:
1 stop array
2 mdadm --save-superblock=component.sb /dev/<component>
3 grow partition
4 mdadm --write-superblock=component.sb /dev/<component>
5 start array
6 grow array
7 grow fs
For sb 1.1 and 1.2, steps 2+4 should be no-ops.
in step 4 mdadm may want to call the reread pt ioctl (which is what blockdev
--rereadpt does)
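As a rough manual approximation of steps 2 and 4 for a 0.90 superblock
(a sketch only, with /dev/sdb2 standing in for the component; the offset
is the last 64K-aligned 64K block of the device):
sectors=$(blockdev --getsz /dev/sdb2)
sb=$(( (sectors & ~127) - 128 ))
dd if=/dev/sdb2 of=component.sb bs=512 skip=$sb count=128   # step 2: save the 64K sb block
# ... step 3: grow the partition, get the kernel to re-read the table ...
sectors=$(blockdev --getsz /dev/sdb2)
sb=$(( (sectors & ~127) - 128 ))
dd if=component.sb of=/dev/sdb2 bs=512 seek=$sb count=128   # step 4: write it back at the new end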
This approach, it seems to me, would avoid any reconstruction and would be a
'safer' way to grow the components.
If this sounds reasonable then I'd happily have a go at implementing
--save-superblock/--write-superblock
David
* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-04 8:30 UTC
To: David Greaves; +Cc: NeilBrown, linux-raid
David Greaves wrote:
> Jules Bean wrote:
>> As to where my superblock has gone, the only theory I have is that the
>> MD layer knew that my partitions were 400G large while the kernel was
>> convinced they were 250G large, so the md layer tried to write the
>> superblock at (approx) +400G, and the kernel refused to do that.
>
> I failed to do a similar grow operation recently and had to re-create.
>
> I was using a 0.90 sb, which is stored at the end of the device.
> I have no idea how this is supposed to work...
>
> If I have sda1 at 250GB then the sb is at 250GB-d.
> I'd like to stop the array, remove the partition, grow the partition to
> 400GB, and start the array.
> This won't work because md won't find an sb at 400GB-d and so won't know
> that it's an md component.
That doesn't matter.
Just add it as a fresh component. The old SB is irrelevant.
1. Fail component
2. remove component
3. resize partition
4. FORCE KERNEL TO NOTICE NEW SIZE (that's what I got wrong!). Reboot is
safest.
5. add component as new
6. watch as md layer rebuilds
If I hadn't screwed up step 4, I would have been fine. I have now done
step 4 correctly and grown my array to used dev size 400 (up from 250).
Of course this does assume your RAID level has the redundancy required
to remove a component (i.e. not RAID0).
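For completeness, the final grow once both components were at the new
size is along these lines; the pvresize step is an extra I'd add only
because, as here, the array holds an LVM PV that should also see the
extra space:
mdadm --grow /dev/md0 --size=max   # extend the per-device used size to the new partition size
pvresize /dev/md0                  # let the LVM physical volume grow into it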
> in step 4 mdadm may want to call the reread pt ioctl (which is what
> blockdev --rereadpt does)
Seems to me that whilst cfdisk makes no visible attempt, plain 'fdisk'
does try to call this ioctl but nonetheless it doesn't work if some
other partition on that disk is busy (e.g. involved in some other md
device, or mounted elsewhere). I saw messages to this effect whilst
experimenting with different partition sizes looking for my missing
superblock.
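A quick way to see whether a re-read actually took is to compare the
kernel's idea of a partition's size with what the on-disk table says (a
sketch, using sdb as the example disk):
blockdev --getsize64 /dev/sdb2   # the size the kernel is currently using, in bytes
fdisk -l /dev/sdb                # the size the on-disk partition table claims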
> This approach, it seems to me, would avoid any reconstruction and would be a
> 'safer' way to grow the components.
It would avoid reconstruction which is good for the impatient.
I don't really see that it's "safer" though. I would have thought it was
quicker but potentially less safe.
Jules
* Re: After partition resize, RAID5 array does not assemble on boot
From: David Greaves @ 2008-06-04 11:51 UTC
To: Jules Bean; +Cc: NeilBrown, linux-raid
Jules Bean wrote:
> David Greaves wrote:
> Just add it as a fresh component. The old SB is irrelevant.
>
> 1. Fail component
> 2. remove component
> 3. resize partition
> 4. FORCE KERNEL TO NOTICE NEW SIZE (that's what I got wrong!). Reboot is
> safest.
> 5. add component as new
> 6. watch as md layer rebuilds
And that's the bit that shouldn't be needed :)
Otherwise you're 'just' replacing devices one at a time which isn't very
interesting.
> Of course this does assume your RAID level has the redundancy required
> to remove a component (i.e. not RAID0).
Neil, would the sb-move approach support RAID0?
>> in step 4 mdadm may want to call the reread pt ioctl (which is what
>> blockdev --rereadpt does)
>
> Seems to me that whilst cfdisk makes no visible attempt, plain 'fdisk'
> does try to call this ioctl but nonetheless it doesn't work if some
> other partition on that disk is busy (e.g. involved in some other md
> device, or mounted elsewhere). I saw messages to this effect whilst
> experimenting with different partition sizes looking for my missing
> superblock.
I read (http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/4319.html)
that there are some ioctls that work even when some disk partitions are
still in use (provided there is no impact on them).
However, I couldn't see them in the block-layer ioctls or any code in
fs/partitions/check.c.
>> This approach, it seems to me, would avoid any reconstruction and
>> would be a
>> 'safer' way to grow the components.
>
> It would avoid reconstruction which is good for the impatient.
>
> I don't really see that it's "safer" though. I would have thought it was
> quicker but potentially less safe.
Avoiding a lot of time stress testing the disks in degraded mode isn't 'safer'?
David
* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-04 13:14 UTC
To: David Greaves; +Cc: NeilBrown, linux-raid
David Greaves wrote:
>> I don't really see that it's "safer" though. I would have thought it was
>> quicker but potentially less safe.
>
> Avoiding a lot of time stress testing the disks in degraded mode isn't 'safer'?
Stress testing the disks by an md rebuild is a feature! It increases
confidence that they work.
;)
Seriously, I understand your point now. Yes, a rebuild-free partition
resize would be a nice feature. So would a "help, please find my
superblock by exhaustive scanning" utility ;)
Jules
* Re: After partition resize, RAID5 array does not assemble on boot
From: Bill Davidsen @ 2008-06-06 13:52 UTC
To: Jules Bean; +Cc: David Greaves, NeilBrown, linux-raid
Jules Bean wrote:
> David Greaves wrote:
>>> I don't really see that it's "safer" though. I would have thought it
>>> was
>>> quicker but potentially less safe.
>>
>> Avoiding a lot of time stress testing the disks in degraded mode
>> isn't 'safer'?
>
> Stress testing the disks by an md rebuild is a feature! It increases
> confidence that they work.
>
> ;)
>
> Seriously, I understand your point now. Yes, a rebuild-free partition
> resize would be a nice feature. So would a "help, please find my
> superblock by exhaustive scanning" utility ;)
Since this code must work when a partition is added on a totally new
drive, and when the partition is grown DOWN from the low end, clearly
the default must be a rebuild. And running "repair" before doing this
stuff is a really good idea!
What is needed is to do something like assume-clean on the old data and
a sync on the new chunks. I don't see that there is a remotely safe way
to do that, currently, although if you were willing to be unsafe you
could remove a partition, grow it at the "top" end, and reassemble with
--assume-clean. Sprinkling with holy water first might be a good thing.
I'm just thinking out loud here; there are probably good reasons why
this wouldn't work.
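Purely as an illustration, not a recommendation (and noting that
--assume-clean is a --create option rather than an --assemble one): the
unsafe path would be a re-create over the grown partitions, mirroring the
parameters Jules used earlier, with device order and parameters matching
the original exactly:
mdadm --create /dev/md0 --assume-clean --chunk=64 --level=5 \
      --layout=left-symmetric --raid-devices=4 \
      /dev/sda2 /dev/sdb2 /dev/sdd2 /dev/sdc2
# no initial resync; parity on the newly added space is simply trusted,
# so a follow-up "echo repair > /sys/block/md0/md/sync_action" would be wise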
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck
* Re: After partition resize, RAID5 array does not assemble on boot
From: David Greaves @ 2008-06-06 14:42 UTC
To: Bill Davidsen; +Cc: Jules Bean, NeilBrown, linux-raid
Bill Davidsen wrote:
> What is needed is to do something like assume-clean on the old data and
> a sync on the new chunks. I don't see that there is a remotely safe way
> to do that,
I assumed that --grow --size=max on a v1.2 superblock would do exactly that...
(well, start a resync at the old partition-end location anyway).
David
* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-06 14:46 UTC
To: David Greaves; +Cc: Bill Davidsen, NeilBrown, linux-raid
David Greaves wrote:
> Bill Davidsen wrote:
>> What is needed is to do something like assume-clean on the old data and
>> a sync on the new chunks. I don't see that there is a remotely safe way
>> to do that,
>
> I assumed that --grow --size=max on a v1.2 superblock would do exactly that...
> (well, start a resync at the old partition-end location anyway).
I can confirm that when I did this on my array, it only resynced from
the old end. It "started" at 70% complete or so.
Jules
* Re: After partition resize, RAID5 array does not assemble on boot
From: Neil Brown @ 2008-06-12 3:59 UTC
To: David Greaves; +Cc: Jules Bean, linux-raid
On Wednesday June 4, david@dgreaves.com wrote:
> Jules Bean wrote:
> > David Greaves wrote:
> > Just add it as a fresh component. The old SB is irrelevant.
> >
> > 1. Fail component
> > 2. remove component
> > 3. resize partition
> > 4. FORCE KERNEL TO NOTICE NEW SIZE (that's what I got wrong!). Reboot is
> > safest.
> > 5. add component as new
>
>
> > 6. watch as md layer rebuilds
> And that's the bit that shouldn't be needed :)
>
> Otherwise you're 'just' replacing devices one at a time which isn't very
> interesting.
>
> > Of course this does assume your RAID level has the redundancy required
> > to remove a component (i.e. not RAID0).
> Neil, would the sb-move approach support RAID0?
Maybe. If the old partitions were all exactly the same size, and the
new ones are too.
If you start with different sized partitions, md/raid0 uses all the
space by having some stripes across fewer devices. So changing the
sizes of the partitions will confuse things.
>
> >> in step 4 mdadm may want to call the reread pt ioctl (which is what
> >> blockdev --rereadpt does)
> >
> > Seems to me that whilst cfdisk makes no visible attempt, plain 'fdisk'
> > does try to call this ioctl but nonetheless it doesn't work if some
> > other partition on that disk is busy (e.g. involved in some other md
> > device, or mounted elsewhere). I saw messages to this effect whilst
> > experimenting with different partition sizes looking for my missing
> > superblock.
> I read (http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/4319.html)
> that there are some ioctls that work even when some disk partitions are
> still in use (provided there is no impact on them).
> However, I couldn't see them in the block-layer ioctls or any code in
> fs/partitions/check.c.
linux/block/ioctl.c:
BLKPG_DEL_PARTITION
BLKPG_ADD_PARTITION
Only works if the partition being changed isn't in use, though.
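From userspace those are reachable through util-linux's delpart/addpart
helpers (if the installed util-linux ships them); a sketch with made-up
start/length values, in 512-byte sectors:
delpart /dev/sda 2                    # BLKPG_DEL_PARTITION for partition 2
addpart /dev/sda 2 497856 781422768   # BLKPG_ADD_PARTITION with the new geometry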
NeilBrown