linux-raid.vger.kernel.org archive mirror
* mystified by behaviour of mdadm raid5 -> raid0 conversion
@ 2012-11-07 11:47 Geoff Attwater
From: Geoff Attwater @ 2012-11-07 11:47 UTC (permalink / raw)
  To: linux-raid

I have a relatively unimportant home fileserver that uses an mdadm
raid5 across three 1TB partitions (on separate disks; one is 1.5 TB
and has another 500GB partition for other stuff). I wish to convert
it to raid10 across 4 1TB partitions by adding a fresh drive.

The mdadm man page, section *Grow Mode* states that it may

"convert between RAID1 and RAID5, between RAID5 and RAID6, between
RAID0, RAID4, and RAID5, and between RAID0 and RAID10 (in the near-2
mode)."

Direct conversion between RAID5 and RAID10 is not supported (mdadm
tells you so if you try it), so my plan was to do a three-stage
conversion:

 1. back everything up
 2. convert from 3-disk raid5 -> 2-disk raid0 (now with no redundancy,
 but it's backed up, so that's ok)
 3. convert the 2-disk raid0 -> 4-disk raid10

All of these have the same logical size (2TB). This is on an Ubuntu
12.10 system.
mdadm --version reports:
mdadm - v3.2.5 - 18th May 2012
uname -a reports:
Linux penguin 3.5.0-18-generic #29-Ubuntu SMP Fri Oct 19 10:26:51 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux
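
(In command form, the plan amounted to roughly the following. The new
drive's partition name /dev/sdd1 is just illustrative, and the step 3
invocation is only my reading of the man page, not something I had
verified:)

    # step 2: raid5 -> raid0 at the same 2TB logical size, freeing one partition
    mdadm --grow /dev/md0 --level=raid0 --raid-devices=2 \
        --backup-file=/media/newdisk/raid_to_0_backup
    # step 3: raid0 -> raid10 across 4 partitions
    mdadm --grow /dev/md0 --level=raid10 --raid-devices=4 \
        --add /dev/sdb1 /dev/sdd1   # whichever partition ends up freed, plus the new drive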

I searched around to see if anyone had followed this kind of procedure
before, but didn't find anything directly addressing exactly what I
was trying to do (I saw much more about raid0 -> raid5 conversions
while adding a device, and the like, but not much on going the other
way), so I proceeded based on what I understood from the man page and
other general material on mdadm raid reshaping.

For stage 2, I used the command

    mdadm --grow /dev/md0 --level=raid0 --raid-devices=2 \
        --backup-file=/media/newdisk/raid_to_0_backup

where the backup file is on another disk not in the array. I put the
--raid-devices=2 in to make it clear that what I was after was 2x1TB
disks in RAID0 plus one spare (the same logical size), rather than a
three-disk RAID0 with a larger 3TB logical size. Although, based on
Neil Brown's blog post at http://neil.brown.name/blog/20090817000931,
it seems the conversion generally operates by reshuffling data into an
array of equal logical size anyway, so that perhaps wasn't necessary.
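
To spell out the size arithmetic I was relying on:

    raid5, 3 x 1TB members: usable = (3 - 1) x 1TB = 2TB
    raid0, 2 x 1TB members: usable = 2 x 1TB       = 2TB

so the logical size stays the same and no filesystem resize should be
needed.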

This began a lengthy rebuild process that has now finished. However,
at the end of the process, with no visible error messages but
obviously a lot of data movement according to iostat, `mdadm --detail
/dev/md0` showed the array as *still raid5* with all disks used, and
the dmesg output contained these relevant lines:

    [93874.341429] md: reshape of RAID array md0
    [93874.341435] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
    [93874.341437] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
    [93874.341442] md: using 128k window, over a total of 976630272k.
    === snip misc unrelated stuff  ===
    [183629.064361] md: md0: reshape done.
    [183629.072722] RAID conf printout:
    [183629.072732]  --- level:5 rd:3 wd:3
    [183629.072738]  disk 0, o:1, dev:sda1
    [183629.072742]  disk 1, o:1, dev:sdc1
    [183629.072746]  disk 2, o:1, dev:sdb1
    [183629.088584] md/raid0:md0: raid5 must be degraded! Degraded disks: 0
    [183629.091657] md: md0: raid0 would not accept array

This I have trouble making sense of. The filesystem on /dev/md0 was
still mounted throughout and appeared fine. Unmounting /dev/md0 and
running `fsck.ext4 -n -f /dev/md0` to force an integrity check even
though the filesystem was marked clean (while avoiding any
modifications) showed no trouble with the actual data on the array
despite all the shenanigans.
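
For reference, the check was nothing more than:

    umount /dev/md0
    fsck.ext4 -n -f /dev/md0   # -f: check even though marked clean; -n: make no changes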

I rebooted the system, thinking perhaps the kernel wouldn't pick up
the new layout and show it in /proc/mdstat and the mdadm output until
then, but the array continued to report itself as raid5.

I googled the "would not accept array" message and came up with this
page: http://forums.gentoo.org/viewtopic-t-938092-start-0 (which
concerns trouble converting a 2-disk raid0 -> 3-disk raid5, though).
Right at the bottom of the first page of posts, a user GlenSom states:

> Though, I found the issue. If a raid0 is created with more than 1
> zone - reshaping is not supported. (If one partition is slightly
> larger than the others)

I do not know if that is the correct diagnosis in the case of their problem, but
I have checked my partition tables:

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1            2048  1953525167   976761560   fd  Linux RAID autodetect
    /dev/sda2      1953525168  2930277167   488376000   83  Linux

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1            2048  1953525167   976761560   fd  Linux RAID autodetect

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1            2048  1953525167   976761560   fd  Linux RAID autodetect

All of the partitions in the raid have precisely the same geometry, so
mismatches there should not be an issue.

Based on the paired `mdadm --detail` outputs on that forum post, I
noticed a before/after difference:
`Layout : parity-last` appears after the reshape. I checked my own
`mdadm --detail` output post-reshape (I'm afraid I did not save a
copy of the pre-reshape output) and it is there also.

I gather this means that the reshape has successfully juggled the
data around so that it is now laid out in what is basically a
RAID4-style arrangement, with one disk consisting entirely of parity,
instead of staggering the parity across the disks RAID5-style.
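
A rough picture of what I mean (purely illustrative, not real output;
D = data chunk, P = parity chunk):

    rotating-parity raid5             parity-last (raid4-like)
    sda1   sdc1   sdb1                sda1   sdc1   sdb1
    D0     D1     P                   D0     D1     P
    D2     P      D3                  D2     D3     P
    P      D4     D5                  D4     D5     P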

This means the array should be losslessly convertible to RAID0, with
no data motion, simply by 'reinterpreting' it as consisting of just
the first two disks and removing the parity disk (one might expect the
removed disk to turn up as a spare, as it would in, say, a RAID6 ->
RAID5 conversion, but since md RAID0 cannot have spares, because your
data is destroyed the moment a disk fails anyway, it wouldn't here).
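
(The quickest way I know of to check for that end state; the grep is
just one way to pick out the relevant fields:)

    mdadm --detail /dev/md0 | grep -E 'Raid Level|Raid Devices|Layout'
    cat /proc/mdstat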

However, it didn't actually do that (failing instead with the dmesg
output mentioned above). I have since tried the command

    mdadm --grow /dev/md0 --level=raid0

to finish the job. This returns `mdadm: failed to set raid disks` and
adds the following to the dmesg output:

    [ 4780.580972] md/raid:md0: reshape: not enough stripes.  Needed 512
    [ 4780.597961] md: couldn't update array info. -28

Further googling suggested an interaction with a small default stripe
cache size was causing the failure; see post 12 on page three of this
thread:

https://lkml.org/lkml/2006/7/7/325
> Yes. This is something I need to fix in the next mdadm. You need to
> tell md/raid5 to increase the size of the stripe cache before the grow
> can proceed. You can do this with
>
> echo 600 > /sys/block/md3/md/stripe_cache_size
>
> Then the --grow should work. The next mdadm will do this for you.
>
> NeilBrown

Anyway, `/sys/block/md0/md/stripe_cache_size` was 256, while the chunk
size was 512k as reported by mdadm.

Running `echo 16384 > /sys/block/md0/md/stripe_cache_size` and then
`mdadm --grow /dev/md0 --level=raid0`

once more, it was apparently happy and reported `raid_disks for
/dev/md0 set to 2`. Perhaps mdadm has not, in fact, been patched to
auto-increase the stripe_cache_size yet?

(NOTE: I believe that way back, before the initial resync, the
stripe_cache_size may already have been manually increased after
booting to a value larger than 512, so it may or may not have been an
issue then; but after a reboot it resets to 256, as the setting is not
persistent.)
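
For anyone else hitting this, the sequence that got the grow command
accepted for me, collected in one place (16384 is just a generously
large value I picked, not a recommendation):

    echo 16384 > /sys/block/md0/md/stripe_cache_size
    mdadm --grow /dev/md0 --level=raid0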

However, `mdadm --detail` still reports the array as raid5,
parity-last and containing 3 active disks.

Running `mdadm --grow /dev/md0 --level=raid0 --raid-devices=2` in a
perhaps superstitious attempt to emphasise that only two devices are
wanted gives exactly the same message and no visible change in the
result (still raid5, three active drives, albeit with `Layout :
parity-last`).

So this is where I stand, with a raid4/5 that doesn't seem to want to
turn into a raid0.

I guess at this point that something like failing the parity-only disk
and assembling a raid0 with assumed geometry from the other two disks
might be necessary: some way or other to make the md system carry out
the final reinterpretation step correctly. But after reading the man
page, scanning through this list, and reading many serverfault
questions tagged mdadm, it is unclear to me how to do this correctly.

cat /proc/mdstat:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sda1[0] sdc1[1] sdb1[3]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 5 [3/3] [UUU]

unused devices: <none>

mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Sep 20 15:27:14 2012
     Raid Level : raid5
     Array Size : 1953260544 (1862.77 GiB 2000.14 GB)
  Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Wed Nov  7 21:25:04 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : parity-last
     Chunk Size : 512K

           Name : penguin:0  (local to host penguin)
           UUID : a881a285:2e5d5ed0:cadf3ad1:ea423f6f
         Events : 650048

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       33        1      active sync   /dev/sdc1
       3       8       17        2      active sync   /dev/sdb1


I have backups of everything I need to keep, so I can just kill the
thing and rebuild it in a new config without doing online reshaping,
but that's not what I'm worried about at this point so much as
understanding what is going on.

In particular, it seems to me the first command was pretty clear and
should either have just worked, or said something informative about
why it couldn't do it or why it didn't make sense in that form, rather
than crunching through the whole thing and then leaving me with an
array that was *not actually raid0* although I said *lo, make the raid
level turn into raid0*.

I'm stumped at this point, after reading a bunch of online
documentation and discussion about mdadm and trying to figure out what
to do next; I didn't want to waste your time with a question I could
answer just by researching the web. Anyway, sorry about the length;
I've tried to keep it relevant and to the point. Oh yes, and thanks to
all those
responsible for the md system: it's been working very nicely up to
this point and I do appreciate your hard work and recognise that you
aren't obliged to offer random people technical support.

- Geoff Attwater


* Re: mystified by behaviour of mdadm raid5 -> raid0 conversion
From: NeilBrown @ 2012-11-07 22:00 UTC (permalink / raw)
  To: Geoff Attwater; +Cc: linux-raid


On Wed, 7 Nov 2012 22:47:20 +1100 Geoff Attwater <geoffwater@gmail.com> wrote:

> [...]
>
> This began a lengthy rebuild process that has now finished. However,
> at the end of the process, after no visible error messages and
> obviously a lot of data movement seen via iostat, `mdadm --detail
> /dev/md0` showed the array as *still raid5* with all disks used, and
> the dmesg output contained these relevant lines:
> 
>     [93874.341429] md: reshape of RAID array md0
>     [93874.341435] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>     [93874.341437] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
>     [93874.341442] md: using 128k window, over a total of 976630272k.
>     === snip misc unrelated stuff  ===
>     [183629.064361] md: md0: reshape done.
>     [183629.072722] RAID conf printout:
>     [183629.072732]  --- level:5 rd:3 wd:3
>     [183629.072738]  disk 0, o:1, dev:sda1
>     [183629.072742]  disk 1, o:1, dev:sdc1
>     [183629.072746]  disk 2, o:1, dev:sdb1
>     [183629.088584] md/raid0:md0: raid5 must be degraded! Degraded disks: 0
>     [183629.091657] md: md0: raid0 would not accept array

These last two are the interesting messages.
The raid0 module in the kernel will only accept a raid5 for conversion if it
is in 'parity-last' layout and is degraded.  But it isn't.

mdadm should fail and remove the 'parity' disk before trying to convert to
raid0, but it doesn't.
I guess I never tested it - and untested code is buggy code!

You should be able to finish the task manually.
 - fail the last (parity) device
 - remove that device
 - echo raid0 > /sys/block/md0/md/level

So:
  mdadm /dev/md0 -f /dev/sdb1
  mdadm /dev/md0 -r /dev/sdb1
  echo raid0 > /sys/block/md0/md/level

However, you should double-check that 'sdb1' is the correct device.  Look in
the output of 'mdadm -D' and see what raid device number '2' is.
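
For example (just one way to pull out the device table):

    mdadm -D /dev/md0 | grep -A4 'RaidDevice State'

The member whose RaidDevice column reads '2' is the one to fail and remove.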

I'll add this to my list of things to fix.

Thanks,

NeilBrown



* Re: mystified by behaviour of mdadm raid5 -> raid0 conversion
From: Geoff Attwater @ 2012-11-10 13:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thanks for the speedy and informative reply. The fail, remove,
echo raid0 > /sys/block/md0/md/level sequence worked perfectly to turn
the parity-last raid5 into raid0 (the parity disk *was* /dev/sdb, as
the array was built in that order in the first place). By this time I
had already concluded that raid10 was kind of nuts for my purposes
anyway, as raid6 is more resilient and easier to resize and I don't
need speed. So I converted it straight back to raid5, then to raid6,
and all of that was easy and went fine.
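
(Roughly the shape of the commands involved; I haven't kept the exact
invocations, the new drive's partition is called /dev/sdd1 here for
illustration, and mdadm may want a --backup-file for one or both
steps:)

    mdadm --grow /dev/md0 --level=raid5 --raid-devices=3 --add /dev/sdb1
    mdadm --grow /dev/md0 --level=raid6 --raid-devices=4 --add /dev/sdd1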

I did notice, however, that mdadm doesn't quite seem to convert raid0
-> raid10 as I expected either. I didn't try very hard at the time, as
I'd already thought better of it, but I did some tests in a VM just now.

Taking a raid0 with disks /dev/vda and /dev/vdb, running

    mdadm /dev/md0 --grow --level=10 --add /dev/vdc /dev/vdd

states: "mdadm: Need 2 spares to create working array, and only have 0."
Running

    mdadm /dev/md0 --grow --level=10

(which hypothetically one might expect to promote it to an array with
two missing drives, if it worked) gives: "mdadm: can only add devices
to linear arrays"

There seems to be a chicken-and-egg problem where raid0 arrays can't
have spares (as they can never be useful in normal usage), but mdadm
doesn't want to flip the mode until it has extra disks to work with.
The man page does seem to imply that this should work.

However, running

    echo raid10 > /sys/block/md0/md/level

by analogy with before actually did mutate the array into one with two
missing devices, and

    mdadm /dev/md0 --add /dev/vdc /dev/vdd

then worked fine to extend it.

I'm not very familiar with the whole mdadm system, so I might just be
using it wrong. But perhaps the raid0 -> raid10 path has an issue
also?

A third possible issue I noted before, but which might have been lost
in the middle of my last rambling message (where I mentioned anything
I thought could possibly relate), is this: the grow sometimes requires
a stripe_cache_size boost to proceed, and reports "mdadm: failed to
set raid disks", which is kind of confusing unless you check dmesg.

Anyway, it seems to me that a direct conversion to raid6 is a great
deal saner than what I was originally trying to do, so all of this is
probably the kind of thing that only gets tested because some newbie
is trying to do weird things that don't make much sense.

Thanks again for clearing this up, and I must say, I'm impressed by md
and all the mutations this array has undergone, online, without losing
any data even if it got stuck. It's really nifty.

- Geoff.

On Thu, Nov 8, 2012 at 9:00 AM, NeilBrown <neilb@suse.de> wrote:
> [...]

