linux-raid.vger.kernel.org archive mirror
* RAID6 reshape integer problem
@ 2014-01-03  9:30 Piotr Klimek
  2014-01-03 12:56 ` Phil Turmel
  0 siblings, 1 reply; 6+ messages in thread
From: Piotr Klimek @ 2014-01-03  9:30 UTC (permalink / raw)
  To: linux-raid

Hello,
I was trying to migrate from 5x3TB RAID5 to 6x3TB RAID6 with this command:

$ mdadm --grow /dev/md1 --level=6 --raid-devices=6 \
      --backup=/home/archiwum/r6.bak

After a couple of days the reshape stalled at block 2147483648. I have
found that this is a bug in mdadm: 2147483648 is 2^31, the point where a
signed 32-bit integer overflows (2^31 1-KiB blocks = 2 TiB into each
member device). The situation now looks like this:

$ uname -a
Linux backuper 2.6.32-5-686-bigmem #1 SMP Mon Sep 23 23:38:27 UTC 2013 i686 GNU/Linux

$ mdadm --version
mdadm - v3.1.4 - 31st August 2010

$ cat /proc/mdstat
md1 : active raid6 sdi1[6] sdj1[0] sdk1[5] sdc1[4] sdf1[3] sdh1[1]
      11721054208 blocks super 1.2 level 6, 512k chunk, algorithm 18 [6/5] [UUUUU_]
      [==============>......]  reshape = 73.2% (2147483648/2930263552) finish=18031577.4min speed=0K/sec

$ mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Mon Nov 25 11:41:34 2013
     Raid Level : raid6
     Array Size : 11721054208 (11178.07 GiB 12002.36 GB)
  Used Dev Size : 2930263552 (2794.52 GiB 3000.59 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Fri Jan  3 10:26:20 2014
          State : clean, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric-6
     Chunk Size : 512K

 Reshape Status : 73% complete
     New Layout : left-symmetric

           Name : backuper:1  (local to host backuper)
           UUID : 5fb40412:f3447bf7:f654d5f1:1bef1e1c
         Events : 4553396

    Number   Major   Minor   RaidDevice State
       0       8      145        0      active sync   /dev/sdj1
       1       8      113        1      active sync   /dev/sdh1
       3       8       81        2      active sync   /dev/sdf1
       4       8       33        3      active sync   /dev/sdc1
       5       8      161        4      active sync   /dev/sdk1
       6       8      129        5      spare rebuilding   /dev/sdi1

What should I do now? It's a production system; of course I have
backups, but I would like to avoid restoring 8TB of data from them. My
idea is to cancel the reshape and then redo it with a bigger chunk
size, or to upgrade mdadm and the kernel.
Thanks in advance for your help.




* Re: RAID6 reshape integer problem
  2014-01-03  9:30 RAID6 reshape integer problem Piotr Klimek
@ 2014-01-03 12:56 ` Phil Turmel
  2014-01-03 13:43   ` Piotr Klimek
       [not found]   ` <CAM8eUy0=YbC68vesaLPSp0RhL_2c8byVyROMGv2DdQ2_JSjYrg@mail.gmail.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Phil Turmel @ 2014-01-03 12:56 UTC (permalink / raw)
  To: Piotr Klimek, linux-raid

On 01/03/2014 04:30 AM, Piotr Klimek wrote:
> Hello,
> I was trying to migrate from 5x3TB RAID5 to 6x3TB RAID6 with this command:
> 
> $ mdadm --grow /dev/md1 --level=6 --raid-devices=6 \
>       --backup=/home/archiwum/r6.bak
> 
> After a couple of days the reshape stalled at block 2147483648. I have
> found that this is a bug in mdadm: 2147483648 is 2^31, the point where
> a signed 32-bit integer overflows. The situation now looks like this:

You'll certainly have to take production down long enough to let a newer
kernel finish the reshape.  Or you could be brave and replace the kernel
and let production continue through the balance of the reshape.
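
If you do swap the kernel in place, you can confirm the reshape has
picked up again with something simple like this (just a sanity check,
nothing more):

$ watch -n 60 cat /proc/mdstat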

HTH,

Phil


* Re: RAID6 reshape integer problem
  2014-01-03 12:56 ` Phil Turmel
@ 2014-01-03 13:43   ` Piotr Klimek
       [not found]   ` <CAM8eUy0=YbC68vesaLPSp0RhL_2c8byVyROMGv2DdQ2_JSjYrg@mail.gmail.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Piotr Klimek @ 2014-01-03 13:43 UTC (permalink / raw)
  Cc: linux-raid

2014/1/3 Phil Turmel <philip@turmel.org>

> > After a couple of days the reshape stalled at block 2147483648. I have
> > found that this is a bug in mdadm: 2147483648 is 2^31, the point where
> > a signed 32-bit integer overflows. The situation now looks like this:
>
> You'll certainly have to take production down long enough to let a newer
> kernel finish the reshape.  Or you could be brave and replace the kernel
> and let production continue through the balance of the reshape.
>

Is it safe to reboot during a reshape and boot up with a new kernel? I
can easily upgrade the kernel to 3.2.0.

-- 
Regards
Piotr Klimek


* Re: RAID6 reshape integer problem
       [not found]   ` <CAM8eUy0=YbC68vesaLPSp0RhL_2c8byVyROMGv2DdQ2_JSjYrg@mail.gmail.com>
@ 2014-01-03 17:07     ` Phil Turmel
  2014-01-08 11:19       ` Piotr Klimek
  0 siblings, 1 reply; 6+ messages in thread
From: Phil Turmel @ 2014-01-03 17:07 UTC (permalink / raw)
  To: Piotr Klimek, linux-raid

On 01/03/2014 08:34 AM, Piotr Klimek wrote:
> 2014/1/3 Phil Turmel <philip@turmel.org>
> 
>>
>>> After a couple of days the reshape stalled at block 2147483648. I
>>> have found that this is a bug in mdadm: 2147483648 is 2^31, the point
>>> where a signed 32-bit integer overflows. The situation now looks like
>>> this:
>>
>> You'll certainly have to take production down long enough to let a newer
>> kernel finish the reshape.  Or you could be brave and replace the kernel
>> and let production continue through the balance of the reshape.
>>
> 
> Is it safe to reboot during a reshape and boot up with a new kernel? I
> can easily upgrade the kernel to 3.2.0.
> 

It is supposed to be safe. With a clean shutdown, the superblock will
have a record of the reshape progress, and the reshape will continue
after re-assembly.

But you'll probably have to assemble manually to specify the location
of the critical-section backup file.  If your root FS is in this array,
you'll have to intervene in the initramfs.
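
Something along these lines, roughly -- untested, so treat it as a
sketch and substitute your actual member devices and backup path:

$ mdadm --assemble /dev/md1 --backup-file=/home/archiwum/r6.bak \
      /dev/sdc1 /dev/sdf1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1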

HTH,

Phil


* Re: RAID6 reshape integer problem
  2014-01-03 17:07     ` Phil Turmel
@ 2014-01-08 11:19       ` Piotr Klimek
  2014-01-08 16:32         ` joystick
  0 siblings, 1 reply; 6+ messages in thread
From: Piotr Klimek @ 2014-01-08 11:19 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

Hi,
Everything works fine now; all I had to do after the reboot was
assemble the array using the backup file and set the
MDADM_GROW_ALLOW_OLD=1 environment variable.
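
For the archives, what I ran was roughly this (from memory, so take it
as approximate):

$ export MDADM_GROW_ALLOW_OLD=1
$ mdadm --assemble /dev/md1 --backup-file=/home/archiwum/r6.bak \
      /dev/sdc1 /dev/sdf1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1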

Thanks for your help.

2014/1/3 Phil Turmel <philip@turmel.org>:
> On 01/03/2014 08:34 AM, Piotr Klimek wrote:
>> 2014/1/3 Phil Turmel <philip@turmel.org>
>>
>>>
>>>> After a couple of days the reshape stalled at block 2147483648. I
>>>> have found that this is a bug in mdadm: 2147483648 is 2^31, the
>>>> point where a signed 32-bit integer overflows. The situation now
>>>> looks like this:
>>>
>>> You'll certainly have to take production down long enough to let a newer
>>> kernel finish the reshape.  Or you could be brave and replace the kernel
>>> and let production continue through the balance of the reshape.
>>>
>>
>> Is it safe to reboot during a reshape and boot up with a new kernel?
>> I can easily upgrade the kernel to 3.2.0.
>>
>
> It is supposed to be safe. With a clean shutdown, the superblock will
> have a record of the reshape progress, and the reshape will continue
> after re-assembly.
>
> But you'll probably have to assemble manually to specify the location
> of the critical-section backup file.  If your root FS is in this
> array, you'll have to intervene in the initramfs.
>
> HTH,
>
> Phil



-- 
Regards
Piotr Klimek


* Re: RAID6 reshape integer problem
  2014-01-08 11:19       ` Piotr Klimek
@ 2014-01-08 16:32         ` joystick
  0 siblings, 0 replies; 6+ messages in thread
From: joystick @ 2014-01-08 16:32 UTC (permalink / raw)
  To: Piotr Klimek; +Cc: Phil Turmel, linux-raid

Had I seen this thread earlier... I would probably have responded that
no, AFAIR it is not safe to reboot while reshaping (!)

In fact I seem to recall that there was a very serious known problem
during pivot_root, which happens at the end of the initramfs. A reshape
in progress through that point would cause data loss, so the array
could only be assembled after that point: you had to modify the initrd
or use a live CD to be sure nothing would find and assemble the array
in the early stages of boot. This is what I seem to remember.

However, despite searching everywhere for details on this problem, I
can no longer find any reference to it.
Does anybody recall it?
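
If it helps anyone who hits this, the live-CD workaround would have
looked roughly like the following -- a sketch from memory, not
verified, with the backup path and member devices as placeholders:

$ mdadm --stop /dev/md1          # in case something assembled it early
$ MDADM_GROW_ALLOW_OLD=1 mdadm --assemble /dev/md1 \
      --backup-file=/path/to/backup.bak /dev/sdX1 /dev/sdY1 ...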

Thanks
J.


On 08/01/2014 12:19, Piotr Klimek wrote:
> Hi,
> Everything works fine now; all I had to do after the reboot was
> assemble the array using the backup file and set the
> MDADM_GROW_ALLOW_OLD=1 environment variable.
>
> Thanks for your help.
>
> 2014/1/3 Phil Turmel <philip@turmel.org>:
>> It is supposed to be safe. With a clean shutdown, the superblock will
>> have a record of the reshape progress, and the reshape will continue
>> after re-assembly.
>>
>> But you'll probably have to assemble manually to specify the location
>> of the critical-section backup file.  If your root FS is in this
>> array, you'll have to intervene in the initramfs.
>>
>> HTH,
>>
>> Phil
>
>


