linux-raid.vger.kernel.org archive mirror
* Interrupted reshape -- mangled backup ?
@ 2012-10-17 21:34 Haakon Alstadheim
From: Haakon Alstadheim @ 2012-10-17 21:34 UTC (permalink / raw)
  To: linux-raid

I have a RAID5 array with 4 devices that I wanted to see if I could get 
better performance out of, so I tried changing the chunk size from 64K 
to something bigger (famous last words). I got into some other trouble 
and thought I needed a reboot. On reboot I managed several times to 
mount and specify the device with my backup file during initramfs, but 
the reshape stopped every time once the system had fully initialized.
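
For reference, the grow command was presumably along these lines; the 
chunk size is reconstructed from the 131072-byte "New Chunksize" in the 
-E output below, not from my shell history:
-------
# reconstruction, not a paste: --chunk is in KiB, so 128 matches the
# 131072 bytes reported by mdadm -E; a backup file is required when
# reshaping the chunk size
mdadm --grow /dev/md1 --chunk=128 --backup-file=/dev/bak/md1-backup
-------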

This is under Debian Squeeze with a 3.2.0-0.bpo.3-686-pae kernel from 
backports. I installed mdadm from backports as well, to get the latest 
version, and tried rebooting with --freeze-reshape. I suspect that I 
mixed up my initrd.img files and started without --freeze-reshape the 
first time after installing the new mdadm. Now mdadm says it cannot 
find a backup in my backup file. Opening the backup in emacs, it seems 
to contain only NULs. That can't be right, can it? I have been mounting 
the backup under a directory under /dev/, on the assumption that the 
mount would survive past the initramfs stage.
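
For context, the backup filesystem gets mounted in the initramfs script 
roughly like this (the device name is a placeholder, not my actual disk):
-------
# sketch: /dev/XXX stands for whatever device holds md1-backup; /dev is
# carried over when the real root is mounted, so a mount under it
# should persist past the initramfs stage
mkdir -p /dev/bak
mount /dev/XXX /dev/bak
-------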

My bumbling has been happening with a current, correct 
/etc/mdadm/mdadm.conf containing:
--------
DEVICE /dev/sdh /dev/sde /dev/sdc /dev/sdd
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
ARRAY /dev/md1 level=raid5 num-devices=4 UUID=583001c4:650dcf0c:404aaa6f:7fc38959 spare-group=main
-------
The show-stopper happened with an initramfs and a script in 
/scripts/local-top/mdadm along the lines of:
-------
/sbin/mdadm --assemble -f --backup-file=/dev/bak/md1-backup /dev/md1 \
    --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
-------

At times I have also had to use the environment variable MDADM_GROW_ALLOW_OLD=1.
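In practice that means prefixing the assemble command with it, something 
like this sketch:
-------
# sketch: MDADM_GROW_ALLOW_OLD=1 tells mdadm to accept a backup file
# whose timestamps no longer match the array metadata
MDADM_GROW_ALLOW_OLD=1 /sbin/mdadm --assemble -f \
    --backup-file=/dev/bak/md1-backup /dev/md1 \
    --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
-------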

Below is the output of mdadm -Evvvvs:
--------


/dev/sdh:
           Magic : a92b4efc
         Version : 0.91.00
            UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
   Creation Time : Wed Dec  3 19:45:33 2008
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 1

   Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
   New Chunksize : 131072

     Update Time : Wed Oct 17 02:15:53 2012
           State : active
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0
        Checksum : 14da0760 - correct
          Events : 778795

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     0       8      112        0      active sync   /dev/sdh

    0     0       8      112        0      active sync   /dev/sdh
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       32        2      active sync   /dev/sdc
    3     3       8       64        3      active sync   /dev/sde
/dev/sde:
           Magic : a92b4efc
         Version : 0.91.00
            UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
   Creation Time : Wed Dec  3 19:45:33 2008
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 1

   Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
   New Chunksize : 131072

     Update Time : Wed Oct 17 02:15:53 2012
           State : active
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0
        Checksum : 14da0736 - correct
          Events : 778795

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     3       8       64        3      active sync   /dev/sde

    0     0       8      112        0      active sync   /dev/sdh
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       32        2      active sync   /dev/sdc
    3     3       8       64        3      active sync   /dev/sde
/dev/sdc:
           Magic : a92b4efc
         Version : 0.91.00
            UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
   Creation Time : Wed Dec  3 19:45:33 2008
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 1

   Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
   New Chunksize : 131072

     Update Time : Wed Oct 17 02:15:53 2012
           State : active
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0
        Checksum : 14da0714 - correct
          Events : 778795

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     2       8       32        2      active sync   /dev/sdc

    0     0       8      112        0      active sync   /dev/sdh
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       32        2      active sync   /dev/sdc
    3     3       8       64        3      active sync   /dev/sde
/dev/sdd:
           Magic : a92b4efc
         Version : 0.91.00
            UUID : 583001c4:650dcf0c:404aaa6f:7fc38959
   Creation Time : Wed Dec  3 19:45:33 2008
      Raid Level : raid5
   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
      Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 1

   Reshape pos'n : 2368561152 (2258.84 GiB 2425.41 GB)
   New Chunksize : 131072

     Update Time : Wed Oct 17 02:15:53 2012
           State : active
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0
        Checksum : 14da0722 - correct
          Events : 778795

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     1       8       48        1      active sync   /dev/sdd

    0     0       8      112        0      active sync   /dev/sdh
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       32        2      active sync   /dev/sdc
    3     3       8       64        3      active sync   /dev/sde
---------------------------

I guess the moral of all this is that if you want to use mdadm you 
should pay attention and not be in too much of a hurry :-/.
I'm just hoping that I can get my system back. This RAID contains my 
entire system and will take a LOT of work to recreate: mail, calendars 
... Backups are a couple of weeks old ...

* Re: Interrupted reshape -- mangled backup ?
@ 2012-10-18 14:08 Haakon Alstadheim
From: Haakon Alstadheim @ 2012-10-18 14:08 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

(I'm copying to the list as well)

On 18 Oct 2012 00:33, NeilBrown wrote:
> On Wed, 17 Oct 2012 23:34:26 +0200 Haakon Alstadheim
> <hakon.alstadheim@gmail.com>  wrote:
>
>> I have a RAID5 array with 4 devices that I wanted to see if I could get
>> better performance out of, so I tried changing the chunk size from 64K
>> to something bigger (famous last words). I got into some other trouble
>> and thought I needed a reboot. On reboot I managed several times to
>> mount and specify the device with my backup file during initramfs, but
>> the reshape stopped every time once the system had fully initialized.
> So worst-case you can do that again, but insert a "sleep 365d" immediately
> after the "mdadm --assemble" is run, so the system never completely
> initialises.  Then just wait for the reshape to finish.
Yes, well, I need to keep my mail server running :-). A couple of hours 
of down-time each night is acceptable, though.
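
If I do end up going that route, the assemble stanza in my script would 
presumably become something like:
-------
# sketch of Neil's suggestion: assemble as before, then park the boot
# so the background reshape monitor is never killed
/sbin/mdadm --assemble -f --backup-file=/dev/bak/md1-backup /dev/md1 \
    --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
sleep 365d    # effectively pause forever; the reshape finishes long before this
-------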

>
> When mdadm assembles an array that needs to keep growing, it will fork a
> background process to continue monitoring the reshape process.  Presumably
> that background process is getting killed.  I don't know why.
>
>> This is under Debian Squeeze with a 3.2.0-0.bpo.3-686-pae kernel from
>> backports. I installed mdadm from backports as well, to get the latest
>> version, and tried rebooting with --freeze-reshape. I suspect that I
>> mixed up my initrd.img files and started without --freeze-reshape the
>> first time after installing the new mdadm. Now mdadm says it cannot
>> find a backup in my backup file. Opening the backup in emacs, it seems
>> to contain only NULs. That can't be right, can it? I have been mounting
>> the backup under a directory under /dev/, on the assumption that the
>> mount would survive past the initramfs stage.
> The backup file could certainly contain lots of nuls, but it shouldn't be
> *all* nulls.
I checked again; isearch-forward-regexp for [^^@] in emacs gives no 
hits. (^@ is the emacs way of displaying ASCII NUL.)
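The same check from the shell, for the record (prints 0 when the file 
really is all NULs):
-------
# count the non-NUL bytes in the backup file; 0 means it is all zeros
tr -d '\0' < /dev/bak/md1-backup | wc -c
-------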
>    At least there should be a header at the start which describes
> which area of the device is contained in the backup.
>
> You can continue without a backup.  You still need to specify a backup file,
> but if you add "--invalid-backup", it will continue even if the backup file
> doesn't contain anything useful.
Thanks! The device is now running again. These switches are hard to 
google for, especially when you are a bit stressed :-)
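For the record, the invocation that got it going was along these lines 
(retyped from memory, not a paste):
-------
# same assemble line as in the initramfs script, plus --invalid-backup
# so the zeroed-out backup file is tolerated
MDADM_GROW_ALLOW_OLD=1 /sbin/mdadm --assemble -f --invalid-backup \
    --backup-file=/dev/bak/md1-backup /dev/md1 \
    --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
-------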
> If the machine was shut down by a crash during reshape you might suffer
> corruption.  If it was a clean shutdown you won't.

No corruption yet (last reboot was without fsck, though).

>
> --freeze-reshape is intended to be the way to handle this, with
>     --grow --continue
> once you are fully up and running, but I don't think that works correctly for
> 'native' metadata yet - it was implemented with IMSM metadata in mind.
>
> NeilBrown
You are right: mdadm segfaults when I try to run mdadm --grow --continue 
--backup-file=/dev/bak/md1-backup /dev/md1.

We'll see tonight at 04:15 whether my custom initramfs script can make 
some progress on the reshape, and then continue booting without messing 
up the backup file.

In my initramfs script, should I do the following?
1. Start the array
2. Sleep a while
3. Stop the array
4. Start with --freeze-reshape
5. Continue boot

... or is it just as well to live with the timestamps getting out of 
sync when the reshape dies, i.e. skip the stop & restart bit? (Sketch 
of the script below.)
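
Spelled out, the script body would be roughly this sketch (untested):
-------
# 1-2: assemble with the backup file and let the reshape run for a while
/sbin/mdadm --assemble -f --backup-file=/dev/bak/md1-backup /dev/md1 \
    --run --auto=yes /dev/sdh /dev/sde /dev/sdc /dev/sdd
sleep 7200                   # 2: give the reshape a couple of hours
/sbin/mdadm --stop /dev/md1  # 3: stop cleanly, recording the reshape position
# 4: reassemble frozen so no monitor process is needed during boot
/sbin/mdadm --assemble --freeze-reshape --backup-file=/dev/bak/md1-backup \
    /dev/md1 /dev/sdh /dev/sde /dev/sdc /dev/sdd
# 5: fall through and continue the normal boot
-------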


