From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Burgess
Subject: Re: reshape changing chunk size won't restart
Date: Tue, 21 Dec 2010 18:09:46 -0800
Message-ID: <1292983786.5543.1@athlon>
References: <20101222120810.5bba5304@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; DelSp=Yes; Format=Flowed
Content-Transfer-Encoding: 8BIT
Return-path:
In-Reply-To: <20101222120810.5bba5304@notabene.brown> (from neilb@suse.de on Tue Dec 21 17:08:10 2010)
Content-Disposition: inline
Sender: linux-raid-owner@vger.kernel.org
To: linux raid mailing list
List-Id: linux-raid.ids

On 12/21/2010 05:08:10 PM, Neil Brown wrote:
> On Tue, 21 Dec 2010 16:09:59 -0800 Andrew Burgess wrote:
>
> > On 12/21/2010 02:16:19 PM, Neil Brown wrote:
> >
> > > > I started a reshape changing chunk size and after it ran
> > > > for a while i realized the disk i used for the
> > > > backup file was slow so I killed the mdadm
> > >
> > > That was a mistake.
> >
> > Its looking to be a bad one
> >
> > > > running in the background and tried to restart
> > > > with the new location (i moved the file just in case)
> > > >
> > > > mdadm /dev/md5 --grow --chunk=8 --backup-file=/my/raid/RAID_BACKUP_FILE
> > >
> > > As you discovered, that doesn't work. I'd like to make it possible to do
> > > something like that, but time is not something I have a lot of.
> >
> > Understand 100%
> >
> > > > I didn't try rebooting as the filesystem is mounted and
> > > > the data seems ok. Didn't want to make things worse...
> > >
> > > It shouldn't make things worse.
> >
> > I had too because umount wouldn't and neither fuser nor lsof
> > could find the guilty party
> >
> > > Do don't need to reboot, unless md5 has your root filesystem.
> > > Just unmount, 'mdadm -S /dev/md5', and assemble:
> > >   mdadm -A /dev/md5 --backup-file=/whereever-you-copied-the-file-to \
> > >         /dev/sd[dfcbhljgk]1
> > >
> > > should do it.
> >
> > After rebooting something happened to sdg1:
> >
> > mdadm -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE
> > /dev/sd[dfcbhljgk]1
> > mdadm: cannot open device /dev/sdg1: No such device or address
> > mdadm: /dev/sdg1 has no superblock - assembly aborted
> >
> > so i tried it with sdg1 missing
> >
> > mdadm -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE
> > /dev/sd[dfcbhljk]1
> > mdadm: Failed to restore critical section for reshape, sorry.
> >
> > so i rebooted and power cycled hoping to get sdg1 back but it was
> > still unhappy with the superblock
> >
> > I even tried it letting it scan for devices:
> >
> > mdadm -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE
> > mdadm: WARNING /dev/sdg1 and /dev/sdg appear to have very similar superblocks.
> >       If they are really different, please --zero the superblock on one
> >       If they are the same or overlap, please remove one from the
> >       DEVICE list in mdadm.conf.
> >
> > so repeating with all but sdg1 specified it results in:
> >
> > mdadm: Failed to restore critical section for reshape, sorry.
> >
> > Anything else I can try? We do have the sector it was on in the original
> > email when it stopped: (2715648/1953511936)
>
> The business with sdg1 is a bit odd... I would use "--examine" to check each
> device and make sure they have good matching superblocks. It would be a lot
> better if you can make sure all devices get included when you start the array.

all the working devices have the same Reshape pos'n value in the
superblock.

sdg1 though:

mdadm -E /dev/sdg1
mdadm: cannot open /dev/sdg1: No such device or address

even though:

ls -l /dev/sdg*
brw-rw---- 1 root disk 8, 96 Dec 21 15:53 /dev/sdg
brw-rw---- 1 root disk 8, 97 Dec 21 15:55 /dev/sdg1

and the partition table looks ok. sdg is brand new but there are no
i/o errors in the log

> Also, try starting with '--verbose', it might give some useful information,
> but I don't hold out a lot of hope.

unless old timestamp is helpful:

mdadm --verbose -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE /dev/sd[dfcbhljk]1
mdadm: looking for devices for /dev/md5
mdadm: /dev/sdb1 is identified as a member of /dev/md5, slot 0.
mdadm: /dev/sdc1 is identified as a member of /dev/md5, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/md5, slot 2.
mdadm: /dev/sdf1 is identified as a member of /dev/md5, slot 8.
mdadm: /dev/sdh1 is identified as a member of /dev/md5, slot 4.
mdadm: /dev/sdj1 is identified as a member of /dev/md5, slot 3.
mdadm: /dev/sdk1 is identified as a member of /dev/md5, slot 6.
mdadm: /dev/sdl1 is identified as a member of /dev/md5, slot 5.
mdadm:/dev/md5 has an active reshape - checking if critical section needs to be restored
mdadm: too-old timestamp on backup-metadata on /my/raid/RAID_BACKUP_FILE
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.

> Finally, you will probably end up having to modify mdadm so that it ignores a
> failure from Grow_restart. AS you had a reasonably clean shutdown rather
> than a crash, there is a good chance that the backup file isn't actually
> needed.

If the timestamp info above doesn't change your mind then I'll try that.

> The next release of mdadm will have a --invalid-backup option to --assemble
> to tell it to just continue even though the backup file looks wrong.

Hope to send you a patch for that.

Thanks for your time!
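
P.S. For anyone following along, here is a rough sketch of the --examine check mentioned above, comparing the Reshape pos'n value across members. It assumes the --examine output has a line starting with "Reshape pos'n", then a colon and a number. The function names reshape_pos and check_members are made up for illustration and are not part of mdadm.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not mdadm features.

# Read `mdadm --examine` output on stdin and print the numeric
# reshape position, if a "Reshape pos'n : N" line is present.
reshape_pos() {
    sed -n "s/^ *Reshape pos'n *: *\([0-9][0-9]*\).*/\1/p"
}

# Print "device<TAB>position" for each member given as an argument.
# A device that cannot be examined is reported as "unreadable".
check_members() {
    for dev in "$@"; do
        pos=$(mdadm --examine "$dev" 2>/dev/null | reshape_pos)
        printf '%s\t%s\n' "$dev" "${pos:-unreadable}"
    done
}

# Example (needs root and the real devices):
#   check_members /dev/sd[dfcbhljgk]1
```

If every readable member prints the same number, the superblocks agree on where the reshape stopped. A member that shows up as "unreadable" (sdg1 here) is the one to sort out before assembling.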