* Failed to find backup of critical section
From: Nathan Shearer @ 2013-09-01 8:56 UTC
To: linux-raid
Hi, I've run into a problem recovering my array from a server power
failure. I'll try to keep it short, so here is the sequence of events:
1. Running a healthy 4-disk RAID5 array (on server-01).
2. Added a 5th drive and grew the array to a 5-disk RAID6 array (backup
file stored on a separate RAID1 array on other disks).
3. Grow begins and passes the critical section, gets to ~15% complete,
and power to the server fails.
4. I then move all 5 drives to the backup server. The RAID5/6 array
assembles and the grow continues (without the backup file, since it's on
server-01).
5. I begin copying data off of that array onto a separate array --
filesystem and data are consistent :)
6. Power restored to server-01
7. Safely stop the growing array with mdadm --stop
8. Move 5 drives back into server-01
9. Attempt mdadm --assemble and I get:
    # mdadm --assemble /dev/md9
    mdadm: Failed to restore critical section for reshape, sorry.
           Possibly you needed to specify the --backup-file
10. Attempt with the original backup file:
    # mdadm --assemble /dev/md9 --backup-file /mnt/temp/raid-reshape-backup-file
    mdadm: Failed to restore critical section for reshape, sorry.
So when I enable --verbose I get:
    mdadm:/dev/md9 has an active reshape - checking if critical section needs to be restored
    mdadm: Failed to find backup of critical section
    mdadm: Failed to restore critical section for reshape, sorry.
           Possibly you needed to specify the --backup-file
When I provide the backup file I get:
    mdadm:/dev/md9 has an active reshape - checking if critical section needs to be restored
    mdadm: too-old timestamp on backup-metadata on /mnt/temp/raid-reshape-backup-file
    mdadm: Failed to find backup of critical section
    mdadm: Failed to restore critical section for reshape, sorry.
When I tell it to use the "old" backup file I get:
    # export MDADM_GROW_ALLOW_OLD=1
    # mdadm --assemble /dev/md9 -vv --backup-file /mnt/temp/raid-reshape-backup-file
    mdadm:/dev/md9 has an active reshape - checking if critical section needs to be restored
    mdadm: accepting backup with timestamp 1377794387 for array with timestamp 1377904444
    mdadm: backup-metadata found on /mnt/temp/raid-reshape-backup-file but is not needed
    mdadm: Failed to find backup of critical section
    mdadm: Failed to restore critical section for reshape, sorry.
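For anyone reading the two timestamps in the "accepting backup" message, here is a minimal Python sketch that converts them to UTC and reports the gap between them. It assumes both values are Unix epoch seconds; the function name `describe_gap` is hypothetical and not part of mdadm, and the sketch says nothing about the threshold mdadm itself uses to decide that a backup is "too old".

```python
from datetime import datetime, timezone


def describe_gap(backup_ts: int, array_ts: int):
    """Convert two epoch-second timestamps to UTC and return (backup, array, gap).

    Assumption: both values are seconds since the Unix epoch, as the
    numbers printed in the mdadm message above appear to be.
    """
    backup = datetime.fromtimestamp(backup_ts, tz=timezone.utc)
    array = datetime.fromtimestamp(array_ts, tz=timezone.utc)
    return backup, array, array - backup


if __name__ == "__main__":
    # Values copied from the mdadm output quoted above.
    backup, array, gap = describe_gap(1377794387, 1377904444)
    print("backup metadata:", backup.isoformat())
    print("array metadata: ", array.isoformat())
    print("array is newer by:", gap)
```

Run on the values above, this shows the backup metadata was last written more than a day before the array metadata, which fits with the array having kept reshaping elsewhere without the backup file being updated.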
OK, so the backup file is not needed. I assume this is because the
critical section was passed long ago, but then why is it attempting to
find and restore the backup file when it is provided and also not
needed? I have not tried a --force because I don't want to trash my
array if there is another better option that I can still try. Any ideas?
Is this potentially a bug in mdadm where this kind of array state is not
expected?
--
*Nathan Shearer*
www.nathanshearer.ca
403 393 6789
* Re: Failed to find backup of critical section
From: NeilBrown @ 2013-09-01 9:21 UTC
To: Nathan Shearer; +Cc: linux-raid
On Sun, 01 Sep 2013 02:56:12 -0600 Nathan Shearer <mail@nathanshearer.ca>
wrote:
> Hi, I've run into a problem recovering my array from a server power
> failure. I'll try to keep it short so here is a sequence of events:
>
> 1. Running a healthy 4-disk RAID5 array (on server-01).
> 2. Added a 5th drive and grow the array to a 5-disk RAID6 array (backup
> file stored on a separate RAID1 array on other disks)
> 3. Grow begins and passes the critical section, gets to ~15% complete
> and power to the server fails
When growing a 4-disk RAID5 to a 5-disk RAID6, the entire process is in the
"critical section". This is because it is always writing to a location where
live data is.
When increasing the number of data drives, there is a short critical section
at the start.
When decreasing the number of data drives, there is a short critical section
at the end.
But when you don't change the number of data drives, as in this case, it is
all critical and all needs a backup.
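The rule above can be illustrated with a small Python sketch. It is only a worked example of the arithmetic (data drives = total drives minus parity drives), not how mdadm decides anything internally; the names `PARITY_DISKS`, `data_disks`, and `critical_section` are hypothetical.

```python
# Parity devices per stripe for the two RAID levels discussed in this thread.
PARITY_DISKS = {"raid5": 1, "raid6": 2}


def data_disks(level: str, raid_disks: int) -> int:
    """Number of drives' worth of data capacity in the array."""
    return raid_disks - PARITY_DISKS[level]


def critical_section(old, new) -> str:
    """Classify a reshape from old=(level, disks) to new=(level, disks).

    Follows the three cases described above: more data drives, fewer
    data drives, or an unchanged number of data drives.
    """
    before = data_disks(*old)
    after = data_disks(*new)
    if after > before:
        return "short, at the start"
    if after < before:
        return "short, at the end"
    return "entire reshape"


if __name__ == "__main__":
    # 4-disk RAID5 -> 5-disk RAID6: 3 data drives before and after.
    print(critical_section(("raid5", 4), ("raid6", 5)))
    # 4-disk RAID5 -> 5-disk RAID5: 3 -> 4 data drives.
    print(critical_section(("raid5", 4), ("raid5", 5)))
```

In this thread's case both layouts have three data drives, so the sketch lands in the "entire reshape" branch.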
> 4. I then move all 5 drives to backup server. The RAID5/6 array
> assembles and grow continues (without backup file since it's on
> server-01)
That shouldn't work. It shouldn't start without the backup file.
> 5. I begin copying data off of that array onto a separate array --
> filesystem and data is consistent :)
> 6. Power restored to server-01
> 7. Safely stop the growing array with mdadm --stop
> 8. Move 5 drives back into server-01
> 9. Attempt mdadm --assemble and I get:
> # mdadm --assemble /dev/md9
> mdadm: Failed to restore critical section for reshape, sorry.
> Possibly you needed to specify the --backup-file
That should have happened on server-02
> 10. Attempt with the original backup file:
> # mdadm --assemble /dev/md9 --backup-file
> /mnt/temp/raid-reshape-backup-file
> mdadm: Failed to restore critical section for reshape, sorry.
>
> So when I enable --verbose I get:
>
> mdadm:/dev/md9 has an active reshape - checking if critical section
> needs to be restored
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
> Possibly you needed to specify the --backup-file
>
> When I provide the backup file I get:
>
> mdadm:/dev/md9 has an active reshape - checking if critical section
> needs to be restored
> mdadm: too-old timestamp on backup-metadata on
> /mnt/temp/raid-reshape-backup-file
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
> When I tell it to use the "old" backup file I get:
>
> # export MDADM_GROW_ALLOW_OLD=1
> # mdadm --assemble /dev/md9 -vv --backup-file
> /mnt/temp/raid-reshape-backup-file
> mdadm:/dev/md9 has an active reshape - checking if critical section
> needs to be restored
> mdadm: accepting backup with timestamp 1377794387 for array with
> timestamp 1377904444
> mdadm: backup-metadata found on /mnt/temp/raid-reshape-backup-file
> but is not needed
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
>
> OK, so the backup file is not needed. I assume this is because the
> critical section was passed long ago, but then why is it attempting to
> find and restore the backup file when it is provided and also not
> needed? I have not tried a --force because I don't want to trash my
> array if there is another better option that I can still try. Any ideas?
> Is this potentially a bug in mdadm where this kind of array state is not
> expected?
>
The content of the backup file is not needed as it (presumably) covers a region
before the point that the reshape has now reached.
The backup is only needed after an unclean shutdown. Presumably you had an
unclean shutdown when server-01 lost power, so that could have resulted in
corruption, and the array shouldn't have restarted easily on server-02.
However, as the shutdown on server-02 was clean, there would be no further
corruption.
You can start the array by giving a backup file (it can be empty) and
specifying --invalid-backup. This tells mdadm not to bother if it cannot
restore the critical section but to just keep going.
NeilBrown
* Re: Failed to find backup of critical section
From: Nathan Shearer @ 2013-09-01 10:25 UTC
To: NeilBrown; +Cc: linux-raid
I must be confused on the order of events then -- it's been a busy week.
Just for the record (in case anybody else runs into a similar problem
searching the e-mail archive), the --invalid-backup option did start the
array for me. I used the original backup file that was created instead
of creating a blank one like Neil suggested.
    # mdadm --assemble /dev/md3 --backup-file /root/raid-reshape-backup-file --invalid-backup --verbose
    mdadm: looking for devices for /dev/md3
    mdadm: /dev/sdf3 is identified as a member of /dev/md3, slot 0.
    mdadm: /dev/sde3 is identified as a member of /dev/md3, slot 1.
    mdadm: /dev/sdd3 is identified as a member of /dev/md3, slot 3.
    mdadm: /dev/sdc3 is identified as a member of /dev/md3, slot 2.
    mdadm: /dev/sdb3 is identified as a member of /dev/md3, slot 4.
    mdadm:/dev/md3 has an active reshape - checking if critical section needs to be restored
    mdadm: accepting backup with timestamp 1377794387 for array with timestamp 1377904444
    mdadm: backup-metadata found on /root/raid-reshape-backup-file but is not needed
    mdadm: Failed to find backup of critical section
    mdadm: continuing without restoring backup
    mdadm: added /dev/sde3 to /dev/md3 as 1
    mdadm: added /dev/sdc3 to /dev/md3 as 2
    mdadm: added /dev/sdd3 to /dev/md3 as 3
    mdadm: added /dev/sdb3 to /dev/md3 as 4
    mdadm: added /dev/sdf3 to /dev/md3 as 0
    mdadm: /dev/md3 has been started with 4 drives (out of 5) and 1 rebuilding.

    # cat /proc/mdstat
    Personalities : [raid1] [raid6] [raid5] [raid4]
    md3 : active raid6 sdf3[5] sdb3[6] sdd3[4] sdc3[2] sde3[1]
          8587336140 blocks super 1.2 level 6, 4k chunk, algorithm 18 [5/4] [UUUU_]
          [==========>..........]  reshape = 54.8% (1570055672/2862445380) finish=9347.2min speed=2304K/sec

    unused devices: <none>
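For anyone who wants to watch a resumed reshape from a script, the progress line in the /proc/mdstat output above can be parsed with a short Python sketch like the one below. The regular expression and the names `RESHAPE_RE`, `parse_reshape`, and `SAMPLE` are illustrative, not part of any md tooling. The sketch assumes the progress counters are in 1 KiB blocks, which is consistent with the finish time reported here but is not confirmed anywhere in this thread.

```python
import re

# Matches the reshape progress fields as they appear in the output above.
RESHAPE_RE = re.compile(
    r"reshape\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*"
    r"speed=(?P<speed>\d+)K/sec"
)

# Progress line copied from the /proc/mdstat output above.
SAMPLE = (
    "[==========>..........] reshape = 54.8%\n"
    "(1570055672/2862445380) finish=9347.2min speed=2304K/sec"
)


def parse_reshape(mdstat_text: str):
    """Return reshape progress as a dict, or None if no reshape line is found."""
    m = RESHAPE_RE.search(mdstat_text)
    if m is None:
        return None
    done, total, speed = int(m["done"]), int(m["total"]), int(m["speed"])
    return {
        "done": done,
        "total": total,
        "percent": 100.0 * done / total,
        "reported_finish_min": float(m["finish"]),
        "speed_k_per_sec": speed,
        # Assumption: counters are 1 KiB blocks, so remaining / speed is seconds.
        "estimated_finish_min": (total - done) / speed / 60 if speed else None,
    }


if __name__ == "__main__":
    info = parse_reshape(SAMPLE)
    print(f"{info['percent']:.2f}% done, "
          f"about {info['estimated_finish_min']:.0f} min left at current speed")
```

To use it against a live system, pass the contents of /proc/mdstat to `parse_reshape` instead of `SAMPLE`.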