* RAID 6 reshape failed (false message about critical section)
From: Anton Voloshin @ 2007-09-05 10:33 UTC
To: linux-raid
Dear all,
I've been using RAID5 on my video archival server with 6 x 750GB
drives for a year and it has been working just fine. Thanks to Neil Brown
and the whole team for the great job!
Recently, while upgrading the server to RAID6, I created a new RAID6 array
of 5 x 750GB drives, waited for the parity calculation to finish (it took
6-7 hours), and started reshaping from 5 to 8 drives. According to
information on this list, kernel 2.6.21 or later is required for RAID6
reshaping, so I upgraded the kernel from 2.6.20 to 2.6.22 (both standard
Ubuntu server kernels, from the Feisty and Gutsy releases respectively).
I started the reshape with:
mdadm --add /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdd2
mdadm --grow /dev/md1 -n 8
It went through the critical section just fine and had been happily
reshaping for a few hours. The estimated completion time according to
/proc/mdstat was around 1600-1700 minutes.
Then, due to other circumstances, I had to reboot my server.
The reboot did not go smoothly (I had to reboot a few times due to some
errors in my startup scripts - not connected to md in any way, I think).
After I rebooted, I could see that the md1 array had not been started
automatically, and when I try to run
mdadm --assemble /dev/md1
it says "Failed to restore critical section for reshape, sorry."
although it is not the case as far as I can tell (reshaping was going
for at least one or two hours before first reboot).
Please advise me how I should proceed to resolve this situation and
save my data if possible (yes, unfortunately I was going to make
backups in a week but did not have them yet - too bad for me :-( ).
The superblocks on all partitions are fine; mdadm --examine gives the
same information for all 8 partitions, e.g.:
> /dev/sda2:
> Magic : a92b4efc
> Version : 00.91.00
> UUID : 37d56bd1:4f8ccf24:2421b4fc:05cfad50 (local to host videoserver)
> Creation Time : Mon Sep 3 16:27:04 2007
> Raid Level : raid6
> Used Dev Size : 730619904 (696.77 GiB 748.15 GB)
> Array Size : 4383719424 (4180.64 GiB 4488.93 GB)
> Raid Devices : 8
> Total Devices : 8
> Preferred Minor : 1
>
> Reshape pos'n : 140169216 (133.68 GiB 143.53 GB)
> Delta Devices : 3 (5->8)
>
> Update Time : Wed Sep 5 00:15:40 2007
> State : clean
> Active Devices : 8
> Working Devices : 8
> Failed Devices : 0
> Spare Devices : 0
> Checksum : 249b218d - correct
> Events : 0.15268
>
> Chunk Size : 1024K
>
> Number Major Minor RaidDevice State
> this 5 8 98 5 active sync /dev/sdg2
>
> 0 0 8 2 0 active sync /dev/sda2
> 1 1 8 34 1 active sync /dev/sdc2
> 2 2 8 50 2 active sync /dev/sdd2
> 3 3 8 66 3 active sync /dev/sde2
> 4 4 8 82 4 active sync /dev/sdf2
> 5 5 8 98 5 active sync /dev/sdg2
> 6 6 8 114 6 active sync /dev/sdh2
> 7 7 8 18 7 active sync /dev/sdb2
Relevant lines from /etc/mdadm/mdadm.conf:
> DEVICE /dev/sd[a-z]*
> ARRAY /dev/md1 level=raid6 num-devices=8 UUID=37d56bd1:4f8ccf24:2421b4fc:05cfad50
Kernel version:
> root@videoserver:/# uname -a
> Linux videoserver 2.6.22-10-server #1 SMP Wed Aug 22 08:06:27 GMT 2007 x86_64 GNU/Linux
mdadm version:
> root@videoserver:/# mdadm --version
> mdadm - v2.6.3 - 20th August 2007
I have some programming experience (userspace only), so I could try to do
some debugging in gdb if necessary - but please advise me what to look for.
Thank you in advance for any advice and/or help.
Best regards,
Anton Voloshin
Saint Petersburg, Russia
* Re: RAID 6 reshape failed (false message about critical section)
From: Neil Brown @ 2007-09-05 12:36 UTC
To: Anton Voloshin; +Cc: linux-raid
On Wednesday September 5, ashutosh@harekrishna.ru wrote:
>
> Please advise me how should I proceed to resolve this situation and
> save my data if possible (yes, unfortunately I was going to make
> backups in a week but did not had them yet - too bad for me :-( ).
Hi.
Thanks for the detailed report.
I can see what the problem is. It will take a little while to figure
out what the "correct" fix is, but a quick fix to get you out of
trouble would be to remove the lines:
if (info->array.utime > __le64_to_cpu(bsb.mtime) + 3600 ||
info->array.utime < __le64_to_cpu(bsb.mtime))
continue; /* time stamp is too bad */
from Grow.c in mdadm (around line 925). This change is definitely
safe for your case, and should get the array assembled and the reshape
restarted. Please let me know how it goes.
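If you would rather not delete them outright, wrapping the check in
#if 0 has the same effect; this is just those three lines in place in
Grow.c, with the surrounding code omitted:
#if 0	/* quick fix: skip the backup-timestamp sanity check */
	if (info->array.utime > __le64_to_cpu(bsb.mtime) + 3600 ||
	    info->array.utime < __le64_to_cpu(bsb.mtime))
		continue; /* time stamp is too bad */
#endif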
NeilBrown
* Re: RAID 6 reshape failed (false message about critical section)
From: Anton Voloshin @ 2007-09-05 18:13 UTC
To: linux-raid; +Cc: Neil Brown
Dear Neil,
> I can see what the problem is. It will take a little while to figure
> out what the "correct" fix is, but a quick fix to get you out of
> trouble would be to remove the lines:
>
> if (info->array.utime > __le64_to_cpu(bsb.mtime) + 3600 ||
> info->array.utime < __le64_to_cpu(bsb.mtime))
> continue; /* time stamp is too bad */
I've applied the patch that you suggested but I'm getting exactly the
same result:
> root@videoserver:~/mdadm-2.6.3# ./mdadm --assemble /dev/md1
> mdadm: Failed to restore critical section for reshape, sorry.
Please advise me on any more information I should provide, or make any
other suggestions.
Best regards,
Anton Voloshin
Saint Petersburg, Russia
* Re: RAID 6 reshape failed (false message about critical section)
From: Neil Brown @ 2007-09-06 5:12 UTC
To: Anton Voloshin; +Cc: linux-raid
On Wednesday September 5, ashutosh@harekrishna.ru wrote:
> Dear Neil,
>
> > I can see what the problem is. It will take a little while to figure
> > out what the "correct" fix is, but a quick fix to get you out of
> > trouble would be to remove the lines:
> >
> > if (info->array.utime > __le64_to_cpu(bsb.mtime) + 3600 ||
> > info->array.utime < __le64_to_cpu(bsb.mtime))
> > continue; /* time stamp is too bad */
>
> I've applied the patch that you suggested but I'm getting exactly the
> same result:
>
> > root@videoserver:~/mdadm-2.6.3# ./mdadm --assemble /dev/md1
> > mdadm: Failed to restore critical section for reshape, sorry.
>
> Please advise me on any more information I should provide, or make any
> other suggestions.
>
At the top of Grow_restart (in Grow.c), just put
return 0;
That will definitely get you your array back.
I think the correct fix will be to put:
if (info->reshape_progress > SOME_NUMBER)
return 0;
at the top of Grow_restart. I just have to review exactly how it
works to make sure I pick the correct "SOME_NUMBER".
Also
if (__le64_to_cpu(bsb.length) <
info->reshape_progress)
continue; /* No new data here */
might need to become
if (__le64_to_cpu(bsb.length) <
info->reshape_progress)
return 0; /* No new data here */
but I need to think carefully about that too.
NeilBrown
* Re: RAID 6 reshape failed (false message about critical section) - success report
From: Anton Voloshin @ 2007-09-06 20:57 UTC
To: linux-raid; +Cc: Neil Brown
Dear Neil,
> At the top of Grow_restart (in Grow.c), just put
> return 0;
>
> That will definitely get you your array back.
Thank you for your help - I've got my array assembled, running, and all my
data back!
But everything was not as smooth as I had (secretly) hoped.
Details follow.
After I first ran ./mdadm --assemble /dev/md1, the array actually
assembled, but with 6 drives out of 8. I don't know the exact reason why
the two partitions were missing, but attempts to add the missing
partitions with
> mdadm --add /dev/md1 /dev/sda2
resulted in a message saying "/dev/sdc1 is locked" or something like that.
The partitions are there (fdisk -l /dev/sda confirms that) and they are
present in /dev. I suspect that maybe udev was doing something wrong, but I
don't know for sure. The missing partitions were the ones on the drives
that hold my root partition /dev/md0 - a RAID1 made of /dev/sda1 and
/dev/sdc1.
Anyway, since it's RAID6, even with two drives missing the array was able
to start running. But the reshape speed was zero, i.e. /proc/mdstat showed
something like the following (not a verbatim copy, but a reconstruction
based on the current state - just to give an idea of what it looked like
before):
> md1 : active raid6 sde2[1] sdd2[7] sdb2[6] sda2[5] sdg2[3] sdf2[2]
>       2191859712 blocks super 0.91 level 6, 1024k chunk, algorithm 2 [8/6] [_UUU_UUU]
>       [=>...................]  reshape =  7.0% (51783680/730619904) finish=1571992.7min speed=0K/sec
So the speed was zero, and the finish time was gradually increasing from
tens of thousands of minutes to tens of millions and more.
Any process trying to read from /dev/md1 would hang in the "D" state,
including mount, so I was not able to see my data at that point.
A few more reboots followed (during those I was debugging my boot scripts
and found out that /etc/init.d/udev never got control back after starting
/sbin/udevsettle - but I believe this is a separate matter, not connected
with md).
Anyway, after each of those reboots I saw the same condition - the array
assembled, but with a 0K reshape speed (and before somebody asks:
/sys/block/md1/md/sync_speed_{min,max} had their default values of 1000 and
200000 respectively). After some reboots the array was assembled from all
8 disks, but the reshape speed was still zero.
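(For reference, those knobs live in sysfs and can be read and written
directly - the paths are as they appear on my system; the echo line is only
an illustration of how one could raise the lower limit, not something I
actually needed to do:)
# per-array resync/reshape speed limits, in KB/sec
cat /sys/block/md1/md/sync_speed_min
cat /sys/block/md1/md/sync_speed_max
# illustration only: raise the minimum if a reshape seems throttled
echo 50000 > /sys/block/md1/md/sync_speed_min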
A few reboots later, after fixing my startup scripts, I was rather
pleasantly surprised to hear my hard drives busily humming and to find in
/proc/mdstat that the reshape speed was 800K/sec and climbing (up to the
current value of about 10000K/sec). The array was working with 6 partitions
out of 8. /dev/md1 mounted fine and I have all my precious data back
intact - needless to say, I'm very happy about that.
Now I have my array reshaping in a degraded state (6 out of 8 drives
running). I don't feel adventurous enough to try adding drives to the array
before it finishes the current reshape. :-) I believe it's time to get some
backup space for the roughly 2TB of data kept in this array.
So the array's current state according to /proc/mdstat is:
> md1 : active raid6 sde2[1] sdd2[7] sdb2[6] sda2[5] sdg2[3] sdf2[2]
>       2191859712 blocks super 0.91 level 6, 1024k chunk, algorithm 2 [8/6] [_UUU_UUU]
>       [=>...................]  reshape =  8.5% (62185472/730619904) finish=969.0min speed=11495K/sec
And I'm waiting for it to finish this operation. In the meantime /dev/md1
works fine for both reads and writes, so our file server is happily back
online, much to the delight of my colleagues.
Please let me know if I can provide any information useful for debugging
or fixing this issue. It seems to me that something needs to be fixed on
the kernel side too (but I'm not really qualified to make such judgements).
I will post again after finishing the current reshape operation and adding
the two "lost" partitions back to the array. I believe it will take more
than 52 hours to finish everything (the current reshape will take 16 more
hours, plus two times 18 hours to add the two 750GB partitions). I will let
you know afterwards.
My thanks go again to Neil for the quick and efficient fix - he is living
up to his reputation as a living legend of the programming world.
> I think the correct fix will be to put:
>
> if (info->reshape_progress > SOME_NUMBER)
> return 0;
>
> at the top of Grow_restart. I just have to review exactly how it
> works to make sure I pick the correct "SOME_NUMBER".
>
> Also
> if (__le64_to_cpu(bsb.length) <
> info->reshape_progress)
> continue; /* No new data here */
>
> might need to become
> if (__le64_to_cpu(bsb.length) <
> info->reshape_progress)
> return 0; /* No new data here */
>
> but I need to think carefully about that too.
I'm looking forward to seeing more fixes and improvements for this
wonderful piece of software - Linux md.
Best regards,
Anton "Ashutosh" Voloshin
Saint Petersburg, Russia (SCSMath)
* Re: RAID 6 reshape failed (false message about critical section) - success report
From: Bill Davidsen @ 2007-09-08 16:19 UTC
To: Anton Voloshin; +Cc: linux-raid, Neil Brown
Anton Voloshin wrote:
> Dear Neil,
>
>
>> At the top of Grow_restart (in Grow.c), just put
>> return 0;
>>
>> That will definitely get you your array back.
>>
>
> Thank you for your help - I've got my array assembled, running, and all my
> data back!
>
> But everything was not as smooth as I had (secretly) hoped.
> Details follow.
>
> After I first ran ./mdadm --assemble /dev/md1, the array actually
> assembled, but with 6 drives out of 8. I don't know the exact reason why
> the two partitions were missing, but attempts to add the missing
> partitions with
>
>> mdadm --add /dev/md1 /dev/sda2
>
> resulted in a message saying "/dev/sdc1 is locked" or something like that.
> The partitions are there (fdisk -l /dev/sda confirms that) and they are
> present in /dev. I suspect that maybe udev was doing something wrong, but
> I don't know for sure. The missing partitions were the ones on the drives
> that hold my root partition /dev/md0 - a RAID1 made of /dev/sda1 and
> /dev/sdc1.
>
I thought I had replied to this, but I don't see the message, so I'll
say it again in case it is helpful to others.
If you have a condition like this, where a partition refuses to be added,
use "lsof" to look for processes using it, and check the values in
/proc/diskstats over a few minutes to see whether something actually *is*
using the partition. Attempting to force an add while some process or
kernel thread is using the partition is a bad idea.
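For example, something along these lines (the device name is just an
example from this thread - substitute the partition that refuses to add):
# anything holding the partition open?
lsof /dev/sdc1
# sample the I/O counters twice, a minute apart; if the numbers change
# between samples, something is actively using the partition
grep ' sdc1 ' /proc/diskstats
sleep 60
grep ' sdc1 ' /proc/diskstats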
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: RAID 6 reshape failed (false message about critical section)
From: Ashutosh Krishna Das @ 2007-09-05 18:16 UTC
To: linux-raid; +Cc: Neil Brown
> I can see what the problem is. It will take a little while to figure
> out what the "correct" fix is, but a quick fix to get you out of
> trouble would be to remove the lines:
And by the way, Neil, thank you for the quick and informative reply.
Best regards,
Anton Voloshin
Saint Petersburg, Russia