From: Anshuman Aggarwal <anshuman.aggarwal@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Growing raid 5: Failed to reshape
Date: Sat, 22 Aug 2009 10:05:28 +0530
Message-ID: <3FA7DE88-932D-4194-9195-E7CBA93D5432@gmail.com>
In-Reply-To: <cad055848e80fe2cffebf8bffdfe94f0.squirrel@neil.brown.name>
Well, I was so relieved on seeing what looked like my data that I
didn't wait for this last mail... and already started the grow
operation again!
What is the best way to have the array check itself out now, to make
sure there are no data inconsistencies? I guess I should wait for the
grow operation to complete first?
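(I'm guessing at something like the following, using the md
sync_action interface, once the grow is done; please correct me if
that's wrong:

   echo check > /sys/block/md127/md/sync_action   # read-only consistency scan
   cat /sys/block/md127/md/mismatch_cnt           # should stay at 0
)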
Will a controlled system shutdown hurt the grow operation (I have an
APC UPS which shuts down my machine well in time when there is an
outage)? I am hoping it will resume from where it left off, since the
critical section has passed?
Also, one observation about mdadm that you may be especially
interested in:
I have tried using both 2.6.7 and 3.0 (the final June release) of mdadm
with kernel 2.6.30.4...
* mdadm 3.0 wouldn't grow the array:
/Src/mdadm-3.0# ./mdadm --grow /dev/md127 -n 4
mdadm: Need to backup 384K of critical section..
mdadm: /dev/md127: failed to save critical region
I resorted to using the mdadm 2.6.7 that came with Ubuntu...
Thanks,
Anshuman
On 22-Aug-09, at 9:44 AM, NeilBrown wrote:
> On Sat, August 22, 2009 1:55 pm, Anshuman Aggarwal wrote:
>> I have just sent in another mail with the mdadm examine details from
>> the 3 + 1 (grown) partitions. I am sure of the device names, but not
>> sure of the order (which examine does tell me).
>> Here are the devices, in order (I think): /dev/sdb, /dev/sdd5,
>> /dev/sdc5 + /dev/sda2, with the dd output you requested:
>
> Thanks.
> /dev/sdb and /dev/sdd5 definitely look correct.
> I am very suspicious of the others though. If the metadata has been
> destroyed, it is entirely possible that some of the data has been
> corrupted as well.
>
> As you only need two drives to recover your data, and you have two
> drives that look good, I suggest that you just use those.
> So:
>
> mdadm --create /dev/md0 -l5 -n3 -e1.2 --name raid5_280G \
> /dev/sdb /dev/sdd5 missing
>
> The first thing to do is --examine sdb and sdd5 and make sure that
> "Data Offset" is 272.  It probably will be, but different versions
> of mdadm used different offsets and you need to be sure.
> Assuming it is 272, your data should be safe and you can "fsck" and
> "mount" just to confirm that.
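> For example (a sketch; adjust the filesystem type and mount point to
> your setup):
>
>    mdadm --examine /dev/sdb | grep 'Data Offset'
>    mdadm --examine /dev/sdd5 | grep 'Data Offset'
>    fsck -n /dev/md0             # read-only filesystem check
>    mount -o ro /dev/md0 /mnt    # mount read-only and have a look around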
>
> Then add sdc5 and sda2 and let the array recover the missing device.
> Once that is done you can try the --grow again.
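> Something like this should do it (wait for the recovery shown in
> /proc/mdstat to finish before growing):
>
>    mdadm /dev/md0 --add /dev/sdc5 /dev/sda2
>    cat /proc/mdstat                # wait until the recovery completes
>    mdadm --grow /dev/md0 --raid-devices=4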
>
> NeilBrown
>
>
>
>
>>
>> ----------------------------------
>> dd if=/dev/sdb skip=8 count=2 | od -x
>>
>> 2+0 records in
>> 2+0 records out
>> 1024 bytes (1.0 kB) copied, 5.6394e-05 s, 18.2 MB/s
>> 0000000 4efc a92b 0001 0000 0000 0000 0000 0000
>> 0000020 5f49 6866 e1f1 102d 5299 920f 1976 87b4
>> 0000040 4147 4554 4157 3a59 6172 6469 5f35 3832
>> 0000060 4730 0000 0000 0000 0000 0000 0000 0000
>> 0000100 2b74 4a73 0000 0000 0005 0000 0002 0000
>> 0000120 2900 22ef 0000 0000 0080 0000 0003 0000
>> 0000140 0002 0000 0000 0000 0300 0000 0000 0000
>> 0000160 0000 0000 0000 0000 0000 0000 0000 0000
>> 0000200 0110 0000 0000 0000 6580 22ef 0000 0000
>> 0000220 0008 0000 0000 0000 0000 0000 0000 0000
>> 0000240 0000 0000 0000 0000 a272 abb3 8be3 62a6
>> 0000260 c0bd c0a0 990e 583b 0000 0000 0000 0000
>> 0000300 209f 4a8e 0000 0000 3508 0000 0000 0000
>> 0000320 ffff ffff ffff ffff 24a1 59e3 0180 0000
>> 0000340 0000 0000 0000 0000 0000 0000 0000 0000
>> *
>> 0000400 0000 fffe fffe 0002 0001 ffff ffff ffff
>> 0000420 ffff ffff ffff ffff ffff ffff ffff ffff
>> *
>> 0002000
>> ------------------------------------
>> dd if=/dev/sdd5 skip=8 count=2 | od -x
>> 2+0 records in
>> 2+0 records out
>> 1024 bytes (1.0 kB) copied, 0.0104253 s, 98.2 kB/s
>> 0000000 4efc a92b 0001 0000 0004 0000 0000 0000
>> 0000020 5f49 6866 e1f1 102d 5299 920f 1976 87b4
>> 0000040 4147 4554 4157 3a59 6172 6469 5f35 3832
>> 0000060 4730 0000 0000 0000 0000 0000 0000 0000
>> 0000100 2b74 4a73 0000 0000 0005 0000 0002 0000
>> 0000120 2900 22ef 0000 0000 0080 0000 0004 0000
>> 0000140 0002 0000 0005 0000 0000 0000 0000 0000
>> 0000160 0001 0000 0002 0000 0080 0000 0000 0000
>> 0000200 0110 0000 0000 0000 2974 22ef 0000 0000
>> 0000220 0008 0000 0000 0000 0000 0000 0000 0000
>> 0000240 0004 0000 0000 0000 4a75 cfe1 eebb 8205
>> 0000260 60f6 89ec 88a8 d300 0000 0000 0000 0000
>> 0000300 21c2 4a8e 0000 0000 350d 0000 0000 0000
>> 0000320 0000 0000 0000 0000 81fb e184 0180 0000
>> 0000340 0000 0000 0000 0000 0000 0000 0000 0000
>> *
>> 0000400 0000 fffe fffe 0002 0001 0003 ffff ffff
>> 0000420 ffff ffff ffff ffff ffff ffff ffff ffff
>> *
>> 0002000
>> ------------------------------------
>> dd if=/dev/sdc5 skip=8 count=2 | od -x
>> 2+0 records in
>> 2+0 records out
>> 1024 bytes (1.0 kB) copied, 0.0102071 s, 100 kB/s
>> 0000000 0000 0000 0000 0000 0000 0000 0000 0000
>> *
>> 0002000
>> --------------------------------------
>> The following is probably just junk since it is not even initialized
>>
>> dd if=/dev/sda1 skip=8 count=2 | od -x
>> 2+0 records in
>> 2+0 records out
>> 1024 bytes (1.0 kB) copied, 0.0127419 s, 80.4 kB/s
>> 0000000 0000 0000 0000 0000 0000 0000 0000 0000
>> *
>> 0000200 0000 0000 0000 0000 4cf4 0000 0000 0000
>> 0000220 0000 0000 0000 0000 0000 0000 0000 0000
>> 0000240 0004 0000 0000 0000 e807 6452 6558 e0a3
>> 0000260 a04b 494c 11a6 8b3b 0000 0000 0000 0000
>> 0000300 0000 0000 0000 0000 0002 0000 0000 0000
>> 0000320 0000 0000 0000 0000 a1e8 b863 0000 0000
>> 0000340 0000 0000 0000 0000 0000 0000 0000 0000
>> *
>> 0002000
>>
>>
>> Thanks,
>> Anshuman
>>
>>
>> On 22-Aug-09, at 8:58 AM, NeilBrown wrote:
>>
>>> On Sat, August 22, 2009 12:41 pm, Anshuman Aggarwal wrote:
>>>> Neil,
>>>> Thanks for your input.  It's great to have some hand-holding when
>>>> your heart is in your mouth.
>>>>
>>>> Here is some more explanation:
>>>>
>>>> I have another raid array on the same disks, in different
>>>> partitions, and a grow operation was happening on that one too at
>>>> the time (it has completed splendidly after the power outage).
>>>> From what I have observed so far, when there is heavy activity on
>>>> the disk due to one array, the kernel puts the other md tasks in a
>>>> DELAYED status.  (I have done it this way because I have 4
>>>> different-sized disks purchased over time.)
>>>>
>>>> I had given the grow command before I realized that the other grow
>>>> operation had not completed on the other partitions.
>>>>
>>>> * The critical section status from mdadm was stuck (apparently
>>>>   waiting for the grow on the other partitions to complete), so it
>>>>   did not complete as quickly as it should have.
>>>> * Because it kept waiting for the other md operations on the disk
>>>>   to complete, the critical section didn't get written (my guess;
>>>>   it's also possible that the disk was so busy that it took more
>>>>   than an hour, but that seems unlikely).
>>>>
>>>> Please tell me if this additional info changes our approach to
>>>> trying to fix this.
>>>
>>> I understand now (and on reflection, your original email had enough
>>> information that I should have been able to pick up on this).  When
>>> there is a resync happening on one partition of a drive, md will
>>> not start a resync on any other partition of that drive, as that
>>> would result in significantly reduced performance and an increased
>>> total time to completion.
>>> This applies equally to recovery and reshape.
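>>> You can see this in /proc/mdstat while it is happening: the array
>>> that is waiting shows something like "resync=DELAYED" in place of a
>>> progress bar (the exact wording can vary between kernel versions):
>>>
>>>    cat /proc/mdstat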
>>>
>>> So while the first reshape was happening, the second would not
>>> have started at all.  This confirms that no data will have been
>>> relocated, so a correct '--create' will get your data back
>>> correctly.
>>>
>>> I should change mdadm to not try starting a reshape if it won't
>>> proceed, as it could cause real problems if the start of the
>>> reshape blocks for too long.
>>>
>>> This still doesn't explain why you lost some metadata though.
>>> If it updated one of the devices, it should have updated all of them
>>> as it does the update in parallel.
>>>
>>> Would you be able to:
>>>
>>> dd if=/dev/WHATEVER skip=8 count=2 | od -x
>>>
>>> where 'WHATEVER' is each of the different devices that you think
>>> are in the array.  That might give me some clue.
>>>
>>> My recommendation for how to fix it remains the same. I now have
>>> more confidence that it will work. You need to be sure which device
>>> is
>>> which though.
>>>
>>> NeilBrown
>>>
>>>
>>>>
>>>> I do have a UPS with an hour of backup, but I recently moved back
>>>> to my home country, India, where the power supply will probably
>>>> *NEVER* be continuous enough for a long md operation :).  Hence,
>>>> I'm definitely one to vote for recoverable moves (which mdadm and
>>>> the kernel have been pretty good at so far).
>>>>
>>>> Thanks,
>>>> Anshuman
>>>>
>>>> On 22-Aug-09, at 3:00 AM, NeilBrown wrote:
>>>>
>>>>> On Sat, August 22, 2009 5:31 am, Anshuman Aggarwal wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Here is my problem and configuration. :
>>>>>>
>>>>>> I had a 3-partition raid5 array.  I added a partition on a 4th
>>>>>> disk and then tried to grow the raid5 onto it.  Unfortunately,
>>>>>> since another sync task was happening on the same disks, the
>>>>>> operation to move the critical section did not complete before
>>>>>> the machine was shut down by the UPS (a controlled shutdown, not
>>>>>> a crash) due to low battery.
>>>>>>
>>>>>> Kernel: 2.6.30.4; mdadm (tried 2.6.7 and 3.0)
>>>>>>
>>>>>> Now, only 1 of my 3 partitions has the superblock; the other 2
>>>>>> and the 4th new one do not have anything.
>>>>>
>>>>> It is very strange that only one partition has a superblock.
>>>>> I cannot imagine any way that could have happened short of
>>>>> changing
>>>>> the partition tables or deliberately destroying them.
>>>>> I feel the need to ask "are you sure" though presumably you are or
>>>>> you wouldn't have said so...
>>>>
>>>>
>>>> I am positive (at least from the output of mdadm) that no
>>>> superblock exists on the other partitions.  I am also sure that I
>>>> am not fumbling the partition device names.
>>>>
>>>>>
>>>>>>
>>>>>> Here is the output of a few mdadm commands.
>>>>>>
>>>>>> $mdadm --misc --examine /dev/sdd5
>>>>>> /dev/sdd5:
>>>>>> Magic : a92b4efc
>>>>>> Version : 1.2
>>>>>> Feature Map : 0x4
>>>>>> Array UUID : 495f6668:f1e12d10:99520f92:7619b487
>>>>>> Name : GATEWAY:raid5_280G (local to host GATEWAY)
>>>>>> Creation Time : Fri Jul 31 23:05:48 2009
>>>>>> Raid Level : raid5
>>>>>> Raid Devices : 4
>>>>>>
>>>>>> Avail Dev Size : 586099060 (279.47 GiB 300.08 GB)
>>>>>> Array Size : 1758296832 (838.42 GiB 900.25 GB)
>>>>>> Used Dev Size : 586098944 (279.47 GiB 300.08 GB)
>>>>>> Data Offset : 272 sectors
>>>>>> Super Offset : 8 sectors
>>>>>> State : active
>>>>>> Device UUID : 754ae1cf:bbee0582:f660ec89:a88800d3
>>>>>>
>>>>>> Reshape pos'n : 0
>>>>>> Delta Devices : 1 (3->4)
>>>>>
>>>>> It certainly looks like it didn't get very far. We cannot
>>>>> know from this for certain.
>>>>> mdadm should have copied the first 4 chunks (256K) to somewhere
>>>>> near the end of the new device, then allowed the reshape to
>>>>> continue.
>>>>> It is possible that the reshape had written to some of these early
>>>>> blocks. If it did we need to recover that backed-up data. I
>>>>> should
>>>>> probably add functionality to mdadm to find and recover such a
>>>>> backup....
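>>>>> (If you want to look at that area by hand, something along these
>>>>> lines would dump the last megabyte of the new device; treat it as
>>>>> a rough probe only, since the exact location depends on the array
>>>>> geometry:
>>>>>
>>>>>    SECTORS=$(blockdev --getsz /dev/sda2)
>>>>>    dd if=/dev/sda2 skip=$((SECTORS - 2048)) count=2048 | od -c | less
>>>>> )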
>>>>>
>>>>> For now your best bet is to simply try to recreate the array,
>>>>> i.e. something like:
>>>>>
>>>>> mdadm -C /dev/md0 -l5 -n3 -e 1.2 --name "raid5_280G" --assume-
>>>>> clean \
>>>>> /dev/sdc5 /dev/sdd5 /dev/sde5
>>>>>
>>>>> You need to make sure that you get the right devices in the right
>>>>> order. From the information you gave I only know for certain that
>>>>> /dev/sdd5 is the middle of the three.
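>>>>> (For any partition that still has a superblock, the "Array Slot"
>>>>> line from --examine shows its position, e.g.:
>>>>>
>>>>>    mdadm --examine /dev/sdd5 | grep 'Array Slot'
>>>>> )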
>>>>>
>>>>> This will write new superblocks and assemble the array but will
>>>>> not
>>>>> change any of the data. You can then access the array read-only
>>>>> and see if the data looks like it is all there. If it isn't, stop
>>>>> the array and try to work out why.
>>>>> If it is, you can try to grow the array again, this time with a
>>>>> more
>>>>> reliable power supply ;-)
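>>>>> For the second attempt it may be worth giving --grow a backup
>>>>> file on a device outside the array, so the critical section can
>>>>> be restored even after a power loss, e.g. (the path is just an
>>>>> example):
>>>>>
>>>>>    mdadm --grow /dev/md0 --raid-devices=4 \
>>>>>          --backup-file=/root/md0-grow.backup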
>>>>>
>>>>> Speaking of which... just how long was it between when you
>>>>> started the grow and when the power shut off?  It really
>>>>> shouldn't be more than a few seconds, even if other things are
>>>>> happening on the system (normally it would be a few hundred
>>>>> milliseconds at most).
>>>>>
>>>>> Good luck,
>>>>> NeilBrown
>>>>>
>>>>>
>>>>>>
>>>>>> Update Time : Fri Aug 21 09:55:38 2009
>>>>>> Checksum : e18481fb - correct
>>>>>> Events : 13581
>>>>>>
>>>>>> Layout : left-symmetric
>>>>>> Chunk Size : 64K
>>>>>>
>>>>>> Array Slot : 4 (0, failed, failed, 2, 1, 3)
>>>>>> Array State : uUuu 2 failed
>>>>>>
>>>>>> $mdadm --assemble --scan
>>>>>> mdadm: Failed to restore critical section for reshape, sorry.
>>>>>>
>>>>>> I am positive that none of the actual growing steps even
>>>>>> started so
>>>>>> my
>>>>>> data 'should' be safe as long as I can recreate the superblocks,
>>>>>> right?
>>>>>>
>>>>>> As always, appreciate the help of the open source community.
>>>>>> Thanks!!
>>>>>>
>>>>>> Thanks,
>>>>>> Anshuman
>>>>>>
>>>>>
>>>>
>>>
>>
>