From: Anshuman Aggarwal <anshuman.aggarwal@gmail.com>
To: Anshuman Aggarwal <anshuman.aggarwal@gmail.com>
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: Growing raid 5: Failed to reshape
Date: Sat, 22 Aug 2009 08:58:28 +0530
Message-ID: <87EF1BF9-E84C-4555-AF9D-E1CE501AE1D4@gmail.com>
In-Reply-To: <121580D1-2950-43FB-AD1F-B235D1160932@gmail.com>
Here is the mdadm output from all 3+1 devices (the newly added partition
that was being grown)... now I'm not sure whether I should try to
reconstruct this as a 4-device or a 3-device array.
=== 1st Device ===
mdadm --misc --examine /dev/sdb
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 495f6668:f1e12d10:99520f92:7619b487
Name : GATEWAY:raid5_280G (local to host GATEWAY)
Creation Time : Fri Jul 31 23:05:48 2009
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 586114432 (279.48 GiB 300.09 GB)
Array Size : 1172197888 (558.95 GiB 600.17 GB)
Used Dev Size : 586098944 (279.47 GiB 300.08 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 72a2b3ab:e38ba662:bdc0a0c0:0e993b58
Update Time : Fri Aug 21 09:50:47 2009
Checksum : 59e324a1 - correct
Events : 13576
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 0 (0, failed, failed, 2, 1)
Array State : Uuu 2 failed
=== 2nd Device ===
mdadm --misc --examine /dev/sdd5
/dev/sdd5:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 495f6668:f1e12d10:99520f92:7619b487
Name : GATEWAY:raid5_280G (local to host GATEWAY)
Creation Time : Fri Jul 31 23:05:48 2009
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 586099060 (279.47 GiB 300.08 GB)
Array Size : 1758296832 (838.42 GiB 900.25 GB)
Used Dev Size : 586098944 (279.47 GiB 300.08 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 754ae1cf:bbee0582:f660ec89:a88800d3
Reshape pos'n : 0
Delta Devices : 1 (3->4)
Update Time : Fri Aug 21 09:55:38 2009
Checksum : e18481fb - correct
Events : 13581
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 4 (0, failed, failed, 2, 1, 3)
Array State : uUuu 2 failed
=== 3rd Device ===
mdadm --misc --examine /dev/sdc5
mdadm: No md superblock detected on /dev/sdc5.
=== 4th Device (newly added, being grown) ===
mdadm --misc --examine /dev/sda1
mdadm: No md superblock detected on /dev/sda1.
When I try to assemble, it says the superblocks don't match
(obviously)... so which way should I try to reconstruct: 3 devices or
4? My hunch says 3 devices.
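If the 3-device route is right, my rough plan (just a sketch along the
lines of Neil's suggestion below; the member names and especially their
order are my guesses, and /mnt/check is just a placeholder mount point)
would be to recreate the superblocks and then only look at the array
read-only before touching anything:

mdadm -C /dev/md0 -l5 -n3 -e 1.2 --name "raid5_280G" --assume-clean \
    /dev/sdb /dev/sdd5 /dev/sdc5   # order guessed; sdd5 is the middle one per Neil
mdadm --readonly /dev/md0          # keep md from writing anything yet
fsck -n /dev/md0                   # read-only filesystem check, no repairs
mount -o ro /dev/md0 /mnt/check    # eyeball the data without modifying it

Does that look sane before I run anything?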
Thanks,
Anshuman
On 22-Aug-09, at 8:11 AM, Anshuman Aggarwal wrote:
> Neil,
> Thanks for your input. It's great to have some hand-holding when your
> heart is in your mouth.
>
> Here is some more explanation:
>
> I have another raid array on the same disks, in different partitions,
> and there was a grow operation happening on that one as well at the
> time (which completed splendidly after the power outage). From what I
> have observed so far, when there is heavy activity on the disks due to
> one array, the kernel puts the other arrays' sync tasks into a DELAYED
> status. (I have set things up this way because I have 4 different-sized
> disks purchased over time.)
>
> I had given the grow command before I realized that the other grow
> operation had not completed on the other partitions.
>
> * The critical-section status from mdadm was stuck (apparently
> waiting for the grow on the other partitions to complete). Hence it
> did not complete as quickly as it should have.
> * Because it kept waiting for the other md operations on the disks to
> complete, the critical section didn't get written (my guess; it's
> also possible that the disks were so busy that it took more than an
> hour, but that seems unlikely).
>
> Please tell me if this additional info changes our approach to
> fixing this.
>
> I do have a UPS with an hour of backup, but I recently moved back to
> my home country, India, where the power supply will probably *NEVER*
> be continuous enough for a long md operation :). Hence, I definitely
> vote for recoverable moves (which mdadm and the kernel have been
> pretty good at so far).
>
> Thanks,
> Anshuman
>
> On 22-Aug-09, at 3:00 AM, NeilBrown wrote:
>
>> On Sat, August 22, 2009 5:31 am, Anshuman Aggarwal wrote:
>>> Hi all,
>>>
>>> Here is my problem and configuration. :
>>>
>>> I had a 3-partition raid5 array to which I added a 4th disk and
>>> tried to grow the raid5 by adding the partition on the 4th disk and
>>> then growing it. Unfortunately, since another sync task was running
>>> on the same disks, the operation to move the critical section did not
>>> complete before the machine was shut down by the UPS (a controlled
>>> shutdown, not a crash) due to low battery.
>>>
>>> Kernel: 2.6.30.4; mdadm (tried 2.6.7 and 3.0)
>>>
>>> Now, only 1 of my 3 partitions has the superblock; the other 2 and
>>> the 4th new one do not have anything.
>>
>> It is very strange that only one partition has a superblock.
>> I cannot imagine any way that could have happened short of changing
>> the partition tables or deliberately destroying them.
>> I feel the need to ask "are you sure" though presumably you are or
>> you wouldn't have said so...
>
>
> I am positive (at least from the output of mdadm) that no superblock
> exists on the other partitions. I am also sure that I am not
> fumbling the partition device names.
>
>>
>>>
>>> Here is the output of a few mdadm commands.
>>>
>>> $mdadm --misc --examine /dev/sdd5
>>> /dev/sdd5:
>>> Magic : a92b4efc
>>> Version : 1.2
>>> Feature Map : 0x4
>>> Array UUID : 495f6668:f1e12d10:99520f92:7619b487
>>> Name : GATEWAY:raid5_280G (local to host GATEWAY)
>>> Creation Time : Fri Jul 31 23:05:48 2009
>>> Raid Level : raid5
>>> Raid Devices : 4
>>>
>>> Avail Dev Size : 586099060 (279.47 GiB 300.08 GB)
>>> Array Size : 1758296832 (838.42 GiB 900.25 GB)
>>> Used Dev Size : 586098944 (279.47 GiB 300.08 GB)
>>> Data Offset : 272 sectors
>>> Super Offset : 8 sectors
>>> State : active
>>> Device UUID : 754ae1cf:bbee0582:f660ec89:a88800d3
>>>
>>> Reshape pos'n : 0
>>> Delta Devices : 1 (3->4)
>>
>> It certainly looks like it didn't get very far, though we cannot
>> know that for certain from this alone.
>> mdadm should have copied the first 4 chunks (256K) to somewhere
>> near the end of the new device, then allowed the reshape to continue.
>> It is possible that the reshape had written to some of these early
>> blocks. If it did we need to recover that backed-up data. I should
>> probably add functionality to mdadm to find and recover such a
>> backup....
>>
>> For now your best bet is to simply try to recreate the array.
>> i.e something like
>>
>> mdadm -C /dev/md0 -l5 -n3 -e 1.2 --name "raid5_280G" --assume-clean \
>> /dev/sdc5 /dev/sdd5 /dev/sde5
>>
>> You need to make sure that you get the right devices in the right
>> order. From the information you gave I only know for certain that
>> /dev/sdd5 is the middle of the three.
>>
>> This will write new superblocks and assemble the array but will not
>> change any of the data. You can then access the array read-only
>> and see if the data looks like it is all there. If it isn't, stop
>> the array and try to work out why.
>> If it is, you can try to grow the array again, this time with a more
>> reliable power supply ;-)
>>
>> Speaking of which... just how long was it between when you started the
>> grow and when the power shut off? It really shouldn't be more than
>> a few seconds, even if other things are happening on the system
>> (normally it would be a few hundred milliseconds at most).
>>
>> Good luck,
>> NeilBrown
>>
>>
>>>
>>> Update Time : Fri Aug 21 09:55:38 2009
>>> Checksum : e18481fb - correct
>>> Events : 13581
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Array Slot : 4 (0, failed, failed, 2, 1, 3)
>>> Array State : uUuu 2 failed
>>>
>>> $mdadm --assemble --scan
>>> mdadm: Failed to restore critical section for reshape, sorry.
>>>
>>> I am positive that none of the actual growing steps even started, so
>>> my data 'should' be safe as long as I can recreate the superblocks,
>>> right?
>>>
>>> As always, appreciate the help of the open source community.
>>> Thanks!!
>>>
>>> Thanks,
>>> Anshuman
>>>
>>
>