From mboxrd@z Thu Jan 1 00:00:00 1970
From: Doug Ledford
Subject: Re: Seagate black armour recovery
Date: Sat, 09 Nov 2013 02:25:22 -0500
Message-ID: <527DE362.5060509@redhat.com>
References: <52781F71.4060105@turmel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Kevin Wilson, Phil Turmel
Cc: linux-raid@vger.kernel.org, Morne Botha, Neil Brown, Jes Sorensen
List-Id: linux-raid.ids

On 11/05/2013 02:39 PM, Kevin Wilson wrote:
> Hi Phil,
> Thanks for the quick reply. I should have, as you correctly stated,
> included the result from trying to force assemble.
> mdadm: looking for devices for /dev/md3
> mdadm: /dev/sda4 is identified as a member of /dev/md3, slot 0.
> mdadm: /dev/sdb4 is identified as a member of /dev/md3, slot 1.
> mdadm: /dev/sdc4 is identified as a member of /dev/md3, slot 2.
> mdadm: ignoring /dev/sdb4 as it reports /dev/sda4 as failed
> mdadm: ignoring /dev/sdc4 as it reports /dev/sda4 as failed
> mdadm: no uptodate device for slot 1 of /dev/md3
> mdadm: no uptodate device for slot 2 of /dev/md3
> mdadm: no uptodate device for slot 3 of /dev/md3
> mdadm: added /dev/sda4 to /dev/md3 as 0
> mdadm: /dev/md3 assembled from 1 drive - not enough to start the array.
>
> I was then trying to edit the Array status in sdb4 and sdc4 due to the
> two lines ignoring /dev/sd[x]4 as it reports...
> The man pages suggest using the --update=summaries with a list of the
> devices, however I get an error that states that this is not valid for
> 1.X superblock versions.

Hmmm... this looks like a legitimate bug in the raid superblock update
code.  I'm putting Neil on the Cc: of this email so he doesn't
accidentally overlook this issue.

So, as I see it, the bug (which is present in your mdadm -E output
below, and confirmed in the dmesg output above) is that at some point
in time, /dev/sdd4 failed, resulting in a superblock update on sda4,
sdb4, and sdc4.  From the looks of it, the update landed on sda4 before
something else happened that caused the raid subsystem to mark sda4 as
bad.  We then marked sda4 bad in our internal superblock and wrote that
to sdb4; that write must have returned a failure before we even
attempted the write to sdc4, so we had marked sdb4 bad as well by the
time sdc4's superblock was written.
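
To make that disagreement easy to see side by side, something like the
following loop is handy (it is read-only, it only examines the
superblocks; the device names are the ones from your report):

  for d in /dev/sda4 /dev/sdb4 /dev/sdc4; do
      echo "== $d"
      mdadm -E "$d" | grep -E 'Events|Device Role|Array State'
  done

All three members show the same event count but a different Array
State, which matches the "ignoring ... as it reports /dev/sda4 as
failed" lines in the assemble output above.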
This is what I think normally happens when we have a drive fail, but
the rest of the system is OK:

  drive X fails ->
    update event count and mark drive bad in superblock ->
      submit write to new superblock on drive A
      submit write to new superblock on drive B
      submit write to new superblock on drive C
      (delay for drive access time)
      write to new superblock on drive A completes
      write to new superblock on drive B completes
      write to new superblock on drive C completes
  superblock update complete, array in a consistent, degraded state

Now, here's where I think the problem may creep in:

  drive X fails ->
    update event count and mark drive bad in superblock ->
      submit write to new superblock on drive A
      write to drive A immediately fails; mark drive A bad in the
        superblock, but because we are already in the middle of a
        superblock update with a new event count, don't bother to
        increment the event count again
      submit write to new superblock on drive B with drive A marked bad ->
      write to drive B immediately fails; mark drive B bad in the
        superblock, again without incrementing the event count
      submit write to new superblock on drive C, ditto on the rest
  superblock update more or less fails, but for some reason the writes
  actually completed on disk (an interrupt issue on the controller
  would cause the writes to complete but never get acknowledged back to
  the disk layer, resulting in the sort of thing we see here, although
  that wouldn't explain the ordering)

I haven't actually read through the code, but this is the sort of thing
that seems to be happening.  I don't have a better explanation for why
the superblocks got into the state they are in.

Now, as for what to do, I think the only thing to do now is to recreate
the array using the same information that you currently have.  Use the
output of mdadm -E on a constituent device to get all the settings you
need, and save it off.  From that saved output you can get the
superblock version, the chunk size, the presence or absence of a bitmap
(and the bitmap chunk), and the data offset.

As long as any attempt to remake the array uses the same superblock
version, uses --assume-clean, keeps the drives in the right order, and
the array is created/assembled in a read-only state with nothing more
than a read-only fsck run against it, you won't corrupt anything in the
array even if the rest of the parameters aren't perfect, and you can
try again as many times as needed to get things right and get the disks
back online.  The one thing you might have to do is track down the same
version of mdadm that was used to create the array, as the default data
offset for some of the superblock versions has changed over time and
you might not be able to get the data offset right without the older
mdadm version on hand.
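
Purely as an illustration of the shape of that command (not something
to run verbatim), a re-create attempt might look like the following.
The level, chunk size, and metadata version here are placeholders;
every one of them has to come from your saved mdadm -E output, and
"missing" stands in for the failed sdd4:

  # Example values only -- substitute the real ones from the saved
  # "mdadm -E" output, and keep the device order matching the original
  # Device Role slots (sda4=0, sdb4=1, sdc4=2, sdd4=3).
  mdadm --create /dev/md3 --assume-clean \
        --metadata=1.2 --level=5 --raid-devices=4 --chunk=64 \
        /dev/sda4 /dev/sdb4 /dev/sdc4 missing

  # Keep the array read-only and do nothing more than a read-only
  # filesystem check until you are sure the parameters were right:
  mdadm --readonly /dev/md3
  fsck -n /dev/md3

If the new superblocks don't match the saved examine output (the data
offset is the usual culprit), stop the array and try again -- with
--assume-clean and no writes, nothing on the member disks has been
touched.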
> At this point we found only the two options I mentioned, and we
> decided to climb the mountain and talk to the oracle. Is there another
> way to get the other two drives back into the array?
>
> regards,
>
> Kevin
>
> On 5 November 2013 00:28, Phil Turmel wrote:
>> Hi Kevin,
>>
>> On 11/04/2013 08:51 AM, Kevin Wilson wrote:
>>> Good day All,
>>
>> [snip /]
>>
>> Good report, BTW.
>>
>>> 1. Hexedit the drive status information in the superblocks and set it
>>> to what we require to assemble
>>
>> You would have to be very brave to try that, and very confident that you
>> completely understood the on-disk raid metadata.
>>
>>> 2. Run the create option of mdadm with precisely the original
>>> configuration of the pack to overwrite the superblock information
>>
>> This is a valid option, but should always be the *last* resort.
>>
>> Your research missed the recommended *first* option:
>>
>> mdadm --assemble --force ....
>>
>> [snip /]
>>
>>> Mdadm examine for each drive:
>>> /dev/sda4:
>>
>>> Events : 18538
>>> Device Role : Active device 0
>>> Array State : AAA. ('A' == active, '.' == missing)
>>
>>> /dev/sdb4:
>>> Events : 18538
>>> Device Role : Active device 1
>>> Array State : .AA. ('A' == active, '.' == missing)
>>
>>> /dev/sdc4:
>>> Events : 18538
>>> Device Role : Active device 2
>>> Array State : ..A. ('A' == active, '.' == missing)
>>
>>> /dev/sdd4 is the faulty drive that now shows up as 4GB.
>>
>> Check /proc/mdstat and then use mdadm --stop to make sure any partial
>> assembly of these devices is gone.  Then
>>
>> mdadm -Afv /dev/md3 /dev/sd[abc]4
>>
>> Save the output so you can report it to this list if it fails.  You
>> should end up with the array running in degraded mode.
>>
>> Use fsck as needed to deal with the detritus from the power losses,
>> then make your backups.
>>
>> HTH,
>>
>> Phil
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>