* Recovering from the kernel bug
@ 2012-08-19 13:56 Oliver Schinagl
From: Oliver Schinagl @ 2012-08-19 13:56 UTC
  To: linux-raid

Hi list,

I've once again started trying to repair my broken array. I've tried
most of the things Neil suggested before (creating the array in place
whilst keeping the data, etc.), only breaking it more (because my
version of mdadm was too new).

So instead, I made dd images of sda4 and sdb4, and of sda5 and sdb5.
Both pairs are working raid10 arrays, with f2 and o2 layouts. I then
compared those to an image of sdb6. Granted, I only used 256 MB worth
of data.

Using https://raid.wiki.kernel.org/index.php/RAID_superblock_formats I 
compared my broken sdb6 array to the two working and active arrays.

I haven't completely finished comparing, since the wiki falls short at
the end, which I think is the more important bit concerning my situation.

Some info about sdb6:

/dev/sdb6:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : cde37e2e:309beb19:3461f3f3:1ea70694
            Name : valexia:opt  (local to host valexia)
   Creation Time : Sun Aug 28 17:46:27 2011
      Raid Level : -unknown-
    Raid Devices : 0

  Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 7b47e9ab:ea4b27ce:50e12587:9c572944

     Update Time : Mon May 28 20:53:42 2012
        Checksum : 32e1e116 - correct
          Events : 1


    Device Role : spare
    Array State :  ('A' == active, '.' == missing)
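
For reference, the sector counts in the --examine output above can be turned into the byte offsets used in the rest of this mail. This is only a small sketch, assuming mdadm reports these fields in 512-byte sectors; the function name is made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: mdadm reports these fields in 512-byte sectors


def sectors_to_bytes(sectors: int) -> int:
    """Convert a sector count from --examine into a byte count."""
    return sectors * SECTOR_SIZE


super_offset = sectors_to_bytes(8)          # "Super Offset : 8 sectors"
data_offset = sectors_to_bytes(2048)        # "Data Offset : 2048 sectors"
avail_size = sectors_to_bytes(456165376)    # "Avail Dev Size : 456165376"

print(hex(super_offset))                    # 0x1000
print(hex(data_offset))                     # 0x100000
print(avail_size)                           # 233556672512
```

So the superblock sits 0x1000 bytes into the partition, which is why every field offset from the wiki gets 0x1000 added, and the data starts at 0x100000.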


Now my questions regarding trying to repair this array are the following:

For offset 0x10A0 (metadata version 1.2 accounts for the extra 0x1000),
I found this on the wiki:

"This is shown as "Array Slot" by the mdadm v2.x "--examine" command

Note: This is a 32-bit unsigned integer, but the Device-Roles 
(Positions-in-Array) Area indexes these values using only 16-bit 
unsigned integers, and reserves the values 0xFFFF as spare and 0xFFFE as 
faulty, so only 65,534 devices per array are possible."

sda4 and sdb4 list this as 02 00 00 00 and 01 00 00 00. Sounds sensible,
although I would have expected 0x0 and 0x1, but I'm sure there's some
sensible explanation. sda5 and sdb5 however are slightly different: 03
00 00 00 and 02 00 00 00. It quickly shows that, probably by
coincidence, the 'b' parts have a higher number than the 'a' parts. So a
02 00 00 00 on sdb6 (the broken array) should be okay.
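
To avoid reading these bytes by eye, the field can be pulled out of a dd image directly. A minimal sketch, assuming the field is a little-endian 32-bit integer at 0xA0 inside a superblock that starts 0x1000 bytes into the image; the function name and the in-memory test image are made up for illustration.

```python
import io
import struct

SB_OFFSET = 0x1000       # v1.2 superblock starts 4 KiB into the member device
DEV_NUMBER_OFF = 0xA0    # position of the field inside the superblock


def read_dev_number(image) -> int:
    """Read the 32-bit little-endian device number from an open image."""
    image.seek(SB_OFFSET + DEV_NUMBER_OFF)
    (value,) = struct.unpack("<I", image.read(4))
    return value


# Synthetic stand-in for a dd image: zeros, then the bytes 02 00 00 00.
fake_image = io.BytesIO(bytes(SB_OFFSET + DEV_NUMBER_OFF) + bytes.fromhex("02000000"))
print(read_dev_number(fake_image))  # 2
```

On a real image the call would be something like read_dev_number(open("sdb6.img", "rb")), where sdb6.img is a placeholder file name.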

Then next, is 'resync_offset' at 0x10D0. I think all devices list it as 
FF FF FF FF, but the broken device has it at 00 00 00 00. Any impact on 
this one?
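
The sketch below decodes that field. It assumes resync_offset is a 64-bit little-endian value (8 bytes at 0x10D0). The interpretation is only my understanding of the md driver, namely that all ones means nothing is left to resync and 0 means a resync would start from the beginning; please treat it as an assumption rather than a confirmed answer.

```python
import struct

# Assumption: an all-ones 64-bit value marks "nothing left to resync".
MAX_SECTOR = 0xFFFFFFFFFFFFFFFF


def describe_resync_offset(raw: bytes) -> str:
    """Interpret the 8 raw bytes of the resync_offset field."""
    (value,) = struct.unpack("<Q", raw)
    if value == MAX_SECTOR:
        return "no resync pending"
    return f"resync would restart at sector {value}"


print(describe_resync_offset(bytes.fromhex("ff" * 8)))
print(describe_resync_offset(bytes(8)))
```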

Then of course there's the checksum at 0x10D8. mdadm currently says it
matches, but once I start editing things it probably won't match
anymore. Any way around that?
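
One way around it would be to recompute the checksum after editing. The sketch below follows my reading of calc_sb_1_csum in the kernel's md driver: sum the superblock as little-endian 32-bit words over 256 + 2 * max_dev bytes with the checksum field zeroed, add a trailing 16-bit word if there is one, then fold the carry back in. The max_dev position (0xDC) and the helper name are assumptions from that reading, so it should be verified against an untouched superblock before trusting it.

```python
import struct

CSUM_OFF = 0xD8      # sb_csum field (0x10D8 on disk for v1.2)
MAX_DEV_OFF = 0xDC   # assumption: max_dev follows the checksum


def calc_sb1_csum(sb: bytes) -> int:
    """Recompute the v1.x superblock checksum from raw superblock bytes."""
    (max_dev,) = struct.unpack_from("<I", sb, MAX_DEV_OFF)
    size = 256 + 2 * max_dev
    if len(sb) < size:
        raise ValueError("superblock buffer is shorter than 256 + 2 * max_dev")
    buf = bytearray(sb[:size])
    buf[CSUM_OFF:CSUM_OFF + 4] = b"\0\0\0\0"   # the field itself counts as zero
    whole = size - size % 4
    total = 0
    for (word,) in struct.iter_unpack("<I", bytes(buf[:whole])):
        total += word
    if size % 4 == 2:                           # odd number of 16-bit roles
        total += struct.unpack_from("<H", buf, whole)[0]
    # Fold the upper 32 bits back in and truncate to 32 bits.
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF


def patch_csum(sb: bytearray) -> None:
    """Write the recomputed checksum back into an edited superblock."""
    struct.pack_into("<I", sb, CSUM_OFF, calc_sb1_csum(bytes(sb)))
```

A sanity check before editing anything: run calc_sb1_csum on the 0x1000.. region of a working member and compare it with the Checksum line from mdadm --examine.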

Then offset 0x1100 is slightly different for each array. Array sd?5 
looks like: FE FF FE FF 01 00 00 00
Array sd?4 looks similar enough, FE FF 01 00 00 00 FE FF

Does this correspond to the 01, 02 and 03 value pairs for 0x10A0?

The broken array reads FE FF FE FF FE FF FE, which probably is wrong?
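
To check whether the two areas line up, the bytes at 0x1100 can be decoded as 16-bit little-endian entries and indexed with the value from 0x10A0. This is a sketch under the assumption that the roles table is indexed by that device number, as the wiki note suggests, with 0xFFFF meaning spare and 0xFFFE meaning faulty; the function names are made up.

```python
import struct

ROLE_SPARE = 0xFFFF
ROLE_FAULTY = 0xFFFE


def decode_roles(raw: bytes) -> list:
    """Split the roles area into 16-bit little-endian entries."""
    count = len(raw) // 2          # ignore a trailing odd byte
    return list(struct.unpack(f"<{count}H", raw[:count * 2]))


def role_of(raw: bytes, dev_number: int):
    """Look up the role for one device number."""
    value = decode_roles(raw)[dev_number]
    if value == ROLE_SPARE:
        return "spare"
    if value == ROLE_FAULTY:
        return "faulty"
    return value


sd4 = bytes.fromhex("FEFF01000000FEFF")
sd5 = bytes.fromhex("FEFFFEFF01000000")
broken = bytes.fromhex("FEFFFEFFFEFFFE")   # the 7 bytes quoted above

print(role_of(sd4, 1), role_of(sd4, 2))    # 1 0
print(role_of(sd5, 2), role_of(sd5, 3))    # 1 0
print(role_of(broken, 2))                  # faulty
```

With that indexing, device numbers 1 and 2 on the sd?4 array and 2 and 3 on the sd?5 array both land on roles 1 and 0, while slot 2 of the broken table is marked faulty.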


As for determining whether the first data block is offset, or 'real', I
compared the data at offsets 0x100000 - 0x100520-ish and noticed
something that looks like s_volume_name and s_last_mounted of ext4. Thus
this should be the 'real' first block. Since sdb6 has something that
looks a lot like what's on sdb5 (20 80 00 00 20 80 01 00 20 80 02 etc.
etc.) at 0x100000, this should be the first offset block, correct?
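
The 'real' first block could also be detected mechanically instead of spotting strings in a hex dump. A rough sketch, assuming the standard ext2/3/4 layout: the filesystem superblock starts 1024 bytes into the data area, s_magic (0xEF53) is at offset 0x38 within it and s_volume_name at 0x78. The function name and the synthetic image are made up for illustration.

```python
import io
import struct

DATA_OFFSET = 2048 * 512     # "Data Offset : 2048 sectors" = 0x100000
EXT_SB_OFFSET = 1024         # ext superblock starts 1 KiB into the filesystem
S_MAGIC_OFF = 0x38
S_VOLUME_NAME_OFF = 0x78
EXT_MAGIC = 0xEF53


def ext_label_at(image, data_offset=DATA_OFFSET):
    """Return the volume label if an ext superblock is found, else None."""
    image.seek(data_offset + EXT_SB_OFFSET)
    sb = image.read(0x88)
    if len(sb) < 0x88:
        return None
    (magic,) = struct.unpack_from("<H", sb, S_MAGIC_OFF)
    if magic != EXT_MAGIC:
        return None
    name = sb[S_VOLUME_NAME_OFF:S_VOLUME_NAME_OFF + 16]
    return name.rstrip(b"\0").decode("ascii", "replace")


# Synthetic data area with a fake ext superblock labelled "opt".
fake = bytearray(EXT_SB_OFFSET + 0x88)
struct.pack_into("<H", fake, EXT_SB_OFFSET + S_MAGIC_OFF, EXT_MAGIC)
fake[EXT_SB_OFFSET + S_VOLUME_NAME_OFF:EXT_SB_OFFSET + S_VOLUME_NAME_OFF + 3] = b"opt"
print(ext_label_at(io.BytesIO(bytes(fake)), data_offset=0))   # opt
```

A member whose data area returns None here would then be the candidate for the offset copy, though an unlabelled filesystem returns an empty string, so None and "" need to be told apart.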


Assuming I can somehow force mdadm to recognize my disk as part of an
array, and no longer as a spare, how does mdadm know which of the two
parts it is, 'real' or offset? I haven't bumped into anything that would
tell mdadm that bit of information. The data all seems to still be very
much available, so I still have hope. I did try making a copy of the
entire partition and re-creating the array as missing /dev/loop0 (with
loop0 being the dd-ed copy), but that didn't work.

Finally, would it even be possible to 'restore' the first 127 MB on sda6,
which the wrong version of mdadm destroyed by reserving 128 MB instead
of the usual 1 MB, using data from sdb6?

Sorry for the long mail, I tried to be complete :)

Oliver

