linux-raid.vger.kernel.org archive mirror
* Recovering from the kernel bug
@ 2012-08-19 13:56 Oliver Schinagl
  2012-09-09 20:22 ` Recovering from the kernel bug, Neil? Oliver Schinagl
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Schinagl @ 2012-08-19 13:56 UTC (permalink / raw)
  To: linux-raid

Hi list,

I've once again started trying to repair my broken array. I've tried 
most of the things Neil suggested before (re-creating the array in 
place while keeping the data, etc.), only breaking it further (due to 
having too new a version of mdadm).

So instead, I made dd images of sda4/sdb4 and sda5/sdb5 (both working 
raid10 arrays, with f2 and o2 layouts respectively). I then compared 
those to an image of sdb6. Granted, I only used 256mb worth of data.
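
For reference, this is roughly how I took and compared the images (a 
sketch; device and file names are mine):

  # image the first 256mb of each member
  dd if=/dev/sda4 of=sda4.img bs=1M count=256
  dd if=/dev/sdb6 of=sdb6.img bs=1M count=256
  # the v1.2 superblock sits 4k into the partition (super offset
  # 8 sectors); dump it from each image and diff
  hexdump -C -s 0x1000 -n 0x200 sda4.img > sda4.sb
  hexdump -C -s 0x1000 -n 0x200 sdb6.img > sdb6.sb
  diff -u sda4.sb sdb6.sb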

Using https://raid.wiki.kernel.org/index.php/RAID_superblock_formats I 
compared my broken sdb6 array to the two working and active arrays.

I haven't completely finished comparing, since the wiki falls short at 
the end, which I think is the more important bit for my situation.

Some info about sdb6:

/dev/sdb6:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : cde37e2e:309beb19:3461f3f3:1ea70694
            Name : valexia:opt  (local to host valexia)
   Creation Time : Sun Aug 28 17:46:27 2011
      Raid Level : -unknown-
    Raid Devices : 0

  Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 7b47e9ab:ea4b27ce:50e12587:9c572944

     Update Time : Mon May 28 20:53:42 2012
        Checksum : 32e1e116 - correct
          Events : 1


    Device Role : spare
    Array State :  ('A' == active, '.' == missing)


Now my questions regarding trying to repair this array are the following:

At offset 0x10A0 (metadata version 1.2 accounts for the extra 0x1000), 
I found this on the wiki:

"This is shown as "Array Slot" by the mdadm v2.x "--examine" command

Note: This is a 32-bit unsigned integer, but the Device-Roles 
(Positions-in-Array) Area indexes these values using only 16-bit 
unsigned integers, and reserves the values 0xFFFF as spare and 0xFFFE as 
faulty, so only 65,534 devices per array are possible."

sda4 and sdb4 list this as 02 00 00 00 and 01 00 00 00. Sounds sensible, 
although I would have expected 0x0 and 0x1, but I'm sure there's some 
sensible explanation. sda5 and sdb5 however are slightly different, 03 
00 00 00 and 02 00 00 00. It quickly shows that, for whatever 
coincidental reason, the 'b' parts have a higher number than the 'a' 
parts. So a 02 00 00 00 on sdb6 (the broken array) should be okay.
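
For the curious, this is how I read those fields out of the images, 
using the offsets from the wiki with the superblock itself at 0x1000:

  # dev_number ("Array Slot"): le32 at superblock offset 0xA0, so 0x10A0
  od -A x -t x4 -j 0x10A0 -N 4 sdb6.img
  # dev_roles: le16 entries from superblock offset 0x100, so 0x1100;
  # 0xFFFF means spare, 0xFFFE means faulty
  od -A x -t x2 -j 0x1100 -N 8 sdb6.img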

Next is 'resync_offset' at 0x10D0. I think all working devices list it 
as FF FF FF FF, but the broken device has it as 00 00 00 00. Does this 
one have any impact?

Then of course there's the checksum at 0x10D8. mdadm currently says it 
matches, but once I start editing things it probably won't match 
anymore. Any way around that?
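
From what I can tell, mdadm recomputes the checksum when examining a 
superblock and prints the value it expected when there is a mismatch, 
so I imagine something like this would let me patch things back up 
after editing (a sketch against the image, not the live disk; the 
bytes here are just my current checksum 32e1e116 in little-endian):

  mdadm --examine /dev/loop0 | grep Checksum
  # write the 'expected' value back at 0x10D8
  printf '\x16\xe1\xe1\x32' | dd of=sdb6.img bs=1 seek=$((0x10D8)) conv=notrunc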

Then offset 0x1100 is slightly different for each array. Array sd?5 
looks like: FE FF FE FF 01 00 00 00
Array sd?4 looks similar enough, FE FF 01 00 00 00 FE FF

Does this correspond to the 01, 02 and 03 value pairs for 0x10A0?

The broken array reads FE FF FE FF FE FF FE, which is probably wrong?


As for determining whether the first data block is the offset one or 
the 'real' one, I compared data offsets 0x100000 - 0x100520 or so and 
noticed something that looks like the s_volume_name and s_last_mounted 
fields of ext4. Thus this should be the 'real' first block. Since sdb6 
has something at 0x100000 that looks a lot like what's on sdb5 (20 80 
00 00 20 80 01 00 20 80 02 etc. etc.), this should be the first offset 
block, correct?
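
A quick way to double-check: ext4's magic number (53 ef on disk) sits 
at byte 0x438 into the filesystem, so with a 1mb data offset it should 
show up at 0x100438 on whichever member carries the 'real' first block:

  # if a member holds the 'real' first block, 53 ef appears here
  hexdump -C -s 0x100430 -n 16 sda5.img
  hexdump -C -s 0x100430 -n 16 sdb6.img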


Assuming I can somehow force mdadm to recognize my disk as part of an 
array, and no longer as a spare, how does mdadm know which of the two 
halves it is, 'real' or offset? I haven't bumped into anything that 
would tell mdadm that bit of information. The data all still seems 
very much available, so I still have hope. I did try making a copy of 
the entire partition and re-creating the array as missing /dev/loop0 
(with loop0 being the dd-ed copy), but that didn't work.

Finally, would it even be possible to 'restore' the first 127mb of 
sda6 (the part that the wrong version of mdadm destroyed by reserving 
128mb instead of the usual 1mb) using data from sdb6?

Sorry for the long mail, I tried to be complete :)

Oliver

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recovering from the kernel bug, Neil?
  2012-08-19 13:56 Recovering from the kernel bug Oliver Schinagl
@ 2012-09-09 20:22 ` Oliver Schinagl
  2012-09-09 23:08   ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Schinagl @ 2012-09-09 20:22 UTC (permalink / raw)
  To: linux-raid

Since I've had no reply as of yet, I wonder: if I were to arbitrarily 
change the data at offset 0x1100 to something that _might_ be right, 
could I horribly break something?

oliver

On 08/19/12 15:56, Oliver Schinagl wrote:
> [ ... ]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recovering from the kernel bug, Neil?
  2012-09-09 20:22 ` Recovering from the kernel bug, Neil? Oliver Schinagl
@ 2012-09-09 23:08   ` NeilBrown
  2012-09-10  8:44     ` Oliver Schinagl
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2012-09-09 23:08 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 5633 bytes --]

On Sun, 09 Sep 2012 22:22:19 +0200 Oliver Schinagl <oliver+list@schinagl.nl>
wrote:

> Since I've had no reply as of yet, I wonder: if I were to arbitrarily
> change the data at offset 0x1100 to something that _might_ be right,
> could I horribly break something?

I doubt it would do any good.
I think that editing the metadata by 'hand' is not likely to be a useful
approach.  You really want to get 'mdadm --create' to recreate the array with
the correct details.  It should be possible to do this, though a little bit
of hacking or careful selection of mdadm version might be required.

What exactly do you know about the array?   When you use mdadm to --create
the array, what details does it get wrong?

NeilBrown


> [ ... ]


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recovering from the kernel bug, Neil?
  2012-09-09 23:08   ` NeilBrown
@ 2012-09-10  8:44     ` Oliver Schinagl
  2012-09-11  6:16       ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Schinagl @ 2012-09-10  8:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 09/10/12 01:08, NeilBrown wrote:
> On Sun, 09 Sep 2012 22:22:19 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
> wrote:
>
>> Since I've had no reply as of yet, I wonder: if I were to arbitrarily
>> change the data at offset 0x1100 to something that _might_ be right,
>> could I horribly break something?
>
> I doubt it would do any good.
> I think that editing the metadata by 'hand' is not likely to be a useful
> approach.  You really want to get 'mdadm --create' to recreate the array with
> the correct details.  It should be possible to do this, though a little bit
> of hacking or careful selection of mdadm version might be required.
>
> What exactly do you know about the array?   When you use mdadm to --create
> the array, what details does it get wrong?
I tried, and I believe it doesn't know the order of the array, e.g. 
A1 B1 or B1 A1, basically. After that no filesystem is found, nothing. 
I did make a dump of the partition ('only' 160gb or so) so I have some 
room to experiment. Going through the file/disk with hexedit I do see 
all the data, seemingly intact, as described below. I did pin the 
mdadm version to the one you recommended back then:
mdadm - v3.2.3 - 23rd December 2011
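
For completeness, this is the kind of experiment I run against the 
dump, so the real disk is never touched (layout and chunk size here 
are just guesses):

  losetup /dev/loop0 sdb6.img
  mdadm --create /dev/md9 --run --assume-clean --metadata=1.2 \
        --level=10 --raid-devices=2 --layout=o2 --chunk=64 \
        missing /dev/loop0
  fsck -n /dev/md9 && mount -o ro /dev/md9 /mnt/test
  mdadm --stop /dev/md9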

I'm just mad at myself for having destroyed the first half of the 
array (the one that is in proper order) by using the wrong version of 
mdadm, destroying the first 128mb of my disk. The first 128mb of data 
isn't that important, I don't think, but it of course did contain all 
the information ext4 needs to mount the disk :S
>
> NeilBrown
>
>
>> [ ... ]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recovering from the kernel bug, Neil?
  2012-09-10  8:44     ` Oliver Schinagl
@ 2012-09-11  6:16       ` NeilBrown
  2012-09-14 10:07         ` Oliver Schinagl
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2012-09-11  6:16 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 7635 bytes --]

On Mon, 10 Sep 2012 10:44:12 +0200 Oliver Schinagl <oliver+list@schinagl.nl>
wrote:

> On 09/10/12 01:08, NeilBrown wrote:
> > On Sun, 09 Sep 2012 22:22:19 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
> > wrote:
> >
> >> Since I've had no reply as of yet, I wonder: if I were to arbitrarily
> >> change the data at offset 0x1100 to something that _might_ be right,
> >> could I horribly break something?
> >
> > I doubt it would do any good.
> > I think that editing the metadata by 'hand' is not likely to be a useful
> > approach.  You really want to get 'mdadm --create' to recreate the array with
> > the correct details.  It should be possible to do this, though a little bit
> > of hacking or careful selection of mdadm version might be required.
> >
> > What exactly do you know about the array?   When you use mdadm to --create
> > the array, what details does it get wrong?
> I tried, and I believe it doesn't know the order of the array, e.g.
> A1 B1 or B1 A1, basically.

So did you try both orders of the devices, and neither worked?

But you didn't even try to answer my question: "what do you know about the
array".
I think it is raid10 - correct?
2 devices?
You say something about 'offset' below, so maybe you chose the 'o2'
layout - is that correct?
Do you know anything about chunk size?

It does look like there is an ext4 superblock 1M into the device, so that
suggests a 'data_offset' of 1Meg.
What data offset do you get when you try to --create the array?

>                           After that no filesystem is found, nothing.
> I did make a dump of the partition ('only' 160gb or so) so I have some
> room to experiment. Going through the file/disk with hexedit I do see
> all the data, seemingly intact, as described below. I did pin the
> mdadm version to the one you recommended back then:
> mdadm - v3.2.3 - 23rd December 2011
> 
> I'm just mad at myself for having destroyed the first half of the
> array (the one that is in proper order) by using the wrong version of
> mdadm, destroying the first 128mb of my disk. The first 128mb of data
> isn't that important, I don't think, but it of course did contain all
> the information ext4 needs to mount the disk :S

Why do you think that you destroyed 128mb of your disk?  Creating an array
with the 'wrong' data offset should at most destroy 4K, probably less.  It
just writes out the super-block, nothing else.

NeilBrown



> > [ ... ]


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Recovering from the kernel bug, Neil?
  2012-09-11  6:16       ` NeilBrown
@ 2012-09-14 10:07         ` Oliver Schinagl
  2012-09-14 11:51           ` Small short question Was: " Oliver Schinagl
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Schinagl @ 2012-09-14 10:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

So I've spent the last few days trying several things to recover the 
array. Assumption is the mother of all mistakes, right?

I had 3 arrays: /, /usr and /opt. I did some basic research into the 
various raid levels and for some reason decided that f2 was good for / 
and o2 for /usr. In the same vein I thought o2 was good for /opt as 
well. I was wrong. I was so sure I had made it o2 that I ruled out the 
possibility of it being f2. I did try various offsets, but never f2 
with /dev/sdb6 missing. I think I even tried f2 in the past, but on 
sda6, where I may have broken things.

Long story short, it turns out it was 128k chunks and the far2 (f2) 
layout. The data actually is accessible from a dd-ed image attached to 
a loop device, assembled with mdadm and then mounted. I will now 
recreate my md2 array, and copy the data over.
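
Concretely, the combination that finally worked on the looped image 
was along these lines (from memory, so treat it as a sketch):

  mdadm --create /dev/md9 --run --assume-clean --metadata=1.2 \
        --level=10 --raid-devices=2 --layout=f2 --chunk=128 \
        missing /dev/loop0
  mount -o ro /dev/md9 /mnt/recover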

Thank you for all the advice and help, past and present. You are an 
amazing dev and a good person.

Oliver



I left the text below because I typed a lot and it could potentially 
still be useful to someone :p This has been an interesting endeavour, 
using hexdump to investigate a raid array, and I learned a lot from it.

On 09/11/12 08:16, NeilBrown wrote:
> On Mon, 10 Sep 2012 10:44:12 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
> wrote:
>
>> On 09/10/12 01:08, NeilBrown wrote:
>>> On Sun, 09 Sep 2012 22:22:19 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
>>> wrote:
>>>
>>>> Since I've had no reply as of yet, I wonder: if I were to arbitrarily
>>>> change the data at offset 0x1100 to something that _might_ be right,
>>>> could I horribly break something?
>>>
>>> I doubt it would do any good.
>>> I think that editing the metadata by 'hand' is not likely to be a useful
>>> approach.  You really want to get 'mdadm --create' to recreate the array with
>>> the correct details.  It should be possible to do this, though a little bit
>>> of hacking or careful selection of mdadm version might be required.
>>>
>>> What exactly do you know about the array?   When you use mdadm to --create
>>> the array, what details does it get wrong?
>> I tried, and I believe it doesn't know the order of the array, e.g.
>> A1 B1 or B1 A1, basically.
>
> So did you try both orders of the devices, and neither worked?
Correct, but I have re-tried over the past days. I am now guessing 
that I must have gotten the dimensions of the array wrong? E.g. not o2 
but f2? Not 64k but 128k or 256k? I suppose it won't hurt to try 
different sizes?

Here's what I did: dd the entire partition to a file (it takes long; I 
somehow only get 45mb/s using bs=64k, and the partition is 160gb).
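
Rather than second-guessing my memory, I can then brute-force the 
geometry against a loop device over that file; each round only 
rewrites the superblock area of the image, and the fsck probe is 
read-only (a sketch, with my best guesses for the candidate values):

  losetup /dev/loop0 sdb6-full.img
  for layout in f2 o2 n2; do
    for chunk in 64 128 256 512; do
      mdadm --create /dev/md9 --run --assume-clean --level=10 \
            --raid-devices=2 --layout=$layout --chunk=$chunk \
            missing /dev/loop0
      fsck.ext4 -n /dev/md9 && echo "match: $layout $chunk"
      mdadm --stop /dev/md9
    done
  done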

>
> But you didn't even try to answer my question: "what do you know about the
> array".
I am sorry for not properly describing the array, my mistake. (I have 
done so in the far past, actually :p)

> I think it is raid10 - correct?
Yes, Raid10, I _thought_ it was o2, 128k chunks.

> 2 devices?
Yes, 2 devices only.

> You say something about 'offset' below, so maybe you chose the 'o2'
> layout - is that correct?
O2 from what I remember.

> Do you know anything about chunk size?
I am almost certain it was 128k, but in strong doubt now.

>
> It does look like there is an ext4 superblock 1M into the device, so that
> suggests a 'data_offset' of 1Meg.
I used the older version of mdadm, the one that didn't have the 4k and 
128M differentiation. Using the v1.2 metadata puts my raid superblock 
at 4k, I believe, and after that, at 1M, the ext4 filesystem begins.

data_offset also says 2048 sectors with mdadm --examine, which I 
believe is 1M.

> What data offset do you get when you try to --create the array?
With mdadm v3.2.3, 2048.

>
>>                            After that no filesystem is found, nothing.
>> I did make a dump of the partition ('only' 160gb or so) so I have some
>> room to experiment. Going through the file/disk with hexedit I do see
>> all the data, seemingly intact, as described below. I did pin the
>> mdadm version to the one you recommended back then:
>> mdadm - v3.2.3 - 23rd December 2011
>>
>> I'm just mad at myself for having destroyed the first half of the
>> array (the one that is in proper order) by using the wrong version of
>> mdadm, destroying the first 128mb of my disk. The first 128mb of data
>> isn't that important, I don't think, but it of course did contain all
>> the information ext4 needs to mount the disk :S
>
> Why do you think that you destroyed 128mb of your disk?  Creating an array
> with the 'wrong' data offset should at most destroy 4K, probably less.  It
> just writes out the super-block, nothing else.
>
> NeilBrown
Because I made the silly assumption that mdadm would clear the first 
128M, as that's where the start of the array would be.

In that case, I should be able to see this with a hexdump, correct? At 
0x10000 I see nothing, but a little further down there is some data. 
So maybe it is not destroyed after all. I will examine it.
> [ ... ]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Small short question Was: Re: Recovering from the kernel bug, Neil?
  2012-09-14 10:07         ` Oliver Schinagl
@ 2012-09-14 11:51           ` Oliver Schinagl
  2012-09-14 16:43             ` Small short question Peter Grandi
  2012-09-20  2:22             ` Small short question Was: Re: Recovering from the kernel bug, Neil? NeilBrown
  0 siblings, 2 replies; 12+ messages in thread
From: Oliver Schinagl @ 2012-09-14 11:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Since I had only one valid device to check with etc., I assume that if 
fsck -n -f /dev/md2 runs successfully, it is 100% safe to assume the 
array is perfectly healthy?

E.g. it should be perfectly safe to mdadm --zero-superblock on /dev/sda6 
and add it to /dev/md2 (missing /dev/sdb6)?
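
That is, a sketch of what I have in mind, assuming the answer is yes:

  mdadm --zero-superblock /dev/sda6
  mdadm /dev/md2 --add /dev/sda6
  cat /proc/mdstat   # and watch the resync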

I know technically this all works out fine, and the bug shouldn't have 
broken anything in that regard. Or is it absolutely recommended to 
simply create a new array, with a new FS on it, and copy all data over 
(logically also with /dev/md2, /dev/sda6 missing and later adding sdb6)?

Oliver

On 09/14/12 12:07, Oliver Schinagl wrote:
> [ ... ]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Small short question
  2012-09-14 11:51           ` Small short question Was: " Oliver Schinagl
@ 2012-09-14 16:43             ` Peter Grandi
  2012-09-14 20:19               ` Oliver Schinagl
  2012-09-20  2:22             ` Small short question Was: Re: Recovering from the kernel bug, Neil? NeilBrown
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Grandi @ 2012-09-14 16:43 UTC (permalink / raw)
  To: Linux RAID

> Since I had only one valid device to check with etc, I assume
> that if fsck -n -f /dev/md2 runs successfully, it is 100% safe
> to assume the array is perfectly healthy?

If the filesystem used supports both metadata and data checksums
and its 'fsck -f' does verify all of them, then yes. Otherwise
obviously no.

> [ ... ] recommended to simply create a new array, with a new
> FS on it, and copy all data over (logically also with
> /dev/md2, /dev/sda6 missing and later adding sdb6)?

Given the number of people who ask here to recover the unique
data they have on a RAID set, perhaps it is good to restate here
that RAID does not mean "no longer need backups" :-).

>> [ ... ] I was wrong. I was so sure I had made it o2 that I ruled
>> out the possibility of it being f2. [ ... ] Long story short,
>> it turns out it was 128k chunks and the far2 layout. [ ... ]

While you went to the lengths of choosing particular layouts for
specific filetrees, saving the output of '--scan' in 'mdadm.conf'
seems, strangely, to have been a lower priority :-).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Small short question
  2012-09-14 16:43             ` Small short question Peter Grandi
@ 2012-09-14 20:19               ` Oliver Schinagl
  0 siblings, 0 replies; 12+ messages in thread
From: Oliver Schinagl @ 2012-09-14 20:19 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux RAID

On 14-09-12 18:43, Peter Grandi wrote:
>> Since I had only one valid device to check with etc, I assume
>> that if fsck -n -f /dev/md2 runs successfully, it is 100% safe
>> to assume the array is perfectly healthy?
> If the filesystem used supports both metadata and data checksums
> and its 'fsck -f' does verify all of them, then yes. Otherwise
> obviously no.
I don't think ext4 has data checksums yet? Let me rephrase that: I 
know there's a patch circulating, I think, but it wasn't recommended 
yet as it wasn't considered 'stable'. Anyway, I doubt my older ext4 
disk has data checksums. Metadata checksums are pretty much the 
default, if I'm not mistaken :) So 0 errors from fsck should mean all 
data is okay, right?
>
>> [ ... ] recommended to simply create a new array, with a new
>> FS on it, and copy all data over (logically also with
>> /dev/md2, /dev/sda6 missing and later adding sdb6)?
> Given the number of people who ask here to recover the unique
> data they have on a RAID set, perhaps it is good to restate here
> that RAID does not mean "no longer need backups" :-).
It's not super-crucial data. That said, it's never nice to lose data, 
right? :) Yes, always back up! :p

>
>>> [ ... ] I was wrong. I was so sure I had made it o2 that I ruled
>>> out the possibility of it being f2. [ ... ] Long story short,
>>> it turns out it was 128k chunks and the far2 layout. [ ... ]
> While you go the lengths of choosing particular layouts on
> specific filetrees, saving the output of '--scan' in
> 'mdadm.conf' seems strangely to have been a lower priority :-).
Oh, I have an mdadm.conf, but it doesn't list the raid level, layout 
or chunk size :( Only the array name, metadata version and raid ID.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Small short question Was: Re: Recovering from the kernel bug, Neil?
  2012-09-14 11:51           ` Small short question Was: " Oliver Schinagl
  2012-09-14 16:43             ` Small short question Peter Grandi
@ 2012-09-20  2:22             ` NeilBrown
  2012-09-20 17:05               ` Oliver Schinagl
  1 sibling, 1 reply; 12+ messages in thread
From: NeilBrown @ 2012-09-20  2:22 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1041 bytes --]

On Fri, 14 Sep 2012 13:51:46 +0200 Oliver Schinagl <oliver+list@schinagl.nl>
wrote:

> Since I had only one valid device to check with etc, I assume that if 
> fsck -n -f /dev/md2 runs successfully, it is 100% safe to assume the 
> array is perfectly healthy?

I might be a bit late here but....

No, not 100% safe.
I would at least mount the filesystem and have a look around - take a random
sample and see if it all looks credible.  If it does then that is as good as
it gets.

> 
> E.g. it should be perfectly safe to mdadm --zero-superblock on /dev/sda6 
> and add it to /dev/md2 (missing /dev/sdb6)?
> 
> I know technically this all works out fine, and the bug shouldn't have 
> broken anything in that regard. Or is it absolutely recommended to simply 
> create a new array, with a new FS on it, and copy all data over 
> (logically also with /dev/md2, /dev/sda6 missing and later adding sdb6)?

Creating a new array shouldn't be necessary.  In general I would avoid
copying data when convenient.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Small short question Was: Re: Recovering from the kernel bug, Neil?
  2012-09-20  2:22             ` Small short question Was: Re: Recovering from the kernel bug, Neil? NeilBrown
@ 2012-09-20 17:05               ` Oliver Schinagl
  2012-09-20 17:49                 ` Chris Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Schinagl @ 2012-09-20 17:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 20-09-12 04:22, NeilBrown wrote:
> On Fri, 14 Sep 2012 13:51:46 +0200 Oliver Schinagl <oliver+list@schinagl.nl>
> wrote:
>
>> Since I had only one valid device to check with etc, I assume that if
>> fsck -n -f /dev/md2 runs successfully, it is 100% safe to assume the
>> array is perfectly healthy?
> I might be a bit late here but....
>
> No, not 100% safe.
> I would at least mount the filesystem and have a look around - take a random
> sample and see if it all looks credible.  If it does then that is as good as
> it gets.
I did check the FS: I ran some fsck tests and checked data on the 
drive, and everything appeared to be normal. I still have to verify 
some unimportant iso images; once that is done, I cannot imagine the 
data being bad.
>
>> E.g. it should be perfectly safe to mdadm --zero-superblock on /dev/sda6
>> and add it to /dev/md2 (missing /dev/sdb6)?
>>
>> I know technically this all works out fine, and the bug shouldn't have
>> broken anything in that regard. Or is it absolutely recommended to simply
>> create a new array, with a new FS on it, and copy all data over
>> (logically also with /dev/md2, /dev/sda6 missing and later adding sdb6)?
> Creating a new array shouldn't be necessary.  In general I would avoid
> copying data when convenient.
I didn't think it would matter. The metadata is rewritten, so if the 
data is good, everything is good. Since I added an empty disk to the 
'half' array, everything has re-synced and is a-OK.
> NeilBrown


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Small short question Was: Re: Recovering from the kernel bug, Neil?
  2012-09-20 17:05               ` Oliver Schinagl
@ 2012-09-20 17:49                 ` Chris Murphy
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Murphy @ 2012-09-20 17:49 UTC (permalink / raw)
  To: Linux RAID


On Sep 20, 2012, at 11:05 AM, Oliver Schinagl wrote:

>> 
> I did check the FS: I ran some fsck tests and checked data on the drive, and everything appeared to be normal. I still have to verify some unimportant iso images; once that is done, I cannot imagine the data being bad.

You should make sure it wasn't just a quick journal check, which is now common. If the journal is clean, the file system is assumed to be clean, because that's faster than fully checking the file system. For e2fsck this means passing -f. You might want to use the -n option also.

For XFS you'd run xfs_check first, and then xfs_repair -n, which could take quite a while depending on the size of the file system.
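
A sketch (device name assumed):

  e2fsck -f -n /dev/md2     # force a full pass, change nothing
  # or, for XFS:
  xfs_check /dev/md2
  xfs_repair -n /dev/md2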


Chris Murphy

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread

Thread overview: 12+ messages
2012-08-19 13:56 Recovering from the kernel bug Oliver Schinagl
2012-09-09 20:22 ` Recovering from the kernel bug, Neil? Oliver Schinagl
2012-09-09 23:08   ` NeilBrown
2012-09-10  8:44     ` Oliver Schinagl
2012-09-11  6:16       ` NeilBrown
2012-09-14 10:07         ` Oliver Schinagl
2012-09-14 11:51           ` Small short question Was: " Oliver Schinagl
2012-09-14 16:43             ` Small short question Peter Grandi
2012-09-14 20:19               ` Oliver Schinagl
2012-09-20  2:22             ` Small short question Was: Re: Recovering from the kernel bug, Neil? NeilBrown
2012-09-20 17:05               ` Oliver Schinagl
2012-09-20 17:49                 ` Chris Murphy
