* Yet another corrupt raid5
From: Philipp Wendler
To: linux-raid
Date: 2012-05-05 12:42 UTC

Hi,

sorry, but here's yet another guy asking for some help with fixing his
RAID5. I have read the other threads, but please help me make sure that
I am doing the right things.

I have a RAID5 with 3 devices and a write-intent bitmap, created with
Ubuntu 11.10 (kernel 3.0, mdadm 3.1), and I upgraded to Ubuntu 12.04
(kernel 3.2, mdadm 3.2.3). No hardware failure happened.

Since the first boot with the new system, all 3 devices are marked as
spares and --assemble refuses to run the raid because of this:

# mdadm --assemble -vv /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
mdadm: added /dev/sdc1 to /dev/md0 as -1
mdadm: added /dev/sdd1 to /dev/md0 as -1
mdadm: added /dev/sdb1 to /dev/md0 as -1
mdadm: /dev/md0 assembled from 0 drives and 3 spares - not enough to
start the array.

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sdc1[0](S) sdb1[1](S) sdd1[3](S)
      5860537344 blocks super 1.2

# mdadm --examine /dev/sdb1
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : c37dda6d:b10ef0c4:c304569f:1db0fd44
           Name : server:0  (local to host server)
  Creation Time : Thu Jun 30 12:15:27 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 4635f495:15c062a3:33a2fe5c:2c4e0d6d

    Update Time : Sat May  5 13:06:49 2012
       Checksum : d8fe5afe - correct
         Events : 1

    Device Role : spare
    Array State :  ('A' == active, '.' == missing)

I did not write to the disks and did not execute any commands other
than --assemble, so from the other threads I guess that I can recreate
my raid with the data?

My questions:

Do I need to upgrade mdadm, for example to avoid the bitmap problem?

How can I back up the superblocks beforehand?
(I'm not sure where they are on disk.)

Is the following command right:
mdadm -C -e 1.2 -5 -n 3 --assume-clean \
    -b /boot/md0_write_intent_map \
    /dev/sdb1 /dev/sdc1 /dev/sdd1

Do I need to specify the chunk size?
If so, how can I find it out? I think I might have used a custom chunk
size back then. -X on my bitmap says the chunksize is 2MB; is this the
right chunk size?

Is it a problem that there is a write-intent map?
-X says there are 1375 dirty chunks. Will mdadm be able to use this
information, or are the dirty chunks just lost?

Is the order of the devices on the --create command line important?
I am not 100% sure about the original order.

Am I correct that, if I have backed up the three superblocks, execute
the command above and do not write to the created array, I am not
risking anything? I could always just reset the superblocks and then I
am exactly in the situation I am in now, so I have multiple tries, for
example if chunk size or order are wrong? Or will mdadm do something
else to my raid in the process?

Should I take any other precautions except stopping my raid before
shutting down?

Thank you very much in advance for your help.

Greetings,
Philipp
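One way to keep a record of the current state before trying anything,
beyond the --examine output shown above, could be the following sketch.
It is not taken from the thread; it assumes, as the --examine output
suggests, that the v1.2 superblock at Super Offset 8 sectors lies
entirely before the Data Offset of 2048 sectors, and the destination
paths are only examples:

    mkdir -p /root/raid-backup    # destination directory is just an example

    # save the human-readable metadata of every member
    for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
        mdadm --examine "$d" > /root/raid-backup/$(basename $d).examine.txt
    done

    # optionally keep a raw copy of everything before the Data Offset
    # (sectors 0-2047, which includes the superblock at sector 8)
    for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
        dd if="$d" of=/root/raid-backup/$(basename $d).pre-data.img bs=512 count=2048
    done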
* Re: Yet another corrupt raid5
From: NeilBrown
To: Philipp Wendler
Cc: linux-raid
Date: 2012-05-06 6:00 UTC

On Sat, 05 May 2012 14:42:25 +0200 Philipp Wendler <ml@philippwendler.de>
wrote:

> Hi,
>
> sorry, but here's yet another guy asking for some help with fixing his
> RAID5. I have read the other threads, but please help me make sure
> that I am doing the right things.
>
> I have a RAID5 with 3 devices and a write-intent bitmap, created with
> Ubuntu 11.10 (kernel 3.0, mdadm 3.1), and I upgraded to Ubuntu 12.04
> (kernel 3.2, mdadm 3.2.3). No hardware failure happened.
>
> Since the first boot with the new system, all 3 devices are marked as
> spares and --assemble refuses to run the raid because of this:
>
> # mdadm --assemble -vv /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot -1.
> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot -1.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
> mdadm: added /dev/sdc1 to /dev/md0 as -1
> mdadm: added /dev/sdd1 to /dev/md0 as -1
> mdadm: added /dev/sdb1 to /dev/md0 as -1
> mdadm: /dev/md0 assembled from 0 drives and 3 spares - not enough to
> start the array.
>
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : inactive sdc1[0](S) sdb1[1](S) sdd1[3](S)
>       5860537344 blocks super 1.2
>
> # mdadm --examine /dev/sdb1
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : c37dda6d:b10ef0c4:c304569f:1db0fd44
>            Name : server:0  (local to host server)
>   Creation Time : Thu Jun 30 12:15:27 2011
>      Raid Level : -unknown-
>    Raid Devices : 0
>
>  Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 4635f495:15c062a3:33a2fe5c:2c4e0d6d
>
>     Update Time : Sat May  5 13:06:49 2012
>        Checksum : d8fe5afe - correct
>          Events : 1
>
>     Device Role : spare
>     Array State :  ('A' == active, '.' == missing)
>
> I did not write to the disks and did not execute any commands other
> than --assemble, so from the other threads I guess that I can recreate
> my raid with the data?

Yes, you should be able to. Patience is important though, don't rush
things.

> My questions:
>
> Do I need to upgrade mdadm, for example to avoid the bitmap problem?

No. The 'bitmap problem' only involves adding an internal bitmap to an
existing array. You aren't doing that here.

> How can I back up the superblocks beforehand?
> (I'm not sure where they are on disk.)

You can't easily. The output of "mdadm --examine" is probably the best
backup for now.

> Is the following command right:
> mdadm -C -e 1.2 -5 -n 3 --assume-clean \
>     -b /boot/md0_write_intent_map \
>     /dev/sdb1 /dev/sdc1 /dev/sdd1

If you had an external write-intent bitmap and 3 drives in a RAID5
which were, in order, sdb1, sdc1, sdd1, then it is close.
You want "-l 5" rather than "-5".
You also want "/dev/md0" after the "-C".

> Do I need to specify the chunk size?

It is best to, else it will use the default, which might not be correct.

> If so, how can I find it out?

You cannot directly. If you don't know it, you might need to try
different chunk sizes until you get an array that presents your data
correctly.

I would try the chunk size that you think is probably correct, then
"fsck -n" the filesystem (assuming you are using extX). If that works,
mount read-only and have a look at some files.
If it doesn't work, stop the array and try a different chunk size.

> I think I might have used a custom chunk size back then.
> -X on my bitmap says the chunksize is 2MB; is this the right chunk
> size?

No. The bitmap chunk size (it should be called a 'region size', I now
think) is quite different from the RAID5 chunk size.

However, the bitmap records the total size of the array. The chunk size
must divide that evenly. As you have 2 data disks, 2*chunksize must
divide the total size evenly. That puts an upper bound on the chunk
size.

The "mdadm -E" output reports 3907024896 sectors, which is 1953512448K.
That is 2^10 K * 3 * 635909.
So the chunk size is at most 2^9 K = 512K, which is currently the
default. It might be less.

> Is it a problem that there is a write-intent map?

Not particularly.

> -X says there are 1375 dirty chunks.
> Will mdadm be able to use this information, or are the dirty chunks
> just lost?

No, mdadm cannot use this information, but that is unlikely to be a
problem. "Dirty" doesn't mean that the parity is inconsistent with the
data; it means that the parity might be inconsistent with the data. In
most cases it isn't. And as your array is not degraded, it doesn't
matter anyway.

Once you have your array back together again, you should
  echo repair > /sys/block/md0/md/sync_action
to check all the parity blocks and repair any that are found to be
wrong.

> Is the order of the devices on the --create command line important?
> I am not 100% sure about the original order.

Yes, it is very important.
Every time md starts the array it prints a "RAID conf printout" which
lists the devices in order. If you can find a recent one of those in
the kernel logs, it will confirm the correct order. Unfortunately it
doesn't list the chunk size.

> Am I correct that, if I have backed up the three superblocks, execute
> the command above and do not write to the created array, I am not
> risking anything?

Correct.

> I could always just reset the superblocks and then I am exactly in the
> situation I am in now, so I have multiple tries, for example if chunk
> size or order are wrong?

Correct.

> Or will mdadm do something else to my raid in the process?

It should all be fine. It is important that the metadata version is the
same (1.2), otherwise you could corrupt data. You should also check
that the "data offset" of the newly created array is the same as before
(2048 sectors).

> Should I take any other precautions except stopping my raid before
> shutting down?

None that I can think of.

> Thank you very much in advance for your help.

Good luck, and please accept my apologies for the bug that resulted in
this unfortunate situation.

NeilBrown
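Putting Neil's corrections together, a recreate attempt could look like
the sketch below. It is not a command quoted from the thread: the 512K
chunk size is only the upper bound derived above, the device order is
still the unverified guess from the original command (the follow-up
message confirms the real order from the kernel log), and the external
bitmap option is kept exactly as Philipp proposed it.

    mdadm -C /dev/md0 -e 1.2 -l 5 -n 3 -c 512 --assume-clean \
        -b /boot/md0_write_intent_map \
        /dev/sdb1 /dev/sdc1 /dev/sdd1

    # verify the data offset is unchanged (should still report 2048 sectors)
    mdadm --examine /dev/sdb1 | grep -i 'data offset'

    # check the filesystem without touching it; if it looks wrong,
    # stop the array and retry with another chunk size or device order
    fsck -n /dev/md0          # or whichever device actually holds the filesystem
    mdadm --stop /dev/md0     # only if the result was wrong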
* Re: Yet another corrupt raid5
From: Philipp Wendler
To: linux-raid
Date: 2012-05-06 9:21 UTC

Hi Neil,

On 06.05.2012 08:00, NeilBrown wrote:
> On Sat, 05 May 2012 14:42:25 +0200 Philipp Wendler
> <ml@philippwendler.de> wrote:

>> I did not write to the disks and did not execute any commands other
>> than --assemble, so from the other threads I guess that I can
>> recreate my raid with the data?
>
> Yes, you should be able to. Patience is important though, don't rush
> things.

Yes, that's why I didn't try anything myself and came to this list to
ask.

>> Is the following command right:
>> mdadm -C -e 1.2 -5 -n 3 --assume-clean \
>>     -b /boot/md0_write_intent_map \
>>     /dev/sdb1 /dev/sdc1 /dev/sdd1
>
> If you had an external write-intent bitmap and 3 drives in a RAID5
> which were, in order, sdb1, sdc1, sdd1, then it is close.
> You want "-l 5" rather than "-5".
> You also want "/dev/md0" after the "-C".

Right, I just forgot that.

>> Do I need to specify the chunk size?
>> If so, how can I find it out?
>
> You cannot directly. If you don't know it, you might need to try
> different chunk sizes until you get an array that presents your data
> correctly.
> I would try the chunk size that you think is probably correct, then
> "fsck -n" the filesystem (assuming you are using extX). If that works,
> mount read-only and have a look at some files.
> If it doesn't work, stop the array and try a different chunk size.
>
>> I think I might have used a custom chunk size back then.
>> -X on my bitmap says the chunksize is 2MB; is this the right chunk
>> size?
>
> No. The bitmap chunk size (it should be called a 'region size', I now
> think) is quite different from the RAID5 chunk size.
>
> However, the bitmap records the total size of the array. The chunk
> size must divide that evenly. As you have 2 data disks, 2*chunksize
> must divide the total size evenly. That puts an upper bound on the
> chunk size.
>
> The "mdadm -E" output reports 3907024896 sectors, which is 1953512448K.
> That is 2^10 K * 3 * 635909.
> So the chunk size is at most 2^9 K = 512K, which is currently the
> default. It might be less.

Ah, if the maximum size is equal to the default, then I am sure I used
that. I just was not sure whether I had made it bigger.

>> -X says there are 1375 dirty chunks.
>> Will mdadm be able to use this information, or are the dirty chunks
>> just lost?
>
> No, mdadm cannot use this information, but that is unlikely to be a
> problem. "Dirty" doesn't mean that the parity is inconsistent with the
> data; it means that the parity might be inconsistent with the data. In
> most cases it isn't. And as your array is not degraded, it doesn't
> matter anyway.
>
> Once you have your array back together again, you should
>   echo repair > /sys/block/md0/md/sync_action
> to check all the parity blocks and repair any that are found to be
> wrong.

OK, I already thought that might be a good idea.

>> Is the order of the devices on the --create command line important?
>> I am not 100% sure about the original order.
>
> Yes, it is very important.
> Every time md starts the array it prints a "RAID conf printout" which
> lists the devices in order. If you can find a recent one of those in
> the kernel logs, it will confirm the correct order. Unfortunately it
> doesn't list the chunk size.

Good idea, I found it in the log. The order was actually sdc1 sdb1 sdd1.

So I did it, and it worked out fine on the first try.
LUKS could successfully decrypt it, fsck did not complain, mounting
worked, and the data is also fine. So hurray and big thanks ;-)
Now I am running the resync.

>> Thank you very much in advance for your help.
>
> Good luck, and please accept my apologies for the bug that resulted in
> this unfortunate situation.

Hey, you don't need to apologize. I am a software developer as well,
and I know that such things can happen. And it didn't destroy my data,
so everything is fine.

On the contrary, I want to thank all the developers here for all this
work that I can use for free (in both senses), and now that there was a
problem, I could ask on this list and get such an extensive and helpful
answer, even though I am this "yet another guy" who asks something for
the x-th time.

Greetings,
Philipp
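The parity check that Neil recommended once the array is back together
can be started and watched through the standard md sysfs interface;
again a sketch rather than commands quoted from the thread:

    echo repair > /sys/block/md0/md/sync_action   # recompute and rewrite inconsistent parity
    cat /proc/mdstat                              # shows the progress of the repair pass
    cat /sys/block/md0/md/mismatch_cnt            # inconsistencies found during the pass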
* Re: Yet another corrupt raid5
From: NeilBrown
To: Philipp Wendler
Cc: linux-raid
Date: 2012-05-08 20:40 UTC

On Sun, 06 May 2012 11:21:15 +0200 Philipp Wendler <ml@philippwendler.de>
wrote:

> So I did it, and it worked out fine on the first try.
> LUKS could successfully decrypt it, fsck did not complain, mounting
> worked, and the data is also fine. So hurray and big thanks ;-)
> Now I am running the resync.

Good news! Thanks for letting us know.

>>> Thank you very much in advance for your help.
>>
>> Good luck, and please accept my apologies for the bug that resulted
>> in this unfortunate situation.
>
> Hey, you don't need to apologize. I am a software developer as well,
> and I know that such things can happen. And it didn't destroy my data,
> so everything is fine.
>
> On the contrary, I want to thank all the developers here for all this
> work that I can use for free (in both senses), and now that there was
> a problem, I could ask on this list and get such an extensive and
> helpful answer, even though I am this "yet another guy" who asks
> something for the x-th time.

I find "Do unto others what you would have others do to you" to be a
very good idea. And the more I can encourage people to test my
development code and report bugs, the more bugs I can fix before they
get to paying customers - and that is certainly a good thing :-)

Thanks,
NeilBrown