* Yet another corrupt raid5
From: Philipp Wendler
To: linux-raid
Date: 2012-05-05 12:42 UTC

Hi,

sorry, but here's yet another guy asking for some help with fixing his
RAID5. I have read the other threads, but please help me make sure that
I am doing the right things.

I have a RAID5 with 3 devices and a write-intent bitmap, created with
Ubuntu 11.10 (kernel 3.0, mdadm 3.1), and I upgraded to Ubuntu 12.04
(kernel 3.2, mdadm 3.2.3). No hardware failure happened.

Since the first boot with the new system, all 3 devices are marked as
spares and --assemble refuses to run the raid because of this:

# mdadm --assemble -vv /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
mdadm: added /dev/sdc1 to /dev/md0 as -1
mdadm: added /dev/sdd1 to /dev/md0 as -1
mdadm: added /dev/sdb1 to /dev/md0 as -1
mdadm: /dev/md0 assembled from 0 drives and 3 spares - not enough to
start the array.

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sdc1[0](S) sdb1[1](S) sdd1[3](S)
      5860537344 blocks super 1.2

# mdadm --examine /dev/sdb1
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : c37dda6d:b10ef0c4:c304569f:1db0fd44
           Name : server:0  (local to host server)
  Creation Time : Thu Jun 30 12:15:27 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 4635f495:15c062a3:33a2fe5c:2c4e0d6d

    Update Time : Sat May  5 13:06:49 2012
       Checksum : d8fe5afe - correct
         Events : 1

    Device Role : spare
    Array State :  ('A' == active, '.' == missing)

I did not write to the disks and did not execute any commands other
than --assemble, so from the other threads I guess that I can recreate
my raid with the data?

My questions:

Do I need to upgrade mdadm, for example to avoid the bitmap problem?

How can I back up the superblocks beforehand?
(I'm not sure where they are on disk.)

Is the following command right:
mdadm -C -e 1.2 -5 -n 3 --assume-clean \
    -b /boot/md0_write_intent_map \
    /dev/sdb1 /dev/sdc1 /dev/sdd1

Do I need to specify the chunk size?
If so, how can I find it out? I think I might have used a custom chunk
size back then. -X on my bitmap says the chunksize is 2MB; is this the
right chunk size?

Is it a problem that there is a write-intent map?
-X says there are 1375 dirty chunks. Will mdadm be able to use this
information, or are the dirty chunks just lost?

Is the order of the devices on the --create command line important?
I am not 100% sure about the original order.

Am I correct that, if I have backed up the three superblocks, execute
the command above and do not write to the created array, I am not
risking anything? I could always just reset the superblocks and then I
am exactly in the situation I am in now, so I have multiple tries, for
example if chunk size or order are wrong? Or will mdadm do something
else to my raid in the process?

Should I take any other precautions except stopping my raid before
shutting down?

Thank you very much in advance for your help.

Greetings,
Philipp
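One way to keep a record of the current state before trying anything,
beyond the --examine output shown above, could be the following sketch.
It is not taken from the thread; it assumes, as the --examine output
suggests, that the v1.2 superblock at Super Offset 8 sectors lies
entirely before the Data Offset of 2048 sectors, and the destination
paths are only examples:

    mkdir -p /root/raid-backup    # destination directory is just an example

    # save the human-readable metadata of every member
    for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
        mdadm --examine "$d" > /root/raid-backup/$(basename $d).examine.txt
    done

    # optionally keep a raw copy of everything before the Data Offset
    # (sectors 0-2047, which includes the superblock at sector 8)
    for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
        dd if="$d" of=/root/raid-backup/$(basename $d).pre-data.img bs=512 count=2048
    done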
* Re: Yet another corrupt raid5
From: NeilBrown
To: Philipp Wendler
Cc: linux-raid
Date: 2012-05-06 6:00 UTC

On Sat, 05 May 2012 14:42:25 +0200 Philipp Wendler <ml@philippwendler.de>
wrote:

> Hi,
>
> sorry, but here's yet another guy asking for some help with fixing his
> RAID5. I have read the other threads, but please help me make sure
> that I am doing the right things.
>
> I have a RAID5 with 3 devices and a write-intent bitmap, created with
> Ubuntu 11.10 (kernel 3.0, mdadm 3.1), and I upgraded to Ubuntu 12.04
> (kernel 3.2, mdadm 3.2.3). No hardware failure happened.
>
> Since the first boot with the new system, all 3 devices are marked as
> spares and --assemble refuses to run the raid because of this:
>
> # mdadm --assemble -vv /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot -1.
> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot -1.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
> mdadm: added /dev/sdc1 to /dev/md0 as -1
> mdadm: added /dev/sdd1 to /dev/md0 as -1
> mdadm: added /dev/sdb1 to /dev/md0 as -1
> mdadm: /dev/md0 assembled from 0 drives and 3 spares - not enough to
> start the array.
>
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : inactive sdc1[0](S) sdb1[1](S) sdd1[3](S)
>       5860537344 blocks super 1.2
>
> # mdadm --examine /dev/sdb1
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : c37dda6d:b10ef0c4:c304569f:1db0fd44
>            Name : server:0  (local to host server)
>   Creation Time : Thu Jun 30 12:15:27 2011
>      Raid Level : -unknown-
>    Raid Devices : 0
>
>  Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 4635f495:15c062a3:33a2fe5c:2c4e0d6d
>
>     Update Time : Sat May  5 13:06:49 2012
>        Checksum : d8fe5afe - correct
>          Events : 1
>
>     Device Role : spare
>     Array State :  ('A' == active, '.' == missing)
>
> I did not write to the disks and did not execute any commands other
> than --assemble, so from the other threads I guess that I can recreate
> my raid with the data?

Yes, you should be able to. Patience is important though, don't rush
things.

> My questions:
>
> Do I need to upgrade mdadm, for example to avoid the bitmap problem?

No. The 'bitmap problem' only involves adding an internal bitmap to an
existing array. You aren't doing that here.

> How can I back up the superblocks beforehand?
> (I'm not sure where they are on disk.)

You can't easily. The output of "mdadm --examine" is probably the best
backup for now.

> Is the following command right:
> mdadm -C -e 1.2 -5 -n 3 --assume-clean \
>     -b /boot/md0_write_intent_map \
>     /dev/sdb1 /dev/sdc1 /dev/sdd1

If you had an external write-intent bitmap and 3 drives in a RAID5
which were, in order, sdb1, sdc1, sdd1, then it is close.
You want "-l 5" rather than "-5".
You also want "/dev/md0" after the "-C".

> Do I need to specify the chunk size?

It is best to, else it will use the default, which might not be correct.

> If so, how can I find it out?

You cannot directly. If you don't know it, you might need to try
different chunk sizes until you get an array that presents your data
correctly.

I would try the chunk size that you think is probably correct, then
"fsck -n" the filesystem (assuming you are using extX). If that works,
mount read-only and have a look at some files.
If it doesn't work, stop the array and try a different chunk size.

> I think I might have used a custom chunk size back then.
> -X on my bitmap says the chunksize is 2MB; is this the right chunk
> size?

No. The bitmap chunk size (it should be called a 'region size', I now
think) is quite different from the RAID5 chunk size.

However, the bitmap records the total size of the array. The chunk size
must divide that evenly. As you have 2 data disks, 2*chunksize must
divide the total size evenly. That puts an upper bound on the chunk
size.

The "mdadm -E" output reports 3907024896 sectors, which is 1953512448K.
That is 2^10 K * 3 * 635909.
So the chunk size is at most 2^9 K = 512K, which is currently the
default. It might be less.

> Is it a problem that there is a write-intent map?

Not particularly.

> -X says there are 1375 dirty chunks.
> Will mdadm be able to use this information, or are the dirty chunks
> just lost?

No, mdadm cannot use this information, but that is unlikely to be a
problem. "Dirty" doesn't mean that the parity is inconsistent with the
data; it means that the parity might be inconsistent with the data. In
most cases it isn't. And as your array is not degraded, it doesn't
matter anyway.

Once you have your array back together again, you should
  echo repair > /sys/block/md0/md/sync_action
to check all the parity blocks and repair any that are found to be
wrong.

> Is the order of the devices on the --create command line important?
> I am not 100% sure about the original order.

Yes, it is very important.
Every time md starts the array it prints a "RAID conf printout" which
lists the devices in order. If you can find a recent one of those in
the kernel logs, it will confirm the correct order. Unfortunately it
doesn't list the chunk size.

> Am I correct that, if I have backed up the three superblocks, execute
> the command above and do not write to the created array, I am not
> risking anything?

Correct.

> I could always just reset the superblocks and then I am exactly in the
> situation I am in now, so I have multiple tries, for example if chunk
> size or order are wrong?

Correct.

> Or will mdadm do something else to my raid in the process?

It should all be fine. It is important that the metadata version is the
same (1.2), otherwise you could corrupt data. You should also check
that the "data offset" of the newly created array is the same as before
(2048 sectors).

> Should I take any other precautions except stopping my raid before
> shutting down?

None that I can think of.

> Thank you very much in advance for your help.

Good luck, and please accept my apologies for the bug that resulted in
this unfortunate situation.

NeilBrown
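Putting Neil's corrections together, a recreate attempt could look like
the sketch below. It is not a command quoted from the thread: the 512K
chunk size is only the upper bound derived above, the device order is
still the unverified guess from the original command (the follow-up
message confirms the real order from the kernel log), and the external
bitmap option is kept exactly as Philipp proposed it.

    mdadm -C /dev/md0 -e 1.2 -l 5 -n 3 -c 512 --assume-clean \
        -b /boot/md0_write_intent_map \
        /dev/sdb1 /dev/sdc1 /dev/sdd1

    # verify the data offset is unchanged (should still report 2048 sectors)
    mdadm --examine /dev/sdb1 | grep -i 'data offset'

    # check the filesystem without touching it; if it looks wrong,
    # stop the array and retry with another chunk size or device order
    fsck -n /dev/md0          # or whichever device actually holds the filesystem
    mdadm --stop /dev/md0     # only if the result was wrong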
* Re: Yet another corrupt raid5
From: Philipp Wendler
To: linux-raid
Date: 2012-05-06 9:21 UTC

Hi Neil,

On 06.05.2012 08:00, NeilBrown wrote:
> On Sat, 05 May 2012 14:42:25 +0200 Philipp Wendler
> <ml@philippwendler.de> wrote:

>> I did not write to the disks and did not execute any commands other
>> than --assemble, so from the other threads I guess that I can
>> recreate my raid with the data?
>
> Yes, you should be able to. Patience is important though, don't rush
> things.

Yes, that's why I didn't try anything myself and came to this list to
ask.

>> Is the following command right:
>> mdadm -C -e 1.2 -5 -n 3 --assume-clean \
>>     -b /boot/md0_write_intent_map \
>>     /dev/sdb1 /dev/sdc1 /dev/sdd1
>
> If you had an external write-intent bitmap and 3 drives in a RAID5
> which were, in order, sdb1, sdc1, sdd1, then it is close.
> You want "-l 5" rather than "-5".
> You also want "/dev/md0" after the "-C".

Right, I just forgot that.

>> Do I need to specify the chunk size?
>> If so, how can I find it out?
>
> You cannot directly. If you don't know it, you might need to try
> different chunk sizes until you get an array that presents your data
> correctly.
> I would try the chunk size that you think is probably correct, then
> "fsck -n" the filesystem (assuming you are using extX). If that works,
> mount read-only and have a look at some files.
> If it doesn't work, stop the array and try a different chunk size.
>
>> I think I might have used a custom chunk size back then.
>> -X on my bitmap says the chunksize is 2MB; is this the right chunk
>> size?
>
> No. The bitmap chunk size (it should be called a 'region size', I now
> think) is quite different from the RAID5 chunk size.
>
> However, the bitmap records the total size of the array. The chunk
> size must divide that evenly. As you have 2 data disks, 2*chunksize
> must divide the total size evenly. That puts an upper bound on the
> chunk size.
>
> The "mdadm -E" output reports 3907024896 sectors, which is 1953512448K.
> That is 2^10 K * 3 * 635909.
> So the chunk size is at most 2^9 K = 512K, which is currently the
> default. It might be less.

Ah, if the maximum size is equal to the default, then I am sure I used
that. I just was not sure whether I had made it bigger.

>> -X says there are 1375 dirty chunks.
>> Will mdadm be able to use this information, or are the dirty chunks
>> just lost?
>
> No, mdadm cannot use this information, but that is unlikely to be a
> problem. "Dirty" doesn't mean that the parity is inconsistent with the
> data; it means that the parity might be inconsistent with the data. In
> most cases it isn't. And as your array is not degraded, it doesn't
> matter anyway.
>
> Once you have your array back together again, you should
>   echo repair > /sys/block/md0/md/sync_action
> to check all the parity blocks and repair any that are found to be
> wrong.

OK, I already thought that might be a good idea.

>> Is the order of the devices on the --create command line important?
>> I am not 100% sure about the original order.
>
> Yes, it is very important.
> Every time md starts the array it prints a "RAID conf printout" which
> lists the devices in order. If you can find a recent one of those in
> the kernel logs, it will confirm the correct order. Unfortunately it
> doesn't list the chunk size.

Good idea, I found it in the log. The order was actually sdc1 sdb1 sdd1.

So I did it, and it worked out fine on the first try.
LUKS could successfully decrypt it, fsck did not complain, mounting
worked, and the data is also fine. So hurray and big thanks ;-)
Now I am running the resync.

>> Thank you very much in advance for your help.
>
> Good luck, and please accept my apologies for the bug that resulted in
> this unfortunate situation.

Hey, you don't need to apologize. I am a software developer as well,
and I know that such things can happen. And it didn't destroy my data,
so everything is fine.

On the contrary, I want to thank all the developers here for all this
work that I can use for free (in both senses), and now that there was a
problem, I could ask on this list and get such an extensive and helpful
answer, even though I am this "yet another guy" who asks something for
the x-th time.

Greetings,
Philipp
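The parity check that Neil recommended once the array is back together
can be started and watched through the standard md sysfs interface;
again a sketch rather than commands quoted from the thread:

    echo repair > /sys/block/md0/md/sync_action   # recompute and rewrite inconsistent parity
    cat /proc/mdstat                              # shows the progress of the repair pass
    cat /sys/block/md0/md/mismatch_cnt            # inconsistencies found during the pass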
* Re: Yet another corrupt raid5
From: NeilBrown
To: Philipp Wendler
Cc: linux-raid
Date: 2012-05-08 20:40 UTC

On Sun, 06 May 2012 11:21:15 +0200 Philipp Wendler <ml@philippwendler.de>
wrote:

> So I did it, and it worked out fine on the first try.
> LUKS could successfully decrypt it, fsck did not complain, mounting
> worked, and the data is also fine. So hurray and big thanks ;-)
> Now I am running the resync.

Good news! Thanks for letting us know.

>>> Thank you very much in advance for your help.
>>
>> Good luck, and please accept my apologies for the bug that resulted
>> in this unfortunate situation.
>
> Hey, you don't need to apologize. I am a software developer as well,
> and I know that such things can happen. And it didn't destroy my data,
> so everything is fine.
>
> On the contrary, I want to thank all the developers here for all this
> work that I can use for free (in both senses), and now that there was
> a problem, I could ask on this list and get such an extensive and
> helpful answer, even though I am this "yet another guy" who asks
> something for the x-th time.

I find "Do unto others what you would have others do to you" to be a
very good idea. And the more I can encourage people to test my
development code and report bugs, the more bugs I can fix before they
get to paying customers - and that is certainly a good thing :-)

Thanks,
NeilBrown