* Recovering a RAID6 after all disks were disconnected @ 2016-12-07 8:59 Giuseppe Bilotta 2016-12-07 14:31 ` John Stoffel 0 siblings, 1 reply; 12+ messages in thread
From: Giuseppe Bilotta @ 2016-12-07 8:59 UTC (permalink / raw) To: linux-raid

Hello all, my situation is the following: I have a small 4-disk JBOD that I use to hold a RAID6 software RAID setup controlled by mdraid (currently Debian version 3.4-4 on Linux kernel 4.7.8-1).

I've had sporadic resets of the JBOD for a variety of reasons (power failures or disk failures; the JBOD has the bad habit of resetting when one disk has an I/O error, which causes all of the disks to go offline temporarily). When this happens, all the disks get kicked from the RAID, as md fails to find them until the reset of the JBOD is complete. When the disks come back online, even if it's just a few seconds later, the RAID remains in the failed configuration with all 4 disks missing, of course. Normally, the way I would proceed in this case is to unmount the filesystem sitting on top of the RAID, stop the RAID, and then try to start it again, which works reasonably well (aside from the obvious filesystem check that is often needed).

The same thing happened again a couple of days ago, but this time I tried re-adding the disks directly when they came back online, using mdadm -a, confident that since they _had_ recently been part of the array, the array would go back to work fine. Except that this is not the case when ALL disks were kicked out of the array! Instead, what happened was that all the disks were marked as 'spare' and the RAID would not assemble anymore.

At this point I stopped everything and made a full copy of the RAID disks (lucky me, I had just bought a new JBOD for an upgrade, and a bunch of new disks, although one of them is apparently defective, so I have only been able to back up 3 of the 4 disks) and I have been toying around with ways to recover the array by playing with the copies I've made (I've set the original disks to read-only at the kernel level just to be sure).

So now my situation is this, and I would like to know if there is something I can try to recover the RAID (I've made a few tests that I will describe momentarily). (I would also like to know whether md could handle this kind of issue, all disks in a RAID going temporarily offline, more gracefully; that is likely needed for a lot of home setups where SATA is used instead of SAS.)

So one thing that I've done is to hack around the superblock in the disks (copies) to put back the device roles as they were (getting the information from the pre-failure dmesg output). (By the way, I've been using Andy's Binary Editor for the superblock editing, so if anyone is interested in a be.ini for mdraid v1 superblocks, including checksum verification, I'd be happy to share.) Specifically, I've left the device number untouched, but I have edited the dev_roles array so that the slots corresponding to the dev_number of each disk map to the appropriate device roles.

I can then assemble the array with only 3 of the 4 disks (because I do not have a copy of the fourth, essentially) and force-run it. However, when I do this, I get two things: (1) a complaint about the bitmap being out of date (number of events too low by 3) and (2) I/O errors on logical block 0 (and the RAID data thus completely inaccessible). I'm now wondering what I should try next. Prevent a resync by matching the event count with that of the bitmap (or conversely)?
Try a different permutation of the roles (I have triple-checked, but who knows)? Try a different subset of disks? Try to recreate the array?

Thanks in advance for any suggestions you may have,
-- Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 12+ messages in thread
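To make the superblock poking described above concrete: a minimal, read-only inspection sketch. It assumes the v1.2 superblock sits 4096 bytes into each member device (matching the "Super Offset : 8 sectors" reported later in this thread) and that the dev_roles array of little-endian 16-bit entries starts 256 bytes into the superblock, per the mdp_superblock_1 layout in the kernel's md_p.h; the device name is a placeholder and the offsets should be re-checked against your kernel headers before any actual editing.

==8<=======
#!/bin/bash
# Read-only peek at an md v1.2 superblock; nothing here writes to the disk.
DEV=/dev/sdX        # placeholder: one member of the array
SB=4096             # v1.2 metadata: superblock 8 sectors (4096 bytes) into the device

# Fixed 256-byte header: the magic a92b4efc should appear right at the start.
dd if="$DEV" bs=1 skip=$SB count=256 2>/dev/null | xxd

# dev_roles[]: one little-endian 16-bit entry per device slot, at superblock offset 256.
# Stored little-endian, so a spare (0xffff) shows as "ffff" and faulty (0xfffe) as "feff";
# small values are active device roles.
dd if="$DEV" bs=1 skip=$((SB + 256)) count=16 2>/dev/null | xxd -g2
==8<=======

Any real edit also has to recompute sb_csum; as far as I can tell from the kernel's calc_sb_1_csum, that is a 32-bit sum (with the carry folded back in) over the 256-byte header plus the 2*max_dev role bytes, computed with the checksum field itself zeroed, which is presumably exactly what the be.ini checksum verification mentioned above provides.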
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-07 8:59 Recovering a RAID6 after all disks were disconnected Giuseppe Bilotta @ 2016-12-07 14:31 ` John Stoffel 2016-12-07 17:21 ` Giuseppe Bilotta 0 siblings, 1 reply; 12+ messages in thread
From: John Stoffel @ 2016-12-07 14:31 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: linux-raid

Giuseppe> my situation is the following: I have a small 4-disk JBOD that I use Giuseppe> to hold a RAID6 software raid setup controlled by mdraid (currently Giuseppe> Debian version 3.4-4 on Linux kernel 4.7.8-1 Giuseppe> I've had sporadic resets of the JBOD due to a variety of reasons Giuseppe> (power failures or disk failures —the JBOD has the bad habit of Giuseppe> resetting when one disk has an I/O error, which causes all of the Giuseppe> disks to go offline temporarily).

Please toss that JBOD out the window! *grin*

Giuseppe> When this happens, all the disks get kicked from the RAID, Giuseppe> as md fails to find them until the reset of the JBOD is Giuseppe> complete. When the disks come back online, even if it's just Giuseppe> a few seconds later, the RAID remains in the failed Giuseppe> configuration with all 4 disks missing, of course. Giuseppe> Normally, the way I would proceed in this case is to unmount Giuseppe> the filesystem sitting on top of the RAID, stop the RAID, Giuseppe> and then try to start it again, which works reasonably well Giuseppe> (aside from the obvious filesystem check that is often Giuseppe> needed). Giuseppe> The thing happened again a couple of days ago, but this time Giuseppe> I tried re-adding the disks directly when they came back Giuseppe> online, using mdadm -a and confident that since they _had_ Giuseppe> been recently part of the array, the array would actually go Giuseppe> back to work fine —except that this is not the case when ALL Giuseppe> disks were kicked out of the array! Instead, what happened Giuseppe> was that all the disks were marked as 'spare' and the RAID Giuseppe> would not assemble anymore.

Can you please send us the full details of each disk using the command:

mdadm -E /dev/sda1

Where of course 'a' and '1' depend on whether or not you are using whole disks or partitioned disks for your arrays. You might be able to just force the three spare disks (assumed in this case to be sda1, sdb1, sdc1; but you need to be sure first!) to assemble into a full array with:

mdadm -A /dev/md50 /dev/sda1 /dev/sdb1 /dev/sdc1

And if that works, great. If not, post the error message(s) you get back. Basically, provide more details on your setup so we can help you.

John

Giuseppe> At this point I stopped everything and made a full copy of Giuseppe> the RAID disks (lucky me, I had just bought a new JBOD for Giuseppe> an upgrade, and a bunch of new disks, even if one of them is Giuseppe> apparently defective so I have only been able to backup 3 of Giuseppe> the 4 disks) and I have been toying around with ways to Giuseppe> recover the array by playing on the copies I've made (I've Giuseppe> set the original disks to readonly at the kernel level just Giuseppe> to be sure). Giuseppe> So now my situation is this, and I would like to know if there is Giuseppe> something I can try to recover the RAID (I've made a few tests that I Giuseppe> will describe momentarily).
(I would like to know if there is any Giuseppe> possibility for md to handle these kind of issue —all disks in a RAID Giuseppe> going temporarily offline— more gracefully, which is likely needed for Giuseppe> a lot of home setup where SATA is used instead of SAS). Giuseppe> So one thing that I've done is to hack around the superblock in the Giuseppe> disks (copies) to put back the device roles as they were (getting the Giuseppe> information from the pre-failure dmesg output). (By the way, I've been Giuseppe> using Andy's Binary Editor for the superblock editing, so if anyone is Giuseppe> interested in a be.ini for mdraid v1 superblocks, including checksum Giuseppe> verification, I'd be happy to share). Specifically, I've left the Giuseppe> device number untouched, but I have edited the dev_roles array so that Giuseppe> the slots corresponding to the dev_number from all the disks map to Giuseppe> appropriate device roles. Giuseppe> I can then assemble the array with only 3 of 4 disks (because I do not Giuseppe> have a copy of the fourth, essentially) and force-run it. However, Giuseppe> when I do this, I get two things: Giuseppe> (1) a complaint about the bitmap being out of date (number of events Giuseppe> too low by 3) and Giuseppe> (2) I/O errors on logical block 0 (and the RAID data thus completely Giuseppe> inaccessible) Giuseppe> I'm now wondering about what I should try next. Prevent a resync by Giuseppe> matching the event count with that of the bitmap (or conversely)? Try Giuseppe> a different permutation of the roles? (I have triple-checked but who Giuseppe> knows)? Try a different subset of disks? Try and recreate the array? Giuseppe> Thanks in advance for any suggestion you may have, Giuseppe> -- Giuseppe> Giuseppe "Oblomov" Bilotta Giuseppe> -- Giuseppe> To unsubscribe from this list: send the line "unsubscribe linux-raid" in Giuseppe> the body of a message to majordomo@vger.kernel.org Giuseppe> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
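In practice, John's first-triage suggestion boils down to something like the following sketch (device names are placeholders; whether a plain or a forced assemble is appropriate depends on what mdadm -E actually reports, in particular the event counts):

==8<=======
# Read-only: collect the metadata view of every member first.
for d in /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
    mdadm -E "$d"
done

# Then try a normal assemble; only if the event counts are merely slightly
# out of sync, a forced one (--force rewrites superblocks, so prefer clones
# or overlays if in doubt):
mdadm -A /dev/md50 /dev/sdc /dev/sdd /dev/sde /dev/sdf
mdadm -A --force --verbose /dev/md50 /dev/sdc /dev/sdd /dev/sde /dev/sdf
==8<=======

In the all-spares state described earlier, though, even --force cannot help, since the superblocks no longer record which slot each disk occupied; that is why the rest of the thread goes the overlay-and-recreate route.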
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-07 14:31 ` John Stoffel @ 2016-12-07 17:21 ` Giuseppe Bilotta 2016-12-08 19:02 ` John Stoffel 0 siblings, 1 reply; 12+ messages in thread From: Giuseppe Bilotta @ 2016-12-07 17:21 UTC (permalink / raw) To: John Stoffel; +Cc: linux-raid Hello John, and thanks for your time Giuseppe> I've had sporadic resets of the JBOD due to a variety of reasons Giuseppe> (power failures or disk failures —the JBOD has the bad habit of Giuseppe> resetting when one disk has an I/O error, which causes all of the Giuseppe> disks to go offline temporarily). John> Please toss that JBOD out the window! *grin* Well, that's exactly why I bought the new one which is the one I'm currently using to host the backup disks I'm experimenting on! 8-) However I suspect this is a misfeature common to many if not all 'home' JBODS which are all SATA based and only provide eSATA and/or USB3 connection to the machine. Giuseppe> The thing happened again a couple of days ago, but this time Giuseppe> I tried re-adding the disks directly when they came back Giuseppe> online, using mdadm -a and confident that since they _had_ Giuseppe> been recently part of the array, the array would actually go Giuseppe> back to work fine —except that this is not the case when ALL Giuseppe> disks were kicked out of the array! Instead, what happened Giuseppe> was that all the disks were marked as 'spare' and the RAID Giuseppe> would not assemble anymore. John> Can you please send us the full details of each disk using the John> command: John> John> mdadm -E /dev/sda1 John> Here it is. Notice that this is the result of -E _after_ the attempted re-add while the RAID was running, which marked all the disks as spares: ==8<======= /dev/sdc: Magic : a92b4efc Version : 1.2 Feature Map : 0x9 Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Name : labrador:oneforall (local to host labrador) Creation Time : Fri Nov 30 19:57:45 2012 Raid Level : raid6 Raid Devices : 4 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262048 sectors, after=944 sectors State : clean Device UUID : 543f75ac:a1f3cf99:1c6b71d9:52e358b9 Internal Bitmap : 8 sectors from superblock Update Time : Sun Dec 4 17:11:19 2016 Bad Block Log : 512 entries available at offset 80 sectors - bad blocks present. Checksum : 1e2f00fc - correct Events : 31196 Layout : left-symmetric Chunk Size : 512K Device Role : spare Array State : .... ('A' == active, '.' == missing, 'R' == replacing) /dev/sdd: Magic : a92b4efc Version : 1.2 Feature Map : 0x9 Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Name : labrador:oneforall (local to host labrador) Creation Time : Fri Nov 30 19:57:45 2012 Raid Level : raid6 Raid Devices : 4 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262048 sectors, after=944 sectors State : clean Device UUID : 649d53ad:f909b7a9:cd0f57f2:08a55e3b Internal Bitmap : 8 sectors from superblock Update Time : Sun Dec 4 17:11:19 2016 Bad Block Log : 512 entries available at offset 80 sectors - bad blocks present. Checksum : c9dfe033 - correct Events : 31196 Layout : left-symmetric Chunk Size : 512K Device Role : spare Array State : .... ('A' == active, '.' 
== missing, 'R' == replacing) /dev/sde: Magic : a92b4efc Version : 1.2 Feature Map : 0x9 Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Name : labrador:oneforall (local to host labrador) Creation Time : Fri Nov 30 19:57:45 2012 Raid Level : raid6 Raid Devices : 4 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262048 sectors, after=944 sectors State : clean Device UUID : dd3f90ab:619684c0:942a7d88:f116f2db Internal Bitmap : 8 sectors from superblock Update Time : Sun Dec 4 17:11:19 2016 Bad Block Log : 512 entries available at offset 80 sectors - bad blocks present. Checksum : 15a3975a - correct Events : 31196 Layout : left-symmetric Chunk Size : 512K Device Role : spare Array State : .... ('A' == active, '.' == missing, 'R' == replacing) /dev/sdf: Magic : a92b4efc Version : 1.2 Feature Map : 0x9 Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Name : labrador:oneforall (local to host labrador) Creation Time : Fri Nov 30 19:57:45 2012 Raid Level : raid6 Raid Devices : 4 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262048 sectors, after=944 sectors State : clean Device UUID : f7359c4e:c1f04b22:ce7aa32f:ed5bb054 Internal Bitmap : 8 sectors from superblock Update Time : Sun Dec 4 17:11:19 2016 Bad Block Log : 512 entries available at offset 80 sectors - bad blocks present. Checksum : 3a5b94a7 - correct Events : 31196 Layout : left-symmetric Chunk Size : 512K Device Role : spare Array State : .... ('A' == active, '.' == missing, 'R' == replacing) ==8<======= I do however know the _original_ positions of the respective disks from the kernel messages At assembly time: [ +0.000638] RAID conf printout: [ +0.000001] --- level:6 rd:4 wd:4 [ +0.000001] disk 0, o:1, dev:sdf [ +0.000001] disk 1, o:1, dev:sde [ +0.000000] disk 2, o:1, dev:sdd [ +0.000001] disk 3, o:1, dev:sdc After the JBOD disappeared and right before they all get kicked out: [ +0.000438] RAID conf printout: [ +0.000001] --- level:6 rd:4 wd:0 [ +0.000001] disk 0, o:0, dev:sdf [ +0.000001] disk 1, o:0, dev:sde [ +0.000000] disk 2, o:0, dev:sdd [ +0.000001] disk 3, o:0, dev:sdc John> You might be able to just for the three spare disks (assumed in this John> case to be sda1, sdb1, sdc1; but you need to be sure first!) to John> assemble into a full array with: John> John> mdadm -A /dev/md50 /dev/sda1 /dev/sdb1 /dev/sdc1 John> John> And if that works, great. If not, post the error message(s) you get John> back. Note that the RAID has no active disks anymore, since when I tried re-adding the formerly active disks that where kicked from the array they got marked as spares, and mdraid simply refuses to start a RAID6 setup with only spares. The message I get is indeed mdadm: /dev/md126 assembled from 0 drives and 3 spares - not enough to start the array. This is the point at which I made a copy of 3 of the 4 disks and started playing around. 
Specifically, I dd'ed sdc into sdh, sdd into sdi and sde into sdj and started playing around with sd[hij] rather than the original disks, as I mentioned: Giuseppe> So one thing that I've done is to hack around the superblock in the Giuseppe> disks (copies) to put back the device roles as they were (getting the Giuseppe> information from the pre-failure dmesg output). (By the way, I've been Giuseppe> using Andy's Binary Editor for the superblock editing, so if anyone is Giuseppe> interested in a be.ini for mdraid v1 superblocks, including checksum Giuseppe> verification, I'd be happy to share). Specifically, I've left the Giuseppe> device number untouched, but I have edited the dev_roles array so that Giuseppe> the slots corresponding to the dev_number from all the disks map to Giuseppe> appropriate device roles. Specifically, I hand-edited the superblocks to achieve this: ==8<=============== /dev/sdh: Magic : a92b4efc Version : 1.2 Feature Map : 0x9 Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Name : labrador:oneforall (local to host labrador) Creation Time : Fri Nov 30 19:57:45 2012 Raid Level : raid6 Raid Devices : 4 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262048 sectors, after=944 sectors State : clean Device UUID : 543f75ac:a1f3cf99:1c6b71d9:52e358b9 Internal Bitmap : 8 sectors from superblock Update Time : Sun Dec 4 17:11:19 2016 Bad Block Log : 512 entries available at offset 80 sectors - bad blocks present. Checksum : 1e3300fe - correct Events : 31196 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 3 Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sdi: Magic : a92b4efc Version : 1.2 Feature Map : 0x9 Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Name : labrador:oneforall (local to host labrador) Creation Time : Fri Nov 30 19:57:45 2012 Raid Level : raid6 Raid Devices : 4 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262048 sectors, after=944 sectors State : clean Device UUID : 649d53ad:f909b7a9:cd0f57f2:08a55e3b Internal Bitmap : 8 sectors from superblock Update Time : Sun Dec 4 17:11:19 2016 Bad Block Log : 512 entries available at offset 80 sectors - bad blocks present. Checksum : c9e3e035 - correct Events : 31196 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 2 Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sdj: Magic : a92b4efc Version : 1.2 Feature Map : 0x9 Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Name : labrador:oneforall (local to host labrador) Creation Time : Fri Nov 30 19:57:45 2012 Raid Level : raid6 Raid Devices : 4 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262048 sectors, after=944 sectors State : clean Device UUID : dd3f90ab:619684c0:942a7d88:f116f2db Internal Bitmap : 8 sectors from superblock Update Time : Sun Dec 4 17:11:19 2016 Bad Block Log : 512 entries available at offset 80 sectors - bad blocks present. 
Checksum : 15a7975c - correct Events : 31196 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 1 Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing) ==8<=============== And I _can_ assemble the array, but what I get is this: [ +0.003574] md: bind<sdi> [ +0.001823] md: bind<sdh> [ +0.000978] md: bind<sdj> [ +0.003971] md/raid:md127: device sdj operational as raid disk 1 [ +0.000125] md/raid:md127: device sdh operational as raid disk 3 [ +0.000105] md/raid:md127: device sdi operational as raid disk 2 [ +0.015017] md/raid:md127: allocated 4374kB [ +0.000139] md/raid:md127: raid level 6 active with 3 out of 4 devices, algorithm 2 [ +0.000063] RAID conf printout: [ +0.000002] --- level:6 rd:4 wd:3 [ +0.000003] disk 1, o:1, dev:sdj [ +0.000002] disk 2, o:1, dev:sdi [ +0.000001] disk 3, o:1, dev:sdh [ +0.004187] md127: bitmap file is out of date (31193 < 31196) -- forcing full recovery [ +0.000065] created bitmap (22 pages) for device md127 [ +0.000072] md127: bitmap file is out of date, doing full recovery [ +0.100300] md127: bitmap initialized from disk: read 2 pages, set 44711 of 44711 bits [ +0.039741] md127: detected capacity change from 0 to 6000916561920 [ +0.000085] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000064] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000019] ldm_validate_partition_table(): Disk read failed. [ +0.000021] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000026] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000021] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000019] Dev md127: unable to read RDB block 0 [ +0.000016] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read [ +0.000030] md127: unable to read partition table and any attempt to access md127 content gives an I/O error. -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 12+ messages in thread
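As an aside on the "bitmap file is out of date (31193 < 31196)" message above: the event count stored in the internal write-intent bitmap can be read directly with mdadm's bitmap-examine mode, which gives a read-only way to compare it against the superblock event count before letting a full recovery happen. A sketch, using the clone names from this message:

==8<=======
for d in /dev/sdh /dev/sdi /dev/sdj; do
    echo "== $d"
    mdadm -E "$d" | grep -E 'Update Time|Events'
    mdadm -X "$d" | grep -E 'Events'    # -X / --examine-bitmap reads the bitmap superblock
done
==8<=======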
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-07 17:21 ` Giuseppe Bilotta @ 2016-12-08 19:02 ` John Stoffel 2016-12-22 23:11 ` Giuseppe Bilotta 0 siblings, 1 reply; 12+ messages in thread From: John Stoffel @ 2016-12-08 19:02 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: John Stoffel, linux-raid Sorry for not getting back to you sooner, I've been under the weather lately. And I'm NOT an expert on this, but it's good you've made copies of the disks. Giuseppe> Hello John, and thanks for your time Giuseppe> I've had sporadic resets of the JBOD due to a variety of reasons Giuseppe> (power failures or disk failures —the JBOD has the bad habit of Giuseppe> resetting when one disk has an I/O error, which causes all of the Giuseppe> disks to go offline temporarily). John> Please toss that JBOD out the window! *grin* Giuseppe> Well, that's exactly why I bought the new one which is the one I'm Giuseppe> currently using to host the backup disks I'm experimenting on! 8-) Giuseppe> However I suspect this is a misfeature common to many if not all Giuseppe> 'home' JBODS which are all SATA based and only provide eSATA and/or Giuseppe> USB3 connection to the machine. Giuseppe> The thing happened again a couple of days ago, but this time Giuseppe> I tried re-adding the disks directly when they came back Giuseppe> online, using mdadm -a and confident that since they _had_ Giuseppe> been recently part of the array, the array would actually go Giuseppe> back to work fine —except that this is not the case when ALL Giuseppe> disks were kicked out of the array! Instead, what happened Giuseppe> was that all the disks were marked as 'spare' and the RAID Giuseppe> would not assemble anymore. John> Can you please send us the full details of each disk using the John> command: John> John> mdadm -E /dev/sda1 John> Giuseppe> Here it is. Notice that this is the result of -E _after_ the attempted Giuseppe> re-add while the RAID was running, which marked all the disks as Giuseppe> spares: Yeah, this is probably a bad state. I would suggest you try to just assemble the disks in various orders using your clones: mdadm -A /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf And then mix up the order until you get a working array. You might also want to try assembling using the 'missing' flag for the original disk which dropped out of the array, so that just the three good disks are used. This might take a while to test all the possible permutations. You might also want to look back in the archives of this mailing list. Phil Turmel has some great advice and howto guides for this. You can do the test assembles using loop back devices so that you don't write to the originals, or even to the clones. This should let you do testing more quickly. 
Here's some other pointers for drive timeout issues that you should look at as well: Readings for timeout mismatch issues: (whole threads if possible) http://marc.info/?l=linux-raid&m=139050322510249&w=2 http://marc.info/?l=linux-raid&m=135863964624202&w=2 http://marc.info/?l=linux-raid&m=135811522817345&w=1 http://marc.info/?l=linux-raid&m=133761065622164&w=2 http://marc.info/?l=linux-raid&m=132477199207506 http://marc.info/?l=linux-raid&m=133665797115876&w=2 http://marc.info/?l=linux-raid&m=142487508806844&w=3 http://marc.info/?l=linux-raid&m=144535576302583&w=2 Giuseppe> ==8<======= Giuseppe> /dev/sdc: Giuseppe> Magic : a92b4efc Giuseppe> Version : 1.2 Giuseppe> Feature Map : 0x9 Giuseppe> Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Giuseppe> Name : labrador:oneforall (local to host labrador) Giuseppe> Creation Time : Fri Nov 30 19:57:45 2012 Giuseppe> Raid Level : raid6 Giuseppe> Raid Devices : 4 Giuseppe> Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Giuseppe> Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Giuseppe> Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Giuseppe> Data Offset : 262144 sectors Giuseppe> Super Offset : 8 sectors Giuseppe> Unused Space : before=262048 sectors, after=944 sectors Giuseppe> State : clean Giuseppe> Device UUID : 543f75ac:a1f3cf99:1c6b71d9:52e358b9 Giuseppe> Internal Bitmap : 8 sectors from superblock Giuseppe> Update Time : Sun Dec 4 17:11:19 2016 Giuseppe> Bad Block Log : 512 entries available at offset 80 sectors - bad Giuseppe> blocks present. Giuseppe> Checksum : 1e2f00fc - correct Giuseppe> Events : 31196 Giuseppe> Layout : left-symmetric Giuseppe> Chunk Size : 512K Giuseppe> Device Role : spare Giuseppe> Array State : .... ('A' == active, '.' == missing, 'R' == replacing) Giuseppe> /dev/sdd: Giuseppe> Magic : a92b4efc Giuseppe> Version : 1.2 Giuseppe> Feature Map : 0x9 Giuseppe> Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Giuseppe> Name : labrador:oneforall (local to host labrador) Giuseppe> Creation Time : Fri Nov 30 19:57:45 2012 Giuseppe> Raid Level : raid6 Giuseppe> Raid Devices : 4 Giuseppe> Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Giuseppe> Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Giuseppe> Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Giuseppe> Data Offset : 262144 sectors Giuseppe> Super Offset : 8 sectors Giuseppe> Unused Space : before=262048 sectors, after=944 sectors Giuseppe> State : clean Giuseppe> Device UUID : 649d53ad:f909b7a9:cd0f57f2:08a55e3b Giuseppe> Internal Bitmap : 8 sectors from superblock Giuseppe> Update Time : Sun Dec 4 17:11:19 2016 Giuseppe> Bad Block Log : 512 entries available at offset 80 sectors - bad Giuseppe> blocks present. Giuseppe> Checksum : c9dfe033 - correct Giuseppe> Events : 31196 Giuseppe> Layout : left-symmetric Giuseppe> Chunk Size : 512K Giuseppe> Device Role : spare Giuseppe> Array State : .... ('A' == active, '.' 
== missing, 'R' == replacing) Giuseppe> /dev/sde: Giuseppe> Magic : a92b4efc Giuseppe> Version : 1.2 Giuseppe> Feature Map : 0x9 Giuseppe> Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Giuseppe> Name : labrador:oneforall (local to host labrador) Giuseppe> Creation Time : Fri Nov 30 19:57:45 2012 Giuseppe> Raid Level : raid6 Giuseppe> Raid Devices : 4 Giuseppe> Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Giuseppe> Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Giuseppe> Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Giuseppe> Data Offset : 262144 sectors Giuseppe> Super Offset : 8 sectors Giuseppe> Unused Space : before=262048 sectors, after=944 sectors Giuseppe> State : clean Giuseppe> Device UUID : dd3f90ab:619684c0:942a7d88:f116f2db Giuseppe> Internal Bitmap : 8 sectors from superblock Giuseppe> Update Time : Sun Dec 4 17:11:19 2016 Giuseppe> Bad Block Log : 512 entries available at offset 80 sectors - bad Giuseppe> blocks present. Giuseppe> Checksum : 15a3975a - correct Giuseppe> Events : 31196 Giuseppe> Layout : left-symmetric Giuseppe> Chunk Size : 512K Giuseppe> Device Role : spare Giuseppe> Array State : .... ('A' == active, '.' == missing, 'R' == replacing) Giuseppe> /dev/sdf: Giuseppe> Magic : a92b4efc Giuseppe> Version : 1.2 Giuseppe> Feature Map : 0x9 Giuseppe> Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Giuseppe> Name : labrador:oneforall (local to host labrador) Giuseppe> Creation Time : Fri Nov 30 19:57:45 2012 Giuseppe> Raid Level : raid6 Giuseppe> Raid Devices : 4 Giuseppe> Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Giuseppe> Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Giuseppe> Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Giuseppe> Data Offset : 262144 sectors Giuseppe> Super Offset : 8 sectors Giuseppe> Unused Space : before=262048 sectors, after=944 sectors Giuseppe> State : clean Giuseppe> Device UUID : f7359c4e:c1f04b22:ce7aa32f:ed5bb054 Giuseppe> Internal Bitmap : 8 sectors from superblock Giuseppe> Update Time : Sun Dec 4 17:11:19 2016 Giuseppe> Bad Block Log : 512 entries available at offset 80 sectors - bad Giuseppe> blocks present. Giuseppe> Checksum : 3a5b94a7 - correct Giuseppe> Events : 31196 Giuseppe> Layout : left-symmetric Giuseppe> Chunk Size : 512K Giuseppe> Device Role : spare Giuseppe> Array State : .... ('A' == active, '.' == missing, 'R' == replacing) Giuseppe> ==8<======= Giuseppe> I do however know the _original_ positions of the respective disks Giuseppe> from the kernel messages Giuseppe> At assembly time: Giuseppe> [ +0.000638] RAID conf printout: Giuseppe> [ +0.000001] --- level:6 rd:4 wd:4 Giuseppe> [ +0.000001] disk 0, o:1, dev:sdf Giuseppe> [ +0.000001] disk 1, o:1, dev:sde Giuseppe> [ +0.000000] disk 2, o:1, dev:sdd Giuseppe> [ +0.000001] disk 3, o:1, dev:sdc Giuseppe> After the JBOD disappeared and right before they all get kicked out: Giuseppe> [ +0.000438] RAID conf printout: Giuseppe> [ +0.000001] --- level:6 rd:4 wd:0 Giuseppe> [ +0.000001] disk 0, o:0, dev:sdf Giuseppe> [ +0.000001] disk 1, o:0, dev:sde Giuseppe> [ +0.000000] disk 2, o:0, dev:sdd Giuseppe> [ +0.000001] disk 3, o:0, dev:sdc John> You might be able to just for the three spare disks (assumed in this John> case to be sda1, sdb1, sdc1; but you need to be sure first!) to John> assemble into a full array with: John> John> mdadm -A /dev/md50 /dev/sda1 /dev/sdb1 /dev/sdc1 John> John> And if that works, great. If not, post the error message(s) you get John> back. 
Giuseppe> Note that the RAID has no active disks anymore, since when I tried Giuseppe> re-adding the formerly active disks that Giuseppe> where kicked from the array they got marked as spares, and mdraid Giuseppe> simply refuses to start a RAID6 setup with only spares. The message I Giuseppe> get is indeed Giuseppe> mdadm: /dev/md126 assembled from 0 drives and 3 spares - not enough to Giuseppe> start the array. Giuseppe> This is the point at which I made a copy of 3 of the 4 disks and Giuseppe> started playing around. Specifically, I dd'ed sdc into sdh, sdd into Giuseppe> sdi and sde into sdj and started playing around with sd[hij] rather Giuseppe> than the original disks, as I mentioned: Giuseppe> So one thing that I've done is to hack around the superblock in the Giuseppe> disks (copies) to put back the device roles as they were (getting the Giuseppe> information from the pre-failure dmesg output). (By the way, I've been Giuseppe> using Andy's Binary Editor for the superblock editing, so if anyone is Giuseppe> interested in a be.ini for mdraid v1 superblocks, including checksum Giuseppe> verification, I'd be happy to share). Specifically, I've left the Giuseppe> device number untouched, but I have edited the dev_roles array so that Giuseppe> the slots corresponding to the dev_number from all the disks map to Giuseppe> appropriate device roles. Giuseppe> Specifically, I hand-edited the superblocks to achieve this: Giuseppe> ==8<=============== Giuseppe> /dev/sdh: Giuseppe> Magic : a92b4efc Giuseppe> Version : 1.2 Giuseppe> Feature Map : 0x9 Giuseppe> Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Giuseppe> Name : labrador:oneforall (local to host labrador) Giuseppe> Creation Time : Fri Nov 30 19:57:45 2012 Giuseppe> Raid Level : raid6 Giuseppe> Raid Devices : 4 Giuseppe> Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Giuseppe> Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Giuseppe> Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Giuseppe> Data Offset : 262144 sectors Giuseppe> Super Offset : 8 sectors Giuseppe> Unused Space : before=262048 sectors, after=944 sectors Giuseppe> State : clean Giuseppe> Device UUID : 543f75ac:a1f3cf99:1c6b71d9:52e358b9 Giuseppe> Internal Bitmap : 8 sectors from superblock Giuseppe> Update Time : Sun Dec 4 17:11:19 2016 Giuseppe> Bad Block Log : 512 entries available at offset 80 sectors - bad Giuseppe> blocks present. Giuseppe> Checksum : 1e3300fe - correct Giuseppe> Events : 31196 Giuseppe> Layout : left-symmetric Giuseppe> Chunk Size : 512K Giuseppe> Device Role : Active device 3 Giuseppe> Array State : AAAA ('A' == active, '.' 
== missing, 'R' == replacing) Giuseppe> /dev/sdi: Giuseppe> Magic : a92b4efc Giuseppe> Version : 1.2 Giuseppe> Feature Map : 0x9 Giuseppe> Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Giuseppe> Name : labrador:oneforall (local to host labrador) Giuseppe> Creation Time : Fri Nov 30 19:57:45 2012 Giuseppe> Raid Level : raid6 Giuseppe> Raid Devices : 4 Giuseppe> Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Giuseppe> Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Giuseppe> Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Giuseppe> Data Offset : 262144 sectors Giuseppe> Super Offset : 8 sectors Giuseppe> Unused Space : before=262048 sectors, after=944 sectors Giuseppe> State : clean Giuseppe> Device UUID : 649d53ad:f909b7a9:cd0f57f2:08a55e3b Giuseppe> Internal Bitmap : 8 sectors from superblock Giuseppe> Update Time : Sun Dec 4 17:11:19 2016 Giuseppe> Bad Block Log : 512 entries available at offset 80 sectors - bad Giuseppe> blocks present. Giuseppe> Checksum : c9e3e035 - correct Giuseppe> Events : 31196 Giuseppe> Layout : left-symmetric Giuseppe> Chunk Size : 512K Giuseppe> Device Role : Active device 2 Giuseppe> Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing) Giuseppe> /dev/sdj: Giuseppe> Magic : a92b4efc Giuseppe> Version : 1.2 Giuseppe> Feature Map : 0x9 Giuseppe> Array UUID : 943d287e:af28b455:88a047f2:d714b8c6 Giuseppe> Name : labrador:oneforall (local to host labrador) Giuseppe> Creation Time : Fri Nov 30 19:57:45 2012 Giuseppe> Raid Level : raid6 Giuseppe> Raid Devices : 4 Giuseppe> Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB) Giuseppe> Array Size : 5860270080 (5588.79 GiB 6000.92 GB) Giuseppe> Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB) Giuseppe> Data Offset : 262144 sectors Giuseppe> Super Offset : 8 sectors Giuseppe> Unused Space : before=262048 sectors, after=944 sectors Giuseppe> State : clean Giuseppe> Device UUID : dd3f90ab:619684c0:942a7d88:f116f2db Giuseppe> Internal Bitmap : 8 sectors from superblock Giuseppe> Update Time : Sun Dec 4 17:11:19 2016 Giuseppe> Bad Block Log : 512 entries available at offset 80 sectors - bad Giuseppe> blocks present. Giuseppe> Checksum : 15a7975c - correct Giuseppe> Events : 31196 Giuseppe> Layout : left-symmetric Giuseppe> Chunk Size : 512K Giuseppe> Device Role : Active device 1 Giuseppe> Array State : AAAA ('A' == active, '.' 
== missing, 'R' == replacing) Giuseppe> ==8<=============== Giuseppe> And I _can_ assemble the array, but what I get is this: Giuseppe> [ +0.003574] md: bind<sdi> Giuseppe> [ +0.001823] md: bind<sdh> Giuseppe> [ +0.000978] md: bind<sdj> Giuseppe> [ +0.003971] md/raid:md127: device sdj operational as raid disk 1 Giuseppe> [ +0.000125] md/raid:md127: device sdh operational as raid disk 3 Giuseppe> [ +0.000105] md/raid:md127: device sdi operational as raid disk 2 Giuseppe> [ +0.015017] md/raid:md127: allocated 4374kB Giuseppe> [ +0.000139] md/raid:md127: raid level 6 active with 3 out of 4 Giuseppe> devices, algorithm 2 Giuseppe> [ +0.000063] RAID conf printout: Giuseppe> [ +0.000002] --- level:6 rd:4 wd:3 Giuseppe> [ +0.000003] disk 1, o:1, dev:sdj Giuseppe> [ +0.000002] disk 2, o:1, dev:sdi Giuseppe> [ +0.000001] disk 3, o:1, dev:sdh Giuseppe> [ +0.004187] md127: bitmap file is out of date (31193 < 31196) -- Giuseppe> forcing full recovery Giuseppe> [ +0.000065] created bitmap (22 pages) for device md127 Giuseppe> [ +0.000072] md127: bitmap file is out of date, doing full recovery Giuseppe> [ +0.100300] md127: bitmap initialized from disk: read 2 pages, set Giuseppe> 44711 of 44711 bits Giuseppe> [ +0.039741] md127: detected capacity change from 0 to 6000916561920 Giuseppe> [ +0.000085] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000064] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000019] ldm_validate_partition_table(): Disk read failed. Giuseppe> [ +0.000021] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000026] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000021] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000019] Dev md127: unable to read RDB block 0 Giuseppe> [ +0.000016] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000022] Buffer I/O error on dev md127, logical block 0, async page read Giuseppe> [ +0.000030] md127: unable to read partition table Giuseppe> and any attempt to access md127 content gives an I/O error. Giuseppe> -- Giuseppe> Giuseppe "Oblomov" Bilotta Giuseppe> -- Giuseppe> To unsubscribe from this list: send the line "unsubscribe linux-raid" in Giuseppe> the body of a message to majordomo@vger.kernel.org Giuseppe> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
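The timeout-mismatch threads linked above mostly come down to one well-known problem: desktop drives without (or with disabled) SCT error recovery control can spend minutes retrying an unreadable sector, far longer than the kernel's default 30-second command timeout, so the whole device gets reset and kicked instead of returning a read error that md could fix from redundancy. The usual mitigation looks roughly like this (placeholder device names; whether scterc is supported at all depends on the drive model):

==8<=======
# Preferred: cap each drive's internal error recovery at 7 seconds
# (only drives that support SCT ERC will accept this):
for d in /dev/sd[c-f]; do
    smartctl -l scterc,70,70 "$d"
done

# For drives without SCT ERC support, raise the kernel's command timeout
# well above the drive's worst-case internal retry time instead:
echo 180 > /sys/block/sdc/device/timeout
==8<=======

Either setting has to be reapplied after every boot, for example from a udev rule or an init script.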
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-08 19:02 ` John Stoffel @ 2016-12-22 23:11 ` Giuseppe Bilotta 2016-12-22 23:25 ` NeilBrown 0 siblings, 1 reply; 12+ messages in thread From: Giuseppe Bilotta @ 2016-12-22 23:11 UTC (permalink / raw) To: John Stoffel; +Cc: linux-raid Hello again, On Thu, Dec 8, 2016 at 8:02 PM, John Stoffel <john@stoffel.org> wrote: > > Sorry for not getting back to you sooner, I've been under the weather > lately. And I'm NOT an expert on this, but it's good you've made > copies of the disks. Don't worry about the timing, as you can see I haven't had much time to dedicate to the recovery of this RAID either. As you can see, it was not that urgent ;-) > Giuseppe> Here it is. Notice that this is the result of -E _after_ the attempted > Giuseppe> re-add while the RAID was running, which marked all the disks as > Giuseppe> spares: > > Yeah, this is probably a bad state. I would suggest you try to just > assemble the disks in various orders using your clones: > > mdadm -A /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf > > And then mix up the order until you get a working array. You might > also want to try assembling using the 'missing' flag for the original > disk which dropped out of the array, so that just the three good disks > are used. This might take a while to test all the possible > permutations. > > You might also want to look back in the archives of this mailing > list. Phil Turmel has some great advice and howto guides for this. > You can do the test assembles using loop back devices so that you > don't write to the originals, or even to the clones. I've used the instructions on using overlays with dmsetup + sparse files on the RAID wiki https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID to experiment with the recovery (and just to be sure, I set the original disks read-only using blockdev; might be worth adding this to the wiki). I also wrote a small script to test all combinations (nothing smart, really, simply enumeration of combos, but I'll consider putting it up on the wiki as well), and I was actually surprised by the results. To test if the RAID was being re-created correctly with each combination, I used `file -s` on the RAID, and verified that the results made sense. 
I am surprised to find out that there are multiple combinations that make sense (note that the disk names are shifted by one compared to previous emails, due to a machine lockup that required a reboot and another disk butting in and changing the order):

trying /dev/sdd /dev/sdf /dev/sde /dev/sdg
/dev/md111: Linux rev 1.0 ext4 filesystem data, UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" (needs journal recovery) (extents) (large files) (huge files)

trying /dev/sdd /dev/sdf /dev/sdg /dev/sde
/dev/md111: Linux rev 1.0 ext4 filesystem data, UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" (needs journal recovery) (extents) (large files) (huge files)

trying /dev/sde /dev/sdf /dev/sdd /dev/sdg
/dev/md111: Linux rev 1.0 ext4 filesystem data, UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" (needs journal recovery) (extents) (large files) (huge files)

trying /dev/sde /dev/sdf /dev/sdg /dev/sdd
/dev/md111: Linux rev 1.0 ext4 filesystem data, UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" (needs journal recovery) (extents) (large files) (huge files)

trying /dev/sdg /dev/sdf /dev/sde /dev/sdd
/dev/md111: Linux rev 1.0 ext4 filesystem data, UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" (needs journal recovery) (extents) (large files) (huge files)

trying /dev/sdg /dev/sdf /dev/sdd /dev/sde
/dev/md111: Linux rev 1.0 ext4 filesystem data, UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" (needs journal recovery) (extents) (large files) (huge files)
:

So there are six out of 24 combinations that make sense, at least for the first block. I know from the pre-fail dmesg that the g-f-e-d order should be the correct one, but now I'm left wondering if there is a better way to verify this (other than manually sampling files to see if they make sense), or if the left-symmetric layout on a RAID6 simply allows some of the disk positions to be swapped without loss of data.

-- Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 12+ messages in thread
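For reference, the overlay-plus-enumeration approach described above can be scripted roughly as follows. This is a sketch, not the actual script used here: device names and overlay sizes are placeholders, --data-offset=128M is meant to reproduce the 262144-sector data offset shown by mdadm -E earlier in the thread (check the unit handling in mdadm(8) for your version), and every parameter passed to --create has to match the original array exactly. All writes, including the superblocks written by --create, land in the copy-on-write overlays, never on the underlying devices.

==8<=======
#!/bin/bash
disks=(/dev/sdd /dev/sde /dev/sdf /dev/sdg)   # placeholders: the cloned members

# Keep the kernel from writing to the underlying devices at all.
for d in "${disks[@]}"; do blockdev --setro "$d"; done

# Copy-on-write overlays, as in the RAID wiki recipe: test writes go to sparse files.
overlays=()
for d in "${disks[@]}"; do
    name=overlay-${d##*/}
    truncate -s 4G "/tmp/$name.img"
    loop=$(losetup -f --show "/tmp/$name.img")
    echo "0 $(blockdev --getsz "$d") snapshot $d $loop P 8" | dmsetup create "$name"
    overlays+=("/dev/mapper/$name")
done

# Try every permutation of the four overlays as slots 0..3.
for p1 in "${overlays[@]}"; do
 for p2 in "${overlays[@]}"; do
  for p3 in "${overlays[@]}"; do
   for p4 in "${overlays[@]}"; do
    [ "$(printf '%s\n' "$p1" "$p2" "$p3" "$p4" | sort -u | wc -l)" -eq 4 ] || continue
    echo "trying $p1 $p2 $p3 $p4"
    mdadm --create /dev/md111 --assume-clean --run --level=6 --raid-devices=4 \
          --chunk=512 --layout=left-symmetric --metadata=1.2 \
          --data-offset=128M "$p1" "$p2" "$p3" "$p4" > /dev/null 2>&1
    file -s /dev/md111      # correct order => the ext4 signature shows up
    mdadm --stop /dev/md111 > /dev/null 2>&1
   done
  done
 done
done
==8<=======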
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-22 23:11 ` Giuseppe Bilotta @ 2016-12-22 23:25 ` NeilBrown 2016-12-23 16:17 ` Giuseppe Bilotta 0 siblings, 1 reply; 12+ messages in thread From: NeilBrown @ 2016-12-22 23:25 UTC (permalink / raw) To: Giuseppe Bilotta, John Stoffel; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 4557 bytes --] On Fri, Dec 23 2016, Giuseppe Bilotta wrote: > Hello again, > > On Thu, Dec 8, 2016 at 8:02 PM, John Stoffel <john@stoffel.org> wrote: >> >> Sorry for not getting back to you sooner, I've been under the weather >> lately. And I'm NOT an expert on this, but it's good you've made >> copies of the disks. > > Don't worry about the timing, as you can see I haven't had much time > to dedicate to the recovery of this RAID either. As you can see, it > was not that urgent ;-) > > >> Giuseppe> Here it is. Notice that this is the result of -E _after_ the attempted >> Giuseppe> re-add while the RAID was running, which marked all the disks as >> Giuseppe> spares: >> >> Yeah, this is probably a bad state. I would suggest you try to just >> assemble the disks in various orders using your clones: >> >> mdadm -A /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf >> >> And then mix up the order until you get a working array. You might >> also want to try assembling using the 'missing' flag for the original >> disk which dropped out of the array, so that just the three good disks >> are used. This might take a while to test all the possible >> permutations. >> >> You might also want to look back in the archives of this mailing >> list. Phil Turmel has some great advice and howto guides for this. >> You can do the test assembles using loop back devices so that you >> don't write to the originals, or even to the clones. > > I've used the instructions on using overlays with dmsetup + sparse > files on the RAID wiki > https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID > to experiment with the recovery (and just to be sure, I set the > original disks read-only using blockdev; might be worth adding this to > the wiki). > > I also wrote a small script to test all combinations (nothing smart, > really, simply enumeration of combos, but I'll consider putting it up > on the wiki as well), and I was actually surprised by the results. To > test if the RAID was being re-created correctly with each combination, > I used `file -s` on the RAID, and verified that the results made > sense. 
I am surprised to find out that there are multiple combinations > that make sense (note that the disk names are shifted by one compared > to previous emails due a machine lockup that required a reboot and > another disk butting in to a different order): > > trying /dev/sdd /dev/sdf /dev/sde /dev/sdg > /dev/md111: Linux rev 1.0 ext4 filesystem data, > UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" > (needs journal recovery) (extents) (large files) (huge files) > > trying /dev/sdd /dev/sdf /dev/sdg /dev/sde > /dev/md111: Linux rev 1.0 ext4 filesystem data, > UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" > (needs journal recovery) (extents) (large files) (huge files) > > trying /dev/sde /dev/sdf /dev/sdd /dev/sdg > /dev/md111: Linux rev 1.0 ext4 filesystem data, > UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" > (needs journal recovery) (extents) (large files) (huge files) > > trying /dev/sde /dev/sdf /dev/sdg /dev/sdd > /dev/md111: Linux rev 1.0 ext4 filesystem data, > UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" > (needs journal recovery) (extents) (large files) (huge files) > > trying /dev/sdg /dev/sdf /dev/sde /dev/sdd > /dev/md111: Linux rev 1.0 ext4 filesystem data, > UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" > (needs journal recovery) (extents) (large files) (huge files) > > trying /dev/sdg /dev/sdf /dev/sdd /dev/sde > /dev/md111: Linux rev 1.0 ext4 filesystem data, > UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" > (needs journal recovery) (extents) (large files) (huge files) > : > So there are six out of 24 combinations that make sense, at least for > the first block. I know from the pre-fail dmesg that the g-f-e-d order > should be the correct one, but now I'm left wondering if there is a > better way to verify this (other than manually sampling files to see > if they make sense), or if the left-symmetric layout on a RAID6 simply > allows some of the disk positions to be swapped without loss of data. >

Your script has reported all arrangements with /dev/sdf as the second device. Presumably that is where the single block you are reading resides.

To check if a RAID6 arrangement is credible, you can try the raid6check program that is included in the mdadm source release. There is a man page. If the order of devices is not correct, raid6check will tell you about it.

NeilBrown

[-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
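raid6check ships in the mdadm source tree (it is not built by default; a `make raid6check` in the source directory should produce it) and verifies the P/Q parity of each stripe, so a wrong device order shows up immediately as a stream of parity errors. A usage sketch, with the device name and stripe ranges as placeholders; see its man page for the exact argument semantics:

==8<=======
# Arguments are: md device, first stripe to check, number of stripes to check.
raid6check /dev/md111 0 256          # check the first 256 stripes
raid6check /dev/md111 100000 64      # spot-check a window further into the array
==8<=======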
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-22 23:25 ` NeilBrown @ 2016-12-23 16:17 ` Giuseppe Bilotta 2016-12-23 21:14 ` Giuseppe Bilotta 2016-12-23 22:46 ` NeilBrown 0 siblings, 2 replies; 12+ messages in thread From: Giuseppe Bilotta @ 2016-12-23 16:17 UTC (permalink / raw) To: NeilBrown; +Cc: John Stoffel, linux-raid On Fri, Dec 23, 2016 at 12:25 AM, NeilBrown <neilb@suse.com> wrote: > On Fri, Dec 23 2016, Giuseppe Bilotta wrote: >> I also wrote a small script to test all combinations (nothing smart, >> really, simply enumeration of combos, but I'll consider putting it up >> on the wiki as well), and I was actually surprised by the results. To >> test if the RAID was being re-created correctly with each combination, >> I used `file -s` on the RAID, and verified that the results made >> sense. I am surprised to find out that there are multiple combinations >> that make sense (note that the disk names are shifted by one compared >> to previous emails due a machine lockup that required a reboot and >> another disk butting in to a different order): >> >> trying /dev/sdd /dev/sdf /dev/sde /dev/sdg >> /dev/md111: Linux rev 1.0 ext4 filesystem data, >> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >> (needs journal recovery) (extents) (large files) (huge files) >> >> trying /dev/sdd /dev/sdf /dev/sdg /dev/sde >> /dev/md111: Linux rev 1.0 ext4 filesystem data, >> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >> (needs journal recovery) (extents) (large files) (huge files) >> >> trying /dev/sde /dev/sdf /dev/sdd /dev/sdg >> /dev/md111: Linux rev 1.0 ext4 filesystem data, >> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >> (needs journal recovery) (extents) (large files) (huge files) >> >> trying /dev/sde /dev/sdf /dev/sdg /dev/sdd >> /dev/md111: Linux rev 1.0 ext4 filesystem data, >> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >> (needs journal recovery) (extents) (large files) (huge files) >> >> trying /dev/sdg /dev/sdf /dev/sde /dev/sdd >> /dev/md111: Linux rev 1.0 ext4 filesystem data, >> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >> (needs journal recovery) (extents) (large files) (huge files) >> >> trying /dev/sdg /dev/sdf /dev/sdd /dev/sde >> /dev/md111: Linux rev 1.0 ext4 filesystem data, >> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >> (needs journal recovery) (extents) (large files) (huge files) >> : >> So there are six out of 24 combinations that make sense, at least for >> the first block. I know from the pre-fail dmesg that the g-f-e-d order >> should be the correct one, but now I'm left wondering if there is a >> better way to verify this (other than manually sampling files to see >> if they make sense), or if the left-symmetric layout on a RAID6 simply >> allows some of the disk positions to be swapped without loss of data. > You script has reported all arrangements with /dev/sdf as the second > device. Presumably that is where the single block you are reading > resides. That makes sense. > To check if a RAID6 arrangement is credible, you can try the raid6check > program that is include in the mdadm source release. There is a man > page. > If the order of devices is not correct raid6check will tell you about > it. That's a wonderful small utility, thanks for making it known to me! 
Checking even just a small number of stripes was enough in this case, as the expected combination (g f e d) was the only one that produced no errors.

Now I wonder if it would be possible to combine this approach with something that simply hacked the metadata of each disk to re-establish the correct disk order, making it possible to reassemble this particular array without recreating anything. Are problems such as mine common enough to warrant making this kind of verified reassembly from assumed-clean disks easier?

-- Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-23 16:17 ` Giuseppe Bilotta @ 2016-12-23 21:14 ` Giuseppe Bilotta 2016-12-23 22:50 ` NeilBrown 2016-12-23 22:46 ` NeilBrown 1 sibling, 1 reply; 12+ messages in thread
From: Giuseppe Bilotta @ 2016-12-23 21:14 UTC (permalink / raw) To: NeilBrown; +Cc: John Stoffel, linux-raid

On Fri, Dec 23, 2016 at 5:17 PM, Giuseppe Bilotta <giuseppe.bilotta@gmail.com> wrote: > > Now I wonder if it it would be possible to combine this approach with > something that simply hacked the metadata of each disk to re-establish > the correct disk order to make it possible to reassemble this > particular array without recreating anything. Are problems such as > mine common enough to warrant support for this kind of verified > reassembly from assumed-clean disks easier?.

Actually, now that the correct order is verified, I would like to know why re-creating the array using mdadm -C --assume-clean with the disks in the correct order works (the RAID is then accessible, and I can read data off of it). However, if I simply hand-edit the metadata to assign the correct device order to the disks (I do this by restoring the correct device roles in the dev_roles table, at the entries corresponding to the disks' dev_numbers, in the correct order, and then adjust the checksum accordingly) and then assemble the array, I get I/O errors accessing the array contents, even though raid6check doesn't report issues. In the 'hacked dev role' case, the dmesg reads:

[ +0.002057] md: bind<dm-2>
[ +0.000936] md: bind<dm-1>
[ +0.000932] md: bind<dm-0>
[ +0.000925] md: bind<dm-3>
[ +0.001443] md/raid:md112: device dm-3 operational as raid disk 0
[ +0.000540] md/raid:md112: device dm-0 operational as raid disk 3
[ +0.000710] md/raid:md112: device dm-1 operational as raid disk 2
[ +0.000508] md/raid:md112: device dm-2 operational as raid disk 1
[ +0.009716] md/raid:md112: allocated 4374kB
[ +0.000555] md/raid:md112: raid level 6 active with 4 out of 4 devices, algorithm 2
[ +0.000531] RAID conf printout:
[ +0.000001] --- level:6 rd:4 wd:4
[ +0.000001] disk 0, o:1, dev:dm-3
[ +0.000001] disk 1, o:1, dev:dm-2
[ +0.000000] disk 2, o:1, dev:dm-1
[ +0.000001] disk 3, o:1, dev:dm-0
[ +0.000449] created bitmap (22 pages) for device md112
[ +0.001865] md112: bitmap initialized from disk: read 2 pages, set 5 of 44711 bits
[ +0.533458] md112: detected capacity change from 0 to 6000916561920
[ +0.004194] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.003450] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001953] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001978] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001852] ldm_validate_partition_table(): Disk read failed.
[ +0.001889] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001875] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001834] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001596] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001551] Dev md112: unable to read RDB block 0
[ +0.001293] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001284] Buffer I/O error on dev md112, logical block 0, async page read
[ +0.001307] md112: unable to read partition table

So the array assembles, and raid6check reports no error, but the data is actually inaccessible. Am I missing other aspects of the metadata that need to be restored?
-- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-23 21:14 ` Giuseppe Bilotta @ 2016-12-23 22:50 ` NeilBrown 2016-12-24 14:47 ` Giuseppe Bilotta 0 siblings, 1 reply; 12+ messages in thread From: NeilBrown @ 2016-12-23 22:50 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: John Stoffel, linux-raid [-- Attachment #1: Type: text/plain, Size: 588 bytes --] On Sat, Dec 24 2016, Giuseppe Bilotta wrote: > > > So the array assembles, and raid6check reports no error, but the data > is actually inaccessible .. am I missing other aspects of the metadata > that need to be restored? Presumably, yes. If you provide "mdadm --examine" from devices in both the "working" and the "not working" case, I might be able to point to the difference. Alternately, use "mdadm --dump" to extract the metadata, then "tar --sparse" to combine the (sparse) metadata files into a tar-archive, and send that. Then I would be able to experiment myself. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
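The two ways of capturing the metadata that Neil mentions look roughly like this (a sketch; directory and device names are placeholders, and the exact --dump invocation is worth double-checking against the Misc-mode section of mdadm(8) for your version):

==8<=======
# Text-level comparison of the two cases:
for d in /dev/mapper/overlay-*; do
    mdadm -E "$d" > "working-$(basename "$d").txt"    # repeat with the hand-edited set
done
diff -u working-overlay-sdd.txt hand-edited-overlay-sdd.txt

# Raw (sparse) metadata dump, packed so someone else can experiment with it:
mkdir /tmp/md-meta
mdadm --dump=/tmp/md-meta /dev/sdd /dev/sde /dev/sdf /dev/sdg
tar czSf md-meta.tar.gz -C /tmp md-meta               # -S preserves sparseness
==8<=======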
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-23 22:50 ` NeilBrown @ 2016-12-24 14:47 ` Giuseppe Bilotta 0 siblings, 0 replies; 12+ messages in thread From: Giuseppe Bilotta @ 2016-12-24 14:47 UTC (permalink / raw) To: NeilBrown; +Cc: John Stoffel, linux-raid On Fri, Dec 23, 2016 at 11:50 PM, NeilBrown <neilb@suse.com> wrote: > On Sat, Dec 24 2016, Giuseppe Bilotta wrote: >> >> >> So the array assembles, and raid6check reports no error, but the data >> is actually inaccessible .. am I missing other aspects of the metadata >> that need to be restored? > > Presumably, yes. > > If you provide "mdadm --examine" from devices in both the "working" and > the "not working" case, I might be able to point to the difference. I found the culprit. All disks have bad block lists, and the bad block lists include the initial data sectors (i.e. the sectors pointed at by the data offset in the superblock). This is quite probably a side effect of my stupid idea of trying a re-add of all disks after all of them were kicked out during the JBOD disconnect. One more reason to just stop the array when this situation arises. -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 12+ messages in thread
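For readers who hit the same wall: the per-device bad block log that turned out to be the culprit here can be listed without any hand-editing, and newer mdadm releases also have --update options for dropping it at assembly time. A sketch with placeholder names; whether force-no-bbl is available depends on the mdadm version, so check mdadm(8) before relying on it:

==8<=======
# Read-only: list the sectors recorded in each member's bad block log.
for d in /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
    echo "== $d"
    mdadm --examine-badblocks "$d"
done

# On overlays/clones only: drop the bad block log while assembling.
# 'no-bbl' removes an empty log; 'force-no-bbl' (where supported) removes a
# populated one, which is what a bogus log like the one described above needs.
mdadm -A /dev/md111 --update=force-no-bbl /dev/mapper/overlay-*
==8<=======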
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-23 16:17 ` Giuseppe Bilotta 2016-12-23 21:14 ` Giuseppe Bilotta @ 2016-12-23 22:46 ` NeilBrown 2016-12-24 14:34 ` Giuseppe Bilotta 1 sibling, 1 reply; 12+ messages in thread From: NeilBrown @ 2016-12-23 22:46 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: John Stoffel, linux-raid [-- Attachment #1: Type: text/plain, Size: 4429 bytes --] On Sat, Dec 24 2016, Giuseppe Bilotta wrote: > On Fri, Dec 23, 2016 at 12:25 AM, NeilBrown <neilb@suse.com> wrote: >> On Fri, Dec 23 2016, Giuseppe Bilotta wrote: >>> I also wrote a small script to test all combinations (nothing smart, >>> really, simply enumeration of combos, but I'll consider putting it up >>> on the wiki as well), and I was actually surprised by the results. To >>> test if the RAID was being re-created correctly with each combination, >>> I used `file -s` on the RAID, and verified that the results made >>> sense. I am surprised to find out that there are multiple combinations >>> that make sense (note that the disk names are shifted by one compared >>> to previous emails due to a machine lockup that required a reboot and >>> another disk butting in to a different order): >>> >>> trying /dev/sdd /dev/sdf /dev/sde /dev/sdg >>> /dev/md111: Linux rev 1.0 ext4 filesystem data, >>> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >>> (needs journal recovery) (extents) (large files) (huge files) >>> >>> trying /dev/sdd /dev/sdf /dev/sdg /dev/sde >>> /dev/md111: Linux rev 1.0 ext4 filesystem data, >>> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >>> (needs journal recovery) (extents) (large files) (huge files) >>> >>> trying /dev/sde /dev/sdf /dev/sdd /dev/sdg >>> /dev/md111: Linux rev 1.0 ext4 filesystem data, >>> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >>> (needs journal recovery) (extents) (large files) (huge files) >>> >>> trying /dev/sde /dev/sdf /dev/sdg /dev/sdd >>> /dev/md111: Linux rev 1.0 ext4 filesystem data, >>> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >>> (needs journal recovery) (extents) (large files) (huge files) >>> >>> trying /dev/sdg /dev/sdf /dev/sde /dev/sdd >>> /dev/md111: Linux rev 1.0 ext4 filesystem data, >>> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >>> (needs journal recovery) (extents) (large files) (huge files) >>> >>> trying /dev/sdg /dev/sdf /dev/sdd /dev/sde >>> /dev/md111: Linux rev 1.0 ext4 filesystem data, >>> UUID=0031565c-38dd-4445-a707-f77aef1cbf7e, volume name "oneforall" >>> (needs journal recovery) (extents) (large files) (huge files) >>> : >>> So there are six out of 24 combinations that make sense, at least for >>> the first block. I know from the pre-fail dmesg that the g-f-e-d order >>> should be the correct one, but now I'm left wondering if there is a >>> better way to verify this (other than manually sampling files to see >>> if they make sense), or if the left-symmetric layout on a RAID6 simply >>> allows some of the disk positions to be swapped without loss of data. > >> Your script has reported all arrangements with /dev/sdf as the second >> device. Presumably that is where the single block you are reading >> resides. > > That makes sense. > >> To check if a RAID6 arrangement is credible, you can try the raid6check >> program that is included in the mdadm source release. There is a man >> page. >> If the order of devices is not correct raid6check will tell you about >> it.
> > That's a wonderful small utility, thanks for making it known to me! > Checking even just a small number of stripes was enough in this case, > as the expected combination (g f e d) was the only one that produced > no errors. > > Now I wonder if it would be possible to combine this approach with > something that simply hacked the metadata of each disk to re-establish > the correct disk order to make it possible to reassemble this > particular array without recreating anything. Are problems such as > mine common enough to warrant making this kind of verified > reassembly from assumed-clean disks easier? The way I look at this sort of question is to ask "what is the root cause?", and then "What is the best response to the consequences of that root cause?". In your case, I would look at the sequence of events that led to you needing to re-create your array, and ask "At which point could md or mdadm have done something differently?". If you, or someone, can describe precisely how to reproduce your outcome - so that I can reproduce it myself - then I'll happily have a look and see at which point something different could have happened. Until then, I think the best response to these situations is to ask for help, and to have tools which allow details to be extracted and repairs to be made. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
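The brute-force ordering test described in the quoted message, combined with the raid6check verification suggested above, might look roughly like the sketch below. Everything in it is an assumption to be adapted: the copy device names, the chunk size and metadata version (which must match the original array), and the raid6check stripe arguments; and because --create rewrites superblocks, it should only ever be pointed at copies or overlays, never the original disks.

    #!/bin/sh
    # Try every ordering of four device copies, re-create the array in that
    # order with --assume-clean, and see whether the result both looks like a
    # filesystem and passes a RAID6 parity spot-check.
    DEVS="/dev/mapper/copy-sdd /dev/mapper/copy-sde /dev/mapper/copy-sdf /dev/mapper/copy-sdg"
    for a in $DEVS; do
      for b in $DEVS; do
        [ "$b" = "$a" ] && continue
        for c in $DEVS; do
          if [ "$c" = "$a" ] || [ "$c" = "$b" ]; then continue; fi
          for d in $DEVS; do
            if [ "$d" = "$a" ] || [ "$d" = "$b" ] || [ "$d" = "$c" ]; then continue; fi
            echo "trying $a $b $c $d"
            mdadm --create /dev/md111 --assume-clean --run --level=6 \
                  --raid-devices=4 --chunk=512 --metadata=1.2 \
                  "$a" "$b" "$c" "$d" >/dev/null 2>&1 || continue
            file -s /dev/md111           # does block 0 look like the ext4 fs?
            raid6check /dev/md111 0 16   # parity check on the first 16 stripes
            mdadm --stop /dev/md111
          done
        done
      done
    done

As noted above, `file -s` only reads the start of the device, which is why several orderings can look plausible; the parity check is what narrows it down to the real one.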
* Re: Recovering a RAID6 after all disks were disconnected 2016-12-23 22:46 ` NeilBrown @ 2016-12-24 14:34 ` Giuseppe Bilotta 0 siblings, 0 replies; 12+ messages in thread From: Giuseppe Bilotta @ 2016-12-24 14:34 UTC (permalink / raw) To: NeilBrown; +Cc: John Stoffel, linux-raid On Fri, Dec 23, 2016 at 11:46 PM, NeilBrown <neilb@suse.com> wrote: > On Sat, Dec 24 2016, Giuseppe Bilotta wrote: >> >> Now I wonder if it would be possible to combine this approach with >> something that simply hacked the metadata of each disk to re-establish >> the correct disk order to make it possible to reassemble this >> particular array without recreating anything. Are problems such as >> mine common enough to warrant making this kind of verified >> reassembly from assumed-clean disks easier? > > The way I look at this sort of question is to ask "what is the root > cause?", and then "What is the best response to the consequences of that > root cause?". > > In your case, I would look at the sequence of events that led to you > needing to re-create your array, and ask "At which point could md or > mdadm have done something differently?". > > If you, or someone, can describe precisely how to reproduce your outcome > - so that I can reproduce it myself - then I'll happily have a look and > see at which point something different could have happened. As I mentioned in the first post, the root of the issue is cheap hardware plus user error. Basically, all disks in this RAID are hosted on a JBOD that has a tendency to 'disappear' at times. I've seen this happen generally when one of the disks acts up (in which case Linux attempting to reset it leads to a reset of the whole JBOD, which makes all disks disappear until the device recovers). The JBOD is connected via USB3, but I had the same issues when using an eSATA connection with a port multiplier, and from what I've read around it's a known limitation of SATA (as opposed to professional stuff based on SAS). When this happens, md ends up removing all devices from the RAID. The proper way to handle this, I've found, is to unmount the filesystem, stop the array, and then reassemble it and remount it as soon as the JBOD is back online. With this approach the RAID recovers in pretty good shape (aside from the disk that is acting up, possibly). However, it's a bit bothersome and may take some time to free up all filesystem usage to allow for the unmounting, sometimes to the point of requiring a reboot. So the last time this happened I tried something different, and I made the mistake of trying a re-add of all the disks. This resulted in the disks being marked as spares because md could not restore the RAID functionality after having dropped to 0 disks. I'm not sure this could be handled differently, unless mdraid could be made to not kick all disks out if the whole JBOD disappears, but rather wait for it to come back? -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 12+ messages in thread
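Written out as a rough sketch, the "stop and reassemble" recovery described above would look something like the following; the array name, mount point and member names are placeholders, and --force may only be needed if the event counts have drifted.

    # After the enclosure comes back, do not re-add members one by one.
    umount /mnt/raid                           # may need open files chased down first
    mdadm --stop /dev/md0                      # drop the failed, all-members-missing array
    mdadm --assemble /dev/md0 /dev/sd[defg]    # reassemble from the returned disks
    # mdadm --assemble --force /dev/md0 ...    # only if the event counts diverged
    fsck.ext4 -p /dev/md0                      # journal replay / consistency check
    mount /dev/md0 /mnt/raid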
end of thread, other threads:[~2016-12-24 14:47 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-12-07 8:59 Recovering a RAID6 after all disks were disconnected Giuseppe Bilotta 2016-12-07 14:31 ` John Stoffel 2016-12-07 17:21 ` Giuseppe Bilotta 2016-12-08 19:02 ` John Stoffel 2016-12-22 23:11 ` Giuseppe Bilotta 2016-12-22 23:25 ` NeilBrown 2016-12-23 16:17 ` Giuseppe Bilotta 2016-12-23 21:14 ` Giuseppe Bilotta 2016-12-23 22:50 ` NeilBrown 2016-12-24 14:47 ` Giuseppe Bilotta 2016-12-23 22:46 ` NeilBrown 2016-12-24 14:34 ` Giuseppe Bilotta
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).