* MD RAID Bug 7/15/12
From: Mark Munoz @ 2012-09-30  0:12 UTC
To: linux-raid

Hi, I appear to have been affected by the bug you found on 7/15/12. The data I have on this array is really important, and I want to make sure I get this correct before I actually make changes.

Configuration:
md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected: all the drives show an unknown raid level and 0 devices, with the exception of device 5, which still has all the information.

Here is the output from that drive:

serveradmin@hulk:/etc/mdadm$ sudo mdadm --examine /dev/sdaf
/dev/sdaf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 6afb3306:144cec30:1b2d1a19:3a56f0d3
           Name : hulk:1  (local to host hulk)
  Creation Time : Wed Aug 15 16:25:30 2012
     Raid Level : raid6
   Raid Devices : 19

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
  Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 205dfd9f:9be2b9ca:1f775974:fb1b742c

    Update Time : Sat Sep 29 12:22:51 2012
       Checksum : 9f164d8e - correct
         Events : 38

         Layout : left-symmetric
     Chunk Size : 4K

    Device Role : Active device 5
    Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)

Now I also have md2, which is a striped RAID (RAID 0) of both md0 and md1.

When I type:

sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas

I get the following error for each device:

mdadm: /dev/sdaa appears to be part of a raid array:
    level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
mdadm: partition table exists on /dev/sdaa but will be lost or
    meaningless after creating array

I want to make sure that running the command above won't affect any of the data on md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just make dd copies, but as you can see I would have to buy 19 more 3 TB hard drives, plus the time to dd each drive. It is a production server, and that kind of downtime is something we would really rather avoid.

Thank you so much for your time.

Mark Munoz
623.523.3201
* Re: MD RAID Bug 7/15/12
From: Chris Murphy @ 2012-09-30  2:47 UTC
To: Linux RAID

On Sep 29, 2012, at 6:12 PM, Mark Munoz wrote:
>
> Configuration:
> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected: all the drives show an unknown raid level and 0 devices, with the exception of device 5, which still has all the information.
>
>          Layout : left-symmetric
>      Chunk Size : 4K

Off-topic response: I'm kind of new at all of this, but 24 and 19 devices? Is this really ideal? Why not cap RAID 6 to a max of 12 disks, and use either LVM or md raid linear to aggregate?

Chris Murphy
* Re: MD RAID Bug 7/15/12
From: Stefan /*St0fF*/ Hübner @ 2012-09-30 21:08 UTC
To: Chris Murphy; +Cc: Linux RAID

On 30.09.2012 04:47, Chris Murphy wrote:
>
> On Sep 29, 2012, at 6:12 PM, Mark Munoz wrote:
>
>> Configuration:
>> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
>> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected: all the drives show an unknown raid level and 0 devices, with the exception of device 5, which still has all the information.
>>
>>          Layout : left-symmetric
>>      Chunk Size : 4K
>
> Off-topic response: I'm kind of new at all of this, but 24 and 19 devices? Is this really ideal? Why not cap RAID 6 to a max of 12 disks, and use either LVM or md raid linear to aggregate?

Also off topic: 12 drives would be as "nearly unalignable" as 19 are. Still, this setup is kind of sporty; I wouldn't put too expensive data on there.

My rule of thumb: every 4 drives need one drive of redundancy, so a ten-drive RAID 6 is good. The next alignment step (a power-of-two number of data drives) would be 18 drives - I'd add a spare to that. But 45 drives? I'd build a RAID 60 of 4x10 drives, then think again about what to do with the other 5 drives... or 2x18 + 1x6 and 3 spares... Well, whatever - this is not an ideal setup either way. God bless the Supermicro SC847E16JBOD ;)

St0fF
* Re: MD RAID Bug 7/15/12
From: Chris Murphy @ 2012-09-30 22:16 UTC
To: Linux RAID

On Sep 30, 2012, at 3:08 PM, Stefan /*St0fF*/ Hübner wrote:

> Also off topic: 12 drives would be as "nearly unalignable" as 19 are.

I'm not sure what you mean by unalignable. A separate question is whether a 4K chunk size is a good idea, even with 24 disks, but I'm unsure of the usage and workload.

On Sep 29, 2012, at 6:12 PM, Mark Munoz wrote:

> sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
>
> I get the following error for each device:
>
> mdadm: /dev/sdaa appears to be part of a raid array:
>     level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
> mdadm: partition table exists on /dev/sdaa but will be lost or
>     meaningless after creating array
>
> I want to make sure that running the command above won't affect any of the data on md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just make dd copies, but as you can see I would have to buy 19 more 3 TB hard drives, plus the time to dd each drive. It is a production server, and that kind of downtime is something we would really rather avoid.

That metadata should be stored elsewhere. If I'm understanding the logic correctly, the RAID 6 metadata would be on all the /dev/sdaX disks at offset 2048, and the RAID 0 metadata would be on /dev/md0 and /dev/md1 at offset 2048.

I'd make certain that /dev/md0 is not mounted, and that neither md0 nor md1 is scheduled for a repair scrub, which would likely cause problems when it comes time to marry the two RAID 6s back together again.

Maybe not necessary, since you aren't missing any disks, but after the create you could do 'echo check > /sys/block/md0/md/sync_action' and check /sys/block/md0/md/mismatch_cnt. If both RAID 6s are happy, then you can deal with the RAID 0, and check the file system with -n (or equivalent) to report problems but not make repairs.

Chris Murphy
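[Editor's note: a rough sketch of the verification sequence Chris describes, in case it helps a later reader. Chris's message names md0; presumably the same check applies to the re-created md1, so both are shown. The 'check' action counts parity mismatches without repairing them. Array names follow this thread; verify them on your own system first.]

    # kick off a consistency scrub on each RAID 6, then read the counters
    echo check > /sys/block/md0/md/sync_action
    echo check > /sys/block/md1/md/sync_action
    cat /proc/mdstat                        # watch scrub progress
    cat /sys/block/md0/md/mismatch_cnt      # ideally 0 once the scrub finishes
    cat /sys/block/md1/md/mismatch_cnt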
* Re: MD RAID Bug 7/15/12
From: Stefan /*St0fF*/ Hübner @ 2012-10-01 22:27 UTC
To: Chris Murphy; +Cc: Linux RAID

On 01.10.2012 00:16, Chris Murphy wrote:
>
> On Sep 30, 2012, at 3:08 PM, Stefan /*St0fF*/ Hübner wrote:
>
>> Also off topic: 12 drives would be as "nearly unalignable" as 19 are.
>
> I'm not sure what you mean by unalignable. A separate question is whether a 4K chunk size is a good idea, even with 24 disks, but I'm unsure of the usage and workload.
>
> [...]

By that I mean that optimal alignment needs a reasonable chunk size and a number of data disks that is (or should be) a power of two. If your database's commit chunk size were 64k, you might create a RAID 6 of 6 drives with a 16k chunk size. That way you spare yourself many, many read-modify-write operations, because every commit writes each of the 6 disks exactly once and no RMW happens.

You also have to take care of proper partition placement - that is probably what you know as "alignment". I just want to make it more conscious that alignment is not only the placement of the data partition; the size of a "normal" data packet relative to the chunk size and the number of disks used matters too.

cheers,
stefan
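[Editor's note: making the arithmetic behind Stefan's example explicit - illustrative numbers only, not a recommendation for this server.]

    full stripe = chunk size x number of data disks

    6-drive RAID 6  -> 6 - 2 parity = 4 data disks
                       4 x 16 KiB = 64 KiB per full stripe,
                       so a 64 KiB commit writes whole stripes and avoids read-modify-write

    19-drive RAID 6 -> 17 data disks
                       17 x 4 KiB = 68 KiB per full stripe,
                       which no power-of-two I/O size lines up with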
* Re: MD RAID Bug 7/15/12
From: NeilBrown @ 2012-10-01  3:02 UTC
To: Mark Munoz; +Cc: linux-raid

On Sat, 29 Sep 2012 17:12:40 -0700 Mark Munoz
<mark.munoz@rightthisminute.com> wrote:

> Hi, I appear to have been affected by the bug you found on 7/15/12. The data I have on this array is really important, and I want to make sure I get this correct before I actually make changes.
>
> Configuration:
> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected: all the drives show an unknown raid level and 0 devices, with the exception of device 5, which still has all the information.
>
> [... mdadm --examine /dev/sdaf output, quoted in full at the start of the thread ...]
>
> Now I also have md2, which is a striped RAID (RAID 0) of both md0 and md1.
>
> When I type:
>
> sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
>
> I get the following error for each device:
>
> mdadm: /dev/sdaa appears to be part of a raid array:
>     level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
> mdadm: partition table exists on /dev/sdaa but will be lost or
>     meaningless after creating array
>
> I want to make sure that running the command above won't affect any of the data on md2 when I assemble that array after creating md1. [...]

Running this command will only overwrite the 4K of metadata, 4K from the start of the devices. It will not write anything else to any device.

So yes, it is safe.

NeilBrown
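[Editor's note: an optional precaution, not something Neil asked for. Since only the 4 KiB superblock at a 4 KiB offset gets rewritten (per his reply, and consistent with "Super Offset : 8 sectors" above), that region can be copied aside first so it could be put back with dd if anything went wrong. A sketch, assuming bash and the device names from this thread; the backup path is arbitrary.]

    # save the 4 KiB v1.2 superblock region (bytes 4096-8191) of each md1 member
    for d in /dev/sda{a..s}; do
        dd if="$d" of="/root/$(basename "$d").superblock.bak" bs=4096 skip=1 count=1
    done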
[parent not found: <42BA87F6-C5A3-4321-A4C7-0DCF0A9DF79D@rightthisminute.com>]
* Re: MD RAID Bug 7/15/12
From: Mark Munoz @ 2012-10-02  1:51 UTC
To: linux-raid@vger.kernel.org; +Cc: NeilBrown

Neil,

Thank you again so much for taking time out of your day to personally help me; it really means a lot. I have run the command and have successfully recreated md1. Now, however, md2 will not assemble. I get this error:

sudo mdadm --assemble --force /dev/md2 /dev/md0 /dev/md1
mdadm: superblock on /dev/md1 doesn't match others - assembly aborted

Would I be correct in thinking that I just need to recreate md2 now as well? I assume with this command?

sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1

Mark Munoz

On Oct 1, 2012, at 2:00 PM, Mark Munoz <mark.munoz@rightthisminute.com> wrote:

>> On Sat, 29 Sep 2012 17:12:40 -0700 Mark Munoz
>> <mark.munoz@rightthisminute.com> wrote:
>>
>>> [... original report and mdadm --examine /dev/sdaf output, quoted in full earlier in the thread ...]
>>
>> Running this command will only overwrite the 4K of metadata, 4K from the start of the devices. It will not write anything else to any device.
>>
>> So yes, it is safe.
>>
>> NeilBrown
* Re: MD RAID Bug 7/15/12
From: NeilBrown @ 2012-10-02  2:25 UTC
To: Mark Munoz; +Cc: linux-raid@vger.kernel.org

On Mon, 1 Oct 2012 18:51:09 -0700 Mark Munoz
<mark.munoz@rightthisminute.com> wrote:

> Neil,
>
> Thank you again so much for taking time out of your day to personally help me; it really means a lot. I have run the command and have successfully recreated md1. Now, however, md2 will not assemble. I get this error:
>
> sudo mdadm --assemble --force /dev/md2 /dev/md0 /dev/md1
> mdadm: superblock on /dev/md1 doesn't match others - assembly aborted
>
> Would I be correct in thinking that I just need to recreate md2 now as well?

Maybe, but probably not. I would think it more likely that md1 wasn't created quite right - otherwise it should have the right metadata.

What does:
    mdadm -E /dev/md1
display now? How does that compare with "mdadm -E /dev/md0"?

What about "mdadm -E /dev/sdaf" (or any other device in md1)? How does that compare to what was displayed previously?

NeilBrown

> I assume with this command?
>
> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
>
> Mark Munoz
>
> [... earlier messages quoted in full above ...]
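[Editor's note: one quick way to run the comparisons Neil asks for - a sketch only, assuming bash (for the process substitution) and root privileges.]

    diff <(mdadm -E /dev/md0) <(mdadm -E /dev/md1)   # the two members of md2
    mdadm -E /dev/sdaf                               # any md1 member, to compare
                                                     # against the output posted earlier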
[parent not found: <20121002114920.1029bed7@notabene.brown>]
* Re: MD RAID Bug 7/15/12
From: Mark Munoz @ 2012-10-02  2:33 UTC
To: linux-raid@vger.kernel.org; +Cc: NeilBrown

Yes, sorry, I should have sent that output. This is the weird part! md0 was unaffected and it assembled perfectly. Here is a sample output from /dev/sdf:

/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a24f542c:e0bc0fd0:983af76e:c7581724
           Name : hulk:0  (local to host hulk)
  Creation Time : Wed Aug 15 16:24:17 2012
     Raid Level : raid6
   Raid Devices : 24

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 128931677952 (61479.42 GiB 66013.02 GB)
  Used Dev Size : 5860530816 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c3d3ea5b:7db696fd:71474db6:c07c7415

    Update Time : Mon Oct  1 13:54:05 2012
       Checksum : 246155dd - correct
         Events : 41

         Layout : left-symmetric
     Chunk Size : 64K

    Device Role : Active device 4
    Array State : AAAAAAAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)

So that chunk size is 64k. The chunk size of md1 is 4k:

/dev/sdaf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ab4c898b:03310a29:276a40a1:2ad45c73
           Name : hulk:1  (local to host hulk)
  Creation Time : Mon Oct  1 13:51:50 2012
     Raid Level : raid6
   Raid Devices : 19

 Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
     Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
  Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0a88e45f:7df6b724:59825715:faf4abdf

    Update Time : Mon Oct  1 13:54:05 2012
       Checksum : b276ec79 - correct
         Events : 2

         Layout : left-symmetric
     Chunk Size : 4K

    Device Role : Active device 5
    Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)

I created these arrays using webmin, and I guess I must have left the default, which is 4k in that tool. I manually changed md0 and md2 to 64k when I created those arrays.

However, here is the weird part. This is the output of md0, which was the array that built perfectly upon reboot:

/dev/md0:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : aaa29f43:b689f66d:f270c5fc:405620b7
           Name : hulk:2  (local to host hulk)
  Creation Time : Wed Aug 15 16:26:09 2012
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 128931675904 (61479.42 GiB 66013.02 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b626dd72:995f1277:6f61e94a:489e0468

    Update Time : Sat Sep 29 14:49:09 2012
       Checksum : c09051e9 - correct
         Events : 1

    Device Role : spare
    Array State :  ('A' == active, '.' == missing)

And this is the output of md1 AFTER the recreation:

/dev/md1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : aaa29f43:b689f66d:f270c5fc:405620b7
           Name : hulk:2  (local to host hulk)
  Creation Time : Wed Aug 15 16:26:09 2012
     Raid Level : raid0
   Raid Devices : 2

 Avail Dev Size : 99629022368 (47506.82 GiB 51010.06 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 184bb4db:b77d921f:28a45e09:dc58e1e1

    Update Time : Wed Aug 15 16:26:09 2012
       Checksum : 334e56ee - correct
         Events : 0

     Chunk Size : 64K

    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing)

So I am now thinking that during my shutdown this weekend the device /dev/md0, as a member of the array /dev/md2, was affected by the bug as well - along with 18 of the 19 devices contained inside /dev/md1.

Mark Munoz

On Oct 1, 2012, at 6:49 PM, NeilBrown <neilb@suse.de> wrote:

> On Mon, 1 Oct 2012 14:00:24 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
> wrote:
>
>> Neil,
>>
>> Thank you again so much for taking time out of your day to personally help me; it really means a lot. I have run the command and have successfully recreated md1. Now, however, md2 will not assemble. I get this error.
>
> Please don't take the discussion off-list.
>
> If you want an answer, you will have to post to the list - and CC me if you like.
>
> NeilBrown
>
> [... remainder of the earlier exchange quoted in full above ...]
* Re: MD RAID Bug 7/15/12
From: NeilBrown @ 2012-10-02  5:07 UTC
To: Mark Munoz; +Cc: linux-raid@vger.kernel.org

On Mon, 1 Oct 2012 19:33:50 -0700 Mark Munoz
<mark.munoz@rightthisminute.com> wrote:

> So I am now thinking that during my shutdown this weekend the device /dev/md0, as a member of the array /dev/md2, was affected by the bug as well - along with 18 of the 19 devices contained inside /dev/md1.

Yes, I agree, so:

>>> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1

This is the correct thing to do.

NeilBrown
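[Editor's note: before putting md2 back into production, a cautious follow-up sequence might look like the following. A sketch only - the filesystem type on md2 is never stated in this thread, so the fsck tool is an assumption; substitute whatever matches, and /mnt is an arbitrary mount point.]

    mdadm --detail /dev/md2         # confirm: raid0, 2 devices, 64K chunk
    fsck.ext4 -n /dev/md2           # report problems without repairing (ext2/3/4)
    # xfs_repair -n /dev/md2        # the XFS equivalent, also no-modify mode
    mount -o ro /dev/md2 /mnt       # then a read-only mount and a spot-check of the data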
* Re: MD RAID Bug 7/15/12
From: Mark Munoz @ 2012-10-02 22:53 UTC
To: NeilBrown; +Cc: linux-raid@vger.kernel.org

Neil,

I just wanted to send you a HUGE thank you. We got back all of our data and are very pleased! Do you have some sort of donation fund where we could send something to your team to show our appreciation?

Mark

On Oct 1, 2012, at 10:07 PM, NeilBrown <neilb@suse.de> wrote:

> On Mon, 1 Oct 2012 19:33:50 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
> wrote:
>
>> So I am now thinking that during my shutdown this weekend the device /dev/md0, as a member of the array /dev/md2, was affected by the bug as well - along with 18 of the 19 devices contained inside /dev/md1.
>
> Yes, I agree, so:
>
>>>> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
>
> This is the correct thing to do.
>
> NeilBrown
* Re: MD RAID Bug 7/15/12
From: NeilBrown @ 2012-10-03  1:54 UTC
To: Mark Munoz; +Cc: linux-raid@vger.kernel.org

On Tue, 2 Oct 2012 15:53:34 -0700 Mark Munoz
<mark.munoz@rightthisminute.com> wrote:

> Neil,
>
> I just wanted to send you a HUGE thank you. We got back all of our data and are very pleased! Do you have some sort of donation fund where we could send something to your team to show our appreciation?

Thanks for the offer, but no: there is nothing like that.

NeilBrown

> [...]