* MD RAID Bug 7/15/12
@ 2012-09-30 0:12 Mark Munoz
2012-09-30 2:47 ` Chris Murphy
2012-10-01 3:02 ` NeilBrown
0 siblings, 2 replies; 12+ messages in thread
From: Mark Munoz @ 2012-09-30 0:12 UTC (permalink / raw)
To: linux-raid
Hi, I appear to have been affected by the bug you found on 7/15/12. The data I have on this array is really important, and I want to make sure I get this right before I actually make any changes.
Configuration:
md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected: all the drives show an unknown RAID level and 0 devices, with the exception of device 5, which still has all the information.
Here is the output from that drive:
serveradmin@hulk:/etc/mdadm$ sudo mdadm --examine /dev/sdaf
/dev/sdaf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 6afb3306:144cec30:1b2d1a19:3a56f0d3
Name : hulk:1 (local to host hulk)
Creation Time : Wed Aug 15 16:25:30 2012
Raid Level : raid6
Raid Devices : 19
Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 205dfd9f:9be2b9ca:1f775974:fb1b742c
Update Time : Sat Sep 29 12:22:51 2012
Checksum : 9f164d8e - correct
Events : 38
Layout : left-symmetric
Chunk Size : 4K
Device Role : Active device 5
Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)
Now I also have md2 which is a striped RAID of both md0 and md1.
When I type:
sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
I get the following error for each device:
mdadm: /dev/sdaa appears to be part of a raid array:
level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
mdadm: partition table exists on /dev/sdaa but will be lost or
meaningless after creating array
I want to make sure that by running the above command I won't affect any of the data on md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just make dd copies first, but as you can see I would have to buy 19 more 3TB hard drives, plus spend the time to dd each drive. It is a production server, and I would really rather avoid that kind of downtime.
Thank you so much for your time.
Mark Munoz
623.523.3201
* Re: MD RAID Bug 7/15/12
2012-09-30 0:12 MD RAID Bug 7/15/12 Mark Munoz
@ 2012-09-30 2:47 ` Chris Murphy
2012-09-30 21:08 ` Stefan /*St0fF*/ Hübner
2012-10-01 3:02 ` NeilBrown
1 sibling, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2012-09-30 2:47 UTC (permalink / raw)
To: Linux RAID
On Sep 29, 2012, at 6:12 PM, Mark Munoz wrote:
>
> Configuration:
> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected. All the drives show as unknown raid level and 0 devices. With the exception of device 5. It has all the information.
>
> Layout : left-symmetric
> Chunk Size : 4K
Off-topic response: I'm kind of new at all of this, but 24 and 19 devices? Is this really ideal? Why not cap RAID 6 at a maximum of 12 disks, and use either LVM or md RAID linear to aggregate?
Chris Murphy
* Re: MD RAID Bug 7/15/12
2012-09-30 2:47 ` Chris Murphy
@ 2012-09-30 21:08 ` Stefan /*St0fF*/ Hübner
2012-09-30 22:16 ` Chris Murphy
0 siblings, 1 reply; 12+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2012-09-30 21:08 UTC (permalink / raw)
To: Chris Murphy; +Cc: Linux RAID
On 30.09.2012 04:47, Chris Murphy wrote:
>
> On Sep 29, 2012, at 6:12 PM, Mark Munoz wrote:
>
>>
>> Configuration:
>> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
>> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected. All the drives show as unknown raid level and 0 devices. With the exception of device 5. It has all the information.
>>
>> Layout : left-symmetric
>> Chunk Size : 4K
>
> Off topic response: I'm kindof new at all of this. But 24 and 19 devices? Is this really ideal? Why not cap RAID6 to a max of 12 disks, and either use LVM or md raid linear to aggregate?
>
Also off topic: 12 drives would be almost as hard to align as 19 are.
Still, this setup is kind of sporty; I wouldn't put data that is too
precious on it. My rule of thumb: every 4 drives need one drive of
redundancy, so a ten-drive RAID 6 is good. The next alignment step (a
power-of-two number of data drives) would be 18 - and I'd add a spare to it.
But 45 drives? I'd make it a RAID 60 of 4x10 drives, then think about
what to do with the other 5 drives... or 2x18 + 1x6 and 3 spares ...
Well, whatever. This is not an ideal setup anyhow. God bless the
Supermicro SC847E16JBOD ;)
St0fF
* Re: MD RAID Bug 7/15/12
2012-09-30 21:08 ` Stefan /*St0fF*/ Hübner
@ 2012-09-30 22:16 ` Chris Murphy
2012-10-01 22:27 ` Stefan /*St0fF*/ Hübner
0 siblings, 1 reply; 12+ messages in thread
From: Chris Murphy @ 2012-09-30 22:16 UTC (permalink / raw)
To: Linux RAID
On Sep 30, 2012, at 3:08 PM, Stefan /*St0fF*/ Hübner wrote:
> Also off topic: 12 drives would be as "nearly unalignable" as 19 are.
I'm not sure what you mean by unalignable. A separate question is whether a 4K chunk size is a good idea, even with 24 disks, but I'm unsure of the usage and workload.
On Sep 29, 2012, at 6:12 PM, Mark Munoz wrote:
> sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
>
> the following error for each device.
>
> mdadm: /dev/sdaa appears to be part of a raid array:
> level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
> mdadm: partition table exists on /dev/sdaa but will be lost or
> meaningless after creating array
>
> I want to make sure by running this above command that I won't affect any of the data of md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just DD copies but as you can see I would have to buy 19 more 3TB hard drives as well as the time to DD each drive. It is a production server and that kind of down time would really rather be avoided.
That metadata should be stored elsewhere. If I'm understanding the layout correctly, the RAID 6 metadata sits at the start of each /dev/sdaX disk (the superblock at offset 8 sectors, with data not starting until sector 2048), and the RAID 0 metadata sits in the same place on /dev/md0 and /dev/md1. I'd make certain that /dev/md0 is not mounted, and that neither md0 nor md1 is scheduled for a repair scrub, which would likely cause problems when it comes time to marry the two RAID 6s back together again.
Maybe not necessary, since you aren't missing any disks, but after the create you could do 'echo check > /sys/block/md0/md/sync_action' and then check /sys/block/md0/md/mismatch_cnt (and the same for md1). If both RAID 6s are happy, then you can deal with the RAID 0, and check the file system with -n or the equivalent to report problems without making repairs.
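Concretely, something along these lines (run as root; the fsck line is only an example and assumes an ext4 filesystem on md2 - substitute xfs_repair -n or whatever matches your actual filesystem):

  # kick off a consistency check of the recreated RAID 6 (reads and compares, repairs nothing)
  echo check > /sys/block/md1/md/sync_action
  # when it finishes, look at the mismatch counter (0 is what you want)
  cat /sys/block/md1/md/mismatch_cnt
  # after md2 is assembled, check the filesystem without repairing anything
  fsck.ext4 -n /dev/md2

and repeat the first two steps for md0 if you want the same reassurance there.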
Chris Murphy
* Re: MD RAID Bug 7/15/12
2012-09-30 0:12 MD RAID Bug 7/15/12 Mark Munoz
2012-09-30 2:47 ` Chris Murphy
@ 2012-10-01 3:02 ` NeilBrown
[not found] ` <42BA87F6-C5A3-4321-A4C7-0DCF0A9DF79D@rightthisminute.com>
1 sibling, 1 reply; 12+ messages in thread
From: NeilBrown @ 2012-10-01 3:02 UTC (permalink / raw)
To: Mark Munoz; +Cc: linux-raid
On Sat, 29 Sep 2012 17:12:40 -0700 Mark Munoz
<mark.munoz@rightthisminute.com> wrote:
> Hi I appear to have been affected by the bug you found on 7/15/12. The data I have on this array is really important and I want to make sure I get this correct before I actually make changes.
>
> Configuration:
> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected. All the drives show as unknown raid level and 0 devices. With the exception of device 5. It has all the information.
>
> Here is the output from that drive:
>
> serveradmin@hulk:/etc/mdadm$ sudo mdadm --examine /dev/sdaf
> /dev/sdaf:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 6afb3306:144cec30:1b2d1a19:3a56f0d3
> Name : hulk:1 (local to host hulk)
> Creation Time : Wed Aug 15 16:25:30 2012
> Raid Level : raid6
> Raid Devices : 19
>
> Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
> Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
> Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 205dfd9f:9be2b9ca:1f775974:fb1b742c
>
> Update Time : Sat Sep 29 12:22:51 2012
> Checksum : 9f164d8e - correct
> Events : 38
>
> Layout : left-symmetric
> Chunk Size : 4K
>
> Device Role : Active device 5
> Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)
>
> Now I also have md2 which is a striped RAID of both md0 and md1.
>
> When I type:
>
> sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
>
> the following error for each device.
>
> mdadm: /dev/sdaa appears to be part of a raid array:
> level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
> mdadm: partition table exists on /dev/sdaa but will be lost or
> meaningless after creating array
>
> I want to make sure by running this above command that I won't affect any of the data of md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just DD copies but as you can see I would have to buy 19 more 3TB hard drives as well as the time to DD each drive. It is a production server and that kind of down time would really rather be avoided.
Running this command will only overwrite the 4K of metadata, 4K from the
start of the devices. It will not write anything else to any device.
So yes, it is safe.
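(Rough sanity check against the --examine output above, just restating numbers that are already printed there:

   Super Offset : 8 sectors    x 512 bytes = 4 KiB   (where the v1.2 superblock lives)
   Data Offset  : 2048 sectors x 512 bytes = 1 MiB   (where the array data starts)

so rewriting the superblock only touches that 4 KiB region and never reaches the data area.)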
NeilBrown
>
> Thank you so much for your time.
>
> Mark Munoz
> 623.523.3201
* Re: MD RAID Bug 7/15/12
2012-09-30 22:16 ` Chris Murphy
@ 2012-10-01 22:27 ` Stefan /*St0fF*/ Hübner
0 siblings, 0 replies; 12+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2012-10-01 22:27 UTC (permalink / raw)
To: Chris Murphy; +Cc: Linux RAID
On 01.10.2012 00:16, Chris Murphy wrote:
>
> On Sep 30, 2012, at 3:08 PM, Stefan /*St0fF*/ Hübner wrote:
>
>
>> Also off topic: 12 drives would be as "nearly unalignable" as 19 are.
>
> I'm not sure what you mean by unalignable. A separate question is if a 4K chunks size is a good idea, even with 24 disks, but I'm unsure of the usage and workload.
>
>
[...]
By that I mean that optimal alignment comes down to a reasonable chunk
size and a number of data disks that is (or should be) a power of two. If
your database's commit chunk size were 64k, you might create a RAID 6
of 6 drives with a 16k chunk size. That way you'd save many, many
read-modify-write operations, because every commit would write each of the
6 disks exactly once and no RMW would happen. You'd also have to take care
of proper partition placement.
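Spelled out, since the arithmetic is the whole point of the example: a
6-drive RAID 6 has 6 - 2 = 4 data drives, and

   4 data drives x 16k chunk = 64k per full stripe

which matches the 64k commit size, so every commit is a single full-stripe
write and the parity can be computed without reading anything back.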
That is probably what you know as alignment. I just want to raise
awareness that not only the placement of a data partition goes into
alignment, but also the size of a "normal" data packet relative to the
chunk size and the number of disks used.
cheers,
stefan
* Re: MD RAID Bug 7/15/12
[not found] ` <42BA87F6-C5A3-4321-A4C7-0DCF0A9DF79D@rightthisminute.com>
@ 2012-10-02 1:51 ` Mark Munoz
2012-10-02 2:25 ` NeilBrown
[not found] ` <20121002114920.1029bed7@notabene.brown>
1 sibling, 1 reply; 12+ messages in thread
From: Mark Munoz @ 2012-10-02 1:51 UTC (permalink / raw)
To: linux-raid@vger.kernel.org; +Cc: NeilBrown
Neil,
Thank you again so much for taking time out of your day to personally help me; it really means a lot. I have run the command and have successfully recreated md1. Now, however, md2 will not assemble. I get this error:
sudo mdadm --assemble --force /dev/md2 /dev/md0 /dev/md1
mdadm: superblock on /dev/md1 doesn't match others - assembly aborted
Would I be correct in thinking that I just need to recreate md2 now as well?
I assume with this command?
sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
Mark Munoz
On Oct 1, 2012, at 2:00 PM, Mark Munoz <mark.munoz@rightthisminute.com> wrote:
>
>> On Sat, 29 Sep 2012 17:12:40 -0700 Mark Munoz
>> <mark.munoz@rightthisminute.com> wrote:
>>
>>> Hi I appear to have been affected by the bug you found on 7/15/12. The data I have on this array is really important and I want to make sure I get this correct before I actually make changes.
>>>
>>> Configuration:
>>> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
>>> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected. All the drives show as unknown raid level and 0 devices. With the exception of device 5. It has all the information.
>>>
>>> Here is the output from that drive:
>>>
>>> serveradmin@hulk:/etc/mdadm$ sudo mdadm --examine /dev/sdaf
>>> /dev/sdaf:
>>> Magic : a92b4efc
>>> Version : 1.2
>>> Feature Map : 0x0
>>> Array UUID : 6afb3306:144cec30:1b2d1a19:3a56f0d3
>>> Name : hulk:1 (local to host hulk)
>>> Creation Time : Wed Aug 15 16:25:30 2012
>>> Raid Level : raid6
>>> Raid Devices : 19
>>>
>>> Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>> Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
>>> Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
>>> Data Offset : 2048 sectors
>>> Super Offset : 8 sectors
>>> State : clean
>>> Device UUID : 205dfd9f:9be2b9ca:1f775974:fb1b742c
>>>
>>> Update Time : Sat Sep 29 12:22:51 2012
>>> Checksum : 9f164d8e - correct
>>> Events : 38
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 4K
>>>
>>> Device Role : Active device 5
>>> Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)
>>>
>>> Now I also have md2 which is a striped RAID of both md0 and md1.
>>>
>>> When I type:
>>>
>>> sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
>>>
>>> the following error for each device.
>>>
>>> mdadm: /dev/sdaa appears to be part of a raid array:
>>> level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
>>> mdadm: partition table exists on /dev/sdaa but will be lost or
>>> meaningless after creating array
>>>
>>> I want to make sure by running this above command that I won't affect any of the data of md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just DD copies but as you can see I would have to buy 19 more 3TB hard drives as well as the time to DD each drive. It is a production server and that kind of down time would really rather be avoided.
>>
>> Running this command will only overwrite the 4K of metadata, 4K from the
>> start of the devices. It will not write anything else to any device.
>>
>> so yes, it is safe.
>>
>> NeilBrown
>>
>>
>>
>>>
>>> Thank you so much for your time.
>>>
>>> Mark Munoz
>>> 623.523.3201
>>
>
* Re: MD RAID Bug 7/15/12
2012-10-02 1:51 ` Mark Munoz
@ 2012-10-02 2:25 ` NeilBrown
0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2012-10-02 2:25 UTC (permalink / raw)
To: Mark Munoz; +Cc: linux-raid@vger.kernel.org
On Mon, 1 Oct 2012 18:51:09 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
wrote:
> Neil,
>
> Thank you again so much for taking time out of your day to personally help me it really means a lot. I have ran the command and have successfully recreated my md1 Now however md2 will not assemble. I get this error.
>
> sudo mdadm --assemble --force /dev/md2 /dev/md0 /dev/md1
> mdadm: superblock on /dev/md1 doesn't match others - assembly aborted
>
> Would I be correct in thinking that I just need to recreate md2 now as well?
Maybe, but probably not.
I would think it more likely that md1 wasn't created quite right - otherwise
it should have the right metadata.
What does:
mdadm -E /dev/md1
display now? How does that compare with "mdadm -E /dev/md0"?
What about
mdadm -E /dev/sdaf
(or any other device in md1)? How does that compare to what was displayed
previously?
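(If it's easier, the output of something like

   for d in /dev/md0 /dev/md1 /dev/sdaf; do echo "== $d =="; mdadm -E "$d"; done

run as root and pasted into a reply would cover all of that in one go -
/dev/sdaf is just the example member from your earlier mail; any md1 member
will do.)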
NeilBrown
>
> I assume with this command?
>
> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
>
> Mark Munoz
>
> On Oct 1, 2012, at 2:00 PM, Mark Munoz <mark.munoz@rightthisminute.com> wrote:
> >
> >> On Sat, 29 Sep 2012 17:12:40 -0700 Mark Munoz
> >> <mark.munoz@rightthisminute.com> wrote:
> >>
> >>> Hi I appear to have been affected by the bug you found on 7/15/12. The data I have on this array is really important and I want to make sure I get this correct before I actually make changes.
> >>>
> >>> Configuration:
> >>> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
> >>> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected. All the drives show as unknown raid level and 0 devices. With the exception of device 5. It has all the information.
> >>>
> >>> Here is the output from that drive:
> >>>
> >>> serveradmin@hulk:/etc/mdadm$ sudo mdadm --examine /dev/sdaf
> >>> /dev/sdaf:
> >>> Magic : a92b4efc
> >>> Version : 1.2
> >>> Feature Map : 0x0
> >>> Array UUID : 6afb3306:144cec30:1b2d1a19:3a56f0d3
> >>> Name : hulk:1 (local to host hulk)
> >>> Creation Time : Wed Aug 15 16:25:30 2012
> >>> Raid Level : raid6
> >>> Raid Devices : 19
> >>>
> >>> Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
> >>> Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
> >>> Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
> >>> Data Offset : 2048 sectors
> >>> Super Offset : 8 sectors
> >>> State : clean
> >>> Device UUID : 205dfd9f:9be2b9ca:1f775974:fb1b742c
> >>>
> >>> Update Time : Sat Sep 29 12:22:51 2012
> >>> Checksum : 9f164d8e - correct
> >>> Events : 38
> >>>
> >>> Layout : left-symmetric
> >>> Chunk Size : 4K
> >>>
> >>> Device Role : Active device 5
> >>> Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)
> >>>
> >>> Now I also have md2 which is a striped RAID of both md0 and md1.
> >>>
> >>> When I type:
> >>>
> >>> sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
> >>>
> >>> the following error for each device.
> >>>
> >>> mdadm: /dev/sdaa appears to be part of a raid array:
> >>> level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
> >>> mdadm: partition table exists on /dev/sdaa but will be lost or
> >>> meaningless after creating array
> >>>
> >>> I want to make sure by running this above command that I won't affect any of the data of md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just DD copies but as you can see I would have to buy 19 more 3TB hard drives as well as the time to DD each drive. It is a production server and that kind of down time would really rather be avoided.
> >>
> >> Running this command will only overwrite the 4K of metadata, 4K from the
> >> start of the devices. It will not write anything else to any device.
> >>
> >> so yes, it is safe.
> >>
> >> NeilBrown
> >>
> >>
> >>
> >>>
> >>> Thank you so much for your time.
> >>>
> >>> Mark Munoz
> >>> 623.523.3201
> >>
> >
>
* Re: MD RAID Bug 7/15/12
[not found] ` <20121002114920.1029bed7@notabene.brown>
@ 2012-10-02 2:33 ` Mark Munoz
2012-10-02 5:07 ` NeilBrown
0 siblings, 1 reply; 12+ messages in thread
From: Mark Munoz @ 2012-10-02 2:33 UTC (permalink / raw)
To: linux-raid@vger.kernel.org; +Cc: NeilBrown
Yes, sorry, I should have sent that output. This is where it gets weird!
So md0 was unaffected and it assembled perfectly. Here is sample output from /dev/sdf:
/dev/sdf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a24f542c:e0bc0fd0:983af76e:c7581724
Name : hulk:0 (local to host hulk)
Creation Time : Wed Aug 15 16:24:17 2012
Raid Level : raid6
Raid Devices : 24
Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
Array Size : 128931677952 (61479.42 GiB 66013.02 GB)
Used Dev Size : 5860530816 (2794.52 GiB 3000.59 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c3d3ea5b:7db696fd:71474db6:c07c7415
Update Time : Mon Oct 1 13:54:05 2012
Checksum : 246155dd - correct
Events : 41
Layout : left-symmetric
Chunk Size : 64K
Device Role : Active device 4
Array State : AAAAAAAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)
So this chunk size is 64k. The chunk size of md1 is 4k.
/dev/sdaf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ab4c898b:03310a29:276a40a1:2ad45c73
Name : hulk:1 (local to host hulk)
Creation Time : Mon Oct 1 13:51:50 2012
Raid Level : raid6
Raid Devices : 19
Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 0a88e45f:7df6b724:59825715:faf4abdf
Update Time : Mon Oct 1 13:54:05 2012
Checksum : b276ec79 - correct
Events : 2
Layout : left-symmetric
Chunk Size : 4K
Device Role : Active device 5
Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)
I created these arrays using Webmin, and I guess I must have left the default chunk size, which is 4K in that tool. I manually changed md0 and md2 to 64K when I created those arrays.
However, here is the weird part. This is the --examine output of /dev/md0, the array that assembled perfectly on reboot:
/dev/md0:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : aaa29f43:b689f66d:f270c5fc:405620b7
Name : hulk:2 (local to host hulk)
Creation Time : Wed Aug 15 16:26:09 2012
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 128931675904 (61479.42 GiB 66013.02 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : b626dd72:995f1277:6f61e94a:489e0468
Update Time : Sat Sep 29 14:49:09 2012
Checksum : c09051e9 - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
And this is the --examine output of /dev/md1 AFTER the recreation:
/dev/md1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : aaa29f43:b689f66d:f270c5fc:405620b7
Name : hulk:2 (local to host hulk)
Creation Time : Wed Aug 15 16:26:09 2012
Raid Level : raid0
Raid Devices : 2
Avail Dev Size : 99629022368 (47506.82 GiB 51010.06 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 184bb4db:b77d921f:28a45e09:dc58e1e1
Update Time : Wed Aug 15 16:26:09 2012
Checksum : 334e56ee - correct
Events : 0
Chunk Size : 64K
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
So I am now thinking that during my shutdown this weekend, the device /dev/md0 (as a member of the array /dev/md2) was affected by the bug, as well as 18 of the 19 devices contained inside /dev/md1.
Mark Munoz
On Oct 1, 2012, at 6:49 PM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 1 Oct 2012 14:00:24 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
> wrote:
>
>> Neil,
>>
>> Thank you again so much for taking time out of your day to personally help me it really means a lot. I have ran the command and have successfully recreated my md1 Now however md2 will not assemble. I get this error.
>
> Please don't take the discussion off-list.
>
> If you want an answer, you will have to post to the list - and CC me if you
> like.
>
> NeilBrown
>
>
>>
>> sudo mdadm --assemble --force /dev/md2 /dev/md0 /dev/md1
>> mdadm: superblock on /dev/md1 doesn't match others - assembly aborted
>>
>> Would I be correct in thinking that I just need to recreate md2 now as well?
>>
>> I assume with this command?
>>
>> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
>>
>> Mark Munoz
>>
>> On Sep 30, 2012, at 8:02 PM, NeilBrown <neilb@suse.de> wrote:
>>
>>> On Sat, 29 Sep 2012 17:12:40 -0700 Mark Munoz
>>> <mark.munoz@rightthisminute.com> wrote:
>>>
>>>> Hi I appear to have been affected by the bug you found on 7/15/12. The data I have on this array is really important and I want to make sure I get this correct before I actually make changes.
>>>>
>>>> Configuration:
>>>> md0 is a RAID 6 volume with 24 devices and 1 spare. It is working fine and was unaffected.
>>>> md1 is a RAID 6 volume with 19 devices and 1 spare. It was affected. All the drives show as unknown raid level and 0 devices. With the exception of device 5. It has all the information.
>>>>
>>>> Here is the output from that drive:
>>>>
>>>> serveradmin@hulk:/etc/mdadm$ sudo mdadm --examine /dev/sdaf
>>>> /dev/sdaf:
>>>> Magic : a92b4efc
>>>> Version : 1.2
>>>> Feature Map : 0x0
>>>> Array UUID : 6afb3306:144cec30:1b2d1a19:3a56f0d3
>>>> Name : hulk:1 (local to host hulk)
>>>> Creation Time : Wed Aug 15 16:25:30 2012
>>>> Raid Level : raid6
>>>> Raid Devices : 19
>>>>
>>>> Avail Dev Size : 5860531120 (2794.52 GiB 3000.59 GB)
>>>> Array Size : 99629024416 (47506.82 GiB 51010.06 GB)
>>>> Used Dev Size : 5860530848 (2794.52 GiB 3000.59 GB)
>>>> Data Offset : 2048 sectors
>>>> Super Offset : 8 sectors
>>>> State : clean
>>>> Device UUID : 205dfd9f:9be2b9ca:1f775974:fb1b742c
>>>>
>>>> Update Time : Sat Sep 29 12:22:51 2012
>>>> Checksum : 9f164d8e - correct
>>>> Events : 38
>>>>
>>>> Layout : left-symmetric
>>>> Chunk Size : 4K
>>>>
>>>> Device Role : Active device 5
>>>> Array State : AAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing)
>>>>
>>>> Now I also have md2 which is a striped RAID of both md0 and md1.
>>>>
>>>> When I type:
>>>>
>>>> sudo mdadm --create --assume-clean /dev/md1 --level=6 --chunk=4 --metadata=1.2 --raid-devices=19 /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas
>>>>
>>>> the following error for each device.
>>>>
>>>> mdadm: /dev/sdaa appears to be part of a raid array:
>>>> level=-unknown- devices=0 ctime=Wed Aug 15 16:25:30 2012
>>>> mdadm: partition table exists on /dev/sdaa but will be lost or
>>>> meaningless after creating array
>>>>
>>>> I want to make sure by running this above command that I won't affect any of the data of md2 when I assemble that array after creating md1. Any help on this issue would be greatly appreciated. I would normally just DD copies but as you can see I would have to buy 19 more 3TB hard drives as well as the time to DD each drive. It is a production server and that kind of down time would really rather be avoided.
>>>
>>> Running this command will only overwrite the 4K of metadata, 4K from the
>>> start of the devices. It will not write anything else to any device.
>>>
>>> so yes, it is safe.
>>>
>>> NeilBrown
>>>
>>>
>>>
>>>>
>>>> Thank you so much for your time.
>>>>
>>>> Mark Munoz
>>>> 623.523.3201
>>>
>
* Re: MD RAID Bug 7/15/12
2012-10-02 2:33 ` Mark Munoz
@ 2012-10-02 5:07 ` NeilBrown
2012-10-02 22:53 ` Mark Munoz
0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2012-10-02 5:07 UTC (permalink / raw)
To: Mark Munoz; +Cc: linux-raid@vger.kernel.org
On Mon, 1 Oct 2012 19:33:50 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
wrote:
> So I am now thinking that during my shutdown this weekend the device /dev/md0 as part of the array /dev/md2 was affected by the bug as well as 18 of the 19 devices contained inside of /devm/md1
Yes, I agree, so:
> >>
> >> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
This is the correct thing to do.
NeilBrown
* Re: MD RAID Bug 7/15/12
2012-10-02 5:07 ` NeilBrown
@ 2012-10-02 22:53 ` Mark Munoz
2012-10-03 1:54 ` NeilBrown
0 siblings, 1 reply; 12+ messages in thread
From: Mark Munoz @ 2012-10-02 22:53 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid@vger.kernel.org
Neil,
I just wanted to send you a HUGE thank you. We got back all of our data and are very pleased! Do you guys have some sort of donation fund, or some other way we could send something to your team to show our appreciation?
Mark
On Oct 1, 2012, at 10:07 PM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 1 Oct 2012 19:33:50 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
> wrote:
>
>
>> So I am now thinking that during my shutdown this weekend the device /dev/md0 as part of the array /dev/md2 was affected by the bug as well as 18 of the 19 devices contained inside of /devm/md1
>
> Yes, I agree, so:
>
>>>>
>>>> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
>
> This is the correct thing to do.
>
> NeilBrown
>
>
* Re: MD RAID Bug 7/15/12
2012-10-02 22:53 ` Mark Munoz
@ 2012-10-03 1:54 ` NeilBrown
0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2012-10-03 1:54 UTC (permalink / raw)
To: Mark Munoz; +Cc: linux-raid@vger.kernel.org
On Tue, 2 Oct 2012 15:53:34 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
wrote:
> Neil,
>
> I just wanted to send you a HUGE thank you. We got back all of our data and are very pleased! Do you guys have some sort of donation fund or something that we can send something to your team to show our appreciation?
Thanks for the offer, but no: there is nothing like that.
NeilBrown
>
> Mark
>
> On Oct 1, 2012, at 10:07 PM, NeilBrown <neilb@suse.de> wrote:
>
> > On Mon, 1 Oct 2012 19:33:50 -0700 Mark Munoz <mark.munoz@rightthisminute.com>
> > wrote:
> >
> >
> >> So I am now thinking that during my shutdown this weekend the device /dev/md0 as part of the array /dev/md2 was affected by the bug as well as 18 of the 19 devices contained inside of /devm/md1
> >
> > Yes, I agree, so:
> >
> >>>>
> >>>> sudo mdadm --create --assume-clean /dev/md2 --level=0 --chunk=64 --metadata=1.2 --raid-devices=2 /dev/md0 /dev/md1
> >
> > This is the correct thing to do.
> >
> > NeilBrown
> >
> >