linux-raid.vger.kernel.org archive mirror
* raid10 devices all marked as spares?!
@ 2012-05-28 20:50 Oliver Schinagl
  2012-05-28 22:07 ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Schinagl @ 2012-05-28 20:50 UTC (permalink / raw)
  To: linux-raid

Hi list,

I'm sorry if this is the wrong place to start, but I've been quite lost 
as to what is going wrong here.

I've been having some issues lately with my raid10 arrays. First some info.

I have three raid10 arrays on my gentoo box on 2 drives using GPT.
I was running kernel 3.2.1 at the time, but have 3.4.0 running at the moment.
mdadm - v3.2.5 - 18th May 2012


md0, a 2 far-copies, 1.2 metadata, raid10 array consisting of /dev/sda4 
and /dev/sdb4.
md1, a 2 offset-copies, 1.2 metadata, raid10 array consisting of 
/dev/sda5 and /dev/sdb5.
md2, a 2 offset-copies, 1.2 metadata, raid10 array consisting of 
/dev/sda6 and /dev/sdb6.

sd*1 is bios_grub data, sd*2 is a 256 MB FAT partition for playing with 
UEFI, sd*3 is 8 GB of unused space (it may have some version of Ubuntu 
on it) and sd*7 is swap.

Throughout all of this, md0 has always worked normally. It is assembled 
from an initramfs, where a static mdadm is invoked like so:
/bin/mdadm -A /dev/md0 -R -a md /dev/sda4 /dev/sdb4 || exit 1

md1 and md2 are brought up later during boot; md0 holds root, /usr etc., 
whereas md1 and md2 just hold home and data.

For the last few weeks md1 and md2 have randomly failed to come up 
properly. md1 or md2 comes up as inactive and one of the two drives is 
marked as a spare. (Why as a spare? Why won't it try to run the array 
with a missing drive?) When this happens, it is completely arbitrary 
whether sda or sdb is the one listed, so md1 can show sda5[2](S) while 
md2 shows sdb5[2](S).

When this happens, I mdadm --stop /dev/md1 and /dev/md2, followed 
immediately by mdadm -A /dev/md1 (using mdadm.conf, which doesn't even 
list the devices: ARRAY /dev/md1 metadata=1.2 UUID=nnn name=host:home). 
The arrays then come up and work just fine.
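
Concretely, the recovery sequence is roughly this (md2 is handled the 
same way):

  mdadm --stop /dev/md1
  mdadm -A /dev/md1    # members are found purely via the UUID in mdadm.conf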

What happened today, however, is that md2 again did not come up, and 
sda6[3](S) shows in /proc/mdstat. This time re-assembly of the array 
fails, and explicitly using mdadm -A /dev/md2 /dev/sda6 /dev/sdb6 shows:
mdadm: device 1 in /dev/md2 has wrong state in superblock, but /dev/sdb6 
seems ok
mdadm: /dev/md2 assembled from 0 drives and 2 spares - not enough to 
start the array.
/proc/mdstat then shows, somewhat as expected:
md2 : inactive sda6[3](S) sdb6[2](S)

Using only sdb6 also fails, I guess because it does not want to run from 
a spare:
mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
mdadm: Not enough devices to start the array.

Now the really disturbing part comes from mdadm --examine.
valexia oliver # mdadm --examine /dev/sda6
/dev/sda6:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : nnnn
            Name : host:opt  (local to host host)
   Creation Time : Sun Aug 28 17:46:27 2011
      Raid Level : -unknown-
    Raid Devices : 0

  Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : nnnn

     Update Time : Mon May 28 20:52:35 2012
        Checksum : ac17255 - correct
          Events : 1


    Device Role : spare
    Array State :  ('A' == active, '.' == missing)

sdb6 lists identical content, only with its checksum also being correct 
(albeit a different value) and of course a different Device UUID. The 
Array UUID is identical, as is the creation time.

Also worth noting is that grub2 mentions an 'error: Unsupported RAID 
level: -1000000.', which probably relates to the 'Raid Level : -unknown-'.

As to what may have caused this, I have absolutely no idea. I did a 
clean shutdown where the arrays were cleanly unmounted. I'm not 100% 
sure whether the arrays get --stopped, but I would be surprised if they 
did not.

So I guess: is this an md driver bug? Is there anything I can do to 
recover my data, which I cannot imagine is really gone?

Thanks in advance for reading this.

Oliver


* Re: raid10 devices all marked as spares?!
  2012-05-28 20:50 raid10 devices all marked as spares?! Oliver Schinagl
@ 2012-05-28 22:07 ` NeilBrown
  2012-05-28 22:44   ` Oliver Schinagl
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2012-05-28 22:07 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid


On Mon, 28 May 2012 22:50:03 +0200 Oliver Schinagl <oliver+list@schinagl.nl>
wrote:

> Hi list,
> 
> I'm sorry if this is the wrong place to start, but I've been quite lost 
> as to what is going wrong here.

No, you are in exactly the right place!

> 
> I've been having some issues latly with my raid10 array. First some info.
> 
> I have three raid10 arrays on my gentoo box on 2 drives using GPT.
> I was running 3.2.1 at the time but have 3.4.0 running at the moment.
> mdadm - v3.2.5 - 18th May 2012
> 
> 
> md0, a 2 far-copies, 1.2 metadata, raid10 array  consisting of /dev/sda4 
> and sdb4.
> md1, a 2 offset-copies, 1.2 metadata, raid10 array consisting of 
> /dev/sda5 and sdb5
> md2, a 2 offset-copies, 1.2 metadata, raid10 array consisting of 
> /dev/sda6 and sdb6

I'm liking the level of detail you are providing - thanks.

> 
> sd*1 is bios_grub data, sd*2 is 256mb fat for playing with uefi and sd*3 
> is 8gigs of unused space, may have some version of ubuntu on it and sd*7 
> for swap.
> 
> For all of this, md0 has always worked normally. it is being assembled 
> from initramfs where a static mdadm lives as such:
> /bin/mdadm -A /dev/md0 -R -a md /dev/sda4 /dev/sdb4 || exit 1

In general I wouldn't recommend this.  Names of sd devices change when
devices are removed or added, so this is fragile.  It may be the cause of the
actual problems you have been experiencing.

> 
> md1 and md2 are being brought up during boot, md0 holds root, /usr etc 
> wheras md1 are just for home and data.
> 
> The last few weeks md1 and md2 randomly fail to come up properly. md1 or 
> md2 come up as inactive and one of the two drivers are marked as spares. 
> (Why as spares? Why won't it try to run the array with a missing drive?) 
> While this happens, it's completly abitrary whether sda or sdb is being 
> used. so md1 can be sda5[2](S) and md2 can be sdb5[2](S).

The (S) is a bit misleading here.  When an array is 'inactive', all devices
are marked as '(S)', because they are not currently active (nothing is, as the
whole array is inactive).

When md1 has sda5[2](S), is sdb5 mentioned for md1 as well, or is it simply
absent?  I'm guessing the latter.

This is most likely caused by "mdadm -I" being run by udev on device
discovery.  Possibly it is racing with an "mdadm -A" run from a boot script.
Have a look for a udev rules.d file which runs mdadm -I; maybe disable it
and see what happens.
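
Something along these lines should locate it (the exact rules directories
vary between distros):

  grep -l 'mdadm -I' /etc/udev/rules.d/*.rules /lib/udev/rules.d/*.rules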

> 
> When this happens, I mdadm --stop /dev/md1 and /dev/md2, followed 
> immediatly by mdadm -A /dev/md1 (using mdadm.conf which doesn't even 
> list the devices. ARRAY /dev/md1 metadata=1.2 UUID=nnn name=host:home). 
> The arrays come up and work just fine.
> 
> What happend today however, is that md2 again does not come up, and 
> sda6[3](S) shows in /proc/mdadm. However re-assembly of the array fails 
> and only using mdadm -A /dev/md2 /dev/sda6 /dev/sdb6 shows:
> mdadm: device 1 in /dev/md2 has wrong state in superblock, but /dev/sdb6 
> seems ok
> mdadm: /dev/md2 assembled from 0 drives and 2 spares - not enough to 
> start the array.
> /proc/mdadm shows as somewhat expected.
> md2 : inactive sda6[3](S) sdb6[2](S)
> 
> Only using sdb6 however also fails. I guess because it does not want to 
> use a spare.
> mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
> mdadm: Not enough devices to start the array.
> 
> Now the really disturbing part comes from mdadm --examine.
> valexia oliver # mdadm --examine /dev/sda6
> /dev/sda6:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x0
>       Array UUID : nnnn
>             Name : host:opt  (local to host host)
>    Creation Time : Sun Aug 28 17:46:27 2011
>       Raid Level : -unknown-
>     Raid Devices : 0
> 
>   Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
>      Data Offset : 2048 sectors
>     Super Offset : 8 sectors
>            State : active
>      Device UUID : nnnn
> 
>      Update Time : Mon May 28 20:52:35 2012
>         Checksum : ac17255 - correct
>           Events : 1
> 
> 
>     Device Role : spare
>     Array State :  ('A' == active, '.' == missing)
> 
> sdb6 lists identical content only with the checksum's being correbt, 
> albeit different and of coruse the Device UUID. Array UUID is of course 
> identical as is creation time.
> 
> Also to note, is that grub2 does mention an 'error: Unsupported RAID 
> level: -1000000.' which probably relates to the 'Raid Level: -unknown-'.
> 
> As to what may have caused this? I have absolutely no idea. I did a 
> clean shutdown where the arrays get cleanly unmounted. Not 100% sure if 
> the arrays get --stopped but I would be surprised if they did not.
> 
> So I guess is this a md driver bug? Is there anything I can do to 
> recover my data, which i cannot image it not being?

This is a known bug which has been fixed.  You are now running 3.4 so are
safe from it.
You can recover your data by re-creating the array.

  mdadm -C /dev/md2 -l10 -n2 --layout o2 --assume-clean \
  -e 1.2  /dev/sda6 /dev/sdb6

Check that I have that right - don't just assume :-)

When you have created the array, check that the 'Data Offset' is still
correct; if it is, "fsck -n" the array to ensure everything looks good.
Then you should be back in business.
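
i.e. something like this (adjust the fsck call to whatever filesystem is
on there):

  mdadm --examine /dev/sda6 | grep Offset   # Data Offset should still be 2048 sectors
  fsck -n /dev/md2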

NeilBrown



> 
> Thanks in advance for reading this.
> 
> Oliver
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




* Re: raid10 devices all marked as spares?!
  2012-05-28 22:07 ` NeilBrown
@ 2012-05-28 22:44   ` Oliver Schinagl
  2012-05-28 23:09     ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Schinagl @ 2012-05-28 22:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 05/29/12 00:07, NeilBrown wrote:
> On Mon, 28 May 2012 22:50:03 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
> wrote:
>
>> Hi list,
>>
>> I'm sorry if this is the wrong place to start, but I've been quite lost
>> as to what is going wrong here.
>
> No, you are in exactly the right place!
Phew :D

>
>>
>> I've been having some issues latly with my raid10 array. First some info.
>>
>> I have three raid10 arrays on my gentoo box on 2 drives using GPT.
>> I was running 3.2.1 at the time but have 3.4.0 running at the moment.
>> mdadm - v3.2.5 - 18th May 2012
>>
>>
>> md0, a 2 far-copies, 1.2 metadata, raid10 array  consisting of /dev/sda4
>> and sdb4.
>> md1, a 2 offset-copies, 1.2 metadata, raid10 array consisting of
>> /dev/sda5 and sdb5
>> md2, a 2 offset-copies, 1.2 metadata, raid10 array consisting of
>> /dev/sda6 and sdb6
>
> I'm liking the level of detail you are providing - thanks.
The more information provided, the better, I always reckon!

>
>>
>> sd*1 is bios_grub data, sd*2 is 256mb fat for playing with uefi and sd*3
>> is 8gigs of unused space, may have some version of ubuntu on it and sd*7
>> for swap.
>>
>> For all of this, md0 has always worked normally. it is being assembled
>> from initramfs where a static mdadm lives as such:
>> /bin/mdadm -A /dev/md0 -R -a md /dev/sda4 /dev/sdb4 || exit 1
>
> In general I wouldn't recommend this.  Names of sd devices change when
> devices are removed or added, so this is fragile.  It may cause the actual
> problems you have been experiencing currently.
Yes! Yes yes yes! I know. Kinda off-topic here, but:

I've always used a small 100 MB - 250 MB (or 1 GB on my desktop) array 
using metadata 0.90 and autodetect. This worked perfectly. /usr, /home 
etc. were on more exotic raid setups (metadata 1.2 etc.) but this all 
just worked (tm).

Recently Fedora decided that booting with a separate /usr was a mess, 
and not long after udev (not only in Gentoo, I believe) 'agreed' that 
/usr and / should be merged.

With 0.90 autodetect being deprecated by the kernel anyway, I decided to 
bite the bullet and use my 8 GB /usr as a combined / and /usr. Now, 
however, I was 'forced' to also use an initramfs to get my raid array 
going. Long story short, I just quickly hacked that together as 
minimally as possible, as I hadn't found any clean, documented or 
recommended way to do it.

It's not only error-prone, it's also broken: having a disk missing or 
failed causes kernel panics, because init fails prematurely when mdadm 
cannot find /dev/sdb.

>
>>
>> md1 and md2 are being brought up during boot, md0 holds root, /usr etc
>> wheras md1 are just for home and data.
>>
>> The last few weeks md1 and md2 randomly fail to come up properly. md1 or
>> md2 come up as inactive and one of the two drivers are marked as spares.
>> (Why as spares? Why won't it try to run the array with a missing drive?)
>> While this happens, it's completly abitrary whether sda or sdb is being
>> used. so md1 can be sda5[2](S) and md2 can be sdb5[2](S).
>
> The (S) is a bit misleading here.  When an array is 'inactive', all devices
> are marked as '(S)', because they are not currently active (nothing is as the
> whole array is inactive).
>
> When md1 has sda5[2](S), is sdb5 mentioned for md1 as well, or is it simply
> absent.   I'm guessing the second.
Yes, I'm 99% sure, though it's randomly sda6 or sdb6 that is shown, 
never both. Both only show up if I do mdadm -A /dev/md2 /dev/sda6 
/dev/sdb6 (after stopping md2 first, of course).

>
> This it most likely caused by "mdadm -I" being run by udev on device
> discovery.  Possibly it is racing with an "mdadm -A" run from a boot script.
> Have a look for a udev/rules.d script which run mdadm -I and maybe disable it
> and see what happens.
In init.d I find two scripts calling mdadm. /etc/init.d/mdadm only does 
monitoring (mdadm --monitor --scan --daemonize); I strongly doubt that 
forces any assembling? (Even though --scan is in there?)

/etc/init.d/mdraid does some md stuff, but that's neither run nor enabled.

BTW, I only start md0 from the initramfs, so udev apparently does the 
rest (and sucks at it?)

>
>>
>> When this happens, I mdadm --stop /dev/md1 and /dev/md2, followed
>> immediatly by mdadm -A /dev/md1 (using mdadm.conf which doesn't even
>> list the devices. ARRAY /dev/md1 metadata=1.2 UUID=nnn name=host:home).
>> The arrays come up and work just fine.
>>
>> What happend today however, is that md2 again does not come up, and
>> sda6[3](S) shows in /proc/mdadm. However re-assembly of the array fails
>> and only using mdadm -A /dev/md2 /dev/sda6 /dev/sdb6 shows:
>> mdadm: device 1 in /dev/md2 has wrong state in superblock, but /dev/sdb6
>> seems ok
>> mdadm: /dev/md2 assembled from 0 drives and 2 spares - not enough to
>> start the array.
>> /proc/mdadm shows as somewhat expected.
>> md2 : inactive sda6[3](S) sdb6[2](S)
>>
>> Only using sdb6 however also fails. I guess because it does not want to
>> use a spare.
>> mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
>> mdadm: Not enough devices to start the array.
>>
>> Now the really disturbing part comes from mdadm --examine.
>> valexia oliver # mdadm --examine /dev/sda6
>> /dev/sda6:
>>             Magic : a92b4efc
>>           Version : 1.2
>>       Feature Map : 0x0
>>        Array UUID : nnnn
>>              Name : host:opt  (local to host host)
>>     Creation Time : Sun Aug 28 17:46:27 2011
>>        Raid Level : -unknown-
>>      Raid Devices : 0
>>
>>    Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
>>       Data Offset : 2048 sectors
>>      Super Offset : 8 sectors
>>             State : active
>>       Device UUID : nnnn
>>
>>       Update Time : Mon May 28 20:52:35 2012
>>          Checksum : ac17255 - correct
>>            Events : 1
>>
>>
>>      Device Role : spare
>>      Array State :  ('A' == active, '.' == missing)
>>
>> sdb6 lists identical content only with the checksum's being correbt,
>> albeit different and of coruse the Device UUID. Array UUID is of course
>> identical as is creation time.
>>
>> Also to note, is that grub2 does mention an 'error: Unsupported RAID
>> level: -1000000.' which probably relates to the 'Raid Level: -unknown-'.
>>
>> As to what may have caused this? I have absolutely no idea. I did a
>> clean shutdown where the arrays get cleanly unmounted. Not 100% sure if
>> the arrays get --stopped but I would be surprised if they did not.
>>
>> So I guess is this a md driver bug? Is there anything I can do to
>> recover my data, which i cannot image it not being?
>
> This is a known bug which has been fixed.  You are now running 3.4 so are
> safe from it.
Well, this strange behavior all stemmed from running 3.2.1. I've only 
upgraded to 3.4 to see if that 'fixes' it. (It didn't, unfortunately :( )

The -1000000 error, I'm assuming for now, also stems from the metadata 
being corrupt, and will probably go away when I try the below tomorrow :)

> You can recover your data by re-creating the array.
>
>    mdadm -C /dev/md2 -l10 -n2 --layout o2 --assume-clean \
>    -e 1.2  /dev/sda6 /dev/sdb6
>
> Check that I have that right - don't just assume :-)
That looks very similar to what I used to create the array with, except 
for the --assume-clean part. I wonder, however: would it not be wiser to 
create the array using '/dev/sda6 missing', thus creating a degraded 
array? At least I'd still have sdb6, which MAY also contain the data 
(since only sda6 'apparently' has the wrong state)?

Also, would it not be possible to mount sdb6 using the correct offset? I 
remember raid1 arrays could simply be mounted. (With a 2-disk raid10, 
from what I understand, at least 1 disk may be mountable?)

>
> when you have created the array, check that the 'Data Offset' is still
> correct, then if it is "fsck -n" the array to ensure everything looks good.
> Then you should be back in business.
I should then be able to compare it to md1's sda5 and sdb5. Since md1 
and md2 were created with identical settings, they should look almost 
the same when comparing :)


So to summarize: my array went foobar due to an old, known bug, and the 
only way to fix it is to recreate the array, leaving the actual data in 
place. The FS _should_ start 2048 sectors into the partition.

>
> NeilBrown
>
>
>
>>
>> Thanks in advance for reading this.
>>
>> Oliver
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thank you so far for your help!
Oliver


* Re: raid10 devices all marked as spares?!
  2012-05-28 22:44   ` Oliver Schinagl
@ 2012-05-28 23:09     ` NeilBrown
  2012-05-29 18:44       ` Oliver Schinagl
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2012-05-28 23:09 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid


On Tue, 29 May 2012 00:44:55 +0200 Oliver Schinagl <oliver+list@schinagl.nl>
wrote:

> On 05/29/12 00:07, NeilBrown wrote:
> > On Mon, 28 May 2012 22:50:03 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
> > wrote:
> >
> >> Hi list,
> >>
> >> I'm sorry if this is the wrong place to start, but I've been quite lost
> >> as to what is going wrong here.
> >
> > No, you are in exactly the right place!
> Pfew :D
> 
> >
> >>
> >> I've been having some issues latly with my raid10 array. First some info.
> >>
> >> I have three raid10 arrays on my gentoo box on 2 drives using GPT.
> >> I was running 3.2.1 at the time but have 3.4.0 running at the moment.
> >> mdadm - v3.2.5 - 18th May 2012
> >>
> >>
> >> md0, a 2 far-copies, 1.2 metadata, raid10 array  consisting of /dev/sda4
> >> and sdb4.
> >> md1, a 2 offset-copies, 1.2 metadata, raid10 array consisting of
> >> /dev/sda5 and sdb5
> >> md2, a 2 offset-copies, 1.2 metadata, raid10 array consisting of
> >> /dev/sda6 and sdb6
> >
> > I'm liking the level of detail you are providing - thanks.
> The more information provided, the better I always recon!
> 
> >
> >>
> >> sd*1 is bios_grub data, sd*2 is 256mb fat for playing with uefi and sd*3
> >> is 8gigs of unused space, may have some version of ubuntu on it and sd*7
> >> for swap.
> >>
> >> For all of this, md0 has always worked normally. it is being assembled
> >> from initramfs where a static mdadm lives as such:
> >> /bin/mdadm -A /dev/md0 -R -a md /dev/sda4 /dev/sdb4 || exit 1
> >
> > In general I wouldn't recommend this.  Names of sd devices change when
> > devices are removed or added, so this is fragile.  It may cause the actual
> > problems you have been experiencing currently.
> Yes! Yes yes yes! I know. Kinda offtopic here, but:
> 
> I've always used a small 100mb -250mb (or 1gb on my desktop) array using 
> metadata 0.9 and autodetect. This worked perfectly. /usr, /home etc 
> where on exotic raid setups (metadata 1.2 etc) but this all just worked 
> (tm).
> 
> Recently Fedora decided booting and /usr was a mess and not long after 
> udev (i belive not only in gentoo) 'agreed' that /usr and / should be 
> merged.
> 
> With 0.90 autodetect being depreciated by the kernel anyway I decided to 
> bite the bullit and use my 8gb /usr as combined / and /usr. Now however 
> I was 'forced' to also use an initramfs to get my raid array going. Long 
> story short, I just quickly hacked that together as minimally as 
> possible, as I haven't found any 'clean' way to do 
> it/documented/recommended way to copy.
> 
> It's not only just error-prone, It's also broken. Having a disk missing 
> or fail, causes kernel panics due to init pre-maturly failing because 
> mdadm fails at finding /dev/sdb.

If you want to build a simple-as-possible initramfs to assemble your root md
array, I recommend README.initramfs in the mdadm source code.

http://neil.brown.name/git?p=mdadm;a=blob;f=README.initramfs;h=8f9b8ddffb32f1eb3dc087ccda9bc0ff93870a33;hb=HEAD
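
The general shape is roughly this (just a sketch, not the tested script
from that file; <array-uuid> and /newroot are placeholders):

  #!/bin/sh
  # /init inside the initramfs - minimal sketch only
  mount -t proc none /proc
  mount -t sysfs none /sys
  # assemble by UUID so renamed sda/sdb devices are simply ignored,
  # and drop to a shell instead of panicking if assembly fails
  /bin/mdadm -A /dev/md0 --run --uuid=<array-uuid> /dev/sda4 /dev/sdb4 || exec /bin/sh
  mount -o ro /dev/md0 /newroot
  umount /sys /proc
  exec switch_root /newroot /sbin/init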

> 
> >
> >>
> >> md1 and md2 are being brought up during boot, md0 holds root, /usr etc
> >> wheras md1 are just for home and data.
> >>
> >> The last few weeks md1 and md2 randomly fail to come up properly. md1 or
> >> md2 come up as inactive and one of the two drivers are marked as spares.
> >> (Why as spares? Why won't it try to run the array with a missing drive?)
> >> While this happens, it's completly abitrary whether sda or sdb is being
> >> used. so md1 can be sda5[2](S) and md2 can be sdb5[2](S).
> >
> > The (S) is a bit misleading here.  When an array is 'inactive', all devices
> > are marked as '(S)', because they are not currently active (nothing is as the
> > whole array is inactive).
> >
> > When md1 has sda5[2](S), is sdb5 mentioned for md1 as well, or is it simply
> > absent.   I'm guessing the second.
> Yes, i'm 99% sure however it's randomly sda6 or sdb6 that's shown. But 
> never both. Only if I do mdadm -A /dev/md2 /dev/sda6 /dev/sdb6 (after 
> stopping md2 first of course).

That makes sense.

> 
> >
> > This it most likely caused by "mdadm -I" being run by udev on device
> > discovery.  Possibly it is racing with an "mdadm -A" run from a boot script.
> > Have a look for a udev/rules.d script which run mdadm -I and maybe disable it
> > and see what happens.
> In init.d I find two scripts calling mdadm. /etc/init.d/mdadm only does 
> monitoring (mdadm --monitor --scan --daemonize) I strongly doubt that 
> forces any of the assembling? (even though scan is there?)
> 
> /etc/init.d/mdraid does some md stuff, but that's not run nor enabled.
> 
> BTW, I only start md0 from initramfs, so udev apparently does the rest 
> (and sucks at it?)

Let's just say it is not yet perfect at it.
Now that you are using mdadm 3.2.5 there is a better chance that it will work
reliably.  Is it still failing?


> 
> >
> >>
> >> When this happens, I mdadm --stop /dev/md1 and /dev/md2, followed
> >> immediatly by mdadm -A /dev/md1 (using mdadm.conf which doesn't even
> >> list the devices. ARRAY /dev/md1 metadata=1.2 UUID=nnn name=host:home).
> >> The arrays come up and work just fine.
> >>
> >> What happend today however, is that md2 again does not come up, and
> >> sda6[3](S) shows in /proc/mdadm. However re-assembly of the array fails
> >> and only using mdadm -A /dev/md2 /dev/sda6 /dev/sdb6 shows:
> >> mdadm: device 1 in /dev/md2 has wrong state in superblock, but /dev/sdb6
> >> seems ok
> >> mdadm: /dev/md2 assembled from 0 drives and 2 spares - not enough to
> >> start the array.
> >> /proc/mdadm shows as somewhat expected.
> >> md2 : inactive sda6[3](S) sdb6[2](S)
> >>
> >> Only using sdb6 however also fails. I guess because it does not want to
> >> use a spare.
> >> mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
> >> mdadm: Not enough devices to start the array.
> >>
> >> Now the really disturbing part comes from mdadm --examine.
> >> valexia oliver # mdadm --examine /dev/sda6
> >> /dev/sda6:
> >>             Magic : a92b4efc
> >>           Version : 1.2
> >>       Feature Map : 0x0
> >>        Array UUID : nnnn
> >>              Name : host:opt  (local to host host)
> >>     Creation Time : Sun Aug 28 17:46:27 2011
> >>        Raid Level : -unknown-
> >>      Raid Devices : 0
> >>
> >>    Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
> >>       Data Offset : 2048 sectors
> >>      Super Offset : 8 sectors
> >>             State : active
> >>       Device UUID : nnnn
> >>
> >>       Update Time : Mon May 28 20:52:35 2012
> >>          Checksum : ac17255 - correct
> >>            Events : 1
> >>
> >>
> >>      Device Role : spare
> >>      Array State :  ('A' == active, '.' == missing)
> >>
> >> sdb6 lists identical content only with the checksum's being correbt,
> >> albeit different and of coruse the Device UUID. Array UUID is of course
> >> identical as is creation time.
> >>
> >> Also to note, is that grub2 does mention an 'error: Unsupported RAID
> >> level: -1000000.' which probably relates to the 'Raid Level: -unknown-'.
> >>
> >> As to what may have caused this? I have absolutely no idea. I did a
> >> clean shutdown where the arrays get cleanly unmounted. Not 100% sure if
> >> the arrays get --stopped but I would be surprised if they did not.
> >>
> >> So I guess is this a md driver bug? Is there anything I can do to
> >> recover my data, which i cannot image it not being?
> >
> > This is a known bug which has been fixed.  You are now running 3.4 so are
> > safe from it.
> Well this strange behavior all stemmed from running 3.2.1. I've only 
> upgraded to 3.4 to see if that 'fixes' it. (It didn't :( unfortunately).

I think that bug was in 3.3 and 3.2.1 and fixed in 3.4 and some later 3.3.y
and 3.2.y kernels.
The bug would cause an array to be destroyed at shutdown in some unusual
circumstances.  Once destroyed you need to re-create.

> 
> The -100000 error I'm assuming for now also stems from the meta data 
> being corrupt, and will probably go away when trying the below tomorrow :)

Correct.

> 
> > You can recover your data by re-creating the array.
> >
> >    mdadm -C /dev/md2 -l10 -n2 --layout o2 --assume-clean \
> >    -e 1.2  /dev/sda6 /dev/sdb6
> >
> > Check that I have that right - don't just assume :-)
> That looks very similar to what I used to create the array with, except 
> the assume-clean part. I wonder however, would it not wiser to create 
> the array using /dev/sda6 missing thus creating a degraded array? 
> Atleast I'll still have the sdb6 which MAY contain the data also (since 
> only sda6 'apparently' has wrong state?

That would be a suitable approach - arguably safer.  If you feel more
comfortable with it, then that is a strong reason to follow that course.

> 
> Also, would it not be possible to mount sdb6 using the correct offset? I 
> remember raid1 array's could simply be mounted. (with a 2 disk raid10, 
> from what I understand, atleast 1 disk may be mountable?)

No.  That only works with the 'near' layout.  You used an 'offset' layout.
So if the chunks of data are 'A B C D E F G H', then the first disk contains
 A B C D E F G H
but the second disk contains
 B A D C F E H G
so you could mount sda6 if you added an offset with losetup or similar, but
not sdb6.
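
Something like this, read-only (assuming 512-byte sectors and the
2048-sector Data Offset your --examine showed; /dev/loop0 and /mnt are
just examples):

  losetup -r -o $((2048 * 512)) /dev/loop0 /dev/sda6   # 1MB data offset
  mount -o ro /dev/loop0 /mnt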

> 
> >
> > when you have created the array, check that the 'Data Offset' is still
> > correct, then if it is "fsck -n" the array to ensure everything looks good.
> > Then you should be back in business.
> I should then be able to compare it to md1/sda5 and /dev/sdb5. Since md1 
> and md2 where created with identical settings, they should be almost the 
> same when comparing :)
> 
> 
> So to summarize, my array went foobar due to an old known bug and the 
> only way to fix it is to recreate the array, leaving the actual data in 
> place. The FS _should_ start after 2048 sectors on the disk.

Correct.

NeilBrown

> 
> >
> > NeilBrown
> >
> >
> >
> >>
> >> Thanks in advance for reading this.
> >>
> >> Oliver
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Thank you so far for your help!
> Oliver




* Re: raid10 devices all marked as spares?!
  2012-05-28 23:09     ` NeilBrown
@ 2012-05-29 18:44       ` Oliver Schinagl
  2012-05-29 18:48         ` Oliver Schinagl
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Schinagl @ 2012-05-29 18:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 29-05-12 01:09, NeilBrown wrote:
> On Tue, 29 May 2012 00:44:55 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
> wrote:
>
>> On 05/29/12 00:07, NeilBrown wrote:
>>> On Mon, 28 May 2012 22:50:03 +0200 Oliver Schinagl<oliver+list@schinagl.nl>
>>> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I'm sorry if this is the wrong place to start, but I've been quite lost
>>>> as to what is going wrong here.
>>> No, you are in exactly the right place!
>> Pfew :D
>>
>>>> I've been having some issues latly with my raid10 array. First some info.
>>>>
>>>> I have three raid10 arrays on my gentoo box on 2 drives using GPT.
>>>> I was running 3.2.1 at the time but have 3.4.0 running at the moment.
>>>> mdadm - v3.2.5 - 18th May 2012
>>>>
>>>>
>>>> md0, a 2 far-copies, 1.2 metadata, raid10 array  consisting of /dev/sda4
>>>> and sdb4.
>>>> md1, a 2 offset-copies, 1.2 metadata, raid10 array consisting of
>>>> /dev/sda5 and sdb5
>>>> md2, a 2 offset-copies, 1.2 metadata, raid10 array consisting of
>>>> /dev/sda6 and sdb6
>>> I'm liking the level of detail you are providing - thanks.
>> The more information provided, the better I always recon!
>>
>>>> sd*1 is bios_grub data, sd*2 is 256mb fat for playing with uefi and sd*3
>>>> is 8gigs of unused space, may have some version of ubuntu on it and sd*7
>>>> for swap.
>>>>
>>>> For all of this, md0 has always worked normally. it is being assembled
>>>> from initramfs where a static mdadm lives as such:
>>>> /bin/mdadm -A /dev/md0 -R -a md /dev/sda4 /dev/sdb4 || exit 1
>>> In general I wouldn't recommend this.  Names of sd devices change when
>>> devices are removed or added, so this is fragile.  It may cause the actual
>>> problems you have been experiencing currently.
>> Yes! Yes yes yes! I know. Kinda offtopic here, but:
>>
>> I've always used a small 100mb -250mb (or 1gb on my desktop) array using
>> metadata 0.9 and autodetect. This worked perfectly. /usr, /home etc
>> where on exotic raid setups (metadata 1.2 etc) but this all just worked
>> (tm).
>>
>> Recently Fedora decided booting and /usr was a mess and not long after
>> udev (i belive not only in gentoo) 'agreed' that /usr and / should be
>> merged.
>>
>> With 0.90 autodetect being depreciated by the kernel anyway I decided to
>> bite the bullit and use my 8gb /usr as combined / and /usr. Now however
>> I was 'forced' to also use an initramfs to get my raid array going. Long
>> story short, I just quickly hacked that together as minimally as
>> possible, as I haven't found any 'clean' way to do
>> it/documented/recommended way to copy.
>>
>> It's not only just error-prone, It's also broken. Having a disk missing
>> or fail, causes kernel panics due to init pre-maturly failing because
>> mdadm fails at finding /dev/sdb.
> If you want to build a simple-as-possible initramfs to assemble you root md
> array, I recommend  README.initramfs in the mdadm source code.
>
> http://neil.brown.name/git?p=mdadm;a=blob;f=README.initramfs;h=8f9b8ddffb32f1eb3dc087ccda9bc0ff93870a33;hb=HEAD
>
>>>> md1 and md2 are being brought up during boot, md0 holds root, /usr etc
>>>> wheras md1 are just for home and data.
>>>>
>>>> The last few weeks md1 and md2 randomly fail to come up properly. md1 or
>>>> md2 come up as inactive and one of the two drivers are marked as spares.
>>>> (Why as spares? Why won't it try to run the array with a missing drive?)
>>>> While this happens, it's completly abitrary whether sda or sdb is being
>>>> used. so md1 can be sda5[2](S) and md2 can be sdb5[2](S).
>>> The (S) is a bit misleading here.  When an array is 'inactive', all devices
>>> are marked as '(S)', because they are not currently active (nothing is as the
>>> whole array is inactive).
>>>
>>> When md1 has sda5[2](S), is sdb5 mentioned for md1 as well, or is it simply
>>> absent.   I'm guessing the second.
>> Yes, i'm 99% sure however it's randomly sda6 or sdb6 that's shown. But
>> never both. Only if I do mdadm -A /dev/md2 /dev/sda6 /dev/sdb6 (after
>> stopping md2 first of course).
> That makes sense.
>
>>> This it most likely caused by "mdadm -I" being run by udev on device
>>> discovery.  Possibly it is racing with an "mdadm -A" run from a boot script.
>>> Have a look for a udev/rules.d script which run mdadm -I and maybe disable it
>>> and see what happens.
>> In init.d I find two scripts calling mdadm. /etc/init.d/mdadm only does
>> monitoring (mdadm --monitor --scan --daemonize) I strongly doubt that
>> forces any of the assembling? (even though scan is there?)
>>
>> /etc/init.d/mdraid does some md stuff, but that's not run nor enabled.
>>
>> BTW, I only start md0 from initramfs, so udev apparently does the rest
>> (and sucks at it?)
> Let's just say it is not yet perfect at it.
> Now that you are using mdadm 3.2.5 there is a better chance that it will work
> reliably.  Is it still failing?
Just booted 5 minutes ago, and yes, only sdb5 shows for md1. mdadm 
--stop /dev/md1; mdadm -A /dev/md1 made it come up just fine. I have 
removed mdadm (--monitor --scan) from my boot-up scripts now and will 
keep an eye on this. (It was in the boot runlevel, however; I don't know 
if that was my mistake or not.)
>
>
>>>> When this happens, I mdadm --stop /dev/md1 and /dev/md2, followed
>>>> immediatly by mdadm -A /dev/md1 (using mdadm.conf which doesn't even
>>>> list the devices. ARRAY /dev/md1 metadata=1.2 UUID=nnn name=host:home).
>>>> The arrays come up and work just fine.
>>>>
>>>> What happend today however, is that md2 again does not come up, and
>>>> sda6[3](S) shows in /proc/mdadm. However re-assembly of the array fails
>>>> and only using mdadm -A /dev/md2 /dev/sda6 /dev/sdb6 shows:
>>>> mdadm: device 1 in /dev/md2 has wrong state in superblock, but /dev/sdb6
>>>> seems ok
>>>> mdadm: /dev/md2 assembled from 0 drives and 2 spares - not enough to
>>>> start the array.
>>>> /proc/mdadm shows as somewhat expected.
>>>> md2 : inactive sda6[3](S) sdb6[2](S)
>>>>
>>>> Only using sdb6 however also fails. I guess because it does not want to
>>>> use a spare.
>>>> mdadm: failed to RUN_ARRAY /dev/md2: Invalid argument
>>>> mdadm: Not enough devices to start the array.
>>>>
>>>> Now the really disturbing part comes from mdadm --examine.
>>>> valexia oliver # mdadm --examine /dev/sda6
>>>> /dev/sda6:
>>>>              Magic : a92b4efc
>>>>            Version : 1.2
>>>>        Feature Map : 0x0
>>>>         Array UUID : nnnn
>>>>               Name : host:opt  (local to host host)
>>>>      Creation Time : Sun Aug 28 17:46:27 2011
>>>>         Raid Level : -unknown-
>>>>       Raid Devices : 0
>>>>
>>>>     Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
>>>>        Data Offset : 2048 sectors
>>>>       Super Offset : 8 sectors
>>>>              State : active
>>>>        Device UUID : nnnn
>>>>
>>>>        Update Time : Mon May 28 20:52:35 2012
>>>>           Checksum : ac17255 - correct
>>>>             Events : 1
>>>>
>>>>
>>>>       Device Role : spare
>>>>       Array State :  ('A' == active, '.' == missing)
>>>>
>>>> sdb6 lists identical content only with the checksum's being correbt,
>>>> albeit different and of coruse the Device UUID. Array UUID is of course
>>>> identical as is creation time.
>>>>
>>>> Also to note, is that grub2 does mention an 'error: Unsupported RAID
>>>> level: -1000000.' which probably relates to the 'Raid Level: -unknown-'.
>>>>
>>>> As to what may have caused this? I have absolutely no idea. I did a
>>>> clean shutdown where the arrays get cleanly unmounted. Not 100% sure if
>>>> the arrays get --stopped but I would be surprised if they did not.
>>>>
>>>> So I guess is this a md driver bug? Is there anything I can do to
>>>> recover my data, which i cannot image it not being?
>>> This is a known bug which has been fixed.  You are now running 3.4 so are
>>> safe from it.
>> Well this strange behavior all stemmed from running 3.2.1. I've only
>> upgraded to 3.4 to see if that 'fixes' it. (It didn't :( unfortunately).
> I think that bug was in 3.3 and 3.2.1 and fixed in 3.4 and some later 3.3.y
> and 3.2.y kernels.
> The bug would cause an array to be destroyed at shutdown in some unusual
> circumstances.  Once destroyed you need to re-create.
>
>> The -100000 error I'm assuming for now also stems from the meta data
>> being corrupt, and will probably go away when trying the below tomorrow :)
> Correct.
>
>>> You can recover your data by re-creating the array.
>>>
>>>     mdadm -C /dev/md2 -l10 -n2 --layout o2 --assume-clean \
>>>     -e 1.2  /dev/sda6 /dev/sdb6
>>>
>>> Check that I have that right - don't just assume :-)
>> That looks very similar to what I used to create the array with, except
>> the assume-clean part. I wonder however, would it not wiser to create
>> the array using /dev/sda6 missing thus creating a degraded array?
>> Atleast I'll still have the sdb6 which MAY contain the data also (since
>> only sda6 'apparently' has wrong state?
> That would be a suitable approach - arguably safer.  If you feel more
> comfortable with it, then that is a strong reason to follow that course.
I have tried that on sda6, but it cannot find a filesystem when trying 
to mount md2. This of course is quite scary. I am now slightly doubting 
whether my chunk size is the same as before, 128k.

I've used the following command:
mdadm -C /dev/md2 -c 128 -l 10 -p o2 --assume-clean -e 1.2 -n 2 \
    --name=opt /dev/sda6 missing

Now I could try the same on sdb6 and hope that works, but I'm slightly 
scared of losing everything on that partition; it could of course be 
that sdb6 is the partition that has everything in the 'proper' order. I 
will try to losetup sdb6 with an offset and see if that is mountable.
>
>> Also, would it not be possible to mount sdb6 using the correct offset? I
>> remember raid1 array's could simply be mounted. (with a 2 disk raid10,
>> from what I understand, atleast 1 disk may be mountable?)
> No.  That only works with the 'near' layout.  You used an 'offset' layout.
> So if the chunks of data are 'A B C D E F G H', then the first disk contains
>   A B C D E F G H
> but the second disk contains
>   B A D C F E H G
> so you could mount sda6 if you added an offset with losetup or similar, but
> not sdb6.
That's what I meant by 'at least 1 disk' :) As asked above, this could 
also be the other way around, right? sdb6 mountable but sda6 not?
>>> when you have created the array, check that the 'Data Offset' is still
>>> correct, then if it is "fsck -n" the array to ensure everything looks good.
>>> Then you should be back in business.
>> I should then be able to compare it to md1/sda5 and /dev/sdb5. Since md1
>> and md2 where created with identical settings, they should be almost the
>> same when comparing :)
>>
>>
>> So to summarize, my array went foobar due to an old known bug and the
>> only way to fix it is to recreate the array, leaving the actual data in
>> place. The FS _should_ start after 2048 sectors on the disk.
> Correct.
This would also be the location for the offset? 2048 * sector size (+1)?
>
> NeilBrown
Oliver
>
>>> NeilBrown
>>>
>>>
>>>
>>>> Thanks in advance for reading this.
>>>>
>>>> Oliver
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Thank you so far for your help!
>> Oliver



* Re: raid10 devices all marked as spares?!
  2012-05-29 18:44       ` Oliver Schinagl
@ 2012-05-29 18:48         ` Oliver Schinagl
  2012-05-30  1:14           ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Schinagl @ 2012-05-29 18:48 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 29-05-12 20:44, Oliver Schinagl wrote:
> <snip>
>>>> You can recover your data by re-creating the array.
>>>>
>>>>     mdadm -C /dev/md2 -l10 -n2 --layout o2 --assume-clean \
>>>>     -e 1.2  /dev/sda6 /dev/sdb6
>>>>
>>>> Check that I have that right - don't just assume :-)
>>> That looks very similar to what I used to create the array with, except
>>> the assume-clean part. I wonder however, would it not wiser to create
>>> the array using /dev/sda6 missing thus creating a degraded array?
>>> Atleast I'll still have the sdb6 which MAY contain the data also (since
>>> only sda6 'apparently' has wrong state?
>> That would be a suitable approach - arguably safer.  If you feel more
>> comfortable with it, then that is a strong reason to follow that course.
>
> I have tried that on sda6 but it cannot file a filesystem when trying 
> to mount md2. This of course is quite scary. I am now slightly 
> doubting if my chunksize is the same as before, 128k.
>
> I've used the following command.
> mdadm -C /dev/md2 -c 128 -l 10 -p o2 --assume-clean -e 1.2 -n 2 
> --name=opt /dev/sda6 missing
>
> Now I could try the same on sdb6 and hope that does work, but slightly 
> scared of loosing everything on that partition, it could be possible 
> of course that sdb6 is the partition that has everything in the 
> 'proper' order? I will try to losetup sdb6 with an offset and see if 
> that is mountable.
Also, I forgot to mention the really strange part: the data offset ended 
up somewhere extremely odd.

Data Offset : 262144 sectors

whereas sda4 and sdb5 (md0 and md1) both have 2048, which sounds common 
and sensible.

<snip>

Oliver


* Re: raid10 devices all marked as spares?!
  2012-05-29 18:48         ` Oliver Schinagl
@ 2012-05-30  1:14           ` NeilBrown
  2012-05-30  7:11             ` Oliver Schinagl
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2012-05-30  1:14 UTC (permalink / raw)
  To: Oliver Schinagl; +Cc: linux-raid


On Tue, 29 May 2012 20:48:42 +0200 Oliver Schinagl <oliverlist@schinagl.nl>
wrote:

> On 29-05-12 20:44, Oliver Schinagl wrote:
> > <snip>
> >>>> You can recover your data by re-creating the array.
> >>>>
> >>>>     mdadm -C /dev/md2 -l10 -n2 --layout o2 --assume-clean \
> >>>>     -e 1.2  /dev/sda6 /dev/sdb6
> >>>>
> >>>> Check that I have that right - don't just assume :-)
> >>> That looks very similar to what I used to create the array with, except
> >>> the assume-clean part. I wonder however, would it not wiser to create
> >>> the array using /dev/sda6 missing thus creating a degraded array?
> >>> Atleast I'll still have the sdb6 which MAY contain the data also (since
> >>> only sda6 'apparently' has wrong state?
> >> That would be a suitable approach - arguably safer.  If you feel more
> >> comfortable with it, then that is a strong reason to follow that course.
> >
> > I have tried that on sda6 but it cannot file a filesystem when trying 
> > to mount md2. This of course is quite scary. I am now slightly 
> > doubting if my chunksize is the same as before, 128k.
> >
> > I've used the following command.
> > mdadm -C /dev/md2 -c 128 -l 10 -p o2 --assume-clean -e 1.2 -n 2 
> > --name=opt /dev/sda6 missing
> >
> > Now I could try the same on sdb6 and hope that does work, but slightly 
> > scared of loosing everything on that partition, it could be possible 
> > of course that sdb6 is the partition that has everything in the 
> > 'proper' order? I will try to losetup sdb6 with an offset and see if 
> > that is mountable.
> Also, I forgot to mention, the thing that is really strange, is that the 
> data offset is somewhere extremely strange.
> 
> Data Offset : 262144 sectors

128MB.

> 
> where sda4 and sdb5 (md0 and 1) both have 2048, which sounds common and 
> sensible.

You'll need to use an older mdadm which uses the 2048 (1MB) offset.
The next mdadm (3.3) will have a --data-offset option to make this easier to
control.  For now you need 3.2.3 or earlier.
That should make your filesystem accessible.  If it doesn't, try a different
chunk size. Maybe 64, maybe 512.
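
i.e. roughly what you already did, but with the old mdadm (and check the
result before trusting it, as before):

  mdadm -C /dev/md2 -c 128 -l 10 -p o2 --assume-clean -e 1.2 -n 2 \
        --name=opt /dev/sda6 missing
  mdadm --examine /dev/sda6 | grep 'Data Offset'   # should now report 2048 sectors
  fsck -n /dev/md2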

NeilBrown




* Re: raid10 devices all marked as spares?!
  2012-05-30  1:14           ` NeilBrown
@ 2012-05-30  7:11             ` Oliver Schinagl
  0 siblings, 0 replies; 8+ messages in thread
From: Oliver Schinagl @ 2012-05-30  7:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 30-05-12 03:14, NeilBrown wrote:
> On Tue, 29 May 2012 20:48:42 +0200 Oliver Schinagl<oliverlist@schinagl.nl>
> wrote:
>
>> On 29-05-12 20:44, Oliver Schinagl wrote:
>>> <snip>
>>>>>> You can recover your data by re-creating the array.
>>>>>>
>>>>>>      mdadm -C /dev/md2 -l10 -n2 --layout o2 --assume-clean \
>>>>>>      -e 1.2  /dev/sda6 /dev/sdb6
>>>>>>
>>>>>> Check that I have that right - don't just assume :-)
>>>>> That looks very similar to what I used to create the array with, except
>>>>> the assume-clean part. I wonder however, would it not wiser to create
>>>>> the array using /dev/sda6 missing thus creating a degraded array?
>>>>> Atleast I'll still have the sdb6 which MAY contain the data also (since
>>>>> only sda6 'apparently' has wrong state?
>>>> That would be a suitable approach - arguably safer.  If you feel more
>>>> comfortable with it, then that is a strong reason to follow that course.
>>> I have tried that on sda6 but it cannot file a filesystem when trying
>>> to mount md2. This of course is quite scary. I am now slightly
>>> doubting if my chunksize is the same as before, 128k.
>>>
>>> I've used the following command.
>>> mdadm -C /dev/md2 -c 128 -l 10 -p o2 --assume-clean -e 1.2 -n 2
>>> --name=opt /dev/sda6 missing
>>>
>>> Now I could try the same on sdb6 and hope that does work, but slightly
>>> scared of loosing everything on that partition, it could be possible
>>> of course that sdb6 is the partition that has everything in the
>>> 'proper' order? I will try to losetup sdb6 with an offset and see if
>>> that is mountable.
>> Also, I forgot to mention, the thing that is really strange, is that the
>> data offset is somewhere extremely strange.
>>
>> Data Offset : 262144 sectors
> 128MB.
>
>> where sda4 and sdb5 (md0 and 1) both have 2048, which sounds common and
>> sensible.
> You'll need to use an older mdadm which uses the 2048 (1MB) offset.
> The next mdadm (3.3) will have a --data-offset option to make this easier to
> control.  For now you need 3.2.3 or earlier.
> That should make your filesystem accessible.  If it doesn't try a different
> chunk size. Maybe 64, maybe 512.
>
> NeilBrown
Ah! Well, I got scared at one point and booted some Ubuntu live USB 
whose mdadm actually did use the 2048 offset. However, I believe the 
128MB offset has broken the start of my ext4 filesystem. I've used 
testdisk, which does find an ext4 partition, but it is empty. All is not 
lost yet, I suppose, as I can always try scanning for backup superblocks 
and use those to repair the ext4 filesystem? (I haven't found any backup 
superblocks yet.)

Luckily I still have sdb6, which I will dd into an image on a backup 
hard disk and see if I can make mdadm use that. *fingers crossed*
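
Roughly the plan (the backup path is just an example):

  dd if=/dev/sdb6 of=/mnt/backup/sdb6.img bs=1M
  losetup -r /dev/loop0 /mnt/backup/sdb6.img
  # then point mdadm / testdisk / e2fsck -b <backup superblock> at /dev/loop0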


