* imsm woes (and a small bug in mdadm)
From: Luca Berra @ 2009-12-22 17:51 UTC
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2443 bytes --]
Note for Neil/Dan:
This email could be long and boring, but the attached patch prevents a
segfault in 3.0.3 and 3.1.1, so please at least have a look at that.
Hello,
I have this system at home that I use for dev/testing/leisure/whatever.
It has an Asus motherboard with an embedded Intel 82801 SATA fakeraid
(imsm) controller and two WD10EADS 1TB disks. I created a mirrored
container with two volumes, the first for Windows and the second for
Linux.
Yesterday Windows crashed; no surprise there. The surprise was that
after the crash the controller marked the first drive as failed instead
of running the usual verify.
I re-added the drive from the Windows storage manager console, and since
it told me the rebuild would take 50+ hours I decided to leave it running.
(The Windows software is idiotic: it tries to rebuild both volumes in
parallel.)
In the morning I found the drive had failed the rebuild, so I replaced it
(I will run some tests on it and RMA it when I have spare time).
To avoid waiting 50 hours to see whether it finished, I decided to try
rebuilding it under Linux. The Linux install used dmraid instead of
mdadm and was obviously unable to boot (did I ever mention that
Red Hat/Fedora mkinitrd sucks?).
I booted Linux from a rescue CD and rebuilt the RAID using mdadm 3.0.2.
It took only 3 hours.
Now the real trouble started.
After a reboot the Intel BIOS showed both drives as "Offline Member", so
it was back to the rescue CD. mdadm 3.0.2 activated the container, but
the two volumes were activated using only /dev/sda (NOTE: this is the
new drive I put in this same morning, not the old one).
Seeing that mdadm 3.0.3 had some fixes related to imsm, I built that
instead and tried activating the array. Unfortunately it segfaulted;
I tried 3.1.1: same segfault.
I fired up gdb and got a backtrace: in super-intel.c, around line 2430,
there is a call to disk_list_get() with the serial of /dev/sdb as the
first argument; it fails, returns NULL, and the result is then
dereferenced.
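To make the failure mode concrete, here is a tiny self-contained toy
reproduction of the pattern (the types and the lookup are stubs of my
own, not mdadm's real code; the serials are my two drives):

#include <stdio.h>
#include <string.h>

struct intel_disk { int status; };	/* stub, not mdadm's struct */

/* stand-in for disk_list_get(): it only knows about sda's serial */
struct intel_disk *lookup_serial(const char *serial)
{
	static struct intel_disk sda = { 0 };

	return strcmp(serial, "WD-WCAV51580780") == 0 ? &sda : NULL;
}

int main(void)
{
	const char *serials[] = { "WD-WCAV51580780", "WD-WCAV51580865" };

	for (int i = 0; i < 2; i++) {
		struct intel_disk *idisk = lookup_serial(serials[i]);

		if (idisk)	/* without this check: NULL dereference */
			printf("%s: status %d\n", serials[i], idisk->status);
		else
			printf("%s: not in the disk list\n", serials[i]);
	}
	return 0;
}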
I created the attached patch and rebuilt mdadm.
Still, it activated the container with two drives but the volumes with
only one.
I lost my patience and ran mdadm -r /dev/md/imsm0 /dev/sdb, then
mdadm -a ....
It is now rebuilding.
I still have to see what the BIOS thinks of the RAID when I reboot.
Attached, besides the patch, are the outputs of mdadm -Dsvv and
mdadm -Esvv before and after the hot remove/add, in case someone has an
idea about what might have happened.
Regards,
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
[-- Attachment #2: super-intel.diff --]
[-- Type: text/plain, Size: 461 bytes --]
--- super-intel.c.old	2009-12-22 17:53:56.154622836 +0000
+++ super-intel.c	2009-12-22 17:53:54.362629847 +0000
@@ -2428,6 +2428,7 @@
 			struct intel_disk *idisk;
 
 			idisk = disk_list_get(dl->serial, disk_list);
+			if(idisk) {
 			if (is_spare(&idisk->disk) &&
 			    !is_failed(&idisk->disk) && !is_configured(&idisk->disk))
 				dl->index = -1;
@@ -2435,6 +2436,7 @@
 				dl->index = -2;
 				continue;
 			}
+			}
 		}
 
 		dl->next = champion->disks;
[-- Attachment #3: mdadm-Dsvv-after.txt --]
[-- Type: text/plain, Size: 1734 bytes --]
/dev/md/imsm0:
Version : imsm
Raid Level : container
Total Devices : 2
Update Time : Tue Dec 22 17:59:45 2009
Working Devices : 2
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Member Arrays :
Number Major Minor RaidDevice
0 8 16 - /dev/sdb
1 8 0 - /dev/sda
/dev/md/Volume0_0:
Container : /dev/md/127, member 0
Raid Level : raid1
Array Size : 488636416 (466.00 GiB 500.36 GB)
Used Dev Size : 488636548 (466.00 GiB 500.36 GB)
Raid Devices : 2
Total Devices : 2
Update Time : Tue Dec 22 17:58:39 2009
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
New Level : raid0
New Chunksize : 1K
UUID : 5ee03c91:f3537647:88da79af:38833b16
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
2 8 16 1 spare rebuilding /dev/sdb
/dev/md/Volume1_0:
Container : /dev/md/127, member 1
Raid Level : raid1
Array Size : 488121344 (465.51 GiB 499.84 GB)
Used Dev Size : 488121476 (465.51 GiB 499.84 GB)
Raid Devices : 2
Total Devices : 2
Update Time : Tue Dec 22 17:58:39 2009
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Reshape Status : 0% complete
New Level : raid0
New Chunksize : 1K
UUID : 81598ffc:8420e261:bf676997:6c7a894f
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
2 8 16 1 spare rebuilding /dev/sdb
[-- Attachment #4: mdadm-Dsvv-before.txt --]
[-- Type: text/plain, Size: 1609 bytes --]
/dev/md/imsm0:
Version : imsm
Raid Level : container
Total Devices : 2
Working Devices : 2
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Member Arrays :
Number Major Minor RaidDevice
0 8 16 - /dev/sdb
1 8 0 - /dev/sda
/dev/md/Volume0_0:
Container : /dev/md/127, member 0
Raid Level : raid1
Array Size : 488636416 (466.00 GiB 500.36 GB)
Used Dev Size : 488636548 (466.00 GiB 500.36 GB)
Raid Devices : 2
Total Devices : 1
Update Time : Tue Dec 22 17:57:04 2009
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
New Level : raid0
New Chunksize : 1K
UUID : 5ee03c91:f3537647:88da79af:38833b16
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 0 0 1 removed
/dev/md/Volume1_0:
Container : /dev/md/127, member 1
Raid Level : raid1
Array Size : 488121344 (465.51 GiB 499.84 GB)
Used Dev Size : 488121476 (465.51 GiB 499.84 GB)
Raid Devices : 2
Total Devices : 1
Update Time : Tue Dec 22 17:57:04 2009
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
New Level : raid0
New Chunksize : 1K
UUID : 81598ffc:8420e261:bf676997:6c7a894f
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 0 0 1 removed
[-- Attachment #5: mdadm-Esvv-after.txt --]
[-- Type: text/plain, Size: 4390 bytes --]
/dev/sdb:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.00
Orig Family : 1f641b4c
Family : 0822300e
Generation : 000312e2
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Checksum : c6d838da correct
MPB Sectors : 2
Disks : 2
RAID Devices : 2
Disk01 Serial : WD-WCAV51580865
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
[Volume0]:
UUID : 5ee03c91:f3537647:88da79af:38833b16
RAID Level : 1
Members : 2
This Slot : 1 (out-of-sync)
Array Size : 977272832 (466.00 GiB 500.36 GB)
Per Dev Size : 977273096 (466.00 GiB 500.36 GB)
Sector Offset : 0
Num Stripes : 3817472
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : normal <-- degraded
Dirty State : clean
[Volume1]:
UUID : 81598ffc:8420e261:bf676997:6c7a894f
RAID Level : 1
Members : 2
This Slot : 1 (out-of-sync)
Array Size : 976242688 (465.51 GiB 499.84 GB)
Per Dev Size : 976242952 (465.51 GiB 499.84 GB)
Sector Offset : 977277192
Num Stripes : 3813448
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : uninitialized <-- degraded
Dirty State : clean
Disk00 Serial : WD-WCAV51580780
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
/dev/sda:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.00
Orig Family : 1f641b4c
Family : 0822300e
Generation : 000312e2
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Checksum : c6d838da correct
MPB Sectors : 2
Disks : 2
RAID Devices : 2
Disk00 Serial : WD-WCAV51580780
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
[Volume0]:
UUID : 5ee03c91:f3537647:88da79af:38833b16
RAID Level : 1
Members : 2
This Slot : 0
Array Size : 977272832 (466.00 GiB 500.36 GB)
Per Dev Size : 977273096 (466.00 GiB 500.36 GB)
Sector Offset : 0
Num Stripes : 3817472
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : normal <-- degraded
Dirty State : clean
[Volume1]:
UUID : 81598ffc:8420e261:bf676997:6c7a894f
RAID Level : 1
Members : 2
This Slot : 0
Array Size : 976242688 (465.51 GiB 499.84 GB)
Per Dev Size : 976242952 (465.51 GiB 499.84 GB)
Sector Offset : 977277192
Num Stripes : 3813448
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : uninitialized <-- degraded
Dirty State : clean
Disk01 Serial : WD-WCAV51580865
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
/dev/md127:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.00
Orig Family : 1f641b4c
Family : 0822300e
Generation : 000312e2
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Checksum : c6d838da correct
MPB Sectors : 2
Disks : 2
RAID Devices : 2
Disk01 Serial : WD-WCAV51580865
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
[Volume0]:
UUID : 5ee03c91:f3537647:88da79af:38833b16
RAID Level : 1
Members : 2
This Slot : 1 (out-of-sync)
Array Size : 977272832 (466.00 GiB 500.36 GB)
Per Dev Size : 977273096 (466.00 GiB 500.36 GB)
Sector Offset : 0
Num Stripes : 3817472
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : normal <-- degraded
Dirty State : clean
[Volume1]:
UUID : 81598ffc:8420e261:bf676997:6c7a894f
RAID Level : 1
Members : 2
This Slot : 1 (out-of-sync)
Array Size : 976242688 (465.51 GiB 499.84 GB)
Per Dev Size : 976242952 (465.51 GiB 499.84 GB)
Sector Offset : 977277192
Num Stripes : 3813448
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : uninitialized <-- degraded
Dirty State : clean
Disk00 Serial : WD-WCAV51580780
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
[-- Attachment #6: mdadm-Esvv-before.txt --]
[-- Type: text/plain, Size: 3824 bytes --]
/dev/sdb:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.00
Orig Family : 1f641b4c
Family : b2ac231a
Generation : 000312b0
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Checksum : 0085890a correct
MPB Sectors : 2
Disks : 1
RAID Devices : 2
[Volume0]:
UUID : 5ee03c91:f3537647:88da79af:38833b16
RAID Level : 1
Members : 2
This Slot : ?
Array Size : 977272832 (466.00 GiB 500.36 GB)
Per Dev Size : 977273096 (466.00 GiB 500.36 GB)
Sector Offset : 0
Num Stripes : 3817472
Chunk Size : 64 KiB
Reserved : 0
Migrate State : idle
Map State : normal
Dirty State : clean
[Volume1]:
UUID : 81598ffc:8420e261:bf676997:6c7a894f
RAID Level : 1
Members : 2
This Slot : ?
Array Size : 976242688 (465.51 GiB 499.84 GB)
Per Dev Size : 976242952 (465.51 GiB 499.84 GB)
Sector Offset : 977277192
Num Stripes : 3813448
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : uninitialized <-- degraded
Dirty State : clean
Disk00 Serial : WD-WCAV51580780
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
/dev/sda:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.00
Orig Family : 1f641b4c
Family : b2ac231a
Generation : 000312dc
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Checksum : 00858936 correct
MPB Sectors : 2
Disks : 1
RAID Devices : 2
Disk00 Serial : WD-WCAV51580780
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
[Volume0]:
UUID : 5ee03c91:f3537647:88da79af:38833b16
RAID Level : 1
Members : 2
This Slot : 0
Array Size : 977272832 (466.00 GiB 500.36 GB)
Per Dev Size : 977273096 (466.00 GiB 500.36 GB)
Sector Offset : 0
Num Stripes : 3817472
Chunk Size : 64 KiB
Reserved : 0
Migrate State : idle
Map State : normal
Dirty State : clean
[Volume1]:
UUID : 81598ffc:8420e261:bf676997:6c7a894f
RAID Level : 1
Members : 2
This Slot : 0
Array Size : 976242688 (465.51 GiB 499.84 GB)
Per Dev Size : 976242952 (465.51 GiB 499.84 GB)
Sector Offset : 977277192
Num Stripes : 3813448
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : uninitialized <-- degraded
Dirty State : clean
/dev/md127:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.00
Orig Family : 1f641b4c
Family : b2ac231a
Generation : 000312dc
UUID : bee71637:467b6ae8:e1cf2626:185271b8
Checksum : 00858936 correct
MPB Sectors : 2
Disks : 1
RAID Devices : 2
[Volume0]:
UUID : 5ee03c91:f3537647:88da79af:38833b16
RAID Level : 1
Members : 2
This Slot : ?
Array Size : 977272832 (466.00 GiB 500.36 GB)
Per Dev Size : 977273096 (466.00 GiB 500.36 GB)
Sector Offset : 0
Num Stripes : 3817472
Chunk Size : 64 KiB
Reserved : 0
Migrate State : idle
Map State : normal
Dirty State : clean
[Volume1]:
UUID : 81598ffc:8420e261:bf676997:6c7a894f
RAID Level : 1
Members : 2
This Slot : ?
Array Size : 976242688 (465.51 GiB 499.84 GB)
Per Dev Size : 976242952 (465.51 GiB 499.84 GB)
Sector Offset : 977277192
Num Stripes : 3813448
Chunk Size : 64 KiB
Reserved : 0
Migrate State : migrating: rebuilding
Map State : uninitialized <-- degraded
Dirty State : clean
Disk00 Serial : WD-WCAV51580780
State : active
Id : 00000000
Usable Size : 1953520654 (931.51 GiB 1000.20 GB)
* Re: imsm woes (and a small bug in mdadm)
From: Dan Williams @ 2009-12-22 23:57 UTC
To: linux-raid
On Tue, Dec 22, 2009 at 10:51 AM, Luca Berra <bluca@comedia.it> wrote:
> Note for Neil/Dan:
> This email could be long and boring, the attached patch prevents a
> segfault on 3.0.3 and 3.1.1, at least have a look at it.
>
> Hello,
> I have this system at home I use for dev/testing/leisure/whatever.
> it has an asus mb with an embedded intel 82801 sata fakeraid (imsm) with
> two WD10EADS 1T disks.
> I created a mirrored container with two volumes, first one windows the
> second linux.
> Yesterday windows crashed, no surprise there, the surprise was that
> after the crash the controller marked the first drive as failed, instead
> of running the usual verify.
> I readded the drive from the windows storage manager console, and since
> it told me rebuild would take 50+ hours i decided to leave it going.
> (the windows software is idiotic, it tries to rebuild both volumes in
> parallel)
> In the morning i found the drive failed rebuild, so i replaced it (will
> do some tests on it and rma when i have spare time).
> In order to avoid waiting 50 hours to see if it finished i decided to
> try rebuilding it under linux, the linux box used dmraid instead of
> mdadm and was obviously unable to boot (did i ever mention redhat/fedora
> mkinitrd sucks).
Things get better with dracut.
> I booted linux from a rescue cd and rebuilt the raid using mdadm 3.0.2.
> It took only 3 hours.
> now real trouble started
> After reboot the intel bios showed both drives as "Offline Member"
> back to the rescue cd. mdadm 3.0.2 activated the container but the two
> volumes were activated using only /dev/sda (NOTE: this is the new drive
> i put in this same morning, not the old one)
> Seeing that mdadm 3.0.3 had some fixes related to imsm i built that
> instead and tried activating the array. unfortunately it segfaulted,
> tried 3.1.1: same segfault
> fire gdb, bt
> found in super-intel.c around line 2430 a call to
> disk_list_get with the serial of /dev/sdb as first argument, which fails
> returning null.
> created the attached patch and rebuilt mdadm.
> still it activated the container with two drives and the volume with
> only one.
> i lost my patience and mdadm -r /dev/md/imsm0 /dev/sdb, mdadm -a ....
>
> it is now rebuilding
>
> i still have to see what bios thinks of the raid when i reboot
>
Everything looks back in order now; let me know if the BIOS/Windows
has any problems with it.
>
> attached, besides the patch are
> mdadm -Dsvv and mdadm -Esvv before and after the hot-remove-add, in case
> someone has an idea about what might had happened.
>
Thanks for the report. I hit that segfault recently as well, and your
fix is correct.
Is sdb the drive you replaced, or the original drive? The 'before'
record on sdb shows that it is a single-disk array with only sda's
serial number in the disk list(?); it also shows that sda has a higher
generation number. It looks like things are back on track with the
latest code because we selected sda (highest generation number),
omitted sdb because it was not part of sda's disk list, and modified
the family number to mark the rebuild as the BIOS expects.
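Roughly, the selection works like this; an illustrative sketch of my own
with made-up structures and field names (not the actual super-intel.c
code), seeded with the generation numbers and serials from your 'before'
-E output:

#include <stdio.h>
#include <string.h>

struct mpb_copy {
	const char *dev;
	unsigned int generation;
	int ndisks;
	const char *serials[8];	/* serials recorded in this copy's disk list */
};

/* highest generation number wins */
struct mpb_copy *pick_champion(struct mpb_copy *copies, int n)
{
	struct mpb_copy *best = &copies[0];

	for (int i = 1; i < n; i++)
		if (copies[i].generation > best->generation)
			best = &copies[i];
	return best;
}

int in_disk_list(const struct mpb_copy *c, const char *serial)
{
	for (int i = 0; i < c->ndisks; i++)
		if (strcmp(c->serials[i], serial) == 0)
			return 1;
	return 0;
}

int main(void)
{
	/* roughly the 'before' situation from the attachments */
	struct mpb_copy copies[] = {
		{ "/dev/sda", 0x312dc, 1, { "WD-WCAV51580780" } },
		{ "/dev/sdb", 0x312b0, 1, { "WD-WCAV51580780" } },
	};
	struct mpb_copy *champ = pick_champion(copies, 2);

	printf("champion: %s\n", champ->dev);
	/* sdb's own serial is missing from the champion's list -> omitted */
	printf("sdb in champion's disk list: %d\n",
	       in_disk_list(champ, "WD-WCAV51580865"));
	return 0;
}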
The BIOS marked both disks as offline because they both claimed the
same family number but had no information about each other in their
records, so it needed user intervention to clear the conflict. It would
have been nice to see the state of the metadata after the crash, but
before the old mdadm [1] touched it, as I believe that is where the
confusion started.
--
Dan
[1]: http://git.kernel.org/?p=linux/kernel/git/djbw/mdadm.git;a=commitdiff;h=a2b9798159755b6f5e867fae0dd3e25af59fc85e
* Re: imsm woes (and a small bug in mdadm)
From: Luca Berra @ 2009-12-23 13:48 UTC
To: linux-raid
First thing, thanks for your attention.
On Tue, Dec 22, 2009 at 04:57:49PM -0700, Dan Williams wrote:
>On Tue, Dec 22, 2009 at 10:51 AM, Luca Berra <bluca@comedia.it> wrote:
>> try rebuilding it under linux, the linux box used dmraid instead of
>> mdadm and was obviously unable to boot (did i ever mention redhat/fedora
>> mkinitrd sucks).
>
>Things get better with dracut.
I had a cursory look at it, and it seems to be very nice....
>> it is now rebuilding
>>
>> i still have to see what bios thinks of the raid when i reboot
>>
>
>Everything looks back in order now, let me know if the bios/Windows
>has any problems with it.
After the rebuild and a reboot Volume0 was OK, but Volume1 was in state
"Initializing" and Windows rebuilt it again; this leads me to believe
even mdadm 3.1.1 is not perfect yet.
>> attached, besides the patch are
>> mdadm -Dsvv and mdadm -Esvv before and after the hot-remove-add, in case
>> someone has an idea about what might had happened.
>>
>
>Thanks for the report. I hit that segfault recently as well, and your
>fix is correct.
>
>Is sdb the drive you replaced, or the original drive? The 'before'
sdb was the 'original' drive.
>record on sdb shows that it is a single disk array with only sda's
>serial number in the disk list(?), it also shows that sda has a higher
>generation number. It looks like things are back on track with the
>latest code because we selected sda (highest generation number),
>omitted sdb because it was not part of sda's disk list, and modified
>the family number to mark the rebuild as the bios expects.
So 3.0.2 does something which is not correct???
Which is the suggested mdadm version for imsm then, 3.1.1 or your git?
My data wasn't important, but I'd like to avoid someone else losing
data.
>The bios marked both disks as offline because they both wanted to be
>the same family number, but they had no information about each other
>in their records, so it needed user intervention to clear the
This is strange, since one of the tests I did was powering on the PC
with only one disk connected (I tried it with each of them).
>conflict. It would have been nice to see the state of the metadata
>after the crash, but before the old mdadm [1] touched it as I believe
>that is where the confusion started.
Unfortunately I did not foresee any problem, so I did not take a
snapshot.
BTW, besides mdadm -D (-E), is there any other way to collect the binary
metadata (dd if=/dev/sd? bs=? skip=? count=?)?
Regards,
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: imsm woes (and a small bug in mdadm)
From: Dan Williams @ 2009-12-30 19:56 UTC
To: linux-raid
On Wed, Dec 23, 2009 at 6:48 AM, Luca Berra <bluca@comedia.it> wrote:
> first thing, thanks for your attention.
>
Thanks for testing and reporting back, very much appreciated. In the
future please leave me on the Cc; I'll notice the message much faster
that way.
> On Tue, Dec 22, 2009 at 04:57:49PM -0700, Dan Williams wrote:
>>
>> On Tue, Dec 22, 2009 at 10:51 AM, Luca Berra <bluca@comedia.it> wrote:
>> Everything looks back in order now, let me know if the bios/Windows
>> has any problems with it.
>
> after rebuild and reboot Volume0 was ok,
> Volume 1 was in state "Initializing" and windows rebuilt it again,
> this leads me to believe even mdadm-3.1.1 is not perfect yet.
The Windows driver has the concept of running the array in
uninitialized mode, but by default the imsm support in mdadm will
always initialize arrays (it is not strictly needed for raid1/raid10,
but it matches the Linux default of always initializing). It looks
like the current code will try to start an initialization after a
rebuild if the initial array state was 'uninitialized'; I'll fix this
up.
>
>>> attached, besides the patch are
>>> mdadm -Dsvv and mdadm -Esvv before and after the hot-remove-add, in case
>>> someone has an idea about what might had happened.
>>>
>>
>> Thanks for the report. I hit that segfault recently as well, and your
>> fix is correct.
>>
>> Is sdb the drive you replaced, or the original drive? The 'before'
>
> sdb was the 'original' drive.
>>
>> record on sdb shows that it is a single disk array with only sda's
>> serial number in the disk list(?), it also shows that sda has a higher
>> generation number. It looks like things are back on track with the
>> latest code because we selected sda (highest generation number),
>> omitted sdb because it was not part of sda's disk list, and modified
>> the family number to mark the rebuild as the bios expects.
>
> so 3.0.2 does something which is not correct???
3.0.2 was missing commit a2b97981, "imsm: disambiguate family_num" [1].
> which is the suggested mdadm version for imsm then, 3.1.1 or your git?
The suggested version is always Neil's latest stable release [2]. You
can track my git, but it may rebase from time to time as Neil reviews
the incoming patch stream.
> my data wasn't important, but i'd like to avoid someone else loosing
> data.
Understood, I'm running an imsm raid5 and raid1 at home, so I have a
personal interest in this code doing the right thing as well.
>> The bios marked both disks as offline because they both wanted to be
>> the same family number, but they had no information about each other
>> in their records, so it needed user intervention to clear the
>
> this is strange, since one of the test i did was powering on the pc with
> only one disk connected (tried with both of them)
>>
>> conflict. It would have been nice to see the state of the metadata
>> after the crash, but before the old mdadm [1] touched it as I believe
>> that is where the confusion started.
>
> unfortunately i did not forsee any problem so i did not take a snapshot.
> btw besides mdadm -D (-E) is there any other way to collect binary
> metadata (dd if=/dev/sd? bs=? skip=? count=?) ?
The anchor for imsm metadata lives at the second-to-last sector of the
disk (n-1). If it grows beyond the size of one sector, it consumes the
preceding sectors. So a metadata record that is 4 sectors in size will
be organized like:
sector[0]: n-1
sector[1]: n-4
sector[2]: n-3
sector[3]: n-2
The details are in load_imsm_mpb() [3].
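In dd terms (a quick sketch of my own, not mdadm code, assuming 512-byte
sectors and 0-based skip counts), you can derive the offsets from the
device size and the "MPB Sectors" value that -E reports:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKGETSIZE64 */

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/sda";
	unsigned long long mpb_sectors = argc > 2 ? strtoull(argv[2], NULL, 0) : 2;
	unsigned long long bytes;
	int fd = open(dev, O_RDONLY);

	if (fd < 0 || ioctl(fd, BLKGETSIZE64, &bytes) < 0) {
		perror(dev);
		return 1;
	}
	close(fd);

	unsigned long long nsect = bytes / 512;

	/* anchor: second-to-last sector of the device */
	printf("dd if=%s of=anchor.bin bs=512 skip=%llu count=1\n",
	       dev, nsect - 2);
	/* the remaining mpb sectors sit just below the anchor */
	if (mpb_sectors > 1)
		printf("dd if=%s of=rest.bin bs=512 skip=%llu count=%llu\n",
		       dev, nsect - 1 - mpb_sectors, mpb_sectors - 1);
	return 0;
}

With MPB Sectors = 2, as in your -E output, that works out to the anchor
at skip = total_sectors - 2 and one extended sector at
skip = total_sectors - 3.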
--
Dan
[1]: http://git.kernel.org/?p=linux/kernel/git/djbw/mdadm.git;a=commitdiff;h=a2b97981
[2]: git://neil.brown.name/mdadm master
[3]: http://neil.brown.name/git?p=mdadm;a=blob;f=super-intel.c;h=d6951cc2ff7c72a578e7de2c733fde387eed0f08;hb=master#l2110