* mdadm --assemble considers event count for spares
@ 2013-05-27 10:05 Alexander Lyakas
2013-05-28 1:16 ` NeilBrown
0 siblings, 1 reply; 7+ messages in thread
From: Alexander Lyakas @ 2013-05-27 10:05 UTC (permalink / raw)
To: linux-raid; +Cc: NeilBrown
Hi Neil,
It can happen that a spare has a higher event count than an in-array drive.
For example: a RAID1 with two drives is rebuilding one of the drives.
Then the "good" drive fails. As a result, MD stops the rebuild and
ejects the rebuilding drive from the array. The failed drive stays in
the array, because RAID1 never ejects the last drive. However, the
"good" drive fails all IOs, so the ejected drive has a larger event
count now.
Now if MD is stopped and re-assembled, mdadm considers the spare drive
as the chosen one:
root@vc:/mnt/work/alex/mdadm-neil# ./mdadm --assemble /dev/md200
--name=alex --config=none --homehost=vc --run --auto=md --metadata=1.2
--verbose --verbose /dev/sdc2 /dev/sdd2
mdadm: looking for devices for /dev/md200
mdadm: /dev/sdc2 is identified as a member of /dev/md200, slot 0.
mdadm: /dev/sdd2 is identified as a member of /dev/md200, slot -1.
mdadm: added /dev/sdc2 to /dev/md200 as 0 (possibly out of date)
mdadm: no uptodate device for slot 2 of /dev/md200
mdadm: added /dev/sdd2 to /dev/md200 as -1
mdadm: failed to RUN_ARRAY /dev/md200: Input/output error
mdadm: Not enough devices to start the array.
The kernel doesn't accept the non-spare drive, considering it non-fresh:
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.679396] md: md200 stopped.
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.686870] md: bind<sdc2>
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687623] md: bind<sdd2>
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687675] md: kicking
non-fresh sdc2 from array!
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687680] md: unbind<sdc2>
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687683] md: export_rdev(sdc2)
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693574]
md/raid1:md200: active with 0 out of 2 mirrors
May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693583] md200:
failed to create bitmap (-5)
This happens with the latest mdadm from git, and kernel 3.8.2.
Is this the expected behavior?
Maybe mdadm should not consider spares at all for its "chosen_drive"
logic, and perhaps not try to add them to the kernel?
Superblocks of both drives:
sdc2 - the "good" drive:
/dev/sdc2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
Name : zadara_vc:alex
Creation Time : Mon May 27 11:33:50 2013
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 975063127 (464.95 GiB 499.23 GB)
Array Size : 209715200 (200.00 GiB 214.75 GB)
Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=555632727 sectors
State : clean
Device UUID : 1f661ca3:fdc8b887:8d3638ab:f2cc0a40
Internal Bitmap : 8 sectors from superblock
Update Time : Mon May 27 11:34:57 2013
Checksum : 72a97357 - correct
Events : 9
sdd2 - the "rebuilding" drive:
/dev/sdd2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
Name : zadara_vc:alex
Creation Time : Mon May 27 11:33:50 2013
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 976123417 (465.45 GiB 499.78 GB)
Array Size : 209715200 (200.00 GiB 214.75 GB)
Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=556693017 sectors
State : clean
Device UUID : 9abc7fa9:6bf95a51:51f2cd65:14232e81
Internal Bitmap : 8 sectors from superblock
Update Time : Mon May 27 11:35:56 2013
Checksum : 3e793a34 - correct
Events : 26
Device Role : spare
Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
Thanks,
Alex.
* Re: mdadm --assemble considers event count for spares
2013-05-27 10:05 mdadm --assemble considers event count for spares Alexander Lyakas
@ 2013-05-28 1:16 ` NeilBrown
2013-05-28 8:56 ` Alexander Lyakas
0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2013-05-28 1:16 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid
On Mon, 27 May 2013 13:05:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Hi Neil,
> It can happen that a spare has a higher event count than an in-array drive.
> For example: a RAID1 with two drives is rebuilding one of the drives.
> Then the "good" drive fails. As a result, MD stops the rebuild and
> ejects the rebuilding drive from the array. The failed drive stays in
> the array, because RAID1 never ejects the last drive. However, the
> "good" drive fails all IOs, so the ejected drive has a larger event
> count now.
> Now if MD is stopped and re-assembled, mdadm considers the spare drive
> as the chosen one:
>
> root@vc:/mnt/work/alex/mdadm-neil# ./mdadm --assemble /dev/md200
> --name=alex --config=none --homehost=vc --run --auto=md --metadata=1.2
> --verbose --verbose /dev/sdc2 /dev/sdd2
> mdadm: looking for devices for /dev/md200
> mdadm: /dev/sdc2 is identified as a member of /dev/md200, slot 0.
> mdadm: /dev/sdd2 is identified as a member of /dev/md200, slot -1.
> mdadm: added /dev/sdc2 to /dev/md200 as 0 (possibly out of date)
> mdadm: no uptodate device for slot 2 of /dev/md200
> mdadm: added /dev/sdd2 to /dev/md200 as -1
> mdadm: failed to RUN_ARRAY /dev/md200: Input/output error
> mdadm: Not enough devices to start the array.
>
> The kernel doesn't accept the non-spare drive, considering it non-fresh:
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.679396] md: md200 stopped.
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.686870] md: bind<sdc2>
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687623] md: bind<sdd2>
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687675] md: kicking
> non-fresh sdc2 from array!
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687680] md: unbind<sdc2>
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687683] md: export_rdev(sdc2)
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693574]
> md/raid1:md200: active with 0 out of 2 mirrors
> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693583] md200:
> failed to create bitmap (-5)
>
> This happens with the latest mdadm from git, and kernel 3.8.2.
>
> Is this the expected behavior?
I hadn't thought about it.
> Maybe mdadm should not consider spares at all for its "chosen_drive"
> logic, and perhaps not try to add them to the kernel?
Probably not, no.
NeilBrown
>
> Superblocks of both drives:
> sdc2 - the "good" drive:
> /dev/sdc2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x1
> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
> Name : zadara_vc:alex
> Creation Time : Mon May 27 11:33:50 2013
> Raid Level : raid1
> Raid Devices : 2
>
> Avail Dev Size : 975063127 (464.95 GiB 499.23 GB)
> Array Size : 209715200 (200.00 GiB 214.75 GB)
> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> Unused Space : before=1968 sectors, after=555632727 sectors
> State : clean
> Device UUID : 1f661ca3:fdc8b887:8d3638ab:f2cc0a40
>
> Internal Bitmap : 8 sectors from superblock
> Update Time : Mon May 27 11:34:57 2013
> Checksum : 72a97357 - correct
> Events : 9
>
> sdd2 - the "rebuilding" drive:
> /dev/sdd2:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x1
> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
> Name : zadara_vc:alex
> Creation Time : Mon May 27 11:33:50 2013
> Raid Level : raid1
> Raid Devices : 2
>
> Avail Dev Size : 976123417 (465.45 GiB 499.78 GB)
> Array Size : 209715200 (200.00 GiB 214.75 GB)
> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> Unused Space : before=1968 sectors, after=556693017 sectors
> State : clean
> Device UUID : 9abc7fa9:6bf95a51:51f2cd65:14232e81
>
> Internal Bitmap : 8 sectors from superblock
> Update Time : Mon May 27 11:35:56 2013
> Checksum : 3e793a34 - correct
> Events : 26
>
>
> Device Role : spare
> Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
>
>
> Thanks,
> Alex.
* Re: mdadm --assemble considers event count for spares
2013-05-28 1:16 ` NeilBrown
@ 2013-05-28 8:56 ` Alexander Lyakas
2013-05-28 9:15 ` NeilBrown
0 siblings, 1 reply; 7+ messages in thread
From: Alexander Lyakas @ 2013-05-28 8:56 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Hi Neil,
can you please let me know whether you have had time to think about this
issue, and if so, what you decided.
Thanks,
Alex.
On Tue, May 28, 2013 at 4:16 AM, NeilBrown <neilb@suse.de> wrote:
> On Mon, 27 May 2013 13:05:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Hi Neil,
>> It can happen that a spare has a higher event count than an in-array drive.
>> For example: a RAID1 with two drives is rebuilding one of the drives.
>> Then the "good" drive fails. As a result, MD stops the rebuild and
>> ejects the rebuilding drive from the array. The failed drive stays in
>> the array, because RAID1 never ejects the last drive. However, the
>> "good" drive fails all IOs, so the ejected drive has a larger event
>> count now.
>> Now if MD is stopped and re-assembled, mdadm considers the spare drive
>> as the chosen one:
>>
>> root@vc:/mnt/work/alex/mdadm-neil# ./mdadm --assemble /dev/md200
>> --name=alex --config=none --homehost=vc --run --auto=md --metadata=1.2
>> --verbose --verbose /dev/sdc2 /dev/sdd2
>> mdadm: looking for devices for /dev/md200
>> mdadm: /dev/sdc2 is identified as a member of /dev/md200, slot 0.
>> mdadm: /dev/sdd2 is identified as a member of /dev/md200, slot -1.
>> mdadm: added /dev/sdc2 to /dev/md200 as 0 (possibly out of date)
>> mdadm: no uptodate device for slot 2 of /dev/md200
>> mdadm: added /dev/sdd2 to /dev/md200 as -1
>> mdadm: failed to RUN_ARRAY /dev/md200: Input/output error
>> mdadm: Not enough devices to start the array.
>>
>> The kernel doesn't accept the non-spare drive, considering it non-fresh:
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.679396] md: md200 stopped.
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.686870] md: bind<sdc2>
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687623] md: bind<sdd2>
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687675] md: kicking
>> non-fresh sdc2 from array!
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687680] md: unbind<sdc2>
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687683] md: export_rdev(sdc2)
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693574]
>> md/raid1:md200: active with 0 out of 2 mirrors
>> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693583] md200:
>> failed to create bitmap (-5)
>>
>> This happens with the latest mdadm from git, and kernel 3.8.2.
>>
>> Is this the expected behavior?
>
> I hadn't thought about it.
>
>> Maybe mdadm should not consider spares at all for its "chosen_drive"
>> logic, and perhaps not try to add them to the kernel?
>
> Probably not, no.
>
> NeilBrown
>
>
>
>>
>> Superblocks of both drives:
>> sdc2 - the "good" drive:
>> /dev/sdc2:
>> Magic : a92b4efc
>> Version : 1.2
>> Feature Map : 0x1
>> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
>> Name : zadara_vc:alex
>> Creation Time : Mon May 27 11:33:50 2013
>> Raid Level : raid1
>> Raid Devices : 2
>>
>> Avail Dev Size : 975063127 (464.95 GiB 499.23 GB)
>> Array Size : 209715200 (200.00 GiB 214.75 GB)
>> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
>> Data Offset : 2048 sectors
>> Super Offset : 8 sectors
>> Unused Space : before=1968 sectors, after=555632727 sectors
>> State : clean
>> Device UUID : 1f661ca3:fdc8b887:8d3638ab:f2cc0a40
>>
>> Internal Bitmap : 8 sectors from superblock
>> Update Time : Mon May 27 11:34:57 2013
>> Checksum : 72a97357 - correct
>> Events : 9
>>
>> sdd2 - the "rebuilding" drive:
>> /dev/sdd2:
>> Magic : a92b4efc
>> Version : 1.2
>> Feature Map : 0x1
>> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
>> Name : zadara_vc:alex
>> Creation Time : Mon May 27 11:33:50 2013
>> Raid Level : raid1
>> Raid Devices : 2
>>
>> Avail Dev Size : 976123417 (465.45 GiB 499.78 GB)
>> Array Size : 209715200 (200.00 GiB 214.75 GB)
>> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
>> Data Offset : 2048 sectors
>> Super Offset : 8 sectors
>> Unused Space : before=1968 sectors, after=556693017 sectors
>> State : clean
>> Device UUID : 9abc7fa9:6bf95a51:51f2cd65:14232e81
>>
>> Internal Bitmap : 8 sectors from superblock
>> Update Time : Mon May 27 11:35:56 2013
>> Checksum : 3e793a34 - correct
>> Events : 26
>>
>>
>> Device Role : spare
>> Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
>>
>>
>> Thanks,
>> Alex.
>
* Re: mdadm --assemble considers event count for spares
2013-05-28 8:56 ` Alexander Lyakas
@ 2013-05-28 9:15 ` NeilBrown
2013-05-28 10:50 ` Alexander Lyakas
0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2013-05-28 9:15 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid
On Tue, 28 May 2013 11:56:26 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Hi Neil,
> can you please let me know whether you have had time to think about this
> issue, and if so, what you decided.
I don't actually plan to think about the issue, at least not in the short
term.
If you would like to propose a concrete solution, then I would probably be
motivated to think about that and give you some feedback.
NeilBrown
>
> Thanks,
> Alex.
>
>
>
>
> On Tue, May 28, 2013 at 4:16 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 27 May 2013 13:05:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Hi Neil,
> >> It can happen that a spare has a higher event count than an in-array drive.
> >> For example: a RAID1 with two drives is rebuilding one of the drives.
> >> Then the "good" drive fails. As a result, MD stops the rebuild and
> >> ejects the rebuilding drive from the array. The failed drive stays in
> >> the array, because RAID1 never ejects the last drive. However, the
> >> "good" drive fails all IOs, so the ejected drive has a larger event
> >> count now.
> >> Now if MD is stopped and re-assembled, mdadm considers the spare drive
> >> as the chosen one:
> >>
> >> root@vc:/mnt/work/alex/mdadm-neil# ./mdadm --assemble /dev/md200
> >> --name=alex --config=none --homehost=vc --run --auto=md --metadata=1.2
> >> --verbose --verbose /dev/sdc2 /dev/sdd2
> >> mdadm: looking for devices for /dev/md200
> >> mdadm: /dev/sdc2 is identified as a member of /dev/md200, slot 0.
> >> mdadm: /dev/sdd2 is identified as a member of /dev/md200, slot -1.
> >> mdadm: added /dev/sdc2 to /dev/md200 as 0 (possibly out of date)
> >> mdadm: no uptodate device for slot 2 of /dev/md200
> >> mdadm: added /dev/sdd2 to /dev/md200 as -1
> >> mdadm: failed to RUN_ARRAY /dev/md200: Input/output error
> >> mdadm: Not enough devices to start the array.
> >>
> >> The kernel doesn't accept the non-spare drive, considering it non-fresh:
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.679396] md: md200 stopped.
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.686870] md: bind<sdc2>
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687623] md: bind<sdd2>
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687675] md: kicking
> >> non-fresh sdc2 from array!
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687680] md: unbind<sdc2>
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687683] md: export_rdev(sdc2)
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693574]
> >> md/raid1:md200: active with 0 out of 2 mirrors
> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693583] md200:
> >> failed to create bitmap (-5)
> >>
> >> This happens with the latest mdadm from git, and kernel 3.8.2.
> >>
> >> Is this the expected behavior?
> >
> > I hadn't thought about it.
> >
> >> Maybe mdadm should not consider spares at all for its "chosen_drive"
> >> logic, and perhaps not try to add them to the kernel?
> >
> > Probably not, no.
> >
> > NeilBrown
> >
> >
> >
> >>
> >> Superblocks of both drives:
> >> sdc2 - the "good" drive:
> >> /dev/sdc2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x1
> >> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
> >> Name : zadara_vc:alex
> >> Creation Time : Mon May 27 11:33:50 2013
> >> Raid Level : raid1
> >> Raid Devices : 2
> >>
> >> Avail Dev Size : 975063127 (464.95 GiB 499.23 GB)
> >> Array Size : 209715200 (200.00 GiB 214.75 GB)
> >> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> Unused Space : before=1968 sectors, after=555632727 sectors
> >> State : clean
> >> Device UUID : 1f661ca3:fdc8b887:8d3638ab:f2cc0a40
> >>
> >> Internal Bitmap : 8 sectors from superblock
> >> Update Time : Mon May 27 11:34:57 2013
> >> Checksum : 72a97357 - correct
> >> Events : 9
> >>
> >> sdd2 - the "rebuilding" drive:
> >> /dev/sdd2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x1
> >> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
> >> Name : zadara_vc:alex
> >> Creation Time : Mon May 27 11:33:50 2013
> >> Raid Level : raid1
> >> Raid Devices : 2
> >>
> >> Avail Dev Size : 976123417 (465.45 GiB 499.78 GB)
> >> Array Size : 209715200 (200.00 GiB 214.75 GB)
> >> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> Unused Space : before=1968 sectors, after=556693017 sectors
> >> State : clean
> >> Device UUID : 9abc7fa9:6bf95a51:51f2cd65:14232e81
> >>
> >> Internal Bitmap : 8 sectors from superblock
> >> Update Time : Mon May 27 11:35:56 2013
> >> Checksum : 3e793a34 - correct
> >> Events : 26
> >>
> >>
> >> Device Role : spare
> >> Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
> >>
> >>
> >> Thanks,
> >> Alex.
> >
* Re: mdadm --assemble considers event count for spares
2013-05-28 9:15 ` NeilBrown
@ 2013-05-28 10:50 ` Alexander Lyakas
2013-06-03 23:51 ` NeilBrown
2013-06-17 6:57 ` NeilBrown
0 siblings, 2 replies; 7+ messages in thread
From: Alexander Lyakas @ 2013-05-28 10:50 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Neil,
In my opinion (I may be wrong), a spare drive (raid_disk==-1) doesn't
add any information to array assembly. It doesn't have a valid raid
slot, and I don't see how its event count is relevant. I don't think a
spare can help us much in figuring out the array's latest state, which is
what the assembly code tries to do.
So what I was thinking: mdadm --assemble doesn't consider spare drives
(raid_disk==-1) at all. It simply skips over them in the initial loop
after reading their superblocks. Perhaps it can keep them in a side
list. Then the array is assembled with non-spare drives only.
After the array is assembled, we may choose one of the following:
# The user has to explicitly add the spare drives after the array has been
assembled. Assemble can warn that some spares have been left out, and
tell the user what they are.
# Assemble adds the spare drives (perhaps even after zeroing their
superblocks), after it has assembled the array with non-spare drives.
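The skip-spares rule proposed above can be sketched as a small standalone C helper. The types and the function name here are simplified stand-ins for illustration, not mdadm's actual structures:

```c
#include <assert.h>

/* Simplified stand-in for the per-device info mdadm collects at
 * assembly time (hypothetical, not mdadm's real struct). */
struct dev_info {
    int raid_disk;          /* -1 means the device is a spare */
    unsigned long events;   /* superblock event count */
};

/* Pick the reference device by event count, skipping spares entirely.
 * Returns the index of the freshest non-spare, or -1 if there is none. */
int choose_most_recent(const struct dev_info *devs, int n)
{
    int chosen = -1;
    for (int i = 0; i < n; i++) {
        if (devs[i].raid_disk == -1)
            continue;   /* a spare carries no array-state information */
        if (chosen == -1 || devs[i].events > devs[chosen].events)
            chosen = i;
    }
    return chosen;
}
```

Applied to the superblocks in this thread (sdc2: slot 0, events 9; sdd2: spare, events 26), this would pick sdc2 despite its lower event count.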
Alex.
On Tue, May 28, 2013 at 12:15 PM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 28 May 2013 11:56:26 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> wrote:
>
>> Hi Neil,
>> can you please let me know whether you have had time to think about this
>> issue, and if so, what you decided.
>
> I don't actually plan to think about the issue, at least not in the short
> term.
> If you would like to propose a concrete solution, then I would probably be
> motivated to think about that and give you some feedback.
>
> NeilBrown
>
>>
>> Thanks,
>> Alex.
>>
>>
>>
>>
>> On Tue, May 28, 2013 at 4:16 AM, NeilBrown <neilb@suse.de> wrote:
>> > On Mon, 27 May 2013 13:05:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
>> > wrote:
>> >
>> >> Hi Neil,
>> >> It can happen that a spare has a higher event count than an in-array drive.
>> >> For example: a RAID1 with two drives is rebuilding one of the drives.
>> >> Then the "good" drive fails. As a result, MD stops the rebuild and
>> >> ejects the rebuilding drive from the array. The failed drive stays in
>> >> the array, because RAID1 never ejects the last drive. However, the
>> >> "good" drive fails all IOs, so the ejected drive has a larger event
>> >> count now.
>> >> Now if MD is stopped and re-assembled, mdadm considers the spare drive
>> >> as the chosen one:
>> >>
>> >> root@vc:/mnt/work/alex/mdadm-neil# ./mdadm --assemble /dev/md200
>> >> --name=alex --config=none --homehost=vc --run --auto=md --metadata=1.2
>> >> --verbose --verbose /dev/sdc2 /dev/sdd2
>> >> mdadm: looking for devices for /dev/md200
>> >> mdadm: /dev/sdc2 is identified as a member of /dev/md200, slot 0.
>> >> mdadm: /dev/sdd2 is identified as a member of /dev/md200, slot -1.
>> >> mdadm: added /dev/sdc2 to /dev/md200 as 0 (possibly out of date)
>> >> mdadm: no uptodate device for slot 2 of /dev/md200
>> >> mdadm: added /dev/sdd2 to /dev/md200 as -1
>> >> mdadm: failed to RUN_ARRAY /dev/md200: Input/output error
>> >> mdadm: Not enough devices to start the array.
>> >>
>> >> The kernel doesn't accept the non-spare drive, considering it non-fresh:
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.679396] md: md200 stopped.
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.686870] md: bind<sdc2>
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687623] md: bind<sdd2>
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687675] md: kicking
>> >> non-fresh sdc2 from array!
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687680] md: unbind<sdc2>
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.687683] md: export_rdev(sdc2)
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693574]
>> >> md/raid1:md200: active with 0 out of 2 mirrors
>> >> May 27 12:42:28 vsa-00000505-vc-0 kernel: [343203.693583] md200:
>> >> failed to create bitmap (-5)
>> >>
>> >> This happens with the latest mdadm from git, and kernel 3.8.2.
>> >>
>> >> Is this the expected behavior?
>> >
>> > I hadn't thought about it.
>> >
>> >> Maybe mdadm should not consider spares at all for its "chosen_drive"
>> >> logic, and perhaps not try to add them to the kernel?
>> >
>> > Probably not, no.
>> >
>> > NeilBrown
>> >
>> >
>> >
>> >>
>> >> Superblocks of both drives:
>> >> sdc2 - the "good" drive:
>> >> /dev/sdc2:
>> >> Magic : a92b4efc
>> >> Version : 1.2
>> >> Feature Map : 0x1
>> >> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
>> >> Name : zadara_vc:alex
>> >> Creation Time : Mon May 27 11:33:50 2013
>> >> Raid Level : raid1
>> >> Raid Devices : 2
>> >>
>> >> Avail Dev Size : 975063127 (464.95 GiB 499.23 GB)
>> >> Array Size : 209715200 (200.00 GiB 214.75 GB)
>> >> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
>> >> Data Offset : 2048 sectors
>> >> Super Offset : 8 sectors
>> >> Unused Space : before=1968 sectors, after=555632727 sectors
>> >> State : clean
>> >> Device UUID : 1f661ca3:fdc8b887:8d3638ab:f2cc0a40
>> >>
>> >> Internal Bitmap : 8 sectors from superblock
>> >> Update Time : Mon May 27 11:34:57 2013
>> >> Checksum : 72a97357 - correct
>> >> Events : 9
>> >>
>> >> sdd2 - the "rebuilding" drive:
>> >> /dev/sdd2:
>> >> Magic : a92b4efc
>> >> Version : 1.2
>> >> Feature Map : 0x1
>> >> Array UUID : 8e051cc5:c536d16e:72b413fa:e7049d4b
>> >> Name : zadara_vc:alex
>> >> Creation Time : Mon May 27 11:33:50 2013
>> >> Raid Level : raid1
>> >> Raid Devices : 2
>> >>
>> >> Avail Dev Size : 976123417 (465.45 GiB 499.78 GB)
>> >> Array Size : 209715200 (200.00 GiB 214.75 GB)
>> >> Used Dev Size : 419430400 (200.00 GiB 214.75 GB)
>> >> Data Offset : 2048 sectors
>> >> Super Offset : 8 sectors
>> >> Unused Space : before=1968 sectors, after=556693017 sectors
>> >> State : clean
>> >> Device UUID : 9abc7fa9:6bf95a51:51f2cd65:14232e81
>> >>
>> >> Internal Bitmap : 8 sectors from superblock
>> >> Update Time : Mon May 27 11:35:56 2013
>> >> Checksum : 3e793a34 - correct
>> >> Events : 26
>> >>
>> >>
>> >> Device Role : spare
>> >> Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
>> >>
>> >>
>> >> Thanks,
>> >> Alex.
>> >
>
* Re: mdadm --assemble considers event count for spares
2013-05-28 10:50 ` Alexander Lyakas
@ 2013-06-03 23:51 ` NeilBrown
2013-06-17 6:57 ` NeilBrown
1 sibling, 0 replies; 7+ messages in thread
From: NeilBrown @ 2013-06-03 23:51 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid
On Tue, 28 May 2013 13:50:49 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Neil,
> In my opinion (I may be wrong), a spare drive (raid_disk==-1) doesn't
> add any information to array assembly. It doesn't have a valid raid
> slot, and I don't see how its event count is relevant. I don't think a
> spare can help us much in figuring out the array's latest state, which is
> what the assembly code tries to do.
> So what I was thinking: mdadm --assemble doesn't consider spare drives
> (raid_disk==-1) at all. It simply skips over them in the initial loop
> after reading their superblocks. Perhaps it can keep them in a side
> list. Then the array is assembled with non-spare drives only.
Sounds reasonable.
I would suggest looking at the place where 'most_recent' is set in
Assemble.c, and getting it to avoid updating 'most_recent' when the current
device is a spare.
Something like

    if (most_recent < devcnt) {
        if (devices[devcnt].i.events
            > devices[most_recent].i.events)
+            if (devices[devcnt].i.disk.state == 6)
                most_recent = devcnt;
    }
Care to give that a try?
NeilBrown
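To unpack the magic constant in that suggestion: in the kernel's md_p.h, bit 1 of the disk state is MD_DISK_ACTIVE and bit 2 is MD_DISK_SYNC, so a state of 6 marks an active, in-sync member, which a spare never is. Below is a minimal sketch of the guarded update; the types and function name are simplified stand-ins, not mdadm's real structures:

```c
#include <assert.h>

#define MD_DISK_ACTIVE 1    /* bit numbers as in linux/raid/md_p.h */
#define MD_DISK_SYNC   2
#define ACTIVE_SYNC ((1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC))  /* == 6 */

struct member {
    unsigned int state;     /* disk.state bitmask from the superblock */
    unsigned long events;   /* superblock event count */
};

/* Mimics the patched comparison: a device may only become 'most_recent'
 * if it is active and in sync; a spare (state lacking those bits) never
 * wins, however high its event count. Returns the updated index. */
int update_most_recent(const struct member *devs, int most_recent, int devcnt)
{
    if (most_recent < devcnt &&
        devs[devcnt].events > devs[most_recent].events &&
        devs[devcnt].state == ACTIVE_SYNC)
        most_recent = devcnt;
    return most_recent;
}
```

With the data from this thread (sdc2: active, events 9; sdd2: spare, events 26), the spare's higher event count no longer makes it the most recent device.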
>
> After the array is assembled, we may choose one of the following:
> # The user has to explicitly add the spare drives after the array has been
> assembled. Assemble can warn that some spares have been left out, and
> tell the user what they are.
> # Assemble adds the spare drives (perhaps even after zeroing their
> superblocks), after it has assembled the array with non-spare drives.
>
> Alex.
* Re: mdadm --assemble considers event count for spares
2013-05-28 10:50 ` Alexander Lyakas
2013-06-03 23:51 ` NeilBrown
@ 2013-06-17 6:57 ` NeilBrown
1 sibling, 0 replies; 7+ messages in thread
From: NeilBrown @ 2013-06-17 6:57 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid
On Tue, 28 May 2013 13:50:49 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Neil,
> In my opinion (I may be wrong), a spare drive (raid_disk==-1) doesn't
> add any information to array assembly. It doesn't have a valid raid
> slot, and I don't see how its event count is relevant. I don't think a
> spare can help us much in figuring out the array's latest state, which is
> what the assembly code tries to do.
> So what I was thinking: mdadm --assemble doesn't consider spare drives
> (raid_disk==-1) at all. It simply skips over them in the initial loop
> after reading their superblocks. Perhaps it can keep them in a side
> list. Then the array is assembled with non-spare drives only.
>
> After the array is assembled, we may choose one of the following:
> # The user has to explicitly add the spare drives after the array has been
> assembled. Assemble can warn that some spares have been left out, and
> tell the user what they are.
> # Assemble adds the spare drives (perhaps even after zeroing their
> superblocks), after it has assembled the array with non-spare drives.
Hi,
I have just committed
http://git.neil.brown.name/?p=mdadm.git;a=commitdiff;h=f80057aec5d314798251e318555cb8ac92e4c06f
which I believe fixes this issue. If you can test and confirm I would
appreciate it.
NeilBrown
Thread overview: 7+ messages
2013-05-27 10:05 mdadm --assemble considers event count for spares Alexander Lyakas
2013-05-28 1:16 ` NeilBrown
2013-05-28 8:56 ` Alexander Lyakas
2013-05-28 9:15 ` NeilBrown
2013-05-28 10:50 ` Alexander Lyakas
2013-06-03 23:51 ` NeilBrown
2013-06-17 6:57 ` NeilBrown