linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* re-adding a disk to a raid1 array with bitmap
@ 2012-03-09 16:07 Jes Sorensen
  2012-03-28  0:40 ` NeilBrown
  0 siblings, 1 reply; 3+ messages in thread
From: Jes Sorensen @ 2012-03-09 16:07 UTC (permalink / raw)
  To: Brown, Neil; +Cc: Doug Ledford, linux-raid@vger.kernel.org

Neil,

I have been spinning my head over this for a bit trying to figure out
what is the right solution to this problem.

In bedd86b7773fd97f0d708cc0c371c8963ba7ba9a you added a test to reject
re-adding a drive to an array in some cases.

The problem I have been looking at is if one has a raid1 with a bitmap.
Basically in the situation where we have one of the drives pulled from
the array, then if I try to add it back, it fails like this:

[root@monkeybay ~]#  mdadm -I --run /dev/sdf5
mdadm: failed to add /dev/sdf5 to /dev/md32: Invalid argument.

However this works:

[root@monkeybay ~]# mdadm -a /dev/md32 /dev/sdf5
mdadm: re-added /dev/sdf5

I dug through the kernel and it shows up that the failure is due to this
test in the above mentioned commit:

+                    rdev->raid_disk != info->raid_disk)) {

So basically when doing -I it seems the disk itself expects to be
raid_disk = 0, whereas the kernel expects it should be raid_disk = 1.

I agree with the previous discussion that it makes sense to reject a
drive in the normal case without a bitmap. However it seems illogical to
me that -a works but -I should fail in this case.

What would be the right fix here? Relaxing the test in the kernel to not
require the raid_disk numbers match up for a bitmap raid, or should
mdadm be taught to examine the raids and set the expected disk number
before submitting the add_new_disk ioctl?

Cheers,
Jes

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: re-adding a disk to a raid1 array with bitmap
  2012-03-09 16:07 re-adding a disk to a raid1 array with bitmap Jes Sorensen
@ 2012-03-28  0:40 ` NeilBrown
  2012-03-28 15:17   ` Jes Sorensen
  0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2012-03-28  0:40 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Doug Ledford, linux-raid@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2722 bytes --]

On Fri, 09 Mar 2012 17:07:42 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
wrote:

> Neil,
> 
> I have been spinning my head over this for a bit trying to figure out
> what is the right solution to this problem.
> 
> In bedd86b7773fd97f0d708cc0c371c8963ba7ba9a you added a test to reject
> re-adding a drive to an array in some cases.
> 
> The problem I have been looking at is if one has a raid1 with a bitmap.
> Basically in the situation where we have one of the drives pulled from
> the array, then if I try to add it back, it fails like this:
> 
> [root@monkeybay ~]#  mdadm -I --run /dev/sdf5
> mdadm: failed to add /dev/sdf5 to /dev/md32: Invalid argument.
> 
> However this works:
> 
> [root@monkeybay ~]# mdadm -a /dev/md32 /dev/sdf5
> mdadm: re-added /dev/sdf5
> 
> I dug through the kernel and it shows up that the failure is due to this
> test in the above mentioned commit:
> 
> +                    rdev->raid_disk != info->raid_disk)) {
> 
> So basically when doing -I it seems the disk itself expects to be
> raid_disk = 0, whereas the kernel expects it should be raid_disk = 1.
> 
> I agree with the previous discussion that it makes sense to reject a
> drive in the normal case without a bitmap. However it seems illogical to
> me that -a works but -I should fail in this case.
> 
> What would be the right fix here? Relaxing the test in the kernel to not
> require the raid_disk numbers match up for a bitmap raid, or should
> mdadm be taught to examine the raids and set the expected disk number
> before submitting the add_new_disk ioctl?
> 
> Cheers,
> Jes

Does this patch fix it?

http://neil.brown.name/git?p=mdadm;a=commitdiff;h=69fe207ed68e560d76a592fd86af32a9d1deca25

I found it in a collection of half-forgotten patches recently and decided it
was almost certainly correct, but I didn't remember what motivated it.

It is entirely possible that it was seeing something like the problem you
mention.

The comment for that patch says "This is particularly important for getting
info.disk.state correct", but maybe it is equally important for getting 
info.disk.raid_disk correct.

So I think your last suggestion is right: "mdadm be taught to examine the
raids and set the expected disk number" - and that patch should do the trick.

(and thanks for the reminder to reply to this).

Note that you might need
   POLICY action=re-add
in mdadm.conf for this to work.  I don't think it is a given that when a
recently failed disk is found that it should always be re-added.  So if the
admin wants that it is reasonable to require the policy be explicitly stated.

But I'm not sure if this is currently enforced....


NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: re-adding a disk to a raid1 array with bitmap
  2012-03-28  0:40 ` NeilBrown
@ 2012-03-28 15:17   ` Jes Sorensen
  0 siblings, 0 replies; 3+ messages in thread
From: Jes Sorensen @ 2012-03-28 15:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: Doug Ledford, linux-raid@vger.kernel.org

On 03/28/12 02:40, NeilBrown wrote:
> On Fri, 09 Mar 2012 17:07:42 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
> wrote:
> 
>> > Neil,
>> > 
>> > I have been spinning my head over this for a bit trying to figure out
>> > what is the right solution to this problem.
>> > 
>> > In bedd86b7773fd97f0d708cc0c371c8963ba7ba9a you added a test to reject
>> > re-adding a drive to an array in some cases.
>> > 
>> > The problem I have been looking at is if one has a raid1 with a bitmap.
>> > Basically in the situation where we have one of the drives pulled from
>> > the array, then if I try to add it back, it fails like this:
>> > 
>> > [root@monkeybay ~]#  mdadm -I --run /dev/sdf5
>> > mdadm: failed to add /dev/sdf5 to /dev/md32: Invalid argument.
>> > 
>> > However this works:
>> > 
>> > [root@monkeybay ~]# mdadm -a /dev/md32 /dev/sdf5
>> > mdadm: re-added /dev/sdf5
>> > 
>> > I dug through the kernel and it shows up that the failure is due to this
>> > test in the above mentioned commit:
>> > 
>> > +                    rdev->raid_disk != info->raid_disk)) {
>> > 
>> > So basically when doing -I it seems the disk itself expects to be
>> > raid_disk = 0, whereas the kernel expects it should be raid_disk = 1.
>> > 
>> > I agree with the previous discussion that it makes sense to reject a
>> > drive in the normal case without a bitmap. However it seems illogical to
>> > me that -a works but -I should fail in this case.
>> > 
>> > What would be the right fix here? Relaxing the test in the kernel to not
>> > require the raid_disk numbers match up for a bitmap raid, or should
>> > mdadm be taught to examine the raids and set the expected disk number
>> > before submitting the add_new_disk ioctl?
> Does this patch fix it?
> 
> http://neil.brown.name/git?p=mdadm;a=commitdiff;h=69fe207ed68e560d76a592fd86af32a9d1deca25
> 
> I found it in a collection of half-forgotten patches recently and decided it
> was almost certainly correct, but I didn't remember what motivated it.
> 
> It is entirely possible that it was seeing something like the problem you
> mention.
> 
> The comment for that patch says "This is particularly important for getting
> info.disk.state correct", but maybe it is equally important for getting 
> info.disk.raid_disk correct.
> 
> So I think your last suggestion is right: "mdadm be taught to examine the
> raids and set the expected disk number" - and that patch should do the trick.
> 
> (and thanks for the reminder to reply to this).
> 
> Note that you might need
>    POLICY action=re-add
> in mdadm.conf for this to work.  I don't think it is a given that when a
> recently failed disk is found that it should always be re-added.  So if the
> admin wants that it is reasonable to require the policy be explicitly stated.
> 
> But I'm not sure if this is currently enforced....

Voila! Yes it does indeed. It works even without adding the POLICY
statement to mdadm.conf

Thanks, I love getting bugfixes delivered like this :)

Cheers,
Jes

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-03-28 15:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-09 16:07 re-adding a disk to a raid1 array with bitmap Jes Sorensen
2012-03-28  0:40 ` NeilBrown
2012-03-28 15:17   ` Jes Sorensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).