"mdadm --remove" fails if it is too soon after "mdadm --fail"


* "mdadm --remove" fails if it is too soon after "mdadm --fail"
From: Darius S. Naqvi @ 2009-10-14 15:32 UTC
  To: linux-raid

Hello,

I need to automate removing a device from a RAID 1 array.  The way to
do this is

       mdadm <raid1> --fail <device>

followed by

       mdadm <raid1> --remove <device>

The problem is that if these commands are issued by, say, compiled C
code, the --remove can fail, apparently because it is too soon after
the --fail.  Doing retries in a loop with sleep() calls seems to work
around most instances, but it seems that in very exceptional
circumstances, waiting a few seconds is not enough.  In that case, the
--remove fails because the HOT_REMOVE_DISK ioctl failed with EBUSY.
Digging into the source in the kernel tree, it looks like the relevant
code is in drivers/md/md.c::hot_remove_disk(), and the only way this
returns EBUSY appears to be if the "raid_disk" field in the mdk_rdev_t
struct for the device is non-negative, i.e., is a valid index into the
array of devices for the raid1.

The --fail operation issues a SET_DISK_FAULTY ioctl, which looks like
it winds up in drivers/md/md.c::set_disk_faulty(), which calls
drivers/md/md.c::md_error(), which calls an error_handler specific to
the type of raid.  In the raid1 case, that winds up being
drivers/md/raid1.c::error(), which (among other things) sets the
"Faulty" bit in the device.  And it looks like this "Faulty" bit is
translated into "raid_disk" being -1 in
drivers/md/md.c::remove_and_add_spares().  This function (i.e.,
remove_and_add_spares()) is only called from
drivers/md/md.c::md_check_recovery().  And, finally,
md_check_recovery() has comments indicating that it is "regularly
called by all per-raid-array threads".
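
The two-stage state change described above can be sketched as a toy
model.  This is only an illustration of the reasoning in this message,
not kernel code: the class ToyMember and its method names are
hypothetical, and the model assumes that "Faulty" and "raid_disk" are
the only state involved.

```python
import errno


class ToyMember:
    """Toy stand-in for one array member; not the kernel's data structure."""

    def __init__(self, raid_disk):
        self.raid_disk = raid_disk  # >= 0 means the member still holds a slot
        self.faulty = False

    def set_disk_faulty(self):
        # Models --fail: it only marks the member Faulty.
        self.faulty = True

    def check_recovery(self):
        # Models the later pass by the array's own thread: a Faulty
        # member is detached from its slot (raid_disk becomes -1).
        if self.faulty and self.raid_disk >= 0:
            self.raid_disk = -1

    def hot_remove_disk(self):
        # Models --remove: refused while the member still holds a slot.
        if self.raid_disk >= 0:
            return -errno.EBUSY
        return 0
```

In this model, calling hot_remove_disk() right after set_disk_faulty()
returns -EBUSY, and it only returns 0 once check_recovery() has run in
between, which is the window the retries are trying to wait out.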

I.e., it seems that the ioctl invoked by --fail doesn't directly set
up the device to be ready for --remove, but some other kernel thread
completes that state change.  I'm wondering if it could be the case
that when the system is very, very busy, it could take long enough for
that kernel thread to run that it would cause what I see, i.e.,
--remove fails with EBUSY, even though I've already waited about 20
seconds for the device to be ready to be removed.  If this is so, what
shall I do?  Here are the options I can think of:

1) sleep() for even longer, perhaps by increasing the sleep() on each retry

2) run a later version of the md system and/or kernel in which this
      timing window is eliminated (or reduced to a reasonably short length)

3) something else?
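
Option 1 could look roughly like the following sketch, which grows the
delay on each retry and gives up after an overall deadline.  The
function names, the default delays and the 60-second deadline are
assumptions chosen for illustration; the only external command used is
"mdadm <raid1> --remove <device>" as shown earlier.

```python
import subprocess
import time


def retry_with_backoff(attempt, first_delay=0.1, factor=2.0, max_delay=5.0,
                       deadline=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call attempt() until it returns True or the deadline would be passed.

    Returns True on success, False if the time budget ran out first.
    """
    start = clock()
    delay = first_delay
    while True:
        if attempt():
            return True
        # Stop if the next sleep would take us past the overall deadline.
        if clock() - start + delay > deadline:
            return False
        sleep(delay)
        # Grow the delay on each retry, up to a ceiling.
        delay = min(delay * factor, max_delay)


def mdadm_remove(array, device):
    """Make one removal attempt; True if mdadm exited with status 0."""
    result = subprocess.run(["mdadm", array, "--remove", device],
                            capture_output=True, text=True)
    return result.returncode == 0
```

A caller would use it as, for example,
retry_with_backoff(lambda: mdadm_remove("/dev/md0", "/dev/sdb1")),
where both device names are placeholders.  Passing sleep and clock as
parameters is only there so the loop can be exercised without real
delays.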


-- 
Darius S. Naqvi
dnaqvi@datagardens.com
http://www.datagardens.com


* Re: "mdadm --remove" fails if it is too soon after "mdadm --fail"
From: Darius S. Naqvi @ 2009-10-14 18:38 UTC
  To: linux-raid

On Wed, 14 Oct 2009, Darius S. Naqvi wrote:

> I.e., it seems that the ioctl invoked by --fail doesn't directly set
> up the device to be ready for --remove, but some other kernel thread
> completes that state change.  I'm wondering if it could be the case
> that when the system is very, very busy, it could take long enough for
> that kernel thread to run that it would cause what I see, i.e.,
> --remove fails with EBUSY, even though I've already waited about 20
> seconds for the device to be ready to be removed.  If this is so, what
> shall I do?  Here are the options I can think of:

Sorry to reply to my own posting.  It turns out that in this case,
I've only waited 2.5 seconds.  This may affect the probability of my
hunch being correct.

-- 
Darius S. Naqvi
dnaqvi@datagardens.com
http://www.datagardens.com


* Re: "mdadm --remove" fails if it is too soon after "mdadm --fail"
From: NeilBrown @ 2009-10-15  1:49 UTC
  To: Darius S. Naqvi; +Cc: linux-raid

On Thu, October 15, 2009 5:38 am, Darius S. Naqvi wrote:
> On Wed, 14 Oct 2009, Darius S. Naqvi wrote:
>
>> I.e., it seems that the ioctl invoked by --fail doesn't directly set
>> up the device to be ready for --remove, but some other kernel thread
>> completes that state change.  I'm wondering if it could be the case
>> that when the system is very, very busy, it could take long enough for
>> that kernel thread to run that it would cause what I see, i.e.,
>> --remove fails with EBUSY, even though I've already waited about 20
>> seconds for the device to be ready to be removed.  If this is so, what
>> shall I do?  Here are the options I can think of:
>
> Sorry to reply to my own posting.  It turns out that in this case,
> I've only waited 2.5 seconds.  This may affect the probability of my
> hunch being correct.

2.5 seconds certainly seems more believable than 20 seconds.
Waiting for the kernel thread to run is not the only cause for delay.
If there are any pending IO requests, you have to wait for all of those
to complete before the device can be removed from the array.
As error handling can take an arbitrarily long time, there can be
an arbitrary delay between a device being marked faulty and it being
able to be removed from the array.

So probably the best bet is simply to wait and retry as you are doing.
If I were to make it more deterministic, I would probably allow you to
'poll' or 'select' on the sysfs file /sys/block/mdX/md/dev-YYY/slot
and once that becomes 'none', the device can be removed.
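
Until something like that exists, a userspace approximation is to
re-read the slot file at intervals and attempt the removal only once it
reads 'none'.  The sketch below does plain periodic reads, not poll()
or select(); the function names, timeout and interval are assumptions,
and it presumes the slot file is present under the path given above.

```python
import time
from pathlib import Path


def slot_path(md_name, member_name):
    """Build /sys/block/<md_name>/md/dev-<member_name>/slot."""
    return Path("/sys/block") / md_name / "md" / f"dev-{member_name}" / "slot"


def wait_for_slot_none(path, timeout=60.0, interval=0.1,
                       sleep=time.sleep, clock=time.monotonic):
    """Re-read the slot file until it contains 'none' or the timeout expires.

    Returns True once the file reads 'none', False on timeout.  A missing
    file raises FileNotFoundError, which is left to the caller to handle.
    """
    deadline = clock() + timeout
    while True:
        value = Path(path).read_text().strip()
        if value == "none":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, wait_for_slot_none(slot_path("md0", "sdb1")) would watch a
member named sdb1 of md0; both names are placeholders.  The --remove
may still need a retry afterwards, since this only narrows the window.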

NeilBrown

