From: Brett Russ <bruss@netezza.com>
To: linux-raid@vger.kernel.org
Subject: Re: bug/race in md causing device to wedge in busy state
Date: Tue, 22 Dec 2009 16:48:53 -0500
Message-ID: <hgres6$20r$1@ger.gmane.org>
In-Reply-To: <4B2983AE.8020002@netezza.com>

On 12/16/2009 08:04 PM, Brett Russ wrote:
> I'm seeing cases where an attempted remove of a manually faulted disk
> from an existing RAID unit can fail with mdadm reporting "Device or
> resource busy". I've reduced the problem down to the smallest set that
> reliably reproduces the issue:
>
> Starting with 2 drives (a,b), each with at least 3 partitions:
> 1) create 3 raid1 md's on the drives using the 3 partitions
> 2) fault & remove drive b from each of the 3 md's
> 3) zero the superblock on b so it forgets where it came from (or use a
> third drive c...) and add drive b back to each of the 3 md's
> 4) fault & remove drive b from each of the 3 md's
>
> The problem was originally seen sporadically during the remove part of
> step 2, but is *very* reproducible in the remove part of step 4. I
> attribute this to the fact that there's guaranteed I/O happening during
> this step.
>
> Now here's the catch. If I change step 4 to:
> 4a) fault drive b from each of the 3 md's
> 4b) remove drive b from each of the 3 md's
> then the removes have not yet been seen to fail with BUSY (i.e. no
> issues).
>
> But my scripts currently do this instead for each md:
> 4a) fault drive b from md
> 4b) sleep 0-10 seconds
> 4c) remove drive b from md
> which will fail on the remove from one of the md's, almost guaranteed.
> It seems odd to me that no amount of sleeping in between these steps can
> allow me to reliably remove a faulted member of an array.
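For anyone trying to reproduce this, the quoted steps can be sketched as a dry-run Python script. This is a minimal sketch, not the script the report refers to. The following are all assumptions:

- the device names (/dev/md0-2, /dev/sda1-3, /dev/sdb1-3);
- the hypothetical helper names repro_steps, fail_remove_batched and fail_remove_interleaved;
- the fixed `delay` standing in for the 0-10 second sleep.

It uses only standard mdadm options (`--create`, `--level`, `--raid-devices`, `--run`, `--manage` with `--fail`/`--remove`/`--add`, and `--zero-superblock`). It prints the commands instead of running them.

```python
"""Dry-run sketch of the reproduction quoted above. It only builds mdadm
command lines (argv lists); nothing is executed. Device names and helper
names are hypothetical placeholders."""
import shlex
from typing import List, Sequence

Cmd = List[str]


def fail_remove_batched(mds: Sequence[str], b_parts: Sequence[str]) -> List[Cmd]:
    """Step 4 as 4a/4b: fail b in every array first, then remove it from each."""
    fails = [["mdadm", "--manage", md, "--fail", b] for md, b in zip(mds, b_parts)]
    removes = [["mdadm", "--manage", md, "--remove", b] for md, b in zip(mds, b_parts)]
    return fails + removes


def fail_remove_interleaved(mds: Sequence[str], b_parts: Sequence[str],
                            delay: int) -> List[Cmd]:
    """Step 4 as the script does it: per array, fail, sleep, then remove."""
    cmds: List[Cmd] = []
    for md, b in zip(mds, b_parts):
        cmds.append(["mdadm", "--manage", md, "--fail", b])
        cmds.append(["sleep", str(delay)])  # fixed stand-in for "0-10 seconds"
        cmds.append(["mdadm", "--manage", md, "--remove", b])
    return cmds


def repro_steps(mds: Sequence[str], a_parts: Sequence[str],
                b_parts: Sequence[str], delay: int = 5) -> List[Cmd]:
    """Steps 1-4, with step 4 in the interleaved (failing) order."""
    if not len(mds) == len(a_parts) == len(b_parts):
        raise ValueError("need one a/b partition pair per md device")
    cmds: List[Cmd] = []
    # Step 1: one two-disk raid1 per partition pair; --run skips the prompt.
    for md, a, b in zip(mds, a_parts, b_parts):
        cmds.append(["mdadm", "--create", md, "--level=1", "--raid-devices=2",
                     "--run", a, b])
    # Step 2: fault and remove drive b's member from each array.
    for md, b in zip(mds, b_parts):
        cmds.append(["mdadm", "--manage", md, "--fail", b])
        cmds.append(["mdadm", "--manage", md, "--remove", b])
    # Step 3: forget b's old membership, then add it back (starts recovery I/O).
    for md, b in zip(mds, b_parts):
        cmds.append(["mdadm", "--zero-superblock", b])
        cmds.append(["mdadm", "--manage", md, "--add", b])
    # Step 4: fault and remove b again; swap in fail_remove_batched() to compare.
    cmds += fail_remove_interleaved(mds, b_parts, delay)
    return cmds


if __name__ == "__main__":
    mds = ["/dev/md0", "/dev/md1", "/dev/md2"]
    a = ["/dev/sda1", "/dev/sda2", "/dev/sda3"]
    b = ["/dev/sdb1", "/dev/sdb2", "/dev/sdb3"]
    for cmd in repro_steps(mds, a, b):
        print(shlex.join(cmd))
```

Piping the printed lines into a shell would run them against real disks, so only do that on scratch drives. To get the 4a/4b ordering that has not been seen to fail, substitute `fail_remove_batched()` for `fail_remove_interleaved()` in step 4.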

Neil et al,

Would you expect to see a dependency across md devices on the same
spindle that could affect a device remove like this?

I have to assume it's a bug, since the condition doesn't clear up even
after removing the rest of the devices on the spindle; i.e., the
partition permanently reports busy.

-Brett