* mdadm 3.1.1 fails to hot remove device - No such device or address
@ 2010-02-26 13:37 Foster_Brian
2010-02-26 14:42 ` Robin Hill
2010-02-26 20:42 ` Neil Brown
From: Foster_Brian @ 2010-02-26 13:37 UTC
To: linux-raid
Hi Neil,
We run mdadm in a NAS framework and recently updated to 3.1.1 after
using older revs for quite some time. We recently observed an issue
where we've been unable to hot remove a failed device from an array,
where that member device has been physically removed from the system.
'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address"
error.
It turns out this occurs due to the dev_open() call added in the code
referenced below. The hot remove works as expected if we revert this
change with the patch shown below. Was the dev_open() added for some
functional reason I'm not aware of (i.e., are we now breaking some other
error path by doing this)? For future reference, is there a better way
to handle the situation where the member device is physically gone? Note
that we currently have a static set of devnodes; no udev or anything
like that. Thanks.
Brian
diff -urpN mdadm-3.1.1/Manage.c mdadm-3.1.1_b/Manage.c
--- mdadm-3.1.1/Manage.c 2009-11-19 00:13:29.000000000 -0500
+++ mdadm-3.1.1_b/Manage.c 2010-02-26 07:51:24.000000000 -0500
@@ -424,19 +424,12 @@ int Manage_subdevs(char *devname, int fd
} else {
j = 0;
- tfd = dev_open(dv->devname, O_RDONLY);
- if (tfd < 0 || fstat(tfd, &stb) != 0) {
- fprintf(stderr, Name ": cannot find %s: %s\n",
- dv->devname, strerror(errno));
- if (tfd >= 0)
- close(tfd);
+ if (stat(dv->devname, &stb)) {
+ fprintf(stderr, Name ": cannot find %s: %s\n", dv->devname, strerror(errno));
return 1;
}
- close(tfd);
if ((stb.st_mode & S_IFMT) != S_IFBLK) {
- fprintf(stderr, Name ": %s is not a "
- "block device.\n",
- dv->devname);
+ fprintf(stderr, Name ": %s is not a block device.\n", dv->devname);
return 1;
}
}
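The difference between the two checks can be illustrated with a small probe. This is a minimal sketch, not mdadm code: probe_member() and enum probe_result are hypothetical names. It assumes, as in the setup described above, a static /dev where the node for a pulled disk survives, so stat() still succeeds while open() reaches the driver and fails with ENXIO, whose strerror() text is the "No such device or address" message reported above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only; not part of mdadm. */
enum probe_result {
    PROBE_PRESENT,   /* node exists and the device accepts open() */
    PROBE_MISSING,   /* stat() failed: no node at this path */
    PROBE_NOT_BLOCK, /* path exists but is not a block device */
    PROBE_DETACHED,  /* node exists, but open() fails with ENXIO or ENODEV */
    PROBE_ERROR      /* open() failed for some other reason */
};

/* Classify a member path along the lines where the two checks differ. */
enum probe_result probe_member(const char *path)
{
    struct stat stb;
    int fd;

    /* stat() only inspects the device node and never talks to the
     * driver, so with a static /dev it still succeeds for a pulled disk. */
    if (stat(path, &stb) != 0)
        return PROBE_MISSING;
    if (!S_ISBLK(stb.st_mode))
        return PROBE_NOT_BLOCK;

    /* open() goes through to the block driver; with the disk gone it is
     * assumed to fail with ENXIO ("No such device or address"). */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (errno == ENXIO || errno == ENODEV) ? PROBE_DETACHED
                                                   : PROBE_ERROR;
    close(fd);
    return PROBE_PRESENT;
}
```

In these terms, the 3.1.1 check (dev_open() followed by fstat()) only lets PROBE_PRESENT through, while the patched check (stat() plus the S_IFBLK test) accepts any block-device node that stat() can see, including PROBE_DETACHED. That is why reverting the change makes the hot remove work again.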
* Re: mdadm 3.1.1 fails to hot remove device - No such device or address
From: Robin Hill @ 2010-02-26 14:42 UTC
To: linux-raid
On Fri Feb 26, 2010 at 08:37:51AM -0500, Foster_Brian@emc.com wrote:
> Hi Neil,
>
> We run mdadm in a NAS framework and recently updated to 3.1.1 after
> using older revs for quite some time. We recently observed an issue
> where we've been unable to hot remove a failed device from an array,
> where that member device has been physically removed from the system.
> 'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address"
> error.
>
> It turns out this occurs due to the dev_open() call added in the code
> referenced below. The hot remove works as expected if we revert this
> change with the patch shown below. Was the dev_open() added for some
> functional reason I'm not aware of (i.e., are we now breaking some other
> error path by doing this)? For future reference, is there a better way
> to handle the situation where the member device is physically gone? Note
> that we currently have a static set of devnodes; no udev or anything
> like that. Thanks.
>
Does "-r failed" or "-r detached" not work? That should be the easiest
way to remove failed/detached drives from the array.
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
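The two keywords Robin mentions can be modeled with a toy selector. This sketch follows the mdadm(8) man page's description ("failed" picks members marked faulty; "detached" picks members whose device is no longer connected, i.e. open() returns ENXIO) rather than mdadm's source. struct member and selects() are hypothetical, and it assumes the kernel will only hot-remove a member that is not active, i.e. one that is faulty or a spare.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified member record for this sketch only. */
struct member {
    bool faulty;     /* marked failed in the array */
    bool spare;      /* present but not an active member */
    bool open_enxio; /* opening its device node returned ENXIO */
};

/* Return true if `keyword` ("failed" or "detached") would select `m`
 * for hot removal in this model. */
bool selects(const char *keyword, const struct member *m)
{
    /* Assumption: only inactive members (faulty or spare) are removable. */
    bool removable = m->faulty || m->spare;

    if (strcmp(keyword, "failed") == 0)
        return m->faulty;
    if (strcmp(keyword, "detached") == 0)
        return m->open_enxio && removable;
    return false; /* an ordinary device path is handled elsewhere */
}
```

In this model a pulled disk that is still an active member is not picked up by "-r detached"; it would first have to be marked failed (the mdadm(8) man page describes accepting "detached" with --fail for that purpose).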
* RE: mdadm 3.1.1 fails to hot remove device - No such device or address
From: Foster_Brian @ 2010-02-26 15:41 UTC
To: linux-raid
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Robin Hill
> Sent: Friday, February 26, 2010 9:42 AM
> To: linux-raid@vger.kernel.org
> Subject: Re: mdadm 3.1.1 fails to hot remove device - No such device or address
>
> On Fri Feb 26, 2010 at 08:37:51AM -0500, Foster_Brian@emc.com wrote:
>
> > Hi Neil,
> >
> > We run mdadm in a NAS framework and recently updated to 3.1.1 after
> > using older revs for quite some time. We recently observed an issue
> > where we've been unable to hot remove a failed device from an array,
> > where that member device has been physically removed from the system.
> > 'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address"
> > error.
> >
> > It turns out this occurs due to the dev_open() call added in the code
> > referenced below. The hot remove works as expected if we revert this
> > change with the patch shown below. Was the dev_open() added for some
> > functional reason I'm not aware of (i.e., are we now breaking some
> > other error path by doing this)? For future reference, is there a
> > better way to handle the situation where the member device is
> > physically gone? Note that we currently have a static set of devnodes;
> > no udev or anything like that. Thanks.
> >
> Does "-r failed" or "-r detached" not work? That should be the easiest
> way to remove failed/detached drives from the array.
>
Hmm, this sounds like what I'm missing. I'll try it when I have access
to the system again. Thanks!
Brian
* Re: mdadm 3.1.1 fails to hot remove device - No such device or address
From: Neil Brown @ 2010-02-26 20:42 UTC
To: Foster_Brian; +Cc: linux-raid
On Fri, 26 Feb 2010 08:37:51 -0500
Foster_Brian@emc.com wrote:
> Hi Neil,
>
> We run mdadm in a NAS framework and recently updated to 3.1.1 after
> using older revs for quite some time. We recently observed an issue
> where we've been unable to hot remove a failed device from an array,
> where that member device has been physically removed from the system.
> 'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address"
> error.
>
> It turns out this occurs due to the dev_open() call added in the code
> referenced below. The hot remove works as expected if we revert this
> change with the patch shown below. Was the dev_open() added for some
> functional reason I'm not aware of (i.e., are we now breaking some other
> error path by doing this)? For future reference, is there a better way
> to handle the situation where the member device is physically gone? Note
> that we currently have a static set of devnodes; no udev or anything
> like that. Thanks.
Thanks for the report. I have put it on my list of things to check before
releasing 3.1.2. The switch to 'dev_open' was made to allow ->devname to be
a "major:minor" string such as "8:32", which the spare-group code in
Monitor.c uses.
Nonetheless, it is a regression and I will give some thought to fixing it.
That said, the "recommended" way of removing devices which have been detached is:
mdadm /dev/md# -r detached
NeilBrown
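Neil's point is that ->devname may be a "major:minor" pair rather than a path, which a plain stat() cannot resolve. A minimal sketch of turning either form into a device number without opening the device is below. resolve_devnum() is a hypothetical helper, not mdadm's dev_open(); the assumption is that, with a static /dev, the stat() branch keeps working for a pulled disk because it never reaches the driver.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a member name to a dev_t without opening
 * the device. Accepts "major:minor" (e.g. "8:32") or a path to a
 * block-device node. Returns 0 on success, -1 on failure. */
int resolve_devnum(const char *name, dev_t *out)
{
    unsigned int maj, min;
    char extra;
    struct stat stb;

    /* Exactly two conversions means "maj:min" with nothing trailing;
     * the %c conversion catches junk such as "8:32x". */
    if (sscanf(name, "%u:%u%c", &maj, &min, &extra) == 2) {
        *out = makedev(maj, min);
        return 0;
    }

    /* Otherwise treat it as a path; stat() needs only the node. */
    if (stat(name, &stb) != 0 || !S_ISBLK(stb.st_mode))
        return -1;
    *out = stb.st_rdev;
    return 0;
}
```

For example, resolve_devnum("8:32", &d) yields major 8, minor 32 without touching any device node, while a name like /dev/sdg1 goes through stat() and must refer to a block device.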