linux-raid.vger.kernel.org archive mirror
* mdadm 3.1.1  fails to hot remove device - No such device or address
From: Foster_Brian @ 2010-02-26 13:37 UTC
  To: linux-raid

Hi Neil,

We run mdadm in a NAS framework and recently updated to 3.1.1 after
using older revs for quite some time. Since the update we have been
unable to hot remove a failed device from an array when that member
device has been physically removed from the system.
'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address"
error.

It turns out this occurs due to the dev_open() call added in the code
referenced below. The hot remove works as expected if we revert this
change with the patch shown below. Was the dev_open() added for some
functional reason I'm not aware of (i.e., are we now breaking some other
error path by doing this)? For future reference, is there a better way
to handle the situation where the member device is physically gone? Note
that we currently have a static set of devnodes; no udev or anything
like that. Thanks.

Brian

diff -urpN mdadm-3.1.1/Manage.c mdadm-3.1.1_b/Manage.c
--- mdadm-3.1.1/Manage.c	2009-11-19 00:13:29.000000000 -0500
+++ mdadm-3.1.1_b/Manage.c	2010-02-26 07:51:24.000000000 -0500
@@ -424,19 +424,12 @@ int Manage_subdevs(char *devname, int fd
 		} else {
 			j = 0;
 
-			tfd = dev_open(dv->devname, O_RDONLY);
-			if (tfd < 0 || fstat(tfd, &stb) != 0) {
-				fprintf(stderr, Name ": cannot find %s: %s\n",
-					dv->devname, strerror(errno));
-				if (tfd >= 0)
-					close(tfd);
+			if (stat(dv->devname, &stb)) {
+				fprintf(stderr, Name ": cannot find %s: %s\n", dv->devname, strerror(errno));
 				return 1;
 			}
-			close(tfd);
 			if ((stb.st_mode & S_IFMT) != S_IFBLK) {
-				fprintf(stderr, Name ": %s is not a "
-					"block device.\n",
-					dv->devname);
+				fprintf(stderr, Name ": %s is not a block device.\n", dv->devname);
 				return 1;
 			}
 		}
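
The difference between the two checks in the patch can be reproduced
outside mdadm. The following is a minimal sketch, not mdadm code:
probe_node() and struct probe_result are hypothetical names introduced
only for illustration. It records the errno from stat() and from open()
on the same path. On a static device node whose disk has been pulled,
one would expect stat() to succeed (the inode still exists) while
open() fails with ENXIO, whose strerror() text is "No such device or
address".

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type, not part of mdadm. */
struct probe_result {
	int stat_errno;	/* 0 if stat() succeeded, otherwise its errno */
	int open_errno;	/* 0 if open() succeeded, otherwise its errno */
	int is_block;	/* 1 if stat() succeeded and reports a block device */
};

/* Run both checks on one path so their results can be compared. */
struct probe_result probe_node(const char *path)
{
	struct probe_result r = { 0, 0, 0 };
	struct stat stb;
	int fd;

	/* stat() only inspects the inode; it never touches the driver. */
	if (stat(path, &stb) != 0)
		r.stat_errno = errno;
	else
		r.is_block = S_ISBLK(stb.st_mode) ? 1 : 0;

	/* open() reaches the driver and can fail even if the node exists. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		r.open_errno = errno;
	else
		close(fd);

	return r;
}
```

A result with stat_errno == 0, is_block == 1 and open_errno == ENXIO
would correspond to the situation described in this report.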



* Re: mdadm 3.1.1  fails to hot remove device - No such device or address
From: Robin Hill @ 2010-02-26 14:42 UTC
  To: linux-raid


On Fri Feb 26, 2010 at 08:37:51AM -0500, Foster_Brian@emc.com wrote:

> Hi Neil,
> 
> We run mdadm in a NAS framework and recently updated to 3.1.1 after
> using older revs for quite some time. We recently observed an issue
> where we've been unable to hot remove a failed device from an array,
> where that member device has been physically removed from the system.
> 'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address
> error."
> 
> It turns out this occurs due to the dev_open() call added in the code
> referenced below. The hot remove works as expected if we revert this
> change with the patch shown below. Was the dev_open() added for some
> functional reason I'm not aware of (i.e., are we now breaking some other
> error path by doing this)? For future reference, is there a better way
> to handle the situation where the member device is physically gone? Note
> that we currently have a static set of devnodes; no udev or anything
> like that. Thanks.
> 
Does "-r failed" or "-r detached" not work?  That should be the easiest
way to remove failed/detached drives from the array.
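
The reason the keywords sidestep the problem is that they do not name a
device node, so nothing has to be opened by path. A rough sketch of that
distinction follows; the enum and classify_remove_arg() are
hypothetical names for illustration and are not taken from mdadm's
source.

```c
#include <string.h>

/* Hypothetical classification of the argument given to -r. */
enum remove_target {
	REMOVE_FAILED,		/* keyword "failed" */
	REMOVE_DETACHED,	/* keyword "detached" */
	REMOVE_NAMED_DEVICE	/* anything else is treated as a device name */
};

/* Decide whether the argument is a keyword or a device name. */
enum remove_target classify_remove_arg(const char *arg)
{
	if (strcmp(arg, "failed") == 0)
		return REMOVE_FAILED;
	if (strcmp(arg, "detached") == 0)
		return REMOVE_DETACHED;
	return REMOVE_NAMED_DEVICE;
}
```

Only the REMOVE_NAMED_DEVICE case would need a usable device node; the
two keyword cases can be resolved from the array's own member list.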

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |


* RE: mdadm 3.1.1  fails to hot remove device - No such device or address
From: Foster_Brian @ 2010-02-26 15:41 UTC
  To: linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Robin Hill
> Sent: Friday, February 26, 2010 9:42 AM
> To: linux-raid@vger.kernel.org
> Subject: Re: mdadm 3.1.1 fails to hot remove device - No such device or address
> 
> On Fri Feb 26, 2010 at 08:37:51AM -0500, Foster_Brian@emc.com wrote:
> 
> > Hi Neil,
> >
> > We run mdadm in a NAS framework and recently updated to 3.1.1 after
> > using older revs for quite some time. We recently observed an issue
> > where we've been unable to hot remove a failed device from an array,
> > where that member device has been physically removed from the system.
> > 'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address
> > error."
> >
> > It turns out this occurs due to the dev_open() call added in the code
> > referenced below. The hot remove works as expected if we revert this
> > change with the patch shown below. Was the dev_open() added for some
> > functional reason I'm not aware of (i.e., are we now breaking some
> > other error path by doing this)? For future reference, is there a
> > better way to handle the situation where the member device is
> > physically gone? Note that we currently have a static set of devnodes;
> > no udev or anything like that. Thanks.
> >
> Does "-r failed" or "-r detached" not work?  That should be the easiest
> way to remove failed/detached drives from the array.
> 

Hmm, this sounds like what I'm missing. I'll try it when I have access
to the system again. Thanks!

Brian


* Re: mdadm 3.1.1  fails to hot remove device - No such device or address
From: Neil Brown @ 2010-02-26 20:42 UTC
  To: Foster_Brian; +Cc: linux-raid

On Fri, 26 Feb 2010 08:37:51 -0500
Foster_Brian@emc.com wrote:

> Hi Neil,
> 
> We run mdadm in a NAS framework and recently updated to 3.1.1 after
> using older revs for quite some time. We recently observed an issue
> where we've been unable to hot remove a failed device from an array,
> where that member device has been physically removed from the system.
> 'mdadm /dev/md# -r /dev/sdg#' returns a "No such device or address
> error."
> 
> It turns out this occurs due to the dev_open() call added in the code
> referenced below. The hot remove works as expected if we revert this
> change with the patch shown below. Was the dev_open() added for some
> functional reason I'm not aware of (i.e., are we now breaking some other
> error path by doing this)? For future reference, is there a better way
> to handle the situation where the member device is physically gone? Note
> that we currently have a static set of devnodes; no udev or anything
> like that. Thanks.

Thanks for the report.  I have put it on my list of things to check before
releasing 3.1.2.  The switch to use 'dev_open' was to allow ->devname to be
e.g. "8:32" which is used by the spare-group code in Monitor.c.
However it is a regression and I will give some thought to fixing it.
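
To illustrate the "8:32" form mentioned above, here is a small sketch
of recognising a MAJOR:MINOR string and turning it into a device
number. It is not the dev_open() implementation; parse_major_minor()
is a hypothetical name, and the code assumes Linux with glibc for
makedev(), major() and minor() from <sys/sysmacros.h>.

```c
#include <ctype.h>
#include <stdlib.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 and fills *out if 'name' has the
 * form "MAJOR:MINOR" (decimal digits only), otherwise returns 0 and
 * leaves *out untouched.
 */
int parse_major_minor(const char *name, dev_t *out)
{
	char *end;
	unsigned long maj, min;

	/* Reject NULL, and signs or whitespace that strtoul() would accept. */
	if (name == NULL || !isdigit((unsigned char)name[0]))
		return 0;
	maj = strtoul(name, &end, 10);
	if (*end != ':')
		return 0;

	name = end + 1;
	if (!isdigit((unsigned char)name[0]))
		return 0;
	min = strtoul(name, &end, 10);
	if (*end != '\0')
		return 0;

	*out = makedev(maj, min);
	return 1;
}
```

An ordinary path such as /dev/sdg1 fails the first digit check, so a
caller could fall back to treating the argument as a file name.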

That said, the "recommended" way of removing devices which have been detached is:
   mdadm /dev/md# -r detached

NeilBrown


