* [Patch mdadm] Add hot-unplug support to mdadm
@ 2010-04-05 16:40 Doug Ledford
2010-04-06 16:26 ` Doug Ledford
2010-04-07 1:30 ` Neil Brown
0 siblings, 2 replies; 17+ messages in thread
From: Doug Ledford @ 2010-04-05 16:40 UTC (permalink / raw)
To: Linux RAID Mailing List, Neil Brown, Dan Williams
[-- Attachment #1.1: Type: text/plain, Size: 1790 bytes --]
We have incremental mode for supporting hot plug of devices. However,
we don't support hot-unplug. This is something we need in order to
enable automatic device replacement. When implementing the support
though, I found that we had a problem in that the device special file is
already gone (well, not gone, but both open and lstat fail on the device
special file, which really makes Manage_subdevs unhappy) by the time
udev calls us to remove the device (even if I created a new udev rules
file with a very low number this was still true). So, along with adding
the code shamelessly stolen from Hawrylewicz with modifications to make
it work on non-container based arrays, I had to modify Manage_subdevs to
work on sysfs subdevice entries in the fail and remove cases. Anyway,
with this in place, and with the modified udev rules file in place,
hot-unplug now works (or at least it does in my testing, I didn't try a
container device, I'll leave tweaking that to Dan or someone else from
Intel, although it should more or less work, I don't know the Intel
metadata rules and so didn't try to make it work myself). And by
changing Manage_subdevs to use either a valid device special file or
sysfs entries, it will work on the command line with virtually any
kernel, and work from a udev rules file on kernels recent enough to have
the state entry in sysfs subdevs directories.
Oh, and Neil should review my man page additions to see if they are
sufficient for his tastes. I didn't get real wordy about the support,
it's pretty straight forward.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #1.2: mdadm-decremental.patch --]
[-- Type: text/plain, Size: 20289 bytes --]
commit 3212dcca06f81017ec9a50c7b5d4eeb34bedaab6
Author: Doug Ledford <dledford@redhat.com>
Date: Mon Apr 5 12:32:08 2010 -0400
Initial implementation of incremental remove support
Signed-off-by: Doug Ledford <dledford@redhat.com>
diff --git a/Incremental.c b/Incremental.c
index 7ad648a..d32a8e5 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -843,3 +843,39 @@ int Incremental_container(struct supertype *st, char *devname, int verbose,
map_unlock(&map);
return 0;
}
+
+/*
+ * IncrementalRemove - Attempt to see if the passed in device belongs to any
+ * raid arrays, and if so first fail (if needed) and then remove the device.
+ *
+ * @devname - The device we want to remove
+ *
+ * Special note: We would like to just use Managedevs to fail/remove the
+ * device, but unfortunately, by the time we are called via udev, the device
+ * special file is already gone, and so we can't stat the device and se we
+ * don't have the right rdev value to use in the ioctls. So, we use the
+ * sysfs method of device removal instead, but since that's not gauranteed
+ * to work depending on the version of kernel we run on, try to use the
+ * ioctl method first and only fallback if we don't have a valid device
+ * special file. That way we can support operation manually on older kernels
+ * even if we won't be able to do this automatically via udev on older
+ * kernels.
+ */
+int IncrementalRemove(char *devname, int verbose)
+{
+ char mddev[100] = "/dev/";
+ int mdfd;
+ struct mddev_dev_s devlist;
+
+ strncpy(mddev + 5, devname, sizeof(mddev) - 5);
+ if (mdstat_check_active(mddev + 5))
+ return 1;
+ if ((mdfd = open_mddev(mddev, 0)) < 0)
+ return 1;
+ memset(&devlist, 0, sizeof(devlist));
+ devlist.devname = devname;
+ devlist.disposition = 'f';
+ Manage_subdevs(mddev, mdfd, &devlist, verbose);
+ devlist.disposition = 'r';
+ return Manage_subdevs(mddev, mdfd, &devlist, verbose);
+}
diff --git a/Manage.c b/Manage.c
index f848d8b..6539eda 100644
--- a/Manage.c
+++ b/Manage.c
@@ -346,6 +346,9 @@ int Manage_subdevs(char *devname, int fd,
mdu_disk_info_t disc;
unsigned long long array_size;
mddev_dev_t dv, next = NULL;
+ struct mdinfo *mdi = NULL;
+ struct mdinfo *dev = NULL;
+ char sys_name[20] = "dev-";
struct stat stb;
int j, jnext = 0;
int tfd;
@@ -443,16 +446,43 @@ int Manage_subdevs(char *devname, int fd,
if (jnext == 0)
continue;
} else {
+ /*
+ * For fail/remove operations, allow the disk
+ * to be completely missing, use name matching
+ * to a device in our sysfs entries to
+ * suffice. For add we need a valid block device.
+ * Leave this loop one of three ways:
+ * 1) tfd < 0 and dev is set to our device
+ * 2) tfd >= 0 and dev is NULL
+ * 3) failed to find suitable device and return
+ */
j = 0;
tfd = dev_open(dv->devname, O_RDONLY);
- if (tfd < 0 && dv->disposition == 'r' &&
- lstat(dv->devname, &stb) == 0)
- /* Be happy, the lstat worked, that is
- * enough for --remove
- */
- ;
- else {
+ if (tfd < 0 && dv->disposition != 'a') {
+ strcpy(&sys_name[4],
+ strrchr(dv->devname, '/') + 1);
+ mdi = sysfs_read(fd, 0,
+ GET_DEVS | KEEP_GONE_DEVS);
+ if (!mdi) {
+ fprintf(stderr, Name ": can't open %s "
+ "and can't read sysfs info\n",
+ dv->devname);
+ return 1;
+ }
+ for (dev = mdi->devs; dev; dev = dev->next) {
+ if (strcmp(sys_name, dev->sys_name))
+ continue;
+ break;
+ }
+ if (!dev) {
+ fprintf(stderr, Name ": can't open %s "
+ "and %s not listed in sysfs\n",
+ dv->devname, sys_name);
+ sysfs_free(mdi);
+ return 1;
+ }
+ } else {
if (tfd < 0 || fstat(tfd, &stb) != 0) {
fprintf(stderr, Name ": cannot find %s: %s\n",
dv->devname, strerror(errno));
@@ -461,12 +491,12 @@ int Manage_subdevs(char *devname, int fd,
return 1;
}
close(tfd);
- }
- if ((stb.st_mode & S_IFMT) != S_IFBLK) {
- fprintf(stderr, Name ": %s is not a "
- "block device.\n",
- dv->devname);
- return 1;
+ if ((stb.st_mode & S_IFMT) != S_IFBLK) {
+ fprintf(stderr, Name ": %s is not a "
+ "block device.\n",
+ dv->devname);
+ return 1;
+ }
}
}
switch(dv->disposition){
@@ -790,26 +820,36 @@ int Manage_subdevs(char *devname, int fd,
return 1;
}
}
- /* FIXME check that it is a current member */
- err = ioctl(fd, HOT_REMOVE_DISK, (unsigned long)stb.st_rdev);
- if (err && errno == ENODEV) {
+ /* stb.st_rdev is only valid if we have a tfd that
+ * does not indicate an error on attempt to open
+ * the devname
+ */
+ if (tfd >= 0)
+ err = ioctl(fd, HOT_REMOVE_DISK,
+ (unsigned long)stb.st_rdev);
+ if (tfd < 0 || (err && errno == ENODEV)) {
/* Old kernels rejected this if no personality
* registered */
- struct mdinfo *sra = sysfs_read(fd, 0, GET_DEVS);
- struct mdinfo *dv = NULL;
- if (sra)
- dv = sra->devs;
- for ( ; dv ; dv=dv->next)
- if (dv->disk.major == major(stb.st_rdev) &&
- dv->disk.minor == minor(stb.st_rdev))
+ if (!mdi) {
+ strcpy(&sys_name[4],
+ strrchr(dv->devname, '/') + 1);
+ mdi = sysfs_read(fd, 0, GET_DEVS |
+ KEEP_GONE_DEVS);
+ if (mdi)
+ dev = mdi->devs;
+ for ( ; dev ; dev=dev->next) {
+ if (strcmp(sys_name, dev->sys_name))
+ continue;
break;
- if (dv)
- err = sysfs_set_str(sra, dv,
+ }
+ }
+ if (dev)
+ err = sysfs_set_str(mdi, dev,
"state", "remove");
else
err = -1;
- if (sra)
- sysfs_free(sra);
+ if (mdi)
+ sysfs_free(mdi);
}
if (err) {
fprintf(stderr, Name ": hot remove failed "
@@ -844,11 +884,18 @@ int Manage_subdevs(char *devname, int fd,
case 'f': /* set faulty */
/* FIXME check current member */
- if (ioctl(fd, SET_DISK_FAULTY, (unsigned long) stb.st_rdev)) {
+ if ((tfd >= 0 && ioctl(fd, SET_DISK_FAULTY,
+ (unsigned long) stb.st_rdev)) ||
+ (tfd < 0 && sysfs_set_str(mdi, dev, "state",
+ "faulty"))) {
fprintf(stderr, Name ": set device faulty failed for %s: %s\n",
dnprintable, strerror(errno));
+ if (mdi)
+ sysfs_free(mdi);
return 1;
- }
+ }
+ if (mdi)
+ sysfs_free(mdi);
if (verbose >= 0)
fprintf(stderr, Name ": set %s faulty in %s\n",
dnprintable, devname);
diff --git a/ReadMe.c b/ReadMe.c
index ba2215c..acaeace 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -86,11 +86,12 @@ char Version[] = Name " - v3.1.2-dledford - 30th March 2010\n";
* At the time if writing, there is only minimal support.
*/
-char short_options[]="-ABCDEFGIQhVXWZ:vqbc:i:l:p:m:n:x:u:c:d:z:U:N:sarfRSow1tye:";
+char short_options[]=
+ "-ABCDEFGIQhVXWZ:vqbc:i:l:p:m:n:x:u:c:d:z:U:N:sarfRSow1tye:";
char short_bitmap_options[]=
- "-ABCDEFGIQhVXWZ:vqb:c:i:l:p:m:n:x:u:c:d:z:U:N:sarfRSow1tye:";
+ "-ABCDEFGIQhVXWZ:vqb:c:i:l:p:m:n:x:u:c:d:z:U:N:sarfRSow1tye:";
char short_bitmap_auto_options[]=
- "-ABCDEFGIQhVXWZ:vqb:c:i:l:p:m:n:x:u:c:d:z:U:N:sa:rfRSow1tye:";
+ "-ABCDEFGIQhVXWZ:vqb:c:i:l:p:m:n:x:u:c:d:z:U:N:sa:rfRSow1tye:";
struct option long_options[] = {
{"manage", 0, 0, '@'},
@@ -213,7 +214,7 @@ char Help[] =
" mdadm --grow options device\n"
" resize/reshape an active array\n"
" mdadm --incremental device\n"
-" add a device to an array as appropriate\n"
+" add/remove a device to/from an array as appropriate\n"
" mdadm --monitor options...\n"
" Monitor one or more array for significant changes.\n"
" mdadm device options...\n"
@@ -535,20 +536,26 @@ char Help_grow[] =
;
char Help_incr[] =
-"Usage: mdadm --incremental [-Rqrs] device\n"
+"Usage: mdadm --incremental [-Rqrsf] device\n"
"\n"
"This usage allows for incremental assembly of md arrays. Devices can be\n"
"added one at a time as they are discovered. Once an array has all expected\n"
"devices, it will be started.\n"
"\n"
-"Options that are valid with incremental assembly (-I --incremental) more are:\n"
-" --run -R : run arrays as soon as a minimal number of devices are\n"
+"Optionally, the process can be reversed by using the fail option.\n"
+"When fail mode is invoked, mdadm will see if the device belongs to an array\n"
+"and then both fail (if needed) and remove the device from that array.\n"
+"\n"
+"Options that are valid with incremental assembly (-I --incremental) are:\n"
+" --run -R : Run arrays as soon as a minimal number of devices are\n"
" : present rather than waiting for all expected.\n"
" --quiet -q : Don't print any information messages, just errors.\n"
" --rebuild -r : Rebuild the 'map' file that mdadm uses for tracking\n"
" : partial arrays.\n"
" --scan -s : Use with -R to start any arrays that have the minimal\n"
" : required number of devices, but are not yet started.\n"
+" --fail -f : First fail (if needed) and then remove device from\n"
+" : any array that it is a member of.\n"
;
char Help_config[] =
diff --git a/mdadm.8 b/mdadm.8
index 4edfc41..eaf9155 100644
--- a/mdadm.8
+++ b/mdadm.8
@@ -135,7 +135,11 @@ This provides a convenient interface to a
.I hot-plug
system. As each device is detected,
.I mdadm
-has a chance to include it in some array as appropriate.
+has a chance to include it in some array as appropriate. Optionally,
+with the
+.I \-\-fail
+flag is passed in then we will remove the device from any active array
+instead of adding it.
If a
.B CONTAINER
@@ -189,7 +193,7 @@ Change the size or shape of an active array.
.TP
.BR \-I ", " \-\-incremental
-Add a single device into an appropriate array, and possibly start the array.
+Add/remove a single device to/from an appropriate array, and possibly start the array.
.TP
.B \-\-auto-detect
@@ -1235,6 +1239,12 @@ in
.B mdadm.conf
as requiring an external bitmap, that bitmap will be attached first.
+.TP
+.BR \-\-fail ", " \-f
+This allows the hot-plug system to remove devices that have fully disappeared
+from the kernel. It will first fail and then remove the device from any
+array it belongs to.
+
.SH For Monitor mode:
.TP
.BR \-m ", " \-\-mail
@@ -2141,6 +2151,10 @@ Usage:
.I component-device
.HP 12
Usage:
+.B mdadm \-\-incremental \-\-fail
+.I component-device
+.HP 12
+Usage:
.B mdadm \-\-incremental \-\-rebuild
.HP 12
Usage:
@@ -2153,6 +2167,11 @@ passed to
.B "mdadm \-\-incremental"
to be conditionally added to an appropriate array.
+Conversely, it can also be used with the
+.B \-\-fail
+flag to do just the opposite and find whatever array a particular device
+is part of and remove the device from the array.
+
If the device passed is a
.B CONTAINER
device created by a previous call to
diff --git a/mdadm.c b/mdadm.c
index d5e34c0..cd6fd8f 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -124,6 +124,7 @@ int main(int argc, char *argv[])
ident.name[0] = 0;
ident.container = NULL;
ident.member = NULL;
+ ident.member_index = -1;
while ((option_index = -1) ,
(opt=getopt_long(argc, argv,
@@ -774,6 +775,9 @@ int main(int argc, char *argv[])
devmode = 'r';
continue;
case O(MANAGE,'f'): /* set faulty */
+ case O(INCREMENTAL,'f'): /* r for incremental is taken, use f
+ * even though we will both fail and
+ * remove the device */
devmode = 'f';
continue;
case O(INCREMENTAL,'R'):
@@ -1517,6 +1521,11 @@ int main(int argc, char *argv[])
": --incremental --scan meaningless without --run.\n");
break;
}
+ if (devmode == 'f') {
+ fprintf(stderr, Name
+ ": --incremental --scan --fail not supported.\n");
+ break;
+ }
rv = IncrementalScan(verbose);
}
if (!devlist) {
@@ -1533,6 +1542,10 @@ int main(int argc, char *argv[])
rv = 1;
break;
}
+ if (devmode == 'f') {
+ rv = IncrementalRemove(devlist->devname, verbose-quiet);
+ break;
+ }
rv = Incremental(devlist->devname, verbose-quiet, runstop,
ss, homehost, require_homehost, autof);
break;
diff --git a/mdadm.h b/mdadm.h
index d8ab85f..c113d0f 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -315,6 +315,7 @@ typedef struct mddev_ident_s {
* of some other entry.
*/
char *member; /* subarray within a container */
+ int member_index; /* subarray index within a container */
struct mddev_ident_s *next;
union {
@@ -355,6 +356,10 @@ struct mdstat_ent {
int raid_disks;
int chunk_size;
char * metadata_version;
+ struct dev_member {
+ char *name;
+ struct dev_member *next;
+ } *members;
struct mdstat_ent *next;
};
@@ -363,6 +368,7 @@ extern void free_mdstat(struct mdstat_ent *ms);
extern void mdstat_wait(int seconds);
extern void mdstat_wait_fd(int fd, const sigset_t *sigmask);
extern int mddev_busy(int devnum);
+extern int mdstat_check_active(char *devname);
struct map_ent {
struct map_ent *next;
@@ -404,6 +410,7 @@ enum sysfs_read_flags {
GET_STATE = (1 << 13),
GET_ERROR = (1 << 14),
SKIP_GONE_DEVS = (1 << 15),
+ KEEP_GONE_DEVS = (1 << 16),
};
/* If fd >= 0, get the array it is open on,
@@ -817,6 +824,7 @@ extern int Incremental_container(struct supertype *st, char *devname,
int trustworthy);
extern void RebuildMap(void);
extern int IncrementalScan(int verbose);
+extern int IncrementalRemove(char *devname, int verbose);
extern int CreateBitmap(char *filename, int force, char uuid[16],
unsigned long chunksize, unsigned long daemon_sleep,
diff --git a/mdstat.c b/mdstat.c
index 4a9f370..81d2212 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -83,6 +83,45 @@
#include <sys/select.h>
#include <ctype.h>
+static void free_member_devnames(struct dev_member **m)
+{
+ struct dev_member *t;
+ if (!*m)
+ return;
+ while(*m) {
+ t = *m;
+ *m = (*m)->next;
+ if (t->name)
+ free(t->name);
+ free(t);
+ }
+ *m = NULL;
+}
+
+static struct dev_member *add_member_devname(struct dev_member **m, char *name)
+{
+ struct dev_member *new;
+ char *t;
+
+ if (!m || !name)
+ return NULL;
+
+ new = malloc(sizeof(*new));
+ if (!new)
+ return NULL;
+ if ((t = strchr(name, '[')) == NULL)
+ {
+ /* not a device */
+ free(new);
+ return *m;
+ }
+ new->name = strndup(name, t - name);
+ new->next = *m;
+ *m = new;
+
+ return new;
+}
+
void free_mdstat(struct mdstat_ent *ms)
{
while (ms) {
@@ -91,6 +130,7 @@ void free_mdstat(struct mdstat_ent *ms)
if (ms->level) free(ms->level);
if (ms->pattern) free(ms->pattern);
if (ms->metadata_version) free(ms->metadata_version);
+ if (ms->members) free_member_devnames(&ms->members);
t = ms;
ms = ms->next;
free(t);
@@ -159,6 +199,7 @@ struct mdstat_ent *mdstat_read(int hold, int start)
ent->raid_disks = 0;
ent->chunk_size = 0;
ent->devcnt = 0;
+ ent->members = NULL;
ent->dev = strdup(line);
ent->devnum = devnum;
@@ -170,15 +211,23 @@ struct mdstat_ent *mdstat_read(int hold, int start)
ent->active = 1;
else if (strcmp(w, "inactive")==0)
ent->active = 0;
- else if (ent->active >=0 &&
+ else if (ent->active > 0 &&
ent->level == NULL &&
w[0] != '(' /*readonly*/) {
ent->level = strdup(w);
in_devs = 1;
} else if (in_devs && strcmp(w, "blocks")==0)
in_devs = 0;
- else if (in_devs) {
+ else if (in_devs || (ent->active == 0 && w[0] != '(' &&
+ w[l - 1] == ')')) {
+ if (isdigit(w[0]))
+ continue;
+ in_devs = 1;
ent->devcnt++;
+ if (!add_member_devname(&ent->members, w)) {
+ free_mdstat(ent);
+ break;
+ }
if (strncmp(w, "md", 2)==0) {
/* This has an md device as a component.
* If that device is already in the
@@ -310,3 +359,40 @@ int mddev_busy(int devnum)
free_mdstat(mdstat);
return me != NULL;
}
+
+/*
+ * Finds name of the active array holding this device
+ * @param[in] devname name of member device
+ * @param[out] devname name of array
+ *
+ * @return found (0), or
+ * not found, failure (1)
+ */
+
+int mdstat_check_active(char *devname)
+{
+ struct mdstat_ent *mdstat = mdstat_read(0, 0);
+ struct mdstat_ent *ent;
+ char *name;
+
+ if (!devname)
+ return 1;
+ name = strrchr(devname, '/');
+ if (name++ == NULL)
+ return 1;
+
+ for (ent = mdstat; ent; ent = ent->next) {
+ struct dev_member *m;
+ if (ent->active && (strstr(ent->metadata_version,"imsm") ||
+ strstr(ent->metadata_version,"ddf")))
+ /* only return container matches, not subarrays */
+ continue;
+ for (m = ent->members; m; m = m->next) {
+ if (!strcmp(m->name, name)) {
+ strcpy(devname, ent->dev);
+ return 0;
+ }
+ }
+ }
+ return 1;
+}
diff --git a/sysfs.c b/sysfs.c
index ebf9d8a..65dd848 100644
--- a/sysfs.c
+++ b/sysfs.c
@@ -273,13 +273,16 @@ struct mdinfo *sysfs_read(int fd, int devnum, unsigned long options)
strcpy(dbase, "block/dev");
if (load_sys(fname, buf)) {
- free(dev);
- if (options & SKIP_GONE_DEVS)
+ if (options & SKIP_GONE_DEVS) {
+ free(dev);
continue;
- else
+ } else if (options & KEEP_GONE_DEVS) {
+ dev->disk.major = dev->disk.minor = -1;
+ } else
goto abort;
- }
- sscanf(buf, "%d:%d", &dev->disk.major, &dev->disk.minor);
+ } else
+ sscanf(buf, "%d:%d", &dev->disk.major,
+ &dev->disk.minor);
/* special case check for block devices that can go 'offline' */
if (options & SKIP_GONE_DEVS) {
diff --git a/udev-md-raid.rules b/udev-md-raid.rules
index c9a4f0e..aff14fa 100644
--- a/udev-md-raid.rules
+++ b/udev-md-raid.rules
@@ -1,17 +1,16 @@
# do not edit this file, it will be overwritten on update
SUBSYSTEM!="block", GOTO="md_end"
-ACTION!="add|change", GOTO="md_end"
+ACTION!="add|change|remove", GOTO="md_end"
+ACTION=="remove", GOTO="md_remove"
ACTION=="change", GOTO="md_no_incr"
-# import data from a raid member and activate it
-#ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm --examine --export $tempnode", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
-# import data from a raid set
+# we are adding a raid member, activate it
+ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm -I $env{DEVNAME}"
LABEL="md_no_incr"
KERNEL!="md*", GOTO="md_end"
-# partitions have no md/{array_state,metadata_version}, but should not
-# for that reason be ignored.
+# partitions have no md/{array_state,metadata_version}
ENV{DEVTYPE}=="partition", GOTO="md_ignore_state"
# container devices have a metadata version of e.g. 'external:ddf' and
@@ -32,7 +31,12 @@ ENV{DEVTYPE}=="partition", ENV{MD_DEVNAME}=="*[0-9]", SYMLINK+="md/$env{MD_DEVNA
IMPORT{program}="/sbin/blkid -o udev -p $tempnode"
OPTIONS+="link_priority=100"
+OPTIONS+="watch"
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"
+GOTO="md_end"
+
+LABEL="md_remove"
+ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm -If $env{DEVNAME}"
LABEL="md_end"
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-05 16:40 [Patch mdadm] Add hot-unplug support to mdadm Doug Ledford
@ 2010-04-06 16:26 ` Doug Ledford
2010-04-07 1:30 ` Neil Brown
1 sibling, 0 replies; 17+ messages in thread
From: Doug Ledford @ 2010-04-06 16:26 UTC (permalink / raw)
To: Linux RAID Mailing List, Neil Brown, Dan Williams
[-- Attachment #1.1: Type: text/plain, Size: 2577 bytes --]
On 04/05/2010 12:40 PM, Doug Ledford wrote:
> We have incremental mode for supporting hot plug of devices. However,
> we don't support hot-unplug. This is something we need in order to
> enable automatic device replacement. When implementing the support
> though, I found that we had a problem in that the device special file is
> already gone (well, not gone, but both open and lstat fail on the device
> special file, which really makes Manage_subdevs unhappy) by the time
> udev calls us to remove the device (even if I created a new udev rules
> file with a very low number this was still true). So, along with adding
> the code shamelessly stolen from Hawrylewicz with modifications to make
> it work on non-container based arrays, I had to modify Manage_subdevs to
> work on sysfs subdevice entries in the fail and remove cases. Anyway,
> with this in place, and with the modified udev rules file in place,
> hot-unplug now works (or at least it does in my testing, I didn't try a
> container device, I'll leave tweaking that to Dan or someone else from
> Intel, although it should more or less work, I don't know the Intel
> metadata rules and so didn't try to make it work myself). And by
> changing Manage_subdevs to use either a valid device special file or
> sysfs entries, it will work on the command line with virtually any
> kernel, and work from a udev rules file on kernels recent enough to have
> the state entry in sysfs subdevs directories.
>
> Oh, and Neil should review my man page additions to see if they are
> sufficient for his tastes. I didn't get real wordy about the support,
> it's pretty straight forward.
>
Minor incremental fixup: In the case of passing in faulty or
disconnected as the device name, since we now use the value of tfd to
determine if we should attempt ioctls or go straight to using sysfs
entries, we now need to make sure we init tdf and then set it properly
in both of the loops where we check for faulty and disconnected devices
(although I'm now highly suspicious of the faulty check code as I
suspect all the faulty devices will have the same problem that our hot
unplug code ran into and the faulty devices will not be openable and
that will mean that passing in faulty is probably just broken at this
point in time...but that's another patch for another day).
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #1.2: mdadm-3.1.2-decremental-2.patch --]
[-- Type: text/plain, Size: 1028 bytes --]
commit ddda49f6daf481471334f5675fbc1fa149d83ad6
Author: Doug Ledford <dledford@redhat.com>
Date: Tue Apr 6 12:17:25 2010 -0400
Minor bug fix to incremental remove support
Signed-off-by: Doug Ledford <dledford@redhat.com>
diff --git a/Manage.c b/Manage.c
index 6539eda..b15586b 100644
--- a/Manage.c
+++ b/Manage.c
@@ -386,6 +386,7 @@ int Manage_subdevs(char *devname, int fd,
next = dv->next;
jnext = 0;
+ tfd = -1;
if (strcmp(dv->devname, "failed")==0 ||
strcmp(dv->devname, "faulty")==0) {
@@ -406,6 +407,7 @@ int Manage_subdevs(char *devname, int fd,
stb.st_rdev = makedev(disc.major, disc.minor);
next = dv;
jnext = j+1;
+ tfd = 0;
sprintf(dvname,"%d:%d", disc.major, disc.minor);
dnprintable = dvname;
break;
@@ -440,6 +442,7 @@ int Manage_subdevs(char *devname, int fd,
stb.st_rdev = makedev(disc.major, disc.minor);
next = dv;
jnext = j+1;
+ tfd = 0;
dnprintable = dvname;
break;
}
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-05 16:40 [Patch mdadm] Add hot-unplug support to mdadm Doug Ledford
2010-04-06 16:26 ` Doug Ledford
@ 2010-04-07 1:30 ` Neil Brown
2010-04-07 2:02 ` Doug Ledford
1 sibling, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-04-07 1:30 UTC (permalink / raw)
To: Doug Ledford; +Cc: Linux RAID Mailing List, Dan Williams
On Mon, 05 Apr 2010 12:40:41 -0400
Doug Ledford <dledford@redhat.com> wrote:
> We have incremental mode for supporting hot plug of devices. However,
> we don't support hot-unplug. This is something we need in order to
> enable automatic device replacement. When implementing the support
> though, I found that we had a problem in that the device special file is
> already gone (well, not gone, but both open and lstat fail on the device
> special file, which really makes Manage_subdevs unhappy) by the time
> udev calls us to remove the device (even if I created a new udev rules
> file with a very low number this was still true). So, along with adding
> the code shamelessly stolen from Hawrylewicz with modifications to make
> it work on non-container based arrays, I had to modify Manage_subdevs to
> work on sysfs subdevice entries in the fail and remove cases. Anyway,
> with this in place, and with the modified udev rules file in place,
> hot-unplug now works (or at least it does in my testing, I didn't try a
> container device, I'll leave tweaking that to Dan or someone else from
> Intel, although it should more or less work, I don't know the Intel
> metadata rules and so didn't try to make it work myself). And by
> changing Manage_subdevs to use either a valid device special file or
> sysfs entries, it will work on the command line with virtually any
> kernel, and work from a udev rules file on kernels recent enough to have
> the state entry in sysfs subdevs directories.
>
> Oh, and Neil should review my man page additions to see if they are
> sufficient for his tastes. I didn't get real wordy about the support,
> it's pretty straight forward.
>
Thanks...
Observations:
-In udev-md-raid.rules you add
OPTIONS+="watch"
Why?
- I cannot say I like the process of hunting through /proc/mdstat for
a device name that textually matches the name that udev has just
removed from /dev. It feels too indirect.
I was really hoping to just use /sys/$DEVPATH/holders to find the
containing arrays, but $DEVPATH has been destroyed before udev
gets to run anything .. even though the final object is still in
use. That is really sad!
So we need to either use the block device name (e.g. sdb) as you
did or the major:minor numbers. The most direct way to get these
is with $name or $major:$minor. And there is no direct way to map between
them any more, so if mdadm needs both, we have to pass both.
The name can be used for searching /proc/mdstat or
/sys/class/block/md*/md/dev-*
and can be used for failing/removing via sysfs.
the major:minor can be use for searching with GET_DISK_INFO
and for failing/removing via ioctl.
So I'm probably going to have to be satisfied with hunting
through /proc/mdstat even though I don't like it.
But if we can just pass $name to mdadm instead of passing
$ENV{DEVNAME} and stripping the "/dev/" it would be a little
bit happier. i.e.
mdadm -If sdb
would it help do you think to support e.g.
mdadm -If --devnum $major:$minor $name
so that mdadm would have the major/minor in case of an old
kernel without support for removing via sysfs?
Or something horrible like
mdadm -If $name($major:$minor)
?? maybe not.
- I'm not a big fan of in/out parameters, particularly when they mean
different things (component device / array device). I would rather
replace mdstat_check_active with e.g. mstat_by_component which
returns a 'struct mdstat_ent' having search for one with a component
matching the arguement.
- SKIP_GONE_DEVS / KEEP_GONE_DEVS is starting to look really ugly.
I wonder if we can get rid of both and always do what KEEP_GONE_DEVS
does. That will probably require changes elsewhere, but I think it would
be a good thing.
- man page update looks OK though
+has a chance to include it in some array as appropriate. Optionally,
+with the
+.I \-\-fail
+flag is passed in then we will remove the device from any active array
+instead of adding it.
needs to lose 'is' or s/the/if/ or maybe be rewritten.
Thanks. I'll try to find time to incorporate some of this - I'd like to
break it up into 2 or 3 patches. One to add dev-name searching in mdstat,
maybe one to change how Manage parses device names so "sdb" can be understood
as a different thing to "/dev/whatever", and one to tie it all together
and provide the new functionality.
Oh, and maybe one to get ride of SKIP_GONE_DEVS.
NeilBrown
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-07 1:30 ` Neil Brown
@ 2010-04-07 2:02 ` Doug Ledford
2010-04-07 2:24 ` Doug Ledford
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Doug Ledford @ 2010-04-07 2:02 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux RAID Mailing List, Dan Williams
[-- Attachment #1: Type: text/plain, Size: 7566 bytes --]
On 04/06/2010 09:30 PM, Neil Brown wrote:
> On Mon, 05 Apr 2010 12:40:41 -0400
> Doug Ledford <dledford@redhat.com> wrote:
>
>> We have incremental mode for supporting hot plug of devices. However,
>> we don't support hot-unplug. This is something we need in order to
>> enable automatic device replacement. When implementing the support
>> though, I found that we had a problem in that the device special file is
>> already gone (well, not gone, but both open and lstat fail on the device
>> special file, which really makes Manage_subdevs unhappy) by the time
>> udev calls us to remove the device (even if I created a new udev rules
>> file with a very low number this was still true). So, along with adding
>> the code shamelessly stolen from Hawrylewicz with modifications to make
>> it work on non-container based arrays, I had to modify Manage_subdevs to
>> work on sysfs subdevice entries in the fail and remove cases. Anyway,
>> with this in place, and with the modified udev rules file in place,
>> hot-unplug now works (or at least it does in my testing, I didn't try a
>> container device, I'll leave tweaking that to Dan or someone else from
>> Intel, although it should more or less work, I don't know the Intel
>> metadata rules and so didn't try to make it work myself). And by
>> changing Manage_subdevs to use either a valid device special file or
>> sysfs entries, it will work on the command line with virtually any
>> kernel, and work from a udev rules file on kernels recent enough to have
>> the state entry in sysfs subdevs directories.
>>
>> Oh, and Neil should review my man page additions to see if they are
>> sufficient for his tastes. I didn't get real wordy about the support,
>> it's pretty straight forward.
>>
>
> Thanks...
>
> Observations:
>
> -In udev-md-raid.rules you add
> OPTIONS+="watch"
> Why?
udev upstream added it to their version of the file so I brought it over
(hmm, I should say upstream to me, the Fedora
/lib/udev/rules.d/64-md-raid.rules file is packaged as part of udev, and
they added it there, although I can't say for certain that upstream,
upstream also added it, so I was making the files consistent). I don't
always keep up with the various options in udev, so my assumption was
that if the people who really know it added it, they probably had a
reason and I was just trying to make sure it didn't get lost.
> - I cannot say I like the process of hunting through /proc/mdstat for
> a device name that textually matches the name that udev has just
> removed from /dev. It feels too indirect.
> I was really hoping to just use /sys/$DEVPATH/holders to find the
> containing arrays, but $DEVPATH has been destroyed before udev
> gets to run anything .. even though the final object is still in
> use. That is really sad!
I agree. The fact that everything disappears before we can use it to
remove it from the array was a bit problematic. And I too thought that
this was too indirect and error prone. But, the fact remains that A) if
we have that textual name somewhere in some array, then that means that
array holds a reference to that textual name and B) as long as we hold a
textual reference to that name, the name is not free and can not be
reused. In other words, our removal act is necessarily a barrier
against races. There can be no two devices with the same textual name,
and if the same device is added back to the system before we have
removed our holder on it, then it gets a totally new name. So, in
truth, even though it *feels* too indirect, it is in fact perfectly safe.
> So we need to either use the block device name (e.g. sdb) as you
> did or the major:minor numbers. The most direct way to get these
> is with $name or $major:$minor. And there is no direct way to map between
> them any more, so if mdadm needs both, we have to pass both.
>
> The name can be used for searching /proc/mdstat or
> /sys/class/block/md*/md/dev-*
> and can be used for failing/removing via sysfs.
> the major:minor can be use for searching with GET_DISK_INFO
> and for failing/removing via ioctl.
>
> So I'm probably going to have to be satisfied with hunting
> through /proc/mdstat even though I don't like it.
> But if we can just pass $name to mdadm instead of passing
> $ENV{DEVNAME} and stripping the "/dev/" it would be a little
> bit happier. i.e.
>
> mdadm -If sdb
>
> would it help do you think to support e.g.
>
> mdadm -If --devnum $major:$minor $name
As I recall, on a remove event the major minor is already gone too (but
I could be wrong, the machine has since been blown away and reinstalled
and the file I had the whole env during a remove event stored off in is
gone with it).
And I would be careful about using $env{DEVNAME}, in fact I'm removing
usage of it in my local rules files right now because I've found it is
inconsistent (part of the time it includes /dev/ and part of the time it
doesn't, where as $tempnode is consistent).
> so that mdadm would have the major/minor in case of an old
> kernel without support for removing via sysfs?
> Or something horrible like
> mdadm -If $name($major:$minor)
> ?? maybe not.
Personally, I'm fine with just the name for the reason I listed above.
> - I'm not a big fan of in/out parameters, particularly when they mean
> different things (component device / array device). I would rather
> replace mdstat_check_active with e.g. mstat_by_component which
> returns a 'struct mdstat_ent' having search for one with a component
> matching the arguement.
Fair enough. I stole that code anyway and it's clunky in that it
expects a /dev/$name input and gives a plain $name output causing me to
have to prefill the char string with /dev/ and then copy /dev/$name
after that and point the function at a ways into the char string in
order to get /dev/$name as output. Clunky, but it worked.
> - SKIP_GONE_DEVS / KEEP_GONE_DEVS is starting to look really ugly.
> I wonder if we can get rid of both and always do what KEEP_GONE_DEVS
> does. That will probably require changes elsewhere, but I think it would
> be a good thing.
I think that would be a good thing. Even if a device is failed and no
longer has a valid block link, if it's still being held by our array it
should be in the list. We should simply check that each device is live
when walking the list.
> - man page update looks OK though
>
> +has a chance to include it in some array as appropriate. Optionally,
> +with the
> +.I \-\-fail
> +flag is passed in then we will remove the device from any active array
> +instead of adding it.
>
> needs to lose 'is' or s/the/if/ or maybe be rewritten.
Oops, s/with/when/ makes it all better.
> Thanks. I'll try to find time to incorporate some of this - I'd like to
> break it up into 2 or 3 patches. One to add dev-name searching in mdstat,
> maybe one to change how Manage parses device names so "sdb" can be understood
> as a different thing to "/dev/whatever", and one to tie it all together
> and provide the new functionality.
> Oh, and maybe one to get ride of SKIP_GONE_DEVS.
Let me know if you want me to redo/resend any of it.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-07 2:02 ` Doug Ledford
@ 2010-04-07 2:24 ` Doug Ledford
2010-04-07 3:07 ` Doug Ledford
2010-04-07 5:32 ` Luca Berra
2010-04-08 23:31 ` Neil Brown
2 siblings, 1 reply; 17+ messages in thread
From: Doug Ledford @ 2010-04-07 2:24 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux RAID Mailing List, Dan Williams
[-- Attachment #1: Type: text/plain, Size: 1962 bytes --]
On 04/06/2010 10:02 PM, Doug Ledford wrote:
> On 04/06/2010 09:30 PM, Neil Brown wrote:
>> On Mon, 05 Apr 2010 12:40:41 -0400
>> Doug Ledford <dledford@redhat.com> wrote:
>> would it help do you think to support e.g.
>>
>> mdadm -If --devnum $major:$minor $name
>
> As I recall, on a remove event the major minor is already gone too (but
> I could be wrong, the machine has since been blown away and reinstalled
> and the file I had the whole env during a remove event stored off in is
> gone with it).
>
> And I would be careful about using $env{DEVNAME}, in fact I'm removing
> usage of it in my local rules files right now because I've found it is
> inconsistent (part of the time it includes /dev/ and part of the time it
> doesn't, where as $tempnode is consistent).
OK, so to expand on this a bit further. $env{DEVNAME} sometimes refers
to devices via a full path (aka, /dev/sda1), and sometimes by just the
last part of the path (aka, md0). I found this when creating a rule to
do incremental assembly on containers during late udev startup (I need
this because the initramfs created by dracut *only* starts necessary
arrays, and if you have a additional arrays in a container, then the
change event generated during udev startup can be used to trigger
incremental assembly of those additional arrays, but $env{DEVNAME} will
just be md0 or whatever, not a full path name). So, I switched
everything to $tempnode as it always points to a device special file.
Except when doing removal events, in which case $tempnode doesn't exist
because there is no node to point to. So, long story short, you're best
off to use $tempnode on any device add|change events, and $env{DEVNAME}
on remove events.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-07 2:24 ` Doug Ledford
@ 2010-04-07 3:07 ` Doug Ledford
0 siblings, 0 replies; 17+ messages in thread
From: Doug Ledford @ 2010-04-07 3:07 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux RAID Mailing List, Dan Williams
[-- Attachment #1.1: Type: text/plain, Size: 402 bytes --]
Here's another minor incremental patch to the original. This makes
hot-remove of imsm devices work. The sanity tests just needed to be
updated to deal with an already missing device.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #1.2: mdadm-3.1.2-decremental-3.patch --]
[-- Type: text/plain, Size: 1079 bytes --]
commit 1934c67fe5bc25a79393ad78e29bf9ef778bc701
Author: Doug Ledford <dledford@redhat.com>
Date: Tue Apr 6 23:02:47 2010 -0400
Only close lfd if we have an external metadata type since that's the only
time we'll have it open.
If we weren't able to open the device file, assume the device is gone
already and skip the sysfs_unique_holder test as it is guaranteed to
fail.
Signed-off-by: Doug Ledford <dledford@redhat.com>
diff --git a/Manage.c b/Manage.c
index b15586b..a690cfc 100644
--- a/Manage.c
+++ b/Manage.c
@@ -811,6 +811,7 @@ int Manage_subdevs(char *devname, int fd,
* rely on the 'detached' checks
*/
if (strcmp(dv->devname, "detached") == 0 ||
+ tfd < 0 ||
sysfs_unique_holder(dnum, stb.st_rdev))
/* pass */;
else {
@@ -878,8 +879,8 @@ int Manage_subdevs(char *devname, int fd,
ping_manager(name);
free(name);
+ close(lfd);
}
- close(lfd);
if (verbose >= 0)
fprintf(stderr, Name ": hot removed %s\n",
dnprintable);
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-07 2:02 ` Doug Ledford
2010-04-07 2:24 ` Doug Ledford
@ 2010-04-07 5:32 ` Luca Berra
2010-04-07 6:59 ` Neil Brown
2010-04-08 23:31 ` Neil Brown
2 siblings, 1 reply; 17+ messages in thread
From: Luca Berra @ 2010-04-07 5:32 UTC (permalink / raw)
To: Linux RAID Mailing List
On Tue, Apr 06, 2010 at 10:02:58PM -0400, Doug Ledford wrote:
>On 04/06/2010 09:30 PM, Neil Brown wrote:
>> - I cannot say I like the process of hunting through /proc/mdstat for
>> a device name that textually matches the name that udev has just
>> removed from /dev. It feels too indirect.
>> I was really hoping to just use /sys/$DEVPATH/holders to find the
>> containing arrays, but $DEVPATH has been destroyed before udev
>> gets to run anything .. even though the final object is still in
>> use. That is really sad!
>
>I agree. The fact that everything disappears before we can use it to
>remove it from the array was a bit problematic. And I too thought that
>this was too indirect and error prone. But, the fact remains that A) if
Is there any chance of having udev issuing a pre-remove event, before
destroying /dev and sysfs, it would be far more useful than what we have
now.
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-07 5:32 ` Luca Berra
@ 2010-04-07 6:59 ` Neil Brown
0 siblings, 0 replies; 17+ messages in thread
From: Neil Brown @ 2010-04-07 6:59 UTC (permalink / raw)
To: Luca Berra; +Cc: Linux RAID Mailing List
On Wed, 7 Apr 2010 07:32:55 +0200
Luca Berra <bluca@comedia.it> wrote:
> On Tue, Apr 06, 2010 at 10:02:58PM -0400, Doug Ledford wrote:
> >On 04/06/2010 09:30 PM, Neil Brown wrote:
> >> - I cannot say I like the process of hunting through /proc/mdstat for
> >> a device name that textually matches the name that udev has just
> >> removed from /dev. It feels too indirect.
> >> I was really hoping to just use /sys/$DEVPATH/holders to find the
> >> containing arrays, but $DEVPATH has been destroyed before udev
> >> gets to run anything .. even though the final object is still in
> >> use. That is really sad!
> >
> >I agree. The fact that everything disappears before we can use it to
> >remove it from the array was a bit problematic. And I too thought that
> >this was too indirect and error prone. But, the fact remains that A) if
>
> Is there any chance of having udev issuing a pre-remove event, before
> destroying /dev and sysfs, it would be far more useful than what we have
> now.
No. udev has not ability to delay the kernel from removing things.
The kernel removes stuff, then sends a message to udev - udev responds by
cleaning up the user-space side.
If you want to know anything about a device on removal, you need to record
that information somewhere. udev does this to some extent, so when a device
is removed, it knows the major/minor number (at least I assume that is how it
works). See /dev/.udev/db/
So we could generate uevents whenever a devices is added to or removed from
an array, then get udev to somehow record the current state so that when a
device is removed from the system, we know which array it is part of.
But that seems like a bit of a sledgehammer approach - my first impression is
that it would be worse than scanning /proc/mdstat.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-07 2:02 ` Doug Ledford
2010-04-07 2:24 ` Doug Ledford
2010-04-07 5:32 ` Luca Berra
@ 2010-04-08 23:31 ` Neil Brown
2010-04-09 0:33 ` Neil Brown
2 siblings, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-04-08 23:31 UTC (permalink / raw)
To: Doug Ledford; +Cc: Linux RAID Mailing List, Dan Williams
On Tue, 06 Apr 2010 22:02:58 -0400
Doug Ledford <dledford@redhat.com> wrote:
> Let me know if you want me to redo/resend any of it.
>
not necessary, but some remove would be appreciated.
I've create a temporary branch at 'hotunplug':
http://neil.brown.name/git?p=mdadm;a=shortlog;h=refs/heads/hotunplug
with my revised version of your code.
I substantially rewrote the changes to Manage.c and found that
I didn't need KEEP_GONE_DEVS, though I would still like to tidy
that up.
The combined patch is below.
I have only performed limited testing.
Thanks,
NeilBrown
diff --git a/Incremental.c b/Incremental.c
index 7ad648a..ed29488 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -843,3 +843,42 @@ int Incremental_container(struct supertype *st, char *devname, int verbose,
map_unlock(&map);
return 0;
}
+
+/*
+ * IncrementalRemove - Attempt to see if the passed in device belongs to any
+ * raid arrays, and if so first fail (if needed) and then remove the device.
+ *
+ * @devname - The device we want to remove
+ *
+ * Note: the device name must be a kernel name like "sda", so
+ * that we can find it in /proc/mdstat
+ */
+int IncrementalRemove(char *devname, int verbose)
+{
+ int mdfd;
+ struct mdstat_ent *ent;
+ struct mddev_dev_s devlist;
+
+ if (strchr(devname, '/')) {
+ fprintf(stderr, Name ": incremental removal requires a "
+ "kernel device name, not a file: %s\n", devname);
+ return 1;
+ }
+ ent = mdstat_by_component(devname);
+ if (!ent) {
+ fprintf(stderr, Name ": %s does not appear to be a component "
+ "of any array\n", devname);
+ return 1;
+ }
+ mdfd = open_dev(ent->devnum);
+ if (mdfd < 0) {
+ fprintf(stderr, Name ": Cannot open array %s!!\n", ent->dev);
+ return 1;
+ }
+ memset(&devlist, 0, sizeof(devlist));
+ devlist.devname = devname;
+ devlist.disposition = 'f';
+ Manage_subdevs(ent->dev, mdfd, &devlist, verbose);
+ devlist.disposition = 'r';
+ return Manage_subdevs(ent->dev, mdfd, &devlist, verbose);
+}
diff --git a/Manage.c b/Manage.c
index f848d8b..ba585d2 100644
--- a/Manage.c
+++ b/Manage.c
@@ -341,6 +341,8 @@ int Manage_subdevs(char *devname, int fd,
* 'f' - set the device faulty SET_DISK_FAULTY
* device can be 'detached' in which case any device that
* is inaccessible will be marked faulty.
+ * For 'f' and 'r', the device can also be a kernel-internal
+ * name such as 'sdb'.
*/
mdu_array_info_t array;
mdu_disk_info_t disc;
@@ -353,6 +355,7 @@ int Manage_subdevs(char *devname, int fd,
int duuid[4];
int ouuid[4];
int lfd = -1;
+ int sysfd = -1;
if (ioctl(fd, GET_ARRAY_INFO, &array)) {
fprintf(stderr, Name ": cannot get array info for %s\n",
@@ -442,6 +445,40 @@ int Manage_subdevs(char *devname, int fd,
}
if (jnext == 0)
continue;
+ } else if (strchr(dv->devname, '/') == NULL &&
+ strlen(dv->devname) < 50) {
+ /* Assume this is a kernel-internal name like 'sda1' */
+ int found = 0;
+ char dname[55];
+ if (dv->disposition != 'r' && dv->disposition != 'f') {
+ fprintf(stderr, Name ": %s only meaningful "
+ "with -r of -f, not -%c\n",
+ dv->devname, dv->disposition);
+ return 1;
+ }
+
+ sprintf(dname, "dev-%s", dv->devname);
+ sysfd = sysfs_open(fd2devnum(fd), dname, "block/dev");
+ if (sysfd >= 0) {
+ char dn[20];
+ int mj,mn;
+ if (sysfs_fd_get_str(sysfd, dn, 20) > 0 &&
+ sscanf(dn, "%d:%d", &mj,&mn) == 2) {
+ stb.st_rdev = makedev(mj,mn);
+ found = 1;
+ }
+ close(sysfd);
+ sysfd = -1;
+ }
+ if (!found) {
+ sysfd = sysfs_open(fd2devnum(fd), dname, "state");
+ if (sysfd < 0) {
+ fprintf(stderr, Name ": %s does not appear "
+ "to be a component of %s\n",
+ dv->devname, devname);
+ return 1;
+ }
+ }
} else {
j = 0;
@@ -753,6 +790,8 @@ int Manage_subdevs(char *devname, int fd,
fprintf(stderr, Name ": Cannot remove disks from a"
" \'member\' array, perform this"
" operation on the parent container\n");
+ if (sysfd >= 0)
+ close(sysfd);
return 1;
}
if (tst->ss->external) {
@@ -771,6 +810,8 @@ int Manage_subdevs(char *devname, int fd,
fprintf(stderr, Name
": Cannot get exclusive access "
" to container - odd\n");
+ if (sysfd >= 0)
+ close(sysfd);
return 1;
}
/* in the detached case it is not possible to
@@ -778,6 +819,7 @@ int Manage_subdevs(char *devname, int fd,
* rely on the 'detached' checks
*/
if (strcmp(dv->devname, "detached") == 0 ||
+ sysfd >= 0 ||
sysfs_unique_holder(dnum, stb.st_rdev))
/* pass */;
else {
@@ -791,25 +833,37 @@ int Manage_subdevs(char *devname, int fd,
}
}
/* FIXME check that it is a current member */
- err = ioctl(fd, HOT_REMOVE_DISK, (unsigned long)stb.st_rdev);
- if (err && errno == ENODEV) {
- /* Old kernels rejected this if no personality
- * registered */
- struct mdinfo *sra = sysfs_read(fd, 0, GET_DEVS);
- struct mdinfo *dv = NULL;
- if (sra)
- dv = sra->devs;
- for ( ; dv ; dv=dv->next)
- if (dv->disk.major == major(stb.st_rdev) &&
- dv->disk.minor == minor(stb.st_rdev))
- break;
- if (dv)
- err = sysfs_set_str(sra, dv,
- "state", "remove");
- else
+ if (sysfd >= 0) {
+ /* device has been removed and we don't know
+ * the major:minor number
+ */
+ int n = write(sysfd, "remove", 6);
+ if (n != 6)
err = -1;
- if (sra)
- sysfs_free(sra);
+ else
+ err = 0;
+ close(sysfd);
+ } else {
+ err = ioctl(fd, HOT_REMOVE_DISK, (unsigned long)stb.st_rdev);
+ if (err && errno == ENODEV) {
+ /* Old kernels rejected this if no personality
+ * registered */
+ struct mdinfo *sra = sysfs_read(fd, 0, GET_DEVS);
+ struct mdinfo *dv = NULL;
+ if (sra)
+ dv = sra->devs;
+ for ( ; dv ; dv=dv->next)
+ if (dv->disk.major == major(stb.st_rdev) &&
+ dv->disk.minor == minor(stb.st_rdev))
+ break;
+ if (dv)
+ err = sysfs_set_str(sra, dv,
+ "state", "remove");
+ else
+ err = -1;
+ if (sra)
+ sysfs_free(sra);
+ }
}
if (err) {
fprintf(stderr, Name ": hot remove failed "
@@ -836,19 +890,26 @@ int Manage_subdevs(char *devname, int fd,
ping_manager(name);
free(name);
}
- close(lfd);
+ if (lfd >= 0)
+ close(lfd);
if (verbose >= 0)
- fprintf(stderr, Name ": hot removed %s\n",
- dnprintable);
+ fprintf(stderr, Name ": hot removed %s from %s\n",
+ dnprintable, devname);
break;
case 'f': /* set faulty */
/* FIXME check current member */
- if (ioctl(fd, SET_DISK_FAULTY, (unsigned long) stb.st_rdev)) {
+ if ((sysfd >= 0 && write(sysfd, "faulty", 6) != 6) ||
+ (sysfd < 0 && ioctl(fd, SET_DISK_FAULTY,
+ (unsigned long) stb.st_rdev))) {
fprintf(stderr, Name ": set device faulty failed for %s: %s\n",
dnprintable, strerror(errno));
+ if (sysfd >= 0)
+ close(sysfd);
return 1;
}
+ if (sysfd >= 0)
+ close(sysfd);
if (verbose >= 0)
fprintf(stderr, Name ": set %s faulty in %s\n",
dnprintable, devname);
diff --git a/ReadMe.c b/ReadMe.c
index 9d5a211..b43d1a0 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -213,7 +213,7 @@ char Help[] =
" mdadm --grow options device\n"
" resize/reshape an active array\n"
" mdadm --incremental device\n"
-" add a device to an array as appropriate\n"
+" add/remove a device to/from an array as appropriate\n"
" mdadm --monitor options...\n"
" Monitor one or more array for significant changes.\n"
" mdadm device options...\n"
@@ -256,7 +256,7 @@ char OptionHelp[] =
" --examine-bitmap -X: Display the detail of a bitmap file\n"
" --monitor -F : monitor (follow) some arrays\n"
" --grow -G : resize/ reshape and array\n"
-" --incremental -I : add a single device to an array as appropriate\n"
+" --incremental -I : add/remove a single device to/from an array as appropriate\n"
" --query -Q : Display general information about how a\n"
" device relates to the md driver\n"
" --auto-detect : Start arrays auto-detected by the kernel\n"
@@ -535,20 +535,26 @@ char Help_grow[] =
;
char Help_incr[] =
-"Usage: mdadm --incremental [-Rqrs] device\n"
+"Usage: mdadm --incremental [-Rqrsf] device\n"
"\n"
"This usage allows for incremental assembly of md arrays. Devices can be\n"
"added one at a time as they are discovered. Once an array has all expected\n"
"devices, it will be started.\n"
"\n"
-"Options that are valid with incremental assembly (-I --incremental) more are:\n"
-" --run -R : run arrays as soon as a minimal number of devices are\n"
+"Optionally, the process can be reversed by using the fail option.\n"
+"When fail mode is invoked, mdadm will see if the device belongs to an array\n"
+"and then both fail (if needed) and remove the device from that array.\n"
+"\n"
+"Options that are valid with incremental assembly (-I --incremental) are:\n"
+" --run -R : Run arrays as soon as a minimal number of devices are\n"
" : present rather than waiting for all expected.\n"
" --quiet -q : Don't print any information messages, just errors.\n"
" --rebuild -r : Rebuild the 'map' file that mdadm uses for tracking\n"
" : partial arrays.\n"
" --scan -s : Use with -R to start any arrays that have the minimal\n"
" : required number of devices, but are not yet started.\n"
+" --fail -f : First fail (if needed) and then remove device from\n"
+" : any array that it is a member of.\n"
;
char Help_config[] =
diff --git a/mdadm.8 b/mdadm.8
index 4edfc41..f547fda 100644
--- a/mdadm.8
+++ b/mdadm.8
@@ -136,6 +136,10 @@ This provides a convenient interface to a
system. As each device is detected,
.I mdadm
has a chance to include it in some array as appropriate.
+Optionally, when the
+.I \-\-fail
+flag is passed in we will remove the device from any active array
+instead of adding it.
If a
.B CONTAINER
@@ -189,7 +193,7 @@ Change the size or shape of an active array.
.TP
.BR \-I ", " \-\-incremental
-Add a single device into an appropriate array, and possibly start the array.
+Add/remove a single device to/from an appropriate array, and possibly start the array.
.TP
.B \-\-auto-detect
@@ -1235,6 +1239,15 @@ in
.B mdadm.conf
as requiring an external bitmap, that bitmap will be attached first.
+.TP
+.BR \-\-fail ", " \-f
+This allows the hot-plug system to remove devices that have fully disappeared
+from the kernel. It will first fail and then remove the device from any
+array it belongs to.
+The device name given should be a kernel device name such as "sda",
+not a name in
+.IR /dev .
+
.SH For Monitor mode:
.TP
.BR \-m ", " \-\-mail
@@ -2141,6 +2154,10 @@ Usage:
.I component-device
.HP 12
Usage:
+.B mdadm \-\-incremental \-\-fail
+.I component-device
+.HP 12
+Usage:
.B mdadm \-\-incremental \-\-rebuild
.HP 12
Usage:
@@ -2153,6 +2170,11 @@ passed to
.B "mdadm \-\-incremental"
to be conditionally added to an appropriate array.
+Conversely, it can also be used with the
+.B \-\-fail
+flag to do just the opposite and find whatever array a particular device
+is part of and remove the device from that array.
+
If the device passed is a
.B CONTAINER
device created by a previous call to
diff --git a/mdadm.c b/mdadm.c
index d5e34c0..6690546 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -774,6 +774,9 @@ int main(int argc, char *argv[])
devmode = 'r';
continue;
case O(MANAGE,'f'): /* set faulty */
+ case O(INCREMENTAL,'f'): /* r for incremental is taken, use f
+ * even though we will both fail and
+ * remove the device */
devmode = 'f';
continue;
case O(INCREMENTAL,'R'):
@@ -1517,6 +1520,11 @@ int main(int argc, char *argv[])
": --incremental --scan meaningless without --run.\n");
break;
}
+ if (devmode == 'f') {
+ fprintf(stderr, Name
+ ": --incremental --scan --fail not supported.\n");
+ break;
+ }
rv = IncrementalScan(verbose);
}
if (!devlist) {
@@ -1533,6 +1541,10 @@ int main(int argc, char *argv[])
rv = 1;
break;
}
+ if (devmode == 'f') {
+ rv = IncrementalRemove(devlist->devname, verbose-quiet);
+ break;
+ }
rv = Incremental(devlist->devname, verbose-quiet, runstop,
ss, homehost, require_homehost, autof);
break;
diff --git a/mdadm.h b/mdadm.h
index 0386129..2b47ae7 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -353,6 +353,10 @@ struct mdstat_ent {
int raid_disks;
int chunk_size;
char * metadata_version;
+ struct dev_member {
+ char *name;
+ struct dev_member *next;
+ } *members;
struct mdstat_ent *next;
};
@@ -361,6 +365,7 @@ extern void free_mdstat(struct mdstat_ent *ms);
extern void mdstat_wait(int seconds);
extern void mdstat_wait_fd(int fd, const sigset_t *sigmask);
extern int mddev_busy(int devnum);
+extern struct mdstat_ent *mdstat_by_component(char *name);
struct map_ent {
struct map_ent *next;
@@ -815,7 +820,7 @@ extern int Incremental_container(struct supertype *st, char *devname,
int trustworthy);
extern void RebuildMap(void);
extern int IncrementalScan(int verbose);
-
+extern int IncrementalRemove(char *devname, int verbose);
extern int CreateBitmap(char *filename, int force, char uuid[16],
unsigned long chunksize, unsigned long daemon_sleep,
unsigned long write_behind,
diff --git a/mdstat.c b/mdstat.c
index 4a9f370..58d349d 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -83,14 +83,41 @@
#include <sys/select.h>
#include <ctype.h>
+static void free_member_devnames(struct dev_member *m)
+{
+ while(m) {
+ struct dev_member *t = m;
+
+ m = m->next;
+ free(t->name);
+ free(t);
+ }
+}
+
+static void add_member_devname(struct dev_member **m, char *name)
+{
+ struct dev_member *new;
+ char *t;
+
+ if ((t = strchr(name, '[')) == NULL)
+ /* not a device */
+ return;
+
+ new = malloc(sizeof(*new));
+ new->name = strndup(name, t - name);
+ new->next = *m;
+ *m = new;
+}
+
void free_mdstat(struct mdstat_ent *ms)
{
while (ms) {
struct mdstat_ent *t;
- if (ms->dev) free(ms->dev);
- if (ms->level) free(ms->level);
- if (ms->pattern) free(ms->pattern);
- if (ms->metadata_version) free(ms->metadata_version);
+ free(ms->dev);
+ free(ms->level);
+ free(ms->pattern);
+ free(ms->metadata_version);
+ free_member_devnames(ms->members);
t = ms;
ms = ms->next;
free(t);
@@ -159,6 +186,7 @@ struct mdstat_ent *mdstat_read(int hold, int start)
ent->raid_disks = 0;
ent->chunk_size = 0;
ent->devcnt = 0;
+ ent->members = NULL;
ent->dev = strdup(line);
ent->devnum = devnum;
@@ -168,9 +196,10 @@ struct mdstat_ent *mdstat_read(int hold, int start)
char *eq;
if (strcmp(w, "active")==0)
ent->active = 1;
- else if (strcmp(w, "inactive")==0)
+ else if (strcmp(w, "inactive")==0) {
ent->active = 0;
- else if (ent->active >=0 &&
+ in_devs = 1;
+ } else if (ent->active > 0 &&
ent->level == NULL &&
w[0] != '(' /*readonly*/) {
ent->level = strdup(w);
@@ -179,6 +208,7 @@ struct mdstat_ent *mdstat_read(int hold, int start)
in_devs = 0;
else if (in_devs) {
ent->devcnt++;
+ add_member_devname(&ent->members, w);
if (strncmp(w, "md", 2)==0) {
/* This has an md device as a component.
* If that device is already in the
@@ -310,3 +340,30 @@ int mddev_busy(int devnum)
free_mdstat(mdstat);
return me != NULL;
}
+
+struct mdstat_ent *mdstat_by_component(char *name)
+{
+ struct mdstat_ent *mdstat = mdstat_read(0, 0);
+
+ while (mdstat) {
+ struct dev_member *m;
+ struct mdstat_ent *ent;
+ if (ent->metadata_version &&
+ strncmp(ent->metadata_version, "external:", 9) == 0 &&
+ is_subarray(ent->metadata_version+9))
+ /* don't return subarrays, only containers */
+ ;
+ else for (m = mdstat->members; m; m = m->next) {
+ if (strcmp(m->name, name) == 0) {
+ free_mdstat(mdstat->next);
+ mdstat->next = NULL;
+ return mdstat;
+ }
+ }
+ ent = mdstat;
+ mdstat = mdstat->next;
+ ent->next = NULL;
+ free_mdstat(ent);
+ }
+ return NULL;
+}
diff --git a/udev-md-raid.rules b/udev-md-raid.rules
index c9a4f0e..da52058 100644
--- a/udev-md-raid.rules
+++ b/udev-md-raid.rules
@@ -1,13 +1,13 @@
# do not edit this file, it will be overwritten on update
SUBSYSTEM!="block", GOTO="md_end"
-ACTION!="add|change", GOTO="md_end"
-ACTION=="change", GOTO="md_no_incr"
-# import data from a raid member and activate it
-#ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm --examine --export $tempnode", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
-# import data from a raid set
-LABEL="md_no_incr"
+# handle potential components of arrays
+ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="remove", RUN+="/sbin/mdadm -If $name"
+ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
+
+# handle md arrays
+ACTION!="add|change", GOTO="md_end"
KERNEL!="md*", GOTO="md_end"
# partitions have no md/{array_state,metadata_version}, but should not
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-08 23:31 ` Neil Brown
@ 2010-04-09 0:33 ` Neil Brown
2010-04-09 20:02 ` Doug Ledford
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Neil Brown @ 2010-04-09 0:33 UTC (permalink / raw)
To: Neil Brown; +Cc: Doug Ledford, Linux RAID Mailing List, Dan Williams
On Fri, 9 Apr 2010 09:31:53 +1000
Neil Brown <neilb@suse.de> wrote:
> On Tue, 06 Apr 2010 22:02:58 -0400
> Doug Ledford <dledford@redhat.com> wrote:
>
> > Let me know if you want me to redo/resend any of it.
> >
>
> not necessary, but some remove would be appreciated.
>
I suspect you cope graciously with most of my typos, but this one is
ridiculous!
s/remove/review/
:-(
NeilBrown
P.S. When could the word "remove" be used in that context? The only excuse
I can think of is that a "remove" is a palette cleanser at a banquet.
So "some remove" would be "some crystallized fruit or ginger bread" - always
appreciated :-)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-09 0:33 ` Neil Brown
@ 2010-04-09 20:02 ` Doug Ledford
2010-04-13 9:28 ` Tomáš Dulík
2010-04-13 19:04 ` Doug Ledford
2 siblings, 0 replies; 17+ messages in thread
From: Doug Ledford @ 2010-04-09 20:02 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux RAID Mailing List, Dan Williams
[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]
On 04/08/2010 08:33 PM, Neil Brown wrote:
> On Fri, 9 Apr 2010 09:31:53 +1000
> Neil Brown <neilb@suse.de> wrote:
>
>> On Tue, 06 Apr 2010 22:02:58 -0400
>> Doug Ledford <dledford@redhat.com> wrote:
>>
>>> Let me know if you want me to redo/resend any of it.
>>>
>>
>> not necessary, but some remove would be appreciated.
>>
> I suspect you cope graciously with most of my typos, but this one is
> ridiculous!
>
> s/remove/review/
>
> :-(
Yeah, that's when it transitions from typo to thinko.
> NeilBrown
>
> P.S. When could the word "remove" be used in that context? The only excuse
> I can think of is that a "remove" is a palette cleanser at a banquet.
> So "some remove" would be "some crystallized fruit or ginger bread" - always
> appreciated :-)
Not sure how well that would travel across the pond unless it was
airmail ;-)
Initial review looks fine to me. I think I would suggest a few more
tweaks to the default udev rules file, but I'll forward those
separately. Later tonight I'll do actual unit testing of your patch to
make sure it doesn't regress any functionality and if it needs any
fixes, forward those on to you as well.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-09 0:33 ` Neil Brown
2010-04-09 20:02 ` Doug Ledford
@ 2010-04-13 9:28 ` Tomáš Dulík
2010-04-13 16:27 ` Doug Ledford
2010-04-13 18:49 ` Doug Ledford
2010-04-13 19:04 ` Doug Ledford
2 siblings, 2 replies; 17+ messages in thread
From: Tomáš Dulík @ 2010-04-13 9:28 UTC (permalink / raw)
To: Doug Ledford; +Cc: Linux RAID Mailing List
Hi Doug,
first of all: thanks for your work on hot-unplug!
I am new to Linux RAID, have been using HW RAID before but after my LSI
controller burned to ashes I decided I don't want to see HW RAID ... ever.
First thing I found weird on Linux RAID was the missing support for dead
device removal.
I spent last 3 weeks trying to write various scripts for UDEV "remove"
and mdadm "Fail" events handling, but finally I found the same thing
like you - it is not possible to remove dead device from an array,
because the events are issued too late. The only way to remove dead
device is reboot, which is not what I would expect as solution in Linux
world.
So I downloaded your code from Neil's git
(http://neil.brown.name/git?p=mdadm;a=shortlog;h=refs/heads/hotunplug)
and also applied the "Minor incremental fixup" mentioned in your message
below.
The compiled mdadm works OK for normal operations (--fail, --remove,
--add), but crashes with Segmentation fault for the "--incremental
--fail" operation if I use it for a disk that I have just disconnected.
Here is what I've got:
# gdb --args ./mdadm -If sda3
GNU gdb 6.8-debian
This GDB was configured as "x86_64-linux-gnu"...
(gdb) run
Starting program: /root/mdadm-git/mdadm/mdadm -If sda3
Program received signal SIGSEGV, Segmentation fault.
0x000000000040a796 in mdstat_by_component (name=0x7fff0d0aee83 "sda3")
at mdstat.c:351
351 if (ent->metadata_version &&
(gdb) where
#0 0x000000000040a796 in mdstat_by_component (name=0x7fff0d0aee83
"sda3") at mdstat.c:351
#1 0x000000000042411c in IncrementalRemove (devname=0x7fff0d0aee83
"sda3", verbose=0) at Incremental.c:867
#2 0x00000000004075a7 in main (argc=3, argv=0x7fff0d0ad698) at mdadm.c:1545
It does not matter if I use sda3 or sda, the result is the same.
What am I doing wrong?
This is my environment:
# uname -a
Linux xeric 2.6.26-2-xen-amd64 #1 SMP Thu Nov 5 04:27:12 UTC 2009 x86_64
GNU/Linux
# modinfo md_mod
filename: /lib/modules/2.6.26-2-xen-amd64/kernel/drivers/md/md-mod.ko
alias: block-major-9-*
alias: md
license: GPL
depends:
vermagic: 2.6.26-2-xen-amd64 SMP mod_unload modversions Xen
parm: start_dirty_degraded:int
# cat /proc/mdstat
Personalities : [raid1]
md2 : active (auto-read-only) raid1 sda3[0] sdb3[1]
9767424 blocks [2/2] [UU]
bitmap: 0/150 pages [0KB], 32KB chunk
md1 : active raid1 sda2[2](F) sdb2[1]
468752512 blocks [2/1] [_U]
bitmap: 18/224 pages [72KB], 1024KB chunk
md0 : active raid1 sda1[0] sdb1[1]
497856 blocks [2/2] [UU]
bitmap: 0/61 pages [0KB], 4KB chunk
Thanks for your help!
Tomas Dulik,
FAI TBU Zlin,
Nad Stranemi 4511,
CZECH REPUBLIC
phone: +420 57 603 5187
On 04/05/2010 12:40 PM, Doug Ledford wrote:
> Minor incremental fixup: In the case of passing in faulty or
> disconnected as the device name, since we now use the value of tfd to
> determine if we should attempt ioctls or go straight to using sysfs
> entries, we now need to make sure we init tdf and then set it properly
> in both of the loops where we check for faulty and disconnected devices
> (although I'm now highly suspicious of the faulty check code as I
> suspect all the faulty devices will have the same problem that our hot
> unplug code ran into and the faulty devices will not be openable and
> that will mean that passing in faulty is probably just broken at this
> point in time...but that's another patch for another day).
>
> --
> Doug Ledford <dledford@xxxxxxxxxx>
> GPG KeyID: CFBFF194
> http://people.redhat.com/dledford
>
> Infiniband specific RPMs available at
> http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-13 9:28 ` Tomáš Dulík
@ 2010-04-13 16:27 ` Doug Ledford
2010-04-13 18:49 ` Doug Ledford
1 sibling, 0 replies; 17+ messages in thread
From: Doug Ledford @ 2010-04-13 16:27 UTC (permalink / raw)
To: Tomáš Dulík; +Cc: Linux RAID Mailing List
[-- Attachment #1: Type: text/plain, Size: 1755 bytes --]
On 04/13/2010 05:28 AM, Tomáš Dulík wrote:
> Hi Doug,
>
> first of all: thanks for your work on hot-unplug!
> I am new to Linux RAID, have been using HW RAID before but after my LSI
> controller burned to ashes I decided I don't want to see HW RAID ... ever.
>
> First thing I found weird on Linux RAID was the missing support for dead
> device removal.
> I spent last 3 weeks trying to write various scripts for UDEV "remove"
> and mdadm "Fail" events handling, but finally I found the same thing
> like you - it is not possible to remove dead device from an array,
> because the events are issued too late. The only way to remove dead
> device is reboot, which is not what I would expect as solution in Linux
> world.
>
> So I downloaded your code from Neil's git
> (http://neil.brown.name/git?p=mdadm;a=shortlog;h=refs/heads/hotunplug)
> and also applied the "Minor incremental fixup" mentioned in your message
> below.
>
> The compiled mdadm works OK for normal operations (--fail, --remove,
> --add), but crashes with Segmentation fault for the "--incremental
> --fail" operation if I use it for a disk that I have just disconnected.
> Here is what I've got:
[ snip ]
Thanks for the report. You aren't likely doing anything wrong. Neil
rewrote significant portions of my code and I suspect there is a
lingering issue in there. I would have caught it already if I hadn't
ended up taking some time off sick. But, I'm back and working on it
now, so I'll send an updated patch to the list soon.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-13 9:28 ` Tomáš Dulík
2010-04-13 16:27 ` Doug Ledford
@ 2010-04-13 18:49 ` Doug Ledford
[not found] ` <4BC5ADB2.2060705@unart.cz>
1 sibling, 1 reply; 17+ messages in thread
From: Doug Ledford @ 2010-04-13 18:49 UTC (permalink / raw)
To: Tomáš Dulík; +Cc: Linux RAID Mailing List, Neil Brown
[-- Attachment #1.1: Type: text/plain, Size: 2278 bytes --]
On 04/13/2010 05:28 AM, Tomáš Dulík wrote:
> Hi Doug,
>
> first of all: thanks for your work on hot-unplug!
> I am new to Linux RAID, have been using HW RAID before but after my LSI
> controller burned to ashes I decided I don't want to see HW RAID ... ever.
>
> First thing I found weird on Linux RAID was the missing support for dead
> device removal.
> I spent last 3 weeks trying to write various scripts for UDEV "remove"
> and mdadm "Fail" events handling, but finally I found the same thing
> like you - it is not possible to remove dead device from an array,
> because the events are issued too late. The only way to remove dead
> device is reboot, which is not what I would expect as solution in Linux
> world.
>
> So I downloaded your code from Neil's git
> (http://neil.brown.name/git?p=mdadm;a=shortlog;h=refs/heads/hotunplug)
> and also applied the "Minor incremental fixup" mentioned in your message
> below.
>
> The compiled mdadm works OK for normal operations (--fail, --remove,
> --add), but crashes with Segmentation fault for the "--incremental
> --fail" operation if I use it for a disk that I have just disconnected.
> Here is what I've got:
>
> # gdb --args ./mdadm -If sda3
> GNU gdb 6.8-debian
> This GDB was configured as "x86_64-linux-gnu"...
> (gdb) run
> Starting program: /root/mdadm-git/mdadm/mdadm -If sda3
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000040a796 in mdstat_by_component (name=0x7fff0d0aee83 "sda3")
> at mdstat.c:351
> 351 if (ent->metadata_version &&
> (gdb) where
> #0 0x000000000040a796 in mdstat_by_component (name=0x7fff0d0aee83
> "sda3") at mdstat.c:351
> #1 0x000000000042411c in IncrementalRemove (devname=0x7fff0d0aee83
> "sda3", verbose=0) at Incremental.c:867
> #2 0x00000000004075a7 in main (argc=3, argv=0x7fff0d0ad698) at
> mdadm.c:1545
>
> It does not matter if I use sda3 or sda, the result is the same.
> What am I doing wrong?
There was a thinko in Neil's patch that is fixed with the attached patch.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #1.2: 0001-hotunplug-we-are-testing-mdstat-not-ent-which-is-und.patch --]
[-- Type: text/plain, Size: 1053 bytes --]
From b937950110190ce00f16d91a3423a66fde080a95 Mon Sep 17 00:00:00 2001
From: Doug Ledford <dledford@redhat.com>
Date: Tue, 13 Apr 2010 13:12:59 -0400
Subject: [PATCH 1/4] [hotunplug] we are testing mdstat, not ent which is undefined at this
point
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
mdstat.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/mdstat.c b/mdstat.c
index 58d349d..3bb74fa 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -348,9 +348,9 @@ struct mdstat_ent *mdstat_by_component(char *name)
while (mdstat) {
struct dev_member *m;
struct mdstat_ent *ent;
- if (ent->metadata_version &&
- strncmp(ent->metadata_version, "external:", 9) == 0 &&
- is_subarray(ent->metadata_version+9))
+ if (mdstat->metadata_version &&
+ strncmp(mdstat->metadata_version, "external:", 9) == 0 &&
+ is_subarray(mdstat->metadata_version+9))
/* don't return subarrays, only containers */
;
else for (m = mdstat->members; m; m = m->next) {
--
1.6.6.1
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-09 0:33 ` Neil Brown
2010-04-09 20:02 ` Doug Ledford
2010-04-13 9:28 ` Tomáš Dulík
@ 2010-04-13 19:04 ` Doug Ledford
2 siblings, 0 replies; 17+ messages in thread
From: Doug Ledford @ 2010-04-13 19:04 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux RAID Mailing List, Dan Williams
[-- Attachment #1.1: Type: text/plain, Size: 3209 bytes --]
On 04/08/2010 08:33 PM, Neil Brown wrote:
> On Fri, 9 Apr 2010 09:31:53 +1000
> Neil Brown <neilb@suse.de> wrote:
>
>> On Tue, 06 Apr 2010 22:02:58 -0400
>> Doug Ledford <dledford@redhat.com> wrote:
>>
>>> Let me know if you want me to redo/resend any of it.
>>>
>>
>> not necessary, but some remove would be appreciated.
>>
> I suspect you cope graciously with most of my typos, but this one is
> ridiculous!
>
> s/remove/review/
>
> :-(
>
> NeilBrown
>
> P.S. When could the word "remove" be used in that context? The only excuse
> I can think of is that a "remove" is a palette cleanser at a banquet.
> So "some remove" would be "some crystallized fruit or ginger bread" - always
> appreciated :-)
I've already sent two minor patches, the touchup patch from my original
submission (which hasn't hit your git repo yet but needs to) and the one
to resolve the segfault in your initial patch. I also have these two
additional patches that apply on top of your hotunplug branch. With
these applied, the code is back to the same basic functionality as the
patch I posted originally. However, don't take that to mean it's done
;-) Of note, there are still a couple lingering issues:
1) The kernel won't let us set a device faulty if the array is a
non-redundant array type. This is true of mdadm -If $name and also
mdadm <mddevice> -f <device>. We need to fix this. I suspect the
reason that the kernel doesn't set non-redundant array types as failed
in this situation is on the hopes that the underlying device will come
back. If it does, and we haven't failed the device yet, then
theoretically we could pick up where we left off instead of ever having
to fail at all, where as if we fail at all, the entire array goes down.
As it turns out, this is a pipe dream. The reason I say that this is a
pipe dream is that our continued holding of a reference to the device
simply guarantees that even if it came back, it's going to come back
with a different device name and major/minor pair, so we will *never* be
able to pick up where we left off. So once that device goes away, our
array is toast whether we like it or not.
2) IMSM containers are still a bit flaky on their incremental remove/add
support. I've been speaking with Dan about this and we've agreed that
the fact that they incrementally assemble differently than regular md
raid arrays is problematic, especially for udev rules files, and so he
was going to work some on the issue (I also have a patch that gets
things minimally functional here, but I'm not attaching that to this email).
My IMSM rules in the udev file are dracut specific. I don't know which
distros will end up switch to dracut from mkinitrd, but we already have
for a couple releases of Fedora now. They could be modified for other
boot initramfs builders as well, but they basically default imsm array
handling to mdadm in the absence of the rd_NO_MDIMSM command line option.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
[-- Attachment #1.2: 0003-udev-rules-imsm-arrays-are-a-different-ID_FS_TYPE-so.patch --]
[-- Type: text/plain, Size: 1359 bytes --]
From 506c331a6030531fe647c7a654bd87ae32373543 Mon Sep 17 00:00:00 2001
From: Doug Ledford <dledford@redhat.com>
Date: Tue, 13 Apr 2010 14:38:44 -0400
Subject: [PATCH 3/4] udev rules: imsm arrays are a different ID_FS_TYPE, so check for them as
well as regular mdadm arrays. Check to see if we were told on the boot
command line to ignore IMSM arrays first though (aka, rd_NO_MDIMSM is not
empty).
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
udev-md-raid.rules | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/udev-md-raid.rules b/udev-md-raid.rules
index da52058..712c5e5 100644
--- a/udev-md-raid.rules
+++ b/udev-md-raid.rules
@@ -4,7 +4,9 @@ SUBSYSTEM!="block", GOTO="md_end"
# handle potential components of arrays
ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="remove", RUN+="/sbin/mdadm -If $name"
-ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
+ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm -I $tempnode"
+ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", ACTION=="remove", RUN+="/sbin/mdadm -If $name"
+ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", ACTION=="add", RUN+="/sbin/mdadm -I $tempnode"
# handle md arrays
ACTION!="add|change", GOTO="md_end"
--
1.6.6.1
[-- Attachment #1.3: 0004-The-assumption-is-that-if-tfd-0-then-we-have-a-valid.patch --]
[-- Type: text/plain, Size: 2210 bytes --]
From d1f7ba12a4e1ec9ed3a005a84489073d80be8cc9 Mon Sep 17 00:00:00 2001
From: Doug Ledford <dledford@redhat.com>
Date: Tue, 13 Apr 2010 14:44:24 -0400
Subject: [PATCH 4/4] The assumption is that if tfd >= 0, then we have a valid major/minor pair
in stb.st_rdev to use in ioctl calls. Therefore, even if we use the sysfs
block/dev entry to get our major/minor pair, we should really be using
tfd and not sysfd as it changes assumptions made later on.
Note: I actually don't like this. I would prefer that in the case of a
kernel internal name that we skipped tfd altogether and went strait to
sysfs operation exclusively. But that's just my preference. I'm
concerned that sometime in the future we'll make the mistake of assuming
that tfd >= 0 means dv->devname is a fully qualified path name and not
a kernel internal name. We don't now, but I could easily see it happening.
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
Manage.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/Manage.c b/Manage.c
index 043e3f6..9e450b9 100644
--- a/Manage.c
+++ b/Manage.c
@@ -386,7 +386,7 @@ int Manage_subdevs(char *devname, int fd,
next = dv->next;
jnext = 0;
- tfd = -1;
+ tfd = sysfd = -1;
if (strcmp(dv->devname, "failed")==0 ||
strcmp(dv->devname, "faulty")==0) {
@@ -461,17 +461,16 @@ int Manage_subdevs(char *devname, int fd,
}
sprintf(dname, "dev-%s", dv->devname);
- sysfd = sysfs_open(fd2devnum(fd), dname, "block/dev");
- if (sysfd >= 0) {
+ tfd = sysfs_open(fd2devnum(fd), dname, "block/dev");
+ if (tfd >= 0) {
char dn[20];
int mj,mn;
- if (sysfs_fd_get_str(sysfd, dn, 20) > 0 &&
+ if (sysfs_fd_get_str(tfd, dn, 20) > 0 &&
sscanf(dn, "%d:%d", &mj,&mn) == 2) {
stb.st_rdev = makedev(mj,mn);
found = 1;
}
- close(sysfd);
- sysfd = -1;
+ close(tfd);
}
if (!found) {
sysfd = sysfs_open(fd2devnum(fd), dname, "state");
@@ -481,6 +480,7 @@ int Manage_subdevs(char *devname, int fd,
dv->devname, devname);
return 1;
}
+ tfd = -1;
}
} else {
j = 0;
--
1.6.6.1
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
[not found] ` <4BC5ADB2.2060705@unart.cz>
@ 2010-04-15 5:24 ` Neil Brown
2010-04-15 13:11 ` Tomáš Dulík
0 siblings, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-04-15 5:24 UTC (permalink / raw)
To: Tomáš Dulík; +Cc: Doug Ledford, Linux RAID Mailing List
On Wed, 14 Apr 2010 13:57:38 +0200
Tomáš Dulík <dulik@unart.cz> wrote:
> I have already started to write the documentation for this great feature
> here:
> https://raid.wiki.kernel.org/index.php/Hotplug
Great - thanks for taking the initiative with that.
However:
All previous versions of mdadm were unable to remove disconnected or dead
devices from md arrays.
This is not corrrect. You can removed disconnected devices with:
mdadm /dev/mdx --fail detached --remove detached
will fail, then remove, all component devices on /dev/mdx which
have been detached from the system.
This has been possible since about 2.6.2.
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Patch mdadm] Add hot-unplug support to mdadm
2010-04-15 5:24 ` Neil Brown
@ 2010-04-15 13:11 ` Tomáš Dulík
0 siblings, 0 replies; 17+ messages in thread
From: Tomáš Dulík @ 2010-04-15 13:11 UTC (permalink / raw)
To: Neil Brown; +Cc: Doug Ledford, Linux RAID Mailing List
Neil Brown napsal(a):
> This is not corrrect. You can removed disconnected devices with:
>
> mdadm /dev/mdx --fail detached --remove detached
>
> will fail, then remove, all component devices on /dev/mdx which
> have been detached from the system.
> This has been possible since about 2.6.2.
>
Well, this is another great thing I overlooked in the manual.
Using this feature and some tiny scripts, I am now able to do fully
automatic hot-unplug even with the old mdadm which came with my
distribution:
https://raid.wiki.kernel.org/index.php/Hotplug#Fully_automated_hotplug_and_hot-unplug_using_UDEV_rules
Very cool, thanks for your help!
Tomas
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2010-04-15 13:11 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-05 16:40 [Patch mdadm] Add hot-unplug support to mdadm Doug Ledford
2010-04-06 16:26 ` Doug Ledford
2010-04-07 1:30 ` Neil Brown
2010-04-07 2:02 ` Doug Ledford
2010-04-07 2:24 ` Doug Ledford
2010-04-07 3:07 ` Doug Ledford
2010-04-07 5:32 ` Luca Berra
2010-04-07 6:59 ` Neil Brown
2010-04-08 23:31 ` Neil Brown
2010-04-09 0:33 ` Neil Brown
2010-04-09 20:02 ` Doug Ledford
2010-04-13 9:28 ` Tomáš Dulík
2010-04-13 16:27 ` Doug Ledford
2010-04-13 18:49 ` Doug Ledford
[not found] ` <4BC5ADB2.2060705@unart.cz>
2010-04-15 5:24 ` Neil Brown
2010-04-15 13:11 ` Tomáš Dulík
2010-04-13 19:04 ` Doug Ledford
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).