* [Qemu-devel] [PATCH v2 0/7] md: add hot-plug and hot-unplug support
@ 2013-03-29 5:42 Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 1/7] md: add support for simultaneous disk failure on the same node Liu Yuan
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
v2:
- fix object stale purging for md
This is the final part of the MD (multi-disk) work. With this patch set, we
allow group plug, group unplug, and disk/node failures during (un)plugging.
It also adds a disk information command to collie.
Liu Yuan (7):
md: add support for simultaneous disk failure on the same node
tests/055: test simultaneous multiple disk failures on the same node
net: enlarge connect_to_addr() scope
md: add hot-plug and hot-unplug support
collie: add new commands to manipulate multi-disks
sheep: remove duplicate recovery complete notification
tests: add 057 to test md hot-plug and hot-unplug
collie/collie.c | 2 +-
collie/node.c | 161 +++++++++++++++++++++++
include/internal_proto.h | 16 +++
include/net.h | 8 ++
include/sheepdog_proto.h | 2 +
sheep/md.c | 321 ++++++++++++++++++++++++++++++++--------------
sheep/ops.c | 59 ++++++++-
sheep/sheep_priv.h | 5 +-
sheep/sockfd_cache.c | 8 --
sheep/store.c | 3 +-
tests/055 | 7 +
tests/055.out | 8 ++
tests/057 | 57 ++++++++
tests/057.out | 55 ++++++++
tests/group | 1 +
15 files changed, 607 insertions(+), 106 deletions(-)
create mode 100755 tests/057
create mode 100644 tests/057.out
--
1.7.9.5
* [Qemu-devel] [PATCH v2 1/7] md: add support for simultaneous disk failure on the same node
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
Don't panic; instead, simply remove the affected disk when nested disk
failures occur.
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
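Note: the remove-and-rescan idea in this patch can be illustrated with the
following minimal sketch. It is not the sheep code: the demo_* names, the
struct layout and the fixed array size are assumptions made for illustration,
and the scan continues at the same index after a removal instead of jumping
back to a reinit label, which should give the same surviving set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_DISK 8	/* hypothetical bound, for the demo only */

struct demo_disk {
	char path[64];
	uint64_t space;	/* 0 marks a broken path, as init_path_space() does */
};

/* Shift later entries down over slot idx and return the new count. */
int demo_remove_disk(struct demo_disk *disks, int nr, int idx)
{
	assert(idx >= 0 && idx < nr);
	/* When idx is the last slot nothing is copied, so its path stays put. */
	memmove(&disks[idx], &disks[idx + 1],
		(size_t)(nr - idx - 1) * sizeof(disks[0]));
	return nr - 1;
}

/* Drop every zero-space disk and return the total space of the survivors. */
uint64_t demo_init_space(struct demo_disk *disks, int *nr)
{
	uint64_t total = 0;
	int i = 0;

	while (i < *nr) {
		if (disks[i].space == 0) {
			*nr = demo_remove_disk(disks, *nr, i);
			continue;	/* re-examine the entry shifted into slot i */
		}
		total += disks[i].space;
		i++;
	}
	return total;
}
```

With two broken disks in the middle of a four-disk array, the two healthy
disks remain and their space is summed; with every disk broken the count
drops to 0 and the total is 0.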
sheep/md.c | 66 +++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 48 insertions(+), 18 deletions(-)
diff --git a/sheep/md.c b/sheep/md.c
index ed474c8..821a391 100644
--- a/sheep/md.c
+++ b/sheep/md.c
@@ -154,40 +154,77 @@ static inline void calculate_vdisks(struct disk *disks, int nr_disks,
#define MDNAME "user.md.size"
#define MDSIZE sizeof(uint64_t)
+/*
+ * If path is broken during initialization or lacks xattr support, return 0. We
+ * can safely use 0 to represent the failure case because a 0-space path can be
+ * considered a broken path.
+ */
static uint64_t init_path_space(char *path)
{
struct statvfs fs;
uint64_t size;
+ if (!is_xattr_enabled(path)) {
+ sd_iprintf("multi-disk support need xattr feature");
+ goto broken_path;
+ }
+
if (getxattr(path, MDNAME, &size, MDSIZE) < 0) {
- if (errno == ENODATA)
+ if (errno == ENODATA) {
goto create;
- else
- panic("%s, %m", path);
+ } else {
+ sd_eprintf("%s, %m", path);
+ goto broken_path;
+ }
}
return size;
create:
- if (statvfs(path, &fs) < 0)
- panic("get disk %s space failed %m", path);
+ if (statvfs(path, &fs) < 0) {
+ sd_eprintf("get disk %s space failed %m", path);
+ goto broken_path;
+ }
size = (int64_t)fs.f_frsize * fs.f_bfree;
- if (setxattr(path, MDNAME, &size, MDSIZE, 0) < 0)
- panic("%s, %m", path);
+ if (setxattr(path, MDNAME, &size, MDSIZE, 0) < 0) {
+ sd_eprintf("%s, %m", path);
+ goto broken_path;
+ }
return size;
+broken_path:
+ return 0;
+}
+
+static inline void remove_disk(int idx)
+{
+ int i;
+
+ sd_iprintf("%s from multi-disk array", md_disks[idx].path);
+ /*
+ * We need to keep last disk path to generate EIO when all disks are
+ * broken
+ */
+ for (i = idx; i < md_nr_disks - 1; i++)
+ md_disks[i] = md_disks[i + 1];
+
+ md_nr_disks--;
}
uint64_t md_init_space(void)
{
- uint64_t total = 0;
+ uint64_t total;
int i;
+reinit:
if (!md_nr_disks)
return 0;
+ total = 0;
for (i = 0; i < md_nr_disks; i++) {
- if (!is_xattr_enabled(md_disks[i].path))
- panic("multi-disk support need xattr feature");
md_disks[i].space = init_path_space(md_disks[i].path);
+ if (!md_disks[i].space) {
+ remove_disk(i);
+ goto reinit;
+ }
total += md_disks[i].space;
}
calculate_vdisks(md_disks, md_nr_disks, total);
@@ -329,15 +366,8 @@ static inline void kick_recover(void)
static void unplug_disk(int idx)
{
- int i;
- /*
- * We need to keep last disk path to generate EIO when all disks are
- * broken
- */
- for (i = idx; i < md_nr_disks - 1; i++)
- md_disks[i] = md_disks[i + 1];
- md_nr_disks--;
+ remove_disk(idx);
sys->disk_space = md_init_space();
if (md_nr_disks > 0)
kick_recover();
--
1.7.9.5
* [Qemu-devel] [PATCH v2 2/7] tests/055: test simultaneous multiple disk failures on the same node
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
tests/055 | 7 +++++++
tests/055.out | 8 ++++++++
2 files changed, 15 insertions(+)
diff --git a/tests/055 b/tests/055
index 4c3bfb9..6b99552 100755
--- a/tests/055
+++ b/tests/055
@@ -42,6 +42,13 @@ $COLLIE cluster info | _filter_cluster_info
# simulate all disks failure
rm $STORE/1/d2 -rf
+dd if=/dev/urandom | $COLLIE vdi write test
+$COLLIE vdi check test
+$COLLIE cluster info | _filter_cluster_info
+
+# simulate simultaneous multiple disks failure
+rm $STORE/2/d0 -rf
+rm $STORE/2/d1 -rf
dd if=/dev/zero | $COLLIE vdi write test
$COLLIE vdi check test
$COLLIE cluster info | _filter_cluster_info
diff --git a/tests/055.out b/tests/055.out
index 284665a..e28188e 100644
--- a/tests/055.out
+++ b/tests/055.out
@@ -22,3 +22,11 @@ Cluster created at DATE
Epoch Time Version
DATE 2 [127.0.0.1:7000, 127.0.0.1:7002]
DATE 1 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002]
+finish check&repair test
+Cluster status: running
+
+Cluster created at DATE
+
+Epoch Time Version
+DATE 2 [127.0.0.1:7000, 127.0.0.1:7002]
+DATE 1 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002]
--
1.7.9.5
* [Qemu-devel] [PATCH v2 3/7] net: enlarge connect_to_addr() scope
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
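This is a preparatory patch: it moves connect_to_addr() from
sheep/sockfd_cache.c to include/net.h so that code outside sheep can use it.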
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
include/net.h | 8 ++++++++
sheep/sockfd_cache.c | 8 --------
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/include/net.h b/include/net.h
index a68c880..75ac197 100644
--- a/include/net.h
+++ b/include/net.h
@@ -72,4 +72,12 @@ int set_rcv_timeout(int fd);
int get_local_addr(uint8_t *bytes);
bool inetaddr_is_valid(char *addr);
+static inline int connect_to_addr(const uint8_t *addr, int port)
+{
+ char name[INET6_ADDRSTRLEN];
+
+ addr_to_str(name, sizeof(name), addr, 0);
+ return connect_to(name, port);
+}
+
#endif
diff --git a/sheep/sockfd_cache.c b/sheep/sockfd_cache.c
index ddefbf2..1e0d6cf 100644
--- a/sheep/sockfd_cache.c
+++ b/sheep/sockfd_cache.c
@@ -337,14 +337,6 @@ static inline void check_idx(int idx)
queue_work(sys->sockfd_wqueue, w);
}
-static inline int connect_to_addr(const uint8_t *addr, int port)
-{
- char name[INET6_ADDRSTRLEN];
-
- addr_to_str(name, sizeof(name), addr, 0);
- return connect_to(name, port);
-}
-
/* Add the node back if it is still alive */
static inline int revalidate_node(const struct node_id *nid)
{
--
1.7.9.5
* [Qemu-devel] [PATCH v2 4/7] md: add hot-plug and hot-unplug support
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
Allow group plug, group unplug, and disk failures during (un)plugging.
Also add a disk information function for collie.
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
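Note: the comma-separated path handling used for group plug can be sketched
roughly as below. This is an illustration under assumptions, not the sheep
implementation: the demo_* names and the small fixed table are hypothetical.
Two deliberate differences from the patch are labeled in the comments: the
loop tests the first strtok() result before using it, so an empty list is a
no-op, and the table bound is checked before a path is stored.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_DISK 4		/* hypothetical bound, for the demo only */
#define DEMO_PATH_MAX 64

static char demo_paths[DEMO_MAX_DISK][DEMO_PATH_MAX];
static int demo_nr;

/* Return the index of path in the table, or -1 if it is not present. */
int demo_path_to_idx(const char *path)
{
	int i;

	for (i = 0; i < demo_nr; i++)
		if (strcmp(demo_paths[i], path) == 0)
			return i;
	return -1;
}

/*
 * Add every new path from a comma-separated list (the list is modified by
 * strtok()). Returns the number of disks in the table afterwards.
 */
int demo_plug(char *list)
{
	char *path;

	assert(list != NULL);
	/* Assumption: test the token first so "" adds nothing. */
	for (path = strtok(list, ","); path; path = strtok(NULL, ",")) {
		if (demo_path_to_idx(path) != -1)
			continue;	/* duplicate path, skip it */
		/* Assumption: refuse paths that would overflow the table. */
		if (demo_nr == DEMO_MAX_DISK || strlen(path) >= DEMO_PATH_MAX)
			continue;
		strcpy(demo_paths[demo_nr++], path);
	}
	return demo_nr;
}
```

A caller can compare the count before and after the call to decide whether
anything changed, which mirrors the old_nr check in do_plug_unplug().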
collie/collie.c | 2 +-
include/internal_proto.h | 16 +++
include/sheepdog_proto.h | 2 +
sheep/md.c | 263 ++++++++++++++++++++++++++++++++--------------
sheep/ops.c | 45 ++++++++
sheep/sheep_priv.h | 5 +-
sheep/store.c | 3 +-
7 files changed, 253 insertions(+), 83 deletions(-)
diff --git a/collie/collie.c b/collie/collie.c
index 08c78eb..19085b4 100644
--- a/collie/collie.c
+++ b/collie/collie.c
@@ -19,7 +19,7 @@
#include "util.h"
static const char program_name[] = "collie";
-const char *sdhost = "localhost";
+const char *sdhost = "127.0.0.1";
int sdport = SD_LISTEN_PORT;
bool highlight = true;
bool raw_output;
diff --git a/include/internal_proto.h b/include/internal_proto.h
index 6f1fdb3..c43855b 100644
--- a/include/internal_proto.h
+++ b/include/internal_proto.h
@@ -69,6 +69,9 @@
#define SD_OP_FLUSH_PEER 0xAE
#define SD_OP_NOTIFY_VDI_ADD 0xAF
#define SD_OP_DELETE_CACHE 0xB0
+#define SD_OP_MD_INFO 0xB1
+#define SD_OP_MD_PLUG 0xB2
+#define SD_OP_MD_UNPLUG 0xB3
/* internal flags for hdr.flags, must be above 0x80 */
#define SD_FLAG_CMD_RECOVERY 0x0080
@@ -229,4 +232,17 @@ struct vdi_op_message {
uint8_t data[0];
};
+struct md_info {
+ int idx;
+ uint64_t size;
+ uint64_t used;
+ char path[PATH_MAX];
+};
+
+#define MD_MAX_DISK 64 /* FIXME remove roof and make it dynamic */
+struct sd_md_info {
+ struct md_info disk[MD_MAX_DISK];
+ int nr;
+};
+
#endif /* __INTERNAL_PROTO_H__ */
diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
index fe3738b..94baede 100644
--- a/include/sheepdog_proto.h
+++ b/include/sheepdog_proto.h
@@ -13,6 +13,8 @@
#include <inttypes.h>
#include <stdint.h>
+#include <linux/limits.h>
+
#include "util.h"
#define SD_PROTO_VER 0x02
diff --git a/sheep/md.c b/sheep/md.c
index 821a391..124f2ba 100644
--- a/sheep/md.c
+++ b/sheep/md.c
@@ -21,11 +21,12 @@
#include <sys/xattr.h>
#include <dirent.h>
#include <pthread.h>
+#include <string.h>
#include "sheep_priv.h"
+#include "util.h"
#define MD_DEFAULT_VDISKS 128
-#define MD_MAX_DISK 64 /* FIXME remove roof and make it dynamic */
#define MD_MAX_VDISK (MD_MAX_DISK * MD_DEFAULT_VDISKS)
struct disk {
@@ -123,20 +124,33 @@ static inline struct vdisk *oid_to_vdisk(uint64_t oid)
return oid_to_vdisk_from(md_vds, md_nr_vds, oid);
}
-int md_init_disk(char *path)
+static int path_to_disk_idx(char *path)
{
+ int i;
+
+ for (i = 0; i < md_nr_disks; i++)
+ if (strcmp(md_disks[i].path, path) == 0)
+ return i;
+
+ return -1;
+}
+
+void md_add_disk(char *path)
+{
+ if (path_to_disk_idx(path) != -1) {
+ sd_eprintf("duplicate path %s", path);
+ return;
+ }
+
md_nr_disks++;
- if (xmkdir(path, def_dmode) < 0)
- panic("%s, %m", path);
pstrcpy(md_disks[md_nr_disks - 1].path, PATH_MAX, path);
- sd_iprintf("%s added to md, nr %d", md_disks[md_nr_disks - 1].path,
+ sd_iprintf("%s, nr %d", md_disks[md_nr_disks - 1].path,
md_nr_disks);
- return 0;
}
static inline void calculate_vdisks(struct disk *disks, int nr_disks,
- uint64_t total)
+ uint64_t total)
{
uint64_t avg_size = total / nr_disks;
float factor;
@@ -154,6 +168,79 @@ static inline void calculate_vdisks(struct disk *disks, int nr_disks,
#define MDNAME "user.md.size"
#define MDSIZE sizeof(uint64_t)
+static int get_total_object_size(uint64_t oid, char *ignore, void *total)
+{
+ uint64_t *t = total;
+ *t += get_objsize(oid);
+
+ return SD_RES_SUCCESS;
+}
+
+/* If cleanup is true, temporary objects will be removed */
+static int for_each_object_in_path(char *path,
+ int (*func)(uint64_t, char *, void *),
+ bool cleanup, void *arg)
+{
+ DIR *dir;
+ struct dirent *d;
+ uint64_t oid;
+ int ret = SD_RES_SUCCESS;
+ char p[PATH_MAX];
+
+ dir = opendir(path);
+ if (!dir) {
+ sd_eprintf("failed to open %s, %m", path);
+ return SD_RES_EIO;
+ }
+
+ while ((d = readdir(dir))) {
+ if (!strncmp(d->d_name, ".", 1))
+ continue;
+
+ oid = strtoull(d->d_name, NULL, 16);
+ if (oid == 0 || oid == ULLONG_MAX)
+ continue;
+
+ /* don't call callback against temporary objects */
+ if (strlen(d->d_name) == 20 &&
+ strcmp(d->d_name + 16, ".tmp") == 0) {
+ if (cleanup) {
+ snprintf(p, PATH_MAX, "%s/%016"PRIx64".tmp",
+ path, oid);
+ sd_dprintf("remove tmp object %s", p);
+ unlink(p);
+ }
+ continue;
+ }
+
+ ret = func(oid, path, arg);
+ if (ret != SD_RES_SUCCESS)
+ break;
+ }
+ closedir(dir);
+ return ret;
+}
+
+static uint64_t get_path_size(char *path, uint64_t *used)
+{
+ struct statvfs fs;
+ uint64_t size;
+
+ if (statvfs(path, &fs) < 0) {
+ sd_eprintf("get disk %s space failed %m", path);
+ return 0;
+ }
+ size = (int64_t)fs.f_frsize * fs.f_bfree;
+
+ if (!used)
+ goto out;
+ if (for_each_object_in_path(path, get_total_object_size, false, used)
+ != SD_RES_SUCCESS)
+ return 0;
+out:
+ return size;
+}
+
/*
* If path is broken during initialization or lacks xattr support, return 0. We
* can safely use 0 to represent the failure case because a 0-space path can be
@@ -161,9 +248,13 @@ static inline void calculate_vdisks(struct disk *disks, int nr_disks,
*/
static uint64_t init_path_space(char *path)
{
- struct statvfs fs;
uint64_t size;
+ if (xmkdir(path, def_dmode) < 0) {
+ sd_eprintf("%s, %m", path);
+ goto broken_path;
+ }
+
if (!is_xattr_enabled(path)) {
sd_iprintf("multi-disk support need xattr feature");
goto broken_path;
@@ -180,11 +271,9 @@ static uint64_t init_path_space(char *path)
return size;
create:
- if (statvfs(path, &fs) < 0) {
- sd_eprintf("get disk %s space failed %m", path);
+ size = get_path_size(path, NULL);
+ if (!size)
goto broken_path;
- }
- size = (int64_t)fs.f_frsize * fs.f_bfree;
if (setxattr(path, MDNAME, &size, MDSIZE, 0) < 0) {
sd_eprintf("%s, %m", path);
goto broken_path;
@@ -229,7 +318,8 @@ reinit:
}
calculate_vdisks(md_disks, md_nr_disks, total);
md_nr_vds = disks_to_vdisks(md_disks, md_nr_disks, md_vds);
- sys->enable_md = true;
+ if (!sys->enable_md)
+ sys->enable_md = true;
return total;
}
@@ -259,51 +349,6 @@ static char *get_object_path_nolock(uint64_t oid)
return md_disks[vd->idx].path;
}
-/* If cleanup is true, temporary objects will be removed */
-static int for_each_object_in_path(char *path,
- int (*func)(uint64_t, char *, void *),
- bool cleanup, void *arg)
-{
- DIR *dir;
- struct dirent *d;
- uint64_t oid;
- int ret = SD_RES_SUCCESS;
- char p[PATH_MAX];
-
- dir = opendir(path);
- if (!dir) {
- sd_eprintf("failed to open %s, %m", path);
- return SD_RES_EIO;
- }
-
- while ((d = readdir(dir))) {
- if (!strncmp(d->d_name, ".", 1))
- continue;
-
- oid = strtoull(d->d_name, NULL, 16);
- if (oid == 0 || oid == ULLONG_MAX)
- continue;
-
- /* don't call callback against temporary objects */
- if (strlen(d->d_name) == 20 &&
- strcmp(d->d_name + 16, ".tmp") == 0) {
- if (cleanup) {
- snprintf(p, PATH_MAX, "%s/%016"PRIx64".tmp",
- path, oid);
- sd_dprintf("remove tmp object %s", p);
- unlink(p);
- }
- continue;
- }
-
- ret = func(oid, path, arg);
- if (ret != SD_RES_SUCCESS)
- break;
- }
- closedir(dir);
- return ret;
-}
-
int for_each_object_in_wd(int (*func)(uint64_t oid, char *path, void *arg),
bool cleanup, void *arg)
{
@@ -345,17 +390,6 @@ struct md_work {
char path[PATH_MAX];
};
-static int path_to_disk_idx(char *path)
-{
- int i;
-
- for (i = 0; i < md_nr_disks; i++)
- if (strcmp(md_disks[i].path, path) == 0)
- return i;
-
- return -1;
-}
-
static inline void kick_recover(void)
{
struct vnode_info *vinfo = get_vnode_info();
@@ -364,15 +398,6 @@ static inline void kick_recover(void)
put_vnode_info(vinfo);
}
-static void unplug_disk(int idx)
-{
-
- remove_disk(idx);
- sys->disk_space = md_init_space();
- if (md_nr_disks > 0)
- kick_recover();
-}
-
static void md_do_recover(struct work *work)
{
struct md_work *mw = container_of(work, struct md_work, work);
@@ -383,7 +408,10 @@ static void md_do_recover(struct work *work)
if (idx < 0)
/* Just ignore the duplicate EIO of the same path */
goto out;
- unplug_disk(idx);
+ remove_disk(idx);
+ sys->disk_space = md_init_space();
+ if (md_nr_disks > 0)
+ kick_recover();
out:
pthread_rwlock_unlock(&md_lock);
free(mw);
@@ -500,3 +528,80 @@ int md_get_stale_path(uint64_t oid, uint32_t epoch, char *path)
return SD_RES_NO_OBJ;
}
+
+uint32_t md_get_info(struct sd_md_info *info)
+{
+ uint32_t ret = sizeof(*info);
+ int i;
+
+ memset(info, 0, ret);
+ pthread_rwlock_rdlock(&md_lock);
+ for (i = 0; i < md_nr_disks; i++) {
+ info->disk[i].idx = i;
+ pstrcpy(info->disk[i].path, PATH_MAX, md_disks[i].path);
+ info->disk[i].size = get_path_size(info->disk[i].path,
+ &info->disk[i].used);
+ if (!info->disk[i].size) {
+ ret = 0;
+ break;
+ }
+ }
+ info->nr = md_nr_disks;
+ pthread_rwlock_unlock(&md_lock);
+ return ret;
+}
+
+static inline void md_del_disk(char *path)
+{
+ int idx = path_to_disk_idx(path);
+
+ if (idx < 0) {
+ sd_eprintf("invalid path %s", path);
+ return;
+ }
+ remove_disk(idx);
+}
+
+static int do_plug_unplug(char *disks, bool plug)
+{
+ char *path;
+ int old_nr, ret = SD_RES_UNKNOWN;
+
+ pthread_rwlock_wrlock(&md_lock);
+ old_nr = md_nr_disks;
+ path = strtok(disks, ",");
+ do {
+ if (plug)
+ md_add_disk(path);
+ else
+ md_del_disk(path);
+ } while ((path = strtok(NULL, ",")));
+
+ /* If no disks change, bail out */
+ if (old_nr == md_nr_disks)
+ goto out;
+
+ sys->disk_space = md_init_space();
+ /*
+ * We have to kick recover aggressively because there is possibility
+ * that nr of disks are removed during md_init_space() happens to equal
+ * nr of disks we added.
+ */
+ if (md_nr_disks > 0)
+ kick_recover();
+
+ ret = SD_RES_SUCCESS;
+out:
+ pthread_rwlock_unlock(&md_lock);
+ return ret;
+}
+
+int md_plug_disks(char *disks)
+{
+ return do_plug_unplug(disks, true);
+}
+
+int md_unplug_disks(char *disks)
+{
+ return do_plug_unplug(disks, false);
+}
diff --git a/sheep/ops.c b/sheep/ops.c
index 8cba70d..3839437 100644
--- a/sheep/ops.c
+++ b/sheep/ops.c
@@ -667,6 +667,33 @@ static int local_set_cache_size(const struct sd_req *req, struct sd_rsp *rsp,
return SD_RES_SUCCESS;
}
+static int local_md_info(struct request *request)
+{
+ struct sd_rsp *rsp = &request->rp;
+ struct sd_req *req = &request->rq;
+
+ assert(req->data_length == sizeof(struct sd_md_info));
+ rsp->data_length = md_get_info((struct sd_md_info *)request->data);
+
+ return rsp->data_length ? SD_RES_SUCCESS : SD_RES_UNKNOWN;
+}
+
+static int local_md_plug(const struct sd_req *req, struct sd_rsp *rsp,
+ void *data)
+{
+ char *disks = (char *)data;
+
+ return md_plug_disks(disks);
+}
+
+static int local_md_unplug(const struct sd_req *req, struct sd_rsp *rsp,
+ void *data)
+{
+ char *disks = (char *)data;
+
+ return md_unplug_disks(disks);
+}
+
static int cluster_restore(const struct sd_req *req, struct sd_rsp *rsp,
void *data)
{
@@ -1110,6 +1137,24 @@ static struct sd_op_template sd_ops[] = {
.process_main = local_set_cache_size,
},
+ [SD_OP_MD_INFO] = {
+ .name = "MD_INFO",
+ .type = SD_OP_TYPE_LOCAL,
+ .process_work = local_md_info,
+ },
+
+ [SD_OP_MD_PLUG] = {
+ .name = "MD_PLUG_DISKS",
+ .type = SD_OP_TYPE_LOCAL,
+ .process_main = local_md_plug,
+ },
+
+ [SD_OP_MD_UNPLUG] = {
+ .name = "MD_UNPLUG_DISKS",
+ .type = SD_OP_TYPE_LOCAL,
+ .process_main = local_md_unplug,
+ },
+
/* gateway I/O operations */
[SD_OP_CREATE_AND_WRITE_OBJ] = {
.name = "CREATE_AND_WRITE_OBJ",
diff --git a/sheep/sheep_priv.h b/sheep/sheep_priv.h
index 652fd3a..098a7bb 100644
--- a/sheep/sheep_priv.h
+++ b/sheep/sheep_priv.h
@@ -417,11 +417,14 @@ int journal_file_init(const char *path, size_t size, bool skip);
int journal_file_write(uint64_t oid, const char *buf, size_t size, off_t, bool);
/* md.c */
-int md_init_disk(char *path);
+void md_add_disk(char *path);
uint64_t md_init_space(void);
char *get_object_path(uint64_t oid);
int md_handle_eio(char *);
bool md_exist(uint64_t oid);
int md_get_stale_path(uint64_t oid, uint32_t epoch, char *path);
+uint32_t md_get_info(struct sd_md_info *info);
+int md_plug_disks(char *disks);
+int md_unplug_disks(char *disks);
#endif
diff --git a/sheep/store.c b/sheep/store.c
index 58303fa..cbf24dc 100644
--- a/sheep/store.c
+++ b/sheep/store.c
@@ -269,8 +269,7 @@ static int init_obj_path(const char *base_path, char *argp)
/* Eat up the first component */
strtok(argp, ",");
while ((p = strtok(NULL, ",")))
- if (md_init_disk(p) < 0)
- return -1;
+ md_add_disk(p);
return init_path(obj_path, NULL);
}
--
1.7.9.5
* [Qemu-devel] [PATCH v2 5/7] collie: add new commands to manipulate multi-disks
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
Three commands are added:
$ collie node md info # show information about md of {this node, all nodes}
$ collie node md plug path1,{path2,...} # plug disk(s) into node
$ collie node md unplug path1,{path2,...} # unplug disk(s) from node
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
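Note: the nested subcommand lookup that node_md() performs can be sketched as
follows. The demo_* names, the needs_arg field and the return codes are
assumptions for illustration and are not collie's real structures. In this
diff the "unplug" entry carries flags 0 while "plug" carries
SUBCMD_FLAG_NEED_THIRD_ARG; whether the argument is validated elsewhere is not
visible here, so the sketch checks for a missing path list explicitly before
calling a handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_USAGE (-1)	/* hypothetical "print usage and fail" code */

struct demo_subcmd {
	const char *name;
	int needs_arg;			/* nonzero if a path list is required */
	int (*fn)(const char *arg);
};

/* Stand-in handlers: each returns a distinct value so dispatch is visible. */
int demo_md_info(const char *arg) { (void)arg; return 1; }
int demo_md_plug(const char *arg) { assert(arg != NULL); return 2; }
int demo_md_unplug(const char *arg) { assert(arg != NULL); return 3; }

static const struct demo_subcmd demo_md_cmd[] = {
	{"info", 0, demo_md_info},
	{"plug", 1, demo_md_plug},
	{"unplug", 1, demo_md_unplug},
	{NULL, 0, NULL},		/* table terminator */
};

/* Find the handler by name and refuse to call it without a required arg. */
int demo_dispatch(const char *name, const char *arg)
{
	int i;

	if (!name)
		return DEMO_USAGE;
	for (i = 0; demo_md_cmd[i].name; i++) {
		if (strcmp(demo_md_cmd[i].name, name) != 0)
			continue;
		if (demo_md_cmd[i].needs_arg && !arg)
			return DEMO_USAGE;
		return demo_md_cmd[i].fn(arg);
	}
	return DEMO_USAGE;
}
```

Checking the argument in the dispatcher keeps each handler free to assume a
non-NULL path list.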
collie/node.c | 161 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 161 insertions(+)
diff --git a/collie/node.c b/collie/node.c
index a87d095..edb614d 100644
--- a/collie/node.c
+++ b/collie/node.c
@@ -11,6 +11,10 @@
#include "collie.h"
+static struct node_cmd_data {
+ bool all_nodes;
+} node_cmd_data;
+
static void cal_total_vdi_size(uint32_t vid, const char *name, const char *tag,
uint32_t snapid, uint32_t flags,
const struct sheepdog_inode *i, void *data)
@@ -213,6 +217,159 @@ static int node_kill(int argc, char **argv)
return EXIT_SUCCESS;
}
+static int node_md_info(struct node_id *nid)
+{
+ struct sd_md_info info = {};
+ char size_str[UINT64_DECIMAL_SIZE], used_str[UINT64_DECIMAL_SIZE];
+ struct sd_req hdr;
+ struct sd_rsp *rsp = (struct sd_rsp *)&hdr;
+ int fd, ret, i;
+
+ fd = connect_to_addr(nid->addr, nid->port);
+ if (fd < 0) {
+ fprintf(stderr, "Failed to connect %d\n", fd);
+ return EXIT_FAILURE;
+ }
+
+ sd_init_req(&hdr, SD_OP_MD_INFO);
+ hdr.data_length = sizeof(info);
+
+ ret = collie_exec_req(fd, &hdr, &info);
+ close(fd);
+ if (ret) {
+ fprintf(stderr, "Failed to connect\n");
+ return EXIT_FAILURE;
+ }
+
+ if (rsp->result != SD_RES_SUCCESS) {
+ fprintf(stderr, "failed to get multi-disk information: %s\n",
+ sd_strerror(rsp->result));
+ return EXIT_FAILURE;
+ }
+
+ for (i = 0; i < info.nr; i++) {
+ size_to_str(info.disk[i].size, size_str, sizeof(size_str));
+ size_to_str(info.disk[i].used, used_str, sizeof(used_str));
+ fprintf(stdout, "%2d\t%s\t%s\t%s\n", info.disk[i].idx, size_str,
+ used_str, info.disk[i].path);
+ }
+ return EXIT_SUCCESS;
+}
+
+static int md_info(int argc, char **argv)
+{
+ int i, ret;
+
+ fprintf(stdout, "Id\tSize\tUse\tPath\n");
+
+ if (!node_cmd_data.all_nodes) {
+ struct node_id nid = {.port = sdport};
+
+ if (!str_to_addr(sdhost, nid.addr)) {
+ fprintf(stderr, "Invalid address %s\n", sdhost);
+ return EXIT_FAILURE;
+ }
+
+ return node_md_info(&nid);
+ }
+
+ for (i = 0; i < sd_nodes_nr; i++) {
+ fprintf(stdout, "Node %d:\n", i);
+ ret = node_md_info(&sd_nodes[i].nid);
+ if (ret != EXIT_SUCCESS)
+ return EXIT_FAILURE;
+ }
+ return EXIT_SUCCESS;
+}
+
+static int do_plug_unplug(char *disks, bool plug)
+{
+ struct sd_req hdr;
+ struct sd_rsp *rsp = (struct sd_rsp *)&hdr;
+ int fd, ret;
+
+ fd = connect_to(sdhost, sdport);
+ if (fd < 0) {
+ fprintf(stderr, "Failed to connect %s:%d\n", sdhost, sdport);
+ return EXIT_FAILURE;
+ }
+
+ if (plug)
+ sd_init_req(&hdr, SD_OP_MD_PLUG);
+ else
+ sd_init_req(&hdr, SD_OP_MD_UNPLUG);
+ hdr.flags = SD_FLAG_CMD_WRITE;
+ hdr.data_length = strlen(disks);
+
+ ret = collie_exec_req(fd, &hdr, disks);
+ close(fd);
+ if (ret) {
+ fprintf(stderr, "Failed to connect\n");
+ return EXIT_FAILURE;
+ }
+
+ if (rsp->result != SD_RES_SUCCESS) {
+ fprintf(stderr, "Failed to execute request, look for sheep.log"
+ " for more information\n");
+ return EXIT_FAILURE;
+ }
+
+ return EXIT_SUCCESS;
+}
+
+static int md_plug(int argc, char **argv)
+{
+ return do_plug_unplug(argv[optind], true);
+}
+
+static int md_unplug(int argc, char **argv)
+{
+ return do_plug_unplug(argv[optind], false);
+}
+
+static struct subcommand node_md_cmd[] = {
+ {"info", NULL, NULL, "show multi-disk information",
+ NULL, 0, md_info},
+ {"plug", NULL, NULL, "plug more disk(s) into node",
+ NULL, SUBCMD_FLAG_NEED_THIRD_ARG, md_plug},
+ {"unplug", NULL, NULL, "unplug disk(s) from node",
+ NULL, 0, md_unplug},
+ {NULL},
+};
+
+static int node_md(int argc, char **argv)
+{
+ int i;
+
+ for (i = 0; node_md_cmd[i].name; i++) {
+ if (!strcmp(node_md_cmd[i].name, argv[optind])) {
+ optind++;
+ return node_md_cmd[i].fn(argc, argv);
+ }
+ }
+
+ subcommand_usage(argv[1], argv[2], EXIT_FAILURE);
+ return EXIT_FAILURE;
+}
+
+
+static int node_parser(int ch, char *opt)
+{
+ switch (ch) {
+ case 'A':
+ node_cmd_data.all_nodes = true;
+ break;
+ }
+
+ return 0;
+}
+
+static struct sd_option node_options[] = {
+ {'A', "all", false, "show md information of all the nodes"},
+
+ { 0, NULL, false, NULL },
+};
+
static struct subcommand node_cmd[] = {
{"kill", "<node id>", "aprh", "kill node", NULL,
SUBCMD_FLAG_NEED_THIRD_ARG | SUBCMD_FLAG_NEED_NODELIST, node_kill},
@@ -224,10 +381,14 @@ static struct subcommand node_cmd[] = {
SUBCMD_FLAG_NEED_NODELIST, node_recovery},
{"cache", "<cache size>", "aprh", "specify max cache size", NULL,
SUBCMD_FLAG_NEED_THIRD_ARG, node_cache},
+ {"md", NULL, "apAh", "See 'collie node md' for more information",
+ node_md_cmd, SUBCMD_FLAG_NEED_NODELIST|SUBCMD_FLAG_NEED_THIRD_ARG,
+ node_md, node_options},
{NULL,},
};
struct command node_command = {
"node",
node_cmd,
+ node_parser
};
--
1.7.9.5
* [Qemu-devel] [PATCH v2 6/7] sheep: remove duplicate recovery complete notification
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
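Note: the duplicate check added here (scan the list of already-recovered
nodes, ignore a repeat, otherwise append and re-sort) can be sketched as
below. The demo_* types and the capacity argument are assumptions for
illustration; the real code works on struct sd_node and node_id_cmp().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_nid {
	unsigned char addr[16];
	unsigned short port;
};

/* Order by address bytes first, then by port; usable with qsort(). */
int demo_nid_cmp(const void *a, const void *b)
{
	const struct demo_nid *x = a, *y = b;
	int c = memcmp(x->addr, y->addr, sizeof(x->addr));

	if (c)
		return c;
	return (x->port > y->port) - (x->port < y->port);
}

/* Record n unless it is already listed. Returns 1 if added, 0 otherwise. */
int demo_record(struct demo_nid *list, int *nr, int cap,
		const struct demo_nid *n)
{
	int i;

	assert(*nr >= 0 && *nr <= cap);
	for (i = 0; i < *nr; i++)
		if (!demo_nid_cmp(n, &list[i]))
			return 0;	/* duplicate notification, ignore it */
	if (*nr == cap)
		return 0;		/* assumption: silently drop when full */
	list[(*nr)++] = *n;
	qsort(list, (size_t)*nr, sizeof(*list), demo_nid_cmp);
	return 1;
}
```

Keeping the list sorted after each insert matches the qsort() call that
follows the append in cluster_recovery_completion().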
sheep/ops.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/sheep/ops.c b/sheep/ops.c
index 3839437..204befd 100644
--- a/sheep/ops.c
+++ b/sheep/ops.c
@@ -624,7 +624,19 @@ static int cluster_recovery_completion(const struct sd_req *req,
nr_recovereds = 0;
}
- recovereds[nr_recovereds++] = *(struct sd_node *)node;
+ /*
+ * Disk failure might send a duplicate notification; ignore it.
+ *
+ * We can't simply stop disk recovery from sending the notification because
+ * disk recovery might supersede node recovery, which does need
+ * to send the notification.
+ */
+ for (i = 0; i < nr_recovereds; i++)
+ if (!node_id_cmp(&node->nid, &recovereds[i].nid)) {
+ sd_dprintf("duplicate %s", node_to_str(node));
+ return SD_RES_SUCCESS;
+ }
+ recovereds[nr_recovereds++] = *node;
qsort(recovereds, nr_recovereds, sizeof(*recovereds), node_id_cmp);
sd_dprintf("%s is recovered at epoch %d", node_to_str(node), epoch);
--
1.7.9.5
* [Qemu-devel] [PATCH v2 7/7] tests: add 057 to test md hot-plug and hot-unplug
From: Liu Yuan @ 2013-03-29 5:42 UTC
To: qemu-devel
From: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
tests/057 | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tests/057.out | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
tests/group | 1 +
3 files changed, 113 insertions(+)
create mode 100755 tests/057
create mode 100644 tests/057.out
diff --git a/tests/057 b/tests/057
new file mode 100755
index 0000000..94d02af
--- /dev/null
+++ b/tests/057
@@ -0,0 +1,57 @@
+#!/bin/bash
+
+# Test md hot-plug and hot-unplug
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+md=true
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_cleanup
+
+for i in 0 1 2; do
+ _start_sheep $i
+done
+_wait_for_sheep 3
+$COLLIE cluster format
+sleep 1
+$COLLIE vdi create test 100M -P
+
+$COLLIE node md info | awk '{$2="MASKED";print $0}'
+
+# plug during node event
+_start_sheep 3
+_wait_for_sheep 4
+$COLLIE node md plug $STORE/0/d3,$STORE/0/d4
+_wait_for_sheep_recovery 0
+$COLLIE node md info | awk '{$2="MASKED";print $0}'
+$COLLIE vdi check test
+$COLLIE cluster info | _filter_cluster_info
+
+# plug duplicate path
+$COLLIE node md plug $STORE/0/d3
+$COLLIE node recovery
+$COLLIE node md info | awk '{$2="MASKED";print $0}'
+
+# unplug
+$COLLIE node md unplug $STORE/0/d0,$STORE/0/d1
+_wait_for_sheep_recovery 0
+$COLLIE node md info | awk '{$2="MASKED";print $0}'
+$COLLIE vdi check test
+$COLLIE cluster info | _filter_cluster_info
+
+
+# unplug invalid path
+$COLLIE node md unplug $STORE/0/d0
+$COLLIE node recovery
+$COLLIE node md info | awk '{$2="MASKED";print $0}'
+$COLLIE cluster info | _filter_cluster_info
+
+# check stale object purging
+find $STORE/*/d*/.stale/ -type f
diff --git a/tests/057.out b/tests/057.out
new file mode 100644
index 0000000..ec3e7c1
--- /dev/null
+++ b/tests/057.out
@@ -0,0 +1,55 @@
+QA output created by 057
+using backend farm store
+Id MASKED Use Path
+0 MASKED GB 48 MB /tmp/sheepdog/0/d0
+1 MASKED GB 24 MB /tmp/sheepdog/0/d1
+2 MASKED GB 32 MB /tmp/sheepdog/0/d2
+Id MASKED Use Path
+0 MASKED GB 16 MB /tmp/sheepdog/0/d0
+1 MASKED GB 12 MB /tmp/sheepdog/0/d1
+2 MASKED GB 32 MB /tmp/sheepdog/0/d2
+3 MASKED GB 8.0 MB /tmp/sheepdog/0/d3
+4 MASKED GB 28 MB /tmp/sheepdog/0/d4
+finish check&repair test
+Cluster status: running
+
+Cluster created at DATE
+
+Epoch Time Version
+DATE 2 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002, 127.0.0.1:7003]
+DATE 1 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002]
+Failed to execute request, look for sheep.log for more information
+Nodes In Recovery:
+ Id Host:Port V-Nodes Zone
+Id MASKED Use Path
+0 MASKED GB 16 MB /tmp/sheepdog/0/d0
+1 MASKED GB 12 MB /tmp/sheepdog/0/d1
+2 MASKED GB 32 MB /tmp/sheepdog/0/d2
+3 MASKED GB 8.0 MB /tmp/sheepdog/0/d3
+4 MASKED GB 28 MB /tmp/sheepdog/0/d4
+Id MASKED Use Path
+0 MASKED GB 32 MB /tmp/sheepdog/0/d2
+1 MASKED GB 24 MB /tmp/sheepdog/0/d3
+2 MASKED GB 40 MB /tmp/sheepdog/0/d4
+finish check&repair test
+Cluster status: running
+
+Cluster created at DATE
+
+Epoch Time Version
+DATE 2 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002, 127.0.0.1:7003]
+DATE 1 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002]
+Failed to execute request, look for sheep.log for more information
+Nodes In Recovery:
+ Id Host:Port V-Nodes Zone
+Id MASKED Use Path
+0 MASKED GB 32 MB /tmp/sheepdog/0/d2
+1 MASKED GB 24 MB /tmp/sheepdog/0/d3
+2 MASKED GB 40 MB /tmp/sheepdog/0/d4
+Cluster status: running
+
+Cluster created at DATE
+
+Epoch Time Version
+DATE 2 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002, 127.0.0.1:7003]
+DATE 1 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002]
diff --git a/tests/group b/tests/group
index 66d970d..ed786cc 100644
--- a/tests/group
+++ b/tests/group
@@ -70,3 +70,4 @@
054 auto quick cluster md
055 auto cluster md
056 auto quick cluster md
+057 auto quick cluster md
--
1.7.9.5
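
A note on the test above: every `$COLLIE node md info` call is piped through awk '{$2="MASKED";print $0}', presumably so that the host-dependent disk size does not end up in 057.out. Below is a minimal sketch of what that filter does. The sample input is invented: the column layout is only inferred from 057.out, and the sizes and the helper name mask_size are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample of `collie node md info` output.
# The "Size" header and the size values are assumptions for illustration.
sample='Id Size Use Path
0 2.1 GB 48 MB /tmp/sheepdog/0/d0
1 3.5 GB 24 MB /tmp/sheepdog/0/d1'

# Same awk program as in tests/057: overwrite the second
# whitespace-separated field, then print the rebuilt record.
mask_size() {
    awk '{$2="MASKED"; print $0}'
}

masked=$(printf '%s\n' "$sample" | mask_size)
printf '%s\n' "$masked"
```

Assigning to a field makes awk rebuild the record with single spaces between fields, so any column alignment in the original output is lost. Only the numeric part of the size is replaced; the unit stays, which matches the "MASKED GB" seen in 057.out.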
* Re: [Qemu-devel] [PATCH v2 0/7] md: add hot-plug and hot-unplug support
2013-03-29 5:42 [Qemu-devel] [PATCH v2 0/7] md: add hot-plug and hot-unplug support Liu Yuan
` (6 preceding siblings ...)
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 7/7] tsets: add 057 to test md hot-plug and hot-unplug Liu Yuan
@ 2013-03-29 5:46 ` Liu Yuan
7 siblings, 0 replies; 9+ messages in thread
From: Liu Yuan @ 2013-03-29 5:46 UTC (permalink / raw)
To: qemu-devel
On 03/29/2013 01:42 PM, Liu Yuan wrote:
> From: Liu Yuan <tailai.ly@taobao.com>
>
> v2:
> - fix object stale purging for md
>
> This is the final part for MD. With this patch set, We even allow group plug,
> group unplug and disks/node failure during (un)plugging.
>
> Also add disk information function for collie.
>
> Liu Yuan (7):
> md: add support for simultaneous disk failure on the same node
> tests/055: test simultaneous multiple disk failures on the same node
> net: enlarge connect_to_addr() scope
> md: add hot-plug and hot-unplug support
> collie: add new commands to manipulate multi-disks
> sheep: remove duplicate recovery complete notification
> tsets: add 057 to test md hot-plug and hot-unplug
>
> collie/collie.c | 2 +-
> collie/node.c | 161 +++++++++++++++++++++++
> include/internal_proto.h | 16 +++
> include/net.h | 8 ++
> include/sheepdog_proto.h | 2 +
> sheep/md.c | 321 ++++++++++++++++++++++++++++++++--------------
> sheep/ops.c | 59 ++++++++-
> sheep/sheep_priv.h | 5 +-
> sheep/sockfd_cache.c | 8 --
> sheep/store.c | 3 +-
> tests/055 | 7 +
> tests/055.out | 8 ++
> tests/057 | 57 ++++++++
> tests/057.out | 55 ++++++++
> tests/group | 1 +
> 15 files changed, 607 insertions(+), 106 deletions(-)
> create mode 100755 tests/057
> create mode 100644 tests/057.out
>
Sorry for the noise. I mistakenly sent this to the wrong list.
Thanks,
Yuan
Thread overview: 9+ messages
2013-03-29 5:42 [Qemu-devel] [PATCH v2 0/7] md: add hot-plug and hot-unplug support Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 1/7] md: add support for simultaneous disk failure on the same node Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 2/7] tests/055: test simultaneous multiple disk failures " Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 3/7] net: enlarge connect_to_addr() scope Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 4/7] md: add hot-plug and hot-unplug support Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 5/7] collie: add new commands to manipulate multi-disks Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 6/7] sheep: remove duplicate recovery complete notification Liu Yuan
2013-03-29 5:42 ` [Qemu-devel] [PATCH v2 7/7] tsets: add 057 to test md hot-plug and hot-unplug Liu Yuan
2013-03-29 5:46 ` [Qemu-devel] [PATCH v2 0/7] md: add hot-plug and hot-unplug support Liu Yuan