dm-devel.redhat.com archive mirror
* [PATCH 0/4] multipath-tools: Ceph rbd support
@ 2016-07-05  8:12 Mike Christie
  2016-07-05  8:12 ` [PATCH 1/4] multipath-tools: add rbd discovery detection Mike Christie
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Mike Christie @ 2016-07-05  8:12 UTC (permalink / raw)
  To: dm-devel, christophe.varoqui

The following patches add Ceph rbd support for handling
blacklisted devices. This does not support features like
multibus.

My specific use case is exporting rbd images through multiple
LIO instances. In this case, one rbd instance holds the
exclusive lock and sends WRITE requests. If that host becomes
unreachable, another host grabs the lock and blacklists the
original host to prevent it from sending stale IO (once a host
is blacklisted, its IO is failed by the OSD).

To recover from this, this patchset adds a repair() callout
to the checker. If the path is in the PATH_DOWN state, this
callout can be used to fix it up. In my case, I remap the
device to flush stale IO and clean up the old lock, and then
unblacklist myself.
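
The recovery order described above (remap, drop the stale mapping, remove
the blacklist entry) can be sketched as a small resumable routine. This is
only an illustration under stated assumptions: the `repair_ops` callbacks,
the `SK_*` states and the mock functions are hypothetical stand-ins, not
multipath-tools or librados APIs. The point it shows is that a failed attempt
can be retried later without remapping the image a second time.

```c
#include <assert.h>

/* Hypothetical stand-ins for multipath's PATH_DOWN / PATH_UP states. */
enum sketch_state { SK_PATH_DOWN, SK_PATH_UP };

struct repair_state {
	int blacklisted;	/* set once the checker found us on the blacklist */
	int remapped;		/* a new mapping was created by an earlier attempt */
};

/* Each step returns 0 on success. Real code would use sysfs and the monitors. */
struct repair_ops {
	int (*remap)(void);
	int (*remove_stale)(void);
	int (*unblacklist)(void);
};

static enum sketch_state repair(struct repair_state *st,
				const struct repair_ops *ops)
{
	if (!st->blacklisted)
		return SK_PATH_UP;

	/* Remap only once, even if a later step fails and we are retried. */
	if (!st->remapped) {
		if (ops->remap())
			return SK_PATH_DOWN;
		st->remapped = 1;
	}

	/* Dropping the old mapping fails any stale IO still queued on it. */
	if (ops->remove_stale())
		return SK_PATH_DOWN;

	if (ops->unblacklist())
		return SK_PATH_DOWN;

	st->remapped = 0;
	st->blacklisted = 0;
	return SK_PATH_UP;
}

/* Mock steps (assumptions for illustration only). */
static int remap_calls, fail_unblacklist;
static int mock_remap(void) { remap_calls++; return 0; }
static int mock_remove(void) { return 0; }
static int mock_unblacklist(void) { return fail_unblacklist ? -1 : 0; }
```

With the mocks above, a first call with `fail_unblacklist` set leaves the
state blacklisted but remapped, and a second call finishes the repair
without invoking the remap step again.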


* [PATCH 1/4] multipath-tools: add rbd discovery detection
  2016-07-05  8:12 [PATCH 0/4] multipath-tools: Ceph rbd support Mike Christie
@ 2016-07-05  8:12 ` Mike Christie
  2016-07-05  8:12 ` [PATCH 2/4] multipath-tools: add checker callout to repair path Mike Christie
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2016-07-05  8:12 UTC (permalink / raw)
  To: dm-devel, christophe.varoqui; +Cc: Mike Christie

rbd is a block device interface for Ceph. It does not support
any SCSI commands, so this patch adds bus detection and virtual
vendor/product pathinfo.

Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 libmultipath/checkers.h  |  1 +
 libmultipath/discovery.c | 22 ++++++++++++++++++++++
 libmultipath/structs.h   |  1 +
 3 files changed, 24 insertions(+)

diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index a935b3f..ea59c94 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -84,6 +84,7 @@ enum path_check_state {
 #define EMC_CLARIION "emc_clariion"
 #define READSECTOR0  "readsector0"
 #define CCISS_TUR    "cciss_tur"
+#define RBD          "rbd"
 
 #define DEFAULT_CHECKER DIRECTIO
 
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 126a54f..31bb02d 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1136,6 +1136,23 @@ scsi_sysfs_pathinfo (struct path * pp)
 }
 
 static int
+rbd_sysfs_pathinfo (struct path * pp)
+{
+	sprintf(pp->vendor_id, "Ceph");
+	sprintf(pp->product_id, "RBD");
+
+	condlog(3, "%s: vendor = %s product = %s", pp->dev, pp->vendor_id,
+		pp->product_id);
+	/*
+	 * set the hwe configlet pointer
+	 */
+	pp->hwe = find_hwe(conf->hwtable, pp->vendor_id, pp->product_id, NULL);
+
+	/* should we fake host / bus / target / lun so print looks nice */
+	return 0;
+}
+
+static int
 ccw_sysfs_pathinfo (struct path * pp)
 {
 	struct udev_device *parent;
@@ -1337,6 +1354,8 @@ sysfs_pathinfo(struct path * pp)
 		pp->bus = SYSFS_BUS_CCW;
 	if (!strncmp(pp->dev,"sd", 2))
 		pp->bus = SYSFS_BUS_SCSI;
+	if (!strncmp(pp->dev,"rbd", 3))
+		pp->bus = SYSFS_BUS_RBD;
 
 	if (pp->bus == SYSFS_BUS_UNDEF)
 		return 0;
@@ -1349,6 +1368,9 @@ sysfs_pathinfo(struct path * pp)
 	} else if (pp->bus == SYSFS_BUS_CCISS) {
 		if (cciss_sysfs_pathinfo(pp))
 			return 1;
+	} else if (pp->bus == SYSFS_BUS_RBD) {
+		if (rbd_sysfs_pathinfo(pp))
+			return 1;
 	}
 	return 0;
 }
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index ab7dc25..84e56dc 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -52,6 +52,7 @@ enum sysfs_buses {
 	SYSFS_BUS_IDE,
 	SYSFS_BUS_CCW,
 	SYSFS_BUS_CCISS,
+	SYSFS_BUS_RBD,
 };
 
 enum pathstates {
-- 
2.5.5
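
The name-prefix bus detection and the fixed vendor/product strings from the
patch above can be tried in isolation with a sketch like the following. The
`sketch_*` types and the buffer sizes are assumptions made for the example;
they are not the real `struct path` layout. The sketch uses bounded
`snprintf()` calls, which is a choice of the example rather than what the
patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of enum sysfs_buses. */
enum sketch_bus { BUS_UNDEF, BUS_SCSI, BUS_RBD };

/* Hypothetical, trimmed-down stand-in for struct path. */
struct sketch_path {
	char dev[16];
	char vendor_id[9];
	char product_id[17];
	enum sketch_bus bus;
};

/* Classify a device purely by its kernel name prefix. */
static enum sketch_bus bus_from_devname(const char *dev)
{
	if (!strncmp(dev, "sd", 2))
		return BUS_SCSI;
	if (!strncmp(dev, "rbd", 3))
		return BUS_RBD;
	return BUS_UNDEF;
}

/* rbd has no SCSI INQUIRY data, so vendor/product are fixed strings. */
static void fill_pathinfo(struct sketch_path *pp)
{
	pp->bus = bus_from_devname(pp->dev);
	if (pp->bus == BUS_RBD) {
		snprintf(pp->vendor_id, sizeof(pp->vendor_id), "Ceph");
		snprintf(pp->product_id, sizeof(pp->product_id), "RBD");
	}
}
```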


* [PATCH 2/4] multipath-tools: add checker callout to repair path
  2016-07-05  8:12 [PATCH 0/4] multipath-tools: Ceph rbd support Mike Christie
  2016-07-05  8:12 ` [PATCH 1/4] multipath-tools: add rbd discovery detection Mike Christie
@ 2016-07-05  8:12 ` Mike Christie
  2016-07-05  8:12 ` [PATCH 3/4] multipath-tools: Add rbd checker Mike Christie
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2016-07-05  8:12 UTC (permalink / raw)
  To: dm-devel, christophe.varoqui; +Cc: Mike Christie

This patch adds a callback which can be used to repair a path
if check() has determined it is in the PATH_DOWN state.

The next patch, which adds rbd checker support, will use this to
handle the case where an rbd device is blacklisted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 libmultipath/checkers.c              | 23 +++++++++++++++++++++++
 libmultipath/checkers.h              |  4 ++++
 libmultipath/checkers/cciss_tur.c    |  5 +++++
 libmultipath/checkers/directio.c     |  5 +++++
 libmultipath/checkers/emc_clariion.c |  5 +++++
 libmultipath/checkers/hp_sw.c        |  5 +++++
 libmultipath/checkers/rdac.c         |  5 +++++
 libmultipath/checkers/readsector0.c  |  5 +++++
 libmultipath/checkers/tur.c          |  5 +++++
 multipathd/main.c                    | 14 +++++++++++++-
 10 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
index ef1d099..de6a973 100644
--- a/libmultipath/checkers.c
+++ b/libmultipath/checkers.c
@@ -139,6 +139,14 @@ struct checker * add_checker (char * name)
 	if (!c->free)
 		goto out;
 
+	c->repair = (void (*)(struct checker *)) dlsym(c->handle,
+						       "libcheck_repair");
+	errstr = dlerror();
+	if (errstr != NULL)
+		condlog(0, "A dynamic linking error occurred: (%s)", errstr);
+	if (!c->repair)
+		goto out;
+
 	c->fd = 0;
 	c->sync = 1;
 	list_add(&c->node, &checkers);
@@ -204,6 +212,20 @@ void checker_put (struct checker * dst)
 	free_checker(src);
 }
 
+void checker_repair (struct checker * c)
+{
+	if (!c)
+		return;
+
+	c->message[0] = '\0';
+	if (c->disable) {
+		MSG(c, "checker disabled");
+		return;
+	}
+
+	c->repair(c);
+}
+
 int checker_check (struct checker * c)
 {
 	int r;
@@ -268,6 +290,7 @@ void checker_get (struct checker * dst, char * name)
 	dst->sync = src->sync;
 	strncpy(dst->name, src->name, CHECKER_NAME_LEN);
 	strncpy(dst->message, src->message, CHECKER_MSG_LEN);
+	dst->repair = src->repair;
 	dst->check = src->check;
 	dst->init = src->init;
 	dst->free = src->free;
diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index ea59c94..d665736 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -113,6 +113,9 @@ struct checker {
 						multipath-wide. Use MALLOC if
 						you want to stuff data in. */
 	int (*check)(struct checker *);
+	void (*repair)(struct checker *);     /* called if check returns
+					        PATH_DOWN to bring path into
+						usable state */
 	int (*init)(struct checker *);       /* to allocate the context */
 	void (*free)(struct checker *);      /* to free the context */
 };
@@ -132,6 +135,7 @@ void checker_set_async (struct checker *);
 void checker_set_fd (struct checker *, int);
 void checker_enable (struct checker *);
 void checker_disable (struct checker *);
+void checker_repair (struct checker *);
 int checker_check (struct checker *);
 int checker_selected (struct checker *);
 char * checker_name (struct checker *);
diff --git a/libmultipath/checkers/cciss_tur.c b/libmultipath/checkers/cciss_tur.c
index 4c26901..7e4eb81 100644
--- a/libmultipath/checkers/cciss_tur.c
+++ b/libmultipath/checkers/cciss_tur.c
@@ -63,6 +63,11 @@ void libcheck_free (struct checker * c)
 	return;
 }
 
+void libcheck_repair (struct checker * c)
+{
+	return;
+}
+
 extern int
 libcheck_check (struct checker * c)
 {
diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
index 94bf8f7..eec12d5 100644
--- a/libmultipath/checkers/directio.c
+++ b/libmultipath/checkers/directio.c
@@ -118,6 +118,11 @@ void libcheck_free (struct checker * c)
 	free(ct);
 }
 
+void libcheck_repair (struct checker * c)
+{
+	return;
+}
+
 static int
 check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
 {
diff --git a/libmultipath/checkers/emc_clariion.c b/libmultipath/checkers/emc_clariion.c
index a797734..53db066 100644
--- a/libmultipath/checkers/emc_clariion.c
+++ b/libmultipath/checkers/emc_clariion.c
@@ -91,6 +91,11 @@ void libcheck_free (struct checker * c)
 	free(c->context);
 }
 
+void libcheck_repair (struct checker * c)
+{
+	return;
+}
+
 int libcheck_check (struct checker * c)
 {
 	unsigned char sense_buffer[128] = { 0, };
diff --git a/libmultipath/checkers/hp_sw.c b/libmultipath/checkers/hp_sw.c
index fe5e0f9..0cc1111 100644
--- a/libmultipath/checkers/hp_sw.c
+++ b/libmultipath/checkers/hp_sw.c
@@ -44,6 +44,11 @@ void libcheck_free (struct checker * c)
 	return;
 }
 
+void libcheck_repair (struct checker * c)
+{
+	return;
+}
+
 static int
 do_inq(int sg_fd, int cmddt, int evpd, unsigned int pg_op,
        void *resp, int mx_resp_len, int noisy, unsigned int timeout)
diff --git a/libmultipath/checkers/rdac.c b/libmultipath/checkers/rdac.c
index 00e3c44..68682c8 100644
--- a/libmultipath/checkers/rdac.c
+++ b/libmultipath/checkers/rdac.c
@@ -139,6 +139,11 @@ void libcheck_free (struct checker * c)
 	return;
 }
 
+void libcheck_repair (struct checker * c)
+{
+	return;
+}
+
 static int
 do_inq(int sg_fd, unsigned int pg_op, void *resp, int mx_resp_len,
        unsigned int timeout)
diff --git a/libmultipath/checkers/readsector0.c b/libmultipath/checkers/readsector0.c
index 0550fb6..b3ed1f3 100644
--- a/libmultipath/checkers/readsector0.c
+++ b/libmultipath/checkers/readsector0.c
@@ -23,6 +23,11 @@ void libcheck_free (struct checker * c)
 	return;
 }
 
+void libcheck_repair (struct checker * c)
+{
+	return;
+}
+
 int libcheck_check (struct checker * c)
 {
 	unsigned char buf[4096];
diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 2edc8ad..338d4a3 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -97,6 +97,11 @@ void libcheck_free (struct checker * c)
 	return;
 }
 
+void libcheck_repair (struct checker * c)
+{
+	return;
+}
+
 #define TUR_MSG(msg, fmt, args...) snprintf(msg, CHECKER_MSG_LEN, fmt, ##args);
 
 int
diff --git a/multipathd/main.c b/multipathd/main.c
index c0ca571..14728d5 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1635,6 +1635,14 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	return 1;
 }
 
+void repair_path(struct vectors * vecs, struct path * pp)
+{
+	if (pp->state != PATH_DOWN)
+		return;
+
+	checker_repair(&pp->checker);
+}
+
 static void *
 checkerloop (void *ap)
 {
@@ -1665,6 +1673,7 @@ checkerloop (void *ap)
 	while (1) {
 		struct timeval diff_time, start_time, end_time;
 		int num_paths = 0, ticks = 0, signo, strict_timing, rc = 0;
+		int checked;
 		sigset_t mask;
 
 		if (gettimeofday(&start_time, NULL) != 0)
@@ -1695,7 +1704,10 @@ checkerloop (void *ap)
 			lock(vecs->lock);
 			pthread_testcancel();
 			vector_foreach_slot (vecs->pathvec, pp, i) {
-				num_paths += check_path(vecs, pp, ticks);
+				checked = check_path(vecs, pp, ticks);
+				if (checked)
+					repair_path(vecs, pp);
+				num_paths += checked;
 			}
 			lock_cleanup_pop(vecs->lock);
 		}
-- 
2.5.5
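
The control flow this patch adds (run check(), and call repair() only when
the path came back PATH_DOWN and the checker is not disabled) can be
sketched with plain function pointers. All `sk_*` and `mock_*` names are
hypothetical. One deliberate difference, flagged as an assumption: the sketch
treats a missing repair callback as "nothing to do", whereas the patch
requires every checker library to export libcheck_repair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the multipath path states. */
enum sk_state { SK_PATH_DOWN, SK_PATH_UP };

struct sk_checker {
	int disable;		/* mirrors the checker's disable flag */
	int repairs;		/* mock bookkeeping: how often repair ran */
	int (*check)(struct sk_checker *);
	void (*repair)(struct sk_checker *);
};

/* Guarded dispatch, modelled loosely on checker_repair(). */
static void sk_checker_repair(struct sk_checker *c)
{
	if (!c || c->disable || !c->repair)
		return;
	c->repair(c);
}

/* Check first; attempt a repair only for paths reported down. */
static int sk_check_and_repair(struct sk_checker *c)
{
	int state = c->check(c);

	if (state == SK_PATH_DOWN)
		sk_checker_repair(c);
	return state;
}

/* Mock callbacks (assumptions for illustration only). */
static int mock_check_down(struct sk_checker *c) { (void)c; return SK_PATH_DOWN; }
static int mock_check_up(struct sk_checker *c) { (void)c; return SK_PATH_UP; }
static void mock_repair(struct sk_checker *c) { c->repairs++; }
```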


* [PATCH 3/4] multipath-tools: Add rbd checker.
  2016-07-05  8:12 [PATCH 0/4] multipath-tools: Ceph rbd support Mike Christie
  2016-07-05  8:12 ` [PATCH 1/4] multipath-tools: add rbd discovery detection Mike Christie
  2016-07-05  8:12 ` [PATCH 2/4] multipath-tools: add checker callout to repair path Mike Christie
@ 2016-07-05  8:12 ` Mike Christie
  2016-07-05  8:12 ` [PATCH 4/4] multipath-tools: Add rbd to the hwtable Mike Christie
  2016-07-08  8:15 ` [PATCH 0/4] multipath-tools: Ceph rbd support Christophe Varoqui
  4 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2016-07-05  8:12 UTC (permalink / raw)
  To: dm-devel, christophe.varoqui; +Cc: Mike Christie

This checker currently only handles the case where a path is failed
due to it being blacklisted by the ceph cluster. The specific use
case for me is when LIO exports rbd images through multiple LIO
instances.

The problem it handles is when rbd instance1 has the exclusive lock,
but becomes unreachable. Another host in the cluster will take over
and blacklist instance1. This prevents it from sending stale IO
and corrupting data.

Later, when the host is reachable again, we will want to fail back to
it. To do this, the checker will detect that we were blacklisted, unmap
the old image, which makes sure old IO is failed, and then remap the
image and unblacklist the host. multipathd will then handle this like a
path being removed and re-added.

Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 libmultipath/checkers/Makefile |   6 +-
 libmultipath/checkers/rbd.c    | 612 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 617 insertions(+), 1 deletion(-)
 create mode 100644 libmultipath/checkers/rbd.c

diff --git a/libmultipath/checkers/Makefile b/libmultipath/checkers/Makefile
index 4b1a108..1538eb8 100644
--- a/libmultipath/checkers/Makefile
+++ b/libmultipath/checkers/Makefile
@@ -11,12 +11,16 @@ LIBS= \
 	libcheckdirectio.so \
 	libcheckemc_clariion.so \
 	libcheckhp_sw.so \
-	libcheckrdac.so
+	libcheckrdac.so \
+	libcheckrbd.so
 
 CFLAGS += -I..
 
 all: $(LIBS)
 
+libcheckrbd.so: rbd.o
+	$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -lrados -ludev
+
 libcheckdirectio.so: libsg.o directio.o
 	$(CC) $(LDFLAGS) $(SHARED_FLAGS) -o $@ $^ -laio
 
diff --git a/libmultipath/checkers/rbd.c b/libmultipath/checkers/rbd.c
new file mode 100644
index 0000000..071d3f3
--- /dev/null
+++ b/libmultipath/checkers/rbd.c
@@ -0,0 +1,612 @@
+/*
+ * Copyright (c) 2016 Red Hat
+ * Copyright (c) 2004 Christophe Varoqui
+ *
+ * Code based off of tur.c and ceph's krbd.cc
+ */
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <pthread.h>
+#include <libudev.h>
+#include <ifaddrs.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <arpa/inet.h>
+
+#include "rados/librados.h"
+
+#include "structs.h"
+#include "checkers.h"
+
+#include "../libmultipath/debug.h"
+#include "../libmultipath/uevent.h"
+
+struct rbd_checker_context;
+typedef int (thread_fn)(struct rbd_checker_context *ct, char *msg);
+
+#define RBD_MSG(msg, fmt, args...) snprintf(msg, CHECKER_MSG_LEN, fmt, ##args);
+
+struct rbd_checker_context {
+	int rbd_bus_id;
+	unsigned int addr_nonce;
+	char *blk_lst_addr;
+	char *config_info;
+	int remapped;
+	int blacklisted;
+
+	rados_t cluster;
+
+	int state;
+	int running;
+	time_t time;
+	thread_fn *fn;
+	pthread_t thread;
+	pthread_mutex_t lock;
+	pthread_cond_t active;
+	pthread_spinlock_t hldr_lock;
+	int holders;
+	char message[CHECKER_MSG_LEN];
+};
+
+int libcheck_init(struct checker * c)
+{
+	struct rbd_checker_context *ct;
+	struct udev_device *block_dev;
+	struct udev_device *bus_dev;
+	struct udev *udev;
+	struct stat sb;
+	const char *block_name, *nonce, *config;
+	char sysfs_path[PATH_SIZE];
+	int ret;
+
+	ct = malloc(sizeof(struct rbd_checker_context));
+	if (!ct)
+		return 1;
+	memset(ct, 0, sizeof(struct rbd_checker_context));
+	ct->holders = 1;
+	pthread_cond_init(&ct->active, NULL);
+	pthread_mutex_init(&ct->lock, NULL);
+	pthread_spin_init(&ct->hldr_lock, PTHREAD_PROCESS_PRIVATE);
+	c->context = ct;
+
+	/*
+	 * The rbd block layer sysfs device is not linked to the rbd bus
+	 * device that we interact with, so figure that out now.
+	 */
+	if (fstat(c->fd, &sb) != 0)
+		goto free_ct;
+
+	udev = udev_new();
+	if (!udev)
+		goto free_ct;
+
+	block_dev = udev_device_new_from_devnum(udev, 'b', sb.st_rdev);
+	if (!block_dev)
+		goto free_udev;
+
+	block_name  = udev_device_get_sysname(block_dev);
+	ret = sscanf(block_name, "rbd%d", &ct->rbd_bus_id);
+
+	udev_device_unref(block_dev);
+	if (ret != 1)
+		goto free_udev;
+
+	snprintf(sysfs_path, sizeof(sysfs_path), "/sys/bus/rbd/devices/%d",
+		 ct->rbd_bus_id);
+	bus_dev = udev_device_new_from_syspath(udev, sysfs_path);
+	if (!bus_dev)
+		goto free_udev;
+
+	nonce = udev_device_get_sysattr_value(bus_dev, "client_addr_nonce");
+	if (!nonce)
+		goto free_dev;
+
+	ret = sscanf(nonce, "%u\n", &ct->addr_nonce);
+	if (ret != 1)
+		goto free_dev;
+
+	config = udev_device_get_sysattr_value(bus_dev, "config_info");
+	if (!config)
+		goto free_dev;
+
+	ct->config_info = strdup(config);
+	if (!ct->config_info)
+		goto free_dev;
+
+	if (rados_create(&ct->cluster, NULL) < 0)
+		goto free_config;
+
+	if (rados_conf_read_file(ct->cluster, NULL) < 0)
+		goto shutdown_rados;
+
+	ret = rados_connect(ct->cluster);
+	if (ret < 0)
+		goto shutdown_rados;
+
+	udev_device_unref(bus_dev);
+	udev_unref(udev);
+
+	return 0;
+
+shutdown_rados:
+	rados_shutdown(ct->cluster);
+free_config:
+	free(ct->config_info);
+free_dev:
+	udev_device_unref(bus_dev);
+free_udev:
+	udev_unref(udev);
+free_ct:
+	free(ct);
+	return 1;
+}
+
+void cleanup_context(struct rbd_checker_context *ct)
+{
+	pthread_mutex_destroy(&ct->lock);
+	pthread_cond_destroy(&ct->active);
+	pthread_spin_destroy(&ct->hldr_lock);
+
+	rados_shutdown(ct->cluster);
+
+	if (ct->blk_lst_addr)
+		free(ct->blk_lst_addr);
+	free(ct->config_info);
+	free(ct);
+}
+
+void libcheck_free(struct checker * c)
+{
+	if (c->context) {
+		struct rbd_checker_context *ct = c->context;
+		int holders;
+		pthread_t thread;
+
+		pthread_spin_lock(&ct->hldr_lock);
+		ct->holders--;
+		holders = ct->holders;
+		thread = ct->thread;
+		pthread_spin_unlock(&ct->hldr_lock);
+		if (holders)
+			pthread_cancel(thread);
+		else
+			cleanup_context(ct);
+		c->context = NULL;
+	}
+}
+
+static int rbd_match_addr(struct in6_addr *inaddr)
+{
+	struct ifaddrs *ifap, *ifa;
+	int ret = 1;
+
+	if (getifaddrs(&ifap))
+		return -EAGAIN;
+
+	for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
+		struct sockaddr_in *s4;
+	        struct sockaddr_in6 *s6;
+
+		if (!ifa->ifa_addr)
+			continue;
+
+		switch (ifa->ifa_addr->sa_family) {
+		case AF_INET:
+			s4 = (struct sockaddr_in *)(ifa->ifa_addr);
+			if (!memcmp(&s4->sin_addr, (struct in_addr *) inaddr,
+				    sizeof(struct in_addr)))
+				goto free_ifap;
+			break;
+		case AF_INET6:
+			s6 = (struct sockaddr_in6 *)(ifa->ifa_addr);
+			if (!memcmp(&s6->sin6_addr, inaddr,
+				    sizeof(struct in6_addr)))
+				goto free_ifap;
+			break;
+		default:
+			continue;
+		}
+	}
+	ret = 0;
+
+free_ifap:
+	freeifaddrs(ifap);
+	return ret;
+}
+
+static int rbd_is_blacklisted(struct rbd_checker_context *ct, char *msg)
+{
+	char *nonce, *addr_tok, *start, *save;
+	char *cmd[2];
+	char *blklist, *stat;
+	size_t blklist_len, stat_len;
+	unsigned int blklisted_nonce;
+	int ret;
+	char *addr;
+	struct in6_addr inaddr;
+
+	cmd[0] = "{\"prefix\": \"osd blacklist ls\"}";
+	cmd[1] = NULL;
+
+	ret = rados_mon_command(ct->cluster, (const char **)cmd, 1, "", 0,
+				&blklist, &blklist_len, &stat, &stat_len);
+	if (ret < 0) {
+		RBD_MSG(msg, "rbd checker failed: mon command failed %d",
+			ret);
+		return ret;
+	}
+
+	if (!blklist || !blklist_len)
+		goto free_bufs;
+
+	/*
+	 * parse list of addrs with the format
+	 * ipv4:port/nonce date time\n
+	 * or
+	 * [ipv6]:port/nonce date time\n
+	 */
+	ret = 0;
+	for (start = blklist; ; start = NULL) {
+		addr_tok = strtok_r(start, "\n", &save);
+		if (!addr_tok || !strlen(addr_tok))
+			break;
+
+		nonce = strchr(addr_tok, '/');
+		if (!nonce || strlen(nonce) < 2) {
+			RBD_MSG(msg, "rbd%d checker failed: invalid blacklist %s",
+				ct->rbd_bus_id, addr_tok);
+			break;
+		}
+		nonce++;
+		blklisted_nonce = strtoul(nonce, NULL, 10);
+
+		if (blklisted_nonce == ct->addr_nonce) {
+			char *port, *end;
+
+			condlog(3, "rbd%d checker matched nonce %s\n",
+				ct->rbd_bus_id, nonce);
+			addr = addr_tok;
+			if (addr[0] == '[') {
+				addr++;
+				end = strrchr(addr, ']');
+				if (!end) {
+					ret = -EINVAL;
+					break;
+				}
+				*end = '\0';
+				end++;
+
+				port = strchr(end, ':');
+				if (!port) {
+					ret = -EINVAL;
+					break;
+				}
+				*port = '\0';
+
+				ret = inet_pton(AF_INET6, addr,
+						(struct in6_addr *) &inaddr);
+			} else {
+				port = strchr(addr, ':');
+				if (!port) {
+					ret = -EINVAL;
+					break;
+				}
+				*port = '\0';
+
+				ret = inet_pton(AF_INET, addr,
+						(struct in_addr *) &inaddr);
+			}
+
+			if (ret != 1) {
+				break;
+			}
+
+			ret = rbd_match_addr(&inaddr);
+			if (ret == 1) {
+				ct->blk_lst_addr = strdup(addr);
+				if (!ct->blk_lst_addr) {
+					ret = -ENOMEM;
+					break;
+				}
+
+				ct->blacklisted = 1;
+				RBD_MSG(msg, "rbd%d checker: %s/%u is blacklisted",
+					ct->rbd_bus_id, addr, blklisted_nonce);
+			}
+			break;
+		}
+	}
+
+free_bufs:
+	rados_buffer_free(blklist);
+	rados_buffer_free(stat);
+	return ret;
+}
+
+int rbd_check(struct rbd_checker_context *ct, char *msg)
+{
+	if (ct->blacklisted || rbd_is_blacklisted(ct, msg) == 1)
+		return PATH_DOWN;
+
+	RBD_MSG(msg, "rbd checker reports path is up");
+	/*
+	 * Path may have issues, but the ceph cluster is at least
+	 * accepting IO, so we can attempt to do IO.
+	 *
+	 * TODO: in future versions, we can run other tests to
+	 * verify OSDs and networks.
+	 */
+	return PATH_UP;
+}
+
+int safe_write(int fd, const void *buf, size_t count)
+{
+	while (count > 0) {
+		ssize_t r = write(fd, buf, count);
+		if (r < 0) {
+			if (errno == EINTR)
+				continue;
+			return -errno;
+		}
+		count -= r;
+		buf = (char *)buf + r;
+	}
+	return 0;
+}
+
+static int sysfs_write_rbd_bus(const char *which, const char *buf,
+			       size_t buf_len)
+{
+	char sysfs_path[PATH_SIZE];
+	int fd;
+	int r;
+
+	/* we require newer kernels so single_major should always be there */
+	snprintf(sysfs_path, sizeof(sysfs_path),
+		 "/sys/bus/rbd/%s_single_major", which);
+	fd = open(sysfs_path, O_WRONLY);
+	if (fd < 0)
+		return -errno;
+
+	r = safe_write(fd, buf, buf_len);
+	close(fd);
+	return r;
+}
+
+static int sysfs_write_rbd_add(const char *buf, int buf_len)
+{
+	return sysfs_write_rbd_bus("add", buf, buf_len);
+}
+
+static int sysfs_write_rbd_remove(const char *buf, int buf_len)
+{
+	return sysfs_write_rbd_bus("remove", buf, buf_len);
+}
+
+static int rbd_rm_blacklist(struct rbd_checker_context *ct)
+{
+	char *cmd[2];
+	char *stat, *cmd_str;
+	size_t stat_len;
+	int ret;
+
+	ret = asprintf(&cmd_str, "{\"prefix\": \"osd blacklist\", \"blacklistop\": \"rm\", \"addr\": \"%s:0/%u\"}",
+		       ct->blk_lst_addr, ct->addr_nonce);
+	if (ret == -1)
+		return -ENOMEM;
+
+	cmd[0] = cmd_str;
+	cmd[1] = NULL;
+
+	ret = rados_mon_command(ct->cluster, (const char **)cmd, 1, "", 0,
+				NULL, 0, &stat, &stat_len);
+	if (ret < 0) {
+		condlog(1, "rbd%d repair failed to remove blacklist for %s/%u %d",
+			ct->rbd_bus_id, ct->blk_lst_addr, ct->addr_nonce, ret);
+		goto free_cmd;
+	}
+
+	condlog(1, "rbd%d repair rm blacklist for %s/%u",
+	       ct->rbd_bus_id, ct->blk_lst_addr, ct->addr_nonce);
+	rados_buffer_free(stat);
+free_cmd:
+	free(cmd_str);
+	return ret;
+}
+
+static int rbd_repair(struct rbd_checker_context *ct, char *msg)
+{
+	char del[17];
+	int ret;
+
+	if (!ct->blacklisted)
+		return PATH_UP;
+
+	if (!ct->remapped) {
+		ret = sysfs_write_rbd_add(ct->config_info,
+					  strlen(ct->config_info) + 1);
+		if (ret) {
+			RBD_MSG(msg, "rbd%d repair failed to remap. Err %d",
+				ct->rbd_bus_id, ret);
+			return PATH_DOWN;
+		}
+	}
+	ct->remapped = 1;
+
+	snprintf(del, sizeof(del), "%d force", ct->rbd_bus_id);
+	ret = sysfs_write_rbd_remove(del, strlen(del) + 1);
+	if (ret) {
+		RBD_MSG(msg, "rbd%d repair failed to clean up. Err %d",
+			ct->rbd_bus_id, ret);
+		return PATH_DOWN;
+	}
+
+	ret = rbd_rm_blacklist(ct);
+	if (ret) {
+		RBD_MSG(msg, "rbd%d repair could not remove blacklist entry. Err %d",
+			ct->rbd_bus_id, ret);
+		return PATH_DOWN;
+	}
+
+	ct->remapped = 0;
+	ct->blacklisted = 0;
+
+	RBD_MSG(msg, "rbd%d has been repaired", ct->rbd_bus_id);
+	return PATH_UP;
+}
+
+#define rbd_thread_cleanup_push(ct) pthread_cleanup_push(cleanup_func, ct)
+#define rbd_thread_cleanup_pop(ct) pthread_cleanup_pop(1)
+
+void cleanup_func(void *data)
+{
+	int holders;
+	struct rbd_checker_context *ct = data;
+	pthread_spin_lock(&ct->hldr_lock);
+	ct->holders--;
+	holders = ct->holders;
+	ct->thread = 0;
+	pthread_spin_unlock(&ct->hldr_lock);
+	if (!holders)
+		cleanup_context(ct);
+}
+
+void *rbd_thread(void *ctx)
+{
+	struct rbd_checker_context *ct = ctx;
+	int state;
+
+	condlog(3, "rbd%d thread starting up", ct->rbd_bus_id);
+
+	ct->message[0] = '\0';
+	/* This thread can be canceled, so set up cleanup */
+	rbd_thread_cleanup_push(ct)
+
+	/* checker start up */
+	pthread_mutex_lock(&ct->lock);
+	ct->state = PATH_PENDING;
+	pthread_mutex_unlock(&ct->lock);
+
+	state = ct->fn(ct, ct->message);
+
+	/* checker done */
+	pthread_mutex_lock(&ct->lock);
+	ct->state = state;
+	pthread_mutex_unlock(&ct->lock);
+	pthread_cond_signal(&ct->active);
+
+	condlog(3, "rbd%d thread finished, state %s", ct->rbd_bus_id,
+		checker_state_name(state));
+	rbd_thread_cleanup_pop(ct);
+	return ((void *)0);
+}
+
+static void rbd_timeout(struct timespec *tsp)
+{
+	struct timeval now;
+
+	gettimeofday(&now, NULL);
+	tsp->tv_sec = now.tv_sec;
+	tsp->tv_nsec = now.tv_usec * 1000;
+	tsp->tv_nsec += 1000000; /* 1 millisecond */
+}
+
+static int rbd_exec_fn(struct checker *c, thread_fn *fn)
+{
+	struct rbd_checker_context *ct = c->context;
+	struct timespec tsp;
+	pthread_attr_t attr;
+	int rbd_status, r;
+
+	if (c->sync)
+		return fn(ct, c->message);
+	/*
+	 * Async mode
+	 */
+	r = pthread_mutex_lock(&ct->lock);
+	if (r != 0) {
+		condlog(2, "rbd%d mutex lock failed with %d", ct->rbd_bus_id,
+			r);
+		MSG(c, "rbd%d thread failed to initialize", ct->rbd_bus_id);
+		return PATH_WILD;
+	}
+
+	if (ct->running) {
+		/* Check if checker is still running */
+		if (ct->thread) {
+			condlog(3, "rbd%d thread not finished", ct->rbd_bus_id);
+			rbd_status = PATH_PENDING;
+		} else {
+			/* checker done */
+			ct->running = 0;
+			rbd_status = ct->state;
+			strncpy(c->message, ct->message, CHECKER_MSG_LEN);
+			c->message[CHECKER_MSG_LEN - 1] = '\0';
+		}
+		pthread_mutex_unlock(&ct->lock);
+	} else {
+		/* Start new checker */
+		ct->state = PATH_UNCHECKED;
+		ct->fn = fn;
+		pthread_spin_lock(&ct->hldr_lock);
+		ct->holders++;
+		pthread_spin_unlock(&ct->hldr_lock);
+		setup_thread_attr(&attr, 32 * 1024, 1);
+		r = pthread_create(&ct->thread, &attr, rbd_thread, ct);
+		if (r) {
+			pthread_mutex_unlock(&ct->lock);
+			ct->thread = 0;
+			ct->holders--;
+			condlog(3, "rbd%d failed to start rbd thread, using sync mode",
+				ct->rbd_bus_id);
+			return fn(ct, c->message);
+		}
+		pthread_attr_destroy(&attr);
+		rbd_timeout(&tsp);
+		r = pthread_cond_timedwait(&ct->active, &ct->lock, &tsp);
+		rbd_status = ct->state;
+		strncpy(c->message, ct->message,CHECKER_MSG_LEN);
+		c->message[CHECKER_MSG_LEN -1] = '\0';
+		pthread_mutex_unlock(&ct->lock);
+
+		if (ct->thread &&
+		    (rbd_status == PATH_PENDING || rbd_status == PATH_UNCHECKED)) {
+			condlog(3, "rbd%d thread still running",
+				ct->rbd_bus_id);
+			ct->running = 1;
+			rbd_status = PATH_PENDING;
+		}
+	}
+
+	return rbd_status;
+}
+
+void libcheck_repair(struct checker * c)
+{
+	struct rbd_checker_context *ct = c->context;
+
+	if (!ct || !ct->blacklisted)
+		return;
+	rbd_exec_fn(c, rbd_repair);
+}
+
+int libcheck_check(struct checker * c)
+{
+	struct rbd_checker_context *ct = c->context;
+
+	if (!ct)
+		return PATH_UNCHECKED;
+
+	if (ct->blacklisted)
+		return PATH_DOWN;
+
+	return rbd_exec_fn(c, rbd_check);
+}
-- 
2.5.5
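
The blacklist parsing in rbd_is_blacklisted() handles entries of the form
"ipv4:port/nonce date time" and "[ipv6]:port/nonce date time". A standalone
sketch of parsing one such entry is shown below. `struct bl_entry` and
`parse_bl_entry()` are hypothetical names, the 128-byte working buffer is an
arbitrary assumption, and the sample lines in the usage note are made up.
Unlike the checker, this sketch parses every entry rather than only those
whose nonce matches, and it reads the nonce with strtoul() so values above
INT_MAX are not truncated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

struct bl_entry {
	int family;		/* AF_INET or AF_INET6 */
	struct in6_addr addr;	/* large enough for either family */
	unsigned long nonce;
};

/*
 * Parse "a.b.c.d:port/nonce ..." or "[v6addr]:port/nonce ...".
 * Returns 0 on success, -1 if the entry is malformed.
 */
static int parse_bl_entry(const char *line, struct bl_entry *out)
{
	char buf[128];
	char *addr, *end, *slash, *digits_end;
	size_t len = strlen(line);

	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, line, len + 1);	/* work on a copy we may modify */

	slash = strchr(buf, '/');
	if (!slash)
		return -1;
	out->nonce = strtoul(slash + 1, &digits_end, 10);
	if (digits_end == slash + 1)	/* no digits after the slash */
		return -1;
	*slash = '\0';			/* buf is now "addr:port" */

	memset(&out->addr, 0, sizeof(out->addr));
	if (buf[0] == '[') {
		addr = buf + 1;
		end = strrchr(addr, ']');
		if (!end || end[1] != ':')
			return -1;
		out->family = AF_INET6;
	} else {
		addr = buf;
		end = strchr(addr, ':');
		if (!end)
			return -1;
		out->family = AF_INET;
	}
	*end = '\0';			/* cut off "]:port" or ":port" */

	return inet_pton(out->family, addr, &out->addr) == 1 ? 0 : -1;
}
```

For example, parsing the made-up line "192.168.0.10:0/3360693584 2016-07-05
08:12:00" yields family AF_INET and nonce 3360693584, while an entry without
a "/nonce" part is rejected.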


* [PATCH 4/4] multipath-tools: Add rbd to the hwtable
  2016-07-05  8:12 [PATCH 0/4] multipath-tools: Ceph rbd support Mike Christie
                   ` (2 preceding siblings ...)
  2016-07-05  8:12 ` [PATCH 3/4] multipath-tools: Add rbd checker Mike Christie
@ 2016-07-05  8:12 ` Mike Christie
  2016-07-08  8:15 ` [PATCH 0/4] multipath-tools: Ceph rbd support Christophe Varoqui
  4 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2016-07-05  8:12 UTC (permalink / raw)
  To: dm-devel, christophe.varoqui; +Cc: Mike Christie

Add rbd to hwtable. These defaults are for the HA type of setup
supported by the checker. We do not support features like multibus
at the dm-multipath level yet.

Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 libmultipath/hwtable.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 6116124..27a6ff1 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -1206,6 +1206,21 @@ static struct hwentry default_hw[] = {
 		.pgfailback    = 30,
 		.minio         = 128,
 	},
+	{
+		.vendor        = "Ceph",
+		.product       = "RBD",
+		.features      = DEFAULT_FEATURES,
+		.hwhandler     = DEFAULT_HWHANDLER,
+		.pgpolicy      = FAILOVER,
+		.pgfailback    = -FAILBACK_IMMEDIATE,
+		.no_path_retry = NO_PATH_RETRY_FAIL,
+		.checker_name  = RBD,
+		.user_friendly_names = USER_FRIENDLY_NAMES_ON,
+		.uid_attribute = "ID_UID",
+		.prio_name     = PRIO_CONST,
+		.deferred_remove = DEFERRED_REMOVE_ON,
+	},
+
 	/*
 	 * EOL
 	 */
-- 
2.5.5
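
How the "Ceph"/"RBD" strings set during discovery select this table entry
can be sketched as a first-match lookup. This is not the real find_hwe():
the `sk_*` names and the one-entry table are hypothetical, and the sketch
assumes vendor and product are matched as POSIX extended regular
expressions, in the way multipath.conf documents them for device sections.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct hwentry. */
struct sk_hwentry {
	const char *vendor;		/* regex matched against the vendor id */
	const char *product;		/* regex matched against the product id */
	const char *checker_name;
};

static const struct sk_hwentry sk_hwtable[] = {
	{ "Ceph", "RBD", "rbd" },
	{ NULL, NULL, NULL },		/* end of list */
};

/* Return 1 if value matches the extended regex pattern, else 0. */
static int sk_match(const char *pattern, const char *value)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return 0;
	rc = regexec(&re, value, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}

/* First entry whose vendor and product patterns both match, or NULL. */
static const struct sk_hwentry *sk_find_hwe(const char *vendor,
					    const char *product)
{
	const struct sk_hwentry *hwe;

	for (hwe = sk_hwtable; hwe->vendor; hwe++)
		if (sk_match(hwe->vendor, vendor) &&
		    sk_match(hwe->product, product))
			return hwe;
	return NULL;
}
```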


* Re: [PATCH 0/4] multipath-tools: Ceph rbd support
  2016-07-05  8:12 [PATCH 0/4] multipath-tools: Ceph rbd support Mike Christie
                   ` (3 preceding siblings ...)
  2016-07-05  8:12 ` [PATCH 4/4] multipath-tools: Add rbd to the hwtable Mike Christie
@ 2016-07-08  8:15 ` Christophe Varoqui
  2016-07-08 15:46   ` Mike Christie
  4 siblings, 1 reply; 9+ messages in thread
From: Christophe Varoqui @ 2016-07-08  8:15 UTC (permalink / raw)
  To: Mike Christie; +Cc: device-mapper development



Hi Mike,

This patchset was broken by the resync with Hannes' rcu v2 branch.
Can you rebase it in light of the changes to the conf pointer access
applied in dbd9ad6f0e555707335ec71e1c5bec1723e02f79?

Thanks.

On Tue, Jul 5, 2016 at 10:12 AM, Mike Christie <mchristi@redhat.com> wrote:

> The following patches add Ceph rbd support for handling
> blacklisted devices. This does not support features like
> multibus.
>
> My specific use case is exporting rbd images through multiple
> LIO instances. In this case, one rbd instance holds the
> exclusive lock and sends WRITE requests. If that host becomes
> unreachable, another host grabs the lock and blacklists the
> original host to prevent it from sending stale IO (once a host
> is blacklisted, its IO is failed by the OSD).
>
> To recover from this, this patchset adds a repair() callout
> to the checker. If the path is in the PATH_DOWN state, this
> callout can be used to fix it up. In my case, I remap the
> device to flush stale IO and clean up the old lock, and then
> unblacklist myself.
>
>
>



* Re: [PATCH 0/4] multipath-tools: Ceph rbd support
  2016-07-08  8:15 ` [PATCH 0/4] multipath-tools: Ceph rbd support Christophe Varoqui
@ 2016-07-08 15:46   ` Mike Christie
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2016-07-08 15:46 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development

On 07/08/2016 03:15 AM, Christophe Varoqui wrote:
> Hi Mike,
> 
> This patchset was broken by the resync with Hannes' rcu v2 branch.
> Can you rebase it in light of the changes to the conf pointer access
> applied in dbd9ad6f0e555707335ec71e1c5bec1723e02f79?
> 

Yeah, no problem.


> Thanks.
> 
> On Tue, Jul 5, 2016 at 10:12 AM, Mike Christie <mchristi@redhat.com> wrote:
> 
>     The following patches add Ceph rbd support for handling
>     blacklisted devices. This does not support features like
>     multibus.
> 
>     My specific use is for exporting rbd images through multiple
>     LIO instances. In this case, we have one rbd instance that
>     has the exclusive lock and send WRITE requests. If that host
>     becomes unreachable, then another host will grab the lock,
>     and blacklist the original host to prevent it from sending stale
>     IO (when blacklisted IO will be failed by the OSD).
> 
>     To recover from this, this patchset adds a repair() callout
>     to the checker. If the path is in the PATH_DOWN state this
>     callout can be used to fix it up. For my case, I am remapping
>     the device to flush stale IO and cleanup the old lock,
>     and then unblacklisting myself.
> 
> 
> 
> 
> 

* Re: [PATCH 4/4] multipath-tools: Add rbd to the hwtable
@ 2016-07-19 12:28 Xose Vazquez Perez
  0 siblings, 0 replies; 9+ messages in thread
From: Xose Vazquez Perez @ 2016-07-19 12:28 UTC (permalink / raw)
  To: Mike Christie, device-mapper development, Christophe Varoqui

Mike Christie wrote:

> Add rbd to hwtable. These defaults are for the HA type of setup
> supported by the checker. We do not support features like multibus
> at the dm-multipath level yet.

Please use this patch as a guide:
https://patchwork.kernel.org/patch/9225521/

> Signed-off-by: Mike Christie <mchristi@redhat.com>
> ---
>  libmultipath/hwtable.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> index 6116124..27a6ff1 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -1206,6 +1206,21 @@ static struct hwentry default_hw[] = {
>  		.pgfailback    = 30,
>  		.minio         = 128,
>  	},
> +	{
> +		.vendor        = "Ceph",
> +		.product       = "RBD",
> +		.features      = DEFAULT_FEATURES,
> +		.hwhandler     = DEFAULT_HWHANDLER,
> +		.pgpolicy      = FAILOVER,
> +		.pgfailback    = -FAILBACK_IMMEDIATE,
> +		.no_path_retry = NO_PATH_RETRY_FAIL,
> +		.checker_name  = RBD,
> +		.user_friendly_names = USER_FRIENDLY_NAMES_ON,

USER_FRIENDLY_NAMES_ON is highly inadvisable.

> +		.uid_attribute = "ID_UID",
> +		.prio_name     = PRIO_CONST,
> +		.deferred_remove = DEFERRED_REMOVE_ON,

* [PATCH 4/4] multipath-tools: Add rbd to the hwtable
  2016-08-08 12:01 [PATCH 0/4] multipath-tools: Ceph rbd support v2 Mike Christie
@ 2016-08-08 12:01 ` Mike Christie
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2016-08-08 12:01 UTC (permalink / raw)
  To: dm-devel, christophe.varoqui; +Cc: Mike Christie

Add rbd to hwtable. These defaults are for the HA type of setup
supported by the checker. We do not support features like multibus
at the dm-multipath level yet.

Changes since v1:
1. Drop settings that were defaults and follow template.
2. Drop ID_UID use.

Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 libmultipath/hwtable.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 8c074f9..c307477 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -839,6 +839,21 @@ static struct hwentry default_hw[] = {
 		.flush_on_last_del = FLUSH_ENABLED,
 		.dev_loss      = 30,
 	},
+	{
+	/*
+	 * Red Hat
+	 *
+	 * Maintainer: Mike Christie
+	 * Mail: mchristi@redhat.com
+	 */
+		.vendor        = "Ceph",
+		.product       = "RBD",
+		.pgpolicy      = FAILOVER,
+		.no_path_retry = NO_PATH_RETRY_FAIL,
+		.checker_name  = RBD,
+		.deferred_remove = DEFERRED_REMOVE_ON,
+	},
+
 	/*
 	 * Tegile Systems
 	 */
-- 
2.7.2

Thread overview: 9+ messages
2016-07-05  8:12 [PATCH 0/4] multipath-tools: Ceph rbd support Mike Christie
2016-07-05  8:12 ` [PATCH 1/4] multipath-tools: add rbd discovery detection Mike Christie
2016-07-05  8:12 ` [PATCH 2/4] multipath-tools: add checker callout to repair path Mike Christie
2016-07-05  8:12 ` [PATCH 3/4] multipath-tools: Add rbd checker Mike Christie
2016-07-05  8:12 ` [PATCH 4/4] multipath-tools: Add rbd to the hwtable Mike Christie
2016-07-08  8:15 ` [PATCH 0/4] multipath-tools: Ceph rbd support Christophe Varoqui
2016-07-08 15:46   ` Mike Christie
  -- strict thread matches above, loose matches on Subject: below --
2016-07-19 12:28 [PATCH 4/4] multipath-tools: Add rbd to the hwtable Xose Vazquez Perez
2016-08-08 12:01 [PATCH 0/4] multipath-tools: Ceph rbd support v2 Mike Christie
2016-08-08 12:01 ` [PATCH 4/4] multipath-tools: Add rbd to the hwtable Mike Christie
