* [PATCH 00/13] systemd integraion
@ 2013-11-15 10:29 Hannes Reinecke
2013-11-15 10:29 ` [PATCH 01/13] Improve logging for orphan_path() Hannes Reinecke
` (12 more replies)
0 siblings, 13 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
Hi all,
this is a patch series for proper systemd integration.
Main idea is to have the CLI socket controlled by systemd,
so that the daemon will be started automatically if someone
calls 'multipathd -kSomething'.
The neat thing with this is that we can use the systemd-provided
watchdog to detect stalls in checkerloop(), which always have
been bitten us in the past and are really hard to diagnose
remotely.
Plus with socket activation we can now safely implement a
'one-shot' mode, where the daemon exits if no maps are discovered.
Which means that we can enable the socket always, and don't
have to worry about a daemon sitting in the background eating
up resources and doing nothing.
Oh. And some general fixes, too.
Hannes Reinecke (13):
Improve logging for orphan_path()
Set priority to '0' for PATH_BLOCKED or PATH_DOWN
libmultipath: fixup strlcpy
libmultipath: return error numbers from sysfs_get_XXX
libmultipath: do not stall on recv_packet()
multipathd: switch to socket activation for systemd
multipathd: use sd_notify() to inform systemd
multipathd: Add option '-s' to suppress timestamps
multipathd: Implement systemd watchdog integration
multipathd: enable core dumps for systemd
multipathd: Read environment variables from systemd
multipathd: measure path check time
multipathd: no_map_shutdown option
libmultipath/Makefile | 2 +-
libmultipath/config.h | 1 +
libmultipath/configure.c | 4 +-
libmultipath/debug.c | 17 ++++----
libmultipath/devmapper.c | 16 ++++----
libmultipath/dict.c | 32 +++++++++++++++
libmultipath/discovery.c | 43 +++++++++++---------
libmultipath/discovery.h | 2 +-
libmultipath/propsel.c | 2 +-
libmultipath/structs_vec.c | 9 ++---
libmultipath/structs_vec.h | 2 +-
libmultipath/util.c | 3 +-
libmultipath/uxsock.c | 41 +++++++++++++++++--
multipath.conf.annotated | 9 +++++
multipath/multipath.conf.5 | 14 ++++++-
multipathd/Makefile | 4 +-
multipathd/main.c | 94 ++++++++++++++++++++++++++++++++++---------
multipathd/multipathd.8 | 11 ++++-
multipathd/multipathd.service | 10 +++--
multipathd/multipathd.socket | 5 +++
multipathd/uxclnt.c | 4 +-
multipathd/uxlsnr.c | 4 +-
22 files changed, 245 insertions(+), 84 deletions(-)
create mode 100644 multipathd/multipathd.socket
--
1.8.1.4
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 01/13] Improve logging for orphan_path()
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 02/13] Set priority to '0' for PATH_BLOCKED or PATH_DOWN Hannes Reinecke
` (11 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
orphan_path() is called from various sections, so add a
description to the call to aid debugging.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/configure.c | 4 ++--
libmultipath/structs_vec.c | 6 +++---
libmultipath/structs_vec.h | 2 +-
multipathd/main.c | 5 +++--
4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index 7f5e065..aad3c9f 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -518,7 +518,7 @@ coalesce_paths (struct vectors * vecs, vector newmp, char * refwwid, int force_r
/* 1. if path has no unique id or wwid blacklisted */
if (memcmp(empty_buff, pp1->wwid, WWID_SIZE) == 0 ||
filter_path(conf, pp1) > 0) {
- orphan_path(pp1);
+ orphan_path(pp1, "wwid blacklisted");
continue;
}
@@ -528,7 +528,7 @@ coalesce_paths (struct vectors * vecs, vector newmp, char * refwwid, int force_r
/* 3. if path has disappeared */
if (!pp1->size) {
- orphan_path(pp1);
+ orphan_path(pp1, "invalid size");
continue;
}
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 993d3e0..14ea1c7 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -77,8 +77,9 @@ adopt_paths (vector pathvec, struct multipath * mpp, int get_info)
}
extern void
-orphan_path (struct path * pp)
+orphan_path (struct path * pp, const char *reason)
{
+ condlog(3, "%s: orphan path, %s", pp->dev, reason);
pp->mpp = NULL;
pp->dmstate = PSTATE_UNDEF;
pp->uid_attribute = NULL;
@@ -98,8 +99,7 @@ orphan_paths (vector pathvec, struct multipath * mpp)
vector_foreach_slot (pathvec, pp, i) {
if (pp->mpp == mpp) {
- condlog(4, "%s: orphaned", pp->dev);
- orphan_path(pp);
+ orphan_path(pp, "map flushed");
}
}
}
diff --git a/libmultipath/structs_vec.h b/libmultipath/structs_vec.h
index a907e85..c6278ac 100644
--- a/libmultipath/structs_vec.h
+++ b/libmultipath/structs_vec.h
@@ -17,7 +17,7 @@ void set_no_path_retry(struct multipath *mpp);
int adopt_paths (vector pathvec, struct multipath * mpp, int get_info);
void orphan_paths (vector pathvec, struct multipath * mpp);
-void orphan_path (struct path * pp);
+void orphan_path (struct path * pp, const char *reason);
int verify_paths(struct multipath * mpp, struct vectors * vecs, vector rpvec);
int update_mpp_paths(struct multipath * mpp, vector pathvec);
diff --git a/multipathd/main.c b/multipathd/main.c
index 8681aa2..91d7bfc 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -139,6 +139,7 @@ coalesce_maps(struct vectors *vecs, vector nmpv)
int j;
vector_foreach_slot (ompv, ompp, i) {
+ condlog(3, "%s: coalesce map", ompp->alias);
if (!find_mp_by_wwid(nmpv, ompp->wwid)) {
/*
* remove all current maps not allowed by the
@@ -229,7 +230,7 @@ flush_map(struct multipath * mpp, struct vectors * vecs)
}
else {
dm_lib_release();
- condlog(2, "%s: devmap removed", mpp->alias);
+ condlog(2, "%s: map flushed", mpp->alias);
}
orphan_paths(vecs->pathvec, mpp);
@@ -561,7 +562,7 @@ rescan:
fail_map:
remove_map(mpp, vecs, 1);
fail:
- orphan_path(pp);
+ orphan_path(pp, "failed to add path");
return 1;
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 02/13] Set priority to '0' for PATH_BLOCKED or PATH_DOWN
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
2013-11-15 10:29 ` [PATCH 01/13] Improve logging for orphan_path() Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 03/13] libmultipath: fixup strlcpy Hannes Reinecke
` (10 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
When a path is down or blocked we need to initialize the priority
to '0'. Otherwise multipathd will discard the maps during reload
and fail to start if all paths are temporarily down.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/discovery.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 04e3ead..3292358 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1117,6 +1117,9 @@ pathinfo (struct path *pp, vector hwtable, int mask)
} else {
condlog(3, "%s: path inaccessible", pp->dev);
pp->chkrstate = pp->state = path_state;
+ if (path_state == PATH_PENDING ||
+ path_state == PATH_DOWN)
+ pp->priority = 0;
}
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 03/13] libmultipath: fixup strlcpy
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
2013-11-15 10:29 ` [PATCH 01/13] Improve logging for orphan_path() Hannes Reinecke
2013-11-15 10:29 ` [PATCH 02/13] Set priority to '0' for PATH_BLOCKED or PATH_DOWN Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 04/13] libmultipath: return error numbers from sysfs_get_XXX Hannes Reinecke
` (9 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
The final comparison was wrong; 'size' was never decreased.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/util.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/libmultipath/util.c b/libmultipath/util.c
index 41ac21b..a9f5939 100644
--- a/libmultipath/util.c
+++ b/libmultipath/util.c
@@ -112,8 +112,7 @@ size_t strlcpy(char *dst, const char *src, size_t size)
bytes++;
}
- /* If size == 0 there is no space for a final null... */
- if (size)
+ if (bytes == size)
*q = '\0';
return bytes;
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 04/13] libmultipath: return error numbers from sysfs_get_XXX
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (2 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 03/13] libmultipath: fixup strlcpy Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-17 17:34 ` Christophe Varoqui
2013-11-15 10:29 ` [PATCH 05/13] libmultipath: do not stall on recv_packet() Hannes Reinecke
` (8 subsequent siblings)
12 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
Instead of returning hand-crafted error values from sysfs_get_XXX
functions we should be using the standard error numbers.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/discovery.c | 40 +++++++++++++++++++++-------------------
libmultipath/discovery.h | 2 +-
libmultipath/propsel.c | 2 +-
libmultipath/structs_vec.c | 3 +--
4 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 3292358..d5557d9 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -140,31 +140,34 @@ path_discovery (vector pathvec, struct config * conf, int flag)
}
#define declare_sysfs_get_str(fname) \
-extern int \
+extern ssize_t \
sysfs_get_##fname (struct udev_device * udev, char * buff, size_t len) \
{ \
+ ssize_t ret; \
const char * attr; \
const char * devname; \
\
+ if (!udev) \
+ return -ENOSYS; \
+ \
devname = udev_device_get_sysname(udev); \
\
attr = udev_device_get_sysattr_value(udev, #fname); \
if (!attr) { \
condlog(3, "%s: attribute %s not found in sysfs", \
devname, #fname); \
- return 1; \
+ return -ENXIO; \
} \
if (strlen(attr) > len) { \
condlog(3, "%s: overflow in attribute %s", \
devname, #fname); \
- return 2; \
+ return -EINVAL; \
} \
- strlcpy(buff, attr, len); \
- return 0; \
+ ret = strlcpy(buff, attr, len); \
+ return ret; \
}
declare_sysfs_get_str(devtype);
-declare_sysfs_get_str(cutype);
declare_sysfs_get_str(vendor);
declare_sysfs_get_str(model);
declare_sysfs_get_str(rev);
@@ -180,7 +183,7 @@ sysfs_get_timeout(struct path *pp, unsigned int *timeout)
unsigned int t;
if (!pp->udev || pp->bus != SYSFS_BUS_SCSI)
- return 1;
+ return -ENOSYS;
parent = pp->udev;
while (parent) {
@@ -192,7 +195,7 @@ sysfs_get_timeout(struct path *pp, unsigned int *timeout)
}
if (!attr) {
condlog(3, "%s: No timeout value in sysfs", pp->dev);
- return 1;
+ return -ENXIO;
}
r = sscanf(attr, "%u\n", &t);
@@ -200,7 +203,7 @@ sysfs_get_timeout(struct path *pp, unsigned int *timeout)
if (r != 1) {
condlog(3, "%s: Cannot parse timeout attribute '%s'",
pp->dev, attr);
- return 1;
+ return -EINVAL;
}
*timeout = t;
@@ -650,17 +653,17 @@ scsi_sysfs_pathinfo (struct path * pp)
if (!attr_path || pp->sg_id.host_no == -1)
return 1;
- if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE))
+ if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE) <= 0)
return 1;
condlog(3, "%s: vendor = %s", pp->dev, pp->vendor_id);
- if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE))
+ if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE) <= 0)
return 1;
condlog(3, "%s: product = %s", pp->dev, pp->product_id);
- if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE))
+ if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE) <= 0)
return 1;
condlog(3, "%s: rev = %s", pp->dev, pp->rev);
@@ -712,7 +715,7 @@ ccw_sysfs_pathinfo (struct path * pp)
condlog(3, "%s: vendor = %s", pp->dev, pp->vendor_id);
- if (sysfs_get_devtype(parent, attr_buff, FILE_NAME_SIZE))
+ if (sysfs_get_devtype(parent, attr_buff, FILE_NAME_SIZE) <= 0)
return 1;
if (!strncmp(attr_buff, "3370", 4)) {
@@ -772,17 +775,17 @@ cciss_sysfs_pathinfo (struct path * pp)
if (!attr_path || pp->sg_id.host_no == -1)
return 1;
- if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE))
+ if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE) <= 0)
return 1;
condlog(3, "%s: vendor = %s", pp->dev, pp->vendor_id);
- if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE))
+ if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE) <= 0)
return 1;
condlog(3, "%s: product = %s", pp->dev, pp->product_id);
- if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE))
+ if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE) <= 0)
return 1;
condlog(3, "%s: rev = %s", pp->dev, pp->rev);
@@ -816,7 +819,7 @@ common_sysfs_pathinfo (struct path * pp)
condlog(4, "%s: udev not initialised", pp->dev);
return 1;
}
- if (sysfs_get_dev(pp->udev, pp->dev_t, BLK_DEV_SIZE)) {
+ if (sysfs_get_dev(pp->udev, pp->dev_t, BLK_DEV_SIZE) <= 0) {
condlog(3, "%s: no 'dev' attribute in sysfs", pp->dev);
return 1;
}
@@ -956,8 +959,7 @@ get_state (struct path * pp, int daemon)
if (daemon)
checker_set_async(c);
if (!conf->checker_timeout &&
- (pp->bus != SYSFS_BUS_SCSI ||
- sysfs_get_timeout(pp, &(c->timeout))))
+ sysfs_get_timeout(pp, &(c->timeout)) <= 0)
c->timeout = DEF_TIMEOUT;
state = checker_check(c);
condlog(3, "%s: state = %s", pp->dev, checker_state_name(state));
diff --git a/libmultipath/discovery.h b/libmultipath/discovery.h
index d049ead..3d2d0ce 100644
--- a/libmultipath/discovery.h
+++ b/libmultipath/discovery.h
@@ -26,7 +26,7 @@
struct config;
-int sysfs_get_dev (struct udev_device *udev, char * buff, size_t len);
+ssize_t sysfs_get_dev (struct udev_device *udev, char * buff, size_t len);
int path_discovery (vector pathvec, struct config * conf, int flag);
int do_tur (char *);
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index e47d0ca..f092227 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -351,7 +351,7 @@ out:
condlog(3, "%s: checker timeout = %u s (config file default)",
pp->dev, c->timeout);
}
- else if (pp->udev && sysfs_get_timeout(pp, &c->timeout) == 0)
+ else if (sysfs_get_timeout(pp, &c->timeout) > 0)
condlog(3, "%s: checker timeout = %u ms (sysfs setting)",
pp->dev, c->timeout);
else {
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 14ea1c7..abb2c56 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -451,8 +451,7 @@ verify_paths(struct multipath * mpp, struct vectors * vecs, vector rpvec)
/*
* see if path is in sysfs
*/
- if (!pp->udev || sysfs_get_dev(pp->udev, pp->dev_t,
- BLK_DEV_SIZE)) {
+ if (sysfs_get_dev(pp->udev, pp->dev_t, BLK_DEV_SIZE) <= 0) {
if (pp->state != PATH_DOWN) {
condlog(1, "%s: removing valid path %s in state %d",
mpp->alias, pp->dev, pp->state);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 05/13] libmultipath: do not stall on recv_packet()
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (3 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 04/13] libmultipath: return error numbers from sysfs_get_XXX Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 06/13] multipathd: switch to socket activation for systemd Hannes Reinecke
` (7 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
The CLI socket might have been closed or the daemon might have
been terminated by systemd without closing the CLI socket.
Hence we need to poll the socket if further data is avalailable,
otherwise the read() call will hang.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/uxsock.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/libmultipath/uxsock.c b/libmultipath/uxsock.c
index ce89428..fcad56e 100644
--- a/libmultipath/uxsock.c
+++ b/libmultipath/uxsock.c
@@ -107,9 +107,24 @@ size_t write_all(int fd, const void *buf, size_t len)
size_t read_all(int fd, void *buf, size_t len)
{
size_t total = 0;
+ ssize_t n;
+ int ret;
+ struct pollfd pfd;
while (len) {
- ssize_t n = read(fd, buf, len);
+ pfd.fd = fd;
+ pfd.events = POLLIN;
+ ret = poll(&pfd, 1, 1000);
+ if (!ret) {
+ errno = ETIMEDOUT;
+ return total;
+ } else if (ret < 0) {
+ if (errno == EINTR)
+ continue;
+ return total;
+ } else if (!pfd.revents & POLLIN)
+ continue;
+ n = read(fd, buf, len);
if (n < 0) {
if ((errno == EINTR) || (errno == EAGAIN))
continue;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 06/13] multipathd: switch to socket activation for systemd
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (4 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 05/13] libmultipath: do not stall on recv_packet() Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 07/13] multipathd: use sd_notify() to inform systemd Hannes Reinecke
` (6 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
multipathd already has a netlink socket for CLI commands, which
can be used for socket activation.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/Makefile | 2 +-
libmultipath/uxsock.c | 24 +++++++++++++++++++++---
multipathd/Makefile | 4 +++-
multipathd/multipathd.service | 7 +++----
multipathd/multipathd.socket | 5 +++++
multipathd/uxclnt.c | 4 +---
multipathd/uxlsnr.c | 4 +---
7 files changed, 35 insertions(+), 15 deletions(-)
create mode 100644 multipathd/multipathd.socket
diff --git a/libmultipath/Makefile b/libmultipath/Makefile
index ae1d8a3..c28413c 100644
--- a/libmultipath/Makefile
+++ b/libmultipath/Makefile
@@ -7,7 +7,7 @@ include ../Makefile.inc
SONAME=0
DEVLIB = libmultipath.so
LIBS = $(DEVLIB).$(SONAME)
-LIBDEPS = -lpthread -ldl -ldevmapper -ludev
+LIBDEPS = -lpthread -ldl -ldevmapper -lsystemd-daemon -ludev
OBJS = memory.o parser.o vector.o devmapper.o callout.o \
hwtable.o blacklist.o util.o dmparser.o config.o \
diff --git a/libmultipath/uxsock.c b/libmultipath/uxsock.c
index fcad56e..58f0bca 100644
--- a/libmultipath/uxsock.c
+++ b/libmultipath/uxsock.c
@@ -16,9 +16,11 @@
#include <sys/poll.h>
#include <signal.h>
#include <errno.h>
+#include <systemd/sd-daemon.h>
#include "memory.h"
#include "uxsock.h"
+#include "debug.h"
/*
* connect to a unix domain socket
@@ -36,10 +38,12 @@ int ux_socket_connect(const char *name)
fd = socket(AF_LOCAL, SOCK_STREAM, 0);
if (fd == -1) {
+ condlog(3, "Couldn't create ux_socket, error %d", errno);
return -1;
}
if (connect(fd, (struct sockaddr *)&addr, len) == -1) {
+ condlog(3, "Couldn't connect to ux_socket, error %d", errno);
close(fd);
return -1;
}
@@ -53,11 +57,24 @@ int ux_socket_connect(const char *name)
*/
int ux_socket_listen(const char *name)
{
- int fd, len;
+ int fd, len, num;
struct sockaddr_un addr;
+ num = sd_listen_fds(0);
+ if (num > 1) {
+ condlog(3, "sd_listen_fds returned %d fds", num);
+ return -1;
+ } else if (num == 1) {
+ fd = SD_LISTEN_FDS_START + 0;
+ condlog(3, "using fd %d from sd_listen_fds", fd);
+ return fd;
+ }
+
fd = socket(AF_LOCAL, SOCK_STREAM, 0);
- if (fd == -1) return -1;
+ if (fd == -1) {
+ condlog(3, "Couldn't create ux_socket, error %d", errno);
+ return -1;
+ }
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_LOCAL;
@@ -66,15 +83,16 @@ int ux_socket_listen(const char *name)
strncpy(&addr.sun_path[1], name, len);
if (bind(fd, (struct sockaddr *)&addr, len) == -1) {
+ condlog(3, "Couldn't bind to ux_socket, error %d", errno);
close(fd);
return -1;
}
if (listen(fd, 10) == -1) {
+ condlog(3, "Couldn't listen to ux_socket, error %d", errno);
close(fd);
return -1;
}
-
return fd;
}
diff --git a/multipathd/Makefile b/multipathd/Makefile
index b490c1d..f9801b0 100644
--- a/multipathd/Makefile
+++ b/multipathd/Makefile
@@ -6,7 +6,7 @@ include ../Makefile.inc
# basic flags setting
#
CFLAGS += -I$(multipathdir) -I$(mpathpersistdir)
-LDFLAGS += -lpthread -ldevmapper -lreadline -ludev -ldl \
+LDFLAGS += -lpthread -ldevmapper -lreadline -lsystemd-daemon -ludev -ldl \
-L$(multipathdir) -lmultipath -L$(mpathpersistdir) -lmpathpersist
#
@@ -37,6 +37,7 @@ install:
$(INSTALL_PROGRAM) -d $(DESTDIR)$(rcdir)
$(INSTALL_PROGRAM) -d $(DESTDIR)$(unitdir)
$(INSTALL_PROGRAM) -m 644 $(EXEC).service $(DESTDIR)$(unitdir)
+ $(INSTALL_PROGRAM) -m 644 $(EXEC).socket $(DESTDIR)$(unitdir)
$(INSTALL_PROGRAM) -d $(DESTDIR)$(mandir)
$(INSTALL_PROGRAM) -m 644 $(EXEC).8.gz $(DESTDIR)$(mandir)
@@ -45,6 +46,7 @@ uninstall:
rm -f $(DESTDIR)$(rcdir)/$(EXEC)
rm -f $(DESTDIR)$(mandir)/$(EXEC).8.gz
rm -f $(DESTDIR)$(unitdir)/$(EXEC).service
+ rm -f $(DESTDIR)$(unitdir)/$(EXEC).socket
clean:
rm -f core *.o $(EXEC) *.gz
diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index d6da067..bc62bc5 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -6,11 +6,10 @@ DefaultDependencies=no
Conflicts=shutdown.target
[Service]
-Type=forking
-PIDFile=/var/run/multipathd.pid
-ExecStart=/sbin/multipathd
+ExecStart=/sbin/multipathd -d
ExecReload=/sbin/multipathd reconfigure
-#ExecStop=/path/to/scrip delete-me if not necessary
+ExecStop=/sbin/multipathd shutdown
[Install]
WantedBy=sysinit.target
+Also=multipathd.socket
diff --git a/multipathd/multipathd.socket b/multipathd/multipathd.socket
new file mode 100644
index 0000000..3d4b6da
--- /dev/null
+++ b/multipathd/multipathd.socket
@@ -0,0 +1,5 @@
+[Socket]
+ListenStream=@/org/kernel/linux/storage/multipathd
+
+[Install]
+WantedBy=sockets.target
diff --git a/multipathd/uxclnt.c b/multipathd/uxclnt.c
index 3e4e381..e86be21 100644
--- a/multipathd/uxclnt.c
+++ b/multipathd/uxclnt.c
@@ -121,10 +121,8 @@ int uxclnt(char * inbuf)
int fd;
fd = ux_socket_connect(DEFAULT_SOCKET);
- if (fd == -1) {
- perror("ux_socket_connect");
+ if (fd == -1)
exit(1);
- }
if (inbuf)
process_req(fd, inbuf);
diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
index c0ddd8d..ed8e012 100644
--- a/multipathd/uxlsnr.c
+++ b/multipathd/uxlsnr.c
@@ -112,10 +112,8 @@ void * uxsock_listen(int (*uxsock_trigger)(char *, char **, int *, void *),
ux_sock = ux_socket_listen(DEFAULT_SOCKET);
- if (ux_sock == -1) {
- condlog(0, "ux_socket_listen error");
+ if (ux_sock == -1)
exit(1);
- }
pthread_cleanup_push(uxsock_cleanup, NULL);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 07/13] multipathd: use sd_notify() to inform systemd
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (5 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 06/13] multipathd: switch to socket activation for systemd Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 08/13] multipathd: Add option '-s' to suppress timestamps Hannes Reinecke
` (5 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
Implement sd_notify() to inform systemd about our internal state.
And we should be using the service type 'notify' so the systemd
doesn't try to flood multipathd if it's still in discovery.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
multipathd/main.c | 7 ++++++-
multipathd/multipathd.service | 3 ++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 91d7bfc..b4146fd 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -17,6 +17,7 @@
#include <limits.h>
#include <linux/oom.h>
#include <libudev.h>
+#include <systemd/sd-daemon.h>
#include <semaphore.h>
#include <mpath_persist.h>
@@ -1604,7 +1605,7 @@ child (void * param)
}
running_state = DAEMON_START;
-
+ sd_notify(0, "STATUS=startup");
condlog(2, "--------start up--------");
condlog(2, "read " DEFAULT_CONFIGFILE);
@@ -1672,6 +1673,7 @@ child (void * param)
/*
* fetch and configure both paths and multipaths
*/
+ sd_notify(0, "STATUS=configure");
running_state = DAEMON_CONFIGURE;
lock(vecs->lock);
@@ -1700,11 +1702,14 @@ child (void * param)
/* Ignore errors, we can live without */
running_state = DAEMON_RUNNING;
+ sd_notify(0, "READY=1\nSTATUS=running");
/*
* exit path
*/
while(sem_wait(&exit_sem) != 0); /* Do nothing */
+
+ sd_notify(0, "STATUS=shutdown");
running_state = DAEMON_SHUTDOWN;
lock(vecs->lock);
if (conf->queue_without_daemon == QUE_NO_DAEMON_OFF)
diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index bc62bc5..03b4270 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -6,9 +6,10 @@ DefaultDependencies=no
Conflicts=shutdown.target
[Service]
+Type=notify
+NotifyAccess=main
ExecStart=/sbin/multipathd -d
ExecReload=/sbin/multipathd reconfigure
-ExecStop=/sbin/multipathd shutdown
[Install]
WantedBy=sysinit.target
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 08/13] multipathd: Add option '-s' to suppress timestamps
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (6 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 07/13] multipathd: use sd_notify() to inform systemd Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 09/13] multipathd: Implement systemd watchdog integration Hannes Reinecke
` (4 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
systemd prefixes any messages to stdout with a timestamp, so it's
quite pointless to do it ourself.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/debug.c | 17 ++++++++++-------
libmultipath/devmapper.c | 16 +++++++++-------
multipathd/main.c | 11 +++++++----
multipathd/multipathd.8 | 7 +++++--
multipathd/multipathd.service | 2 +-
5 files changed, 32 insertions(+), 21 deletions(-)
diff --git a/libmultipath/debug.c b/libmultipath/debug.c
index d30517d..bad78a8 100644
--- a/libmultipath/debug.c
+++ b/libmultipath/debug.c
@@ -21,15 +21,18 @@ void dlog (int sink, int prio, const char * fmt, ...)
thres = (conf) ? conf->verbosity : 0;
if (prio <= thres) {
- if (!sink) {
- time_t t = time(NULL);
- struct tm *tb = localtime(&t);
- char buff[16];
+ if (sink < 1) {
+ if (sink == 0) {
+ time_t t = time(NULL);
+ struct tm *tb = localtime(&t);
+ char buff[16];
- strftime(buff, sizeof(buff), "%b %d %H:%M:%S", tb);
- buff[sizeof(buff)-1] = '\0';
+ strftime(buff, sizeof(buff),
+ "%b %d %H:%M:%S", tb);
+ buff[sizeof(buff)-1] = '\0';
- fprintf(stdout, "%s | ", buff);
+ fprintf(stdout, "%s | ", buff);
+ }
vfprintf(stdout, fmt, ap);
}
else
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 4c8b923..71281e6 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -71,15 +71,17 @@ dm_write_log (int level, const char *file, int line, const char *f, ...)
return;
va_start(ap, f);
- if (!logsink) {
- time_t t = time(NULL);
- struct tm *tb = localtime(&t);
- char buff[16];
+ if (logsink < 1) {
+ if (logsink == 0) {
+ time_t t = time(NULL);
+ struct tm *tb = localtime(&t);
+ char buff[16];
- strftime(buff, sizeof(buff), "%b %d %H:%M:%S", tb);
- buff[sizeof(buff)-1] = '\0';
+ strftime(buff, sizeof(buff), "%b %d %H:%M:%S", tb);
+ buff[sizeof(buff)-1] = '\0';
- fprintf(stdout, "%s | ", buff);
+ fprintf(stdout, "%s | ", buff);
+ }
fprintf(stdout, "libdevmapper: %s(%i): ", file, line);
vfprintf(stdout, f, ap);
fprintf(stdout, "\n");
diff --git a/multipathd/main.c b/multipathd/main.c
index b4146fd..72b3740 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1598,7 +1598,7 @@ child (void * param)
setup_thread_attr(&uevent_attr, 128 * 1024, 1);
setup_thread_attr(&waiter_attr, 32 * 1024, 1);
- if (logsink) {
+ if (logsink == 1) {
setup_thread_attr(&log_attr, 64 * 1024, 0);
log_thread_start(&log_attr);
pthread_attr_destroy(&log_attr);
@@ -1754,7 +1754,7 @@ child (void * param)
condlog(2, "--------shut down-------");
- if (logsink)
+ if (logsink == 1)
log_thread_stop();
/*
@@ -1854,7 +1854,7 @@ main (int argc, char *argv[])
if (!conf)
exit(1);
- while ((arg = getopt(argc, argv, ":dv:k::")) != EOF ) {
+ while ((arg = getopt(argc, argv, ":dsv:k::")) != EOF ) {
switch(arg) {
case 'd':
logsink = 0;
@@ -1867,6 +1867,9 @@ main (int argc, char *argv[])
conf->verbosity = atoi(optarg);
break;
+ case 's':
+ logsink = -1;
+ break;
case 'k':
uxclnt(optarg);
exit(0);
@@ -1891,7 +1894,7 @@ main (int argc, char *argv[])
exit(0);
}
- if (!logsink)
+ if (logsink < 1)
err = 0;
else
err = daemonize();
diff --git a/multipathd/multipathd.8 b/multipathd/multipathd.8
index ccc5f54..2aea150 100644
--- a/multipathd/multipathd.8
+++ b/multipathd/multipathd.8
@@ -20,8 +20,11 @@ devmap reconfiguration, so that it can refresh its failed path list.
.SH OPTIONS
.TP
.B \-d
-Forground Mode. Don't daemonize, and print all messages to stdout and stderr.
-.TP
+Foreground Mode. Don't daemonize, and print all messages to stdout and stderr.
+.TP
+.B \-s
+Suppress timestamps. Do not prefix logging messages with a timestamp.
+.TP
.B -v "level"
Verbosity level. Print additional information while running multipathd. A level of 0 means only print errors. A level of 3 or greater prints debugging information as well.
.TP
diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index 03b4270..fb84025 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -8,7 +8,7 @@ Conflicts=shutdown.target
[Service]
Type=notify
NotifyAccess=main
-ExecStart=/sbin/multipathd -d
+ExecStart=/sbin/multipathd -d -s
ExecReload=/sbin/multipathd reconfigure
[Install]
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 09/13] multipathd: Implement systemd watchdog integration
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (7 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 08/13] multipathd: Add option '-s' to suppress timestamps Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-22 22:17 ` Benjamin Marzinski
2013-11-15 10:29 ` [PATCH 10/13] multipathd: enable core dumps for systemd Hannes Reinecke
` (3 subsequent siblings)
12 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
In the past there have been several instances where multipathd
would hang with the checkerloop as some path checker might not
be able to return in time.
This patch now activates the watchdog feature from systemd
to shutdown (and possibly restart) multipathd in these
situations.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
multipath/multipath.conf.5 | 7 +++++--
multipathd/main.c | 15 ++++++++++++++-
multipathd/multipathd.service | 1 +
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 0fd3035..cf5bec0 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -71,8 +71,11 @@ section recognizes the following keywords:
.B polling_interval
interval between two path checks in seconds. For properly functioning paths,
the interval between checks will gradually increase to
-.B max_polling_interval;
-default is
+.B max_polling_interval.
+This value will be overridden by the
+.B WatchdogSec
+setting in the multipathd.service definition if systemd is used.
+Default is
.I 5
.TP
.B max_polling_interval
diff --git a/multipathd/main.c b/multipathd/main.c
index 72b3740..abeebc2 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1286,6 +1286,7 @@ checkerloop (void *ap)
lock(vecs->lock);
pthread_testcancel();
condlog(4, "tick");
+ sd_notify(0, "WATCHDOG=1");
if (vecs->pathvec) {
vector_foreach_slot (vecs->pathvec, pp, i) {
@@ -1585,7 +1586,8 @@ child (void * param)
pthread_attr_t log_attr, misc_attr, uevent_attr;
struct vectors * vecs;
struct multipath * mpp;
- int i;
+ char *envp;
+ int i, checkint;
int rc, pid_rc;
mlockall(MCL_CURRENT | MCL_FUTURE);
@@ -1658,6 +1660,17 @@ child (void * param)
conf->daemon = 1;
udev_set_sync_support(0);
+ envp = getenv("WATCHDOG_USEC");
+ if (envp && sscanf(envp, "%d", &checkint) == 1) {
+ /* Value is in microseconds */
+ checkint = checkint / 1000000;
+ if (checkint > conf->max_checkint)
+ conf->max_checkint = checkint;
+ conf->checkint = checkint;
+ condlog(3, "set checkint to %d max %d",
+ conf->checkint, conf->max_checkint);
+ }
+
/*
* Start uevent listener early to catch events
*/
diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index fb84025..848a231 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -10,6 +10,7 @@ Type=notify
NotifyAccess=main
ExecStart=/sbin/multipathd -d -s
ExecReload=/sbin/multipathd reconfigure
+WatchdogSec=5s
[Install]
WantedBy=sysinit.target
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 10/13] multipathd: enable core dumps for systemd
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (8 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 09/13] multipathd: Implement systemd watchdog integration Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 11/13] multipathd: Read environment variables from systemd Hannes Reinecke
` (2 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
Add 'LimitCORE' definition to the service file to enable core
dumps when running under systemd.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
multipathd/multipathd.service | 1 +
1 file changed, 1 insertion(+)
diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index 848a231..6472dd5 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -11,6 +11,7 @@ NotifyAccess=main
ExecStart=/sbin/multipathd -d -s
ExecReload=/sbin/multipathd reconfigure
WatchdogSec=5s
+LimitCORE=infinity
[Install]
WantedBy=sysinit.target
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 11/13] multipathd: Read environment variables from systemd
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (9 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 10/13] multipathd: enable core dumps for systemd Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 12/13] multipathd: measure path check time Hannes Reinecke
2013-11-15 10:29 ` [PATCH 13/13] multipathd: no_map_shutdown option Hannes Reinecke
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
When systemd adjusts 'OOMScoreAdjust' and 'LimitNOFILE'
we should take those settings and not try to adjust them
again on our side.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
multipathd/main.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index abeebc2..e5b061b 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1551,7 +1551,13 @@ set_oom_adj (void)
#endif
FILE *fp;
struct stat st;
+ char *envp;
+ envp = getenv("OOMScoreAdjust");
+ if (envp) {
+ condlog(3, "Using systemd provided OOMScoreAdjust");
+ return;
+ }
do {
if (stat(file, &st) == 0){
fp = fopen(file, "w");
@@ -1625,7 +1631,11 @@ child (void * param)
setlogmask(LOG_UPTO(conf->verbosity + 3));
- if (conf->max_fds) {
+ envp = getenv("LimitNOFILE");
+
+ if (envp) {
+ condlog(2,"Using systemd provided open fds limit of %s", envp);
+ } else if (conf->max_fds) {
struct rlimit fd_limit;
if (getrlimit(RLIMIT_NOFILE, &fd_limit) < 0) {
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 12/13] multipathd: measure path check time
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (10 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 11/13] multipathd: Read environment variables from systemd Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 13/13] multipathd: no_map_shutdown option Hannes Reinecke
12 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
Instrument code to measure path check time.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
multipathd/main.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index e5b061b..d386e0a 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1102,7 +1102,7 @@ int update_path_groups(struct multipath *mpp, struct vectors *vecs, int refresh)
return 0;
}
-void
+int
check_path (struct vectors * vecs, struct path * pp)
{
int newstate;
@@ -1111,10 +1111,10 @@ check_path (struct vectors * vecs, struct path * pp)
int oldchkrstate = pp->chkrstate;
if (!pp->mpp)
- return;
+ return 0;
if (pp->tick && --pp->tick)
- return; /* don't check this path yet */
+ return 0; /* don't check this path yet */
/*
* provision a next check soonest,
@@ -1129,7 +1129,7 @@ check_path (struct vectors * vecs, struct path * pp)
if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
condlog(2, "%s: unusable path", pp->dev);
pathinfo(pp, conf->hwtable, 0);
- return;
+ return 1;
}
/*
* Async IO in flight. Keep the previous path state
@@ -1137,7 +1137,7 @@ check_path (struct vectors * vecs, struct path * pp)
*/
if (newstate == PATH_PENDING) {
pp->tick = 1;
- return;
+ return 0;
}
/*
* Synchronize with kernel state
@@ -1175,7 +1175,7 @@ check_path (struct vectors * vecs, struct path * pp)
pp->mpp->failback_tick = 0;
pp->mpp->stat_path_failures++;
- return;
+ return 1;
}
if(newstate == PATH_UP || newstate == PATH_GHOST){
@@ -1260,6 +1260,7 @@ check_path (struct vectors * vecs, struct path * pp)
(chkr_new_path_up && followover_should_failback(pp)))
switch_pathgroup(pp->mpp);
}
+ return 1;
}
static void *
@@ -1282,6 +1283,11 @@ checkerloop (void *ap)
}
while (1) {
+ struct timeval diff_time, start_time, end_time;
+ int num_paths = 0;
+
+ if (gettimeofday(&start_time, NULL) != 0)
+ start_time.tv_sec = 0;
pthread_cleanup_push(cleanup_lock, &vecs->lock);
lock(vecs->lock);
pthread_testcancel();
@@ -1290,7 +1296,7 @@ checkerloop (void *ap)
if (vecs->pathvec) {
vector_foreach_slot (vecs->pathvec, pp, i) {
- check_path(vecs, pp);
+ num_paths += check_path(vecs, pp);
}
}
if (vecs->mpvec) {
@@ -1306,6 +1312,14 @@ checkerloop (void *ap)
}
lock_cleanup_pop(vecs->lock);
+ if (start_time.tv_sec &&
+ gettimeofday(&end_time, NULL) == 0 &&
+ num_paths) {
+ timersub(&end_time, &start_time, &diff_time);
+ condlog(3, "checked %d path%s in %lu.%06lu secs",
+ num_paths, num_paths > 1 ? "s" : "",
+ diff_time.tv_sec, diff_time.tv_usec);
+ }
sleep(1);
}
return NULL;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 13/13] multipathd: no_map_shutdown option
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
` (11 preceding siblings ...)
2013-11-15 10:29 ` [PATCH 12/13] multipathd: measure path check time Hannes Reinecke
@ 2013-11-15 10:29 ` Hannes Reinecke
2013-11-21 23:17 ` Benjamin Marzinski
12 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-15 10:29 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
This patchs implements the 'no_map_shutdown' configuration option.
Idea is that the daemon will exit if no maps are found, as then
there's nothing to be done for the daemon anyway.
The daemon will be restarted via socket activation for any CLI
command.
This this option the multipathd.socket unit can be activated always
and won't add to the overall system load.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
libmultipath/config.h | 1 +
libmultipath/dict.c | 32 ++++++++++++++++++++++++++++++++
multipath.conf.annotated | 9 +++++++++
multipath/multipath.conf.5 | 7 +++++++
multipathd/main.c | 20 +++++++++++++++-----
multipathd/multipathd.8 | 4 ++++
multipathd/multipathd.service | 2 +-
7 files changed, 69 insertions(+), 6 deletions(-)
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 9c467e8..590fb32 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -103,6 +103,7 @@ struct config {
int fast_io_fail;
unsigned int dev_loss;
int log_checker_err;
+ int no_map_shutdown;
int allow_queueing;
uid_t uid;
gid_t gid;
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 0bf9587..da706f7 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -545,6 +545,25 @@ def_log_checker_err_handler(vector strvec)
}
static int
+def_no_map_shutdown_handler(vector strvec)
+{
+ char * buff;
+
+ buff = set_value(strvec);
+ if (!buff)
+ return 1;
+
+ if (!strncmp(buff, "on", 2) || !strncmp(buff, "yes", 3) ||
+ !strncmp(buff, "1", 1))
+ conf->no_map_shutdown = 1;
+ else
+ conf->no_map_shutdown = 0;
+
+ free(buff);
+ return 0;
+}
+
+static int
def_reservation_key_handler(vector strvec)
{
char *buff;
@@ -2714,6 +2733,18 @@ snprint_def_log_checker_err (char * buff, int len, void * data)
}
static int
+snprint_def_no_map_shutdown (char * buff, int len, void * data)
+{
+ switch (conf->no_map_shutdown) {
+ case 0:
+ return snprintf(buff, len, "\"no\"");
+ case 1:
+ return snprintf(buff, len, "\"yes\"");
+ }
+ return 0;
+}
+
+static int
snprint_def_user_friendly_names (char * buff, int len, void * data)
{
if (conf->user_friendly_names == USER_FRIENDLY_NAMES_ON)
@@ -2850,6 +2881,7 @@ init_keywords(void)
install_keyword("reservation_key", &def_reservation_key_handler, &snprint_def_reservation_key);
install_keyword("retain_attached_hw_handler", &def_retain_hwhandler_handler, &snprint_def_retain_hwhandler_handler);
install_keyword("detect_prio", &def_detect_prio_handler, &snprint_def_detect_prio);
+ install_keyword("no_map_shutdown", &def_no_map_shutdown_handler, &snprint_def_no_map_shutdown);
__deprecated install_keyword("default_selector", &def_selector_handler, NULL);
__deprecated install_keyword("default_path_grouping_policy", &def_pgpolicy_handler, NULL);
__deprecated install_keyword("default_uid_attribute", &def_uid_attribute_handler, NULL);
diff --git a/multipath.conf.annotated b/multipath.conf.annotated
index a20302c..4b56bc8 100644
--- a/multipath.conf.annotated
+++ b/multipath.conf.annotated
@@ -215,6 +215,15 @@
# user_friendly_names no
#
# #
+# # name : no_map_shutdown
+# # scope : multipathd
+# # desc : If set to "yes" the multipath daemon will terminate
+# # if no multipath devices have been found.
+# # values : yes|no
+# # default : no
+# no_map_shutdown no
+#
+# #
# # name : mode
# # scope : multipath & multipathd
# # desc : The mode to use for the multipath device nodes, in octal.
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index cf5bec0..dba0027 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -409,6 +409,13 @@ will automatically use the
.I alua
prioritizer. If not, the prioritizer will be selected as usual. Default is
.I no
+.TP
+.B no_map_shutdown
+If set to
+.I yes
+, the multipath daemon will terminate if no multipath devices are found.
+Default is
+.I no
.
.SH "blacklist section"
The
diff --git a/multipathd/main.c b/multipathd/main.c
index d386e0a..d4a5809 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1026,11 +1026,11 @@ followover_should_failback(struct path * pp)
return 1;
}
-static void
+static int
defered_failback_tick (vector mpvec)
{
struct multipath * mpp;
- unsigned int i;
+ unsigned int i, mpp_count = 0;
vector_foreach_slot (mpvec, mpp, i) {
/*
@@ -1042,7 +1042,9 @@ defered_failback_tick (vector mpvec)
if (!mpp->failback_tick && need_switch_pathgroup(mpp, 1))
switch_pathgroup(mpp);
}
+ mpp_count++;
}
+ return mpp_count;
}
static void
@@ -1284,7 +1286,7 @@ checkerloop (void *ap)
while (1) {
struct timeval diff_time, start_time, end_time;
- int num_paths = 0;
+ int num_paths = 0, num_maps = 0;
if (gettimeofday(&start_time, NULL) != 0)
start_time.tv_sec = 0;
@@ -1300,7 +1302,7 @@ checkerloop (void *ap)
}
}
if (vecs->mpvec) {
- defered_failback_tick(vecs->mpvec);
+ num_maps = defered_failback_tick(vecs->mpvec);
retry_count_tick(vecs->mpvec);
}
if (count)
@@ -1320,6 +1322,11 @@ checkerloop (void *ap)
num_paths, num_paths > 1 ? "s" : "",
diff_time.tv_sec, diff_time.tv_usec);
}
+ if (!num_maps && conf->no_map_shutdown) {
+ condlog(3, "terminating, no multipath devices");
+ exit_daemon();
+ break;
+ }
sleep(1);
}
return NULL;
@@ -1891,7 +1898,7 @@ main (int argc, char *argv[])
if (!conf)
exit(1);
- while ((arg = getopt(argc, argv, ":dsv:k::")) != EOF ) {
+ while ((arg = getopt(argc, argv, ":dnsv:k::")) != EOF ) {
switch(arg) {
case 'd':
logsink = 0;
@@ -1904,6 +1911,9 @@ main (int argc, char *argv[])
conf->verbosity = atoi(optarg);
break;
+ case 'n':
+ conf->no_map_shutdown = 1;
+ break;
case 's':
logsink = -1;
break;
diff --git a/multipathd/multipathd.8 b/multipathd/multipathd.8
index 2aea150..eb39c9c 100644
--- a/multipathd/multipathd.8
+++ b/multipathd/multipathd.8
@@ -22,6 +22,10 @@ devmap reconfiguration, so that it can refresh its failed path list.
.B \-d
Foreground Mode. Don't daemonize, and print all messages to stdout and stderr.
.TP
+.B \-n
+Oneshot Mode; daemon will terminate if no multipath devices are found.
+Equivalent to set 'no_map_shutdown' to '1' in the configuration file.
+.TP
.B \-s
Suppress timestamps. Do not prefix logging messages with a timestamp.
.TP
diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index 6472dd5..b34667d 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -8,7 +8,7 @@ Conflicts=shutdown.target
[Service]
Type=notify
NotifyAccess=main
-ExecStart=/sbin/multipathd -d -s
+ExecStart=/sbin/multipathd -d -s -n
ExecReload=/sbin/multipathd reconfigure
WatchdogSec=5s
LimitCORE=infinity
--
1.8.1.4
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 04/13] libmultipath: return error numbers from sysfs_get_XXX
2013-11-15 10:29 ` [PATCH 04/13] libmultipath: return error numbers from sysfs_get_XXX Hannes Reinecke
@ 2013-11-17 17:34 ` Christophe Varoqui
2013-11-18 6:51 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Christophe Varoqui @ 2013-11-17 17:34 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: dm-devel
[-- Attachment #1.1: Type: text/plain, Size: 9398 bytes --]
declare_sysfs_get_str(devtype);
-declare_sysfs_get_str(cutype);
declare_sysfs_get_str(vendor);
declare_sysfs_get_str(model);
declare_sysfs_get_str(rev);
This part seems correct, as this function is not used anywhere, but
orthogonal to the patch.
Can you confirm this slip is safe ?
Best regards,
Christophe Varoqui
www.opensvc.com
On Fri, Nov 15, 2013 at 11:29 AM, Hannes Reinecke <hare@suse.de> wrote:
> Instead of returning hand-crafted error values from sysfs_get_XXX
> functions we should be using the standard error numbers.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> libmultipath/discovery.c | 40 +++++++++++++++++++++-------------------
> libmultipath/discovery.h | 2 +-
> libmultipath/propsel.c | 2 +-
> libmultipath/structs_vec.c | 3 +--
> 4 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 3292358..d5557d9 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -140,31 +140,34 @@ path_discovery (vector pathvec, struct config *
> conf, int flag)
> }
>
> #define declare_sysfs_get_str(fname) \
> -extern int \
> +extern ssize_t \
> sysfs_get_##fname (struct udev_device * udev, char * buff, size_t len) \
> { \
> + ssize_t ret; \
> const char * attr; \
> const char * devname; \
> \
> + if (!udev) \
> + return -ENOSYS; \
> + \
> devname = udev_device_get_sysname(udev); \
> \
> attr = udev_device_get_sysattr_value(udev, #fname); \
> if (!attr) { \
> condlog(3, "%s: attribute %s not found in sysfs", \
> devname, #fname); \
> - return 1; \
> + return -ENXIO; \
> } \
> if (strlen(attr) > len) { \
> condlog(3, "%s: overflow in attribute %s", \
> devname, #fname); \
> - return 2; \
> + return -EINVAL; \
> } \
> - strlcpy(buff, attr, len); \
> - return 0; \
> + ret = strlcpy(buff, attr, len); \
> + return ret; \
> }
>
> declare_sysfs_get_str(devtype);
> -declare_sysfs_get_str(cutype);
> declare_sysfs_get_str(vendor);
> declare_sysfs_get_str(model);
> declare_sysfs_get_str(rev);
> @@ -180,7 +183,7 @@ sysfs_get_timeout(struct path *pp, unsigned int
> *timeout)
> unsigned int t;
>
> if (!pp->udev || pp->bus != SYSFS_BUS_SCSI)
> - return 1;
> + return -ENOSYS;
>
> parent = pp->udev;
> while (parent) {
> @@ -192,7 +195,7 @@ sysfs_get_timeout(struct path *pp, unsigned int
> *timeout)
> }
> if (!attr) {
> condlog(3, "%s: No timeout value in sysfs", pp->dev);
> - return 1;
> + return -ENXIO;
> }
>
> r = sscanf(attr, "%u\n", &t);
> @@ -200,7 +203,7 @@ sysfs_get_timeout(struct path *pp, unsigned int
> *timeout)
> if (r != 1) {
> condlog(3, "%s: Cannot parse timeout attribute '%s'",
> pp->dev, attr);
> - return 1;
> + return -EINVAL;
> }
>
> *timeout = t;
> @@ -650,17 +653,17 @@ scsi_sysfs_pathinfo (struct path * pp)
> if (!attr_path || pp->sg_id.host_no == -1)
> return 1;
>
> - if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE))
> + if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE) <= 0)
> return 1;
>
> condlog(3, "%s: vendor = %s", pp->dev, pp->vendor_id);
>
> - if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE))
> + if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE) <=
> 0)
> return 1;
>
> condlog(3, "%s: product = %s", pp->dev, pp->product_id);
>
> - if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE))
> + if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE) <= 0)
> return 1;
>
> condlog(3, "%s: rev = %s", pp->dev, pp->rev);
> @@ -712,7 +715,7 @@ ccw_sysfs_pathinfo (struct path * pp)
>
> condlog(3, "%s: vendor = %s", pp->dev, pp->vendor_id);
>
> - if (sysfs_get_devtype(parent, attr_buff, FILE_NAME_SIZE))
> + if (sysfs_get_devtype(parent, attr_buff, FILE_NAME_SIZE) <= 0)
> return 1;
>
> if (!strncmp(attr_buff, "3370", 4)) {
> @@ -772,17 +775,17 @@ cciss_sysfs_pathinfo (struct path * pp)
> if (!attr_path || pp->sg_id.host_no == -1)
> return 1;
>
> - if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE))
> + if (sysfs_get_vendor(parent, pp->vendor_id, SCSI_VENDOR_SIZE) <= 0)
> return 1;
>
> condlog(3, "%s: vendor = %s", pp->dev, pp->vendor_id);
>
> - if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE))
> + if (sysfs_get_model(parent, pp->product_id, SCSI_PRODUCT_SIZE) <=
> 0)
> return 1;
>
> condlog(3, "%s: product = %s", pp->dev, pp->product_id);
>
> - if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE))
> + if (sysfs_get_rev(parent, pp->rev, SCSI_REV_SIZE) <= 0)
> return 1;
>
> condlog(3, "%s: rev = %s", pp->dev, pp->rev);
> @@ -816,7 +819,7 @@ common_sysfs_pathinfo (struct path * pp)
> condlog(4, "%s: udev not initialised", pp->dev);
> return 1;
> }
> - if (sysfs_get_dev(pp->udev, pp->dev_t, BLK_DEV_SIZE)) {
> + if (sysfs_get_dev(pp->udev, pp->dev_t, BLK_DEV_SIZE) <= 0) {
> condlog(3, "%s: no 'dev' attribute in sysfs", pp->dev);
> return 1;
> }
> @@ -956,8 +959,7 @@ get_state (struct path * pp, int daemon)
> if (daemon)
> checker_set_async(c);
> if (!conf->checker_timeout &&
> - (pp->bus != SYSFS_BUS_SCSI ||
> - sysfs_get_timeout(pp, &(c->timeout))))
> + sysfs_get_timeout(pp, &(c->timeout)) <= 0)
> c->timeout = DEF_TIMEOUT;
> state = checker_check(c);
> condlog(3, "%s: state = %s", pp->dev, checker_state_name(state));
> diff --git a/libmultipath/discovery.h b/libmultipath/discovery.h
> index d049ead..3d2d0ce 100644
> --- a/libmultipath/discovery.h
> +++ b/libmultipath/discovery.h
> @@ -26,7 +26,7 @@
>
> struct config;
>
> -int sysfs_get_dev (struct udev_device *udev, char * buff, size_t len);
> +ssize_t sysfs_get_dev (struct udev_device *udev, char * buff, size_t len);
> int path_discovery (vector pathvec, struct config * conf, int flag);
>
> int do_tur (char *);
> diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
> index e47d0ca..f092227 100644
> --- a/libmultipath/propsel.c
> +++ b/libmultipath/propsel.c
> @@ -351,7 +351,7 @@ out:
> condlog(3, "%s: checker timeout = %u s (config file
> default)",
> pp->dev, c->timeout);
> }
> - else if (pp->udev && sysfs_get_timeout(pp, &c->timeout) == 0)
> + else if (sysfs_get_timeout(pp, &c->timeout) > 0)
> condlog(3, "%s: checker timeout = %u ms (sysfs setting)",
> pp->dev, c->timeout);
> else {
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index 14ea1c7..abb2c56 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -451,8 +451,7 @@ verify_paths(struct multipath * mpp, struct vectors *
> vecs, vector rpvec)
> /*
> * see if path is in sysfs
> */
> - if (!pp->udev || sysfs_get_dev(pp->udev, pp->dev_t,
> - BLK_DEV_SIZE)) {
> + if (sysfs_get_dev(pp->udev, pp->dev_t, BLK_DEV_SIZE) <= 0)
> {
> if (pp->state != PATH_DOWN) {
> condlog(1, "%s: removing valid path %s in
> state %d",
> mpp->alias, pp->dev, pp->state);
> --
> 1.8.1.4
>
>
[-- Attachment #1.2: Type: text/html, Size: 12511 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 04/13] libmultipath: return error numbers from sysfs_get_XXX
2013-11-17 17:34 ` Christophe Varoqui
@ 2013-11-18 6:51 ` Hannes Reinecke
0 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-18 6:51 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
On 11/17/2013 06:34 PM, Christophe Varoqui wrote:
> declare_sysfs_get_str(devtype);
> -declare_sysfs_get_str(cutype);
> declare_sysfs_get_str(vendor);
> declare_sysfs_get_str(model);
> declare_sysfs_get_str(rev);
>
> This part seems correct, as this function is not used anywhere, but
> orthogonal to the patch.
> Can you confirm this slip is safe ?
>
Yes. I've added the 'cutype' attribute back then when I've added
multipath support for DASD, but then it turned out this attribute
is not required (DASD selection relies on 'devtype').
So it's safe to remove it.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 13/13] multipathd: no_map_shutdown option
2013-11-15 10:29 ` [PATCH 13/13] multipathd: no_map_shutdown option Hannes Reinecke
@ 2013-11-21 23:17 ` Benjamin Marzinski
2013-11-22 9:12 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2013-11-21 23:17 UTC (permalink / raw)
To: device-mapper development
I don't understand this option? When I played around with it, it seems
like it only causes problems. For instance. If I remove all the
multipath devices with "multipath -F", then multipathd stops.
Incidentally, this pisses off the watchdog timer.
This pops up in /var/log/messages
Nov 21 11:04:05 ask-08 systemd[1]: multipathd.service watchdog timeout!
Nov 21 11:04:05 ask-08 systemd[1]: Unit multipathd.service entered
failed state
and checking the status shows the service as failed.
# service multipathd status
multipathd.service - Device-Mapper Multipath Device Controller
Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
Active: failed (Result: watchdog) since Thu 2013-11-21 11:04:00 CST; 3min 25s ago
Process: 22687 ExecStart=/sbin/multipathd -d -s -n (code=exited, status=0/SUCCESS)
Status: "shutdown"
But the bigger issue is that multipathd is now stopped, and running
"multipath" doesn't bring it back again. You will create the multipath
devices, but nothing will be monitoring them. The same thing happens if
the multipathd service is started before the any multipathable devices
are discovered. It starts up and then shuts back down (again tripping
the watchdog timer). When those devices finally get discovered, there is
nothing to listening for the uevents, so no multipath devices ever get
created.
I must be missing something here.
-Ben
On Fri, Nov 15, 2013 at 11:29:44AM +0100, Hannes Reinecke wrote:
> This patchs implements the 'no_map_shutdown' configuration option.
> Idea is that the daemon will exit if no maps are found, as then
> there's nothing to be done for the daemon anyway.
> The daemon will be restarted via socket activation for any CLI
> command.
> This this option the multipathd.socket unit can be activated always
> and won't add to the overall system load.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> libmultipath/config.h | 1 +
> libmultipath/dict.c | 32 ++++++++++++++++++++++++++++++++
> multipath.conf.annotated | 9 +++++++++
> multipath/multipath.conf.5 | 7 +++++++
> multipathd/main.c | 20 +++++++++++++++-----
> multipathd/multipathd.8 | 4 ++++
> multipathd/multipathd.service | 2 +-
> 7 files changed, 69 insertions(+), 6 deletions(-)
>
> diff --git a/libmultipath/config.h b/libmultipath/config.h
> index 9c467e8..590fb32 100644
> --- a/libmultipath/config.h
> +++ b/libmultipath/config.h
> @@ -103,6 +103,7 @@ struct config {
> int fast_io_fail;
> unsigned int dev_loss;
> int log_checker_err;
> + int no_map_shutdown;
> int allow_queueing;
> uid_t uid;
> gid_t gid;
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> index 0bf9587..da706f7 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -545,6 +545,25 @@ def_log_checker_err_handler(vector strvec)
> }
>
> static int
> +def_no_map_shutdown_handler(vector strvec)
> +{
> + char * buff;
> +
> + buff = set_value(strvec);
> + if (!buff)
> + return 1;
> +
> + if (!strncmp(buff, "on", 2) || !strncmp(buff, "yes", 3) ||
> + !strncmp(buff, "1", 1))
> + conf->no_map_shutdown = 1;
> + else
> + conf->no_map_shutdown = 0;
> +
> + free(buff);
> + return 0;
> +}
> +
> +static int
> def_reservation_key_handler(vector strvec)
> {
> char *buff;
> @@ -2714,6 +2733,18 @@ snprint_def_log_checker_err (char * buff, int len, void * data)
> }
>
> static int
> +snprint_def_no_map_shutdown (char * buff, int len, void * data)
> +{
> + switch (conf->no_map_shutdown) {
> + case 0:
> + return snprintf(buff, len, "\"no\"");
> + case 1:
> + return snprintf(buff, len, "\"yes\"");
> + }
> + return 0;
> +}
> +
> +static int
> snprint_def_user_friendly_names (char * buff, int len, void * data)
> {
> if (conf->user_friendly_names == USER_FRIENDLY_NAMES_ON)
> @@ -2850,6 +2881,7 @@ init_keywords(void)
> install_keyword("reservation_key", &def_reservation_key_handler, &snprint_def_reservation_key);
> install_keyword("retain_attached_hw_handler", &def_retain_hwhandler_handler, &snprint_def_retain_hwhandler_handler);
> install_keyword("detect_prio", &def_detect_prio_handler, &snprint_def_detect_prio);
> + install_keyword("no_map_shutdown", &def_no_map_shutdown_handler, &snprint_def_no_map_shutdown);
> __deprecated install_keyword("default_selector", &def_selector_handler, NULL);
> __deprecated install_keyword("default_path_grouping_policy", &def_pgpolicy_handler, NULL);
> __deprecated install_keyword("default_uid_attribute", &def_uid_attribute_handler, NULL);
> diff --git a/multipath.conf.annotated b/multipath.conf.annotated
> index a20302c..4b56bc8 100644
> --- a/multipath.conf.annotated
> +++ b/multipath.conf.annotated
> @@ -215,6 +215,15 @@
> # user_friendly_names no
> #
> # #
> +# # name : no_map_shutdown
> +# # scope : multipathd
> +# # desc : If set to "yes" the multipath daemon will terminate
> +# # if no multipath devices have been found.
> +# # values : yes|no
> +# # default : no
> +# no_map_shutdown no
> +#
> +# #
> # # name : mode
> # # scope : multipath & multipathd
> # # desc : The mode to use for the multipath device nodes, in octal.
> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
> index cf5bec0..dba0027 100644
> --- a/multipath/multipath.conf.5
> +++ b/multipath/multipath.conf.5
> @@ -409,6 +409,13 @@ will automatically use the
> .I alua
> prioritizer. If not, the prioritizer will be selected as usual. Default is
> .I no
> +.TP
> +.B no_map_shutdown
> +If set to
> +.I yes
> +, the multipath daemon will terminate if no multipath devices are found.
> +Default is
> +.I no
> .
> .SH "blacklist section"
> The
> diff --git a/multipathd/main.c b/multipathd/main.c
> index d386e0a..d4a5809 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1026,11 +1026,11 @@ followover_should_failback(struct path * pp)
> return 1;
> }
>
> -static void
> +static int
> defered_failback_tick (vector mpvec)
> {
> struct multipath * mpp;
> - unsigned int i;
> + unsigned int i, mpp_count = 0;
>
> vector_foreach_slot (mpvec, mpp, i) {
> /*
> @@ -1042,7 +1042,9 @@ defered_failback_tick (vector mpvec)
> if (!mpp->failback_tick && need_switch_pathgroup(mpp, 1))
> switch_pathgroup(mpp);
> }
> + mpp_count++;
> }
> + return mpp_count;
> }
>
> static void
> @@ -1284,7 +1286,7 @@ checkerloop (void *ap)
>
> while (1) {
> struct timeval diff_time, start_time, end_time;
> - int num_paths = 0;
> + int num_paths = 0, num_maps = 0;
>
> if (gettimeofday(&start_time, NULL) != 0)
> start_time.tv_sec = 0;
> @@ -1300,7 +1302,7 @@ checkerloop (void *ap)
> }
> }
> if (vecs->mpvec) {
> - defered_failback_tick(vecs->mpvec);
> + num_maps = defered_failback_tick(vecs->mpvec);
> retry_count_tick(vecs->mpvec);
> }
> if (count)
> @@ -1320,6 +1322,11 @@ checkerloop (void *ap)
> num_paths, num_paths > 1 ? "s" : "",
> diff_time.tv_sec, diff_time.tv_usec);
> }
> + if (!num_maps && conf->no_map_shutdown) {
> + condlog(3, "terminating, no multipath devices");
> + exit_daemon();
> + break;
> + }
> sleep(1);
> }
> return NULL;
> @@ -1891,7 +1898,7 @@ main (int argc, char *argv[])
> if (!conf)
> exit(1);
>
> - while ((arg = getopt(argc, argv, ":dsv:k::")) != EOF ) {
> + while ((arg = getopt(argc, argv, ":dnsv:k::")) != EOF ) {
> switch(arg) {
> case 'd':
> logsink = 0;
> @@ -1904,6 +1911,9 @@ main (int argc, char *argv[])
>
> conf->verbosity = atoi(optarg);
> break;
> + case 'n':
> + conf->no_map_shutdown = 1;
> + break;
> case 's':
> logsink = -1;
> break;
> diff --git a/multipathd/multipathd.8 b/multipathd/multipathd.8
> index 2aea150..eb39c9c 100644
> --- a/multipathd/multipathd.8
> +++ b/multipathd/multipathd.8
> @@ -22,6 +22,10 @@ devmap reconfiguration, so that it can refresh its failed path list.
> .B \-d
> Foreground Mode. Don't daemonize, and print all messages to stdout and stderr.
> .TP
> +.B \-n
> +Oneshot Mode; daemon will terminate if no multipath devices are found.
> +Equivalent to set 'no_map_shutdown' to '1' in the configuration file.
> +.TP
> .B \-s
> Suppress timestamps. Do not prefix logging messages with a timestamp.
> .TP
> diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
> index 6472dd5..b34667d 100644
> --- a/multipathd/multipathd.service
> +++ b/multipathd/multipathd.service
> @@ -8,7 +8,7 @@ Conflicts=shutdown.target
> [Service]
> Type=notify
> NotifyAccess=main
> -ExecStart=/sbin/multipathd -d -s
> +ExecStart=/sbin/multipathd -d -s -n
> ExecReload=/sbin/multipathd reconfigure
> WatchdogSec=5s
> LimitCORE=infinity
> --
> 1.8.1.4
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 13/13] multipathd: no_map_shutdown option
2013-11-21 23:17 ` Benjamin Marzinski
@ 2013-11-22 9:12 ` Hannes Reinecke
2013-11-22 9:30 ` Christophe Varoqui
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-22 9:12 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: dm-devel
On 11/22/2013 12:17 AM, Benjamin Marzinski wrote:
> I don't understand this option? When I played around with it, it seems
> like it only causes problems. For instance. If I remove all the
> multipath devices with "multipath -F", then multipathd stops.
> Incidentally, this pisses off the watchdog timer.
>
> This pops up in /var/log/messages
>
> Nov 21 11:04:05 ask-08 systemd[1]: multipathd.service watchdog timeout!
> Nov 21 11:04:05 ask-08 systemd[1]: Unit multipathd.service entered
> failed state
>
> and checking the status shows the service as failed.
>
> # service multipathd status
> multipathd.service - Device-Mapper Multipath Device Controller
> Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
> Active: failed (Result: watchdog) since Thu 2013-11-21 11:04:00 CST; 3min 25s ago
> Process: 22687 ExecStart=/sbin/multipathd -d -s -n (code=exited, status=0/SUCCESS)
> Status: "shutdown"
>
Yeah, because I forgot to set the exit status via sd_notify.
I'll be updating a patchset to include that.
> But the bigger issue is that multipathd is now stopped, and running
> "multipath" doesn't bring it back again.
Ah. Yes, you are absolutely right, of course.
I've forgot about this.
> You will create the multipath
> devices, but nothing will be monitoring them. The same thing happens if
> the multipathd service is started before the any multipathable devices
> are discovered. It starts up and then shuts back down (again tripping
> the watchdog timer). When those devices finally get discovered, there is
> nothing to listening for the uevents, so no multipath devices ever get
> created.
>
> I must be missing something here.
>
No, it's actually true what you've said.
Multipathd can only listen to events if it's started.
But that's precisely what systemd is for, right?
Starting a service when something is written onto a socket?
Guess it need some more work here.
But I could finally do my long-term project of merging
multipath and multipathd to share the same codebase, with
multipath just using the multipathd CLI and having no logic
on it's own.
But in the light of this, this patch is indeed not working
as designed, and should not be applied as of now.
Which doesn't affect the other parts in the series, which
_definitely_ should be applied.
Christophe, how to proceed?
Do you need a new patchset with the last patch removed,
or are you fine with the current one and just not applying
the last patch?
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 13/13] multipathd: no_map_shutdown option
2013-11-22 9:12 ` Hannes Reinecke
@ 2013-11-22 9:30 ` Christophe Varoqui
2013-11-22 10:04 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Christophe Varoqui @ 2013-11-22 9:30 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: dm-devel
[-- Attachment #1.1: Type: text/plain, Size: 3464 bytes --]
Hi,
To my knowledge, systemd has not been integrated in all distributions.
It would be nice not to break the build of multipath-tools on those
environments.
The current patchset breaks the build on non-systemd systems as soon as
patch 06/13.
I can apply the patchset from 01/13 to 05/13 included, and let you refactor
the systemd integration proper in a new series.
Is that ok with you ?
Best regards,
Christophe Varoqui
www.opensvc.com
On Fri, Nov 22, 2013 at 10:12 AM, Hannes Reinecke <hare@suse.de> wrote:
> On 11/22/2013 12:17 AM, Benjamin Marzinski wrote:
> > I don't understand this option? When I played around with it, it seems
> > like it only causes problems. For instance. If I remove all the
> > multipath devices with "multipath -F", then multipathd stops.
> > Incidentally, this pisses off the watchdog timer.
> >
> > This pops up in /var/log/messages
> >
> > Nov 21 11:04:05 ask-08 systemd[1]: multipathd.service watchdog timeout!
> > Nov 21 11:04:05 ask-08 systemd[1]: Unit multipathd.service entered
> > failed state
> >
> > and checking the status shows the service as failed.
> >
> > # service multipathd status
> > multipathd.service - Device-Mapper Multipath Device Controller
> > Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
> > Active: failed (Result: watchdog) since Thu 2013-11-21 11:04:00 CST;
> 3min 25s ago
> > Process: 22687 ExecStart=/sbin/multipathd -d -s -n (code=exited,
> status=0/SUCCESS)
> > Status: "shutdown"
> >
> Yeah, because I forgot to set the exit status via sd_notify.
> I'll be updating a patchset to include that.
>
> > But the bigger issue is that multipathd is now stopped, and running
> > "multipath" doesn't bring it back again.
>
> Ah. Yes, you are absolutely right, of course.
> I've forgot about this.
>
> > You will create the multipath
> > devices, but nothing will be monitoring them. The same thing happens if
> > the multipathd service is started before the any multipathable devices
> > are discovered. It starts up and then shuts back down (again tripping
> > the watchdog timer). When those devices finally get discovered, there is
> > nothing to listening for the uevents, so no multipath devices ever get
> > created.
> >
> > I must be missing something here.
> >
> No, it's actually true what you've said.
> Multipathd can only listen to events if it's started.
> But that's precisely what systemd is for, right?
> Starting a service when something is written onto a socket?
> Guess it need some more work here.
>
> But I could finally do my long-term project of merging
> multipath and multipathd to share the same codebase, with
> multipath just using the multipathd CLI and having no logic
> on it's own.
>
> But in the light of this, this patch is indeed not working
> as designed, and should not be applied as of now.
>
> Which doesn't affect the other parts in the series, which
> _definitely_ should be applied.
> Christophe, how to proceed?
>
> Do you need a new patchset with the last patch removed,
> or are you fine with the current one and just not applying
> the last patch?
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> hare@suse.de +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
>
[-- Attachment #1.2: Type: text/html, Size: 4466 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 13/13] multipathd: no_map_shutdown option
2013-11-22 9:30 ` Christophe Varoqui
@ 2013-11-22 10:04 ` Hannes Reinecke
2013-11-22 10:11 ` Christophe Varoqui
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-22 10:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: dm-devel
On 11/22/2013 10:30 AM, Christophe Varoqui wrote:
> Hi,
>
> To my knowledge, systemd has not been integrated in all distributions.
> It would be nice not to break the build of multipath-tools on those
> environments.
>
> The current patchset breaks the build on non-systemd systems as soon
> as patch 06/13.
> I can apply the patchset from 01/13 to 05/13 included, and let you
> refactor the systemd integration proper in a new series.
>
> Is that ok with you ?
>
Yes, of course.
I'll be reworking the remaining patches to make systemd optional.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 13/13] multipathd: no_map_shutdown option
2013-11-22 10:04 ` Hannes Reinecke
@ 2013-11-22 10:11 ` Christophe Varoqui
0 siblings, 0 replies; 24+ messages in thread
From: Christophe Varoqui @ 2013-11-22 10:11 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: dm-devel
[-- Attachment #1.1: Type: text/plain, Size: 1045 bytes --]
Excellent,
I'll push the 01-05 patchset to the master git repo this evening (GMT+1)
On Fri, Nov 22, 2013 at 11:04 AM, Hannes Reinecke <hare@suse.de> wrote:
> On 11/22/2013 10:30 AM, Christophe Varoqui wrote:
> > Hi,
> >
> > To my knowledge, systemd has not been integrated in all distributions.
> > It would be nice not to break the build of multipath-tools on those
> > environments.
> >
> > The current patchset breaks the build on non-systemd systems as soon
> > as patch 06/13.
> > I can apply the patchset from 01/13 to 05/13 included, and let you
> > refactor the systemd integration proper in a new series.
> >
> > Is that ok with you ?
> >
> Yes, of course.
>
> I'll be reworking the remaining patches to make systemd optional.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> hare@suse.de +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
>
[-- Attachment #1.2: Type: text/html, Size: 1641 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 09/13] multipathd: Implement systemd watchdog integration
2013-11-15 10:29 ` [PATCH 09/13] multipathd: Implement systemd watchdog integration Hannes Reinecke
@ 2013-11-22 22:17 ` Benjamin Marzinski
2013-11-25 7:50 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2013-11-22 22:17 UTC (permalink / raw)
To: device-mapper development
On Fri, Nov 15, 2013 at 11:29:40AM +0100, Hannes Reinecke wrote:
> In the past there have been several instances where multipathd
> would hang with the checkerloop as some path checker might not
> be able to return in time.
> This patch now activates the watchdog feature from systemd
> to shutdown (and possibly restart) multipathd in these
> situations.
This might need more of a systemd fix that a multipathd one, but once
multipathd times out the watchdog timer, even if it starts sending
notifications at an acceptable rate again, the service is still listed
as failed.
# service multipathd status
Redirecting to # /bin/systemctl status multipathd.service
multipathd.service - Device-Mapper Multipath Device Controller
Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
Active: failed (Result: watchdog) since Fri 2013-11-22 09:43:01 CST; 9min ago
Main PID: 6321
Status: "running"
CGroup: name=systemd:/system/multipathd.service
└─6321 /sbin/multipathd -d -s
More annoying, the logs fills up with messages like
Nov 22 09:46:28 ask-08 systemd[1]: multipathd.service: Got notification
message from PID 6321, but reception only permitted for PID 0
Nov 22 09:46:29 ask-08 systemd[1]: multipathd.service: Got notification
message from PID 6321, but reception only permitted for PID 0
Nov 22 09:46:30 ask-08 systemd[1]: multipathd.service: Got notification
message from PID 6321, but reception only permitted for PID 0
Nov 22 09:46:31 ask-08 systemd[1]: multipathd.service: Got notification
message from PID 6321, but reception only permitted for PID 0
Also
# service multipathd stop
won't kill it. Even worse
# service multipathd start
WILL kill it without successfully restarting another version. A second
# service multipathd start
is necessary to get things back to a functional state again.
I'm not asking for systemd to actually shut down multipathd. In a
production setup, killing multipathd because it had a temporary stall
seems like bad default behavior. I haven't looked at the systemd
watchdog code to know if this is possible, but ideally, multipathd would
be able to just start sending watchdog notifications again, and be able
to continue on with just a message in the logs recording the timeout.
I realize that there is a benefit to letting people know that there was
a problem, but the way it's appearing now, it will be pretty confusing to
the sysadmin who sees that, and filling up the logs with notification
rejections is pretty annoying.
And as long as I'm asking for systemd things, the ability to add a rule
to the unit file that kills the service and forces a core dump when
watchdog timer was tripped would help tracking down what's stalling the
checker loop. Like I said before, I don't think this should be
happening by default, but putting it in there commented out might not be
a bad idea.
-Ben
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> multipath/multipath.conf.5 | 7 +++++--
> multipathd/main.c | 15 ++++++++++++++-
> multipathd/multipathd.service | 1 +
> 3 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
> index 0fd3035..cf5bec0 100644
> --- a/multipath/multipath.conf.5
> +++ b/multipath/multipath.conf.5
> @@ -71,8 +71,11 @@ section recognizes the following keywords:
> .B polling_interval
> interval between two path checks in seconds. For properly functioning paths,
> the interval between checks will gradually increase to
> -.B max_polling_interval;
> -default is
> +.B max_polling_interval.
> +This value will be overridden by the
> +.B WatchdogSec
> +setting in the multipathd.service definition if systemd is used.
> +Default is
> .I 5
> .TP
> .B max_polling_interval
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 72b3740..abeebc2 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1286,6 +1286,7 @@ checkerloop (void *ap)
> lock(vecs->lock);
> pthread_testcancel();
> condlog(4, "tick");
> + sd_notify(0, "WATCHDOG=1");
>
> if (vecs->pathvec) {
> vector_foreach_slot (vecs->pathvec, pp, i) {
> @@ -1585,7 +1586,8 @@ child (void * param)
> pthread_attr_t log_attr, misc_attr, uevent_attr;
> struct vectors * vecs;
> struct multipath * mpp;
> - int i;
> + char *envp;
> + int i, checkint;
> int rc, pid_rc;
>
> mlockall(MCL_CURRENT | MCL_FUTURE);
> @@ -1658,6 +1660,17 @@ child (void * param)
>
> conf->daemon = 1;
> udev_set_sync_support(0);
> + envp = getenv("WATCHDOG_USEC");
> + if (envp && sscanf(envp, "%d", &checkint) == 1) {
> + /* Value is in microseconds */
> + checkint = checkint / 1000000;
> + if (checkint > conf->max_checkint)
> + conf->max_checkint = checkint;
> + conf->checkint = checkint;
> + condlog(3, "set checkint to %d max %d",
> + conf->checkint, conf->max_checkint);
> + }
> +
> /*
> * Start uevent listener early to catch events
> */
> diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
> index fb84025..848a231 100644
> --- a/multipathd/multipathd.service
> +++ b/multipathd/multipathd.service
> @@ -10,6 +10,7 @@ Type=notify
> NotifyAccess=main
> ExecStart=/sbin/multipathd -d -s
> ExecReload=/sbin/multipathd reconfigure
> +WatchdogSec=5s
>
> [Install]
> WantedBy=sysinit.target
> --
> 1.8.1.4
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 09/13] multipathd: Implement systemd watchdog integration
2013-11-22 22:17 ` Benjamin Marzinski
@ 2013-11-25 7:50 ` Hannes Reinecke
2013-11-25 16:21 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-25 7:50 UTC (permalink / raw)
To: dm-devel
On 11/22/2013 11:17 PM, Benjamin Marzinski wrote:
> On Fri, Nov 15, 2013 at 11:29:40AM +0100, Hannes Reinecke wrote:
>> In the past there have been several instances where multipathd
>> would hang with the checkerloop as some path checker might not
>> be able to return in time.
>> This patch now activates the watchdog feature from systemd
>> to shutdown (and possibly restart) multipathd in these
>> situations.
>
> This might need more of a systemd fix that a multipathd one, but once
> multipathd times out the watchdog timer, even if it starts sending
> notifications at an acceptable rate again, the service is still listed
> as failed.
>
> # service multipathd status
> Redirecting to # /bin/systemctl status multipathd.service
> multipathd.service - Device-Mapper Multipath Device Controller
> Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled)
> Active: failed (Result: watchdog) since Fri 2013-11-22 09:43:01 CST; 9min ago
> Main PID: 6321
> Status: "running"
> CGroup: name=systemd:/system/multipathd.service
> └─6321 /sbin/multipathd -d -s
>
> More annoying, the logs fills up with messages like
>
> Nov 22 09:46:28 ask-08 systemd[1]: multipathd.service: Got notification
> message from PID 6321, but reception only permitted for PID 0
> Nov 22 09:46:29 ask-08 systemd[1]: multipathd.service: Got notification
> message from PID 6321, but reception only permitted for PID 0
> Nov 22 09:46:30 ask-08 systemd[1]: multipathd.service: Got notification
> message from PID 6321, but reception only permitted for PID 0
> Nov 22 09:46:31 ask-08 systemd[1]: multipathd.service: Got notification
> message from PID 6321, but reception only permitted for PID 0
>
> Also
>
> # service multipathd stop
>
> won't kill it. Even worse
>
> # service multipathd start
>
> WILL kill it without successfully restarting another version. A second
>
> # service multipathd start
>
> is necessary to get things back to a functional state again.
>
Actually, upstream systemd (>= v207) now has a new flag
restart=on-watchdog
With that systemd should be restarting multipathd after a watchdog
timeout. That should solve you immediate problem here.
> I'm not asking for systemd to actually shut down multipathd. In a
> production setup, killing multipathd because it had a temporary stall
> seems like bad default behavior. I haven't looked at the systemd
> watchdog code to know if this is possible, but ideally, multipathd would
> be able to just start sending watchdog notifications again, and be able
> to continue on with just a message in the logs recording the timeout.
>
Not stopping. Restarting.
The whole point of the watchdog code is to take some action if the
watchdog messages fail.
We should aim for
a) make the watchdog interval the longest interval we're prepared to
checkerloop to complete (hence the patch to measure the elapsed
time per loop iteration)
b) have systemd restart multipathd whenever the watchdog triggers,
as then we're sure we can't recover from this.
That should cover your sentiment, right?
> I realize that there is a benefit to letting people know that there was
> a problem, but the way it's appearing now, it will be pretty confusing to
> the sysadmin who sees that, and filling up the logs with notification
> rejections is pretty annoying.
>
Yeah, correct. We should be using the 'restart' flag in the service
file. I did not do this as the patch went into systemd only
recently, and one would need to figure out how to treat
installations where an older systemd version is running.
> And as long as I'm asking for systemd things, the ability to add a rule
> to the unit file that kills the service and forces a core dump when
> watchdog timer was tripped would help tracking down what's stalling the
> checker loop. Like I said before, I don't think this should be
> happening by default, but putting it in there commented out might not be
> a bad idea.
>
Yeah, that would be preferable. Sadly there is no 'force coredump'
option. What I would like to have is a
'on-watchdog'
option in systemd, where one can configure the action which needs to
be taken when the watchdog triggers.
Only adding a new option is touching systemd in tons of various
places, so my initial attempt here failed.
So I went for the easier option to just add a new flag to an
existing setting.
Cheers,
Hannes
P.S.: But hey, at least someone is actually testing this stuff.
Cool.
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 09/13] multipathd: Implement systemd watchdog integration
2013-11-25 7:50 ` Hannes Reinecke
@ 2013-11-25 16:21 ` Hannes Reinecke
0 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2013-11-25 16:21 UTC (permalink / raw)
To: dm-devel
On 11/25/2013 08:50 AM, Hannes Reinecke wrote:
> On 11/22/2013 11:17 PM, Benjamin Marzinski wrote:
[ .. ]
>> I'm not asking for systemd to actually shut down multipathd. In a
>> production setup, killing multipathd because it had a temporary stall
>> seems like bad default behavior. I haven't looked at the systemd
>> watchdog code to know if this is possible, but ideally, multipathd would
>> be able to just start sending watchdog notifications again, and be able
>> to continue on with just a message in the logs recording the timeout.
>>
> Not stopping. Restarting.
> The whole point of the watchdog code is to take some action if the
> watchdog messages fail.
> We should aim for
> a) make the watchdog interval the longest interval we're prepared to
> checkerloop to complete (hence the patch to measure the elapsed
> time per loop iteration)
> b) have systemd restart multipathd whenever the watchdog triggers,
> as then we're sure we can't recover from this.
>
> That should cover your sentiment, right?
>
>> I realize that there is a benefit to letting people know that there was
>> a problem, but the way it's appearing now, it will be pretty confusing to
>> the sysadmin who sees that, and filling up the logs with notification
>> rejections is pretty annoying.
>>
> Yeah, correct. We should be using the 'restart' flag in the service
> file. I did not do this as the patch went into systemd only
> recently, and one would need to figure out how to treat
> installations where an older systemd version is running.
>
And it also looks as if we'd be tripping over RH bug#982379, where
the watchdog fails to shutdown a process properly.
Which apparently is fixed in 206.
So we'd need a recent systemd for that to work properly.
I'm _quite_ sure there are errors in earlier versions, where the
watchdog feature just causes a new process to be started, without
terminating the old one. _Very_ annoying.
I'll retest with latest systemd. And make the watchdog feature
selective on the systemd version.
Cheers,
Hannes
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2013-11-25 16:21 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-15 10:29 [PATCH 00/13] systemd integraion Hannes Reinecke
2013-11-15 10:29 ` [PATCH 01/13] Improve logging for orphan_path() Hannes Reinecke
2013-11-15 10:29 ` [PATCH 02/13] Set priority to '0' for PATH_BLOCKED or PATH_DOWN Hannes Reinecke
2013-11-15 10:29 ` [PATCH 03/13] libmultipath: fixup strlcpy Hannes Reinecke
2013-11-15 10:29 ` [PATCH 04/13] libmultipath: return error numbers from sysfs_get_XXX Hannes Reinecke
2013-11-17 17:34 ` Christophe Varoqui
2013-11-18 6:51 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 05/13] libmultipath: do not stall on recv_packet() Hannes Reinecke
2013-11-15 10:29 ` [PATCH 06/13] multipathd: switch to socket activation for systemd Hannes Reinecke
2013-11-15 10:29 ` [PATCH 07/13] multipathd: use sd_notify() to inform systemd Hannes Reinecke
2013-11-15 10:29 ` [PATCH 08/13] multipathd: Add option '-s' to suppress timestamps Hannes Reinecke
2013-11-15 10:29 ` [PATCH 09/13] multipathd: Implement systemd watchdog integration Hannes Reinecke
2013-11-22 22:17 ` Benjamin Marzinski
2013-11-25 7:50 ` Hannes Reinecke
2013-11-25 16:21 ` Hannes Reinecke
2013-11-15 10:29 ` [PATCH 10/13] multipathd: enable core dumps for systemd Hannes Reinecke
2013-11-15 10:29 ` [PATCH 11/13] multipathd: Read environment variables from systemd Hannes Reinecke
2013-11-15 10:29 ` [PATCH 12/13] multipathd: measure path check time Hannes Reinecke
2013-11-15 10:29 ` [PATCH 13/13] multipathd: no_map_shutdown option Hannes Reinecke
2013-11-21 23:17 ` Benjamin Marzinski
2013-11-22 9:12 ` Hannes Reinecke
2013-11-22 9:30 ` Christophe Varoqui
2013-11-22 10:04 ` Hannes Reinecke
2013-11-22 10:11 ` Christophe Varoqui
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.