All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes
@ 2024-06-05 23:22 Benjamin Marzinski
  2024-06-05 23:22 ` [PATCH 1/7] multipathd: fix flush check in flush_map() Benjamin Marzinski
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

This patchset is half a grab bag of small fixes and half cleaning up
cli_add_map() and making it accept adding a multipath device by WWID
instead of just by alias.

Benjamin Marzinski (7):
  multipathd: fix flush check in flush_map()
  libmultipath: check for not PATH_UP in detect_alua
  multipathd: refresh multipath before calling set_no_path_retry()
  libmultipath: add name and minor outputs for dm_map_present_by_uuid()
  multipath-tools: Makefile.inc: compile with -fexceptions
  multipathd: free alias if cli_add_map() is cancelled
  multipathd: make cli_add_map() handle adding maps by WWID correctly

 Makefile.inc                      |  3 +-
 libmultipath/devmapper.c          | 11 ++++-
 libmultipath/devmapper.h          |  2 +-
 libmultipath/discovery.c          |  2 +-
 libmultipath/libmultipath.version |  7 ++-
 libmultipath/valid.c              |  2 +-
 multipathd/cli_handlers.c         | 71 ++++++++++++++-----------------
 multipathd/main.c                 |  2 +-
 8 files changed, 55 insertions(+), 45 deletions(-)

-- 
2.45.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] multipathd: fix flush check in flush_map()
  2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
@ 2024-06-05 23:22 ` Benjamin Marzinski
  2024-06-27  9:31   ` Martin Wilck
  2024-06-05 23:22 ` [PATCH 2/7] libmultipath: check for not PATH_UP in detect_alua Benjamin Marzinski
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Forgot the comparison in the "if" statement.

Fixes 8a3898339 ("multipathd: sync features on flush_map failure corner case")

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 09286dd0..58afe14a 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -813,7 +813,7 @@ flush_map(struct multipath * mpp, struct vectors * vecs)
 {
 	int r = dm_suspend_and_flush_map(mpp->alias, 0);
 	if (r != DM_FLUSH_OK) {
-		if (DM_FLUSH_FAIL_CANT_RESTORE)
+		if (r == DM_FLUSH_FAIL_CANT_RESTORE)
 			remove_feature(&mpp->features, "queue_if_no_path");
 		condlog(0, "%s: can't flush", mpp->alias);
 		return r;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/7] libmultipath: check for not PATH_UP in detect_alua
  2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
  2024-06-05 23:22 ` [PATCH 1/7] multipathd: fix flush check in flush_map() Benjamin Marzinski
@ 2024-06-05 23:22 ` Benjamin Marzinski
  2024-06-27  9:31   ` Martin Wilck
  2024-06-05 23:22 ` [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry() Benjamin Marzinski
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Previously detect_alua was setting pp->tpgs if the path state was either
PATH_UP or PATH_REMOVED. Setting pp->tpgs for PATH_REMOVED makes no
sense.  PATH_UP is the only state that path_offline() returns where we
should set pp->tpgs, so just check for that, instead of checking for all
the states where we shouldn't set pp->tpgs.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index e2052422..380e0e95 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1087,7 +1087,7 @@ detect_alua(struct path * pp)
 			return;
 
 		state = path_offline(pp);
-		if (state == PATH_DOWN || state == PATH_PENDING)
+		if (state != PATH_UP)
 			return;
 
 		pp->tpgs = TPGS_NONE;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
  2024-06-05 23:22 ` [PATCH 1/7] multipathd: fix flush check in flush_map() Benjamin Marzinski
  2024-06-05 23:22 ` [PATCH 2/7] libmultipath: check for not PATH_UP in detect_alua Benjamin Marzinski
@ 2024-06-05 23:22 ` Benjamin Marzinski
  2024-06-27  7:47   ` Martin Wilck
  2024-06-05 23:22 ` [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid() Benjamin Marzinski
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Make sure that all callers of set_no_path_retry() update the
multipath->features string from the kernel before calling it, so that
they have the current queueing state to check.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/cli_handlers.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 117570e1..3dc5e690 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -934,6 +934,8 @@ cli_restore_queueing(void *v, struct strbuf *reply, void *data)
 		pthread_cleanup_push(put_multipath_config, conf);
 		select_no_path_retry(conf, mpp);
 		pthread_cleanup_pop(1);
+		if (refresh_multipath(vecs, mpp))
+			return -ENODEV;
 		set_no_path_retry(mpp);
 	} else
 		condlog(2, "%s: queueing not disabled. Nothing to do", mapname);
@@ -956,6 +958,10 @@ cli_restore_all_queueing(void *v, struct strbuf *reply, void *data)
 			pthread_cleanup_push(put_multipath_config, conf);
 			select_no_path_retry(conf, mpp);
 			pthread_cleanup_pop(1);
+			if (refresh_multipath(vecs, mpp)) {
+				i--;
+				continue;
+			}
 			set_no_path_retry(mpp);
 		} else
 			condlog(2, "%s: queueing not disabled. Nothing to do",
@@ -983,6 +989,8 @@ cli_disable_queueing(void *v, struct strbuf *reply, void *data)
 	mpp->retry_tick = 0;
 	mpp->no_path_retry = NO_PATH_RETRY_FAIL;
 	mpp->disable_queueing = 1;
+	if (refresh_multipath(vecs, mpp))
+		return -ENODEV;
 	set_no_path_retry(mpp);
 	return 0;
 }
@@ -1001,6 +1009,10 @@ cli_disable_all_queueing(void *v, struct strbuf *reply, void *data)
 		mpp->retry_tick = 0;
 		mpp->no_path_retry = NO_PATH_RETRY_FAIL;
 		mpp->disable_queueing = 1;
+		if (refresh_multipath(vecs, mpp)) {
+			i--;
+			continue;
+		}
 		set_no_path_retry(mpp);
 	}
 	return 0;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid()
  2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2024-06-05 23:22 ` [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry() Benjamin Marzinski
@ 2024-06-05 23:22 ` Benjamin Marzinski
  2024-06-27  9:27   ` Martin Wilck
  2024-06-05 23:22 ` [PATCH 5/7] multipath-tools: Makefile.inc: compile with -fexceptions Benjamin Marzinski
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

add arguments to dm_map_present_by_uuid() to allow optionally fetching
the device name and minor number for the devices found by WWID. These
will be used by a later patch.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/devmapper.c          | 11 ++++++++++-
 libmultipath/devmapper.h          |  2 +-
 libmultipath/libmultipath.version |  2 +-
 libmultipath/valid.c              |  2 +-
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 08bb3c51..7fe68841 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -936,7 +936,7 @@ out:
  *  -1 : error
  */
 int
-dm_map_present_by_uuid(const char *uuid)
+dm_map_present_by_uuid(const char *uuid, char **name_p, int *minor)
 {
 	struct dm_task *dmt;
 	struct dm_info info;
@@ -966,6 +966,15 @@ dm_map_present_by_uuid(const char *uuid)
 		goto out_task;
 
 	r = !!info.exists;
+	if (!r)
+		goto out_task;
+
+	if (name_p) {
+		const char *name = dm_task_get_name(dmt);
+		*name_p = (name && strlen(name)) ? strdup(name) : NULL;
+	}
+	if (minor)
+		*minor = info.minor;
 
 out_task:
 	dm_task_destroy(dmt);
diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
index 19b79c5b..d9c71e92 100644
--- a/libmultipath/devmapper.h
+++ b/libmultipath/devmapper.h
@@ -43,7 +43,7 @@ int dm_simplecmd_noflush (int task, const char *name, uint16_t udev_flags);
 int dm_addmap_create (struct multipath *mpp, char *params);
 int dm_addmap_reload (struct multipath *mpp, char *params, int flush);
 int dm_map_present (const char *name);
-int dm_map_present_by_uuid(const char *uuid);
+int dm_map_present_by_uuid(const char *uuid, char **name_p, int *minor);
 int dm_get_map(const char *name, unsigned long long *size, char **outparams);
 int dm_get_status(const char *name, char **outstatus);
 int dm_type(const char *name, char *type);
diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index eb511749..600de394 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
 	put_multipath_config;
 };
 
-LIBMULTIPATH_24.0.0 {
+LIBMULTIPATH_25.0.0 {
 global:
 	/* symbols referenced by multipath and multipathd */
 	add_foreign;
diff --git a/libmultipath/valid.c b/libmultipath/valid.c
index f2237787..3b060192 100644
--- a/libmultipath/valid.c
+++ b/libmultipath/valid.c
@@ -360,7 +360,7 @@ is_path_valid(const char *name, struct config *conf, struct path *pp,
 	if (check_wwids_file(pp->wwid, 0) == 0)
 		return PATH_IS_VALID_NO_CHECK;
 
-	if (dm_map_present_by_uuid(pp->wwid) == 1)
+	if (dm_map_present_by_uuid(pp->wwid, NULL, NULL) == 1)
 		return PATH_IS_VALID;
 
 	/* all these act like FIND_MULTIPATHS_STRICT for finding if a
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/7] multipath-tools: Makefile.inc: compile with -fexceptions
  2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2024-06-05 23:22 ` [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid() Benjamin Marzinski
@ 2024-06-05 23:22 ` Benjamin Marzinski
  2024-06-27  9:32   ` Martin Wilck
  2024-06-05 23:22 ` [PATCH 6/7] multipathd: free alias if cli_add_map() is cancelled Benjamin Marzinski
  2024-06-05 23:22 ` [PATCH 7/7] multipathd: make cli_add_map() handle adding maps by WWID correctly Benjamin Marzinski
  6 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

multipath is not currently running the cleanup_functions from
__attribute__((cleanup(cleanup_function))) when threads are cancelled.
To do this, it needs to be compiled with -fexceptions

https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Common-Variable-Attributes.html#index-cleanup-variable-attribute
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 Makefile.inc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile.inc b/Makefile.inc
index 81b86cd8..ed4a4234 100644
--- a/Makefile.inc
+++ b/Makefile.inc
@@ -105,7 +105,8 @@ CPPFLAGS	:= $(FORTIFY_OPT) $(CPPFLAGS) $(D_URCU_VERSION) \
 		   -DRUNTIME_DIR=\"$(runtimedir)\" -DCONFIG_DIR=\"$(TGTDIR)$(configdir)\" \
 		   -DDEFAULT_CONFIGFILE=\"$(TGTDIR)$(configfile)\" -DSTATE_DIR=\"$(TGTDIR)$(statedir)\" \
 		   -DEXTRAVERSION=\"$(EXTRAVERSION)\" -MMD -MP
-CFLAGS		:= -std=gnu99 $(CFLAGS) $(OPTFLAGS) $(WARNFLAGS) -pipe
+CFLAGS		:= -std=gnu99 $(CFLAGS) $(OPTFLAGS) $(WARNFLAGS) -pipe \
+		   -fexceptions
 BIN_CFLAGS	:= -fPIE -DPIE
 LIB_CFLAGS	:= -fPIC
 SHARED_FLAGS	:= -shared
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/7] multipathd: free alias if cli_add_map() is cancelled
  2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2024-06-05 23:22 ` [PATCH 5/7] multipath-tools: Makefile.inc: compile with -fexceptions Benjamin Marzinski
@ 2024-06-05 23:22 ` Benjamin Marzinski
  2024-06-27  9:32   ` Martin Wilck
  2024-06-05 23:22 ` [PATCH 7/7] multipathd: make cli_add_map() handle adding maps by WWID correctly Benjamin Marzinski
  6 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Use the cleanup attribute to make sure that alias is freed if
cli_add_map() is cancelled.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/cli_handlers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 3dc5e690..34e0147e 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -699,7 +699,8 @@ cli_add_map (void * v, struct strbuf *reply, void * data)
 	char * param = get_keyparam(v, KEY_MAP);
 	int major = -1, minor = -1;
 	char dev_path[FILE_NAME_SIZE];
-	char *refwwid, *alias = NULL;
+	char *refwwid;
+	char *alias __attribute__((cleanup(cleanup_charp))) = NULL;
 	int rc, count = 0;
 	struct config *conf;
 	int invalid = 0;
@@ -748,7 +749,6 @@ cli_add_map (void * v, struct strbuf *reply, void * data)
 		return 1;
 	}
 	rc = ev_add_map(dev_path, alias, vecs);
-	free(alias);
 	return rc;
 }
 
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/7] multipathd: make cli_add_map() handle adding maps by WWID correctly
  2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2024-06-05 23:22 ` [PATCH 6/7] multipathd: free alias if cli_add_map() is cancelled Benjamin Marzinski
@ 2024-06-05 23:22 ` Benjamin Marzinski
  2024-06-27  9:33   ` Martin Wilck
  6 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-05 23:22 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

cli_add_map() runs filter_wwid() on the input param as if it were a
WWID, but dm_get_major_minor() will never find the multipath device if
the user actually passes in a WWID. To handle this case, call
get_refwwid() early in the function, and use dm_map_present_by_uuid() to
check if the map exists, and find its alias and minor number.

Also, the do/while loop is unnecessarily confusing and only avoids one
repeated function call. Remove it to simplify the cli_add_map().

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/libmultipath.version |  5 +++
 multipathd/cli_handlers.c         | 57 +++++++++++--------------------
 2 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index 600de394..fa78bf93 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -241,3 +241,8 @@ global:
 local:
 	*;
 };
+
+LIBMULTIPATH_25.1.0 {
+global:
+	dm_map_present_by_uuid;
+} LIBMULTIPATH_25.0.0;
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 34e0147e..6016c51b 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -697,57 +697,40 @@ cli_add_map (void * v, struct strbuf *reply, void * data)
 {
 	struct vectors * vecs = (struct vectors *)data;
 	char * param = get_keyparam(v, KEY_MAP);
-	int major = -1, minor = -1;
 	char dev_path[FILE_NAME_SIZE];
-	char *refwwid;
+	char *refwwid __attribute__((cleanup(cleanup_charp))) = NULL;
 	char *alias __attribute__((cleanup(cleanup_charp))) = NULL;
-	int rc, count = 0;
-	struct config *conf;
-	int invalid = 0;
+	int rc, minor = -1;
 
 	param = convert_dev(param, 0);
 	condlog(2, "%s: add map (operator)", param);
 
-	conf = get_multipath_config();
-	pthread_cleanup_push(put_multipath_config, conf);
-	if (filter_wwid(conf->blist_wwid, conf->elist_wwid, param, NULL) > 0)
-		invalid = 1;
-	pthread_cleanup_pop(1);
-	if (invalid) {
+	if (get_refwwid(CMD_NONE, param, DEV_DEVMAP, vecs->pathvec, &refwwid) ==
+	    PATHINFO_SKIPPED) {
 		append_strbuf_str(reply, "blacklisted\n");
 		condlog(2, "%s: map blacklisted", param);
 		return 1;
+	} else if (!refwwid) {
+		condlog(2, "%s: unknown map.", param);
+		return -ENODEV;
 	}
-	do {
-		if (dm_get_major_minor(param, &major, &minor) < 0)
-			condlog(count ? 2 : 3,
-				"%s: not a device mapper table", param);
-		else {
-			sprintf(dev_path, "dm-%d", minor);
-			alias = dm_mapname(major, minor);
+	if (dm_map_present_by_uuid(refwwid, &alias, &minor) <= 0) {
+		condlog(3, "%s: map not present. creating", param);
+		if (coalesce_paths(vecs, NULL, refwwid, FORCE_RELOAD_NONE,
+				   CMD_NONE) != CP_OK) {
+			condlog(2, "%s: coalesce_paths failed", param);
+			return 1;
 		}
-		/*if there is no mapname found, we first create the device*/
-		if (!alias && !count) {
-			condlog(3, "%s: mapname not found for %d:%d",
-				param, major, minor);
-			get_refwwid(CMD_NONE, param, DEV_DEVMAP,
-				    vecs->pathvec, &refwwid);
-			if (refwwid) {
-				if (coalesce_paths(vecs, NULL, refwwid,
-						   FORCE_RELOAD_NONE, CMD_NONE)
-				    != CP_OK)
-					condlog(2, "%s: coalesce_paths failed",
-									param);
-				free(refwwid);
-			}
-		} /*we attempt to create device only once*/
-		count++;
-	} while (!alias && (count < 2));
-
+		if (dm_map_present_by_uuid(refwwid, &alias, &minor) <= 0) {
+			condlog(2, "%s: failed getting map", param);
+			return 1;
+		}
+	}
 	if (!alias) {
-		condlog(2, "%s: add map failed", param);
+		condlog(2, "%s: failed getting alias", param);
 		return 1;
 	}
+	sprintf(dev_path, "dm-%d", minor);
 	rc = ev_add_map(dev_path, alias, vecs);
 	return rc;
 }
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-06-05 23:22 ` [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry() Benjamin Marzinski
@ 2024-06-27  7:47   ` Martin Wilck
  2024-06-27 18:32     ` Benjamin Marzinski
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-06-27  7:47 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> Make sure that all callers of set_no_path_retry() update the
> multipath->features string from the kernel before calling it, so that
> they have the current queueing state to check.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

I am not sure about this one. 

refresh_multipath() is not a cheap operation, requiring at least 3
device-mapper ioctls, and we hold the multipath lock while we call it,
meaning that multipathd can't take care of other important things while
it's handling this CLI request.

OTOH, the kernel only changes the features string when the map is
reloaded. Nobody should be reloading the map under normal conditions
anyway, and if they do, we would see an uevent shortly afterwards. IOW,
I think it's safe to assume that multipathd has the correct features
string cached, except in pathological situations.

Am I overlooking something? Do you have evidence of errors occuring
because multipathd has a wrong features string cached?

Regards
Martin






> ---
>  multipathd/cli_handlers.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> index 117570e1..3dc5e690 100644
> --- a/multipathd/cli_handlers.c
> +++ b/multipathd/cli_handlers.c
> @@ -934,6 +934,8 @@ cli_restore_queueing(void *v, struct strbuf
> *reply, void *data)
>  		pthread_cleanup_push(put_multipath_config, conf);
>  		select_no_path_retry(conf, mpp);
>  		pthread_cleanup_pop(1);
> +		if (refresh_multipath(vecs, mpp))
> +			return -ENODEV;
>  		set_no_path_retry(mpp);
>  	} else
>  		condlog(2, "%s: queueing not disabled. Nothing to
> do", mapname);
> @@ -956,6 +958,10 @@ cli_restore_all_queueing(void *v, struct strbuf
> *reply, void *data)
>  			pthread_cleanup_push(put_multipath_config,
> conf);
>  			select_no_path_retry(conf, mpp);
>  			pthread_cleanup_pop(1);
> +			if (refresh_multipath(vecs, mpp)) {
> +				i--;
> +				continue;
> +			}
>  			set_no_path_retry(mpp);
>  		} else
>  			condlog(2, "%s: queueing not disabled.
> Nothing to do",
> @@ -983,6 +989,8 @@ cli_disable_queueing(void *v, struct strbuf
> *reply, void *data)
>  	mpp->retry_tick = 0;
>  	mpp->no_path_retry = NO_PATH_RETRY_FAIL;
>  	mpp->disable_queueing = 1;
> +	if (refresh_multipath(vecs, mpp))
> +		return -ENODEV;
>  	set_no_path_retry(mpp);
>  	return 0;
>  }
> @@ -1001,6 +1009,10 @@ cli_disable_all_queueing(void *v, struct
> strbuf *reply, void *data)
>  		mpp->retry_tick = 0;
>  		mpp->no_path_retry = NO_PATH_RETRY_FAIL;
>  		mpp->disable_queueing = 1;
> +		if (refresh_multipath(vecs, mpp)) {
> +			i--;
> +			continue;
> +		}
>  		set_no_path_retry(mpp);
>  	}
>  	return 0;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid()
  2024-06-05 23:22 ` [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid() Benjamin Marzinski
@ 2024-06-27  9:27   ` Martin Wilck
  2024-06-27 18:48     ` Benjamin Marzinski
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-06-27  9:27 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> add arguments to dm_map_present_by_uuid() to allow optionally
> fetching
> the device name and minor number for the devices found by WWID. These
> will be used by a later patch.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

I have to say I don't like this much. At least the function name should
be changed. But actually, our handling of DM_DEVICE_INFO and
DM_DEVICE_STATUS dm tasks needs refactoring. As far as the kernel is
concerned, all dm ioctls that refer to an existing map will find the
map either by uuid, by name, or by major/minor, and the map uuid and
name will always be filled in.

I'll see if I can come up with something to clean this up.

Regards
Martin 


> ---
>  libmultipath/devmapper.c          | 11 ++++++++++-
>  libmultipath/devmapper.h          |  2 +-
>  libmultipath/libmultipath.version |  2 +-
>  libmultipath/valid.c              |  2 +-
>  4 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> index 08bb3c51..7fe68841 100644
> --- a/libmultipath/devmapper.c
> +++ b/libmultipath/devmapper.c
> @@ -936,7 +936,7 @@ out:
>   *  -1 : error
>   */
>  int
> -dm_map_present_by_uuid(const char *uuid)
> +dm_map_present_by_uuid(const char *uuid, char **name_p, int *minor)
>  {
>  	struct dm_task *dmt;
>  	struct dm_info info;
> @@ -966,6 +966,15 @@ dm_map_present_by_uuid(const char *uuid)
>  		goto out_task;
>  
>  	r = !!info.exists;
> +	if (!r)
> +		goto out_task;
> +
> +	if (name_p) {
> +		const char *name = dm_task_get_name(dmt);
> +		*name_p = (name && strlen(name)) ? strdup(name) :
> NULL;
> +	}
> +	if (minor)
> +		*minor = info.minor;
>  
>  out_task:
>  	dm_task_destroy(dmt);
> diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
> index 19b79c5b..d9c71e92 100644
> --- a/libmultipath/devmapper.h
> +++ b/libmultipath/devmapper.h
> @@ -43,7 +43,7 @@ int dm_simplecmd_noflush (int task, const char
> *name, uint16_t udev_flags);
>  int dm_addmap_create (struct multipath *mpp, char *params);
>  int dm_addmap_reload (struct multipath *mpp, char *params, int
> flush);
>  int dm_map_present (const char *name);
> -int dm_map_present_by_uuid(const char *uuid);
> +int dm_map_present_by_uuid(const char *uuid, char **name_p, int
> *minor);
>  int dm_get_map(const char *name, unsigned long long *size, char
> **outparams);
>  int dm_get_status(const char *name, char **outstatus);
>  int dm_type(const char *name, char *type);
> diff --git a/libmultipath/libmultipath.version
> b/libmultipath/libmultipath.version
> index eb511749..600de394 100644
> --- a/libmultipath/libmultipath.version
> +++ b/libmultipath/libmultipath.version
> @@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
>  	put_multipath_config;
>  };
>  
> -LIBMULTIPATH_24.0.0 {
> +LIBMULTIPATH_25.0.0 {
>  global:
>  	/* symbols referenced by multipath and multipathd */
>  	add_foreign;
> diff --git a/libmultipath/valid.c b/libmultipath/valid.c
> index f2237787..3b060192 100644
> --- a/libmultipath/valid.c
> +++ b/libmultipath/valid.c
> @@ -360,7 +360,7 @@ is_path_valid(const char *name, struct config
> *conf, struct path *pp,
>  	if (check_wwids_file(pp->wwid, 0) == 0)
>  		return PATH_IS_VALID_NO_CHECK;
>  
> -	if (dm_map_present_by_uuid(pp->wwid) == 1)
> +	if (dm_map_present_by_uuid(pp->wwid, NULL, NULL) == 1)
>  		return PATH_IS_VALID;
>  
>  	/* all these act like FIND_MULTIPATHS_STRICT for finding if
> a


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/7] multipathd: fix flush check in flush_map()
  2024-06-05 23:22 ` [PATCH 1/7] multipathd: fix flush check in flush_map() Benjamin Marzinski
@ 2024-06-27  9:31   ` Martin Wilck
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-06-27  9:31 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> Forgot the comparison in the "if" statement.
> 
> Fixes 8a3898339 ("multipathd: sync features on flush_map failure
> corner case")
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Ups. The reviewer was asleep, it seems.

Reviewed-by: Martin Wilck <mwilck@suse.com>

(Btw, I am sorry for the long delay of this review).

> ---
>  multipathd/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 09286dd0..58afe14a 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -813,7 +813,7 @@ flush_map(struct multipath * mpp, struct vectors
> * vecs)
>  {
>  	int r = dm_suspend_and_flush_map(mpp->alias, 0);
>  	if (r != DM_FLUSH_OK) {
> -		if (DM_FLUSH_FAIL_CANT_RESTORE)
> +		if (r == DM_FLUSH_FAIL_CANT_RESTORE)
>  			remove_feature(&mpp->features,
> "queue_if_no_path");
>  		condlog(0, "%s: can't flush", mpp->alias);
>  		return r;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] libmultipath: check for not PATH_UP in detect_alua
  2024-06-05 23:22 ` [PATCH 2/7] libmultipath: check for not PATH_UP in detect_alua Benjamin Marzinski
@ 2024-06-27  9:31   ` Martin Wilck
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-06-27  9:31 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> Previously detect_alua was setting pp->tpgs if the path state was
> either
> PATH_UP or PATH_REMOVED. Setting pp->tpgs for PATH_REMOVED makes no
> sense.  PATH_UP is the only state that path_offline() returns where
> we
> should set pp->tpgs, so just check for that, instead of checking for
> all
> the states where we shouldn't set pp->tpgs.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] multipath-tools: Makefile.inc: compile with -fexceptions
  2024-06-05 23:22 ` [PATCH 5/7] multipath-tools: Makefile.inc: compile with -fexceptions Benjamin Marzinski
@ 2024-06-27  9:32   ` Martin Wilck
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-06-27  9:32 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> multipath is not currently running the cleanup_functions from
> __attribute__((cleanup(cleanup_function))) when threads are
> cancelled.
> To do this, it needs to be compiled with -fexceptions
> 
> https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Common-Variable-Attributes.html#index-cleanup-variable-attribute
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] multipathd: free alias if cli_add_map() is cancelled
  2024-06-05 23:22 ` [PATCH 6/7] multipathd: free alias if cli_add_map() is cancelled Benjamin Marzinski
@ 2024-06-27  9:32   ` Martin Wilck
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-06-27  9:32 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> Use the cleanup attribute to make sure that alias is freed if
> cli_add_map() is cancelled.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>
>  


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/7] multipathd: make cli_add_map() handle adding maps by WWID correctly
  2024-06-05 23:22 ` [PATCH 7/7] multipathd: make cli_add_map() handle adding maps by WWID correctly Benjamin Marzinski
@ 2024-06-27  9:33   ` Martin Wilck
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-06-27  9:33 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> 
> cli_add_map() runs filter_wwid() on the input param as if it were a
> WWID, but dm_get_major_minor() will never find the multipath device
> if
> the user actually passes in a WWID. To handle this case, call
> get_refwwid() early in the function, and use dm_map_present_by_uuid()
> to
> check if the map exists, and find its alias and minor number.
> 
> Also, the do/while loop is unnecessarily confusing and only avoids
> one
> repeated function call. Remove it to simplify the cli_add_map().
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Looks good, but it depends on 4/7, which needs some work.

Martin



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-06-27  7:47   ` Martin Wilck
@ 2024-06-27 18:32     ` Benjamin Marzinski
  2024-06-27 18:39       ` Benjamin Marzinski
  0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-27 18:32 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development

On Thu, Jun 27, 2024 at 09:47:33AM +0200, Martin Wilck wrote:
> On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> > Make sure that all callers of set_no_path_retry() update the
> > multipath->features string from the kernel before calling it, so that
> > they have the current queueing state to check.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> I am not sure about this one. 
> 
> refresh_multipath() is not a cheap operation, requiring at least 3
> device-mapper ioctls, and we hold the multipath lock while we call it,
> meaning that multipathd can't take care of other important things while
> it's handling this CLI request.
> 
> OTOH, the kernel only changes the features string when the map is
> reloaded. Nobody should be reloading the map under normal conditions
> anyway, and if they do, we would see an uevent shortly afterwards. IOW,
> I think it's safe to assume that multipathd has the correct features
> string cached, except in pathological situations.
> 
> Am I overlooking something? Do you have evidence of errors occuring
> because multipathd has a wrong features string cached?

Well, I did run into a situation where I couldn't disable queueing
because of mpp->features being out of date. I fixed that specific issue
in patch 1/7. But in a broader context, the kernel doesn't send out any
events when it gets a message to change the queueing mode in the
features string, so there is always a possibility that something outside
of multipathd changed it, and multipathd doesn't find out about it.
Normally, check_path() will find out and resolve the issue within
seconds. However, if the multipath device lost all its path devices,
then check_path() won't ever call set_no_path_retry() on it, and nothing
will resync it with the kernel state until an event comes in.

Here's a (admittedly fairly improbable) sequence of events that would
get you into problems.

- There is a multipath device with no paths that is currently in
recovery mode waiting to timeout. It's opened by something with
outstanding IO.

- A user runs "multipath -f" to remove the device, and deligation to
multipathd times out so multipath attempts to locally remove the
device. It disables queueing and attempts the remove, which fails
because the device is still in use.

- multipathd hits the retries timeout and disables queueing.

- multipath restores queueing because the remove failed.

- more IO comes into the multipath device.

In this case, the multipath device will be stuck open due to queued
IO. The user could reasonably try to disable queueing to resolve this
issue. This will report success, but not actually do anything.

Granted, there is an unavoidable race here. It's always possible that
after multipathd updates the features string, it gets changed outside of
its control. But this cuts the window down to something small instead
of anytime after you lost all your paths. Any if you get unlucky enough
to hit that window, you can just rerun the command and it will work.

My main reason for wanting this is that cli_disable_queueing() (and
friends) is not a command you run all the time. It's a command to run
when you are cleaning up a mess. It doesn't have to be fast. It has to
work when things are stuck.

Thoughts?

-Ben

> Regards
> Martin
> 
> 
> 
> 
> 
> 
> > ---
> >  multipathd/cli_handlers.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> > index 117570e1..3dc5e690 100644
> > --- a/multipathd/cli_handlers.c
> > +++ b/multipathd/cli_handlers.c
> > @@ -934,6 +934,8 @@ cli_restore_queueing(void *v, struct strbuf
> > *reply, void *data)
> >  		pthread_cleanup_push(put_multipath_config, conf);
> >  		select_no_path_retry(conf, mpp);
> >  		pthread_cleanup_pop(1);
> > +		if (refresh_multipath(vecs, mpp))
> > +			return -ENODEV;
> >  		set_no_path_retry(mpp);
> >  	} else
> >  		condlog(2, "%s: queueing not disabled. Nothing to
> > do", mapname);
> > @@ -956,6 +958,10 @@ cli_restore_all_queueing(void *v, struct strbuf
> > *reply, void *data)
> >  			pthread_cleanup_push(put_multipath_config,
> > conf);
> >  			select_no_path_retry(conf, mpp);
> >  			pthread_cleanup_pop(1);
> > +			if (refresh_multipath(vecs, mpp)) {
> > +				i--;
> > +				continue;
> > +			}
> >  			set_no_path_retry(mpp);
> >  		} else
> >  			condlog(2, "%s: queueing not disabled.
> > Nothing to do",
> > @@ -983,6 +989,8 @@ cli_disable_queueing(void *v, struct strbuf
> > *reply, void *data)
> >  	mpp->retry_tick = 0;
> >  	mpp->no_path_retry = NO_PATH_RETRY_FAIL;
> >  	mpp->disable_queueing = 1;
> > +	if (refresh_multipath(vecs, mpp))
> > +		return -ENODEV;
> >  	set_no_path_retry(mpp);
> >  	return 0;
> >  }
> > @@ -1001,6 +1009,10 @@ cli_disable_all_queueing(void *v, struct
> > strbuf *reply, void *data)
> >  		mpp->retry_tick = 0;
> >  		mpp->no_path_retry = NO_PATH_RETRY_FAIL;
> >  		mpp->disable_queueing = 1;
> > +		if (refresh_multipath(vecs, mpp)) {
> > +			i--;
> > +			continue;
> > +		}
> >  		set_no_path_retry(mpp);
> >  	}
> >  	return 0;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-06-27 18:32     ` Benjamin Marzinski
@ 2024-06-27 18:39       ` Benjamin Marzinski
  2024-06-27 21:31         ` Martin Wilck
  0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-27 18:39 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development

On Thu, Jun 27, 2024 at 02:33:00PM -0400, Benjamin Marzinski wrote:
> On Thu, Jun 27, 2024 at 09:47:33AM +0200, Martin Wilck wrote:
> > On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> > > Make sure that all callers of set_no_path_retry() update the
> > > multipath->features string from the kernel before calling it, so that
> > > they have the current queueing state to check.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > 
> > I am not sure about this one. 
> > 
> > refresh_multipath() is not a cheap operation, requiring at least 3
> > device-mapper ioctls, and we hold the multipath lock while we call it,
> > meaning that multipathd can't take care of other important things while
> > it's handling this CLI request.
> > 
> > OTOH, the kernel only changes the features string when the map is
> > reloaded. Nobody should be reloading the map under normal conditions
> > anyway, and if they do, we would see an uevent shortly afterwards. IOW,
> > I think it's safe to assume that multipathd has the correct features
> > string cached, except in pathological situations.
> > 
> > Am I overlooking something? Do you have evidence of errors occuring
> > because multipathd has a wrong features string cached?
> 
> Well, I did run into a situation where I couldn't disable queueing
> because of mpp->features being out of date. I fixed that specific issue
> in patch 1/7. But in a broader context, the kernel doesn't send out any
> events when it gets a message to change the queueing mode in the
> features string, so there is always a possibility that something outside
> of multipathd changed it, and multipathd doesn't find out about it.
> Normally, check_path() will find out and resolve the issue within
> seconds. However, if the multipath device lost all its path devices,
> then check_path() won't ever call set_no_path_retry() on it

Err... It won't ever call update_multipath_strings(), which is
what will sync the mpp->features string with the kernel and make
set_no_path_retry() do the right thing.

> and nothing
> will resync it with the kernel state until an event comes in.
> 
> Here's a (admittedly fairly improbable) sequence of events that would
> get you into problems.
> 
> - There is a multipath device with no paths that is currently in
> recovery mode waiting to timeout. It's opened by something with
> outstanding IO.
> 
> - A user runs "multipath -f" to remove the device, and deligation to
> multipathd times out so multipath attempts to locally remove the
> device. It disables queueing and attempts the remove, which fails
> because the device is still in use.
> 
> - multipathd hits the retries timeout and disables queueing.
> 
> - multipath restores queueing because the remove failed.
> 
> - more IO comes into the multipath device.
> 
> In this case, the multipath device will be stuck open due to queued
> IO. The user could reasonably try to disable queueing to resolve this
> issue. This will report success, but not actually do anything.
because multipathd never syncs the features string with the kernel
before calling set_no_path_retry().

> 
> Granted, there is an unavoidable race here. It's always possible that
> after multipathd updates the features string, it gets changed outside of
> its control. But this cuts the window down to something small instead
> of anytime after you lost all your paths. Any if you get unlucky enough
> to hit that window, you can just rerun the command and it will work.
> 
> My main reason for wanting this is that cli_disable_queueing() (and
> friends) is not a command you run all the time. It's a command to run
> when you are cleaning up a mess. It doesn't have to be fast. It has to
> work when things are stuck.
> 
> Thoughts?
> 
> -Ben
> 
> > Regards
> > Martin
> > 
> > 
> > 
> > 
> > 
> > 
> > > ---
> > >  multipathd/cli_handlers.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> > > index 117570e1..3dc5e690 100644
> > > --- a/multipathd/cli_handlers.c
> > > +++ b/multipathd/cli_handlers.c
> > > @@ -934,6 +934,8 @@ cli_restore_queueing(void *v, struct strbuf
> > > *reply, void *data)
> > >  		pthread_cleanup_push(put_multipath_config, conf);
> > >  		select_no_path_retry(conf, mpp);
> > >  		pthread_cleanup_pop(1);
> > > +		if (refresh_multipath(vecs, mpp))
> > > +			return -ENODEV;
> > >  		set_no_path_retry(mpp);
> > >  	} else
> > >  		condlog(2, "%s: queueing not disabled. Nothing to
> > > do", mapname);
> > > @@ -956,6 +958,10 @@ cli_restore_all_queueing(void *v, struct strbuf
> > > *reply, void *data)
> > >  			pthread_cleanup_push(put_multipath_config,
> > > conf);
> > >  			select_no_path_retry(conf, mpp);
> > >  			pthread_cleanup_pop(1);
> > > +			if (refresh_multipath(vecs, mpp)) {
> > > +				i--;
> > > +				continue;
> > > +			}
> > >  			set_no_path_retry(mpp);
> > >  		} else
> > >  			condlog(2, "%s: queueing not disabled.
> > > Nothing to do",
> > > @@ -983,6 +989,8 @@ cli_disable_queueing(void *v, struct strbuf
> > > *reply, void *data)
> > >  	mpp->retry_tick = 0;
> > >  	mpp->no_path_retry = NO_PATH_RETRY_FAIL;
> > >  	mpp->disable_queueing = 1;
> > > +	if (refresh_multipath(vecs, mpp))
> > > +		return -ENODEV;
> > >  	set_no_path_retry(mpp);
> > >  	return 0;
> > >  }
> > > @@ -1001,6 +1009,10 @@ cli_disable_all_queueing(void *v, struct
> > > strbuf *reply, void *data)
> > >  		mpp->retry_tick = 0;
> > >  		mpp->no_path_retry = NO_PATH_RETRY_FAIL;
> > >  		mpp->disable_queueing = 1;
> > > +		if (refresh_multipath(vecs, mpp)) {
> > > +			i--;
> > > +			continue;
> > > +		}
> > >  		set_no_path_retry(mpp);
> > >  	}
> > >  	return 0;


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid()
  2024-06-27  9:27   ` Martin Wilck
@ 2024-06-27 18:48     ` Benjamin Marzinski
  2024-06-27 20:42       ` Martin Wilck
  0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-06-27 18:48 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development

On Thu, Jun 27, 2024 at 11:27:41AM +0200, Martin Wilck wrote:
> On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> > add arguments to dm_map_present_by_uuid() to allow optionally
> > fetching
> > the device name and minor number for the devices found by WWID. These
> > will be used by a later patch.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> I have to say I don't like this much. At least the function name should
> be changed. But actually, our handling of DM_DEVICE_INFO and
> DM_DEVICE_STATUS dm tasks needs refactoring. As far as the kernel is
> concerned, all dm ioctls that refer to an existing map will find the
> map either by uuid, by name, or by major/minor, and the map uuid and
> name will always be filled in.
> 
> I'll see if I can come up with something to clean this up.

Do you want be to resend this patch with a name change, or will that
just happen as part of your cleanup?

-Ben

> 
> Regards
> Martin 
> 
> 
> > ---
> >  libmultipath/devmapper.c          | 11 ++++++++++-
> >  libmultipath/devmapper.h          |  2 +-
> >  libmultipath/libmultipath.version |  2 +-
> >  libmultipath/valid.c              |  2 +-
> >  4 files changed, 13 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> > index 08bb3c51..7fe68841 100644
> > --- a/libmultipath/devmapper.c
> > +++ b/libmultipath/devmapper.c
> > @@ -936,7 +936,7 @@ out:
> >   *  -1 : error
> >   */
> >  int
> > -dm_map_present_by_uuid(const char *uuid)
> > +dm_map_present_by_uuid(const char *uuid, char **name_p, int *minor)
> >  {
> >  	struct dm_task *dmt;
> >  	struct dm_info info;
> > @@ -966,6 +966,15 @@ dm_map_present_by_uuid(const char *uuid)
> >  		goto out_task;
> >  
> >  	r = !!info.exists;
> > +	if (!r)
> > +		goto out_task;
> > +
> > +	if (name_p) {
> > +		const char *name = dm_task_get_name(dmt);
> > +		*name_p = (name && strlen(name)) ? strdup(name) :
> > NULL;
> > +	}
> > +	if (minor)
> > +		*minor = info.minor;
> >  
> >  out_task:
> >  	dm_task_destroy(dmt);
> > diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
> > index 19b79c5b..d9c71e92 100644
> > --- a/libmultipath/devmapper.h
> > +++ b/libmultipath/devmapper.h
> > @@ -43,7 +43,7 @@ int dm_simplecmd_noflush (int task, const char
> > *name, uint16_t udev_flags);
> >  int dm_addmap_create (struct multipath *mpp, char *params);
> >  int dm_addmap_reload (struct multipath *mpp, char *params, int
> > flush);
> >  int dm_map_present (const char *name);
> > -int dm_map_present_by_uuid(const char *uuid);
> > +int dm_map_present_by_uuid(const char *uuid, char **name_p, int
> > *minor);
> >  int dm_get_map(const char *name, unsigned long long *size, char
> > **outparams);
> >  int dm_get_status(const char *name, char **outstatus);
> >  int dm_type(const char *name, char *type);
> > diff --git a/libmultipath/libmultipath.version
> > b/libmultipath/libmultipath.version
> > index eb511749..600de394 100644
> > --- a/libmultipath/libmultipath.version
> > +++ b/libmultipath/libmultipath.version
> > @@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
> >  	put_multipath_config;
> >  };
> >  
> > -LIBMULTIPATH_24.0.0 {
> > +LIBMULTIPATH_25.0.0 {
> >  global:
> >  	/* symbols referenced by multipath and multipathd */
> >  	add_foreign;
> > diff --git a/libmultipath/valid.c b/libmultipath/valid.c
> > index f2237787..3b060192 100644
> > --- a/libmultipath/valid.c
> > +++ b/libmultipath/valid.c
> > @@ -360,7 +360,7 @@ is_path_valid(const char *name, struct config
> > *conf, struct path *pp,
> >  	if (check_wwids_file(pp->wwid, 0) == 0)
> >  		return PATH_IS_VALID_NO_CHECK;
> >  
> > -	if (dm_map_present_by_uuid(pp->wwid) == 1)
> > +	if (dm_map_present_by_uuid(pp->wwid, NULL, NULL) == 1)
> >  		return PATH_IS_VALID;
> >  
> >  	/* all these act like FIND_MULTIPATHS_STRICT for finding if
> > a


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid()
  2024-06-27 18:48     ` Benjamin Marzinski
@ 2024-06-27 20:42       ` Martin Wilck
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-06-27 20:42 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development

On Thu, 2024-06-27 at 14:48 -0400, Benjamin Marzinski wrote:
> On Thu, Jun 27, 2024 at 11:27:41AM +0200, Martin Wilck wrote:
> > On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> > > add arguments to dm_map_present_by_uuid() to allow optionally
> > > fetching
> > > the device name and minor number for the devices found by WWID.
> > > These
> > > will be used by a later patch.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > 
> > I have to say I don't like this much. At least the function name
> > should
> > be changed. But actually, our handling of DM_DEVICE_INFO and
> > DM_DEVICE_STATUS dm tasks needs refactoring. As far as the kernel
> > is
> > concerned, all dm ioctls that refer to an existing map will find
> > the
> > map either by uuid, by name, or by major/minor, and the map uuid
> > and
> > name will always be filled in.
> > 
> > I'll see if I can come up with something to clean this up.
> 
> Do you want be to resend this patch with a name change, or will that
> just happen as part of your cleanup?

Please give me a day or two. I'm working on the cleanup, which has
grown quite a bit. I hope to be able to send it tomorrow.

Martin



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-06-27 18:39       ` Benjamin Marzinski
@ 2024-06-27 21:31         ` Martin Wilck
  2024-07-01 18:41           ` Benjamin Marzinski
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-06-27 21:31 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development

On Thu, 2024-06-27 at 14:39 -0400, Benjamin Marzinski wrote:
> On Thu, Jun 27, 2024 at 02:33:00PM -0400, Benjamin Marzinski wrote:
> > On Thu, Jun 27, 2024 at 09:47:33AM +0200, Martin Wilck wrote:
> > > On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:
> > > > Make sure that all callers of set_no_path_retry() update the
> > > > multipath->features string from the kernel before calling it,
> > > > so that
> > > > they have the current queueing state to check.
> > > > 
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > 
> > > I am not sure about this one. 
> > > 
> > > refresh_multipath() is not a cheap operation, requiring at least
> > > 3
> > > device-mapper ioctls, and we hold the multipath lock while we
> > > call it,
> > > meaning that multipathd can't take care of other important things
> > > while
> > > it's handling this CLI request.
> > > 
> > > OTOH, the kernel only changes the features string when the map is
> > > reloaded. Nobody should be reloading the map under normal
> > > conditions
> > > anyway, and if they do, we would see an uevent shortly
> > > afterwards. IOW,
> > > I think it's safe to assume that multipathd has the correct
> > > features
> > > string cached, except in pathological situations.
> > > 
> > > Am I overlooking something? Do you have evidence of errors
> > > occuring
> > > because multipathd has a wrong features string cached?
> > 
> > Well, I did run into a situation where I couldn't disable queueing
> > because of mpp->features being out of date. I fixed that specific
> > issue
> > in patch 1/7. But in a broader context, the kernel doesn't send out
> > any
> > events when it gets a message to change the queueing mode in the
> > features string, so there is always a possibility that something
> > outside
> > of multipathd changed it, and multipathd doesn't find out about it.

Right, and that's a general problem that can't be solved in multipath-
tools. All we can do is make sure that our own tool - multipath -
doesn't mess with the kernel state at all while multipathd is running.
If users use "dmsetup message" or the like while multipathd is running,
they're obviously on their own. Perhaps we should really disable all
code in multipath that falls back to local actions, or at least run the
fallback code only when the user adds an option like "--force".

> > Normally, check_path() will find out and resolve the issue within
> > seconds. However, if the multipath device lost all its path
> > devices,
> > then check_path() won't ever call set_no_path_retry() on it
> 
> Err... It won't ever call update_multipath_strings(), which is
> what will sync the mpp->features string with the kernel and make
> set_no_path_retry() do the right thing.

Yeah, check_path() is not the right place for handling maps with no
paths.

I have thought for a long time that the periodic checker, instead of
blindly going through the paths vec, should follow the multipath vec,
handle each path of the mpp in turn, and when that's done, act on the
map. If we did it this way, we could also handle the special case of an
empty map in the checker loop.

> > and nothing
> > will resync it with the kernel state until an event comes in.
> > 
> > Here's a (admittedly fairly improbable) sequence of events that
> > would
> > get you into problems.
> > 
> > - There is a multipath device with no paths that is currently in
> > recovery mode waiting to timeout. It's opened by something with
> > outstanding IO.
> > 
> > - A user runs "multipath -f" to remove the device, and deligation
> > to
> > multipathd times out so multipath attempts to locally remove the
> > device. It disables queueing and attempts the remove, which fails
> > because the device is still in use.
> > 
> > - multipathd hits the retries timeout and disables queueing.
> > 
> > - multipath restores queueing because the remove failed.
> > 
> > - more IO comes into the multipath device.
> > 
> > In this case, the multipath device will be stuck open due to queued
> > IO. The user could reasonably try to disable queueing to resolve
> > this
> > issue. This will report success, but not actually do anything.
> because multipathd never syncs the features string with the kernel
> before calling set_no_path_retry().

Are you suggesting that we call refresh_multipath() (or
update_multipath_table()) from set_no_path_retry()?

Before 0f850db ("multipathd: clean up set_no_path_retry"), multipathd
would always set queue_if_no_path to the value it deemed correct, and I
think this was ok, and generally race-free. IIUC, the main purpose of
0f850db was to avoid unnecessary message ioctls (which would cause
locking and flag manipulation in the kernel).

0f850db introduced the check of the current features string as cached
by multipathd. Later we added some patches to be more confident that
this cached string correctly reflects the kernel state, but we can
never by 100% sure, of course.

But if we now call refresh_multipath() every time in
set_no_path_retry(), we'll be replacing one DM ioctl (the original
dm_message()) with multiple ioctls, while still maintaining a race
window. That wouldn't make sense to me.

> > 
> > Granted, there is an unavoidable race here. It's always possible
> > that
> > after multipathd updates the features string, it gets changed
> > outside of
> > its control. But this cuts the window down to something small
> > instead
> > of anytime after you lost all your paths. Any if you get unlucky
> > enough
> > to hit that window, you can just rerun the command and it will
> > work.
> > 
> > My main reason for wanting this is that cli_disable_queueing() (and
> > friends) is not a command you run all the time. It's a command to
> > run
> > when you are cleaning up a mess. It doesn't have to be fast. It has
> > to
> > work when things are stuck.
> > 
> > Thoughts?

I remain somewhat confused how this should be handled in general; see
above. But I won't insist on rejecting your patch.

Martin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-06-27 21:31         ` Martin Wilck
@ 2024-07-01 18:41           ` Benjamin Marzinski
  2024-07-02  7:34             ` Martin Wilck
  0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-07-01 18:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development

On Thu, Jun 27, 2024 at 11:31:56PM +0200, Martin Wilck wrote:
> On Thu, 2024-06-27 at 14:39 -0400, Benjamin Marzinski wrote:
> > On Thu, Jun 27, 2024 at 02:33:00PM -0400, Benjamin Marzinski wrote:
> > > On Thu, Jun 27, 2024 at 09:47:33AM +0200, Martin Wilck wrote:
> > > > On Wed, 2024-06-05 at 19:22 -0400, Benjamin Marzinski wrote:


> > > Normally, check_path() will find out and resolve the issue within
> > > seconds. However, if the multipath device lost all its path
> > > devices,
> > > then check_path() won't ever call set_no_path_retry() on it
> > 
> > Err... It won't ever call update_multipath_strings(), which is
> > what will sync the mpp->features string with the kernel and make
> > set_no_path_retry() do the right thing.
> 
> Yeah, check_path() is not the right place for handling maps with no
> paths.
> 
> I have thought for a long time that the periodic checker, instead of
> blindly going through the paths vec, should follow the multipath vec,
> handle each path of the mpp in turn, and when that's done, act on the
> map. If we did it this way, we could also handle the special case of an
> empty map in the checker loop.

Not handling the multipath updated every check_path() makes a lot of
sense. I personally think it's fine, and even a good thing, to
periodically resync the daemon's mpp state with the kernel state, just
to make sure we didn't hit a corner case, since we know that they are
possible.  However, doing it once per checker interval makes more sence
than doing it once for every path per checker interval. I don't think
that buys us anything. We could even do it once every max checker
interval or something.

The only real thing that resyncing on every path check changes is that
it lowers the chance that we could think the path is up while we check
it, when in reality the kernel has already marked it down but we haven't
gotten the event yet. Most of the time when the kernel finds that the
path is down, and immediately after that, the path checker finds that
it's up, it's problem with the path checker not correctly capturing the
actual usability of the path, and we're just going to ping-pong or mark
it as marginal. So there's rarely any advantage in doing this for every
path. 


> Are you suggesting that we call refresh_multipath() (or
> update_multipath_table()) from set_no_path_retry()?

Not exactly. check_path() already calls update_multipath_strings() which
is refresh_multipath() but without the unnecessary call to
dm_get_info(). We probably are calling dm_get_info() more often than we
need to be already. But if we could remove that from
refresh_multipath(), and only call it when necessary(), we could
probably switch all the calls where we sync the state and then call
set_no_path_retry() to use a renamed setup_multipath().


> Before 0f850db ("multipathd: clean up set_no_path_retry"), multipathd
> would always set queue_if_no_path to the value it deemed correct, and I
> think this was ok, and generally race-free. IIUC, the main purpose of
> 0f850db was to avoid unnecessary message ioctls (which would cause
> locking and flag manipulation in the kernel).
> 
> 0f850db introduced the check of the current features string as cached
> by multipathd. Later we added some patches to be more confident that
> this cached string correctly reflects the kernel state, but we can
> never by 100% sure, of course.
> 
> But if we now call refresh_multipath() every time in
> set_no_path_retry(), we'll be replacing one DM ioctl (the original
> dm_message()) with multiple ioctls, while still maintaining a race
> window. That wouldn't make sense to me.

My purpose in writing 0f850db wasn't to cut down on ioctls. It was to
keep set_no_path_retry() from incorrectly handling queueing in cases
where no_path_retry was set to a number, and the device had no usable
paths.  My purpose in this patch is also to make sure
set_no_path_retry() does the right thing. Obviously, this one is more of
a corner case, and if instead we were occasionally resyncing with the
kernel even when a multipath device had no paths, that would solve this
issue, and I'd be fine with dropping this patch.

So unless it conflicts with work you're already doing, I'll send a new
patchset that changes checkerloop() to pull the multipath device
updating work out of check_path(). I'm not sure that it makes a lot of
sense to change how we loop through the paths, since check_path() does
work on some paths that aren't assigned to a multipath device. Plus
different paths in a multipath device will need checking at different
times depending on their states anyway, so you can't just go through all
the paths at once.

-Ben
 
> > > 
> > > Granted, there is an unavoidable race here. It's always possible
> > > that
> > > after multipathd updates the features string, it gets changed
> > > outside of
> > > its control. But this cuts the window down to something small
> > > instead
> > > of anytime after you lost all your paths. Any if you get unlucky
> > > enough
> > > to hit that window, you can just rerun the command and it will
> > > work.
> > > 
> > > My main reason for wanting this is that cli_disable_queueing() (and
> > > friends) is not a command you run all the time. It's a command to
> > > run
> > > when you are cleaning up a mess. It doesn't have to be fast. It has
> > > to
> > > work when things are stuck.
> > > 
> > > Thoughts?
> 
> I remain somewhat confused how this should be handled in general; see
> above. But I won't insist on rejecting your patch.
> 
> Martin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-07-01 18:41           ` Benjamin Marzinski
@ 2024-07-02  7:34             ` Martin Wilck
  2024-07-02 18:10               ` Benjamin Marzinski
  0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-07-02  7:34 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development

On Mon, 2024-07-01 at 14:41 -0400, Benjamin Marzinski wrote:
> 
> 
> 
> My purpose in writing 0f850db wasn't to cut down on ioctls. It was to
> keep set_no_path_retry() from incorrectly handling queueing in cases
> where no_path_retry was set to a number, and the device had no usable
> paths.  My purpose in this patch is also to make sure
> set_no_path_retry() does the right thing. Obviously, this one is more
> of
> a corner case, and if instead we were occasionally resyncing with the
> kernel even when a multipath device had no paths, that would solve
> this
> issue, and I'd be fine with dropping this patch.

That sounds good.

> So unless it conflicts with work you're already doing, I'll send a
> new
> patchset that changes checkerloop() to pull the multipath device
> updating work out of check_path(). I'm not sure that it makes a lot
> of
> sense to change how we loop through the paths, since check_path()
> does
> work on some paths that aren't assigned to a multipath device. 
> Plus
> different paths in a multipath device will need checking at different
> times depending on their states anyway, so you can't just go through
> all
> the paths at once.

While I agree that syncing the map state just once per checkerloop and
map is the right thing, I believe we should do it as soon as we have
checked all paths that need to be checked for the given map. And that's
most easily done by looping over the mpath vector, like this:

for each mpp
    for each path in mpp
        if path needs to be checked:
            check path
    sync mpp

for each path
    if (!pp->mpp and path needs to be checked)
          check path

Of course we could do this in multiple steps and move the mpp sync out
of the path loop first, like you suggested. But if we do this, there
may be a considerable time delay between checking the paths of a given
map and syncing the related mpp state, especially on systems with a
large number of devices, and that might be problematic.

Path status changes are asynchronous by nature, and thus whatever we do
will not be perfect. IMO it makes sense to do all checks related to a
given map in as short a time interval as possible. This way we are most
likely to get a consistent picture of the current state of the map in
question in the kernel.

In the long run, I think what we should do is use asynchronous checkers
exclusively. And instead of waiting a millisecond for them to complete
synchronously, we should trigger the path check on all paths (that need
to be checked) in one go, then wait for all checkers to complete (this
is the difficult part, I know), and then go over all the results, again
without waiting for anything.

Another project I've been wanting to do for a long time :-/

Regards
Martin



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-07-02  7:34             ` Martin Wilck
@ 2024-07-02 18:10               ` Benjamin Marzinski
  2024-07-04  7:19                 ` Martin Wilck
  0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-07-02 18:10 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development

On Tue, Jul 02, 2024 at 09:34:17AM +0200, Martin Wilck wrote:
> On Mon, 2024-07-01 at 14:41 -0400, Benjamin Marzinski wrote:
> > 
> > 
> > 
> > My purpose in writing 0f850db wasn't to cut down on ioctls. It was to
> > keep set_no_path_retry() from incorrectly handling queueing in cases
> > where no_path_retry was set to a number, and the device had no usable
> > paths.  My purpose in this patch is also to make sure
> > set_no_path_retry() does the right thing. Obviously, this one is more
> > of
> > a corner case, and if instead we were occasionally resyncing with the
> > kernel even when a multipath device had no paths, that would solve
> > this
> > issue, and I'd be fine with dropping this patch.
> 
> That sounds good.
> 
> > So unless it conflicts with work you're already doing, I'll send a
> > new
> > patchset that changes checkerloop() to pull the multipath device
> > updating work out of check_path(). I'm not sure that it makes a lot
> > of
> > sense to change how we loop through the paths, since check_path()
> > does
> > work on some paths that aren't assigned to a multipath device. 
> > Plus
> > different paths in a multipath device will need checking at different
> > times depending on their states anyway, so you can't just go through
> > all
> > the paths at once.
> 
> While I agree that syncing the map state just once per checkerloop and
> map is the right thing, I believe we should do it as soon as we have
> checked all paths that need to be checked for the given map. And that's
> most easily done by looping over the mpath vector, like this:
> 
> for each mpp
>     for each path in mpp
>         if path needs to be checked:
>             check path
>     sync mpp

Huh? Are you suggesting syncing the mpp state every second? That would
be an increase in the number of ioctls we're doing in many cases. For
instance, a 4 path device with all paths working and a default
configuration will sync the map 4 times for every 20 seconds.
Unfortunately, it's possible that all these syncs would happend in a
clump and then we'd wait 20 seconds to rapidly sync 4 times again.  An 8
path device with all paths down would sync 8 times every 5 seconds. In
this case syncing once a second would be less work, but in general this
looks like multipathd doing more ioctls. 

The primary purpose of these periodic syncs is to protect us from
external changes that don't trigger a dm event. This is a rare enough
thing that syncing every max_checkint seems often enough. So is "sync
mpp" just shorthand for "sync mpp if it needs to be synced", or is there
some benefit that I'm overlooking to sync every second?

> 
> for each path
>     if (!pp->mpp and path needs to be checked)
>           check path
> 
> Of course we could do this in multiple steps and move the mpp sync out
> of the path loop first, like you suggested. But if we do this, there
> may be a considerable time delay between checking the paths of a given
> map and syncing the related mpp state, especially on systems with a
> large number of devices, and that might be problematic.
>
> Path status changes are asynchronous by nature, and thus whatever we do
> will not be perfect. IMO it makes sense to do all checks related to a
> given map in as short a time interval as possible. This way we are most
> likely to get a consistent picture of the current state of the map in
> question in the kernel.

I'm not against your idea, but the way to code is now, even doing your
mpp loop doesn't really guarantee that. Just because all the paths get
checked every 20 seconds, doesn't mean that they will be ready to be
checked on the same second. For paths that get added after the multipath
device is created, this is often not the case. And even if the paths
start in sync, there are multiple reasons why they can drop out of sync.

But it we wanted to, we probably could fix that and space the checkers
out better at the same time. I don't think it would be that hard to make
the paths change their next check tick by a second every so often so that
all the paths in the same state from a given multipath device run their
checkers at the same time. We could even make the different multipath
devices spread out when they are checked, so that they aren't trying to
all run their checkers at once.

I'll take a stab at this in my updated patchset.

-Ben

> In the long run, I think what we should do is use asynchronous checkers
> exclusively. And instead of waiting a millisecond for them to complete
> synchronously, we should trigger the path check on all paths (that need
> to be checked) in one go, then wait for all checkers to complete (this
> is the difficult part, I know), and then go over all the results, again
> without waiting for anything.
> 
> Another project I've been wanting to do for a long time :-/
> 
> Regards
> Martin
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
  2024-07-02 18:10               ` Benjamin Marzinski
@ 2024-07-04  7:19                 ` Martin Wilck
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-07-04  7:19 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development

On Tue, 2024-07-02 at 14:10 -0400, Benjamin Marzinski wrote:
> On Tue, Jul 02, 2024 at 09:34:17AM +0200, Martin Wilck wrote:
> > On Mon, 2024-07-01 at 14:41 -0400, Benjamin Marzinski wrote:
> > > 
> > > 
> > > 
> > > My purpose in writing 0f850db wasn't to cut down on ioctls. It
> > > was to
> > > keep set_no_path_retry() from incorrectly handling queueing in
> > > cases
> > > where no_path_retry was set to a number, and the device had no
> > > usable
> > > paths.  My purpose in this patch is also to make sure
> > > set_no_path_retry() does the right thing. Obviously, this one is
> > > more
> > > of
> > > a corner case, and if instead we were occasionally resyncing with
> > > the
> > > kernel even when a multipath device had no paths, that would
> > > solve
> > > this
> > > issue, and I'd be fine with dropping this patch.
> > 
> > That sounds good.
> > 
> > > So unless it conflicts with work you're already doing, I'll send
> > > a
> > > new
> > > patchset that changes checkerloop() to pull the multipath device
> > > updating work out of check_path(). I'm not sure that it makes a
> > > lot
> > > of
> > > sense to change how we loop through the paths, since check_path()
> > > does
> > > work on some paths that aren't assigned to a multipath device. 
> > > Plus
> > > different paths in a multipath device will need checking at
> > > different
> > > times depending on their states anyway, so you can't just go
> > > through
> > > all
> > > the paths at once.
> > 
> > While I agree that syncing the map state just once per checkerloop
> > and
> > map is the right thing, I believe we should do it as soon as we
> > have
> > checked all paths that need to be checked for the given map. And
> > that's
> > most easily done by looping over the mpath vector, like this:
> > 
> > for each mpp
> >     for each path in mpp
> >         if path needs to be checked:
> >             check path
> >     sync mpp
> 
> Huh? Are you suggesting syncing the mpp state every second? That
> would
> be an increase in the number of ioctls we're doing in many cases.

That was not my intention.

My pseudo-code was oversimplified. We could of carry a boolean in the
inner loop that would check whether any of the path checks suggest
doing a map sync, iow, if any of the check_path() invocations 
had proceeded down to the call to update_multipath_strings(). That way
we'd necessarily do less ioctls. No?

From there, we could improve the logic / timing of the syncs further.

> I'm not against your idea, but the way to code is now, even doing
> your
> mpp loop doesn't really guarantee that. Just because all the paths
> get
> checked every 20 seconds, doesn't mean that they will be ready to be
> checked on the same second. For paths that get added after the
> multipath
> device is created, this is often not the case. And even if the paths
> start in sync, there are multiple reasons why they can drop out of
> sync.

True. I am not claiming my idea was perfectly thought through. But I
think the issues you mention could be overcome. For example, we might
check all "healthy" paths of a map at the same time. If a path goes
down and back up again, it might get out of sync with this "rhythm" the
way we handle it now, but we might as well just treat it like the other
healthy paths again.

> 
> But it we wanted to, we probably could fix that and space the
> checkers
> out better at the same time. I don't think it would be that hard to
> make
> the paths change their next check tick by a second every so often so
> that
> all the paths in the same state from a given multipath device run
> their
> checkers at the same time. We could even make the different multipath
> devices spread out when they are checked, so that they aren't trying
> to
> all run their checkers at once.

Interesting idea, even though I think that once we do truly
asynchronous path checking, that won't be much of an issue any more.

> 
> I'll take a stab at this in my updated patchset.

Ok, looking forward to it.

Martin


> 
> -Ben
> 
> > In the long run, I think what we should do is use asynchronous
> > checkers
> > exclusively. And instead of waiting a millisecond for them to
> > complete
> > synchronously, we should trigger the path check on all paths (that
> > need
> > to be checked) in one go, then wait for all checkers to complete
> > (this
> > is the difficult part, I know), and then go over all the results,
> > again
> > without waiting for anything.
> > 
> > Another project I've been wanting to do for a long time :-/
> > 
> > Regards
> > Martin
> > 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-07-04  7:19 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
2024-06-05 23:22 ` [PATCH 1/7] multipathd: fix flush check in flush_map() Benjamin Marzinski
2024-06-27  9:31   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 2/7] libmultipath: check for not PATH_UP in detect_alua Benjamin Marzinski
2024-06-27  9:31   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry() Benjamin Marzinski
2024-06-27  7:47   ` Martin Wilck
2024-06-27 18:32     ` Benjamin Marzinski
2024-06-27 18:39       ` Benjamin Marzinski
2024-06-27 21:31         ` Martin Wilck
2024-07-01 18:41           ` Benjamin Marzinski
2024-07-02  7:34             ` Martin Wilck
2024-07-02 18:10               ` Benjamin Marzinski
2024-07-04  7:19                 ` Martin Wilck
2024-06-05 23:22 ` [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid() Benjamin Marzinski
2024-06-27  9:27   ` Martin Wilck
2024-06-27 18:48     ` Benjamin Marzinski
2024-06-27 20:42       ` Martin Wilck
2024-06-05 23:22 ` [PATCH 5/7] multipath-tools: Makefile.inc: compile with -fexceptions Benjamin Marzinski
2024-06-27  9:32   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 6/7] multipathd: free alias if cli_add_map() is cancelled Benjamin Marzinski
2024-06-27  9:32   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 7/7] multipathd: make cli_add_map() handle adding maps by WWID correctly Benjamin Marzinski
2024-06-27  9:33   ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.