* [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 10:53 ` Martin Wilck
2024-07-15 15:14 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 02/22] multipathd: make cli_add_map() handle adding maps by WWID correctly Benjamin Marzinski
` (20 subsequent siblings)
21 siblings, 2 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
add arguments to dm_map_present_by_wwid() to allow optionally fetching
the device name and minor number for the devices found by WWID. These
will be used by a later patch. Since it can now also be used to get the
name and minor number of a device from its WWID, rename it to
dm_find_map_by_wwid()
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/devmapper.c | 16 ++++++++++++----
libmultipath/devmapper.h | 2 +-
libmultipath/valid.c | 2 +-
tests/valid.c | 12 ++++++------
4 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index bbbadeee..60d786c6 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -872,16 +872,24 @@ int dm_is_mpath(const char *name)
}
}
-int dm_map_present_by_wwid(const char *wwid)
+/* if name is non-NULL, it must point to an array of WWID_SIZE bytes */
+int dm_find_map_by_wwid(const char *wwid, char *name, int *minor)
{
char tmp[DM_UUID_LEN];
+ struct dm_info dmi;
+ int rc;
if (safe_sprintf(tmp, UUID_PREFIX "%s", wwid))
return DMP_ERR;
- return libmp_mapinfo(DM_MAP_BY_UUID,
- (mapid_t) { .str = tmp },
- (mapinfo_t) { .name = NULL });
+ rc = libmp_mapinfo(DM_MAP_BY_UUID,
+ (mapid_t) { .str = tmp },
+ (mapinfo_t) { .name = name,
+ .dmi = minor ? &dmi : NULL });
+ if (rc == DMP_OK && minor)
+ *minor = dmi.minor;
+
+ return rc;
}
static int dm_dev_t (const char *mapname, char *dev_t, int len)
diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
index c28991fb..1de694c9 100644
--- a/libmultipath/devmapper.h
+++ b/libmultipath/devmapper.h
@@ -132,7 +132,7 @@ int dm_simplecmd_flush (int task, const char *name, uint16_t udev_flags);
int dm_simplecmd_noflush (int task, const char *name, uint16_t udev_flags);
int dm_addmap_create (struct multipath *mpp, char *params);
int dm_addmap_reload (struct multipath *mpp, char *params, int flush);
-int dm_map_present_by_wwid(const char *uuid);
+int dm_find_map_by_wwid(const char *wwid, char *name, int *minor);
enum {
DM_IS_MPATH_NO,
diff --git a/libmultipath/valid.c b/libmultipath/valid.c
index 9267cef9..b7e0cc9b 100644
--- a/libmultipath/valid.c
+++ b/libmultipath/valid.c
@@ -360,7 +360,7 @@ is_path_valid(const char *name, struct config *conf, struct path *pp,
if (check_wwids_file(pp->wwid, 0) == 0)
return PATH_IS_VALID_NO_CHECK;
- if (dm_map_present_by_wwid(pp->wwid) == DMP_OK)
+ if (dm_find_map_by_wwid(pp->wwid, NULL, NULL) == DMP_OK)
return PATH_IS_VALID;
/* all these act like FIND_MULTIPATHS_STRICT for finding if a
diff --git a/tests/valid.c b/tests/valid.c
index a93bbe50..d9325f8b 100644
--- a/tests/valid.c
+++ b/tests/valid.c
@@ -189,10 +189,10 @@ int __wrap_check_wwids_file(char *wwid, int write_wwid)
return -1;
}
-int __wrap_dm_map_present_by_wwid(const char *uuid)
+int __wrap_dm_find_map_by_wwid(const char *wwid, char *name, int *minor)
{
int ret = mock_type(int);
- assert_string_equal(uuid, mock_ptr_type(char *));
+ assert_string_equal(wwid, mock_ptr_type(char *));
return ret;
}
@@ -271,8 +271,8 @@ static void setup_passing(char *name, char *wwid, unsigned int check_multipathd,
will_return(__wrap_check_wwids_file, wwid);
if (stage == STAGE_CHECK_WWIDS)
return;
- will_return(__wrap_dm_map_present_by_wwid, 0);
- will_return(__wrap_dm_map_present_by_wwid, wwid);
+ will_return(__wrap_dm_find_map_by_wwid, 0);
+ will_return(__wrap_dm_find_map_by_wwid, wwid);
}
static void test_bad_arguments(void **state)
@@ -516,8 +516,8 @@ static void test_check_uuid_present(void **state)
memset(&pp, 0, sizeof(pp));
conf.find_multipaths = FIND_MULTIPATHS_STRICT;
setup_passing(name, wwid, CHECK_MPATHD_RUNNING, STAGE_CHECK_WWIDS);
- will_return(__wrap_dm_map_present_by_wwid, 1);
- will_return(__wrap_dm_map_present_by_wwid, wwid);
+ will_return(__wrap_dm_find_map_by_wwid, 1);
+ will_return(__wrap_dm_find_map_by_wwid, wwid);
assert_int_equal(is_path_valid(name, &conf, &pp, true),
PATH_IS_VALID);
assert_string_equal(pp.dev, name);
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs
2024-07-13 6:04 ` [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs Benjamin Marzinski
@ 2024-07-15 10:53 ` Martin Wilck
2024-07-15 15:51 ` Martin Wilck
2024-07-15 15:14 ` Martin Wilck
1 sibling, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 10:53 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
Hi Ben,
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> add arguments to dm_map_present_by_wwid() to allow optionally
> fetching
> the device name and minor number for the devices found by WWID.
> These
> will be used by a later patch. Since it can now also be used to get
> the
> name and minor number of a device from its WWID, rename it to
> dm_find_map_by_wwid()
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Is this supposed to apply over my v1 patchseries? That patch series
needed major amends, and I'm getting merge conflicts trying to apply it
on v2. I can untangle that, but it'd be easier to figure out if we'd
setted on a final reviewed version of my patch set.
Regards
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs
2024-07-15 10:53 ` Martin Wilck
@ 2024-07-15 15:51 ` Martin Wilck
0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 15:51 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Mon, 2024-07-15 at 12:53 +0200, Martin Wilck wrote:
> Hi Ben,
>
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > add arguments to dm_map_present_by_wwid() to allow optionally
> > fetching
> > the device name and minor number for the devices found by WWID.
> > These
> > will be used by a later patch. Since it can now also be used to get
> > the
> > name and minor number of a device from its WWID, rename it to
> > dm_find_map_by_wwid()
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Is this supposed to apply over my v1 patchseries? That patch series
> needed major amends, and I'm getting merge conflicts trying to apply
> it
> on v2. I can untangle that, but it'd be easier to figure out if we'd
> setted on a final reviewed version of my patch set.
Never mind, my email filter seems to have swallowed part of the series.
The full series applies cleanly to my v2 tree.
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs
2024-07-13 6:04 ` [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs Benjamin Marzinski
2024-07-15 10:53 ` Martin Wilck
@ 2024-07-15 15:14 ` Martin Wilck
2024-07-16 14:22 ` Benjamin Marzinski
1 sibling, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 15:14 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> add arguments to dm_map_present_by_wwid() to allow optionally
> fetching
> the device name and minor number for the devices found by WWID.
> These
> will be used by a later patch. Since it can now also be used to get
> the
> name and minor number of a device from its WWID, rename it to
> dm_find_map_by_wwid()
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/devmapper.c | 16 ++++++++++++----
> libmultipath/devmapper.h | 2 +-
> libmultipath/valid.c | 2 +-
> tests/valid.c | 12 ++++++------
> 4 files changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
> index c28991fb..1de694c9 100644
> --- a/libmultipath/devmapper.h
> +++ b/libmultipath/devmapper.h
> @@ -132,7 +132,7 @@ int dm_simplecmd_flush (int task, const char
> *name, uint16_t udev_flags);
> int dm_simplecmd_noflush (int task, const char *name, uint16_t
> udev_flags);
> int dm_addmap_create (struct multipath *mpp, char *params);
> int dm_addmap_reload (struct multipath *mpp, char *params, int
> flush);
> -int dm_map_present_by_wwid(const char *uuid);
> +int dm_find_map_by_wwid(const char *wwid, char *name, int *minor);
>
Nitpick: According to my personal taste, it would be cleaner to pass in
the entire dm_info structure rather than just the minor number. One day
we may need other fields as well.
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs
2024-07-15 15:14 ` Martin Wilck
@ 2024-07-16 14:22 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 14:22 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 05:14:36PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > add arguments to dm_map_present_by_wwid() to allow optionally
> > fetching
> > the device name and minor number for the devices found by WWID.
> > These
> > will be used by a later patch. Since it can now also be used to get
> > the
> > name and minor number of a device from its WWID, rename it to
> > dm_find_map_by_wwid()
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > libmultipath/devmapper.c | 16 ++++++++++++----
> > libmultipath/devmapper.h | 2 +-
> > libmultipath/valid.c | 2 +-
> > tests/valid.c | 12 ++++++------
> > 4 files changed, 20 insertions(+), 12 deletions(-)
> >
> > diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
> > index c28991fb..1de694c9 100644
> > --- a/libmultipath/devmapper.h
> > +++ b/libmultipath/devmapper.h
> > @@ -132,7 +132,7 @@ int dm_simplecmd_flush (int task, const char
> > *name, uint16_t udev_flags);
> > int dm_simplecmd_noflush (int task, const char *name, uint16_t
> > udev_flags);
> > int dm_addmap_create (struct multipath *mpp, char *params);
> > int dm_addmap_reload (struct multipath *mpp, char *params, int
> > flush);
> > -int dm_map_present_by_wwid(const char *uuid);
> > +int dm_find_map_by_wwid(const char *wwid, char *name, int *minor);
> >
>
> Nitpick: According to my personal taste, it would be cleaner to pass in
> the entire dm_info structure rather than just the minor number. One day
> we may need other fields as well.
Sure.
>
> Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 02/22] multipathd: make cli_add_map() handle adding maps by WWID correctly
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 03/22] multipathd: remove checker restart optimization Benjamin Marzinski
` (19 subsequent siblings)
21 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
cli_add_map() runs filter_wwid() on the input param as if it were a
WWID, but dm_get_major_minor() will never find the multipath device if
the user actually passes in a WWID. To handle this case, call
get_refwwid() early in the function, and use dm_find_map_by_wwid() to
check if the map exists, and find its alias and minor number.
Also, the do/while loop is unnecessarily confusing and only avoids one
repeated function call. Remove it to simplify cli_add_map().
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/libmultipath.version | 5 +++
multipathd/cli_handlers.c | 59 ++++++++++---------------------
2 files changed, 24 insertions(+), 40 deletions(-)
diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index 959f675e..d074032b 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -242,3 +242,8 @@ global:
local:
*;
};
+
+LIBMULTIPATH_25.1.0 {
+global:
+ dm_find_map_by_wwid;
+} LIBMULTIPATH_25.0.0;
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 0106213e..0643be15 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -697,57 +697,36 @@ cli_add_map (void * v, struct strbuf *reply, void * data)
{
struct vectors * vecs = (struct vectors *)data;
char * param = get_keyparam(v, KEY_MAP);
- int major = -1, minor = -1;
char dev_path[FILE_NAME_SIZE];
- char *refwwid;
- char *alias __attribute__((cleanup(cleanup_charp))) = NULL;
- int rc, count = 0;
- struct config *conf;
- int invalid = 0;
+ char *refwwid __attribute__((cleanup(cleanup_charp))) = NULL;
+ char alias[WWID_SIZE];
+ int rc, minor = -1;
param = convert_dev(param, 0);
condlog(2, "%s: add map (operator)", param);
- conf = get_multipath_config();
- pthread_cleanup_push(put_multipath_config, conf);
- if (filter_wwid(conf->blist_wwid, conf->elist_wwid, param, NULL) > 0)
- invalid = 1;
- pthread_cleanup_pop(1);
- if (invalid) {
+ if (get_refwwid(CMD_NONE, param, DEV_DEVMAP, vecs->pathvec, &refwwid) ==
+ PATHINFO_SKIPPED) {
append_strbuf_str(reply, "blacklisted\n");
condlog(2, "%s: map blacklisted", param);
return 1;
+ } else if (!refwwid) {
+ condlog(2, "%s: unknown map.", param);
+ return -ENODEV;
}
- do {
- if (dm_get_major_minor(param, &major, &minor) < 0)
- condlog(count ? 2 : 3,
- "%s: not a device mapper table", param);
- else {
- sprintf(dev_path, "dm-%d", minor);
- alias = dm_mapname(major, minor);
+ if (dm_find_map_by_wwid(refwwid, alias, &minor) != DMP_OK) {
+ condlog(3, "%s: map not present. creating", param);
+ if (coalesce_paths(vecs, NULL, refwwid, FORCE_RELOAD_NONE,
+ CMD_NONE) != CP_OK) {
+ condlog(2, "%s: coalesce_paths failed", param);
+ return 1;
+ }
+ if (dm_find_map_by_wwid(refwwid, alias, &minor) != DMP_OK) {
+ condlog(2, "%s: failed getting map", param);
+ return 1;
}
- /*if there is no mapname found, we first create the device*/
- if (!alias && !count) {
- condlog(3, "%s: mapname not found for %d:%d",
- param, major, minor);
- get_refwwid(CMD_NONE, param, DEV_DEVMAP,
- vecs->pathvec, &refwwid);
- if (refwwid) {
- if (coalesce_paths(vecs, NULL, refwwid,
- FORCE_RELOAD_NONE, CMD_NONE)
- != CP_OK)
- condlog(2, "%s: coalesce_paths failed",
- param);
- free(refwwid);
- }
- } /*we attempt to create device only once*/
- count++;
- } while (!alias && (count < 2));
-
- if (!alias) {
- condlog(2, "%s: add map failed", param);
- return 1;
}
+ sprintf(dev_path, "dm-%d", minor);
rc = ev_add_map(dev_path, alias, vecs);
return rc;
}
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 03/22] multipathd: remove checker restart optimization
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 01/22] libmultipath: rename dm_map_present_by_wwid() and add outputs Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 02/22] multipathd: make cli_add_map() handle adding maps by WWID correctly Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 04/22] multipathd: refactor path state getting code into a helper Benjamin Marzinski
` (18 subsequent siblings)
21 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Future commits will make this optimization unusable.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 27 +++++----------------------
1 file changed, 5 insertions(+), 22 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 386cd076..0b0ebc8a 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2704,7 +2704,7 @@ checkerloop (void *ap)
while (1) {
struct timespec diff_time, start_time, end_time;
- int num_paths = 0, strict_timing, rc = 0, i = 0;
+ int num_paths = 0, strict_timing, rc = 0;
unsigned int ticks = 0;
enum checker_state checker_state = CHECKER_STARTING;
@@ -2723,7 +2723,7 @@ checkerloop (void *ap)
sd_notify(0, "WATCHDOG=1");
#endif
while (checker_state != CHECKER_FINISHED) {
- unsigned int paths_checked = 0;
+ unsigned int paths_checked = 0, i;
struct timespec chk_start_time;
pthread_cleanup_push(cleanup_lock, &vecs->lock);
@@ -2733,28 +2733,11 @@ checkerloop (void *ap)
if (checker_state == CHECKER_STARTING) {
vector_foreach_slot(vecs->pathvec, pp, i)
pp->is_checked = false;
- i = 0;
checker_state = CHECKER_RUNNING;
- } else {
- /*
- * Paths could have been removed since we
- * dropped the lock. Find the path to continue
- * checking at. Since paths can be removed from
- * anywhere in the vector, but can only be added
- * at the end, the last checked path must be
- * between its old location, and the start or
- * the vector.
- */
- if (i >= VECTOR_SIZE(vecs->pathvec))
- i = VECTOR_SIZE(vecs->pathvec) - 1;
- while ((pp = VECTOR_SLOT(vecs->pathvec, i))) {
- if (pp->is_checked == true)
- break;
- i--;
- }
- i++;
}
- vector_foreach_slot_after (vecs->pathvec, pp, i) {
+ vector_foreach_slot(vecs->pathvec, pp, i) {
+ if (pp->is_checked)
+ continue;
pp->is_checked = true;
rc = check_path(vecs, pp, ticks);
if (rc < 0) {
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 04/22] multipathd: refactor path state getting code into a helper
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (2 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 03/22] multipathd: remove checker restart optimization Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 15:19 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 05/22] multipathd: handle uninitialized paths in new function Benjamin Marzinski
` (17 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Pull the code that gets the new path state out into a helper function
named check_path_state(), in preparation for splittig check_path()
into two functions.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 64 ++++++++++++++++++++++++++++-------------------
1 file changed, 38 insertions(+), 26 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 0b0ebc8a..9b1b5226 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2305,6 +2305,43 @@ should_skip_path(struct path *pp){
return 0;
}
+static int
+check_path_state(struct path *pp)
+{
+ int newstate;
+ struct config *conf;
+
+ newstate = path_offline(pp);
+ if (newstate == PATH_UP) {
+ conf = get_multipath_config();
+ pthread_cleanup_push(put_multipath_config, conf);
+ newstate = get_state(pp, conf, 1, newstate);
+ pthread_cleanup_pop(1);
+ } else {
+ checker_clear_message(&pp->checker);
+ condlog(3, "%s: state %s, checker not called",
+ pp->dev, checker_state_name(newstate));
+ }
+ /*
+ * Wait for uevent for removed paths;
+ * some LLDDs like zfcp keep paths unavailable
+ * without sending uevents.
+ */
+ if (newstate == PATH_REMOVED)
+ newstate = PATH_DOWN;
+
+ if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
+ condlog(2, "%s: unusable path (%s) - checker failed",
+ pp->dev, checker_state_name(newstate));
+ LOG_MSG(2, pp);
+ conf = get_multipath_config();
+ pthread_cleanup_push(put_multipath_config, conf);
+ pathinfo(pp, conf, 0);
+ pthread_cleanup_pop(1);
+ }
+ return newstate;
+}
+
/*
* Returns '1' if the path has been checked, '-1' if it was blacklisted
* and '0' otherwise
@@ -2384,33 +2421,8 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
*/
pp->tick = checkint;
- newstate = path_offline(pp);
- if (newstate == PATH_UP) {
- conf = get_multipath_config();
- pthread_cleanup_push(put_multipath_config, conf);
- newstate = get_state(pp, conf, 1, newstate);
- pthread_cleanup_pop(1);
- } else {
- checker_clear_message(&pp->checker);
- condlog(3, "%s: state %s, checker not called",
- pp->dev, checker_state_name(newstate));
- }
- /*
- * Wait for uevent for removed paths;
- * some LLDDs like zfcp keep paths unavailable
- * without sending uevents.
- */
- if (newstate == PATH_REMOVED)
- newstate = PATH_DOWN;
-
+ newstate = check_path_state(pp);
if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
- condlog(2, "%s: unusable path (%s) - checker failed",
- pp->dev, checker_state_name(newstate));
- LOG_MSG(2, pp);
- conf = get_multipath_config();
- pthread_cleanup_push(put_multipath_config, conf);
- pathinfo(pp, conf, 0);
- pthread_cleanup_pop(1);
return 1;
} else if ((newstate != PATH_UP && newstate != PATH_GHOST &&
newstate != PATH_PENDING) && (pp->state == PATH_DELAYED)) {
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 04/22] multipathd: refactor path state getting code into a helper
2024-07-13 6:04 ` [PATCH 04/22] multipathd: refactor path state getting code into a helper Benjamin Marzinski
@ 2024-07-15 15:19 ` Martin Wilck
2024-07-15 16:07 ` Martin Wilck
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 15:19 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> Pull the code that gets the new path state out into a helper function
> named check_path_state(), in preparation for splittig check_path()
> into two functions.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> multipathd/main.c | 64 ++++++++++++++++++++++++++++-----------------
> --
> 1 file changed, 38 insertions(+), 26 deletions(-)
>
> -
> + newstate = check_path_state(pp);
> if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
> - condlog(2, "%s: unusable path (%s) - checker
> failed",
> - pp->dev, checker_state_name(newstate));
> - LOG_MSG(2, pp);
> - conf = get_multipath_config();
> - pthread_cleanup_push(put_multipath_config, conf);
> - pathinfo(pp, conf, 0);
> - pthread_cleanup_pop(1);
> return 1;
Nit: Remove braces here?
> } else if ((newstate != PATH_UP && newstate != PATH_GHOST &&
> newstate != PATH_PENDING) && (pp->state ==
> PATH_DELAYED)) {
Thanks,
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 04/22] multipathd: refactor path state getting code into a helper
2024-07-15 15:19 ` Martin Wilck
@ 2024-07-15 16:07 ` Martin Wilck
0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 16:07 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Mon, 2024-07-15 at 17:19 +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > return 1;
>
> Nit: Remove braces here?
Ok, you did this later on.
Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 05/22] multipathd: handle uninitialized paths in new function
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (3 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 04/22] multipathd: refactor path state getting code into a helper Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 15:34 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 06/22] multipathd: make check_path() static Benjamin Marzinski
` (16 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Pull the code to handle uninitialized paths out of check_path() and into
a new function called handle_uninitialized_paths(). This should cause
no functional changes.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 171 ++++++++++++++++++++++++++++------------------
1 file changed, 105 insertions(+), 66 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 9b1b5226..9e329e89 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2343,8 +2343,7 @@ check_path_state(struct path *pp)
}
/*
- * Returns '1' if the path has been checked, '-1' if it was blacklisted
- * and '0' otherwise
+ * Returns '1' if the path has been checked and '0' otherwise
*/
int
check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
@@ -2354,16 +2353,13 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
int chkr_new_path_up = 0;
int disable_reinstate = 0;
int oldchkrstate = pp->chkrstate;
- int retrigger_tries;
unsigned int checkint, max_checkint;
struct config *conf;
int marginal_pathgroups, marginal_changed = 0;
int ret;
bool need_reload;
- if (((pp->initialized == INIT_OK || pp->initialized == INIT_PARTIAL ||
- pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) ||
- pp->initialized == INIT_REMOVED)
+ if (pp->initialized == INIT_REMOVED)
return 0;
if (pp->tick)
@@ -2372,7 +2368,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
return 0; /* don't check this path yet */
conf = get_multipath_config();
- retrigger_tries = conf->retrigger_tries;
checkint = conf->checkint;
max_checkint = conf->max_checkint;
marginal_pathgroups = conf->marginal_pathgroups;
@@ -2383,38 +2378,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
pp->checkint = checkint;
};
- if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) {
- if (pp->retriggers < retrigger_tries) {
- static const char change[] = "change";
- ssize_t ret;
-
- condlog(2, "%s: triggering change event to reinitialize",
- pp->dev);
- pp->initialized = INIT_REQUESTED_UDEV;
- pp->retriggers++;
- ret = sysfs_attr_set_value(pp->udev, "uevent", change,
- sizeof(change) - 1);
- if (ret != sizeof(change) - 1)
- log_sysfs_attr_set_value(1, ret,
- "%s: failed to trigger change event",
- pp->dev);
- return 0;
- } else {
- condlog(1, "%s: not initialized after %d udev retriggers",
- pp->dev, retrigger_tries);
- /*
- * Make sure that the "add missing path" code path
- * below may reinstate the path later, if it ever
- * comes up again.
- * The WWID needs not be cleared; if it was set, the
- * state hadn't been INIT_MISSING_UDEV in the first
- * place.
- */
- pp->initialized = INIT_FAILED;
- return 0;
- }
- }
-
/*
* provision a next check soonest,
* in case we exit abnormally from here
@@ -2435,32 +2398,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
pp->checkint = checkint;
return 1;
}
- if (!pp->mpp) {
- if (!strlen(pp->wwid) &&
- (pp->initialized == INIT_FAILED ||
- pp->initialized == INIT_NEW) &&
- (newstate == PATH_UP || newstate == PATH_GHOST)) {
- condlog(2, "%s: add missing path", pp->dev);
- conf = get_multipath_config();
- pthread_cleanup_push(put_multipath_config, conf);
- ret = pathinfo(pp, conf, DI_ALL | DI_BLACKLIST);
- pthread_cleanup_pop(1);
- /* INIT_OK implies ret == PATHINFO_OK */
- if (pp->initialized == INIT_OK) {
- ev_add_path(pp, vecs, 1);
- pp->tick = 1;
- } else {
- if (ret == PATHINFO_SKIPPED)
- return -1;
- /*
- * We failed multiple times to initialize this
- * path properly. Don't re-check too often.
- */
- pp->checkint = max_checkint;
- }
- }
- return 0;
- }
/*
* Async IO in flight. Keep the previous path state
* and reschedule as soon as possible
@@ -2679,6 +2616,104 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
return 1;
}
+/*
+ * Returns -1 if the path was blacklisted, and 0 otherwise
+ */
+static int
+handle_uninitialized_path(struct vectors * vecs, struct path * pp,
+ unsigned int ticks)
+{
+ int newstate;
+ int retrigger_tries;
+ unsigned int checkint, max_checkint;
+ struct config *conf;
+ int ret;
+
+ if (((pp->initialized == INIT_OK || pp->initialized == INIT_PARTIAL ||
+ pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) ||
+ pp->initialized == INIT_REMOVED)
+ return 0;
+
+ if (pp->tick)
+ pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
+ if (pp->tick)
+ return 0; /* don't check this path yet */
+
+ conf = get_multipath_config();
+ retrigger_tries = conf->retrigger_tries;
+ checkint = conf->checkint;
+ max_checkint = conf->max_checkint;
+ put_multipath_config(conf);
+
+ if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) {
+ if (pp->retriggers < retrigger_tries) {
+ static const char change[] = "change";
+ ssize_t ret;
+
+ condlog(2, "%s: triggering change event to reinitialize",
+ pp->dev);
+ pp->initialized = INIT_REQUESTED_UDEV;
+ pp->retriggers++;
+ ret = sysfs_attr_set_value(pp->udev, "uevent", change,
+ sizeof(change) - 1);
+ if (ret != sizeof(change) - 1)
+ log_sysfs_attr_set_value(1, ret,
+ "%s: failed to trigger change event",
+ pp->dev);
+ return 0;
+ } else {
+ condlog(1, "%s: not initialized after %d udev retriggers",
+ pp->dev, retrigger_tries);
+ /*
+ * Make sure that the "add missing path" code path
+ * below may reinstate the path later, if it ever
+ * comes up again.
+ * The WWID needs not be cleared; if it was set, the
+ * state hadn't been INIT_MISSING_UDEV in the first
+ * place.
+ */
+ pp->initialized = INIT_FAILED;
+ return 0;
+ }
+ }
+
+ /*
+ * provision a next check soonest,
+ * in case we exit abnormally from here
+ */
+ pp->tick = checkint;
+
+ newstate = check_path_state(pp);
+
+ if (!pp->mpp) {
+ if (!strlen(pp->wwid) &&
+ (pp->initialized == INIT_FAILED ||
+ pp->initialized == INIT_NEW) &&
+ (newstate == PATH_UP || newstate == PATH_GHOST)) {
+ condlog(2, "%s: add missing path", pp->dev);
+ conf = get_multipath_config();
+ pthread_cleanup_push(put_multipath_config, conf);
+ ret = pathinfo(pp, conf, DI_ALL | DI_BLACKLIST);
+ pthread_cleanup_pop(1);
+ /* INIT_OK implies ret == PATHINFO_OK */
+ if (pp->initialized == INIT_OK) {
+ ev_add_path(pp, vecs, 1);
+ pp->tick = 1;
+ } else {
+ if (ret == PATHINFO_SKIPPED)
+ return -1;
+ /*
+ * We failed multiple times to initialize this
+ * path properly. Don't re-check too often.
+ */
+ pp->checkint = max_checkint;
+ }
+ }
+ return 0;
+ }
+ return 0;
+}
+
enum checker_state {
CHECKER_STARTING,
CHECKER_RUNNING,
@@ -2751,7 +2786,11 @@ checkerloop (void *ap)
if (pp->is_checked)
continue;
pp->is_checked = true;
- rc = check_path(vecs, pp, ticks);
+ if (pp->mpp)
+ rc = check_path(vecs, pp, ticks);
+ else
+ rc = handle_uninitialized_path(vecs, pp,
+ ticks);
if (rc < 0) {
condlog(1, "%s: check_path() failed, removing",
pp->dev);
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 05/22] multipathd: handle uninitialized paths in new function
2024-07-13 6:04 ` [PATCH 05/22] multipathd: handle uninitialized paths in new function Benjamin Marzinski
@ 2024-07-15 15:34 ` Martin Wilck
2024-07-15 15:36 ` Martin Wilck
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 15:34 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> Pull the code to handle uninitialized paths out of check_path() and
> into
> a new function called handle_uninitialized_paths(). This should cause
> no functional changes.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> multipathd/main.c | 171 ++++++++++++++++++++++++++++----------------
> --
> 1 file changed, 105 insertions(+), 66 deletions(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 9b1b5226..9e329e89 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2343,8 +2343,7 @@ check_path_state(struct path *pp)
> }
>
> /*
> - * Returns '1' if the path has been checked, '-1' if it was
> blacklisted
> - * and '0' otherwise
> + * Returns '1' if the path has been checked and '0' otherwise
> */
> int
> check_path (struct vectors * vecs, struct path * pp, unsigned int
> ticks)
> @@ -2354,16 +2353,13 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
> int chkr_new_path_up = 0;
> int disable_reinstate = 0;
> int oldchkrstate = pp->chkrstate;
> - int retrigger_tries;
> unsigned int checkint, max_checkint;
> struct config *conf;
> int marginal_pathgroups, marginal_changed = 0;
> int ret;
> bool need_reload;
>
> - if (((pp->initialized == INIT_OK || pp->initialized ==
> INIT_PARTIAL ||
> - pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp)
> ||
> - pp->initialized == INIT_REMOVED)
> + if (pp->initialized == INIT_REMOVED)
> return 0;
>
> if (pp->tick)
> @@ -2372,7 +2368,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> return 0; /* don't check this path yet */
>
> conf = get_multipath_config();
> - retrigger_tries = conf->retrigger_tries;
> checkint = conf->checkint;
> max_checkint = conf->max_checkint;
> marginal_pathgroups = conf->marginal_pathgroups;
> @@ -2383,38 +2378,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> pp->checkint = checkint;
> };
>
> - if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) {
> - if (pp->retriggers < retrigger_tries) {
> - static const char change[] = "change";
> - ssize_t ret;
> -
> - condlog(2, "%s: triggering change event to
> reinitialize",
> - pp->dev);
> - pp->initialized = INIT_REQUESTED_UDEV;
> - pp->retriggers++;
> - ret = sysfs_attr_set_value(pp->udev,
> "uevent", change,
> - sizeof(change) -
> 1);
> - if (ret != sizeof(change) - 1)
> - log_sysfs_attr_set_value(1, ret,
> - "%s: failed
> to trigger change event",
> - pp->dev);
> - return 0;
> - } else {
> - condlog(1, "%s: not initialized after %d
> udev retriggers",
> - pp->dev, retrigger_tries);
> - /*
> - * Make sure that the "add missing path"
> code path
> - * below may reinstate the path later, if it
> ever
> - * comes up again.
> - * The WWID needs not be cleared; if it was
> set, the
> - * state hadn't been INIT_MISSING_UDEV in
> the first
> - * place.
> - */
> - pp->initialized = INIT_FAILED;
> - return 0;
> - }
> - }
> -
> /*
> * provision a next check soonest,
> * in case we exit abnormally from here
> @@ -2435,32 +2398,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> pp->checkint = checkint;
> return 1;
> }
> - if (!pp->mpp) {
> - if (!strlen(pp->wwid) &&
> - (pp->initialized == INIT_FAILED ||
> - pp->initialized == INIT_NEW) &&
> - (newstate == PATH_UP || newstate == PATH_GHOST))
> {
> - condlog(2, "%s: add missing path", pp->dev);
> - conf = get_multipath_config();
> - pthread_cleanup_push(put_multipath_config,
> conf);
> - ret = pathinfo(pp, conf, DI_ALL |
> DI_BLACKLIST);
> - pthread_cleanup_pop(1);
> - /* INIT_OK implies ret == PATHINFO_OK */
> - if (pp->initialized == INIT_OK) {
> - ev_add_path(pp, vecs, 1);
> - pp->tick = 1;
> - } else {
> - if (ret == PATHINFO_SKIPPED)
> - return -1;
> - /*
> - * We failed multiple times to
> initialize this
> - * path properly. Don't re-check too
> often.
> - */
> - pp->checkint = max_checkint;
> - }
> - }
> - return 0;
> - }
> /*
> * Async IO in flight. Keep the previous path state
> * and reschedule as soon as possible
> @@ -2679,6 +2616,104 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
> return 1;
> }
>
> +/*
> + * Returns -1 if the path was blacklisted, and 0 otherwise
> + */
> +static int
> +handle_uninitialized_path(struct vectors * vecs, struct path * pp,
> + unsigned int ticks)
> +{
> + int newstate;
> + int retrigger_tries;
> + unsigned int checkint, max_checkint;
> + struct config *conf;
> + int ret;
> +
> + if (((pp->initialized == INIT_OK || pp->initialized ==
> INIT_PARTIAL ||
> + pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp)
> ||
> + pp->initialized == INIT_REMOVED)
> + return 0;
!pp->mpp always holds if this function is called, so we can skip this
part of the test.
> +
> + if (pp->tick)
> + pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
> + if (pp->tick)
> + return 0; /* don't check this path yet */
IMO it would make sense to move this code into checkerloop(). Actually
we could make sure that for those paths that are initialized but not
part of an mpp (see above), pp->tick is always zero.
> +
> + conf = get_multipath_config();
> + retrigger_tries = conf->retrigger_tries;
> + checkint = conf->checkint;
> + max_checkint = conf->max_checkint;
> + put_multipath_config(conf);
> +
> + if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) {
> + if (pp->retriggers < retrigger_tries) {
> + static const char change[] = "change";
> + ssize_t ret;
> +
> + condlog(2, "%s: triggering change event to
> reinitialize",
> + pp->dev);
> + pp->initialized = INIT_REQUESTED_UDEV;
> + pp->retriggers++;
> + ret = sysfs_attr_set_value(pp->udev,
> "uevent", change,
> + sizeof(change) -
> 1);
> + if (ret != sizeof(change) - 1)
> + log_sysfs_attr_set_value(1, ret,
> + "%s: failed
> to trigger change event",
> + pp->dev);
> + return 0;
> + } else {
> + condlog(1, "%s: not initialized after %d
> udev retriggers",
> + pp->dev, retrigger_tries);
> + /*
> + * Make sure that the "add missing path"
> code path
> + * below may reinstate the path later, if it
> ever
> + * comes up again.
> + * The WWID needs not be cleared; if it was
> set, the
> + * state hadn't been INIT_MISSING_UDEV in
> the first
> + * place.
> + */
> + pp->initialized = INIT_FAILED;
> + return 0;
> + }
> + }
> +
> + /*
> + * provision a next check soonest,
> + * in case we exit abnormally from here
> + */
> + pp->tick = checkint;
> +
> + newstate = check_path_state(pp);
> +
> + if (!pp->mpp) {
Again, this test can be skipped, or am I overlooking something?
> + if (!strlen(pp->wwid) &&
> + (pp->initialized == INIT_FAILED ||
> + pp->initialized == INIT_NEW) &&
> + (newstate == PATH_UP || newstate == PATH_GHOST))
> {
> + condlog(2, "%s: add missing path", pp->dev);
> + conf = get_multipath_config();
> + pthread_cleanup_push(put_multipath_config,
> conf);
> + ret = pathinfo(pp, conf, DI_ALL |
> DI_BLACKLIST);
> + pthread_cleanup_pop(1);
> + /* INIT_OK implies ret == PATHINFO_OK */
> + if (pp->initialized == INIT_OK) {
> + ev_add_path(pp, vecs, 1);
> + pp->tick = 1;
> + } else {
> + if (ret == PATHINFO_SKIPPED)
> + return -1;
> + /*
> + * We failed multiple times to
> initialize this
> + * path properly. Don't re-check too
> often.
> + */
> + pp->checkint = max_checkint;
> + }
> + }
> + return 0;
> + }
> + return 0;
> +}
> +
> enum checker_state {
> CHECKER_STARTING,
> CHECKER_RUNNING,
> @@ -2751,7 +2786,11 @@ checkerloop (void *ap)
> if (pp->is_checked)
> continue;
> pp->is_checked = true;
> - rc = check_path(vecs, pp, ticks);
> + if (pp->mpp)
> + rc = check_path(vecs, pp,
> ticks);
> + else
> + rc =
> handle_uninitialized_path(vecs, pp,
> +
> ticks);
> if (rc < 0) {
> condlog(1, "%s: check_path()
> failed, removing",
> pp->dev);
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 05/22] multipathd: handle uninitialized paths in new function
2024-07-15 15:34 ` Martin Wilck
@ 2024-07-15 15:36 ` Martin Wilck
2024-07-16 14:23 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 15:36 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Mon, 2024-07-15 at 17:34 +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > Pull the code to handle uninitialized paths out of check_path() and
> > into
> > a new function called handle_uninitialized_paths(). This should
> > cause
> > no functional changes.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > multipathd/main.c | 171 ++++++++++++++++++++++++++++--------------
> > --
> > --
> > 1 file changed, 105 insertions(+), 66 deletions(-)
>
> !pp->mpp always holds if this function is called, so we can skip this
> part of the test.
>
Just realized that you did this in patch 7.
LGTM, then, but if you resubmit, I suggest you just squash these two.
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 05/22] multipathd: handle uninitialized paths in new function
2024-07-15 15:36 ` Martin Wilck
@ 2024-07-16 14:23 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 14:23 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 05:36:54PM +0200, Martin Wilck wrote:
> On Mon, 2024-07-15 at 17:34 +0200, Martin Wilck wrote:
> > On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > > Pull the code to handle uninitialized paths out of check_path() and
> > > into
> > > a new function called handle_uninitialized_paths(). This should
> > > cause
> > > no functional changes.
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > > multipathd/main.c | 171 ++++++++++++++++++++++++++++--------------
> > > --
> > > --
> > > 1 file changed, 105 insertions(+), 66 deletions(-)
> >
> > !pp->mpp always holds if this function is called, so we can skip this
> > part of the test.
> >
>
> Just realized that you did this in patch 7.
>
> LGTM, then, but if you resubmit, I suggest you just squash these two.
Sure.
>
> Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 06/22] multipathd: make check_path() static
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (4 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 05/22] multipathd: handle uninitialized paths in new function Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 07/22] multipathd: remove redundant checks in handle_uninitialized_path() Benjamin Marzinski
` (15 subsequent siblings)
21 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 9e329e89..276329c4 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2345,7 +2345,7 @@ check_path_state(struct path *pp)
/*
* Returns '1' if the path has been checked and '0' otherwise
*/
-int
+static int
check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
{
int newstate;
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 07/22] multipathd: remove redundant checks in handle_uninitialized_path()
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (5 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 06/22] multipathd: make check_path() static Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 08/22] multipathd: check paths immediately after failing udev initialization Benjamin Marzinski
` (14 subsequent siblings)
21 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 51 +++++++++++++++++++++--------------------------
1 file changed, 23 insertions(+), 28 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 276329c4..d6f4704b 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2629,9 +2629,8 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
struct config *conf;
int ret;
- if (((pp->initialized == INIT_OK || pp->initialized == INIT_PARTIAL ||
- pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) ||
- pp->initialized == INIT_REMOVED)
+ if (pp->initialized != INIT_NEW && pp->initialized != INIT_FAILED &&
+ pp->initialized != INIT_MISSING_UDEV)
return 0;
if (pp->tick)
@@ -2645,7 +2644,7 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
max_checkint = conf->max_checkint;
put_multipath_config(conf);
- if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) {
+ if (pp->initialized == INIT_MISSING_UDEV) {
if (pp->retriggers < retrigger_tries) {
static const char change[] = "change";
ssize_t ret;
@@ -2685,31 +2684,27 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
newstate = check_path_state(pp);
- if (!pp->mpp) {
- if (!strlen(pp->wwid) &&
- (pp->initialized == INIT_FAILED ||
- pp->initialized == INIT_NEW) &&
- (newstate == PATH_UP || newstate == PATH_GHOST)) {
- condlog(2, "%s: add missing path", pp->dev);
- conf = get_multipath_config();
- pthread_cleanup_push(put_multipath_config, conf);
- ret = pathinfo(pp, conf, DI_ALL | DI_BLACKLIST);
- pthread_cleanup_pop(1);
- /* INIT_OK implies ret == PATHINFO_OK */
- if (pp->initialized == INIT_OK) {
- ev_add_path(pp, vecs, 1);
- pp->tick = 1;
- } else {
- if (ret == PATHINFO_SKIPPED)
- return -1;
- /*
- * We failed multiple times to initialize this
- * path properly. Don't re-check too often.
- */
- pp->checkint = max_checkint;
- }
+ if (!strlen(pp->wwid) &&
+ (pp->initialized == INIT_FAILED || pp->initialized == INIT_NEW) &&
+ (newstate == PATH_UP || newstate == PATH_GHOST)) {
+ condlog(2, "%s: add missing path", pp->dev);
+ conf = get_multipath_config();
+ pthread_cleanup_push(put_multipath_config, conf);
+ ret = pathinfo(pp, conf, DI_ALL | DI_BLACKLIST);
+ pthread_cleanup_pop(1);
+ /* INIT_OK implies ret == PATHINFO_OK */
+ if (pp->initialized == INIT_OK) {
+ ev_add_path(pp, vecs, 1);
+ pp->tick = 1;
+ } else {
+ if (ret == PATHINFO_SKIPPED)
+ return -1;
+ /*
+ * We failed multiple times to initialize this
+ * path properly. Don't re-check too often.
+ */
+ pp->checkint = max_checkint;
}
- return 0;
}
return 0;
}
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 08/22] multipathd: check paths immediately after failing udev initialization
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (6 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 07/22] multipathd: remove redundant checks in handle_uninitialized_path() Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 15:39 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 09/22] multipathd: set pp->tick = max_checkint in handle_uninitialized_path Benjamin Marzinski
` (13 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
When handle_uninitialized_path() moves a path from INIT_MISSING_UDEV to
INIT_FAILED, it has already waited for conf->retrigger_delay seconds. It
might as well check the path state now, instead of waiting for the next
iteration.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index d6f4704b..234fbf7d 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2672,7 +2672,6 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
* place.
*/
pp->initialized = INIT_FAILED;
- return 0;
}
}
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 09/22] multipathd: set pp->tick = max_checkint in handle_uninitialized_path
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (7 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 08/22] multipathd: check paths immediately after failing udev initialization Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 16:00 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 10/22] multipathd: return 0 from path_check() if that path wasn't checked Benjamin Marzinski
` (12 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
When handle_uninitialized_path() checks a path, it will either:
1. trigger a new uevent to initalize the path, in which case it won't
check the path again until that uevent is processed and updates
pp->tick
2. blacklist the path, in which case the path gets removed
3. intialize the path correctly, in which case it sets pp->tick = 1
4. fail to initialize the path, in which case it was supposed to set
pp->tick to max_checkint, but instead it set pp->checkint to
max_checkint, which never worked correctly.
By setting pp->tick to max_checkint at the start,
handle_uninitialized_path() will continue to work as it previously did,
except in case 4, where in now works correctly. There's no point in
messing with pp->checkint for paths that haven't been initialized yet.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 20 +++-----------------
1 file changed, 3 insertions(+), 17 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 234fbf7d..5e89fae5 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2625,7 +2625,6 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
{
int newstate;
int retrigger_tries;
- unsigned int checkint, max_checkint;
struct config *conf;
int ret;
@@ -2640,8 +2639,7 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
conf = get_multipath_config();
retrigger_tries = conf->retrigger_tries;
- checkint = conf->checkint;
- max_checkint = conf->max_checkint;
+ pp->tick = conf->max_checkint;
put_multipath_config(conf);
if (pp->initialized == INIT_MISSING_UDEV) {
@@ -2675,12 +2673,6 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
}
}
- /*
- * provision a next check soonest,
- * in case we exit abnormally from here
- */
- pp->tick = checkint;
-
newstate = check_path_state(pp);
if (!strlen(pp->wwid) &&
@@ -2695,14 +2687,8 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
if (pp->initialized == INIT_OK) {
ev_add_path(pp, vecs, 1);
pp->tick = 1;
- } else {
- if (ret == PATHINFO_SKIPPED)
- return -1;
- /*
- * We failed multiple times to initialize this
- * path properly. Don't re-check too often.
- */
- pp->checkint = max_checkint;
+ } else if (ret == PATHINFO_SKIPPED) {
+ return -1;
}
}
return 0;
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 09/22] multipathd: set pp->tick = max_checkint in handle_uninitialized_path
2024-07-13 6:04 ` [PATCH 09/22] multipathd: set pp->tick = max_checkint in handle_uninitialized_path Benjamin Marzinski
@ 2024-07-15 16:00 ` Martin Wilck
2024-07-16 14:52 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 16:00 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> When handle_uninitialized_path() checks a path, it will either:
> 1. trigger a new uevent to initalize the path, in which case it won't
> check the path again until that uevent is processed and updates
> pp->tick
> 2. blacklist the path, in which case the path gets removed
> 3. intialize the path correctly, in which case it sets pp->tick = 1
> 4. fail to initialize the path, in which case it was supposed to set
> pp->tick to max_checkint, but instead it set pp->checkint to
> max_checkint, which never worked correctly.
>
> By setting pp->tick to max_checkint at the start,
> handle_uninitialized_path() will continue to work as it previously
> did,
> except in case 4, where in now works correctly. There's no point in
> messing with pp->checkint for paths that haven't been initialized
> yet.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Q: I suppose it makes sense to backport this to earlier releases?
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 09/22] multipathd: set pp->tick = max_checkint in handle_uninitialized_path
2024-07-15 16:00 ` Martin Wilck
@ 2024-07-16 14:52 ` Benjamin Marzinski
2024-07-16 15:31 ` Martin Wilck
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 14:52 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 06:00:33PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > When handle_uninitialized_path() checks a path, it will either:
> > 1. trigger a new uevent to initalize the path, in which case it won't
> > check the path again until that uevent is processed and updates
> > pp->tick
> > 2. blacklist the path, in which case the path gets removed
> > 3. intialize the path correctly, in which case it sets pp->tick = 1
> > 4. fail to initialize the path, in which case it was supposed to set
> > pp->tick to max_checkint, but instead it set pp->checkint to
> > max_checkint, which never worked correctly.
> >
> > By setting pp->tick to max_checkint at the start,
> > handle_uninitialized_path() will continue to work as it previously
> > did,
> > except in case 4, where in now works correctly. There's no point in
> > messing with pp->checkint for paths that haven't been initialized
> > yet.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Reviewed-by: Martin Wilck <mwilck@suse.com>
>
> Q: I suppose it makes sense to backport this to earlier releases?
It's not a horrible bug. Previously, we set pp->tick to checkint early
in the function. This just means that these problematic paths will
be checked every checkint instead of every max_checkint.
Also, do you mean distros backporting this change? I'm not sure how
much value there is in maintaining upstream stable branches.
-Ben
>
> Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 09/22] multipathd: set pp->tick = max_checkint in handle_uninitialized_path
2024-07-16 14:52 ` Benjamin Marzinski
@ 2024-07-16 15:31 ` Martin Wilck
0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2024-07-16 15:31 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Tue, 2024-07-16 at 10:52 -0400, Benjamin Marzinski wrote:
> On Mon, Jul 15, 2024 at 06:00:33PM +0200, Martin Wilck wrote:
> > On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > > When handle_uninitialized_path() checks a path, it will either:
> > > 1. trigger a new uevent to initalize the path, in which case it
> > > won't
> > > check the path again until that uevent is processed and
> > > updates
> > > pp->tick
> > > 2. blacklist the path, in which case the path gets removed
> > > 3. intialize the path correctly, in which case it sets pp->tick =
> > > 1
> > > 4. fail to initialize the path, in which case it was supposed to
> > > set
> > > pp->tick to max_checkint, but instead it set pp->checkint to
> > > max_checkint, which never worked correctly.
> > >
> > > By setting pp->tick to max_checkint at the start,
> > > handle_uninitialized_path() will continue to work as it
> > > previously
> > > did,
> > > except in case 4, where in now works correctly. There's no point
> > > in
> > > messing with pp->checkint for paths that haven't been initialized
> > > yet.
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> >
> > Reviewed-by: Martin Wilck <mwilck@suse.com>
> >
> > Q: I suppose it makes sense to backport this to earlier releases?
>
> It's not a horrible bug. Previously, we set pp->tick to checkint
> early
> in the function. This just means that these problematic paths will
> be checked every checkint instead of every max_checkint.
>
> Also, do you mean distros backporting this change? I'm not sure how
> much value there is in maintaining upstream stable branches.
Yes, I was referring to distro backporting. I don't want to start
stable branches upstream, although we've had several cases where I
thought it might be worth it.
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 10/22] multipathd: return 0 from path_check() if that path wasn't checked
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (8 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 09/22] multipathd: set pp->tick = max_checkint in handle_uninitialized_path Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 16:05 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 11/22] multipathd: reorder path state checks Benjamin Marzinski
` (11 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
If check_path_state() returns PATH_WILD or PATH_UNCHECKED, then the
path wasn't correctly checked, and path_check() should return 0.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 5e89fae5..43bd5936 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2386,7 +2386,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
newstate = check_path_state(pp);
if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
- return 1;
+ return 0;
} else if ((newstate != PATH_UP && newstate != PATH_GHOST &&
newstate != PATH_PENDING) && (pp->state == PATH_DELAYED)) {
/* If path state become failed again cancel path delay state */
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 10/22] multipathd: return 0 from path_check() if that path wasn't checked
2024-07-13 6:04 ` [PATCH 10/22] multipathd: return 0 from path_check() if that path wasn't checked Benjamin Marzinski
@ 2024-07-15 16:05 ` Martin Wilck
2024-07-16 14:52 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 16:05 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> If check_path_state() returns PATH_WILD or PATH_UNCHECKED, then the
> path wasn't correctly checked, and path_check() should return 0.
You mean check_path(). Other than that,
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed_by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 10/22] multipathd: return 0 from path_check() if that path wasn't checked
2024-07-15 16:05 ` Martin Wilck
@ 2024-07-16 14:52 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 14:52 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 06:05:28PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > If check_path_state() returns PATH_WILD or PATH_UNCHECKED, then the
> > path wasn't correctly checked, and path_check() should return 0.
>
> You mean check_path(). Other than that,
I'll fix that in the next version.
-Ben
>
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Reviewed_by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 11/22] multipathd: reorder path state checks
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (9 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 10/22] multipathd: return 0 from path_check() if that path wasn't checked Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 16:23 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 12/22] multipathd: adjust when mpp is synced with the kernel Benjamin Marzinski
` (10 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Reorder the path state checks in check_path(), so that it first does
all the checks which can result in returning with the path unchecked or
removed from the multipath device.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 43bd5936..fbd253ca 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2385,19 +2385,8 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
pp->tick = checkint;
newstate = check_path_state(pp);
- if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
+ if (newstate == PATH_WILD || newstate == PATH_UNCHECKED)
return 0;
- } else if ((newstate != PATH_UP && newstate != PATH_GHOST &&
- newstate != PATH_PENDING) && (pp->state == PATH_DELAYED)) {
- /* If path state become failed again cancel path delay state */
- pp->state = newstate;
- /*
- * path state bad again should change the check interval time
- * to the shortest delay
- */
- pp->checkint = checkint;
- return 1;
- }
/*
* Async IO in flight. Keep the previous path state
* and reschedule as soon as possible
@@ -2436,6 +2425,17 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
return 0;
}
+ if ((newstate != PATH_UP && newstate != PATH_GHOST &&
+ newstate != PATH_PENDING) && (pp->state == PATH_DELAYED)) {
+ /* If path state become failed again cancel path delay state */
+ pp->state = newstate;
+ /*
+ * path state bad again should change the check interval time
+ * to the shortest delay
+ */
+ pp->checkint = checkint;
+ return 1;
+ }
if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
(san_path_check_enabled(pp->mpp) ||
marginal_path_check_enabled(pp->mpp))) {
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 11/22] multipathd: reorder path state checks
2024-07-13 6:04 ` [PATCH 11/22] multipathd: reorder path state checks Benjamin Marzinski
@ 2024-07-15 16:23 ` Martin Wilck
0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 16:23 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> Reorder the path state checks in check_path(), so that it first does
> all the checks which can result in returning with the path unchecked
> or
> removed from the multipath device.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
This will move the code in question to after the call to
update_multipath_strings().
But that logic is modified in the next patch anyway, so
Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 12/22] multipathd: adjust when mpp is synced with the kernel
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (10 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 11/22] multipathd: reorder path state checks Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 16:23 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 13/22] multipathd: resync map after setup_map in ev_remove_path Benjamin Marzinski
` (9 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Move the code to sync the mpp device state into a helper function and
add a counter to to make sure that the device is synced at least once
every max_checkint secs. This makes sure that multipath devices with no
paths will still get synced with the kernel. Also, if multiple paths
are checked in the same loop, the multipath device will only be synced
with the kernel once, since every time the mpp is synced in any code
path, mpp->sync_tick is reset.
The code still syncs the mpp before updating the path state for two
main reasons.
1. Sometimes multipathd leaves the mpp with a garbage state. Future
patches will fix most of these cases, but the code intentially
does not remove the mpp is resyncing fails while checking paths.
But this does leave the mpp with a garbage state.
2. The kernel chages the multipath state independently of multipathd. If
the kernel fails a path, a uevent will arrive shortly. But the kernel
doesn't provide any notification when it switches the active
path group or if it ends up picking a different one than multipathd
selected. Multipathd needs to know the actual current pathgroup to
know when it should be switching them.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/configure.c | 1 +
libmultipath/structs.h | 2 ++
libmultipath/structs_vec.c | 5 +++
multipathd/main.c | 64 +++++++++++++++++++++++++-------------
4 files changed, 50 insertions(+), 22 deletions(-)
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index b4de863c..34158e31 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -358,6 +358,7 @@ int setup_map(struct multipath *mpp, char **params, struct vectors *vecs)
sysfs_set_scsi_tmo(conf, mpp);
marginal_pathgroups = conf->marginal_pathgroups;
+ mpp->sync_tick = conf->max_checkint;
pthread_cleanup_pop(1);
if (!mpp->features || !mpp->hwhandler || !mpp->selector) {
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 3b91e39c..002eeae1 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -453,6 +453,8 @@ struct multipath {
int ghost_delay;
int ghost_delay_tick;
int queue_mode;
+ unsigned int sync_tick;
+ bool is_checked;
uid_t uid;
gid_t gid;
mode_t mode;
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 731b1bce..60360c76 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -505,10 +505,15 @@ update_multipath_table (struct multipath *mpp, vector pathvec, int flags)
char __attribute__((cleanup(cleanup_charp))) *params = NULL;
char __attribute__((cleanup(cleanup_charp))) *status = NULL;
unsigned long long size = mpp->size;
+ struct config *conf;
if (!mpp)
return r;
+ conf = get_multipath_config();
+ mpp->sync_tick = conf->max_checkint;
+ put_multipath_config(conf);
+
r = libmp_mapinfo(DM_MAP_BY_NAME | MAPINFO_MPATH_ONLY,
(mapid_t) { .str = mpp->alias },
(mapinfo_t) {
diff --git a/multipathd/main.c b/multipathd/main.c
index fbd253ca..179fec24 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2342,6 +2342,37 @@ check_path_state(struct path *pp)
return newstate;
}
+static void
+do_check_mpp(struct vectors * vecs, struct multipath *mpp)
+{
+ int i, ret;
+ struct path *pp;
+
+ mpp->is_checked = true;
+ ret = update_multipath_strings(mpp, vecs->pathvec);
+ if (ret != DMP_OK) {
+ condlog(1, "%s: %s", mpp->alias, ret == DMP_NOT_FOUND ?
+ "device not found" :
+ "couldn't synchronize with kernel state");
+ vector_foreach_slot (mpp->paths, pp, i)
+ pp->dmstate = PSTATE_UNDEF;
+ return;
+ }
+ set_no_path_retry(mpp);
+}
+
+static void
+check_mpp(struct vectors * vecs, struct multipath *mpp, unsigned int ticks)
+{
+ if (mpp->sync_tick)
+ mpp->sync_tick -= (mpp->sync_tick > ticks) ? ticks :
+ mpp->sync_tick;
+ if (mpp->sync_tick)
+ return;
+
+ do_check_mpp(vecs, mpp);
+}
+
/*
* Returns '1' if the path has been checked and '0' otherwise
*/
@@ -2356,7 +2387,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
unsigned int checkint, max_checkint;
struct config *conf;
int marginal_pathgroups, marginal_changed = 0;
- int ret;
bool need_reload;
if (pp->initialized == INIT_REMOVED)
@@ -2395,26 +2425,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
pp->tick = 1;
return 0;
}
- /*
- * Synchronize with kernel state
- */
- ret = update_multipath_strings(pp->mpp, vecs->pathvec);
- if (ret != DMP_OK) {
- if (ret == DMP_NOT_FOUND) {
- /* multipath device missing. Likely removed */
- condlog(1, "%s: multipath device '%s' not found",
- pp->dev, pp->mpp ? pp->mpp->alias : "");
- return 0;
- } else
- condlog(1, "%s: Couldn't synchronize with kernel state",
- pp->dev);
- pp->dmstate = PSTATE_UNDEF;
- }
- /* if update_multipath_strings orphaned the path, quit early */
- if (!pp->mpp)
- return 0;
- set_no_path_retry(pp->mpp);
-
if (pp->recheck_wwid == RECHECK_WWID_ON &&
(newstate == PATH_UP || newstate == PATH_GHOST) &&
((pp->state != PATH_UP && pp->state != PATH_GHOST) ||
@@ -2424,7 +2434,12 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
handle_path_wwid_change(pp, vecs);
return 0;
}
-
+ if (!pp->mpp->is_checked) {
+ do_check_mpp(vecs, pp->mpp);
+ /* if update_multipath_strings orphaned the path, quit early */
+ if (!pp->mpp)
+ return 0;
+ }
if ((newstate != PATH_UP && newstate != PATH_GHOST &&
newstate != PATH_PENDING) && (pp->state == PATH_DELAYED)) {
/* If path state become failed again cancel path delay state */
@@ -2752,12 +2767,17 @@ checkerloop (void *ap)
while (checker_state != CHECKER_FINISHED) {
unsigned int paths_checked = 0, i;
struct timespec chk_start_time;
+ struct multipath *mpp;
pthread_cleanup_push(cleanup_lock, &vecs->lock);
lock(&vecs->lock);
pthread_testcancel();
+ vector_foreach_slot(vecs->mpvec, mpp, i)
+ mpp->is_checked = false;
get_monotonic_time(&chk_start_time);
if (checker_state == CHECKER_STARTING) {
+ vector_foreach_slot(vecs->mpvec, mpp, i)
+ check_mpp(vecs, mpp, ticks);
vector_foreach_slot(vecs->pathvec, pp, i)
pp->is_checked = false;
checker_state = CHECKER_RUNNING;
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 12/22] multipathd: adjust when mpp is synced with the kernel
2024-07-13 6:04 ` [PATCH 12/22] multipathd: adjust when mpp is synced with the kernel Benjamin Marzinski
@ 2024-07-15 16:23 ` Martin Wilck
2024-07-16 15:14 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 16:23 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> Move the code to sync the mpp device state into a helper function and
> add a counter to to make sure that the device is synced at least once
> every max_checkint secs. This makes sure that multipath devices with
> no
> paths will still get synced with the kernel. Also, if multiple paths
> are checked in the same loop, the multipath device will only be
> synced
> with the kernel once, since every time the mpp is synced in any code
> path, mpp->sync_tick is reset.
>
> The code still syncs the mpp before updating the path state for two
> main reasons.
>
> 1. Sometimes multipathd leaves the mpp with a garbage state. Future
> patches will fix most of these cases, but the code intentially
> does not remove the mpp is resyncing fails while checking paths.
> But this does leave the mpp with a garbage state.
>
> 2. The kernel chages the multipath state independently of multipathd.
> If
> the kernel fails a path, a uevent will arrive shortly. But the
> kernel
> doesn't provide any notification when it switches the active
> path group or if it ends up picking a different one than
> multipathd
> selected. Multipathd needs to know the actual current pathgroup to
> know when it should be switching them.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/configure.c | 1 +
> libmultipath/structs.h | 2 ++
> libmultipath/structs_vec.c | 5 +++
> multipathd/main.c | 64 +++++++++++++++++++++++++-----------
> --
> 4 files changed, 50 insertions(+), 22 deletions(-)
>
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index fbd253ca..179fec24 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2342,6 +2342,37 @@ check_path_state(struct path *pp)
> return newstate;
> }
>
> +static void
> +do_check_mpp(struct vectors * vecs, struct multipath *mpp)
> +{
> + int i, ret;
> + struct path *pp;
> +
> + mpp->is_checked = true;
> + ret = update_multipath_strings(mpp, vecs->pathvec);
> + if (ret != DMP_OK) {
> + condlog(1, "%s: %s", mpp->alias, ret ==
> DMP_NOT_FOUND ?
> + "device not found" :
> + "couldn't synchronize with kernel state");
> + vector_foreach_slot (mpp->paths, pp, i)
> + pp->dmstate = PSTATE_UNDEF;
> + return;
> + }
> + set_no_path_retry(mpp);
> +}
> +
> +static void
> +check_mpp(struct vectors * vecs, struct multipath *mpp, unsigned int
> ticks)
> +{
> + if (mpp->sync_tick)
> + mpp->sync_tick -= (mpp->sync_tick > ticks) ? ticks :
> + mpp->sync_tick;
> + if (mpp->sync_tick)
> + return;
> +
> + do_check_mpp(vecs, mpp);
> +}
> +
> /*
> * Returns '1' if the path has been checked and '0' otherwise
> */
> @@ -2356,7 +2387,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> unsigned int checkint, max_checkint;
> struct config *conf;
> int marginal_pathgroups, marginal_changed = 0;
> - int ret;
> bool need_reload;
>
> if (pp->initialized == INIT_REMOVED)
> @@ -2395,26 +2425,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> pp->tick = 1;
> return 0;
> }
> - /*
> - * Synchronize with kernel state
> - */
> - ret = update_multipath_strings(pp->mpp, vecs->pathvec);
> - if (ret != DMP_OK) {
> - if (ret == DMP_NOT_FOUND) {
> - /* multipath device missing. Likely removed
> */
> - condlog(1, "%s: multipath device '%s' not
> found",
> - pp->dev, pp->mpp ? pp->mpp->alias :
> "");
> - return 0;
> - } else
> - condlog(1, "%s: Couldn't synchronize with
> kernel state",
> - pp->dev);
> - pp->dmstate = PSTATE_UNDEF;
> - }
> - /* if update_multipath_strings orphaned the path, quit early
> */
> - if (!pp->mpp)
> - return 0;
> - set_no_path_retry(pp->mpp);
> -
> if (pp->recheck_wwid == RECHECK_WWID_ON &&
> (newstate == PATH_UP || newstate == PATH_GHOST) &&
> ((pp->state != PATH_UP && pp->state != PATH_GHOST) ||
> @@ -2424,7 +2434,12 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> handle_path_wwid_change(pp, vecs);
> return 0;
> }
> -
> + if (!pp->mpp->is_checked) {
> + do_check_mpp(vecs, pp->mpp);
> + /* if update_multipath_strings orphaned the path,
> quit early */
> + if (!pp->mpp)
> + return 0;
> + }
> if ((newstate != PATH_UP && newstate != PATH_GHOST &&
> newstate != PATH_PENDING) && (pp->state ==
> PATH_DELAYED)) {
> /* If path state become failed again cancel path
> delay state */
> @@ -2752,12 +2767,17 @@ checkerloop (void *ap)
> while (checker_state != CHECKER_FINISHED) {
> unsigned int paths_checked = 0, i;
> struct timespec chk_start_time;
> + struct multipath *mpp;
>
> pthread_cleanup_push(cleanup_lock, &vecs-
> >lock);
> lock(&vecs->lock);
> pthread_testcancel();
> + vector_foreach_slot(vecs->mpvec, mpp, i)
> + mpp->is_checked = false;
Why is this not done inside the "if (checker_state == CHECKER_STARTING)
code path?
Martin
> get_monotonic_time(&chk_start_time);
> if (checker_state == CHECKER_STARTING) {
> + vector_foreach_slot(vecs->mpvec,
> mpp, i)
> + check_mpp(vecs, mpp, ticks);
> vector_foreach_slot(vecs->pathvec,
> pp, i)
> pp->is_checked = false;
> checker_state = CHECKER_RUNNING;
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 12/22] multipathd: adjust when mpp is synced with the kernel
2024-07-15 16:23 ` Martin Wilck
@ 2024-07-16 15:14 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 15:14 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 06:23:53PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > Move the code to sync the mpp device state into a helper function and
> > add a counter to to make sure that the device is synced at least once
> > every max_checkint secs. This makes sure that multipath devices with
> > no
> > paths will still get synced with the kernel. Also, if multiple paths
> > are checked in the same loop, the multipath device will only be
> > synced
> > with the kernel once, since every time the mpp is synced in any code
> > path, mpp->sync_tick is reset.
> >
> > The code still syncs the mpp before updating the path state for two
> > main reasons.
> >
> > 1. Sometimes multipathd leaves the mpp with a garbage state. Future
> > patches will fix most of these cases, but the code intentially
> > does not remove the mpp is resyncing fails while checking paths.
> > But this does leave the mpp with a garbage state.
> >
> > 2. The kernel chages the multipath state independently of multipathd.
> > If
> > the kernel fails a path, a uevent will arrive shortly. But the
> > kernel
> > doesn't provide any notification when it switches the active
> > path group or if it ends up picking a different one than
> > multipathd
> > selected. Multipathd needs to know the actual current pathgroup to
> > know when it should be switching them.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > libmultipath/configure.c | 1 +
> > libmultipath/structs.h | 2 ++
> > libmultipath/structs_vec.c | 5 +++
> > multipathd/main.c | 64 +++++++++++++++++++++++++-----------
> > --
> > 4 files changed, 50 insertions(+), 22 deletions(-)
> >
> >
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index fbd253ca..179fec24 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2752,12 +2767,17 @@ checkerloop (void *ap)
> > while (checker_state != CHECKER_FINISHED) {
> > unsigned int paths_checked = 0, i;
> > struct timespec chk_start_time;
> > + struct multipath *mpp;
> >
> > pthread_cleanup_push(cleanup_lock, &vecs-
> > >lock);
> > lock(&vecs->lock);
> > pthread_testcancel();
> > + vector_foreach_slot(vecs->mpvec, mpp, i)
> > + mpp->is_checked = false;
>
> Why is this not done inside the "if (checker_state == CHECKER_STARTING)
> code path?
Since we slept here, there was more time for something to change in a
way where we needed to resync the state. Also, like I mention in the
comments, there are places where we fail in some function that is
updating the mpp state and don't resync it with the kernel. I'm fairly
sure (but not totally certain) future patches catch all of these except
for failing to update the state in the checker function itself. (We've
never removed the multipath device for failing to resync its state in
the check_path(), unlike failures when responding to an event or after
reloading the device. This makes sense to me. Since we're doing these
resyncs in check_path() routinely, not because we think something has
changed, it seems reasonable to assume that nothing has changed if we
fail to resync).
Clearing is_checked here doesn't guarantee that we will resync the mpp.
Only that we will resync if we need to check one of its paths, after the
interruption. When we switch to checking paths by mpp later, this will
never happen, except in corner cases.
-Ben
> Martin
>
>
> > get_monotonic_time(&chk_start_time);
> > if (checker_state == CHECKER_STARTING) {
> > + vector_foreach_slot(vecs->mpvec,
> > mpp, i)
> > + check_mpp(vecs, mpp, ticks);
> > vector_foreach_slot(vecs->pathvec,
> > pp, i)
> > pp->is_checked = false;
> > checker_state = CHECKER_RUNNING;
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 13/22] multipathd: resync map after setup_map in ev_remove_path
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (11 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 12/22] multipathd: adjust when mpp is synced with the kernel Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 16:32 ` Martin Wilck
2024-07-13 6:04 ` [PATCH 14/22] multipathd: resync map after setup_map in resize_map Benjamin Marzinski
` (8 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
In ev_remove_path() it was possible to exit after calling setup_map()
without resyncing the mpp state with the kernel. This meant that the
mpp state in multipathd might not match with the kernel state at all.
It's safe to exit before calling setup_map() if either wait_for_udev
or need_do_map is set. In both cases, setup_map() will later be called,
either by a uevent or by the calling function.
Once setup_map() has been called, setup_multipath() and sync_map_state()
are now always called, to make sure the mpp matches the kernel state.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 40 +++++++++++++++++++---------------------
1 file changed, 19 insertions(+), 21 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 179fec24..3c84c2a0 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1395,6 +1395,8 @@ ev_remove_path (struct path *pp, struct vectors * vecs, int need_do_map)
* avoid referring to the map of an orphaned path
*/
if ((mpp = pp->mpp)) {
+ char devt[BLK_DEV_SIZE];
+
/*
* Mark the path as removed. In case of success, we
* will delete it for good. Otherwise, it will be deleted
@@ -1428,12 +1430,6 @@ ev_remove_path (struct path *pp, struct vectors * vecs, int need_do_map)
flush_map_nopaths(mpp, vecs))
goto out;
- if (setup_map(mpp, ¶ms, vecs)) {
- condlog(0, "%s: failed to setup map for"
- " removal of path %s", mpp->alias, pp->dev);
- goto fail;
- }
-
if (mpp->wait_for_udev) {
mpp->wait_for_udev = 2;
retval = REMOVE_PATH_DELAY;
@@ -1444,33 +1440,35 @@ ev_remove_path (struct path *pp, struct vectors * vecs, int need_do_map)
retval = REMOVE_PATH_DELAY;
goto out;
}
+
+ if (setup_map(mpp, ¶ms, vecs)) {
+ condlog(0, "%s: failed to setup map for"
+ " removal of path %s", mpp->alias, pp->dev);
+ goto fail;
+ }
/*
* reload the map
*/
mpp->action = ACT_RELOAD;
+ strlcpy(devt, pp->dev_t, sizeof(devt));
if (domap(mpp, params, 1) == DOMAP_FAIL) {
condlog(0, "%s: failed in domap for "
"removal of path %s",
mpp->alias, pp->dev);
retval = REMOVE_PATH_FAILURE;
- } else {
- /*
- * update our state from kernel
- */
- char devt[BLK_DEV_SIZE];
-
- strlcpy(devt, pp->dev_t, sizeof(devt));
-
- /* setup_multipath will free the path
- * regardless of whether it succeeds or
- * fails */
- if (setup_multipath(vecs, mpp))
- return REMOVE_PATH_MAP_ERROR;
- sync_map_state(mpp);
+ }
+ /*
+ * update mpp state from kernel even if domap failed.
+ * If the path was removed from the mpp, setup_multipath will
+ * free the path regardless of whether it succeeds or fails
+ */
+ if (setup_multipath(vecs, mpp))
+ return REMOVE_PATH_MAP_ERROR;
+ sync_map_state(mpp);
+ if (retval == REMOVE_PATH_SUCCESS)
condlog(2, "%s: path removed from map %s",
devt, mpp->alias);
- }
} else {
/* mpp == NULL */
if ((i = find_slot(vecs->pathvec, (void *)pp)) != -1)
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 13/22] multipathd: resync map after setup_map in ev_remove_path
2024-07-13 6:04 ` [PATCH 13/22] multipathd: resync map after setup_map in ev_remove_path Benjamin Marzinski
@ 2024-07-15 16:32 ` Martin Wilck
2024-07-16 15:15 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 16:32 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> In ev_remove_path() it was possible to exit after calling setup_map()
> without resyncing the mpp state with the kernel. This meant that the
> mpp state in multipathd might not match with the kernel state at all.
>
> It's safe to exit before calling setup_map() if either wait_for_udev
> or need_do_map is set. In both cases, setup_map() will later be
> called,
> either by a uevent or by the calling function.
>
> Once setup_map() has been called, setup_multipath() and
> sync_map_state()
> are now always called, to make sure the mpp matches the kernel state.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> multipathd/main.c | 40 +++++++++++++++++++---------------------
> 1 file changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 179fec24..3c84c2a0 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1395,6 +1395,8 @@ ev_remove_path (struct path *pp, struct vectors
> * vecs, int need_do_map)
> * avoid referring to the map of an orphaned path
> */
> if ((mpp = pp->mpp)) {
> + char devt[BLK_DEV_SIZE];
> +
> /*
> * Mark the path as removed. In case of success, we
> * will delete it for good. Otherwise, it will be
> deleted
> @@ -1428,12 +1430,6 @@ ev_remove_path (struct path *pp, struct
> vectors * vecs, int need_do_map)
> flush_map_nopaths(mpp, vecs))
> goto out;
>
> - if (setup_map(mpp, ¶ms, vecs)) {
> - condlog(0, "%s: failed to setup map for"
> - " removal of path %s", mpp->alias,
> pp->dev);
> - goto fail;
> - }
> -
> if (mpp->wait_for_udev) {
> mpp->wait_for_udev = 2;
> retval = REMOVE_PATH_DELAY;
> @@ -1444,33 +1440,35 @@ ev_remove_path (struct path *pp, struct
> vectors * vecs, int need_do_map)
> retval = REMOVE_PATH_DELAY;
> goto out;
> }
> +
> + if (setup_map(mpp, ¶ms, vecs)) {
> + condlog(0, "%s: failed to setup map for"
> + " removal of path %s", mpp->alias,
> pp->dev);
> + goto fail;
> + }
> /*
> * reload the map
> */
> mpp->action = ACT_RELOAD;
> + strlcpy(devt, pp->dev_t, sizeof(devt));
Move this statement further down, directly before the call to
setup_multipath(), perhaps? It's only for the condlog anyway.
Except for that, LGTM.
Martin
> if (domap(mpp, params, 1) == DOMAP_FAIL) {
> condlog(0, "%s: failed in domap for "
> "removal of path %s",
> mpp->alias, pp->dev);
> retval = REMOVE_PATH_FAILURE;
> - } else {
> - /*
> - * update our state from kernel
> - */
> - char devt[BLK_DEV_SIZE];
> -
> - strlcpy(devt, pp->dev_t, sizeof(devt));
> -
> - /* setup_multipath will free the path
> - * regardless of whether it succeeds or
> - * fails */
> - if (setup_multipath(vecs, mpp))
> - return REMOVE_PATH_MAP_ERROR;
> - sync_map_state(mpp);
> + }
> + /*
> + * update mpp state from kernel even if domap
> failed.
> + * If the path was removed from the mpp,
> setup_multipath will
> + * free the path regardless of whether it succeeds
> or fails
> + */
> + if (setup_multipath(vecs, mpp))
> + return REMOVE_PATH_MAP_ERROR;
> + sync_map_state(mpp);
>
> + if (retval == REMOVE_PATH_SUCCESS)
> condlog(2, "%s: path removed from map %s",
> devt, mpp->alias);
> - }
> } else {
> /* mpp == NULL */
> if ((i = find_slot(vecs->pathvec, (void *)pp)) != -
> 1)
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 13/22] multipathd: resync map after setup_map in ev_remove_path
2024-07-15 16:32 ` Martin Wilck
@ 2024-07-16 15:15 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 15:15 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 06:32:22PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:04 -0400, Benjamin Marzinski wrote:
> > In ev_remove_path() it was possible to exit after calling setup_map()
> > without resyncing the mpp state with the kernel. This meant that the
> > mpp state in multipathd might not match with the kernel state at all.
> >
> > It's safe to exit before calling setup_map() if either wait_for_udev
> > or need_do_map is set. In both cases, setup_map() will later be
> > called,
> > either by a uevent or by the calling function.
> >
> > Once setup_map() has been called, setup_multipath() and
> > sync_map_state()
> > are now always called, to make sure the mpp matches the kernel state.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > multipathd/main.c | 40 +++++++++++++++++++---------------------
> > 1 file changed, 19 insertions(+), 21 deletions(-)
> >
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 179fec24..3c84c2a0 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1395,6 +1395,8 @@ ev_remove_path (struct path *pp, struct vectors
> > * vecs, int need_do_map)
> > * avoid referring to the map of an orphaned path
> > */
> > if ((mpp = pp->mpp)) {
> > + char devt[BLK_DEV_SIZE];
> > +
> > /*
> > * Mark the path as removed. In case of success, we
> > * will delete it for good. Otherwise, it will be
> > deleted
> > @@ -1428,12 +1430,6 @@ ev_remove_path (struct path *pp, struct
> > vectors * vecs, int need_do_map)
> > flush_map_nopaths(mpp, vecs))
> > goto out;
> >
> > - if (setup_map(mpp, ¶ms, vecs)) {
> > - condlog(0, "%s: failed to setup map for"
> > - " removal of path %s", mpp->alias,
> > pp->dev);
> > - goto fail;
> > - }
> > -
> > if (mpp->wait_for_udev) {
> > mpp->wait_for_udev = 2;
> > retval = REMOVE_PATH_DELAY;
> > @@ -1444,33 +1440,35 @@ ev_remove_path (struct path *pp, struct
> > vectors * vecs, int need_do_map)
> > retval = REMOVE_PATH_DELAY;
> > goto out;
> > }
> > +
> > + if (setup_map(mpp, ¶ms, vecs)) {
> > + condlog(0, "%s: failed to setup map for"
> > + " removal of path %s", mpp->alias,
> > pp->dev);
> > + goto fail;
> > + }
> > /*
> > * reload the map
> > */
> > mpp->action = ACT_RELOAD;
> > + strlcpy(devt, pp->dev_t, sizeof(devt));
>
> Move this statement further down, directly before the call to
> setup_multipath(), perhaps? It's only for the condlog anyway.
>
Sure.
> Except for that, LGTM.
>
> Martin
>
>
> > if (domap(mpp, params, 1) == DOMAP_FAIL) {
> > condlog(0, "%s: failed in domap for "
> > "removal of path %s",
> > mpp->alias, pp->dev);
> > retval = REMOVE_PATH_FAILURE;
> > - } else {
> > - /*
> > - * update our state from kernel
> > - */
> > - char devt[BLK_DEV_SIZE];
> > -
> > - strlcpy(devt, pp->dev_t, sizeof(devt));
> > -
> > - /* setup_multipath will free the path
> > - * regardless of whether it succeeds or
> > - * fails */
> > - if (setup_multipath(vecs, mpp))
> > - return REMOVE_PATH_MAP_ERROR;
> > - sync_map_state(mpp);
> > + }
> > + /*
> > + * update mpp state from kernel even if domap
> > failed.
> > + * If the path was removed from the mpp,
> > setup_multipath will
> > + * free the path regardless of whether it succeeds
> > or fails
> > + */
> > + if (setup_multipath(vecs, mpp))
> > + return REMOVE_PATH_MAP_ERROR;
> > + sync_map_state(mpp);
> >
> > + if (retval == REMOVE_PATH_SUCCESS)
> > condlog(2, "%s: path removed from map %s",
> > devt, mpp->alias);
> > - }
> > } else {
> > /* mpp == NULL */
> > if ((i = find_slot(vecs->pathvec, (void *)pp)) != -
> > 1)
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 14/22] multipathd: resync map after setup_map in resize_map
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (12 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 13/22] multipathd: resync map after setup_map in ev_remove_path Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-13 6:04 ` [PATCH 15/22] multipathd: always resync map in reload_and_sync_map Benjamin Marzinski
` (7 subsequent siblings)
21 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
In resize_map() it was possible to exit after calling setup_map()
without resyncing the mpp state with the kernel. This meant that the mpp
state in multipathd might not match with the kernel state at all.
Once setup_map() has been called, setup_multipath() and sync_map_state()
are now always called, to make sure the mpp matches the kernel state.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 3c84c2a0..c9cb9ac8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1534,6 +1534,7 @@ needs_ro_update(struct multipath *mpp, int ro)
int resize_map(struct multipath *mpp, unsigned long long size,
struct vectors * vecs)
{
+ int ret = 0;
char *params __attribute__((cleanup(cleanup_charp))) = NULL;
unsigned long long orig_size = mpp->size;
@@ -1543,7 +1544,8 @@ int resize_map(struct multipath *mpp, unsigned long long size,
condlog(0, "%s: failed to setup map for resize : %s",
mpp->alias, strerror(errno));
mpp->size = orig_size;
- return 1;
+ ret = 1;
+ goto out;
}
mpp->action = ACT_RESIZE;
mpp->force_udev_reload = 1;
@@ -1551,13 +1553,14 @@ int resize_map(struct multipath *mpp, unsigned long long size,
condlog(0, "%s: failed to resize map : %s", mpp->alias,
strerror(errno));
mpp->size = orig_size;
- return 1;
+ ret = 1;
}
+out:
if (setup_multipath(vecs, mpp) != 0)
return 2;
sync_map_state(mpp);
- return 0;
+ return ret;
}
static int
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 15/22] multipathd: always resync map in reload_and_sync_map
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (13 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 14/22] multipathd: resync map after setup_map in resize_map Benjamin Marzinski
@ 2024-07-13 6:04 ` Benjamin Marzinski
2024-07-15 16:40 ` Martin Wilck
2024-07-13 6:05 ` [PATCH 16/22] multipathd: correctly handle paths removed for a wwid change Benjamin Marzinski
` (6 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:04 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
reload_and_sync_map() needs to always resync the map after calling
reload_map(), because the mpp state may no longer match the kernel
state if reload_map() fails.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index c9cb9ac8..0f763e29 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2188,13 +2188,15 @@ static int reload_map(struct vectors *vecs, struct multipath *mpp,
int reload_and_sync_map(struct multipath *mpp, struct vectors *vecs)
{
+ int ret = 0;
+
if (reload_map(vecs, mpp, 1))
- return 1;
+ ret = 1;
if (setup_multipath(vecs, mpp) != 0)
return 2;
sync_map_state(mpp);
- return 0;
+ return ret;
}
static int check_path_reinstate_state(struct path * pp) {
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 16/22] multipathd: correctly handle paths removed for a wwid change
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (14 preceding siblings ...)
2024-07-13 6:04 ` [PATCH 15/22] multipathd: always resync map in reload_and_sync_map Benjamin Marzinski
@ 2024-07-13 6:05 ` Benjamin Marzinski
2024-07-15 19:16 ` Martin Wilck
2024-07-13 6:05 ` [PATCH 17/22] multipathd: handle changed wwid when adding partial path Benjamin Marzinski
` (5 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:05 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
If check_path() exitted because the path's wwid changed and it was
removed, checkerloop() wasn't decrementing the pathvec loop count.
This caused the next path to be skipped by the checker loop.
To solve this, make check_path() return -1 when a path is removed,
make handle_uninitialized_path() also remove the path if it was
blacklisted, and make checkerloop() just decrement the loop count
when a path returns -1.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 33 ++++++++++++++++++++-------------
multipathd/main.h | 2 +-
2 files changed, 21 insertions(+), 14 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 0f763e29..e32af693 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -974,20 +974,24 @@ rescan_path(struct udev_device *ud)
}
}
-void
+/* Returns true if the path was removed */
+bool
handle_path_wwid_change(struct path *pp, struct vectors *vecs)
{
struct udev_device *udd;
static const char add[] = "add";
ssize_t ret;
char dev[FILE_NAME_SIZE];
+ bool removed = false;
if (!pp || !pp->udev)
- return;
+ return removed;
strlcpy(dev, pp->dev, sizeof(dev));
udd = udev_device_ref(pp->udev);
- if (!(ev_remove_path(pp, vecs, 1) & REMOVE_PATH_SUCCESS) && pp->mpp) {
+ if (ev_remove_path(pp, vecs, 1) & REMOVE_PATH_SUCCESS) {
+ removed = true;
+ } else if (pp->mpp) {
pp->dmstate = PSTATE_FAILED;
dm_fail_path(pp->mpp->alias, pp->dev_t);
}
@@ -997,6 +1001,7 @@ handle_path_wwid_change(struct path *pp, struct vectors *vecs)
if (ret != sizeof(add) - 1)
log_sysfs_attr_set_value(1, ret,
"%s: failed to trigger add event", dev);
+ return removed;
}
bool
@@ -2377,7 +2382,8 @@ check_mpp(struct vectors * vecs, struct multipath *mpp, unsigned int ticks)
}
/*
- * Returns '1' if the path has been checked and '0' otherwise
+ * Returns '1' if the path has been checked, -1 if the path was removed,
+ * and '0' otherwise
*/
static int
check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
@@ -2434,8 +2440,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
pp->dmstate == PSTATE_FAILED) &&
check_path_wwid_change(pp)) {
condlog(0, "%s: path wwid change detected. Removing", pp->dev);
- handle_path_wwid_change(pp, vecs);
- return 0;
+ return handle_path_wwid_change(pp, vecs)? -1 : 0;
}
if (!pp->mpp->is_checked) {
do_check_mpp(vecs, pp->mpp);
@@ -2635,7 +2640,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
}
/*
- * Returns -1 if the path was blacklisted, and 0 otherwise
+ * Returns -1 if the path was removed, and 0 otherwise
*/
static int
handle_uninitialized_path(struct vectors * vecs, struct path * pp,
@@ -2706,6 +2711,12 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
ev_add_path(pp, vecs, 1);
pp->tick = 1;
} else if (ret == PATHINFO_SKIPPED) {
+ int i;
+
+ condlog(1, "%s: path blacklisted. removing", pp->dev);
+ if ((i = find_slot(vecs->pathvec, (void *)pp)) != -1)
+ vector_del_slot(vecs->pathvec, i);
+ free_path(pp);
return -1;
}
}
@@ -2794,13 +2805,9 @@ checkerloop (void *ap)
else
rc = handle_uninitialized_path(vecs, pp,
ticks);
- if (rc < 0) {
- condlog(1, "%s: check_path() failed, removing",
- pp->dev);
- vector_del_slot(vecs->pathvec, i);
- free_path(pp);
+ if (rc < 0)
i--;
- } else
+ else
num_paths += rc;
if (++paths_checked % 128 == 0 &&
(lock_has_waiters(&vecs->lock) ||
diff --git a/multipathd/main.h b/multipathd/main.h
index 4fcd6402..7aa93ca3 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -47,7 +47,7 @@ int setup_multipath(struct vectors * vecs, struct multipath * mpp);
int update_multipath(struct vectors *vecs, char *mapname);
int reload_and_sync_map(struct multipath *mpp, struct vectors *vecs);
-void handle_path_wwid_change(struct path *pp, struct vectors *vecs);
+bool handle_path_wwid_change(struct path *pp, struct vectors *vecs);
bool check_path_wwid_change(struct path *pp);
int finish_path_init(struct path *pp, struct vectors * vecs);
int resize_map(struct multipath *mpp, unsigned long long size,
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 16/22] multipathd: correctly handle paths removed for a wwid change
2024-07-13 6:05 ` [PATCH 16/22] multipathd: correctly handle paths removed for a wwid change Benjamin Marzinski
@ 2024-07-15 19:16 ` Martin Wilck
2024-07-16 15:16 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 19:16 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> If check_path() exitted because the path's wwid changed and it was
> removed, checkerloop() wasn't decrementing the pathvec loop count.
> This caused the next path to be skipped by the checker loop.
>
> To solve this, make check_path() return -1 when a path is removed,
> make handle_uninitialized_path() also remove the path if it was
> blacklisted, and make checkerloop() just decrement the loop count
> when a path returns -1.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Would you mind making these return values symbolic? Other than that,
LGTM.
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 16/22] multipathd: correctly handle paths removed for a wwid change
2024-07-15 19:16 ` Martin Wilck
@ 2024-07-16 15:16 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 15:16 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 09:16:32PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> > If check_path() exitted because the path's wwid changed and it was
> > removed, checkerloop() wasn't decrementing the pathvec loop count.
> > This caused the next path to be skipped by the checker loop.
> >
> > To solve this, make check_path() return -1 when a path is removed,
> > make handle_uninitialized_path() also remove the path if it was
> > blacklisted, and make checkerloop() just decrement the loop count
> > when a path returns -1.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Would you mind making these return values symbolic? Other than that,
> LGTM.
Yeah, I can do that.
>
> Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 17/22] multipathd: handle changed wwid when adding partial path
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (15 preceding siblings ...)
2024-07-13 6:05 ` [PATCH 16/22] multipathd: correctly handle paths removed for a wwid change Benjamin Marzinski
@ 2024-07-13 6:05 ` Benjamin Marzinski
2024-07-15 19:40 ` Martin Wilck
2024-07-13 6:05 ` [PATCH 18/22] multipathd: don't read conf->checkint twice in check_path Benjamin Marzinski
` (4 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:05 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
If multipathd noticed that the WWID didn't match the device when adding
a partial path, but removing it failed, multipathd wasn't disabling the
path. Instead of calling handle_path_wwid_change(), which doesn't make
much sense when multipathd is expecting a uevent, just manually disable
the path if ev_remove_path() fails.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/cli_handlers.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 0643be15..5d8ba3a6 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -540,7 +540,11 @@ add_partial_path(struct path *pp, struct vectors *vecs)
if (strlen(wwid) && strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
condlog(0, "%s: path wwid changed from '%s' to '%s'. removing",
pp->dev, wwid, pp->wwid);
- ev_remove_path(pp, vecs, 1);
+ if (!(ev_remove_path(pp, vecs, 1) & REMOVE_PATH_SUCCESS) &&
+ pp->mpp) {
+ pp->dmstate = PSTATE_FAILED;
+ dm_fail_path(pp->mpp->alias, pp->dev_t);
+ }
udev_device_unref(udd);
return -1;
}
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 17/22] multipathd: handle changed wwid when adding partial path
2024-07-13 6:05 ` [PATCH 17/22] multipathd: handle changed wwid when adding partial path Benjamin Marzinski
@ 2024-07-15 19:40 ` Martin Wilck
2024-07-16 15:45 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 19:40 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> If multipathd noticed that the WWID didn't match the device when
> adding
> a partial path, but removing it failed, multipathd wasn't disabling
> the
> path. Instead of calling handle_path_wwid_change(), which doesn't
> make
> much sense when multipathd is expecting a uevent, just manually
> disable
> the path if ev_remove_path() fails.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> multipathd/cli_handlers.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> index 0643be15..5d8ba3a6 100644
> --- a/multipathd/cli_handlers.c
> +++ b/multipathd/cli_handlers.c
> @@ -540,7 +540,11 @@ add_partial_path(struct path *pp, struct vectors
> *vecs)
> if (strlen(wwid) && strncmp(wwid, pp->wwid, WWID_SIZE) != 0)
> {
> condlog(0, "%s: path wwid changed from '%s' to '%s'.
> removing",
> pp->dev, wwid, pp->wwid);
> - ev_remove_path(pp, vecs, 1);
> + if (!(ev_remove_path(pp, vecs, 1) &
> REMOVE_PATH_SUCCESS) &&
> + pp->mpp) {
> + pp->dmstate = PSTATE_FAILED;
> + dm_fail_path(pp->mpp->alias, pp->dev_t);
> + }
> udev_device_unref(udd);
> return -1;
> }
Is this sufficient? Can we be certain that this path won't be
reinstated by sync_map_state() later on? That's at least not
immediately obvious. Perhaps we should orphan the path?
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 17/22] multipathd: handle changed wwid when adding partial path
2024-07-15 19:40 ` Martin Wilck
@ 2024-07-16 15:45 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 15:45 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 09:40:36PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> > If multipathd noticed that the WWID didn't match the device when
> > adding
> > a partial path, but removing it failed, multipathd wasn't disabling
> > the
> > path. Instead of calling handle_path_wwid_change(), which doesn't
> > make
> > much sense when multipathd is expecting a uevent, just manually
> > disable
> > the path if ev_remove_path() fails.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > multipathd/cli_handlers.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> > index 0643be15..5d8ba3a6 100644
> > --- a/multipathd/cli_handlers.c
> > +++ b/multipathd/cli_handlers.c
> > @@ -540,7 +540,11 @@ add_partial_path(struct path *pp, struct vectors
> > *vecs)
> > if (strlen(wwid) && strncmp(wwid, pp->wwid, WWID_SIZE) != 0)
> > {
> > condlog(0, "%s: path wwid changed from '%s' to '%s'.
> > removing",
> > pp->dev, wwid, pp->wwid);
> > - ev_remove_path(pp, vecs, 1);
> > + if (!(ev_remove_path(pp, vecs, 1) &
> > REMOVE_PATH_SUCCESS) &&
> > + pp->mpp) {
> > + pp->dmstate = PSTATE_FAILED;
> > + dm_fail_path(pp->mpp->alias, pp->dev_t);
> > + }
> > udev_device_unref(udd);
> > return -1;
> > }
>
>
> Is this sufficient? Can we be certain that this path won't be
> reinstated by sync_map_state() later on? That's at least not
> immediately obvious. Perhaps we should orphan the path?
We can't orphan the path completely. That's why ev_remove_path() calls
set_path_removed(), which keeps track of the fact it's still in the mpp,
so that we can complete the removal once it is actually removed from the
multipath device. Otherwise multipathd could just add it to another
device, while it was still part of the old device.
But you are correct about sync_map_state(). Good catch. We usually never
need to worry about sync_map_state() restoring paths in INIT_REMOVED,
since they really have been removed. In this case we do. We should
probably be setting pp->state to PATH_UNCHECKED in orphan_path() anyway,
which will keep sync_map_state() from restoring it. Since the path is
in INIT_REMOVED, check_path will never update it's state. But
sync_map_state() should also be ignoring paths in INIT_REMOVED. I vote
we do both.
-Ben
>
> Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 18/22] multipathd: don't read conf->checkint twice in check_path
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (16 preceding siblings ...)
2024-07-13 6:05 ` [PATCH 17/22] multipathd: handle changed wwid when adding partial path Benjamin Marzinski
@ 2024-07-13 6:05 ` Benjamin Marzinski
2024-07-15 19:41 ` Martin Wilck
2024-07-13 6:05 ` [PATCH 19/22] multipathd: make multipath devices manage their path check times Benjamin Marzinski
` (3 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:05 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
check_path() already has saved the value of conf->checkint in the local
variable checkint, so there's no reason to read it again.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index e32af693..daf668eb 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2510,9 +2510,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
* upon state change, reset the checkint
* to the shortest delay
*/
- conf = get_multipath_config();
- pp->checkint = conf->checkint;
- put_multipath_config(conf);
+ pp->checkint = checkint;
if (newstate != PATH_UP && newstate != PATH_GHOST) {
/*
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 19/22] multipathd: make multipath devices manage their path check times
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (17 preceding siblings ...)
2024-07-13 6:05 ` [PATCH 18/22] multipathd: don't read conf->checkint twice in check_path Benjamin Marzinski
@ 2024-07-13 6:05 ` Benjamin Marzinski
2024-07-15 20:49 ` Martin Wilck
2024-07-13 6:05 ` [PATCH 20/22] multipathd: factor out actual path checking loop from checkerloop Benjamin Marzinski
` (2 subsequent siblings)
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:05 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
multipathd's path checking can be very bursty, with all the paths
running their path checkers at the same time, and then all doing nothing
while waiting for the next check time. Alternatively, the paths in a
multipath device might all run their path checkers at a different time,
which can keep the multipath device from having a coherent view of the
state of all of its paths.
This patch makes all the paths of a multipath device converge to running
their checkers at the same time, and spreads out when this time is for
the different multipath devices.
To do this, the checking time is divided into adjustment intervals
(conf->adjust_int), so that the checkers run at some index within this
interval. conf->adjust_int is chosen so that it is at least twice as
long as conf->max_checkint, and is a multiple of all possible
pp->checkint values. This means that regardless of pp->checkint, the
path should always be checked on the same indexes, each adjustement
interval.
Each multipath device has a goal index. These are evenly spread out
between 0 and conf->max_checkint. Every conf->adjust_int seconds, each
multipath device should try to check all of its paths on its goal index.
If a path isn't checked on the goal index, then pp->tick for the next
check after the goal_idx will be decremented by one, to align it over
time.
In order for the path checkers to run every pp->checkint seconds,
multipathd needs to track how long a path check has been pending for,
and subtract that time from the number of ticks till the checker is run
again. If the checker has been pending for more that pp->checkint,
the path will be rechecked on the next tick after the checker returns.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/config.c | 13 ++++++
libmultipath/config.h | 1 +
libmultipath/structs.c | 1 +
libmultipath/structs.h | 1 +
multipathd/main.c | 90 ++++++++++++++++++++++++++++++++++--------
5 files changed, 89 insertions(+), 17 deletions(-)
diff --git a/libmultipath/config.c b/libmultipath/config.c
index 83fa7369..ca584372 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -982,6 +982,19 @@ int _init_config (const char *file, struct config *conf)
conf->checkint = conf->max_checkint;
condlog(3, "polling interval: %d, max: %d",
conf->checkint, conf->max_checkint);
+ /*
+ * make sure that that adjust_int is at least twice as large
+ * as checkint and is a multiple of all possible values of
+ * pp->checkint.
+ */
+ if (conf->max_checkint % conf->checkint == 0) {
+ conf->adjust_int = 2 * conf->max_checkint;
+ } else {
+ conf->adjust_int = conf->checkint;
+ while (2 * conf->adjust_int < conf->max_checkint)
+ conf->adjust_int *= 2;
+ conf->adjust_int *= conf->max_checkint;
+ }
if (conf->blist_devnode == NULL) {
conf->blist_devnode = vector_alloc();
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 384193ab..800c0ca9 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -147,6 +147,7 @@ struct config {
int minio_rq;
unsigned int checkint;
unsigned int max_checkint;
+ unsigned int adjust_int;
bool use_watchdog;
int pgfailback;
int rr_weight;
diff --git a/libmultipath/structs.c b/libmultipath/structs.c
index 1583e001..472e1001 100644
--- a/libmultipath/structs.c
+++ b/libmultipath/structs.c
@@ -148,6 +148,7 @@ uninitialize_path(struct path *pp)
pp->dmstate = PSTATE_UNDEF;
pp->uid_attribute = NULL;
pp->checker_timeout = 0;
+ pp->pending_ticks = 0;
if (checker_selected(&pp->checker))
checker_put(&pp->checker);
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 002eeae1..457d7836 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -360,6 +360,7 @@ struct path {
unsigned long long size;
unsigned int checkint;
unsigned int tick;
+ unsigned int pending_ticks;
int bus;
int offline;
int state;
diff --git a/multipathd/main.c b/multipathd/main.c
index daf668eb..42ccb92f 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2386,7 +2386,7 @@ check_mpp(struct vectors * vecs, struct multipath *mpp, unsigned int ticks)
* and '0' otherwise
*/
static int
-check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
+do_check_path (struct vectors * vecs, struct path * pp)
{
int newstate;
int new_path_up = 0;
@@ -2398,14 +2398,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
int marginal_pathgroups, marginal_changed = 0;
bool need_reload;
- if (pp->initialized == INIT_REMOVED)
- return 0;
-
- if (pp->tick)
- pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
- if (pp->tick)
- return 0; /* don't check this path yet */
-
conf = get_multipath_config();
checkint = conf->checkint;
max_checkint = conf->max_checkint;
@@ -2417,12 +2409,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
pp->checkint = checkint;
};
- /*
- * provision a next check soonest,
- * in case we exit abnormally from here
- */
- pp->tick = checkint;
-
newstate = check_path_state(pp);
if (newstate == PATH_WILD || newstate == PATH_UNCHECKED)
return 0;
@@ -2584,7 +2570,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
condlog(4, "%s: delay next check %is",
pp->dev_t, pp->checkint);
}
- pp->tick = pp->checkint;
}
}
else if (newstate != PATH_UP && newstate != PATH_GHOST) {
@@ -2637,6 +2622,76 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
return 1;
}
+static int
+check_path (struct vectors * vecs, struct path * pp, unsigned int ticks,
+ time_t start_secs)
+{
+ int r;
+ unsigned int adjust_int, max_checkint;
+ struct config *conf;
+ time_t next_idx, goal_idx;
+
+ if (pp->initialized == INIT_REMOVED)
+ return 0;
+
+ if (pp->tick)
+ pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
+ if (pp->tick)
+ return 0; /* don't check this path yet */
+
+ conf = get_multipath_config();
+ max_checkint = conf->max_checkint;
+ adjust_int = conf->adjust_int;
+ put_multipath_config(conf);
+
+ r = do_check_path(vecs, pp);
+
+ /*
+ * do_check_path() removed or orphaned the path.
+ */
+ if (r < 0 || !pp->mpp)
+ return r;
+
+ /*
+ * the path checker is pending
+ */
+ if (pp->tick != 0) {
+ pp->pending_ticks++;
+ return r;
+ }
+
+ /* schedule the next check */
+ pp->tick = pp->checkint;
+ if (pp->pending_ticks >= pp->tick)
+ pp->tick = 1;
+ else
+ pp->tick -= pp->pending_ticks;
+ pp->pending_ticks = 0;
+
+ if (pp->tick == 1)
+ return r;
+
+ /*
+ * every mpp has a goal_idx in the range of
+ * 0 <= goal_idx < conf->max_checkint
+ *
+ * The next check has an index, next_idx, in the range of
+ * 0 <= next_idx < conf->adjust_int
+ *
+ * Every adjust_int seconds, each multipath device should try to
+ * check all of its paths on its goal_idx. If a path isn't checked
+ * on the goal_idx, then pp->tick for the next check after the
+ * goal_idx will be decremented by one, to align it over time.
+ */
+ goal_idx = (find_slot(vecs->mpvec, pp->mpp)) *
+ max_checkint / VECTOR_SIZE(vecs->mpvec);
+ next_idx = (start_secs + pp->tick) % adjust_int;
+ if (next_idx > goal_idx && next_idx - goal_idx < pp->checkint)
+ pp->tick--;
+
+ return r;
+}
+
/*
* Returns -1 if the path was removed, and 0 otherwise
*/
@@ -2799,7 +2854,8 @@ checkerloop (void *ap)
continue;
pp->is_checked = true;
if (pp->mpp)
- rc = check_path(vecs, pp, ticks);
+ rc = check_path(vecs, pp, ticks,
+ chk_start_time.tv_sec);
else
rc = handle_uninitialized_path(vecs, pp,
ticks);
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 19/22] multipathd: make multipath devices manage their path check times
2024-07-13 6:05 ` [PATCH 19/22] multipathd: make multipath devices manage their path check times Benjamin Marzinski
@ 2024-07-15 20:49 ` Martin Wilck
2024-07-16 16:36 ` Martin Wilck
2024-07-16 18:23 ` Benjamin Marzinski
0 siblings, 2 replies; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 20:49 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> multipathd's path checking can be very bursty, with all the paths
> running their path checkers at the same time, and then all doing
> nothing
> while waiting for the next check time. Alternatively, the paths in a
> multipath device might all run their path checkers at a different
> time,
> which can keep the multipath device from having a coherent view of
> the
> state of all of its paths.
>
> This patch makes all the paths of a multipath device converge to
> running
> their checkers at the same time, and spreads out when this time is
> for
> the different multipath devices.
>
> To do this, the checking time is divided into adjustment intervals
> (conf->adjust_int), so that the checkers run at some index within
> this
> interval. conf->adjust_int is chosen so that it is at least twice as
> long as conf->max_checkint, and is a multiple of all possible
> pp->checkint values. This means that regardless of pp->checkint, the
> path should always be checked on the same indexes, each adjustement
> interval.
>
> Each multipath device has a goal index. These are evenly spread out
> between 0 and conf->max_checkint. Every conf->adjust_int seconds,
> each
> multipath device should try to check all of its paths on its goal
> index.
> If a path isn't checked on the goal index, then pp->tick for the next
> check after the goal_idx will be decremented by one, to align it over
> time.
>
> In order for the path checkers to run every pp->checkint seconds,
> multipathd needs to track how long a path check has been pending for,
> and subtract that time from the number of ticks till the checker is
> run
> again. If the checker has been pending for more that pp->checkint,
> the path will be rechecked on the next tick after the checker
> returns.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Nice, thanks a lot. I have a few remarks below.
> ---
> libmultipath/config.c | 13 ++++++
> libmultipath/config.h | 1 +
> libmultipath/structs.c | 1 +
> libmultipath/structs.h | 1 +
> multipathd/main.c | 90 ++++++++++++++++++++++++++++++++++------
> --
> 5 files changed, 89 insertions(+), 17 deletions(-)
>
> diff --git a/libmultipath/config.c b/libmultipath/config.c
> index 83fa7369..ca584372 100644
> --- a/libmultipath/config.c
> +++ b/libmultipath/config.c
> @@ -982,6 +982,19 @@ int _init_config (const char *file, struct
> config *conf)
> conf->checkint = conf->max_checkint;
> condlog(3, "polling interval: %d, max: %d",
> conf->checkint, conf->max_checkint);
> + /*
> + * make sure that that adjust_int is at least twice as large
> + * as checkint and is a multiple of all possible values of
> + * pp->checkint.
> + */
> + if (conf->max_checkint % conf->checkint == 0) {
> + conf->adjust_int = 2 * conf->max_checkint;
> + } else {
> + conf->adjust_int = conf->checkint;
> + while (2 * conf->adjust_int < conf->max_checkint)
> + conf->adjust_int *= 2;
> + conf->adjust_int *= conf->max_checkint;
> + }
>
> if (conf->blist_devnode == NULL) {
> conf->blist_devnode = vector_alloc();
> diff --git a/libmultipath/config.h b/libmultipath/config.h
> index 384193ab..800c0ca9 100644
> --- a/libmultipath/config.h
> +++ b/libmultipath/config.h
> @@ -147,6 +147,7 @@ struct config {
> int minio_rq;
> unsigned int checkint;
> unsigned int max_checkint;
> + unsigned int adjust_int;
> bool use_watchdog;
> int pgfailback;
> int rr_weight;
> diff --git a/libmultipath/structs.c b/libmultipath/structs.c
> index 1583e001..472e1001 100644
> --- a/libmultipath/structs.c
> +++ b/libmultipath/structs.c
> @@ -148,6 +148,7 @@ uninitialize_path(struct path *pp)
> pp->dmstate = PSTATE_UNDEF;
> pp->uid_attribute = NULL;
> pp->checker_timeout = 0;
> + pp->pending_ticks = 0;
>
> if (checker_selected(&pp->checker))
> checker_put(&pp->checker);
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index 002eeae1..457d7836 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -360,6 +360,7 @@ struct path {
> unsigned long long size;
> unsigned int checkint;
> unsigned int tick;
> + unsigned int pending_ticks;
> int bus;
> int offline;
> int state;
> diff --git a/multipathd/main.c b/multipathd/main.c
> index daf668eb..42ccb92f 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2386,7 +2386,7 @@ check_mpp(struct vectors * vecs, struct
> multipath *mpp, unsigned int ticks)
> * and '0' otherwise
> */
> static int
> -check_path (struct vectors * vecs, struct path * pp, unsigned int
> ticks)
> +do_check_path (struct vectors * vecs, struct path * pp)
> {
> int newstate;
> int new_path_up = 0;
> @@ -2398,14 +2398,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> int marginal_pathgroups, marginal_changed = 0;
> bool need_reload;
>
> - if (pp->initialized == INIT_REMOVED)
> - return 0;
> -
> - if (pp->tick)
> - pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
> - if (pp->tick)
> - return 0; /* don't check this path yet */
> -
> conf = get_multipath_config();
> checkint = conf->checkint;
> max_checkint = conf->max_checkint;
> @@ -2417,12 +2409,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> pp->checkint = checkint;
> };
>
> - /*
> - * provision a next check soonest,
> - * in case we exit abnormally from here
> - */
> - pp->tick = checkint;
> -
> newstate = check_path_state(pp);
> if (newstate == PATH_WILD || newstate == PATH_UNCHECKED)
> return 0;
> @@ -2584,7 +2570,6 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> condlog(4, "%s: delay next check
> %is",
> pp->dev_t, pp->checkint);
> }
> - pp->tick = pp->checkint;
> }
> }
> else if (newstate != PATH_UP && newstate != PATH_GHOST) {
> @@ -2637,6 +2622,76 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
> return 1;
> }
>
> +static int
> +check_path (struct vectors * vecs, struct path * pp, unsigned int
> ticks,
> + time_t start_secs)
> +{
> + int r;
> + unsigned int adjust_int, max_checkint;
> + struct config *conf;
> + time_t next_idx, goal_idx;
> +
> + if (pp->initialized == INIT_REMOVED)
> + return 0;
> +
> + if (pp->tick)
> + pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
> + if (pp->tick)
> + return 0; /* don't check this path yet */
> +
> + conf = get_multipath_config();
> + max_checkint = conf->max_checkint;
> + adjust_int = conf->adjust_int;
> + put_multipath_config(conf);
> +
> + r = do_check_path(vecs, pp);
> +
> + /*
> + * do_check_path() removed or orphaned the path.
> + */
> + if (r < 0 || !pp->mpp)
> + return r;
> +
> + /*
> + * the path checker is pending
> + */
> + if (pp->tick != 0) {
I'm not getting this logic. Please explain. Why is pp->tick != 0 at
this point equivalent to the checker being pending?
> + pp->pending_ticks++;
> + return r;
> + }
> +
> + /* schedule the next check */
> + pp->tick = pp->checkint;
> + if (pp->pending_ticks >= pp->tick)
> + pp->tick = 1;
> + else
> + pp->tick -= pp->pending_ticks;
> + pp->pending_ticks = 0;
> +
> + if (pp->tick == 1)
> + return r;
> +
> + /*
> + * every mpp has a goal_idx in the range of
> + * 0 <= goal_idx < conf->max_checkint
> + *
> + * The next check has an index, next_idx, in the range of
> + * 0 <= next_idx < conf->adjust_int
> + *
> + * Every adjust_int seconds, each multipath device should
> try to
> + * check all of its paths on its goal_idx. If a path isn't
> checked
> + * on the goal_idx, then pp->tick for the next check after
> the
> + * goal_idx will be decremented by one, to align it over
> time.
> + */
> + goal_idx = (find_slot(vecs->mpvec, pp->mpp)) *
> + max_checkint / VECTOR_SIZE(vecs->mpvec);
> + next_idx = (start_secs + pp->tick) % adjust_int;
> + if (next_idx > goal_idx && next_idx - goal_idx < pp-
> >checkint)
> + pp->tick--;
Hm. IMHO this calculation should be little simpler, as follows:
cur_idx = start_secs % adjust_int;
if ((goal_idx - cur_idx) % pp->checkint != 0)
pp->tick--;
My reasoning goes like this: If (goal_idx - cur_idx) % pp->checkint is
0, we need no adjustment, because (assuming checkint remains constant)
it will be checked at the next goal_idx. Otherwise, we check one tick
earlier, and will do so at the next check again, etc. This is similar
to your approach, but not exactly the same, AFAICS. In particular, I am
unsure about your "if (next_idx - goal_idx < pp->checkint)" condition.
If checkint grows toward max_checkint, it may take longer until this
algorithm converges, but sooner or later, it should.
Instead of stepping, we could also force the idx immediately, by doing
pp->tick -= (goal_idx - cur_idx) % pp->checkint;
which would be even simpler (the check intervals would be less evenly
spaced, but I don't think that would hurt us much).
Am I misguided here?
Regards
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 19/22] multipathd: make multipath devices manage their path check times
2024-07-15 20:49 ` Martin Wilck
@ 2024-07-16 16:36 ` Martin Wilck
2024-07-16 18:26 ` Benjamin Marzinski
2024-07-16 18:23 ` Benjamin Marzinski
1 sibling, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-16 16:36 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Mon, 2024-07-15 at 22:49 +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> > multipathd's path checking can be very bursty, with all the paths
> > running their path checkers at the same time, and then all doing
> > nothing
> > while waiting for the next check time. Alternatively, the paths in
> > a
> > multipath device might all run their path checkers at a different
> > time,
> > which can keep the multipath device from having a coherent view of
> > the
> > state of all of its paths.
> >
> > This patch makes all the paths of a multipath device converge to
> > running
> > their checkers at the same time, and spreads out when this time is
> > for
> > the different multipath devices.
> >
> > To do this, the checking time is divided into adjustment intervals
> > (conf->adjust_int), so that the checkers run at some index within
> > this
> > interval. conf->adjust_int is chosen so that it is at least twice
> > as
> > long as conf->max_checkint, and is a multiple of all possible
> > pp->checkint values. This means that regardless of pp->checkint,
> > the
> > path should always be checked on the same indexes, each adjustement
> > interval.
> >
> > Each multipath device has a goal index. These are evenly spread out
> > between 0 and conf->max_checkint. Every conf->adjust_int seconds,
> > each
> > multipath device should try to check all of its paths on its goal
> > index.
> > If a path isn't checked on the goal index, then pp->tick for the
> > next
> > check after the goal_idx will be decremented by one, to align it
> > over
> > time.
> >
> > In order for the path checkers to run every pp->checkint seconds,
> > multipathd needs to track how long a path check has been pending
> > for,
> > and subtract that time from the number of ticks till the checker is
> > run
> > again. If the checker has been pending for more that pp->checkint,
> > the path will be rechecked on the next tick after the checker
> > returns.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Nice, thanks a lot. I have a few remarks below.
>
>
> > ---
> > libmultipath/config.c | 13 ++++++
> > libmultipath/config.h | 1 +
> > libmultipath/structs.c | 1 +
> > libmultipath/structs.h | 1 +
> > multipathd/main.c | 90 ++++++++++++++++++++++++++++++++++----
> > --
> > --
> > 5 files changed, 89 insertions(+), 17 deletions(-)
> >
> > diff --git a/libmultipath/config.c b/libmultipath/config.c
> > index 83fa7369..ca584372 100644
> > --- a/libmultipath/config.c
> > +++ b/libmultipath/config.c
> > @@ -982,6 +982,19 @@ int _init_config (const char *file, struct
> > config *conf)
> > conf->checkint = conf->max_checkint;
> > condlog(3, "polling interval: %d, max: %d",
> > conf->checkint, conf->max_checkint);
> > + /*
> > + * make sure that that adjust_int is at least twice as
> > large
> > + * as checkint and is a multiple of all possible values of
> > + * pp->checkint.
> > + */
> > + if (conf->max_checkint % conf->checkint == 0) {
> > + conf->adjust_int = 2 * conf->max_checkint;
> > + } else {
> > + conf->adjust_int = conf->checkint;
> > + while (2 * conf->adjust_int < conf->max_checkint)
> > + conf->adjust_int *= 2;
> > + conf->adjust_int *= conf->max_checkint;
> > + }
> >
> > if (conf->blist_devnode == NULL) {
> > conf->blist_devnode = vector_alloc();
> > diff --git a/libmultipath/config.h b/libmultipath/config.h
> > index 384193ab..800c0ca9 100644
> > --- a/libmultipath/config.h
> > +++ b/libmultipath/config.h
> > @@ -147,6 +147,7 @@ struct config {
> > int minio_rq;
> > unsigned int checkint;
> > unsigned int max_checkint;
> > + unsigned int adjust_int;
> > bool use_watchdog;
> > int pgfailback;
> > int rr_weight;
> > diff --git a/libmultipath/structs.c b/libmultipath/structs.c
> > index 1583e001..472e1001 100644
> > --- a/libmultipath/structs.c
> > +++ b/libmultipath/structs.c
> > @@ -148,6 +148,7 @@ uninitialize_path(struct path *pp)
> > pp->dmstate = PSTATE_UNDEF;
> > pp->uid_attribute = NULL;
> > pp->checker_timeout = 0;
> > + pp->pending_ticks = 0;
> >
> > if (checker_selected(&pp->checker))
> > checker_put(&pp->checker);
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index 002eeae1..457d7836 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -360,6 +360,7 @@ struct path {
> > unsigned long long size;
> > unsigned int checkint;
> > unsigned int tick;
> > + unsigned int pending_ticks;
> > int bus;
> > int offline;
> > int state;
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index daf668eb..42ccb92f 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2386,7 +2386,7 @@ check_mpp(struct vectors * vecs, struct
> > multipath *mpp, unsigned int ticks)
> > * and '0' otherwise
> > */
> > static int
> > -check_path (struct vectors * vecs, struct path * pp, unsigned int
> > ticks)
> > +do_check_path (struct vectors * vecs, struct path * pp)
> > {
> > int newstate;
> > int new_path_up = 0;
> > @@ -2398,14 +2398,6 @@ check_path (struct vectors * vecs, struct
> > path
> > * pp, unsigned int ticks)
> > int marginal_pathgroups, marginal_changed = 0;
> > bool need_reload;
> >
> > - if (pp->initialized == INIT_REMOVED)
> > - return 0;
> > -
> > - if (pp->tick)
> > - pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
> > - if (pp->tick)
> > - return 0; /* don't check this path yet */
> > -
> > conf = get_multipath_config();
> > checkint = conf->checkint;
> > max_checkint = conf->max_checkint;
> > @@ -2417,12 +2409,6 @@ check_path (struct vectors * vecs, struct
> > path
> > * pp, unsigned int ticks)
> > pp->checkint = checkint;
> > };
> >
> > - /*
> > - * provision a next check soonest,
> > - * in case we exit abnormally from here
> > - */
> > - pp->tick = checkint;
> > -
> > newstate = check_path_state(pp);
> > if (newstate == PATH_WILD || newstate == PATH_UNCHECKED)
> > return 0;
> > @@ -2584,7 +2570,6 @@ check_path (struct vectors * vecs, struct
> > path
> > * pp, unsigned int ticks)
> > condlog(4, "%s: delay next check
> > %is",
> > pp->dev_t, pp->checkint);
> > }
> > - pp->tick = pp->checkint;
> > }
> > }
> > else if (newstate != PATH_UP && newstate != PATH_GHOST) {
> > @@ -2637,6 +2622,76 @@ check_path (struct vectors * vecs, struct
> > path
> > * pp, unsigned int ticks)
> > return 1;
> > }
> >
> > +static int
> > +check_path (struct vectors * vecs, struct path * pp, unsigned int
> > ticks,
> > + time_t start_secs)
> > +{
> > + int r;
> > + unsigned int adjust_int, max_checkint;
> > + struct config *conf;
> > + time_t next_idx, goal_idx;
> > +
> > + if (pp->initialized == INIT_REMOVED)
> > + return 0;
> > +
> > + if (pp->tick)
> > + pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
> > + if (pp->tick)
> > + return 0; /* don't check this path yet */
> > +
> > + conf = get_multipath_config();
> > + max_checkint = conf->max_checkint;
> > + adjust_int = conf->adjust_int;
> > + put_multipath_config(conf);
> > +
> > + r = do_check_path(vecs, pp);
> > +
> > + /*
> > + * do_check_path() removed or orphaned the path.
> > + */
> > + if (r < 0 || !pp->mpp)
> > + return r;
> > +
> > + /*
> > + * the path checker is pending
> > + */
> > + if (pp->tick != 0) {
>
> I'm not getting this logic. Please explain. Why is pp->tick != 0 at
> this point equivalent to the checker being pending?
>
> > + pp->pending_ticks++;
> > + return r;
> > + }
> > +
> > + /* schedule the next check */
> > + pp->tick = pp->checkint;
> > + if (pp->pending_ticks >= pp->tick)
> > + pp->tick = 1;
> > + else
> > + pp->tick -= pp->pending_ticks;
> > + pp->pending_ticks = 0;
> > +
> > + if (pp->tick == 1)
> > + return r;
> > +
> > + /*
> > + * every mpp has a goal_idx in the range of
> > + * 0 <= goal_idx < conf->max_checkint
> > + *
> > + * The next check has an index, next_idx, in the range of
> > + * 0 <= next_idx < conf->adjust_int
> > + *
> > + * Every adjust_int seconds, each multipath device should
> > try to
> > + * check all of its paths on its goal_idx. If a path isn't
> > checked
> > + * on the goal_idx, then pp->tick for the next check after
> > the
> > + * goal_idx will be decremented by one, to align it over
> > time.
> > + */
> > + goal_idx = (find_slot(vecs->mpvec, pp->mpp)) *
> > + max_checkint / VECTOR_SIZE(vecs->mpvec);
> > + next_idx = (start_secs + pp->tick) % adjust_int;
> > + if (next_idx > goal_idx && next_idx - goal_idx < pp-
> > > checkint)
> > + pp->tick--;
This code raises compilation errors on some architectures, see
https://github.com/openSUSE/multipath-tools/actions/runs/9942603306/job/27464440204
Martin
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 19/22] multipathd: make multipath devices manage their path check times
2024-07-16 16:36 ` Martin Wilck
@ 2024-07-16 18:26 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 18:26 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Tue, Jul 16, 2024 at 06:36:35PM +0200, Martin Wilck wrote:
> On Mon, 2024-07-15 at 22:49 +0200, Martin Wilck wrote:
> > On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> > > + next_idx = (start_secs + pp->tick) % adjust_int;
> > > + if (next_idx > goal_idx && next_idx - goal_idx < pp-
> > > > checkint)
> > > + pp->tick--;
>
> This code raises compilation errors on some architectures, see
> https://github.com/openSUSE/multipath-tools/actions/runs/9942603306/job/27464440204
>
> Martin
Thanks. Switching to your modulo idea for checking offset will avoid
this.
-Ben
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 19/22] multipathd: make multipath devices manage their path check times
2024-07-15 20:49 ` Martin Wilck
2024-07-16 16:36 ` Martin Wilck
@ 2024-07-16 18:23 ` Benjamin Marzinski
1 sibling, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 18:23 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 10:49:52PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> > multipathd's path checking can be very bursty, with all the paths
> > running their path checkers at the same time, and then all doing
> > nothing
> > while waiting for the next check time. Alternatively, the paths in a
> > multipath device might all run their path checkers at a different
> > time,
> > which can keep the multipath device from having a coherent view of
> > the
> > state of all of its paths.
> >
> > This patch makes all the paths of a multipath device converge to
> > running
> > their checkers at the same time, and spreads out when this time is
> > for
> > the different multipath devices.
> >
> > To do this, the checking time is divided into adjustment intervals
> > (conf->adjust_int), so that the checkers run at some index within
> > this
> > interval. conf->adjust_int is chosen so that it is at least twice as
> > long as conf->max_checkint, and is a multiple of all possible
> > pp->checkint values. This means that regardless of pp->checkint, the
> > path should always be checked on the same indexes, each adjustement
> > interval.
> >
> > Each multipath device has a goal index. These are evenly spread out
> > between 0 and conf->max_checkint. Every conf->adjust_int seconds,
> > each
> > multipath device should try to check all of its paths on its goal
> > index.
> > If a path isn't checked on the goal index, then pp->tick for the next
> > check after the goal_idx will be decremented by one, to align it over
> > time.
> >
> > In order for the path checkers to run every pp->checkint seconds,
> > multipathd needs to track how long a path check has been pending for,
> > and subtract that time from the number of ticks till the checker is
> > run
> > again. If the checker has been pending for more that pp->checkint,
> > the path will be rechecked on the next tick after the checker
> > returns.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Nice, thanks a lot. I have a few remarks below.
>
>
> > ---
> > libmultipath/config.c | 13 ++++++
> > libmultipath/config.h | 1 +
> > libmultipath/structs.c | 1 +
> > libmultipath/structs.h | 1 +
> > multipathd/main.c | 90 ++++++++++++++++++++++++++++++++++------
> > --
> > 5 files changed, 89 insertions(+), 17 deletions(-)
> >
> > diff --git a/libmultipath/config.c b/libmultipath/config.c
> > index 83fa7369..ca584372 100644
> > --- a/libmultipath/config.c
> > +++ b/libmultipath/config.c
> > @@ -982,6 +982,19 @@ int _init_config (const char *file, struct
> > config *conf)
> > conf->checkint = conf->max_checkint;
> > condlog(3, "polling interval: %d, max: %d",
> > conf->checkint, conf->max_checkint);
> > + /*
> > + * make sure that that adjust_int is at least twice as large
> > + * as checkint and is a multiple of all possible values of
> > + * pp->checkint.
> > + */
> > + if (conf->max_checkint % conf->checkint == 0) {
> > + conf->adjust_int = 2 * conf->max_checkint;
> > + } else {
> > + conf->adjust_int = conf->checkint;
> > + while (2 * conf->adjust_int < conf->max_checkint)
> > + conf->adjust_int *= 2;
> > + conf->adjust_int *= conf->max_checkint;
> > + }
> >
> > if (conf->blist_devnode == NULL) {
> > conf->blist_devnode = vector_alloc();
> > diff --git a/libmultipath/config.h b/libmultipath/config.h
> > index 384193ab..800c0ca9 100644
> > --- a/libmultipath/config.h
> > +++ b/libmultipath/config.h
> > @@ -147,6 +147,7 @@ struct config {
> > int minio_rq;
> > unsigned int checkint;
> > unsigned int max_checkint;
> > + unsigned int adjust_int;
> > bool use_watchdog;
> > int pgfailback;
> > int rr_weight;
> > diff --git a/libmultipath/structs.c b/libmultipath/structs.c
> > index 1583e001..472e1001 100644
> > --- a/libmultipath/structs.c
> > +++ b/libmultipath/structs.c
> > @@ -148,6 +148,7 @@ uninitialize_path(struct path *pp)
> > pp->dmstate = PSTATE_UNDEF;
> > pp->uid_attribute = NULL;
> > pp->checker_timeout = 0;
> > + pp->pending_ticks = 0;
> >
> > if (checker_selected(&pp->checker))
> > checker_put(&pp->checker);
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index 002eeae1..457d7836 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -360,6 +360,7 @@ struct path {
> > unsigned long long size;
> > unsigned int checkint;
> > unsigned int tick;
> > + unsigned int pending_ticks;
> > int bus;
> > int offline;
> > int state;
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index daf668eb..42ccb92f 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2386,7 +2386,7 @@ check_mpp(struct vectors * vecs, struct
> > multipath *mpp, unsigned int ticks)
> > * and '0' otherwise
> > */
> > static int
> > -check_path (struct vectors * vecs, struct path * pp, unsigned int
> > ticks)
> > +do_check_path (struct vectors * vecs, struct path * pp)
> > {
> > int newstate;
> > int new_path_up = 0;
> > @@ -2398,14 +2398,6 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> > int marginal_pathgroups, marginal_changed = 0;
> > bool need_reload;
> >
> > - if (pp->initialized == INIT_REMOVED)
> > - return 0;
> > -
> > - if (pp->tick)
> > - pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
> > - if (pp->tick)
> > - return 0; /* don't check this path yet */
> > -
> > conf = get_multipath_config();
> > checkint = conf->checkint;
> > max_checkint = conf->max_checkint;
> > @@ -2417,12 +2409,6 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> > pp->checkint = checkint;
> > };
> >
> > - /*
> > - * provision a next check soonest,
> > - * in case we exit abnormally from here
> > - */
> > - pp->tick = checkint;
> > -
> > newstate = check_path_state(pp);
> > if (newstate == PATH_WILD || newstate == PATH_UNCHECKED)
> > return 0;
> > @@ -2584,7 +2570,6 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> > condlog(4, "%s: delay next check
> > %is",
> > pp->dev_t, pp->checkint);
> > }
> > - pp->tick = pp->checkint;
> > }
> > }
> > else if (newstate != PATH_UP && newstate != PATH_GHOST) {
> > @@ -2637,6 +2622,76 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> > return 1;
> > }
> >
> > +static int
> > +check_path (struct vectors * vecs, struct path * pp, unsigned int
> > ticks,
> > + time_t start_secs)
> > +{
> > + int r;
> > + unsigned int adjust_int, max_checkint;
> > + struct config *conf;
> > + time_t next_idx, goal_idx;
> > +
> > + if (pp->initialized == INIT_REMOVED)
> > + return 0;
> > +
> > + if (pp->tick)
> > + pp->tick -= (pp->tick > ticks) ? ticks : pp->tick;
> > + if (pp->tick)
> > + return 0; /* don't check this path yet */
> > +
> > + conf = get_multipath_config();
> > + max_checkint = conf->max_checkint;
> > + adjust_int = conf->adjust_int;
> > + put_multipath_config(conf);
> > +
> > + r = do_check_path(vecs, pp);
> > +
> > + /*
> > + * do_check_path() removed or orphaned the path.
> > + */
> > + if (r < 0 || !pp->mpp)
> > + return r;
> > +
> > + /*
> > + * the path checker is pending
> > + */
> > + if (pp->tick != 0) {
>
> I'm not getting this logic. Please explain. Why is pp->tick != 0 at
> this point equivalent to the checker being pending?
It's not actually just pending. This also happens for a specific case of
PATH_DELAY paths. The short answer is that we should always respect a
pp->tick value set directly in do_check_path(), since that means that we
are supposed to recheck as soon as possible. And as far as incrementing
pp->pending_ticks for PATH_DELAY devices, it seemed harmless. It will
just make the first recheck after a path stops being delayed happen
sooner.
But you are correct, and it's not hard to clear pp->pending_ticks when
the path is in the PATH_DELAY state.
Actually setting pp->ticks to 1 for PATH_DELAY paths seems wrong. I
believe the only reason to do it is to make sure that we re-enqueue the
path for io err checking on time. But we know exactly how long we need
to wait until we do that, so we should be setting pp->tick to the
correct time when we need to, and leaving it alone otherwise. And it
shouldn't have anything to do with whether or not the path is in the
marginal pathgroup.
>
> > + pp->pending_ticks++;
> > + return r;
> > + }
> > +
> > + /* schedule the next check */
> > + pp->tick = pp->checkint;
> > + if (pp->pending_ticks >= pp->tick)
> > + pp->tick = 1;
> > + else
> > + pp->tick -= pp->pending_ticks;
> > + pp->pending_ticks = 0;
> > +
> > + if (pp->tick == 1)
> > + return r;
> > +
> > + /*
> > + * every mpp has a goal_idx in the range of
> > + * 0 <= goal_idx < conf->max_checkint
> > + *
> > + * The next check has an index, next_idx, in the range of
> > + * 0 <= next_idx < conf->adjust_int
> > + *
> > + * Every adjust_int seconds, each multipath device should
> > try to
> > + * check all of its paths on its goal_idx. If a path isn't
> > checked
> > + * on the goal_idx, then pp->tick for the next check after
> > the
> > + * goal_idx will be decremented by one, to align it over
> > time.
> > + */
> > + goal_idx = (find_slot(vecs->mpvec, pp->mpp)) *
> > + max_checkint / VECTOR_SIZE(vecs->mpvec);
> > + next_idx = (start_secs + pp->tick) % adjust_int;
> > + if (next_idx > goal_idx && next_idx - goal_idx < pp-
> > >checkint)
> > + pp->tick--;
>
> Hm. IMHO this calculation should be little simpler, as follows:
>
> cur_idx = start_secs % adjust_int;
> if ((goal_idx - cur_idx) % pp->checkint != 0)
> pp->tick--;
We need to use next_idx instead of cur_idx. cur_idx is the time when the
path checker completed, not when it started. If the checker returned
PATH_PENDING, these two times will be different. Since path checkers may
not always take the same amount of time to complete, we can't use their
completing time to line them up on the same index. Instead we track how
long the checker was pending for, and remove that from the length of
time till the next check, so that checkers always start every
pp->checkint seconds.
But you are correct that we could just use
((goal_idx - next_idx) % pp->checkint != 0)
> My reasoning goes like this: If (goal_idx - cur_idx) % pp->checkint is
> 0, we need no adjustment, because (assuming checkint remains constant)
> it will be checked at the next goal_idx. Otherwise, we check one tick
> earlier, and will do so at the next check again, etc. This is similar
> to your approach, but not exactly the same, AFAICS. In particular, I am
> unsure about your "if (next_idx - goal_idx < pp->checkint)" condition.
The idea was that we would only modify pp->tick once per adjust_int. But
in reflection, there's really no reason to do that. Your version makes
more sense.
> If checkint grows toward max_checkint, it may take longer until this
> algorithm converges, but sooner or later, it should.
>
> Instead of stepping, we could also force the idx immediately, by doing
>
> pp->tick -= (goal_idx - cur_idx) % pp->checkint;
>
> which would be even simpler (the check intervals would be less evenly
> spaced, but I don't think that would hurt us much).
I think you mean:
pp->tick -= (cur_idx - goal_idx) % pp->checkint;
But at any rate, I think we should try to keep the intervals mostly the
same. Plus I feel like shortening the intervals makes the most sense,
while the above will sometimes shorten and sometimes lengthen them.
-Ben
> Am I misguided here?
>
> Regards
> Martin
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 20/22] multipathd: factor out actual path checking loop from checkerloop
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (18 preceding siblings ...)
2024-07-13 6:05 ` [PATCH 19/22] multipathd: make multipath devices manage their path check times Benjamin Marzinski
@ 2024-07-13 6:05 ` Benjamin Marzinski
2024-07-15 20:54 ` Martin Wilck
2024-07-13 6:05 ` [PATCH 21/22] multipathd: check paths in order by mpp Benjamin Marzinski
2024-07-13 6:05 ` [PATCH 22/22] multipathd: clean up the correct wwid in check_path_wwid_change Benjamin Marzinski
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:05 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Move the code that actually loops through the paths and checks them into
a separate function, to stop it from being so heavily indented. This
will be more imporant when a future patch makes paths checked by mpp.
No functional changes.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 65 +++++++++++++++++++++++++----------------------
1 file changed, 35 insertions(+), 30 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 42ccb92f..27e18a0c 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2782,6 +2782,38 @@ enum checker_state {
CHECKER_FINISHED,
};
+static enum checker_state
+check_paths(struct vectors *vecs, unsigned int ticks, int *num_paths_p)
+{
+ unsigned int paths_checked = 0;
+ struct timespec diff_time, start_time, end_time;
+ struct path *pp;
+ int i, rc;
+
+ get_monotonic_time(&start_time);
+ vector_foreach_slot(vecs->pathvec, pp, i) {
+ if (pp->is_checked)
+ continue;
+ pp->is_checked = true;
+ if (pp->mpp)
+ rc = check_path(vecs, pp, ticks, start_time.tv_sec);
+ else
+ rc = handle_uninitialized_path(vecs, pp, ticks);
+ if (rc < 0)
+ i--;
+ else
+ *num_paths_p += rc;
+ if (++paths_checked % 128 == 0 &&
+ (lock_has_waiters(&vecs->lock) || waiting_clients())) {
+ get_monotonic_time(&end_time);
+ timespecsub(&end_time, &start_time, &diff_time);
+ if (diff_time.tv_sec > 0)
+ return CHECKER_RUNNING;
+ }
+ }
+ return CHECKER_FINISHED;
+}
+
static void *
checkerloop (void *ap)
{
@@ -2813,7 +2845,7 @@ checkerloop (void *ap)
while (1) {
struct timespec diff_time, start_time, end_time;
- int num_paths = 0, strict_timing, rc = 0;
+ int num_paths = 0, strict_timing;
unsigned int ticks = 0;
enum checker_state checker_state = CHECKER_STARTING;
@@ -2832,16 +2864,14 @@ checkerloop (void *ap)
sd_notify(0, "WATCHDOG=1");
#endif
while (checker_state != CHECKER_FINISHED) {
- unsigned int paths_checked = 0, i;
- struct timespec chk_start_time;
struct multipath *mpp;
+ int i;
pthread_cleanup_push(cleanup_lock, &vecs->lock);
lock(&vecs->lock);
pthread_testcancel();
vector_foreach_slot(vecs->mpvec, mpp, i)
mpp->is_checked = false;
- get_monotonic_time(&chk_start_time);
if (checker_state == CHECKER_STARTING) {
vector_foreach_slot(vecs->mpvec, mpp, i)
check_mpp(vecs, mpp, ticks);
@@ -2849,32 +2879,7 @@ checkerloop (void *ap)
pp->is_checked = false;
checker_state = CHECKER_RUNNING;
}
- vector_foreach_slot(vecs->pathvec, pp, i) {
- if (pp->is_checked)
- continue;
- pp->is_checked = true;
- if (pp->mpp)
- rc = check_path(vecs, pp, ticks,
- chk_start_time.tv_sec);
- else
- rc = handle_uninitialized_path(vecs, pp,
- ticks);
- if (rc < 0)
- i--;
- else
- num_paths += rc;
- if (++paths_checked % 128 == 0 &&
- (lock_has_waiters(&vecs->lock) ||
- waiting_clients())) {
- get_monotonic_time(&end_time);
- timespecsub(&end_time, &chk_start_time,
- &diff_time);
- if (diff_time.tv_sec > 0)
- goto unlock;
- }
- }
- checker_state = CHECKER_FINISHED;
-unlock:
+ checker_state = check_paths(vecs, ticks, &num_paths);
lock_cleanup_pop(vecs->lock);
if (checker_state != CHECKER_FINISHED) {
/* Yield to waiters */
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 21/22] multipathd: check paths in order by mpp
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (19 preceding siblings ...)
2024-07-13 6:05 ` [PATCH 20/22] multipathd: factor out actual path checking loop from checkerloop Benjamin Marzinski
@ 2024-07-13 6:05 ` Benjamin Marzinski
2024-07-15 21:19 ` Martin Wilck
2024-07-13 6:05 ` [PATCH 22/22] multipathd: clean up the correct wwid in check_path_wwid_change Benjamin Marzinski
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:05 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Instead of checking all of the paths in vecs->pathvec in order, first
check all the paths in each multipath device. Then check the
uninitialized paths.
One issue with checking paths this way is that the multipath device can
be resynced or even removed while a path is being checked. The path can
also be removed. If there is any change to the multipath device,
multipathd needs to loop through its paths again, because the current
indexes may not longer be valid.
To do this change mpp->is_checked to an int called mpp->synced_count,
and increment it whenever the multipath device gets resynced. After each
path is checked, make sure that the multipath device still exists, that
mpp->synced_count hasn't changed. If either has happened, restart
checking at the current index in mpvec (which will either be the same
mpp if it was just resynced, or the next mpp if the last one was
deleted). Since the multipath device is resynced when its first path is
checked, this restart will happen to every multipath device at least
once per loop. But the paths themselves aren't rechecked, so it's not
much overhead.
If resyncing a multipath device fails in do_check_mpp(), there may be
path devices that have pp->mpp set, but are no longer in one of the
multipath device's pathgroups, and thus will not get checked. This
almost definitely means the multipath device was deleted. If
do_check_mpp() failed to resync the device, but it wasn't deleted, it
will get called again in max_checkint seconds even if it no longer has
mpp->pg set, and the paths will get checked again after that.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/structs.h | 2 +-
libmultipath/structs_vec.c | 1 +
multipathd/main.c | 54 ++++++++++++++++++++++++++++++++------
3 files changed, 48 insertions(+), 9 deletions(-)
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 457d7836..91509881 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -455,7 +455,7 @@ struct multipath {
int ghost_delay_tick;
int queue_mode;
unsigned int sync_tick;
- bool is_checked;
+ int synced_count;
uid_t uid;
gid_t gid;
mode_t mode;
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 60360c76..704e0f21 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -513,6 +513,7 @@ update_multipath_table (struct multipath *mpp, vector pathvec, int flags)
conf = get_multipath_config();
mpp->sync_tick = conf->max_checkint;
put_multipath_config(conf);
+ mpp->synced_count++;
r = libmp_mapinfo(DM_MAP_BY_NAME | MAPINFO_MPATH_ONLY,
(mapid_t) { .str = mpp->alias },
diff --git a/multipathd/main.c b/multipathd/main.c
index 27e18a0c..d51bc852 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2356,7 +2356,6 @@ do_check_mpp(struct vectors * vecs, struct multipath *mpp)
int i, ret;
struct path *pp;
- mpp->is_checked = true;
ret = update_multipath_strings(mpp, vecs->pathvec);
if (ret != DMP_OK) {
condlog(1, "%s: %s", mpp->alias, ret == DMP_NOT_FOUND ?
@@ -2428,7 +2427,7 @@ do_check_path (struct vectors * vecs, struct path * pp)
condlog(0, "%s: path wwid change detected. Removing", pp->dev);
return handle_path_wwid_change(pp, vecs)? -1 : 0;
}
- if (!pp->mpp->is_checked) {
+ if (pp->mpp->synced_count == 0) {
do_check_mpp(vecs, pp->mpp);
/* if update_multipath_strings orphaned the path, quit early */
if (!pp->mpp)
@@ -2787,18 +2786,57 @@ check_paths(struct vectors *vecs, unsigned int ticks, int *num_paths_p)
{
unsigned int paths_checked = 0;
struct timespec diff_time, start_time, end_time;
+ struct multipath *mpp;
struct path *pp;
int i, rc;
get_monotonic_time(&start_time);
+
+ vector_foreach_slot(vecs->mpvec, mpp, i) {
+ struct pathgroup *pgp;
+ struct path *pp;
+ int j, k;
+ bool check_for_waiters = false;
+ /* maps can be rechecked, so this is not always 0 */
+ int synced_count = mpp->synced_count;
+
+ vector_foreach_slot (mpp->pg, pgp, j) {
+ vector_foreach_slot (pgp->paths, pp, k) {
+ if (!pp->mpp || pp->is_checked)
+ continue;
+ pp->is_checked = true;
+ rc = check_path(vecs, pp, ticks,
+ start_time.tv_sec);
+ if (rc > 0)
+ *num_paths_p += 1;
+ if (++paths_checked % 128 == 0)
+ check_for_waiters = true;
+ /*
+ * mpp has been removed or resynced. Path may
+ * have been removed.
+ */
+ if (VECTOR_SLOT(vecs->mpvec, i) != mpp ||
+ synced_count != mpp->synced_count) {
+ i--;
+ goto next_mpp;
+ }
+ }
+ }
+next_mpp:
+ if (check_for_waiters &&
+ (lock_has_waiters(&vecs->lock) || waiting_clients())) {
+ get_monotonic_time(&end_time);
+ timespecsub(&end_time, &start_time, &diff_time);
+ if (diff_time.tv_sec > 0)
+ return CHECKER_RUNNING;
+ }
+ }
vector_foreach_slot(vecs->pathvec, pp, i) {
- if (pp->is_checked)
+ if (pp->mpp || pp->is_checked)
continue;
pp->is_checked = true;
- if (pp->mpp)
- rc = check_path(vecs, pp, ticks, start_time.tv_sec);
- else
- rc = handle_uninitialized_path(vecs, pp, ticks);
+
+ rc = handle_uninitialized_path(vecs, pp, ticks);
if (rc < 0)
i--;
else
@@ -2871,7 +2909,7 @@ checkerloop (void *ap)
lock(&vecs->lock);
pthread_testcancel();
vector_foreach_slot(vecs->mpvec, mpp, i)
- mpp->is_checked = false;
+ mpp->synced_count = 0;
if (checker_state == CHECKER_STARTING) {
vector_foreach_slot(vecs->mpvec, mpp, i)
check_mpp(vecs, mpp, ticks);
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 21/22] multipathd: check paths in order by mpp
2024-07-13 6:05 ` [PATCH 21/22] multipathd: check paths in order by mpp Benjamin Marzinski
@ 2024-07-15 21:19 ` Martin Wilck
2024-07-16 20:00 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 21:19 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> Instead of checking all of the paths in vecs->pathvec in order, first
> check all the paths in each multipath device. Then check the
> uninitialized paths.
>
> One issue with checking paths this way is that the multipath device
> can
> be resynced or even removed while a path is being checked. The path
> can
> also be removed. If there is any change to the multipath device,
> multipathd needs to loop through its paths again, because the current
> indexes may not longer be valid.
>
> To do this change mpp->is_checked to an int called mpp->synced_count,
> and increment it whenever the multipath device gets resynced. After
> each
> path is checked, make sure that the multipath device still exists,
> that
> mpp->synced_count hasn't changed. If either has happened, restart
> checking at the current index in mpvec (which will either be the same
> mpp if it was just resynced, or the next mpp if the last one was
> deleted). Since the multipath device is resynced when its first path
> is
> checked, this restart will happen to every multipath device at least
> once per loop. But the paths themselves aren't rechecked, so it's not
> much overhead.
Why don't we just sync the map once before we start checking the paths,
instead of doing it when the first path is checked?
> If resyncing a multipath device fails in do_check_mpp(), there may be
> path devices that have pp->mpp set, but are no longer in one of the
> multipath device's pathgroups, and thus will not get checked. This
> almost definitely means the multipath device was deleted. If
> do_check_mpp() failed to resync the device, but it wasn't deleted, it
> will get called again in max_checkint seconds even if it no longer
> has
> mpp->pg set, and the paths will get checked again after that.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/structs.h | 2 +-
> libmultipath/structs_vec.c | 1 +
> multipathd/main.c | 54 ++++++++++++++++++++++++++++++++----
> --
> 3 files changed, 48 insertions(+), 9 deletions(-)
>
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index 457d7836..91509881 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -455,7 +455,7 @@ struct multipath {
> int ghost_delay_tick;
> int queue_mode;
> unsigned int sync_tick;
> - bool is_checked;
> + int synced_count;
> uid_t uid;
> gid_t gid;
> mode_t mode;
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index 60360c76..704e0f21 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -513,6 +513,7 @@ update_multipath_table (struct multipath *mpp,
> vector pathvec, int flags)
> conf = get_multipath_config();
> mpp->sync_tick = conf->max_checkint;
> put_multipath_config(conf);
> + mpp->synced_count++;
>
> r = libmp_mapinfo(DM_MAP_BY_NAME | MAPINFO_MPATH_ONLY,
> (mapid_t) { .str = mpp->alias },
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 27e18a0c..d51bc852 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2356,7 +2356,6 @@ do_check_mpp(struct vectors * vecs, struct
> multipath *mpp)
> int i, ret;
> struct path *pp;
>
> - mpp->is_checked = true;
> ret = update_multipath_strings(mpp, vecs->pathvec);
> if (ret != DMP_OK) {
> condlog(1, "%s: %s", mpp->alias, ret ==
> DMP_NOT_FOUND ?
> @@ -2428,7 +2427,7 @@ do_check_path (struct vectors * vecs, struct
> path * pp)
> condlog(0, "%s: path wwid change detected.
> Removing", pp->dev);
> return handle_path_wwid_change(pp, vecs)? -1 : 0;
> }
> - if (!pp->mpp->is_checked) {
> + if (pp->mpp->synced_count == 0) {
> do_check_mpp(vecs, pp->mpp);
> /* if update_multipath_strings orphaned the path,
> quit early */
> if (!pp->mpp)
> @@ -2787,18 +2786,57 @@ check_paths(struct vectors *vecs, unsigned
> int ticks, int *num_paths_p)
> {
> unsigned int paths_checked = 0;
> struct timespec diff_time, start_time, end_time;
> + struct multipath *mpp;
> struct path *pp;
> int i, rc;
>
> get_monotonic_time(&start_time);
> +
> + vector_foreach_slot(vecs->mpvec, mpp, i) {
> + struct pathgroup *pgp;
> + struct path *pp;
> + int j, k;
> + bool check_for_waiters = false;
> + /* maps can be rechecked, so this is not always 0 */
> + int synced_count = mpp->synced_count;
> +
> + vector_foreach_slot (mpp->pg, pgp, j) {
> + vector_foreach_slot (pgp->paths, pp, k) {
> + if (!pp->mpp || pp->is_checked)
> + continue;
> + pp->is_checked = true;
> + rc = check_path(vecs, pp, ticks,
> + start_time.tv_sec);
> + if (rc > 0)
> + *num_paths_p += 1;
> + if (++paths_checked % 128 == 0)
> + check_for_waiters = true;
> + /*
> + * mpp has been removed or resynced.
> Path may
> + * have been removed.
> + */
> + if (VECTOR_SLOT(vecs->mpvec, i) !=
> mpp ||
> + synced_count != mpp-
> >synced_count) {
> + i--;
> + goto next_mpp;
> + }
I'm aware that you have implemented the rather vague idea I had
mentioned previously, thanks a lot for that.
This is ok, but the fact that the arrays we are looping over can be
modified deeply inside functions inside the loop feels fragile. I
wonder if it would be possible to clean up our code flow such that
neither mpp nor path objects can be removed at this point the code.
The sync code could set flags on paths and / or multipath objects that
tell the caller that these objects are no longer valid, and we could
remove them when we're done with the loop. I'm aware that we'd still
need to cover path group changes. Perhaps we should create a temporary
array of all paths in the map (possibly sorted by major/minor number).
We'd ignore newly added paths (they should have been checked recently
anyway, shouldn't they?) and skip over paths that are flagged as being
removed.
I realize I'm making vague suggestions again. Your current work shows
that this is more complicated than I first imagined. Thanks a lot for
considering all the corner cases.
This isn't something I'd expect to change in the current patch set,
just something to think about for the future.
> + }
> + }
> +next_mpp:
> + if (check_for_waiters &&
> + (lock_has_waiters(&vecs->lock) ||
> waiting_clients())) {
> + get_monotonic_time(&end_time);
> + timespecsub(&end_time, &start_time,
> &diff_time);
> + if (diff_time.tv_sec > 0)
> + return CHECKER_RUNNING;
> + }
> + }
> vector_foreach_slot(vecs->pathvec, pp, i) {
> - if (pp->is_checked)
> + if (pp->mpp || pp->is_checked)
> continue;
> pp->is_checked = true;
> - if (pp->mpp)
> - rc = check_path(vecs, pp, ticks,
> start_time.tv_sec);
> - else
> - rc = handle_uninitialized_path(vecs, pp,
> ticks);
> +
> + rc = handle_uninitialized_path(vecs, pp, ticks);
> if (rc < 0)
> i--;
> else
> @@ -2871,7 +2909,7 @@ checkerloop (void *ap)
> lock(&vecs->lock);
> pthread_testcancel();
> vector_foreach_slot(vecs->mpvec, mpp, i)
> - mpp->is_checked = false;
> + mpp->synced_count = 0;
Why do we need to reset this? Can't we just increment the counter
whenever we sync, and let it wrap at UINT_MAX?
> if (checker_state == CHECKER_STARTING) {
> vector_foreach_slot(vecs->mpvec,
> mpp, i)
> check_mpp(vecs, mpp, ticks);
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 21/22] multipathd: check paths in order by mpp
2024-07-15 21:19 ` Martin Wilck
@ 2024-07-16 20:00 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 20:00 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 11:19:57PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> > Instead of checking all of the paths in vecs->pathvec in order, first
> > check all the paths in each multipath device. Then check the
> > uninitialized paths.
> >
> > One issue with checking paths this way is that the multipath device
> > can
> > be resynced or even removed while a path is being checked. The path
> > can
> > also be removed. If there is any change to the multipath device,
> > multipathd needs to loop through its paths again, because the current
> > indexes may not longer be valid.
> >
> > To do this change mpp->is_checked to an int called mpp->synced_count,
> > and increment it whenever the multipath device gets resynced. After
> > each
> > path is checked, make sure that the multipath device still exists,
> > that
> > mpp->synced_count hasn't changed. If either has happened, restart
> > checking at the current index in mpvec (which will either be the same
> > mpp if it was just resynced, or the next mpp if the last one was
> > deleted). Since the multipath device is resynced when its first path
> > is
> > checked, this restart will happen to every multipath device at least
> > once per loop. But the paths themselves aren't rechecked, so it's not
> > much overhead.
>
> Why don't we just sync the map once before we start checking the paths,
> instead of doing it when the first path is checked?
I guess we could loop through all the paths of a multipath device and
see if any of them are due for a check this loop. The if the path
checker returns pending we will have resynced the map for no benefit,
and we will continue to do so until the checker completes. This could
lead to us doing more resyncs than we were before, with no added
benefit.
> > If resyncing a multipath device fails in do_check_mpp(), there may be
> > path devices that have pp->mpp set, but are no longer in one of the
> > multipath device's pathgroups, and thus will not get checked. This
> > almost definitely means the multipath device was deleted. If
> > do_check_mpp() failed to resync the device, but it wasn't deleted, it
> > will get called again in max_checkint seconds even if it no longer
> > has
> > mpp->pg set, and the paths will get checked again after that.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > libmultipath/structs.h | 2 +-
> > libmultipath/structs_vec.c | 1 +
> > multipathd/main.c | 54 ++++++++++++++++++++++++++++++++----
> > --
> > 3 files changed, 48 insertions(+), 9 deletions(-)
> >
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index 457d7836..91509881 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -455,7 +455,7 @@ struct multipath {
> > int ghost_delay_tick;
> > int queue_mode;
> > unsigned int sync_tick;
> > - bool is_checked;
> > + int synced_count;
> > uid_t uid;
> > gid_t gid;
> > mode_t mode;
> > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> > index 60360c76..704e0f21 100644
> > --- a/libmultipath/structs_vec.c
> > +++ b/libmultipath/structs_vec.c
> > @@ -513,6 +513,7 @@ update_multipath_table (struct multipath *mpp,
> > vector pathvec, int flags)
> > conf = get_multipath_config();
> > mpp->sync_tick = conf->max_checkint;
> > put_multipath_config(conf);
> > + mpp->synced_count++;
> >
> > r = libmp_mapinfo(DM_MAP_BY_NAME | MAPINFO_MPATH_ONLY,
> > (mapid_t) { .str = mpp->alias },
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 27e18a0c..d51bc852 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2356,7 +2356,6 @@ do_check_mpp(struct vectors * vecs, struct
> > multipath *mpp)
> > int i, ret;
> > struct path *pp;
> >
> > - mpp->is_checked = true;
> > ret = update_multipath_strings(mpp, vecs->pathvec);
> > if (ret != DMP_OK) {
> > condlog(1, "%s: %s", mpp->alias, ret ==
> > DMP_NOT_FOUND ?
> > @@ -2428,7 +2427,7 @@ do_check_path (struct vectors * vecs, struct
> > path * pp)
> > condlog(0, "%s: path wwid change detected.
> > Removing", pp->dev);
> > return handle_path_wwid_change(pp, vecs)? -1 : 0;
> > }
> > - if (!pp->mpp->is_checked) {
> > + if (pp->mpp->synced_count == 0) {
> > do_check_mpp(vecs, pp->mpp);
> > /* if update_multipath_strings orphaned the path,
> > quit early */
> > if (!pp->mpp)
> > @@ -2787,18 +2786,57 @@ check_paths(struct vectors *vecs, unsigned
> > int ticks, int *num_paths_p)
> > {
> > unsigned int paths_checked = 0;
> > struct timespec diff_time, start_time, end_time;
> > + struct multipath *mpp;
> > struct path *pp;
> > int i, rc;
> >
> > get_monotonic_time(&start_time);
> > +
> > + vector_foreach_slot(vecs->mpvec, mpp, i) {
> > + struct pathgroup *pgp;
> > + struct path *pp;
> > + int j, k;
> > + bool check_for_waiters = false;
> > + /* maps can be rechecked, so this is not always 0 */
> > + int synced_count = mpp->synced_count;
> > +
> > + vector_foreach_slot (mpp->pg, pgp, j) {
> > + vector_foreach_slot (pgp->paths, pp, k) {
> > + if (!pp->mpp || pp->is_checked)
> > + continue;
> > + pp->is_checked = true;
> > + rc = check_path(vecs, pp, ticks,
> > + start_time.tv_sec);
> > + if (rc > 0)
> > + *num_paths_p += 1;
> > + if (++paths_checked % 128 == 0)
> > + check_for_waiters = true;
> > + /*
> > + * mpp has been removed or resynced.
> > Path may
> > + * have been removed.
> > + */
> > + if (VECTOR_SLOT(vecs->mpvec, i) !=
> > mpp ||
> > + synced_count != mpp-
> > >synced_count) {
> > + i--;
> > + goto next_mpp;
> > + }
>
> I'm aware that you have implemented the rather vague idea I had
> mentioned previously, thanks a lot for that.
>
> This is ok, but the fact that the arrays we are looping over can be
> modified deeply inside functions inside the loop feels fragile. I
> wonder if it would be possible to clean up our code flow such that
> neither mpp nor path objects can be removed at this point the code.
We could handle the paths WWID changing by just failng it immediately,
and then removing it after we check the other paths. But unless we
either wanted to resync even when it was unnecessary like I mentioned
above, or we decide that we don't need to resync before updating the
path state, then no. There will be cases when we need to start checking
the path to see if we need to resync and when we resync the device could
be gone or have changed pathgroups.
Now, there's no reason I know of why we must resync the multipath device
before checking a path. I want to do it mostly because I think there is
a good chance that we will run into some weird corner case behaviors if
we stop doing it. It's nice to grab a lock so no other thread can change
the state, and then make get it to match the kernel state, before you
start changing that state based on you path check results. But the
moment after you grab that state it could already be out of date, so
multipathd needs to handle that possibility regardless.
Also the current changes already open us up to new corner cases. For
instance, if you fail calling update_multipath_strings() in
do_check_mpp(), mpp->pg may well be cleared. This means that you won't
end up checking any of the other paths in that multipath device until
something else reloads the device, either an event or check_mpp(). We
could work around this by calling check_path() for paths with pp->mpp
set but that haven't been checked yet in the loop where we currently
just call handle_uninitialized_path(). I debated doing this, but almost
always when do_check_mpp() fails, it will be because the device was
removed.
Another solution could be to set sync_tick to something low if we fail
when resyncing the mpp. That would let us process the remove event that
is almost definitely waiting for us, but still retry syncing the mpp
state soonish if that's not the case. I should probably do that
regardless of whether we check the unchecked paths with pp->mpp set in
the uninitialized paths loop.
But at any rate, we could try simply not resyncing the map until after
checking the paths, and handling whatever bugs appear. Or even try
something that dovetails with what I believe is your bigger vision,
where we start all the path checkers for all the paths of a device, and
only look to see which ones have completed once we're all done. If some
of the path checkers returned, we could save the checker results in
pp->newstate, check if the wwids changed and handle that, and then
resync the mpp and update it's state based on the checker results for
it's current paths.
But yeah, that's work for a different day.
> The sync code could set flags on paths and / or multipath objects that
> tell the caller that these objects are no longer valid, and we could
> remove them when we're done with the loop. I'm aware that we'd still
> need to cover path group changes. Perhaps we should create a temporary
> array of all paths in the map (possibly sorted by major/minor number).
> We'd ignore newly added paths (they should have been checked recently
> anyway, shouldn't they?) and skip over paths that are flagged as being
> removed.
>
> I realize I'm making vague suggestions again. Your current work shows
> that this is more complicated than I first imagined. Thanks a lot for
> considering all the corner cases.
>
> This isn't something I'd expect to change in the current patch set,
> just something to think about for the future.
>
> > + }
> > + }
> > +next_mpp:
> > + if (check_for_waiters &&
> > + (lock_has_waiters(&vecs->lock) ||
> > waiting_clients())) {
> > + get_monotonic_time(&end_time);
> > + timespecsub(&end_time, &start_time,
> > &diff_time);
> > + if (diff_time.tv_sec > 0)
> > + return CHECKER_RUNNING;
> > + }
> > + }
> > vector_foreach_slot(vecs->pathvec, pp, i) {
> > - if (pp->is_checked)
> > + if (pp->mpp || pp->is_checked)
> > continue;
> > pp->is_checked = true;
> > - if (pp->mpp)
> > - rc = check_path(vecs, pp, ticks,
> > start_time.tv_sec);
> > - else
> > - rc = handle_uninitialized_path(vecs, pp,
> > ticks);
> > +
> > + rc = handle_uninitialized_path(vecs, pp, ticks);
> > if (rc < 0)
> > i--;
> > else
> > @@ -2871,7 +2909,7 @@ checkerloop (void *ap)
> > lock(&vecs->lock);
> > pthread_testcancel();
> > vector_foreach_slot(vecs->mpvec, mpp, i)
> > - mpp->is_checked = false;
> > + mpp->synced_count = 0;
>
> Why do we need to reset this? Can't we just increment the counter
> whenever we sync, and let it wrap at UINT_MAX?
We need to restart looping through the mpp paths whenever
mpp->synced_count has changed, but we only call do_check_mpp() in
do_check_path() if mpp->synced_count is 0, since we only want to do that
once per time we check the mpp. I should probably rename that
to do_sync_mpp().
> > if (checker_state == CHECKER_STARTING) {
> > vector_foreach_slot(vecs->mpvec,
> > mpp, i)
> > check_mpp(vecs, mpp, ticks);
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 22/22] multipathd: clean up the correct wwid in check_path_wwid_change
2024-07-13 6:04 [PATCH 00/22] path checker refactor and misc fixes Benjamin Marzinski
` (20 preceding siblings ...)
2024-07-13 6:05 ` [PATCH 21/22] multipathd: check paths in order by mpp Benjamin Marzinski
@ 2024-07-13 6:05 ` Benjamin Marzinski
2024-07-15 21:22 ` Martin Wilck
21 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-13 6:05 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
After check_path_wwid_change() grabbed a new copy of the wwid, it was
stripping trailing whitespace off of pp->wwid, instead of the copy it
just got.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index d51bc852..50b6f3eb 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1025,9 +1025,9 @@ check_path_wwid_change(struct path *pp)
}
/*Strip any trailing blanks */
- for (i = strlen(pp->wwid); i > 0 && pp->wwid[i-1] == ' '; i--);
+ for (i = strlen(wwid); i > 0 && wwid[i-1] == ' '; i--);
/* no-op */
- pp->wwid[i] = '\0';
+ wwid[i] = '\0';
condlog(4, "%s: Got wwid %s by sgio", pp->dev, wwid);
if (strncmp(wwid, pp->wwid, WWID_SIZE)) {
--
2.45.0
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 22/22] multipathd: clean up the correct wwid in check_path_wwid_change
2024-07-13 6:05 ` [PATCH 22/22] multipathd: clean up the correct wwid in check_path_wwid_change Benjamin Marzinski
@ 2024-07-15 21:22 ` Martin Wilck
2024-07-16 20:00 ` Benjamin Marzinski
0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2024-07-15 21:22 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> After check_path_wwid_change() grabbed a new copy of the wwid, it was
> stripping trailing whitespace off of pp->wwid, instead of the copy it
> just got.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
I'd suggest adding:
4660cff ("multipathd: add recheck_wwid option to verify the path wwid")
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 22/22] multipathd: clean up the correct wwid in check_path_wwid_change
2024-07-15 21:22 ` Martin Wilck
@ 2024-07-16 20:00 ` Benjamin Marzinski
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2024-07-16 20:00 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Mon, Jul 15, 2024 at 11:22:06PM +0200, Martin Wilck wrote:
> On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote:
> > After check_path_wwid_change() grabbed a new copy of the wwid, it was
> > stripping trailing whitespace off of pp->wwid, instead of the copy it
> > just got.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Reviewed-by: Martin Wilck <mwilck@suse.com>
>
> I'd suggest adding:
> 4660cff ("multipathd: add recheck_wwid option to verify the path wwid")
Sure.
>
^ permalink raw reply [flat|nested] 58+ messages in thread