* [PATCH 0/3] fix issue of multipathd not tracking device
@ 2025-03-24 20:55 Benjamin Marzinski
2025-03-24 20:55 ` [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it Benjamin Marzinski
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Benjamin Marzinski @ 2025-03-24 20:55 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
I ran into an issue where multipathd wasn't tracking a multipath
device that was created by the multipath command. It turns out that if
multipathd fully initialized a path device, and that device later
goes offline and then a multipath device that could use that path is
created by multipath, multipathd will attempt to reload the device
to use the offline path, which will fail. This will cause it to not
track the multipath device at all.
The first patch fixes this. The second patch allows mutipathd to track
these offline paths that should belong to a device, and add them to
device once they come back online. The third patch is just a cleanup.
Benjamin Marzinski (3):
multipathd: monitor new multipath dev even if we can't update it
multipathd: re-add paths skipped because they were offline
multipathd: don't update paths in INIT_MISSING_UDEV
libmultipath/print.c | 1 +
libmultipath/structs.h | 5 ++++
libmultipath/structs_vec.c | 5 ++++
multipathd/main.c | 59 ++++++++++++++++++++++++++++++++++++--
4 files changed, 67 insertions(+), 3 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it
2025-03-24 20:55 [PATCH 0/3] fix issue of multipathd not tracking device Benjamin Marzinski
@ 2025-03-24 20:55 ` Benjamin Marzinski
2025-03-25 22:33 ` Martin Wilck
2025-03-24 20:55 ` [PATCH 2/3] multipathd: re-add paths skipped because they were offline Benjamin Marzinski
2025-03-24 20:55 ` [PATCH 3/3] multipathd: don't update paths in INIT_MISSING_UDEV Benjamin Marzinski
2 siblings, 1 reply; 10+ messages in thread
From: Benjamin Marzinski @ 2025-03-24 20:55 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
If a multipath device was created by the multipath command, multipathd
might not agree with how the device was created. ev_add_map() can reload
the device with a different table by calling add_map_without_path() ->
update_map(). If this reloading of the map failed, multipathd was simply
ignoring the multipath device, even though it still existed.
One way that reloading can fail is if a path that multipathd already has
initialized goes offline. If a multipath device is created by the
multipath command while the path is offline, it will not use the offline
path, since multipath won't be able to get the necessary pathinfo.
However, multipathd will already have the pathinfo for the path, and may
not even know that it's offline, since the path is an orphan. When it
tries to reload the device, it will include the offline path, and the
reload will fail.
Instead of ignoring the device if it can't reload it, multipathd should
just montior it as it is. When the path device is no longer offline, it
can be added back to the multipath device by calling
"multipathd reconfigure" or "multipathd add path <path>".
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index e63b6aa7..7aaae773 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -679,7 +679,7 @@ retry:
}
fail:
- if (new_map && (retries < 0 || wait_for_events(mpp, vecs))) {
+ if (new_map && wait_for_events(mpp, vecs)) {
condlog(0, "%s: failed to create new map", mpp->alias);
remove_map(mpp, vecs->pathvec, vecs->mpvec);
return 1;
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] multipathd: re-add paths skipped because they were offline
2025-03-24 20:55 [PATCH 0/3] fix issue of multipathd not tracking device Benjamin Marzinski
2025-03-24 20:55 ` [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it Benjamin Marzinski
@ 2025-03-24 20:55 ` Benjamin Marzinski
2025-03-25 22:33 ` Martin Wilck
2025-03-24 20:55 ` [PATCH 3/3] multipathd: don't update paths in INIT_MISSING_UDEV Benjamin Marzinski
2 siblings, 1 reply; 10+ messages in thread
From: Benjamin Marzinski @ 2025-03-24 20:55 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
When a new device is added by the multipath command, multipathd may know
of other paths that cannot be added to the device because they are
currently offline. Instead of ignoring these paths, multipathd will now
re-add them when they come back online. To do this, it multipathd needs
a new device initialized state, INIT_OFFLINE, to track devices that were
in INIT_OK, but could not be added to an existing multipath device
because they were offline. These paths are handled along with the other
uninitialized paths.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/print.c | 1 +
libmultipath/structs.h | 5 ++++
libmultipath/structs_vec.c | 5 ++++
multipathd/main.c | 58 ++++++++++++++++++++++++++++++++++++--
4 files changed, 67 insertions(+), 2 deletions(-)
diff --git a/libmultipath/print.c b/libmultipath/print.c
index 00c03ace..ed8adebe 100644
--- a/libmultipath/print.c
+++ b/libmultipath/print.c
@@ -572,6 +572,7 @@ static int snprint_initialized(struct strbuf *buff, const struct path * pp)
[INIT_OK] = "ok",
[INIT_REMOVED] = "removed",
[INIT_PARTIAL] = "partial",
+ [INIT_OFFLINE] = "offline",
};
const char *str;
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 28de9a7f..8644407f 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -258,6 +258,11 @@ enum initialized_states {
* change uevent is received.
*/
INIT_PARTIAL,
+ /*
+ * INIT_OFFLINE: paths that should be part of an existing multipath
+ * device, but cannot be added because they are offline,
+ */
+ INIT_OFFLINE,
INIT_LAST__,
};
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index f6407e12..f122d056 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -389,6 +389,9 @@ static void orphan_paths(vector pathvec, struct multipath *mpp, const char *reas
free_path(pp);
} else
orphan_path(pp, reason);
+ } else if (pp->initialized == INIT_OFFLINE &&
+ strncmp(mpp->wwid, pp->wwid, WWID_SIZE) == 0) {
+ pp->initialized = INIT_OK;
}
}
}
@@ -595,6 +598,8 @@ void sync_paths(struct multipath *mpp, vector pathvec)
found = 0;
vector_foreach_slot(mpp->pg, pgp, j) {
if (find_slot(pgp->paths, (void *)pp) != -1) {
+ if (pp->initialized == INIT_OFFLINE)
+ pp->initialized = INIT_OK;
found = 1;
break;
}
diff --git a/multipathd/main.c b/multipathd/main.c
index 7aaae773..ecad5a4f 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -644,11 +644,44 @@ pr_register_active_paths(struct multipath *mpp)
}
}
+static void
+save_offline_paths(struct multipath *mpp, vector offline_paths)
+{
+ unsigned int i, j;
+ struct path *pp;
+ struct pathgroup *pgp;
+
+ vector_foreach_slot (mpp->pg, pgp, i)
+ vector_foreach_slot (pgp->paths, pp, j)
+ if (pp->initialized == INIT_OK &&
+ pp->sysfs_state == PATH_DOWN)
+ store_path(offline_paths, pp);
+}
+
+static void
+handle_orphaned_offline_paths(vector offline_paths)
+{
+ unsigned int i;
+ struct path *pp;
+
+ vector_foreach_slot (offline_paths, pp, i)
+ if (pp->mpp == NULL)
+ pp->initialized = INIT_OFFLINE;
+}
+
+static void
+cleanup_reset_vec(struct vector_s **v)
+{
+ vector_reset(*v);
+}
+
static int
update_map (struct multipath *mpp, struct vectors *vecs, int new_map)
{
int retries = 3;
char *params __attribute__((cleanup(cleanup_charp))) = NULL;
+ struct vector_s offline_paths_vec = { .allocated = 0 };
+ vector offline_paths __attribute__((cleanup(cleanup_reset_vec))) = &offline_paths_vec;
retry:
condlog(4, "%s: updating new map", mpp->alias);
@@ -685,6 +718,9 @@ fail:
return 1;
}
+ if (new_map && retries < 0)
+ save_offline_paths(mpp, offline_paths);
+
if (setup_multipath(vecs, mpp))
return 1;
@@ -695,6 +731,9 @@ fail:
if (mpp->prflag == PRFLAG_SET)
pr_register_active_paths(mpp);
+ if (VECTOR_SIZE(offline_paths) != 0)
+ handle_orphaned_offline_paths(offline_paths);
+
if (retries < 0)
condlog(0, "%s: failed reload in new map update", mpp->alias);
return 0;
@@ -2793,7 +2832,8 @@ check_uninitialized_path(struct path * pp, unsigned int ticks)
struct config *conf;
if (pp->initialized != INIT_NEW && pp->initialized != INIT_FAILED &&
- pp->initialized != INIT_MISSING_UDEV)
+ pp->initialized != INIT_MISSING_UDEV &&
+ pp->initialized != INIT_OFFLINE)
return CHECK_PATH_SKIPPED;
if (pp->tick)
@@ -2849,7 +2889,8 @@ update_uninitialized_path(struct vectors * vecs, struct path * pp)
struct config *conf;
if (pp->initialized != INIT_NEW && pp->initialized != INIT_FAILED &&
- pp->initialized != INIT_MISSING_UDEV)
+ pp->initialized != INIT_MISSING_UDEV &&
+ pp->initialized != INIT_OFFLINE)
return CHECK_PATH_SKIPPED;
newstate = get_new_state(pp);
@@ -2875,6 +2916,19 @@ update_uninitialized_path(struct vectors * vecs, struct path * pp)
free_path(pp);
return CHECK_PATH_REMOVED;
}
+ } else if (pp->initialized == INIT_OFFLINE &&
+ (newstate == PATH_UP || newstate == PATH_GHOST)) {
+ pp->initialized = INIT_OK;
+ if (pp->recheck_wwid == RECHECK_WWID_ON &&
+ check_path_wwid_change(pp)) {
+ condlog(0, "%s: path wwid change detected. Removing",
+ pp->dev);
+ return handle_path_wwid_change(pp, vecs)?
+ CHECK_PATH_REMOVED :
+ CHECK_PATH_SKIPPED;
+ }
+ ev_add_path(pp, vecs, 1);
+ pp->tick = 1;
}
return CHECK_PATH_CHECKED;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] multipathd: don't update paths in INIT_MISSING_UDEV
2025-03-24 20:55 [PATCH 0/3] fix issue of multipathd not tracking device Benjamin Marzinski
2025-03-24 20:55 ` [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it Benjamin Marzinski
2025-03-24 20:55 ` [PATCH 2/3] multipathd: re-add paths skipped because they were offline Benjamin Marzinski
@ 2025-03-24 20:55 ` Benjamin Marzinski
2025-03-25 22:40 ` Martin Wilck
2 siblings, 1 reply; 10+ messages in thread
From: Benjamin Marzinski @ 2025-03-24 20:55 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
There is nothing for update_uninitialized_path() to do for paths in the
INIT_MISSING_UDEV state. In fact, there shouldn't be any paths in this
state when update_uninitialized_path() is called, since they will have
switched to a different state in check_uninitialized_path().
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index ecad5a4f..9c44e6e6 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2889,7 +2889,6 @@ update_uninitialized_path(struct vectors * vecs, struct path * pp)
struct config *conf;
if (pp->initialized != INIT_NEW && pp->initialized != INIT_FAILED &&
- pp->initialized != INIT_MISSING_UDEV &&
pp->initialized != INIT_OFFLINE)
return CHECK_PATH_SKIPPED;
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it
2025-03-24 20:55 ` [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it Benjamin Marzinski
@ 2025-03-25 22:33 ` Martin Wilck
2025-03-27 23:51 ` Benjamin Marzinski
0 siblings, 1 reply; 10+ messages in thread
From: Martin Wilck @ 2025-03-25 22:33 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> If a multipath device was created by the multipath command,
> multipathd
> might not agree with how the device was created. ev_add_map() can
> reload
> the device with a different table by calling add_map_without_path() -
> >
> update_map(). If this reloading of the map failed, multipathd was
> simply
> ignoring the multipath device, even though it still existed.
>
> One way that reloading can fail is if a path that multipathd already
> has
> initialized goes offline. If a multipath device is created by the
> multipath command while the path is offline, it will not use the
> offline
> path, since multipath won't be able to get the necessary pathinfo.
> However, multipathd will already have the pathinfo for the path, and
> may
> not even know that it's offline, since the path is an orphan. When it
> tries to reload the device, it will include the offline path, and the
> reload will fail.
Am I understanding correctly that during the reload, bdev_open() failed
in the kernel because the path was offline?
I was thinking about my recent tests [1] where I'd come to the
conclusion that reload failure is hardly possible. While I'd realized
that "trying to add a path device that was not previously mapped" might
fail, I didn't think of the scenario you describe here.
[1] https://lore.kernel.org/dm-devel/ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@suse.com/
>
> Instead of ignoring the device if it can't reload it, multipathd
> should
> just montior it as it is. When the path device is no longer offline,
> it
> can be added back to the multipath device by calling
> "multipathd reconfigure" or "multipathd add path <path>".
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] multipathd: re-add paths skipped because they were offline
2025-03-24 20:55 ` [PATCH 2/3] multipathd: re-add paths skipped because they were offline Benjamin Marzinski
@ 2025-03-25 22:33 ` Martin Wilck
2025-03-28 16:36 ` Benjamin Marzinski
0 siblings, 1 reply; 10+ messages in thread
From: Martin Wilck @ 2025-03-25 22:33 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> When a new device is added by the multipath command, multipathd may
> know
> of other paths that cannot be added to the device because they are
> currently offline. Instead of ignoring these paths, multipathd will
> now
> re-add them when they come back online. To do this, it multipathd
> needs
> a new device initialized state, INIT_OFFLINE, to track devices that
> were
> in INIT_OK, but could not be added to an existing multipath device
> because they were offline. These paths are handled along with the
> other
> uninitialized paths.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
This looks good in general, but I'm not certain about INIT_OFFLINE. The
property "path was temporarily offline while we tried to add it to a
map" is not logically related to path initialization, IMO. Also, we
have plenty of (pp->initialized != INIT_xyz) conditionals in the code,
and your patch touches only 2 of them. I find it difficult to rule out
that some of them might be broken by the additional init state. Have
you double-checked them all?
Maybe it would be better to store this property in a separate flag in
struct path?
2 more remarks below.
Thanks
Martin
> ---
> libmultipath/print.c | 1 +
> libmultipath/structs.h | 5 ++++
> libmultipath/structs_vec.c | 5 ++++
> multipathd/main.c | 58
> ++++++++++++++++++++++++++++++++++++--
> 4 files changed, 67 insertions(+), 2 deletions(-)
>
> diff --git a/libmultipath/print.c b/libmultipath/print.c
> index 00c03ace..ed8adebe 100644
> --- a/libmultipath/print.c
> +++ b/libmultipath/print.c
> @@ -572,6 +572,7 @@ static int snprint_initialized(struct strbuf
> *buff, const struct path * pp)
> [INIT_OK] = "ok",
> [INIT_REMOVED] = "removed",
> [INIT_PARTIAL] = "partial",
> + [INIT_OFFLINE] = "offline",
> };
> const char *str;
>
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index 28de9a7f..8644407f 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -258,6 +258,11 @@ enum initialized_states {
> * change uevent is received.
> */
> INIT_PARTIAL,
> + /*
> + * INIT_OFFLINE: paths that should be part of an existing
> multipath
> + * device, but cannot be added because they are offline,
> + */
> + INIT_OFFLINE,
> INIT_LAST__,
> };
>
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index f6407e12..f122d056 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -389,6 +389,9 @@ static void orphan_paths(vector pathvec, struct
> multipath *mpp, const char *reas
> free_path(pp);
> } else
> orphan_path(pp, reason);
> + } else if (pp->initialized == INIT_OFFLINE &&
> + strncmp(mpp->wwid, pp->wwid, WWID_SIZE)
> == 0) {
> + pp->initialized = INIT_OK;
> }
> }
> }
> @@ -595,6 +598,8 @@ void sync_paths(struct multipath *mpp, vector
> pathvec)
> found = 0;
> vector_foreach_slot(mpp->pg, pgp, j) {
> if (find_slot(pgp->paths, (void *)pp) != -1)
> {
> + if (pp->initialized == INIT_OFFLINE)
> + pp->initialized = INIT_OK;
> found = 1;
> break;
> }
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 7aaae773..ecad5a4f 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -644,11 +644,44 @@ pr_register_active_paths(struct multipath *mpp)
> }
> }
>
> +static void
> +save_offline_paths(struct multipath *mpp, vector offline_paths)
const struct multipath *mpp ?
> +{
> + unsigned int i, j;
> + struct path *pp;
> + struct pathgroup *pgp;
> +
> + vector_foreach_slot (mpp->pg, pgp, i)
> + vector_foreach_slot (pgp->paths, pp, j)
> + if (pp->initialized == INIT_OK &&
> + pp->sysfs_state == PATH_DOWN)
> + store_path(offline_paths, pp);
You're not handling error from store_path here. I don't think you need
to, but perhaps indicate this by adding a comment or calling
(void)store_path(...).
> +}
> +
> +static void
> +handle_orphaned_offline_paths(vector offline_paths)
> +{
> + unsigned int i;
> + struct path *pp;
> +
> + vector_foreach_slot (offline_paths, pp, i)
> + if (pp->mpp == NULL)
> + pp->initialized = INIT_OFFLINE;
> +}
> +
> +static void
> +cleanup_reset_vec(struct vector_s **v)
> +{
> + vector_reset(*v);
> +}
> +
> static int
> update_map (struct multipath *mpp, struct vectors *vecs, int
> new_map)
> {
> int retries = 3;
> char *params __attribute__((cleanup(cleanup_charp))) = NULL;
> + struct vector_s offline_paths_vec = { .allocated = 0 };
> + vector offline_paths
> __attribute__((cleanup(cleanup_reset_vec))) = &offline_paths_vec;
>
> retry:
> condlog(4, "%s: updating new map", mpp->alias);
> @@ -685,6 +718,9 @@ fail:
> return 1;
> }
>
> + if (new_map && retries < 0)
> + save_offline_paths(mpp, offline_paths);
> +
> if (setup_multipath(vecs, mpp))
> return 1;
>
> @@ -695,6 +731,9 @@ fail:
> if (mpp->prflag == PRFLAG_SET)
> pr_register_active_paths(mpp);
>
> + if (VECTOR_SIZE(offline_paths) != 0)
> + handle_orphaned_offline_paths(offline_paths);
> +
> if (retries < 0)
> condlog(0, "%s: failed reload in new map update",
> mpp->alias);
> return 0;
> @@ -2793,7 +2832,8 @@ check_uninitialized_path(struct path * pp,
> unsigned int ticks)
> struct config *conf;
>
> if (pp->initialized != INIT_NEW && pp->initialized !=
> INIT_FAILED &&
> - pp->initialized != INIT_MISSING_UDEV)
> + pp->initialized != INIT_MISSING_UDEV &&
> + pp->initialized != INIT_OFFLINE)
> return CHECK_PATH_SKIPPED;
>
> if (pp->tick)
> @@ -2849,7 +2889,8 @@ update_uninitialized_path(struct vectors *
> vecs, struct path * pp)
> struct config *conf;
>
> if (pp->initialized != INIT_NEW && pp->initialized !=
> INIT_FAILED &&
> - pp->initialized != INIT_MISSING_UDEV)
> + pp->initialized != INIT_MISSING_UDEV &&
> + pp->initialized != INIT_OFFLINE)
> return CHECK_PATH_SKIPPED;
>
> newstate = get_new_state(pp);
> @@ -2875,6 +2916,19 @@ update_uninitialized_path(struct vectors *
> vecs, struct path * pp)
> free_path(pp);
> return CHECK_PATH_REMOVED;
> }
> + } else if (pp->initialized == INIT_OFFLINE &&
> + (newstate == PATH_UP || newstate == PATH_GHOST))
> {
> + pp->initialized = INIT_OK;
> + if (pp->recheck_wwid == RECHECK_WWID_ON &&
> + check_path_wwid_change(pp)) {
> + condlog(0, "%s: path wwid change detected.
> Removing",
> + pp->dev);
> + return handle_path_wwid_change(pp, vecs)?
> + CHECK_PATH_REMOVED :
> + CHECK_PATH_SKIPPED;
> + }
> + ev_add_path(pp, vecs, 1);
> + pp->tick = 1;
> }
> return CHECK_PATH_CHECKED;
> }
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] multipathd: don't update paths in INIT_MISSING_UDEV
2025-03-24 20:55 ` [PATCH 3/3] multipathd: don't update paths in INIT_MISSING_UDEV Benjamin Marzinski
@ 2025-03-25 22:40 ` Martin Wilck
0 siblings, 0 replies; 10+ messages in thread
From: Martin Wilck @ 2025-03-25 22:40 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> There is nothing for update_uninitialized_path() to do for paths in
> the
> INIT_MISSING_UDEV state. In fact, there shouldn't be any paths in
> this
> state when update_uninitialized_path() is called, since they will
> have
> switched to a different state in check_uninitialized_path().
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
> ---
> multipathd/main.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index ecad5a4f..9c44e6e6 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2889,7 +2889,6 @@ update_uninitialized_path(struct vectors *
> vecs, struct path * pp)
> struct config *conf;
>
> if (pp->initialized != INIT_NEW && pp->initialized !=
> INIT_FAILED &&
> - pp->initialized != INIT_MISSING_UDEV &&
> pp->initialized != INIT_OFFLINE)
> return CHECK_PATH_SKIPPED;
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it
2025-03-25 22:33 ` Martin Wilck
@ 2025-03-27 23:51 ` Benjamin Marzinski
0 siblings, 0 replies; 10+ messages in thread
From: Benjamin Marzinski @ 2025-03-27 23:51 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Tue, Mar 25, 2025 at 11:33:12PM +0100, Martin Wilck wrote:
> On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> > If a multipath device was created by the multipath command,
> > multipathd
> > might not agree with how the device was created. ev_add_map() can
> > reload
> > the device with a different table by calling add_map_without_path() -
> > >
> > update_map(). If this reloading of the map failed, multipathd was
> > simply
> > ignoring the multipath device, even though it still existed.
> >
> > One way that reloading can fail is if a path that multipathd already
> > has
> > initialized goes offline. If a multipath device is created by the
> > multipath command while the path is offline, it will not use the
> > offline
> > path, since multipath won't be able to get the necessary pathinfo.
> > However, multipathd will already have the pathinfo for the path, and
> > may
> > not even know that it's offline, since the path is an orphan. When it
> > tries to reload the device, it will include the offline path, and the
> > reload will fail.
>
> Am I understanding correctly that during the reload, bdev_open() failed
> in the kernel because the path was offline?
Yep. It's failing in sd_open() -> scsi_block_when_processing_errors()
see the comment here:
https://github.com/torvalds/linux/blob/5c2a430e85994f4873ea5ec42091baa1153bc731/drivers/scsi/sd.c#L1523
> I was thinking about my recent tests [1] where I'd come to the
> conclusion that reload failure is hardly possible. While I'd realized
> that "trying to add a path device that was not previously mapped" might
> fail, I didn't think of the scenario you describe here.
>
> [1] https://lore.kernel.org/dm-devel/ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@suse.com/
>
> >
> > Instead of ignoring the device if it can't reload it, multipathd
> > should
> > just montior it as it is. When the path device is no longer offline,
> > it
> > can be added back to the multipath device by calling
> > "multipathd reconfigure" or "multipathd add path <path>".
>
>
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] multipathd: re-add paths skipped because they were offline
2025-03-25 22:33 ` Martin Wilck
@ 2025-03-28 16:36 ` Benjamin Marzinski
2025-03-28 19:15 ` Martin Wilck
0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Marzinski @ 2025-03-28 16:36 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Tue, Mar 25, 2025 at 11:33:18PM +0100, Martin Wilck wrote:
> On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> > When a new device is added by the multipath command, multipathd may
> > know
> > of other paths that cannot be added to the device because they are
> > currently offline. Instead of ignoring these paths, multipathd will
> > now
> > re-add them when they come back online. To do this, it multipathd
> > needs
> > a new device initialized state, INIT_OFFLINE, to track devices that
> > were
> > in INIT_OK, but could not be added to an existing multipath device
> > because they were offline. These paths are handled along with the
> > other
> > uninitialized paths.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> This looks good in general, but I'm not certain about INIT_OFFLINE. The
> property "path was temporarily offline while we tried to add it to a
> map" is not logically related to path initialization, IMO. Also, we
> have plenty of (pp->initialized != INIT_xyz) conditionals in the code,
> and your patch touches only 2 of them. I find it difficult to rule out
> that some of them might be broken by the additional init state. Have
> you double-checked them all?
Yeah. I may have overlooked something, but I did look through them. I
just double-checked pathinfo(), since if that's called with DI_ALL and
succeeds, it will unconditionally set the path to INIT_OK. There isn't
a case where that could happen to a path in INIT_OFFLINE, but it would
make sense to add a check there, just to make it obvious that this
shouldn't move the path from INIT_OFFLINE to INIT_OK. Even if DI_CHECKER
is set and pathinfo() discovers that the path has come back up, It's
easier to just wait for checkerloop to check the path, and handle adding
it the the multipath device there.
I should point out what this patch doesn't do, since that might clear up
some cases where you would think I need to handle offline paths. This
patch is only taking care of the case where a multipath device exists,
but multipathd won't start monitoring it because it can't reload the
table with the offline paths. It does not handle the case where offline
paths exist, and that keeps multipathd from creating a device in the
first place.
The easiest way for that to happen is if the device WWID is not in the
wwids file and find_multipaths is set to "on" or "smart", where a device
will get multipathed when a second path appears. In this case, you could
add a single path and fully initialize it, then have that path go
offline, and later add a second path. Multipathd would see that there
are two paths, and try to create a multipath device. However the
offline path would keep the device from being created. Since we
call pathinfo(DI_CHECKER) in adopt_paths(), multipathd should know
that the path is offline before it tries to create the device, so
we could create the device without any offline paths, and then set those
paths to INIT_OFFLINE, for later addition.
This case seemed less important to me than the one where we end up with
an untracked mlutipath device. This problem has always existed and it's
immediately obvious when it occurs (there's no multipath device), and
yet I've never gotten a bug report for it. This makes sense because it's
pretty hard to hit. Usually, the path will either not get initialized in
the first place because it's offline (in which case we won't attempt to
use it as part of a multipath device) or multipathd will create a
multipath device using it as soon as the path gets added, leaving a very
small window between when multipathd initializes the path and when
it creates the multipath device. The larger window for the path to
go offline only happens in the case where multipathd doesn't know
to create a multipath device right away.
The case with the untracked device is pretty much as hard to hit, and
has also always existed, but it's not obvious to the user that their
multipath device isn't being tracked by multipathd. That seems much
worse to me. I'm considering fixing the case where the device doesn't
get created at all, but that can happen in multiple places, so I think
it'll be a little messier. Also, I can't decide if multipathd should try
to create the device with all the paths first, and only ignore the
offline paths on retries, if the first create attempt fails, or if it
should always skip the offline paths, at least if if just checked their
state in adopt_paths(). Thoughts?
>
> Maybe it would be better to store this property in a separate flag in
> struct path?
Sure. I can switch to that.
> 2 more remarks below.
>
> Thanks
> Martin
>
> > ---
> > libmultipath/print.c | 1 +
> > libmultipath/structs.h | 5 ++++
> > libmultipath/structs_vec.c | 5 ++++
> > multipathd/main.c | 58
> > ++++++++++++++++++++++++++++++++++++--
> > 4 files changed, 67 insertions(+), 2 deletions(-)
> >
> > diff --git a/libmultipath/print.c b/libmultipath/print.c
> > index 00c03ace..ed8adebe 100644
> > --- a/libmultipath/print.c
> > +++ b/libmultipath/print.c
> > @@ -572,6 +572,7 @@ static int snprint_initialized(struct strbuf
> > *buff, const struct path * pp)
> > [INIT_OK] = "ok",
> > [INIT_REMOVED] = "removed",
> > [INIT_PARTIAL] = "partial",
> > + [INIT_OFFLINE] = "offline",
> > };
> > const char *str;
> >
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index 28de9a7f..8644407f 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -258,6 +258,11 @@ enum initialized_states {
> > * change uevent is received.
> > */
> > INIT_PARTIAL,
> > + /*
> > + * INIT_OFFLINE: paths that should be part of an existing
> > multipath
> > + * device, but cannot be added because they are offline,
> > + */
> > + INIT_OFFLINE,
> > INIT_LAST__,
> > };
> >
> > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> > index f6407e12..f122d056 100644
> > --- a/libmultipath/structs_vec.c
> > +++ b/libmultipath/structs_vec.c
> > @@ -389,6 +389,9 @@ static void orphan_paths(vector pathvec, struct
> > multipath *mpp, const char *reas
> > free_path(pp);
> > } else
> > orphan_path(pp, reason);
> > + } else if (pp->initialized == INIT_OFFLINE &&
> > + strncmp(mpp->wwid, pp->wwid, WWID_SIZE)
> > == 0) {
> > + pp->initialized = INIT_OK;
> > }
> > }
> > }
> > @@ -595,6 +598,8 @@ void sync_paths(struct multipath *mpp, vector
> > pathvec)
> > found = 0;
> > vector_foreach_slot(mpp->pg, pgp, j) {
> > if (find_slot(pgp->paths, (void *)pp) != -1)
> > {
> > + if (pp->initialized == INIT_OFFLINE)
> > + pp->initialized = INIT_OK;
> > found = 1;
> > break;
> > }
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 7aaae773..ecad5a4f 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -644,11 +644,44 @@ pr_register_active_paths(struct multipath *mpp)
> > }
> > }
> >
> > +static void
> > +save_offline_paths(struct multipath *mpp, vector offline_paths)
>
> const struct multipath *mpp ?
Sure.
> > +{
> > + unsigned int i, j;
> > + struct path *pp;
> > + struct pathgroup *pgp;
> > +
> > + vector_foreach_slot (mpp->pg, pgp, i)
> > + vector_foreach_slot (pgp->paths, pp, j)
> > + if (pp->initialized == INIT_OK &&
> > + pp->sysfs_state == PATH_DOWN)
> > + store_path(offline_paths, pp);
>
> You're not handling error from store_path here. I don't think you need
> to, but perhaps indicate this by adding a comment or calling
> (void)store_path(...).
Yeah. There's no real point in failing here if storing the path fails.
I can comment this.
> > +}
> > +
> > +static void
> > +handle_orphaned_offline_paths(vector offline_paths)
> > +{
> > + unsigned int i;
> > + struct path *pp;
> > +
> > + vector_foreach_slot (offline_paths, pp, i)
> > + if (pp->mpp == NULL)
> > + pp->initialized = INIT_OFFLINE;
> > +}
> > +
> > +static void
> > +cleanup_reset_vec(struct vector_s **v)
> > +{
> > + vector_reset(*v);
> > +}
> > +
> > static int
> > update_map (struct multipath *mpp, struct vectors *vecs, int
> > new_map)
> > {
> > int retries = 3;
> > char *params __attribute__((cleanup(cleanup_charp))) = NULL;
> > + struct vector_s offline_paths_vec = { .allocated = 0 };
> > + vector offline_paths
> > __attribute__((cleanup(cleanup_reset_vec))) = &offline_paths_vec;
> >
> > retry:
> > condlog(4, "%s: updating new map", mpp->alias);
> > @@ -685,6 +718,9 @@ fail:
> > return 1;
> > }
> >
> > + if (new_map && retries < 0)
> > + save_offline_paths(mpp, offline_paths);
> > +
> > if (setup_multipath(vecs, mpp))
> > return 1;
> >
> > @@ -695,6 +731,9 @@ fail:
> > if (mpp->prflag == PRFLAG_SET)
> > pr_register_active_paths(mpp);
> >
> > + if (VECTOR_SIZE(offline_paths) != 0)
> > + handle_orphaned_offline_paths(offline_paths);
> > +
> > if (retries < 0)
> > condlog(0, "%s: failed reload in new map update",
> > mpp->alias);
> > return 0;
> > @@ -2793,7 +2832,8 @@ check_uninitialized_path(struct path * pp,
> > unsigned int ticks)
> > struct config *conf;
> >
> > if (pp->initialized != INIT_NEW && pp->initialized !=
> > INIT_FAILED &&
> > - pp->initialized != INIT_MISSING_UDEV)
> > + pp->initialized != INIT_MISSING_UDEV &&
> > + pp->initialized != INIT_OFFLINE)
> > return CHECK_PATH_SKIPPED;
> >
> > if (pp->tick)
> > @@ -2849,7 +2889,8 @@ update_uninitialized_path(struct vectors *
> > vecs, struct path * pp)
> > struct config *conf;
> >
> > if (pp->initialized != INIT_NEW && pp->initialized !=
> > INIT_FAILED &&
> > - pp->initialized != INIT_MISSING_UDEV)
> > + pp->initialized != INIT_MISSING_UDEV &&
> > + pp->initialized != INIT_OFFLINE)
> > return CHECK_PATH_SKIPPED;
> >
> > newstate = get_new_state(pp);
> > @@ -2875,6 +2916,19 @@ update_uninitialized_path(struct vectors *
> > vecs, struct path * pp)
> > free_path(pp);
> > return CHECK_PATH_REMOVED;
> > }
> > + } else if (pp->initialized == INIT_OFFLINE &&
> > + (newstate == PATH_UP || newstate == PATH_GHOST))
> > {
> > + pp->initialized = INIT_OK;
> > + if (pp->recheck_wwid == RECHECK_WWID_ON &&
> > + check_path_wwid_change(pp)) {
> > + condlog(0, "%s: path wwid change detected.
> > Removing",
> > + pp->dev);
> > + return handle_path_wwid_change(pp, vecs)?
> > + CHECK_PATH_REMOVED :
> > + CHECK_PATH_SKIPPED;
> > + }
> > + ev_add_path(pp, vecs, 1);
> > + pp->tick = 1;
> > }
> > return CHECK_PATH_CHECKED;
> > }
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] multipathd: re-add paths skipped because they were offline
2025-03-28 16:36 ` Benjamin Marzinski
@ 2025-03-28 19:15 ` Martin Wilck
0 siblings, 0 replies; 10+ messages in thread
From: Martin Wilck @ 2025-03-28 19:15 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Fri, 2025-03-28 at 12:36 -0400, Benjamin Marzinski wrote:
> On Tue, Mar 25, 2025 at 11:33:18PM +0100, Martin Wilck wrote:
> > On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> > > When a new device is added by the multipath command, multipathd
> > > may
> > > know
> > > of other paths that cannot be added to the device because they
> > > are
> > > currently offline. Instead of ignoring these paths, multipathd
> > > will
> > > now
> > > re-add them when they come back online. To do this, it multipathd
> > > needs
> > > a new device initialized state, INIT_OFFLINE, to track devices
> > > that
> > > were
> > > in INIT_OK, but could not be added to an existing multipath
> > > device
> > > because they were offline. These paths are handled along with the
> > > other
> > > uninitialized paths.
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> >
> > This looks good in general, but I'm not certain about INIT_OFFLINE.
> > The
> > property "path was temporarily offline while we tried to add it to
> > a
> > map" is not logically related to path initialization, IMO. Also, we
> > have plenty of (pp->initialized != INIT_xyz) conditionals in the
> > code,
> > and your patch touches only 2 of them. I find it difficult to rule
> > out
> > that some of them might be broken by the additional init state.
> > Have
> > you double-checked them all?
>
> Yeah. I may have overlooked something, but I did look through them. I
> just double-checked pathinfo(), since if that's called with DI_ALL
> and
> succeeds, it will unconditionally set the path to INIT_OK. There
> isn't
> a case where that could happen to a path in INIT_OFFLINE, but it
> would
> make sense to add a check there, just to make it obvious that this
> shouldn't move the path from INIT_OFFLINE to INIT_OK. Even if
> DI_CHECKER
> is set and pathinfo() discovers that the path has come back up, It's
> easier to just wait for checkerloop to check the path, and handle
> adding
> it the the multipath device there.
>
> I should point out what this patch doesn't do, since that might clear
> up
> some cases where you would think I need to handle offline paths. This
> patch is only taking care of the case where a multipath device
> exists,
> but multipathd won't start monitoring it because it can't reload the
> table with the offline paths. It does not handle the case where
> offline
> paths exist, and that keeps multipathd from creating a device in the
> first place.
>
> The easiest way for that to happen is if the device WWID is not in
> the
> wwids file and find_multipaths is set to "on" or "smart", where a
> device
> will get multipathed when a second path appears. In this case, you
> could
> add a single path and fully initialize it, then have that path go
> offline, and later add a second path. Multipathd would see that there
> are two paths, and try to create a multipath device. However the
> offline path would keep the device from being created. Since we
> call pathinfo(DI_CHECKER) in adopt_paths(), multipathd should know
> that the path is offline before it tries to create the device, so
> we could create the device without any offline paths, and then set
> those
> paths to INIT_OFFLINE, for later addition.
>
> This case seemed less important to me than the one where we end up
> with
> an untracked mlutipath device. This problem has always existed and
> it's
> immediately obvious when it occurs (there's no multipath device), and
> yet I've never gotten a bug report for it. This makes sense because
> it's
> pretty hard to hit. Usually, the path will either not get initialized
> in
> the first place because it's offline (in which case we won't attempt
> to
> use it as part of a multipath device) or multipathd will create a
> multipath device using it as soon as the path gets added, leaving a
> very
> small window between when multipathd initializes the path and when
> it creates the multipath device. The larger window for the path to
> go offline only happens in the case where multipathd doesn't know
> to create a multipath device right away.
>
> The case with the untracked device is pretty much as hard to hit, and
> has also always existed, but it's not obvious to the user that their
> multipath device isn't being tracked by multipathd. That seems much
> worse to me. I'm considering fixing the case where the device doesn't
> get created at all, but that can happen in multiple places, so I
> think
> it'll be a little messier. Also, I can't decide if multipathd should
> try
> to create the device with all the paths first, and only ignore the
> offline paths on retries, if the first create attempt fails, or if it
> should always skip the offline paths, at least if if just checked
> their
> state in adopt_paths(). Thoughts?
I agree that this is a corner case, and I have also never seen a bug
report about it. I wouldn't want to spend a lot of work on it.
Arguably the inconsistency is in the kernel, as the kernel doesn't mind
carrying around an offline device in a map after it has been added, but
refuses to add it in the first place while offline.
But I'm not keen on changing anything in that area, either.
Martin
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-03-28 19:15 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-24 20:55 [PATCH 0/3] fix issue of multipathd not tracking device Benjamin Marzinski
2025-03-24 20:55 ` [PATCH 1/3] multipathd: monitor new multipath dev even if we can't update it Benjamin Marzinski
2025-03-25 22:33 ` Martin Wilck
2025-03-27 23:51 ` Benjamin Marzinski
2025-03-24 20:55 ` [PATCH 2/3] multipathd: re-add paths skipped because they were offline Benjamin Marzinski
2025-03-25 22:33 ` Martin Wilck
2025-03-28 16:36 ` Benjamin Marzinski
2025-03-28 19:15 ` Martin Wilck
2025-03-24 20:55 ` [PATCH 3/3] multipathd: don't update paths in INIT_MISSING_UDEV Benjamin Marzinski
2025-03-25 22:40 ` Martin Wilck
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.