* [PATCH 0/8] multipath fixes to tableless device handling
@ 2024-10-31 18:32 Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 1/8] multipathd: print an error when failing to connect to multipathd Benjamin Marzinski
` (9 more replies)
0 siblings, 10 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:32 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
This patchset include a couple of miscellaneous cleanups, but is mostly
in response to issues brought up in
https://github.com/opensvc/multipath-tools/issues/100
It adds auto restarting to the multipathd.service unit file. I'm fairly
conflicted about the correct limits to be placed on these restarts, so
if anyone has strong opinions and a good argument, I'm more than willing
to change them.
The bulk of the changes deal with how multipath handles devices without
any table. Multipath was supposed to delete these if they were left
behind after a failed device creation, but that code was broken.
However devices aren't typically left behind after failed creates, so it
didn't matter.
A bigger issue is that multipathd and multipath could fail if tableless
devices were present. With this patchset, they will simply ignore
tableless devices that don't have a multipath prefixed DM UUID. The last
patch is a RFC patch that changes the behavior for tableless devices
that do have a multipath prefixed DM UUID. It makes multipath remove
these devices on the grounds that they are likely failed creates.
However, looking at the libdevmapper code, I'm not sure that we actually
want to do this. When a DM_DEVICE_CREATE task is run, it will first
create a tableless device, and then immediately load the table and
resume the device. So it's possible any that tableless devices which
multipath sees are actually in the process of getting created. I left
the patch as part of the set, but I'm not sure that removing the devices
is the right answer here. I haven't ever seen any tableless multipath
devices lying around (and I couldn't force any to get created when I
tried). Unless other people have seen these, then I don't think the
final patch of this set should go in.
Benjamin Marzinski (8):
multipathd: print an error when failing to connect to multipathd
libmultipath: check DM UUID earlier in libmp_mapinfo__
libmultipath: Don't fail dm_get_maps for DMP_NOT_FOUND
multipathd.service: restart multipathd on failure
libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath
libmultipath: signal multipath UUID device with no table
libmultipath: fix removing device after failed creation
libmultipath: remove devices with no table and multipath DM UUID
libmultipath/devmapper.c | 56 +++++++++++++++++++++++---------
libmultipath/devmapper.h | 6 ++--
multipathd/multipathd.service.in | 3 ++
multipathd/uxclnt.c | 4 ++-
4 files changed, 51 insertions(+), 18 deletions(-)
--
2.46.2
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/8] multipathd: print an error when failing to connect to multipathd
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
@ 2024-10-31 18:32 ` Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 2/8] libmultipath: check DM UUID earlier in libmp_mapinfo__ Benjamin Marzinski
` (8 subsequent siblings)
9 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:32 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Issuing multipathd commands when the daemon wasn't running didn't
print anything.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/uxclnt.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/multipathd/uxclnt.c b/multipathd/uxclnt.c
index 16133a88..55d253a7 100644
--- a/multipathd/uxclnt.c
+++ b/multipathd/uxclnt.c
@@ -52,8 +52,10 @@ int uxclnt(char * inbuf, unsigned int timeout)
if (!inbuf)
return 1;
fd = mpath_connect();
- if (fd == -1)
+ if (fd == -1) {
+ fprintf(stderr, "ERROR: failed to connect to multipathd\n");
return 1;
+ }
ret = process_req(fd, inbuf, timeout);
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 2/8] libmultipath: check DM UUID earlier in libmp_mapinfo__
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 1/8] multipathd: print an error when failing to connect to multipathd Benjamin Marzinski
@ 2024-10-31 18:32 ` Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 3/8] libmultipath: Don't fail dm_get_maps for DMP_NOT_FOUND Benjamin Marzinski
` (7 subsequent siblings)
9 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:32 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Before checking the target details, first check that the device has a
"mpath-" dm uuid prefix. If it doesn't then we can just ignore the
device. This keeps multipath from printing error messages for
non-multipath devices with multiple targets for instance.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/devmapper.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index c497c225..41c6ae4d 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -715,6 +715,16 @@ static int libmp_mapinfo__(int flags, mapid_t id, mapinfo_t info, const char *ma
return DMP_NOT_FOUND;
}
+ if ((info.name && !(name = dm_task_get_name(dmt)))
+ || ((info.uuid || flags & MAPINFO_CHECK_UUID)
+ && !(uuid = dm_task_get_uuid(dmt))))
+ return DMP_ERR;
+
+ if (flags & MAPINFO_CHECK_UUID && !is_mpath_uuid(uuid)) {
+ condlog(3, "%s: UUID mismatch: %s", fname__, uuid);
+ return DMP_NO_MATCH;
+ }
+
if (info.target || info.status || info.size || flags & MAPINFO_TGT_TYPE__) {
if (dm_get_next_target(dmt, NULL, &start, &length,
&target_type, ¶ms) != NULL) {
@@ -740,18 +750,10 @@ static int libmp_mapinfo__(int flags, mapid_t id, mapinfo_t info, const char *ma
* Check possible error conditions.
* If error is returned, don't touch any output parameters.
*/
- if ((info.name && !(name = dm_task_get_name(dmt)))
- || ((info.uuid || flags & MAPINFO_CHECK_UUID)
- && !(uuid = dm_task_get_uuid(dmt)))
- || (info.status && !(tmp_status = strdup(params)))
+ if ((info.status && !(tmp_status = strdup(params)))
|| (info.target && !tmp_target && !(tmp_target = strdup(params))))
return DMP_ERR;
- if (flags & MAPINFO_CHECK_UUID && !is_mpath_uuid(uuid)) {
- condlog(3, "%s: UUID mismatch: %s", fname__, uuid);
- return DMP_NO_MATCH;
- }
-
if (info.name) {
strlcpy(info.name, name, WWID_SIZE);
condlog(4, "%s: %s: name: \"%s\"", fname__, map_id, info.name);
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 3/8] libmultipath: Don't fail dm_get_maps for DMP_NOT_FOUND
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 1/8] multipathd: print an error when failing to connect to multipathd Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 2/8] libmultipath: check DM UUID earlier in libmp_mapinfo__ Benjamin Marzinski
@ 2024-10-31 18:32 ` Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 4/8] multipathd.service: restart multipathd on failure Benjamin Marzinski
` (6 subsequent siblings)
9 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:32 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
DMP_NOT_FOUND is not an error.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/devmapper.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 41c6ae4d..33eadb12 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -1265,6 +1265,7 @@ int dm_get_maps(vector mp)
vector_set_slot(mp, mpp);
break;
case DMP_NO_MATCH:
+ case DMP_NOT_FOUND:
break;
default:
return 1;
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 4/8] multipathd.service: restart multipathd on failure
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
` (2 preceding siblings ...)
2024-10-31 18:32 ` [PATCH 3/8] libmultipath: Don't fail dm_get_maps for DMP_NOT_FOUND Benjamin Marzinski
@ 2024-10-31 18:32 ` Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 5/8] libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath Benjamin Marzinski
` (5 subsequent siblings)
9 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:32 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
systemd will now restart multipathd on failure unless it has already been
started 3 times in the last 30 seconds.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/multipathd.service.in | 3 +++
1 file changed, 3 insertions(+)
diff --git a/multipathd/multipathd.service.in b/multipathd/multipathd.service.in
index 646001e6..b6a25b31 100644
--- a/multipathd/multipathd.service.in
+++ b/multipathd/multipathd.service.in
@@ -12,12 +12,15 @@ Conflicts=initrd-cleanup.service
ConditionKernelCommandLine=!nompath
ConditionKernelCommandLine=!multipath=off
ConditionVirtualization=!container
+StartLimitIntervalSec=30
+StartLimitBurst=3
[Service]
Type=notify
NotifyAccess=main
ExecStart=@BINDIR@/multipathd -d -s
ExecReload=@BINDIR@/multipathd reconfigure
+Restart=on-failure
TasksMax=infinity
LimitRTPRIO=10
CPUWeight=1000
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 5/8] libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
` (3 preceding siblings ...)
2024-10-31 18:32 ` [PATCH 4/8] multipathd.service: restart multipathd on failure Benjamin Marzinski
@ 2024-10-31 18:32 ` Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 6/8] libmultipath: signal multipath UUID device with no table Benjamin Marzinski
` (4 subsequent siblings)
9 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:32 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Instead of seperately calling is_mpath_uuid(), just use
MAPINFO_CHECK_UUID when calling libmp_mapinfo.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/devmapper.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 33eadb12..13d8c1e5 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -1211,7 +1211,8 @@ static int dm_get_multipath(const char *name, struct multipath **pmpp)
if (!mpp->alias)
return DMP_ERR;
- if ((rc = libmp_mapinfo(DM_MAP_BY_NAME | MAPINFO_MPATH_ONLY,
+ if ((rc = libmp_mapinfo(DM_MAP_BY_NAME | MAPINFO_CHECK_UUID |
+ MAPINFO_MPATH_ONLY,
(mapid_t) { .str = name },
(mapinfo_t) {
.size = &mpp->size,
@@ -1220,9 +1221,6 @@ static int dm_get_multipath(const char *name, struct multipath **pmpp)
})) != DMP_OK)
return rc;
- if (!is_mpath_uuid(uuid))
- return DMP_NO_MATCH;
-
strlcpy(mpp->wwid, uuid + UUID_PREFIX_LEN, sizeof(mpp->wwid));
*pmpp = steal_ptr(mpp);
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 6/8] libmultipath: signal multipath UUID device with no table
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
` (4 preceding siblings ...)
2024-10-31 18:32 ` [PATCH 5/8] libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath Benjamin Marzinski
@ 2024-10-31 18:32 ` Benjamin Marzinski
2024-11-06 16:17 ` Martin Wilck
2024-10-31 18:33 ` [PATCH 7/8] libmultipath: fix removing device after failed creation Benjamin Marzinski
` (3 subsequent siblings)
9 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:32 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
if libmp_mapinfo() is run on a device that has a multipath prefixed DM
UUID but no table (either active or inactive), it will now return with a
new exit code, DMP_BAD_DEV, to signal that this is an invalid multipath
device. Currently all code paths treat this identically to
DMP_NOT_FOUND.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/devmapper.c | 18 ++++++++++++++++++
libmultipath/devmapper.h | 5 ++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 13d8c1e5..6e11d5b5 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -726,6 +726,22 @@ static int libmp_mapinfo__(int flags, mapid_t id, mapinfo_t info, const char *ma
}
if (info.target || info.status || info.size || flags & MAPINFO_TGT_TYPE__) {
+ if (!dmi.live_table) {
+ /*
+ * If this is device has a multipath uuid but no
+ * table, flag it, so multipath can clean it up
+ */
+ if (flags & MAPINFO_CHECK_UUID &&
+ !dmi.inactive_table) {
+ condlog(2, "%s: multipath map %s has no table",
+ fname__, map_id);
+ return DMP_BAD_DEV;
+ } else {
+ condlog(2, "%s: map %s has no table", fname__,
+ map_id);
+ return DMP_NOT_FOUND;
+ }
+ }
if (dm_get_next_target(dmt, NULL, &start, &length,
&target_type, ¶ms) != NULL) {
condlog(2, "%s: map %s has multiple targets", fname__, map_id);
@@ -869,6 +885,7 @@ int dm_is_mpath(const char *name)
return DM_IS_MPATH_YES;
case DMP_NOT_FOUND:
case DMP_NO_MATCH:
+ case DMP_BAD_DEV:
return DM_IS_MPATH_NO;
case DMP_ERR:
default:
@@ -1262,6 +1279,7 @@ int dm_get_maps(vector mp)
}
vector_set_slot(mp, mpp);
break;
+ case DMP_BAD_DEV:
case DMP_NO_MATCH:
case DMP_NOT_FOUND:
break;
diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
index ba05e0a1..a2dcfb84 100644
--- a/libmultipath/devmapper.h
+++ b/libmultipath/devmapper.h
@@ -35,6 +35,7 @@ enum {
DMP_OK,
DMP_NOT_FOUND,
DMP_NO_MATCH,
+ DMP_BAD_DEV,
DMP_LAST__,
};
@@ -99,7 +100,9 @@ typedef struct libmp_map_info {
* @returns:
* DMP_OK if successful.
* DMP_NOT_FOUND if the map wasn't found, or has no or multiple targets.
- * DMP_NO_MATCH if the map didn't match @tgt_type (see above).
+ * DMP_NO_MATCH if the map didn't match @tgt_type (see above) or didn't
+ * have a multipath uuid prefix.
+ * DMP_BAD_DEV if the map has a multipath uuid prefix but no table.
* DMP_ERR if some other error occurred.
*
* This function obtains the requested information for the device-mapper map
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 7/8] libmultipath: fix removing device after failed creation
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
` (5 preceding siblings ...)
2024-10-31 18:32 ` [PATCH 6/8] libmultipath: signal multipath UUID device with no table Benjamin Marzinski
@ 2024-10-31 18:33 ` Benjamin Marzinski
2024-10-31 18:33 ` [RFC PATCH 8/8] libmultipath: remove devices with no table and multipath DM UUID Benjamin Marzinski
` (2 subsequent siblings)
9 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:33 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
dm_flush_nap_nosync() only removes a device if it has a multipath table.
On failed removes, there is no table, so this function does nothing.
Instead, if libmp_mapinfo() returns DMP_BAD_DEV, then there is an empty map,
and it is removed using dm_device_remove().
Also, since there are no longer any callers of dm_flush_map_nosync(),
remove it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/devmapper.c | 7 +++++--
libmultipath/devmapper.h | 1 -
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 6e11d5b5..0d1552f5 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -556,9 +556,12 @@ int dm_addmap_create (struct multipath *mpp, char * params)
* Failing the second part leaves an empty map. Clean it up.
*/
err = errno;
- if (dm_map_present(mpp->alias)) {
+ if (libmp_mapinfo(DM_MAP_BY_NAME | MAPINFO_MPATH_ONLY |
+ MAPINFO_CHECK_UUID,
+ (mapid_t) { .str = mpp->alias },
+ (mapinfo_t) { .uuid = NULL }) == DMP_BAD_DEV) {
condlog(3, "%s: failed to load map (a path might be in use)", mpp->alias);
- dm_flush_map_nosync(mpp->alias);
+ dm_device_remove(mpp->alias, 0);
}
if (err != EROFS) {
condlog(3, "%s: failed to load map, error %d",
diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
index a2dcfb84..16d6e04f 100644
--- a/libmultipath/devmapper.h
+++ b/libmultipath/devmapper.h
@@ -166,7 +166,6 @@ enum {
int dm_flush_map__ (const char *mapname, int flags, int retries);
#define dm_flush_map(mapname) dm_flush_map__(mapname, DMFL_NEED_SYNC, 0)
-#define dm_flush_map_nosync(mapname) dm_flush_map__(mapname, DMFL_NONE, 0)
#define dm_suspend_and_flush_map(mapname, retries) \
dm_flush_map__(mapname, DMFL_NEED_SYNC|DMFL_SUSPEND, retries)
int dm_flush_map_nopaths(const char * mapname, int deferred_remove);
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [RFC PATCH 8/8] libmultipath: remove devices with no table and multipath DM UUID
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
` (6 preceding siblings ...)
2024-10-31 18:33 ` [PATCH 7/8] libmultipath: fix removing device after failed creation Benjamin Marzinski
@ 2024-10-31 18:33 ` Benjamin Marzinski
2024-11-06 16:52 ` [PATCH 0/8] multipath fixes to tableless device handling Martin Wilck
2024-11-06 16:52 ` Martin Wilck
9 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-10-31 18:33 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
if dm_get_multipath() returns DMP_BAD_DEV in dm_get_maps(), it means
that it found an device with a DM UUID prefix of "mpath-" but no table.
Treat this as the remains of a failed device creation, and remove it
like dm_addmap_create() does when it fails to fully create a device and
it's left in this state.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/devmapper.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 0d1552f5..908d759c 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -1283,6 +1283,10 @@ int dm_get_maps(vector mp)
vector_set_slot(mp, mpp);
break;
case DMP_BAD_DEV:
+ condlog(2, "%s: removing empty multipath device",
+ names->name);
+ dm_device_remove(names->name, 0);
+ break;
case DMP_NO_MATCH:
case DMP_NOT_FOUND:
break;
--
2.46.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 6/8] libmultipath: signal multipath UUID device with no table
2024-10-31 18:32 ` [PATCH 6/8] libmultipath: signal multipath UUID device with no table Benjamin Marzinski
@ 2024-11-06 16:17 ` Martin Wilck
2024-11-07 17:42 ` Benjamin Marzinski
0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-06 16:17 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Thu, 2024-10-31 at 14:32 -0400, Benjamin Marzinski wrote:
> if libmp_mapinfo() is run on a device that has a multipath prefixed
> DM
> UUID but no table (either active or inactive), it will now return
> with a
> new exit code, DMP_BAD_DEV, to signal that this is an invalid
> multipath
> device. Currently all code paths treat this identically to
> DMP_NOT_FOUND.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/devmapper.c | 18 ++++++++++++++++++
> libmultipath/devmapper.h | 5 ++++-
> 2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> index 13d8c1e5..6e11d5b5 100644
> --- a/libmultipath/devmapper.c
> +++ b/libmultipath/devmapper.c
> @@ -726,6 +726,22 @@ static int libmp_mapinfo__(int flags, mapid_t
> id, mapinfo_t info, const char *ma
> }
>
> if (info.target || info.status || info.size || flags &
> MAPINFO_TGT_TYPE__) {
> + if (!dmi.live_table) {
> + /*
> + * If this is device has a multipath uuid
> but no
> + * table, flag it, so multipath can clean it
> up
> + */
> + if (flags & MAPINFO_CHECK_UUID &&
> + !dmi.inactive_table) {
> + condlog(2, "%s: multipath map %s has
> no table",
> + fname__, map_id);
> + return DMP_BAD_DEV;
What about the case when there's an inactive_table? AFAIU that means
that a resume operation after a table (re)load failed or was
interrupted, or that the program that (re)loaded the table (multipathd,
probably) died before it could resume the table.
In this case we'd fall through the the "map has no targets" case, but
shouldn't we also treat it as DMP_BAD_DEV?
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
` (7 preceding siblings ...)
2024-10-31 18:33 ` [RFC PATCH 8/8] libmultipath: remove devices with no table and multipath DM UUID Benjamin Marzinski
@ 2024-11-06 16:52 ` Martin Wilck
2024-11-07 12:20 ` Martin Wilck
2024-11-06 16:52 ` Martin Wilck
9 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-06 16:52 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Thu, 2024-10-31 at 14:32 -0400, Benjamin Marzinski wrote:
> This patchset include a couple of miscellaneous cleanups, but is
> mostly
> in response to issues brought up in
> https://github.com/opensvc/multipath-tools/issues/100
> It adds auto restarting to the multipathd.service unit file. I'm
> fairly
> conflicted about the correct limits to be placed on these restarts,
> so
> if anyone has strong opinions and a good argument, I'm more than
> willing
> to change them.
>
> The bulk of the changes deal with how multipath handles devices
> without
> any table. Multipath was supposed to delete these if they were left
> behind after a failed device creation, but that code was broken.
> However devices aren't typically left behind after failed creates, so
> it
> didn't matter.
>
> A bigger issue is that multipathd and multipath could fail if
> tableless
> devices were present. With this patchset, they will simply ignore
> tableless devices that don't have a multipath prefixed DM UUID. The
> last
> patch is a RFC patch that changes the behavior for tableless devices
> that do have a multipath prefixed DM UUID. It makes multipath remove
> these devices on the grounds that they are likely failed creates.
> However, looking at the libdevmapper code, I'm not sure that we
> actually
> want to do this. When a DM_DEVICE_CREATE task is run, it will first
> create a tableless device, and then immediately load the table and
> resume the device. So it's possible any that tableless devices which
> multipath sees are actually in the process of getting created. I left
> the patch as part of the set, but I'm not sure that removing the
> devices
> is the right answer here. I haven't ever seen any tableless multipath
> devices lying around (and I couldn't force any to get created when I
> tried). Unless other people have seen these, then I don't think the
> final patch of this set should go in.
>
Does it do any harm if we just keep these devices around? If there are
path devices around with matching UUIDs, the right thing to do for
multipathd would be to reload the map's table with an appropriate
multipath target. Otherwise, the map will just float around empty and
do no harm. This is how multipathd 0.9.9 and earlier behave, and
I see no issue with that.
It'd be much worse if someone created a table with an mpath-style UUID
and name but an incompatible immutable target type. But in that case,
again, we should probably just keep our hands off that map.
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
` (8 preceding siblings ...)
2024-11-06 16:52 ` [PATCH 0/8] multipath fixes to tableless device handling Martin Wilck
@ 2024-11-06 16:52 ` Martin Wilck
2024-11-06 22:39 ` Martin Wilck
9 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-06 16:52 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Thu, 2024-10-31 at 14:32 -0400, Benjamin Marzinski wrote:
> This patchset include a couple of miscellaneous cleanups, but is
> mostly
> in response to issues brought up in
> https://github.com/opensvc/multipath-tools/issues/100
> It adds auto restarting to the multipathd.service unit file. I'm
> fairly
> conflicted about the correct limits to be placed on these restarts,
> so
> if anyone has strong opinions and a good argument, I'm more than
> willing
> to change them.
>
> The bulk of the changes deal with how multipath handles devices
> without
> any table. Multipath was supposed to delete these if they were left
> behind after a failed device creation, but that code was broken.
> However devices aren't typically left behind after failed creates, so
> it
> didn't matter.
>
> A bigger issue is that multipathd and multipath could fail if
> tableless
> devices were present. With this patchset, they will simply ignore
> tableless devices that don't have a multipath prefixed DM UUID. The
> last
> patch is a RFC patch that changes the behavior for tableless devices
> that do have a multipath prefixed DM UUID. It makes multipath remove
> these devices on the grounds that they are likely failed creates.
> However, looking at the libdevmapper code, I'm not sure that we
> actually
> want to do this. When a DM_DEVICE_CREATE task is run, it will first
> create a tableless device, and then immediately load the table and
> resume the device. So it's possible any that tableless devices which
> multipath sees are actually in the process of getting created. I left
> the patch as part of the set, but I'm not sure that removing the
> devices
> is the right answer here. I haven't ever seen any tableless multipath
> devices lying around (and I couldn't force any to get created when I
> tried). Unless other people have seen these, then I don't think the
> final patch of this set should go in.
>
> Benjamin Marzinski (8):
> multipathd: print an error when failing to connect to multipathd
> libmultipath: check DM UUID earlier in libmp_mapinfo__
> libmultipath: Don't fail dm_get_maps for DMP_NOT_FOUND
> multipathd.service: restart multipathd on failure
> libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath
> libmultipath: signal multipath UUID device with no table
> libmultipath: fix removing device after failed creation
> libmultipath: remove devices with no table and multipath DM UUID
>
> libmultipath/devmapper.c | 56 +++++++++++++++++++++++-------
> --
> libmultipath/devmapper.h | 6 ++--
> multipathd/multipathd.service.in | 3 ++
> multipathd/uxclnt.c | 4 ++-
> 4 files changed, 51 insertions(+), 18 deletions(-)
For patch 1-6 of this series:
Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-06 16:52 ` Martin Wilck
@ 2024-11-06 22:39 ` Martin Wilck
2024-11-07 17:48 ` Benjamin Marzinski
0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-06 22:39 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-11-06 at 17:52 +0100, Martin Wilck wrote:
>
> For patch 1-6 of this series:
> Reviewed-by: Martin Wilck <mwilck@suse.com>
The series breaks some unit tests. I can fix that.
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-06 16:52 ` [PATCH 0/8] multipath fixes to tableless device handling Martin Wilck
@ 2024-11-07 12:20 ` Martin Wilck
2024-11-07 17:43 ` Benjamin Marzinski
0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-07 12:20 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-11-06 at 17:52 +0100, Martin Wilck wrote:
> On Thu, 2024-10-31 at 14:32 -0400, Benjamin Marzinski wrote:
> > This patchset include a couple of miscellaneous cleanups, but is
> > mostly
> > in response to issues brought up in
> > https://github.com/opensvc/multipath-tools/issues/100
> > It adds auto restarting to the multipathd.service unit file. I'm
> > fairly
> > conflicted about the correct limits to be placed on these restarts,
> > so
> > if anyone has strong opinions and a good argument, I'm more than
> > willing
> > to change them.
> >
> > The bulk of the changes deal with how multipath handles devices
> > without
> > any table. Multipath was supposed to delete these if they were left
> > behind after a failed device creation, but that code was broken.
> > However devices aren't typically left behind after failed creates,
> > so
> > it
> > didn't matter.
> >
> > A bigger issue is that multipathd and multipath could fail if
> > tableless
> > devices were present. With this patchset, they will simply ignore
> > tableless devices that don't have a multipath prefixed DM UUID. The
> > last
> > patch is a RFC patch that changes the behavior for tableless
> > devices
> > that do have a multipath prefixed DM UUID. It makes multipath
> > remove
> > these devices on the grounds that they are likely failed creates.
> > However, looking at the libdevmapper code, I'm not sure that we
> > actually
> > want to do this. When a DM_DEVICE_CREATE task is run, it will
> > first
> > create a tableless device, and then immediately load the table and
> > resume the device. So it's possible any that tableless devices
> > which
> > multipath sees are actually in the process of getting created. I
> > left
> > the patch as part of the set, but I'm not sure that removing the
> > devices
> > is the right answer here. I haven't ever seen any tableless
> > multipath
> > devices lying around (and I couldn't force any to get created when
> > I
> > tried). Unless other people have seen these, then I don't think the
> > final patch of this set should go in.
> >
>
> Does it do any harm if we just keep these devices around? If there
> are
> path devices around with matching UUIDs, the right thing to do for
> multipathd would be to reload the map's table with an appropriate
> multipath target. Otherwise, the map will just float around empty and
> do no harm. This is how multipathd 0.9.9 and earlier behave, and
> I see no issue with that.
Thinking about it, I believe we should actually accept devices that
have an mpath UUID and an empty table, and add them to our mpvec. We
should treat them like devices that have a multipath target but no path
devices. If any matching paths become available, the map will be
reloaded and the issue will be fixed. IMO, this is less prone to race
conditions than trying to delete and re-add the devices.
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 6/8] libmultipath: signal multipath UUID device with no table
2024-11-06 16:17 ` Martin Wilck
@ 2024-11-07 17:42 ` Benjamin Marzinski
0 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-11-07 17:42 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Nov 06, 2024 at 05:17:30PM +0100, Martin Wilck wrote:
> On Thu, 2024-10-31 at 14:32 -0400, Benjamin Marzinski wrote:
> > if libmp_mapinfo() is run on a device that has a multipath prefixed
> > DM
> > UUID but no table (either active or inactive), it will now return
> > with a
> > new exit code, DMP_BAD_DEV, to signal that this is an invalid
> > multipath
> > device. Currently all code paths treat this identically to
> > DMP_NOT_FOUND.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > libmultipath/devmapper.c | 18 ++++++++++++++++++
> > libmultipath/devmapper.h | 5 ++++-
> > 2 files changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> > index 13d8c1e5..6e11d5b5 100644
> > --- a/libmultipath/devmapper.c
> > +++ b/libmultipath/devmapper.c
> > @@ -726,6 +726,22 @@ static int libmp_mapinfo__(int flags, mapid_t
> > id, mapinfo_t info, const char *ma
> > }
> >
> > if (info.target || info.status || info.size || flags &
> > MAPINFO_TGT_TYPE__) {
> > + if (!dmi.live_table) {
> > + /*
> > + * If this is device has a multipath uuid
> > but no
> > + * table, flag it, so multipath can clean it
> > up
> > + */
> > + if (flags & MAPINFO_CHECK_UUID &&
> > + !dmi.inactive_table) {
> > + condlog(2, "%s: multipath map %s has
> > no table",
> > + fname__, map_id);
> > + return DMP_BAD_DEV;
>
>
> What about the case when there's an inactive_table? AFAIU that means
> that a resume operation after a table (re)load failed or was
> interrupted, or that the program that (re)loaded the table (multipathd,
> probably) died before it could resume the table.
>
> In this case we'd fall through the the "map has no targets" case, but
> shouldn't we also treat it as DMP_BAD_DEV?
This is because I was deleting these devices in patch 8, and I didn't
want to delete devices that where in the process of getting modified by
multipath. The code could still delete devices that were getting created
if it ran during the window between when the dm device gets created, and
a table gets loaded. In that case there would be no table. But there
is another window that happens after this during creation and during
a reload failure, where the device doesn't have an active table but
still has an inactive one. I didn't want to delete devices like this,
since it could effect existing devices. We could, for instance, end up
deleting a device that multipath was currently working on and was about
to clean up itself.
But if we aren't deleting these devices, then yes, it would make more
sense to characterize these as DMP_BAD_DEV.
-Ben
>
> Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-07 12:20 ` Martin Wilck
@ 2024-11-07 17:43 ` Benjamin Marzinski
2024-11-07 18:02 ` Martin Wilck
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-11-07 17:43 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Thu, Nov 07, 2024 at 01:20:09PM +0100, Martin Wilck wrote:
> On Wed, 2024-11-06 at 17:52 +0100, Martin Wilck wrote:
> > On Thu, 2024-10-31 at 14:32 -0400, Benjamin Marzinski wrote:
> > > This patchset include a couple of miscellaneous cleanups, but is
> > > mostly
> > > in response to issues brought up in
> > > https://github.com/opensvc/multipath-tools/issues/100
> > > It adds auto restarting to the multipathd.service unit file. I'm
> > > fairly
> > > conflicted about the correct limits to be placed on these restarts,
> > > so
> > > if anyone has strong opinions and a good argument, I'm more than
> > > willing
> > > to change them.
> > >
> > > The bulk of the changes deal with how multipath handles devices
> > > without
> > > any table. Multipath was supposed to delete these if they were left
> > > behind after a failed device creation, but that code was broken.
> > > However devices aren't typically left behind after failed creates,
> > > so
> > > it
> > > didn't matter.
> > >
> > > A bigger issue is that multipathd and multipath could fail if
> > > tableless
> > > devices were present. With this patchset, they will simply ignore
> > > tableless devices that don't have a multipath prefixed DM UUID. The
> > > last
> > > patch is a RFC patch that changes the behavior for tableless
> > > devices
> > > that do have a multipath prefixed DM UUID. It makes multipath
> > > remove
> > > these devices on the grounds that they are likely failed creates.
> > > However, looking at the libdevmapper code, I'm not sure that we
> > > actually
> > > want to do this. When a DM_DEVICE_CREATE task is run, it will
> > > first
> > > create a tableless device, and then immediately load the table and
> > > resume the device. So it's possible any that tableless devices
> > > which
> > > multipath sees are actually in the process of getting created. I
> > > left
> > > the patch as part of the set, but I'm not sure that removing the
> > > devices
> > > is the right answer here. I haven't ever seen any tableless
> > > multipath
> > > devices lying around (and I couldn't force any to get created when
> > > I
> > > tried). Unless other people have seen these, then I don't think the
> > > final patch of this set should go in.
> > >
> >
> > Does it do any harm if we just keep these devices around? If there
> > are
> > path devices around with matching UUIDs, the right thing to do for
> > multipathd would be to reload the map's table with an appropriate
> > multipath target. Otherwise, the map will just float around empty and
> > do no harm. This is how multipathd 0.9.9 and earlier behave, and
> > I see no issue with that.
>
> Thinking about it, I believe we should actually accept devices that
> have an mpath UUID and an empty table, and add them to our mpvec. We
> should treat them like devices that have a multipath target but no path
> devices. If any matching paths become available, the map will be
> reloaded and the issue will be fixed. IMO, this is less prone to race
> conditions than trying to delete and re-add the devices.
I'm not sure off the top of my head how easy it will be to handle
devices with no dm table at all, but I'll take a look into this.
-Ben
> Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-06 22:39 ` Martin Wilck
@ 2024-11-07 17:48 ` Benjamin Marzinski
2024-11-07 18:03 ` Martin Wilck
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-11-07 17:48 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Nov 06, 2024 at 11:39:38PM +0100, Martin Wilck wrote:
> On Wed, 2024-11-06 at 17:52 +0100, Martin Wilck wrote:
> >
> > For patch 1-6 of this series:
> > Reviewed-by: Martin Wilck <mwilck@suse.com>
>
> The series breaks some unit tests. I can fix that.
Oops. I'm going to do some more work on this patchset, so I can fix it
myself.
-Ben
>
> Martin
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-07 17:43 ` Benjamin Marzinski
@ 2024-11-07 18:02 ` Martin Wilck
2024-11-08 22:08 ` Benjamin Marzinski
0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-07 18:02 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Thu, 2024-11-07 at 12:43 -0500, Benjamin Marzinski wrote:
> On Thu, Nov 07, 2024 at 01:20:09PM +0100, Martin Wilck wrote:
> >
> > Thinking about it, I believe we should actually accept devices that
> > have an mpath UUID and an empty table, and add them to our mpvec.
> > We
> > should treat them like devices that have a multipath target but no
> > path
> > devices. If any matching paths become available, the map will be
> > reloaded and the issue will be fixed. IMO, this is less prone to
> > race
> > conditions than trying to delete and re-add the devices.
>
> I'm not sure off the top of my head how easy it will be to handle
> devices with no dm table at all, but I'll take a look into this.
I tried this on a system with 0.9.9 yesterday. 0.9.9 will not add the
map into the mpvec. But in coalesce_paths(), it will just reload the
map:
multipathd[3198]: 360014053cdf4a9b80de42ab96c74f4ef: set ACT_CREATE (map does not exist)
multipathd[3198]: 360014053cdf4a9b80de42ab96c74f4ef: map already present
multipathd[3198]: 360014053cdf4a9b80de42ab96c74f4ef: reload [0 41943040 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:16 1 serv>
Note that it first says "map does not exist", and then sees it present (with no table).
The reload works without issues though, if the map is empty.
If, on the contrary, the map is not empty but contains another incompatible target
(I tried this with a trivial linear mapping), the reload operation will fail:
multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: set ACT_CREATE (map does not exist)
multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: map already present
multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: reload [0 41943040 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:16 1 serv>
kernel: device-mapper: ioctl: can't change device type (old=1 vs new=2) after initial table load.
multipathd[1347]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: reload ioctl on 360014053cdf4a9b80de42ab96c74f4ef failed: Invalid argument
multipathd[1347]: dm_addmap: libdm task=1 error: Invalid argument
multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: domap (0) failure for create/reload map
multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: removing map
Regards
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-07 17:48 ` Benjamin Marzinski
@ 2024-11-07 18:03 ` Martin Wilck
0 siblings, 0 replies; 24+ messages in thread
From: Martin Wilck @ 2024-11-07 18:03 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Thu, 2024-11-07 at 12:48 -0500, Benjamin Marzinski wrote:
> On Wed, Nov 06, 2024 at 11:39:38PM +0100, Martin Wilck wrote:
> > On Wed, 2024-11-06 at 17:52 +0100, Martin Wilck wrote:
> > >
> > > For patch 1-6 of this series:
> > > Reviewed-by: Martin Wilck <mwilck@suse.com>
> >
> > The series breaks some unit tests. I can fix that.
>
> Oops. I'm going to do some more work on this patchset, so I can fix
> it
> myself.
Have a look at my "tip" branch, fixes are already applied there.
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-07 18:02 ` Martin Wilck
@ 2024-11-08 22:08 ` Benjamin Marzinski
2024-11-08 23:03 ` Martin Wilck
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-11-08 22:08 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Thu, Nov 07, 2024 at 07:02:11PM +0100, Martin Wilck wrote:
> On Thu, 2024-11-07 at 12:43 -0500, Benjamin Marzinski wrote:
> > On Thu, Nov 07, 2024 at 01:20:09PM +0100, Martin Wilck wrote:
> > >
> > > Thinking about it, I believe we should actually accept devices that
> > > have an mpath UUID and an empty table, and add them to our mpvec.
> > > We
> > > should treat them like devices that have a multipath target but no
> > > path
> > > devices. If any matching paths become available, the map will be
> > > reloaded and the issue will be fixed. IMO, this is less prone to
> > > race
> > > conditions than trying to delete and re-add the devices.
> >
> > I'm not sure off the top of my head how easy it will be to handle
> > devices with no dm table at all, but I'll take a look into this.
>
> I tried this on a system with 0.9.9 yesterday. 0.9.9 will not add the
> map into the mpvec. But in coalesce_paths(), it will just reload the
> map:
Given that the code already does the right thing without needing any
changes to handle tableless maps makes me feel like just removing the
final patch is the best idea.
The only issue with the current code is that if you have a device with
no table with the UUID of a multipath device but a different name than
that multipath device should have, multipath will try to create a new
device instead of reload it, and this fails, since the UUID is already
in use.
That should probably be handled by making the multipath code pay more
attention to the device UUID and less to the device name, but that's
work for a different patchset.
-Ben
>
> multipathd[3198]: 360014053cdf4a9b80de42ab96c74f4ef: set ACT_CREATE (map does not exist)
> multipathd[3198]: 360014053cdf4a9b80de42ab96c74f4ef: map already present
> multipathd[3198]: 360014053cdf4a9b80de42ab96c74f4ef: reload [0 41943040 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:16 1 serv>
>
> Note that it first says "map does not exist", and then sees it present (with no table).
> The reload works without issues though, if the map is empty.
>
> If, on the contrary, the map is not empty but contains another incompatible target
> (I tried this with a trivial linear mapping), the reload operation will fail:
>
> multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: set ACT_CREATE (map does not exist)
> multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: map already present
> multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: reload [0 41943040 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:16 1 serv>
> kernel: device-mapper: ioctl: can't change device type (old=1 vs new=2) after initial table load.
> multipathd[1347]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: reload ioctl on 360014053cdf4a9b80de42ab96c74f4ef failed: Invalid argument
> multipathd[1347]: dm_addmap: libdm task=1 error: Invalid argument
> multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: domap (0) failure for create/reload map
> multipathd[1347]: 360014053cdf4a9b80de42ab96c74f4ef: removing map
>
> Regards
> Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-08 22:08 ` Benjamin Marzinski
@ 2024-11-08 23:03 ` Martin Wilck
2024-11-11 17:17 ` Benjamin Marzinski
0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-08 23:03 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Fri, 2024-11-08 at 17:08 -0500, Benjamin Marzinski wrote:
> On Thu, Nov 07, 2024 at 07:02:11PM +0100, Martin Wilck wrote:
> > On Thu, 2024-11-07 at 12:43 -0500, Benjamin Marzinski wrote:
> > > On Thu, Nov 07, 2024 at 01:20:09PM +0100, Martin Wilck wrote:
> > > >
> > > > Thinking about it, I believe we should actually accept devices
> > > > that
> > > > have an mpath UUID and an empty table, and add them to our
> > > > mpvec.
> > > > We
> > > > should treat them like devices that have a multipath target but
> > > > no
> > > > path
> > > > devices. If any matching paths become available, the map will
> > > > be
> > > > reloaded and the issue will be fixed. IMO, this is less prone
> > > > to
> > > > race
> > > > conditions than trying to delete and re-add the devices.
> > >
> > > I'm not sure off the top of my head how easy it will be to handle
> > > devices with no dm table at all, but I'll take a look into this.
> >
> > I tried this on a system with 0.9.9 yesterday. 0.9.9 will not add
> > the
> > map into the mpvec. But in coalesce_paths(), it will just reload
> > the
> > map:
>
> Given that the code already does the right thing without needing any
> changes to handle tableless maps makes me feel like just removing the
> final patch is the best idea.
If we do this, do we still need the special case for DMP_BAD_DEV? IMO
we just need to make sure that dm_get_maps() doesn't return an error if
it encounters an empty map.
> The only issue with the current code is that if you have a device
> with
> no table with the UUID of a multipath device but a different name
> than
> that multipath device should have, multipath will try to create a new
> device instead of reload it, and this fails, since the UUID is
> already
> in use.
That should be handled in coalesce_paths(). It's basically the scenario
that occurs if we enable or disable user_friendly names.
> That should probably be handled by making the multipath code pay more
> attention to the device UUID and less to the device name, but that's
> work for a different patchset.
Yes, definitely.
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-08 23:03 ` Martin Wilck
@ 2024-11-11 17:17 ` Benjamin Marzinski
2024-11-12 7:46 ` Martin Wilck
0 siblings, 1 reply; 24+ messages in thread
From: Benjamin Marzinski @ 2024-11-11 17:17 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Sat, Nov 09, 2024 at 12:03:33AM +0100, Martin Wilck wrote:
> On Fri, 2024-11-08 at 17:08 -0500, Benjamin Marzinski wrote:
> > On Thu, Nov 07, 2024 at 07:02:11PM +0100, Martin Wilck wrote:
> > > On Thu, 2024-11-07 at 12:43 -0500, Benjamin Marzinski wrote:
> > > > On Thu, Nov 07, 2024 at 01:20:09PM +0100, Martin Wilck wrote:
> > > > >
> > > > > Thinking about it, I believe we should actually accept devices
> > > > > that
> > > > > have an mpath UUID and an empty table, and add them to our
> > > > > mpvec.
> > > > > We
> > > > > should treat them like devices that have a multipath target but
> > > > > no
> > > > > path
> > > > > devices. If any matching paths become available, the map will
> > > > > be
> > > > > reloaded and the issue will be fixed. IMO, this is less prone
> > > > > to
> > > > > race
> > > > > conditions than trying to delete and re-add the devices.
> > > >
> > > > I'm not sure off the top of my head how easy it will be to handle
> > > > devices with no dm table at all, but I'll take a look into this.
> > >
> > > I tried this on a system with 0.9.9 yesterday. 0.9.9 will not add
> > > the
> > > map into the mpvec. But in coalesce_paths(), it will just reload
> > > the
> > > map:
> >
> > Given that the code already does the right thing without needing any
> > changes to handle tableless maps makes me feel like just removing the
> > final patch is the best idea.
>
> If we do this, do we still need the special case for DMP_BAD_DEV? IMO
> we just need to make sure that dm_get_maps() doesn't return an error if
> it encounters an empty map.
Depends. If we fail attempting to create a device, we could have raced
with something else attempting to create it. Just checking for
dm_map_present() before we remove the device won't distinguish between
these two cases at all.
On the other hand, there is still a window where the program that we
raced with is in the middle of creating the device, but it doesn't have
a live table yet. In this case, even the DMP_BAD_DEV code won't be able
to distinguish between a map that didn't get created correctly and one
that is in the process of getting created. It would reduce the window
where we could accidentally delete a working device, however.
Since it doesn't add much complexity, I lean towards keeping it, but it
is one more state we need to handle on returns from libmp_mapinfo(), so
I'm open to an argument that it's not worth it.
> > The only issue with the current code is that if you have a device
> > with
> > no table with the UUID of a multipath device but a different name
> > than
> > that multipath device should have, multipath will try to create a new
> > device instead of reload it, and this fails, since the UUID is
> > already
> > in use.
>
> That should be handled in coalesce_paths(). It's basically the scenario
> that occurs if we enable or disable user_friendly names.
Except that we have a mpp for the other device if it has a table. If the
device doesn't have a table, then a mpp isn't created, so
coalesce_paths() doesn't track it. We could add code to create mpps for
these tableless devices, but it's another trade off of added complexity
to handle a rare corner case, and I'm not sure this one is worth it.
-Ben
> > That should probably be handled by making the multipath code pay more
> > attention to the device UUID and less to the device name, but that's
> > work for a different patchset.
>
> Yes, definitely.
>
> Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-11 17:17 ` Benjamin Marzinski
@ 2024-11-12 7:46 ` Martin Wilck
2024-11-13 0:07 ` Benjamin Marzinski
0 siblings, 1 reply; 24+ messages in thread
From: Martin Wilck @ 2024-11-12 7:46 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Mon, 2024-11-11 at 12:17 -0500, Benjamin Marzinski wrote:
> On Sat, Nov 09, 2024 at 12:03:33AM +0100, Martin Wilck wrote:
> > On Fri, 2024-11-08 at 17:08 -0500, Benjamin Marzinski wrote:
> > >
> > > Given that the code already does the right thing without needing
> > > any
> > > changes to handle tableless maps makes me feel like just removing
> > > the
> > > final patch is the best idea.
> >
> > If we do this, do we still need the special case for DMP_BAD_DEV?
> > IMO
> > we just need to make sure that dm_get_maps() doesn't return an
> > error if
> > it encounters an empty map.
>
> Depends. If we fail attempting to create a device, we could have
> raced
> with something else attempting to create it. Just checking for
> dm_map_present() before we remove the device won't distinguish
> between
> these two cases at all.
I would like to keep things as simple as possible.
The main mistake that I made in bf3a4ad ("libmultipath: simplify
dm_get_maps()") was that I'd changed dm_get_maps() such that returned
error if anything went wrong for a single map. That was wrong, it
should just skip the map in question (IMO that even includes the
DMP_ERR case, but that's up to discussion). Just changing the behavior
of dm_get_maps() would fix the regressions reported on GitHub (correct
me if I'm wrong).
Wrt libmp_mapinfo:
From my PoV, there's no difference between "has no table" and "has no
targets". There _is_ a difference between "no targets" and "multiple
targets" though, as the former can be reloaded as a proper multipath
table whereas the latter cannot. So these two cases must be treated
differently.
I think we should actually introduce a new return code.
- DMP_NOT_FOUND should actually mean that the given map does not exist
at all
- DMP_NO_MATCH should mean that it has targets that don't match
(either multiple targets, or something non-multipath), or that the UUID
doesn't match (if MAPINFO_CHECK_UUID is set).
- We should have another rc code for "exists but is empty", but that
should be called DMP_EMPTY rather than DMP_BAD_DEV.
> On the other hand, there is still a window where the program that we
> raced with is in the middle of creating the device, but it doesn't
> have
> a live table yet. In this case, even the DMP_BAD_DEV code won't be
> able
> to distinguish between a map that didn't get created correctly and
> one
> that is in the process of getting created. It would reduce the window
> where we could accidentally delete a working device, however.
I wouldn't bother about this kind of race too much. If multipathd just
ignores this kind of maps as it used to, it won't do any harm AFAICT.
We shouldn't alter this behavior, IOW multipathd should neither remove
these maps nor add them to its internal tables unless they become
populated with paths.
If anything, we should finally get down to delegating map creation from
multipath to multipathd.
> Since it doesn't add much complexity, I lean towards keeping it, but
> it
> is one more state we need to handle on returns from libmp_mapinfo(),
> so
> I'm open to an argument that it's not worth it.
I agree, comments above.
> > > The only issue with the current code is that if you have a device
> > > with
> > > no table with the UUID of a multipath device but a different name
> > > than
> > > that multipath device should have, multipath will try to create a
> > > new
> > > device instead of reload it, and this fails, since the UUID is
> > > already
> > > in use.
> >
> > That should be handled in coalesce_paths(). It's basically the
> > scenario
> > that occurs if we enable or disable user_friendly names.
>
> Except that we have a mpp for the other device if it has a table. If
> the
> device doesn't have a table, then a mpp isn't created, so
> coalesce_paths() doesn't track it. We could add code to create mpps
> for
> these tableless devices, but it's another trade off of added
> complexity
> to handle a rare corner case, and I'm not sure this one is worth it.
What I meant is that if there are matching path devices,
coalesce_paths() will create an mpp and reload the table. Otherwise,
the table will continue lurking around without being tracked, which
doesn't do harm even if it may not seem optimal. If a matching path is
added, again, uev_add_path() should reload the map (to be tested).
I think that behavior is sane enough for a corner case like this.
Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/8] multipath fixes to tableless device handling
2024-11-12 7:46 ` Martin Wilck
@ 2024-11-13 0:07 ` Benjamin Marzinski
0 siblings, 0 replies; 24+ messages in thread
From: Benjamin Marzinski @ 2024-11-13 0:07 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Tue, Nov 12, 2024 at 08:46:52AM +0100, Martin Wilck wrote:
> On Mon, 2024-11-11 at 12:17 -0500, Benjamin Marzinski wrote:
> > On Sat, Nov 09, 2024 at 12:03:33AM +0100, Martin Wilck wrote:
> > > On Fri, 2024-11-08 at 17:08 -0500, Benjamin Marzinski wrote:
> > > >
> > > > Given that the code already does the right thing without needing
> > > > any
> > > > changes to handle tableless maps makes me feel like just removing
> > > > the
> > > > final patch is the best idea.
> > >
> > > If we do this, do we still need the special case for DMP_BAD_DEV?
> > > IMO
> > > we just need to make sure that dm_get_maps() doesn't return an
> > > error if
> > > it encounters an empty map.
> >
> > Depends. If we fail attempting to create a device, we could have
> > raced
> > with something else attempting to create it. Just checking for
> > dm_map_present() before we remove the device won't distinguish
> > between
> > these two cases at all.
>
> I would like to keep things as simple as possible.
>
> The main mistake that I made in bf3a4ad ("libmultipath: simplify
> dm_get_maps()") was that I'd changed dm_get_maps() such that returned
> error if anything went wrong for a single map. That was wrong, it
> should just skip the map in question (IMO that even includes the
> DMP_ERR case, but that's up to discussion). Just changing the behavior
> of dm_get_maps() would fix the regressions reported on GitHub (correct
> me if I'm wrong).
>
> Wrt libmp_mapinfo:
>
> >From my PoV, there's no difference between "has no table" and "has no
> targets". There _is_ a difference between "no targets" and "multiple
> targets" though, as the former can be reloaded as a proper multipath
> table whereas the latter cannot. So these two cases must be treated
> differently.
>
> I think we should actually introduce a new return code.
>
> - DMP_NOT_FOUND should actually mean that the given map does not exist
> at all
> - DMP_NO_MATCH should mean that it has targets that don't match
> (either multiple targets, or something non-multipath), or that the UUID
> doesn't match (if MAPINFO_CHECK_UUID is set).
> - We should have another rc code for "exists but is empty", but that
> should be called DMP_EMPTY rather than DMP_BAD_DEV.
This sounds reasonable. I can redo my patch for that on top of the set
you just posted.
> > On the other hand, there is still a window where the program that we
> > raced with is in the middle of creating the device, but it doesn't
> > have
> > a live table yet. In this case, even the DMP_BAD_DEV code won't be
> > able
> > to distinguish between a map that didn't get created correctly and
> > one
> > that is in the process of getting created. It would reduce the window
> > where we could accidentally delete a working device, however.
>
> I wouldn't bother about this kind of race too much. If multipathd just
> ignores this kind of maps as it used to, it won't do any harm AFAICT.
> We shouldn't alter this behavior, IOW multipathd should neither remove
> these maps nor add them to its internal tables unless they become
> populated with paths.
>
> If anything, we should finally get down to delegating map creation from
> multipath to multipathd.
The issue is that if creating a multipath device fails, the current code
will only delete any leftover device it finds if that device is already
set up correctly (since dm_flush_map_nosync() won't delete the map if it
doesn't have a multipath table). This means we only delete devices in
the case where we did race and something else managed to set the device
up, which is exactly what we don't want to do. So, I'm going to repost a
version of my ("libmultipath: fix removing device after failed
creation") patch because anything is better than what we do right now,
which is just wrong.
> > Since it doesn't add much complexity, I lean towards keeping it, but
> > it
> > is one more state we need to handle on returns from libmp_mapinfo(),
> > so
> > I'm open to an argument that it's not worth it.
>
> I agree, comments above.
>
> > > > The only issue with the current code is that if you have a device
> > > > with
> > > > no table with the UUID of a multipath device but a different name
> > > > than
> > > > that multipath device should have, multipath will try to create a
> > > > new
> > > > device instead of reload it, and this fails, since the UUID is
> > > > already
> > > > in use.
> > >
> > > That should be handled in coalesce_paths(). It's basically the
> > > scenario
> > > that occurs if we enable or disable user_friendly names.
> >
> > Except that we have a mpp for the other device if it has a table. If
> > the
> > device doesn't have a table, then a mpp isn't created, so
> > coalesce_paths() doesn't track it. We could add code to create mpps
> > for
> > these tableless devices, but it's another trade off of added
> > complexity
> > to handle a rare corner case, and I'm not sure this one is worth it.
>
> What I meant is that if there are matching path devices,
> coalesce_paths() will create an mpp and reload the table. Otherwise,
> the table will continue lurking around without being tracked, which
> doesn't do harm even if it may not seem optimal. If a matching path is
> added, again, uev_add_path() should reload the map (to be tested).
>
> I think that behavior is sane enough for a corner case like this.
Even if there are matching path devices, if there is a dm device that
has a different name than the device we are trying to create, but the
same UUID, and no table (and thus, not in the current list of mpps),
multipath will try to create a new device instead of reload it, and it
will fail. I'm trying to avoid that. But this issue has been around
forever, and AFAIK has never actually caused a problem, so fixing it
isn't worth any real added complexity.
-Ben
> Martin
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2024-11-13 0:07 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-31 18:32 [PATCH 0/8] multipath fixes to tableless device handling Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 1/8] multipathd: print an error when failing to connect to multipathd Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 2/8] libmultipath: check DM UUID earlier in libmp_mapinfo__ Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 3/8] libmultipath: Don't fail dm_get_maps for DMP_NOT_FOUND Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 4/8] multipathd.service: restart multipathd on failure Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 5/8] libmultipath: use MAPINFO_CHECK_UUID in dm_get_multipath Benjamin Marzinski
2024-10-31 18:32 ` [PATCH 6/8] libmultipath: signal multipath UUID device with no table Benjamin Marzinski
2024-11-06 16:17 ` Martin Wilck
2024-11-07 17:42 ` Benjamin Marzinski
2024-10-31 18:33 ` [PATCH 7/8] libmultipath: fix removing device after failed creation Benjamin Marzinski
2024-10-31 18:33 ` [RFC PATCH 8/8] libmultipath: remove devices with no table and multipath DM UUID Benjamin Marzinski
2024-11-06 16:52 ` [PATCH 0/8] multipath fixes to tableless device handling Martin Wilck
2024-11-07 12:20 ` Martin Wilck
2024-11-07 17:43 ` Benjamin Marzinski
2024-11-07 18:02 ` Martin Wilck
2024-11-08 22:08 ` Benjamin Marzinski
2024-11-08 23:03 ` Martin Wilck
2024-11-11 17:17 ` Benjamin Marzinski
2024-11-12 7:46 ` Martin Wilck
2024-11-13 0:07 ` Benjamin Marzinski
2024-11-06 16:52 ` Martin Wilck
2024-11-06 22:39 ` Martin Wilck
2024-11-07 17:48 ` Benjamin Marzinski
2024-11-07 18:03 ` Martin Wilck
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.