* [PATCH 1/3] libmultipath: don't print error message if WATCHDOG_USEC is 0
2024-11-14 14:42 [PATCH 0/3] multipath-tools: fixes for systemd watchdog Martin Wilck
@ 2024-11-14 14:42 ` Martin Wilck
2024-11-14 14:42 ` [PATCH 2/3] libmultipath: honor WATCHDOG_PID setting Martin Wilck
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Martin Wilck @ 2024-11-14 14:42 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski; +Cc: Martin Wilck, dm-devel
WATCHDOG_USEC may be set to 0, which means that the watchdog
is disabled in systemd.
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
libmultipath/config.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/libmultipath/config.c b/libmultipath/config.c
index 0e3a5cc..226ddec 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -865,6 +865,9 @@ static void set_max_checkint_from_watchdog(struct config *conf)
unsigned long checkint;
if (envp && sscanf(envp, "%lu", &checkint) == 1) {
+ if (checkint == 0)
+ /* watchdog disabled */
+ return;
/* Value is in microseconds */
checkint /= 1000000;
if (checkint < 1 || checkint > UINT_MAX) {
--
2.47.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH 2/3] libmultipath: honor WATCHDOG_PID setting
2024-11-14 14:42 [PATCH 0/3] multipath-tools: fixes for systemd watchdog Martin Wilck
2024-11-14 14:42 ` [PATCH 1/3] libmultipath: don't print error message if WATCHDOG_USEC is 0 Martin Wilck
@ 2024-11-14 14:42 ` Martin Wilck
2024-11-14 14:42 ` [PATCH 3/3] libmultipath: cut watchdog interval in half Martin Wilck
2024-11-16 0:05 ` [PATCH 0/3] multipath-tools: fixes for systemd watchdog Benjamin Marzinski
3 siblings, 0 replies; 5+ messages in thread
From: Martin Wilck @ 2024-11-14 14:42 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski; +Cc: Martin Wilck, dm-devel
WATCHDOG_USEC should only be evaluated if WATCHDOG_PID is either
unset (systemd <= 208) or set to the main daemon's pid [1].
Passing the daemon's PID to set_max_checkint_from_watchdog() requires
a mechanism similar to what we've been using for get_multipath_config().
[1] https://www.freedesktop.org/software/systemd/man/latest/sd_watchdog_enabled.html
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
libmultipath/config.c | 11 ++++++++++-
libmultipath/config.h | 2 ++
libmultipath/libmultipath.version | 1 +
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/libmultipath/config.c b/libmultipath/config.c
index 226ddec..bd199fa 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -858,12 +858,21 @@ process_config_dir(struct config *conf, char *dir)
pthread_cleanup_pop(1);
}
+__attribute__((weak)) pid_t daemon_pid = -1;
+
#ifdef USE_SYSTEMD
static void set_max_checkint_from_watchdog(struct config *conf)
{
- char *envp = getenv("WATCHDOG_USEC");
+ const char *envp;
unsigned long checkint;
+ long pid;
+ envp = getenv("WATCHDOG_PID");
+ /* See sd_watchdog_enabled(3) */
+ if (envp && sscanf(envp, "%lu", &pid) == 1 && pid != daemon_pid)
+ return;
+
+ envp = getenv("WATCHDOG_USEC");
if (envp && sscanf(envp, "%lu", &checkint) == 1) {
if (checkint == 0)
/* watchdog disabled */
diff --git a/libmultipath/config.h b/libmultipath/config.h
index 94cdf25..d12f63e 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -315,4 +315,6 @@ int parse_uid_attrs(char *uid_attrs, struct config *conf);
const char *get_uid_attribute_by_attrs(const struct config *conf,
const char *path_dev);
+/* Weak dummy function, meant to be overridden by multipathd */
+extern pid_t daemon_pid;
#endif
diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index 6bdf694..a898f7a 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -70,6 +70,7 @@ global:
cleanup_multipath_and_paths;
coalesce_paths;
count_active_paths;
+ daemon_pid;
delete_all_foreign;
delete_foreign;
dm_cancel_deferred_remove;
--
2.47.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH 3/3] libmultipath: cut watchdog interval in half
2024-11-14 14:42 [PATCH 0/3] multipath-tools: fixes for systemd watchdog Martin Wilck
2024-11-14 14:42 ` [PATCH 1/3] libmultipath: don't print error message if WATCHDOG_USEC is 0 Martin Wilck
2024-11-14 14:42 ` [PATCH 2/3] libmultipath: honor WATCHDOG_PID setting Martin Wilck
@ 2024-11-14 14:42 ` Martin Wilck
2024-11-16 0:05 ` [PATCH 0/3] multipath-tools: fixes for systemd watchdog Benjamin Marzinski
3 siblings, 0 replies; 5+ messages in thread
From: Martin Wilck @ 2024-11-14 14:42 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski; +Cc: Martin Wilck, dm-devel
According to [1], a daemon should trigger the watchdog every half of the time
that is communicated via WATCHDOG_USEC.
[1] https://www.freedesktop.org/software/systemd/man/latest/sd_watchdog_enabled.html
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
libmultipath/config.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/libmultipath/config.c b/libmultipath/config.c
index bd199fa..4d457b8 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -877,8 +877,11 @@ static void set_max_checkint_from_watchdog(struct config *conf)
if (checkint == 0)
/* watchdog disabled */
return;
- /* Value is in microseconds */
- checkint /= 1000000;
+ /*
+ * Value is in microseconds, and we should trigger the watchdog
+ * twice per interval.
+ */
+ checkint /= 2000000;
if (checkint < 1 || checkint > UINT_MAX) {
condlog(1, "invalid value for WatchdogSec: \"%s\"", envp);
return;
--
2.47.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH 0/3] multipath-tools: fixes for systemd watchdog
2024-11-14 14:42 [PATCH 0/3] multipath-tools: fixes for systemd watchdog Martin Wilck
` (2 preceding siblings ...)
2024-11-14 14:42 ` [PATCH 3/3] libmultipath: cut watchdog interval in half Martin Wilck
@ 2024-11-16 0:05 ` Benjamin Marzinski
3 siblings, 0 replies; 5+ messages in thread
From: Benjamin Marzinski @ 2024-11-16 0:05 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, Martin Wilck, dm-devel
On Thu, Nov 14, 2024 at 03:42:21PM +0100, Martin Wilck wrote:
> A set of small fixes to make multipathd better adhere to the conventions
I'm a little confused by this. Your patches make sense, but why do we
bother limiting max_checkint to the Watchdog timer? We send a notify
every loop in checkerloop() regardless. Changing max_checkint won't make
us any less likely to hang for too long if we're using a synchronous
chacker, and I can't figure out why the Watchdog time should have
anything to do with how frequently we check working paths.
Your code changes looks fine, so assuming there's a use for this code:
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
But possibly we should be removing set_max_checkint_from_watchdog(), or
replacing max_checkint with something like checker_timeout, but only for
synchronous checkers. Or possibly I'm just missing something here.
-Ben
> for systemd's WatchdogSec= setting.
>
> Martin Wilck (3):
> libmultipath: don't print error message if WATCHDOG_USEC is 0
> libmultipath: honor WATCHDOG_PID setting
> libmultipath: cut watchdog interval in half
>
> libmultipath/config.c | 21 ++++++++++++++++++---
> libmultipath/config.h | 2 ++
> libmultipath/libmultipath.version | 1 +
> 3 files changed, 21 insertions(+), 3 deletions(-)
>
> --
> 2.47.0
^ permalink raw reply [flat|nested] 5+ messages in thread