* [PATCH 0/4] multipath-tools: generic async threads for TUR checker
@ 2026-03-19 22:13 Martin Wilck
2026-03-19 22:13 ` [PATCH 1/4] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Martin Wilck @ 2026-03-19 22:13 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
Cc: Martin Wilck
Motivated by the recent submission of an ALUA checker for multipath-tools by
Brian Bunker, I am proposing a generic framework for asynchronous checker
threads in multipath-tools.
The first patch is a small fix I came up with while testing this.
2/4 is the actual implementation, 3/4 adds test code, and 4/4 converts the
TUR checker to the new library code. This makes the logic of the TUR
checker easier to understand, and should make it easier for Brian to
implement the ALUA checker based on the same framework.
Further improvements on top of this are possible. The new TUR code,
except for the tur_check() function itself, is pretty generic and
would allow abstracting an "async checker" model with just a few
changes. This would make it possible to also switch the legacy
checkers to an asynchronous mode of operation.
Comments and reviews welcome.
Martin
Martin Wilck (4):
multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN
libmpathutil: add generic implementation for checker thread runners
multipath-tools tests: add test program for thread runners
libmultipath: TUR checker: use runner threads
libmpathutil/Makefile | 2 +-
libmpathutil/libmpathutil.version | 6 +-
libmpathutil/runner.c | 205 ++++++++++++
libmpathutil/runner.h | 84 +++++
libmultipath/checkers/tur.c | 341 ++++++-------------
multipathd/main.c | 4 +-
tests/Makefile | 15 +-
tests/runner-test.sh | 37 +++
tests/runner-test.supp | 15 +
tests/runner.c | 530 ++++++++++++++++++++++++++++++
10 files changed, 994 insertions(+), 245 deletions(-)
create mode 100644 libmpathutil/runner.c
create mode 100644 libmpathutil/runner.h
create mode 100755 tests/runner-test.sh
create mode 100644 tests/runner-test.supp
create mode 100644 tests/runner.c
--
2.53.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN
2026-03-19 22:13 [PATCH 0/4] multipath-tools: generic async threads for TUR checker Martin Wilck
@ 2026-03-19 22:13 ` Martin Wilck
2026-03-24 5:35 ` Benjamin Marzinski
2026-03-19 22:13 ` [PATCH 2/4] libmpathutil: add generic implementation for checker thread runners Martin Wilck
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-03-19 22:13 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
Cc: Martin Wilck
We map PATH_TIMEOUT to PATH_DOWN in pathinfo(), but not in get_new_state().
Do it there, too, to treat the states consistently.
This avoids logging "checker timed out" twice in update_path_state(), even
if log_checker_err is set to "once".
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
multipathd/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index d87c7b1..9bce4d6 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2503,8 +2503,10 @@ get_new_state(struct path *pp)
* Wait for uevent for removed paths;
* some LLDDs like zfcp keep paths unavailable
* without sending uevents.
+ * Also, map PATH_TIMEOUT to PATH_DOWN here, like we do in
+ * pathinfo().
*/
- if (newstate == PATH_REMOVED)
+ if (newstate == PATH_REMOVED || newstate == PATH_TIMEOUT)
newstate = PATH_DOWN;
/*
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/4] libmpathutil: add generic implementation for checker thread runners
2026-03-19 22:13 [PATCH 0/4] multipath-tools: generic async threads for TUR checker Martin Wilck
2026-03-19 22:13 ` [PATCH 1/4] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
@ 2026-03-19 22:13 ` Martin Wilck
2026-03-24 5:44 ` Benjamin Marzinski
2026-03-19 22:13 ` [PATCH 3/4] multipath-tools tests: add test program for " Martin Wilck
2026-03-19 22:13 ` [PATCH 4/4] libmultipath: TUR checker: use runner threads Martin Wilck
3 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-03-19 22:13 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
Cc: Martin Wilck
This code adds a generic, abstract implementation of the kind of threads we
need for checkers: detached threads that may time out. The design is such
that these threads never access any memory of the calling program, greatly
simplifying the management of object lifetimes.
See the documentation of the API in "runner.h".
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
libmpathutil/Makefile | 2 +-
libmpathutil/libmpathutil.version | 6 +-
libmpathutil/runner.c | 205 ++++++++++++++++++++++++++++++
libmpathutil/runner.h | 84 ++++++++++++
4 files changed, 295 insertions(+), 2 deletions(-)
create mode 100644 libmpathutil/runner.c
create mode 100644 libmpathutil/runner.h
diff --git a/libmpathutil/Makefile b/libmpathutil/Makefile
index e1cc2c6..bdd7063 100644
--- a/libmpathutil/Makefile
+++ b/libmpathutil/Makefile
@@ -13,7 +13,7 @@ LIBDEPS += -lpthread -ldl -ludev -L$(mpathcmddir) -lmpathcmd $(SYSTEMD_LIBDEPS)
# other object files
OBJS := mt-libudev.o parser.o vector.o util.o debug.o time-util.o \
- uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o
+ uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o runner.o
all: $(DEVLIB)
diff --git a/libmpathutil/libmpathutil.version b/libmpathutil/libmpathutil.version
index 22ef994..5a7df82 100644
--- a/libmpathutil/libmpathutil.version
+++ b/libmpathutil/libmpathutil.version
@@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
put_multipath_config;
};
-LIBMPATHUTIL_5.0 {
+LIBMPATHUTIL_6.0 {
global:
alloc_bitfield;
alloc_strvec;
@@ -62,6 +62,8 @@ global:
cleanup_vector;
cleanup_vector_free;
convert_dev;
+ cancel_runner;
+ check_runner;
dlog;
filepresent;
fill_strbuf;
@@ -72,6 +74,7 @@ global:
free_strvec;
get_linux_version_code;
get_monotonic_time;
+ get_runner;
get_strbuf_buf__;
get_next_string;
get_strbuf_len;
@@ -162,6 +165,7 @@ global:
pthread_cond_init_mono;
recv_packet;
reset_strbuf;
+ runner_state_name;
safe_write;
send_packet;
set_max_fds;
diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
new file mode 100644
index 0000000..8a49d92
--- /dev/null
+++ b/libmpathutil/runner.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+// Copyright (c) 2026 SUSE LLC
+#include <assert.h>
+#include <time.h>
+#include <pthread.h>
+#include <urcu/uatomic.h>
+#include "util.h"
+#include "debug.h"
+#include "time-util.h"
+#include "runner.h"
+
+#define STACK_SIZE (4 * 1024)
+#define MILLION 1000000
+
+const char *runner_state_name(int state)
+{
+ // clang-format off
+ static const char * const state_name_[] = {
+ [RUNNER_IDLE] = "idle",
+ [RUNNER_RUNNING] = "running",
+ [RUNNER_DONE] = "done",
+ [RUNNER_CANCELLED] = "cancelled"
+ };
+ // clang-format on
+
+ if (state < RUNNER_IDLE || state > RUNNER_CANCELLED)
+ return "unknown";
+ return state_name_[state];
+}
+
+struct runner_context {
+ int refcount;
+ int status;
+ struct timespec deadline;
+ pthread_t thr;
+ void (*func)(void *data);
+ /* User data will be copied into this area */
+ char data[];
+};
+
+static void release_context(struct runner_context *ctx)
+{
+ int n;
+
+ n = uatomic_sub_return(&ctx->refcount, 1);
+ assert(n >= 0);
+
+ if (n == 0)
+ free(ctx);
+}
+
+static void cleanup_context(struct runner_context **ctx)
+{
+ if (*ctx)
+ release_context(*ctx);
+}
+
+static void *runner_thread(void *arg)
+{
+ int st, refcount;
+ /*
+ * The cleanup function makes sure memory is freed if the thread is
+ * cancelled (-fexceptions).
+ */
+ struct runner_context *ctx __attribute__((cleanup(cleanup_context))) = arg;
+
+ refcount = uatomic_add_return(&ctx->refcount, 1);
+ assert(refcount == 2);
+
+ st = uatomic_cmpxchg(&ctx->status, RUNNER_IDLE, RUNNER_RUNNING);
+ /*
+ * The thread has already been cancelled before it was even started.
+ * In this case, cancel_runner() doesn't release the context.
+ * Do it here. See comments for RUNNER_IDLE in cancel_runner().
+ */
+ if (st != RUNNER_IDLE) {
+ release_context(ctx);
+ return NULL;
+ }
+
+ (*ctx->func)(ctx->data);
+ uatomic_cmpxchg(&ctx->status, RUNNER_RUNNING, RUNNER_DONE);
+ return NULL;
+}
+
+void cancel_runner(struct runner_context *ctx)
+{
+ int st = uatomic_cmpxchg(&ctx->status, RUNNER_RUNNING, RUNNER_CANCELLED);
+ int level = 4, retry = 1;
+
+repeat:
+ switch (st) {
+ case RUNNER_IDLE:
+ /*
+ * Race with thread startup.
+ *
+ * If after the following cmpxchg st is still IDLE, the cmpxchg
+ * in runner_thread() will return CANCELLED, and the context
+ * will be relased there. Otherwise, the thread has switched
+		 * will be released there. Otherwise, the thread has switched
+ * it regularly if we retry.
+ */
+ if (retry--) {
+ st = uatomic_cmpxchg(&ctx->status, RUNNER_IDLE,
+ RUNNER_CANCELLED);
+ goto repeat;
+ }
+ level = 3;
+ break;
+ case RUNNER_RUNNING:
+ pthread_cancel(ctx->thr);
+ /* Caller has no interest in the data any more */
+ release_context(ctx);
+ return;
+ case RUNNER_DONE:
+ /* Caller has no interest in the data any more */
+ release_context(ctx);
+ /* fallthrough */
+ default:
+ level = 3;
+ break;
+ }
+ condlog(level, "%s: runner cancelled in state '%s', ctx=%p", __func__,
+ runner_state_name(st), ctx);
+}
+
+struct runner_context *get_runner(runner_func func, void *data,
+ unsigned int size, unsigned long timeout_usec)
+{
+ static const struct timespec time_zero = {.tv_sec = 0};
+ struct runner_context *ctx;
+ pthread_attr_t attr;
+ int rc;
+
+ assert(func);
+ assert(data);
+ assert(size > 0);
+
+ ctx = malloc(sizeof(*ctx) + size);
+ if (!ctx)
+ return NULL;
+
+ ctx->func = func;
+ uatomic_set(&ctx->refcount, 1);
+ uatomic_set(&ctx->status, RUNNER_IDLE);
+ memcpy(ctx->data, data, size);
+
+ if (timeout_usec) {
+ get_monotonic_time(&ctx->deadline);
+		ctx->deadline.tv_sec += timeout_usec / MILLION;
+		ctx->deadline.tv_nsec += (timeout_usec % MILLION) * 1000;
+		/* keep tv_nsec < 1e9 for timespeccmp() in check_runner() */
+		normalize_timespec(&ctx->deadline);
+ } else
+ ctx->deadline = time_zero;
+
+ /*
+ * This pairs with the implicit barrier in uatomic_add_return()
+ * in runner_thread()
+ */
+ cmm_smp_wmb();
+
+ setup_thread_attr(&attr, STACK_SIZE, 1);
+ rc = pthread_create(&ctx->thr, &attr, runner_thread, ctx);
+ pthread_attr_destroy(&attr);
+
+ if (rc) {
+ condlog(1, "%s: pthread_create(): %s", __func__, strerror(rc));
+ release_context(ctx);
+ return NULL;
+ }
+ return ctx;
+}
+
+int check_runner(struct runner_context *ctx, void *data, unsigned int size)
+{
+ int st = uatomic_read(&ctx->status);
+
+ switch (st) {
+ case RUNNER_DONE:
+ if (data)
+ /* hand back the data to the caller */
+ memcpy(data, ctx->data, size);
+ /* caller is done with this context */
+ release_context(ctx);
+ /* fallthrough */
+ case RUNNER_CANCELLED:
+ return st;
+ case RUNNER_IDLE:
+ case RUNNER_RUNNING:
+ if (ctx->deadline.tv_sec != 0 || ctx->deadline.tv_nsec != 0) {
+ struct timespec now;
+
+ get_monotonic_time(&now);
+ if (timespeccmp(&ctx->deadline, &now) <= 0) {
+ cancel_runner(ctx);
+ return RUNNER_CANCELLED;
+ }
+ }
+ return RUNNER_RUNNING;
+ default:
+ condlog(1, "%s: runner in impossible state '%s'", __func__,
+ runner_state_name(st));
+ assert(false);
+ return st;
+ }
+}
diff --git a/libmpathutil/runner.h b/libmpathutil/runner.h
new file mode 100644
index 0000000..0c9cf80
--- /dev/null
+++ b/libmpathutil/runner.h
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+// Copyright (c) 2026 SUSE LLC
+#ifndef RUNNER_H_INCLUDED
+#define RUNNER_H_INCLUDED
+
+enum runner_status {
+ RUNNER_IDLE,
+ RUNNER_RUNNING,
+ RUNNER_DONE,
+ RUNNER_CANCELLED,
+};
+
+typedef void (*runner_func)(void *data);
+struct runner_context;
+
+/**
+ * runner_state_name(): helper for printing runner states
+ *
+ * @param state: a valid @runner_status value
+ * @returns: a string describing the status
+ */
+const char *runner_state_name(int state);
+
+/**
+ * get_runner(): start a runner thread
+ *
+ * This function starts a runner thread that calls @func(@data).
+ * The thread is created as detached thread.
+ * @data will be copied to thread-private memory, which will be freed when
+ * the runner terminates.
+ * Output values can be retrieved with @check_runner().
+ *
+ * @param func: the worker function to invoke
+ * This parameter must not be NULL.
+ * @param data: pointer the data to pass to the function (input and output)
+ * This parameter must not be NULL.
+ * @param size: the size (in bytes) of the data passed to the function
+ * This parameter must be positive.
+ * @param timeout_usec: timeout (in microseconds) after which to cancel the runner.
+ * If it is 0, the runner will not time out.
+ * @returns: a runner context that must be passed to the functions below.
+ */
+struct runner_context *get_runner(runner_func func, void *data,
+ unsigned int size, unsigned long timeout_usec);
+
+/**
+ * cancel_runner(): cancel a runner thread.
+ *
+ * This function should be called to terminate runners on program exit,
+ * or to terminate runners that have no timeout set in @get_runner,
+ * and haven't completed.
+ * Upon return of this function, @ctx becomes stale and shouldn't be accessed
+ * any more.
+ *
+ * @param ctx: the context of the runner to be cancelled.
+ */
+void cancel_runner(struct runner_context *ctx);
+
+/**
+ * check_runner(): query the state of a runner thread and obtain results
+ *
+ * Check the state of a runner previously started with @get_runner. If the
+ * thread function has completed, @RUNNER_DONE will be returned, and the
+ * user data will be copied into the @data argument. If @check_runner returns
+ * anything else, @data contains no valid data. The @size argument
+ * will typically be the same as the @size passed to @get_runner (meaning
+ * that @data represents an object of the same type as the @data argument
+ * passed to @get_runner previously).
+ * If @check_runner returns @RUNNER_CANCELLED or @RUNNER_DONE, the @ctx object
+ * has become stale and must not be used any more.
+ * @RUNNER_CANCELLED is returned if the timeout set in @get_runner has passed
+ * and the worker function hasn't returned yet.
+ *
+ * @param ctx: the context of the runner to be queried
+ * @param data: memory pointer that will receive results of the worker
+ * function. Can be NULL, in which case no data will be copied.
+ * @param size: size of the memory pointed to by data. It must be no bigger
+ * than the size of the memory passed to @get_runner for this runner.
+ * It can be smaller, but no more than @size bytes will be copied.
+ * @returns: @RUNNER_RUNNING, @RUNNER_DONE, or @RUNNER_CANCELLED.
+ */
+int check_runner(struct runner_context *ctx, void *data, unsigned int size);
+
+#endif /* RUNNER_H_INCLUDED */
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/4] multipath-tools tests: add test program for thread runners
2026-03-19 22:13 [PATCH 0/4] multipath-tools: generic async threads for TUR checker Martin Wilck
2026-03-19 22:13 ` [PATCH 1/4] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
2026-03-19 22:13 ` [PATCH 2/4] libmpathutil: add generic implementation for checker thread runners Martin Wilck
@ 2026-03-19 22:13 ` Martin Wilck
2026-03-24 5:47 ` Benjamin Marzinski
2026-03-19 22:13 ` [PATCH 4/4] libmultipath: TUR checker: use runner threads Martin Wilck
3 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-03-19 22:13 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
Cc: Martin Wilck
Add a test program for the "runner" thread implementation from
the previous commit. The test program runs simulated "hanging" threads that
may time out, and optionally kills all threads at an arbitrary point in
time. See the comments at the top of the file for details.
Also add a test driver script (runner-test.sh) with a few reasonable
combinations of command line arguments.
The test program has been used to test the "runner" implementation
extensively on different architectures (x86_64, aarch64, ppc64le), using
both valgrind and the gcc address sanitizer (libasan) for detection of
memory leaks and use-after-free errors.
For valgrind, a suppression file needs to be added, as valgrind doesn't
seem to capture the deallocation of thread local storage for detached
threads in the test case where the test program is killed. The suppression
affects only memory allocated by glibc. This leak has not been seen with
libasan, only with valgrind.
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
tests/Makefile | 15 +-
tests/runner-test.sh | 37 +++
tests/runner-test.supp | 15 ++
tests/runner.c | 530 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 593 insertions(+), 4 deletions(-)
create mode 100755 tests/runner-test.sh
create mode 100644 tests/runner-test.supp
create mode 100644 tests/runner.c
diff --git a/tests/Makefile b/tests/Makefile
index 9f1b950..f0e5dd3 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -9,12 +9,12 @@ CFLAGS += $(BIN_CFLAGS) -Wno-unused-parameter -Wno-unused-function $(W_MISSING_I
LIBDEPS += -L. -L $(mpathutildir) -L$(mpathcmddir) -lmultipath -lmpathutil -lmpathcmd -lcmocka
TESTS := uevent parser util dmevents hwtable blacklist unaligned vpd pgpolicy \
- alias directio valid devt mpathvalid strbuf sysfs features cli mapinfo
+ alias directio valid devt mpathvalid strbuf sysfs features cli mapinfo runner
HELPERS := test-lib.o test-log.o
.PRECIOUS: $(TESTS:%=%-test)
-all: $(TESTS:%=%.out)
+all: $(TESTS:%=%.out) runner-test.out
progs: $(TESTS:%=%-test) lib/libchecktur.so
valgrind: $(TESTS:%=%.vgr)
@@ -65,6 +65,7 @@ features-test_LIBDEPS := -ludev -lpthread
features-test_OBJDEPS := $(mpathutildir)/mt-libudev.o
cli-test_OBJDEPS := $(daemondir)/cli.o
mapinfo-test_LIBDEPS = -lpthread -ldevmapper
+runner-test_LIBDEPS = -lpthread
%.o: %.c
@echo building $@ because of $?
@@ -81,12 +82,18 @@ lib/libchecktur.so:
%.vgr: %-test lib/libchecktur.so
@echo == running valgrind for $< ==
@LD_LIBRARY_PATH=.:$(mpathutildir):$(mpathcmddir) \
- valgrind --leak-check=full --error-exitcode=128 ./$< >$@ 2>&1
+ valgrind --leak-check=full --error-exitcode=128 --suppressions=./runner-test.supp ./$< >$@ 2>&1
OBJS = $(TESTS:%=%.o) $(HELPERS)
+runner-test.out: runner-test
+ $(Q)./runner-test.sh >$@ 2>&1
+
+runner-test.vgr: runner-test
+ $(Q)VALGRIND=1 ./runner-test.sh >$@ 2>&1
+
test_clean:
- $(Q)$(RM) $(TESTS:%=%.out) $(TESTS:%=%.vgr) *.so*
+ $(Q)$(RM) $(TESTS:%=%.out) $(TESTS:%=%.vgr) *.so* runner-test.out runner-test.vgr
valgrind_clean:
$(Q)$(RM) $(TESTS:%=%.vgr)
diff --git a/tests/runner-test.sh b/tests/runner-test.sh
new file mode 100755
index 0000000..214433d
--- /dev/null
+++ b/tests/runner-test.sh
@@ -0,0 +1,37 @@
+#! /bin/sh
+
+export LD_LIBRARY_PATH=../libmultipath:../libmpathutil:../libmpathcmd
+export MPATHTEST_VERBOSITY=2
+
+RUNNER=./runner-test
+if [ "$VALGRIND" ]; then
+ command -v valgrind >/dev/null && \
+ RUNNER="valgrind --leak-check=full --error-exitcode=128 --max-threads=5000 --suppressions=./runner-test.supp ./runner-test"
+fi
+
+# LSAN is not supported on ppc64le
+case $(uname -m) in
+ x86_64|aarch64)
+ export ASAN_OPTIONS=detect_leaks=1
+ export LSAN_OPTIONS=report_objects=1
+ ;;
+esac
+
+# Test scenarios
+# 1. timeout 0 - test runner creation / cancellation races
+# 2. "realistic" test, scaled down by a factor 10 in time
+# 3. / 4. Tests with high likelihood of completion / cancellation race
+set -- \
+ "-N 100 -p 1 -t 0 -n 2 -b 1 -s 1 -i -r 20" \
+ "-N 1000 -p 100 -t 3000 -n 2999 -b 5 -s 1 -i -r 20 -k 30000" \
+ "-N 1000 -p 10 -t 3000 -n 1 -b 1 -s 1 -i -r 20 -k 23000" \
+ "-N 100 -p 1 -t 3000 -n 0 -s 1 -i -r 20"
+
+errors=0
+for args in "$@"; do
+ # shellcheck disable=SC2086
+ $RUNNER $args || errors=$((errors+1))
+done
+
+echo "$0: ERRORS: $errors"
+[ $errors -eq 0 ]
diff --git a/tests/runner-test.supp b/tests/runner-test.supp
new file mode 100644
index 0000000..e099ec9
--- /dev/null
+++ b/tests/runner-test.supp
@@ -0,0 +1,15 @@
+{
+ glibc TLS for detached threads if program is terminated
+ Memcheck:Leak
+ match-leak-kinds: possible
+ fun:calloc
+ ...
+ fun:allocate_dtv
+ fun:_dl_allocate_tls
+ fun:allocate_stack
+ ...
+ fun:get_runner
+ fun:start_runner
+ ...
+ fun:main
+}
diff --git a/tests/runner.c b/tests/runner.c
new file mode 100644
index 0000000..21a44ca
--- /dev/null
+++ b/tests/runner.c
@@ -0,0 +1,530 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/*
+ * Test reliability of the runner implementation.
+ *
+ * This tests simulates "path checkers" being started through the
+ * runner code. It creates threads that run for a variable amount of time,
+ * optionally ignoring cancellation signals. The runners have a fixed
+ * timeout (TIMEOUT_USEC, "-t" option), after which they are considered
+ * "hanging" and will be cancelled. The actual runtime of the runner is random.
+ * It varies between (TIMEOUT_USEC - NOISE USEC) and
+ * (TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC). These noise parameters are
+ * set with the "-n" and "-b" option, respectively.
+ * This allows simulating frequent races between cancellation and regular
+ * completion of threads. Note that because timers in different threads
+ * aren't started simultaneously, NOISE_USEC = 0 doesn't mean that the
+ * runners will complete exactly at the point in time when their timer
+ * expires.
+ * The runners simulate waiting checkers by simply sleeping. Optionally,
+ * they can ignore cancellation while sleeping (IGNORE_CANCEL, -i) or
+ * divide the sleep time in multiple "steps" (-s), between which they can
+ * be cancelled.
+ * Like multipathd, the main thread "polls" the status of the runners in
+ * regular time intervals that are set with POLL_USEC (-p). Because of
+ * this polling behavior, it can happen that a runner finishes after
+ * its timeout has expired. The runner code (and this test) treats this case
+ * as successful completion.
+ * If a runner completes, the result is checked against the expected value.
+ * The number of threads that have not finished (either successfully or
+ * cancelled, plus the number of wrong results of completed runners is
+ * the error count.
+ * The N_RUNNERS (-N) option determines how many simultaneous threads are
+ * started.
+ * The test runs until all runners have either completed or expired, or
+ * until a maximum wait time is reached, which is calculated from the
+ * test parameters (max_wait in run_test()). The REPEAT (-r) parameter
+ * determines the number of times the entire test is repeated.
+ * The KILL_TIMEOUT (-k) parameter is for simulating a shutdown of the
+ * main program (think multipathd). When this timeout expires, all pending
+ * runners are cancelled and the program terminates.
+ *
+ * A "realistic" simulation of multipathd path checkers would use options
+ * roughly like this:
+ *
+ * runner-test -N 1000 -p 1000 -t 30000 -n 29990 -b 5 -s 1 -i -r 20 -k 300000
+ *
+ * (note that time options are in ms, whereas the code uses us), but this
+ * takes a very long time to run.
+ *
+ * Scaled down, it becomes:
+ *
+ * runner-test -N 1000 -p 100 -t 3000 -n 2999 -b 5 -s 1 -i -r 20 -k 30000
+ *
+ * A less realistic run with high likelihood of completion / cancellation races:
+ *
+ * runner-test -N 1000 -p 10 -t 3000 -n 1 -b 1 -s 1 -i -r 20
+ *
+ * Here, all runners finish in a +-1ms timer interval around the timeout.
+ * Even with -n 0 (no noise), with a sufficient number of runners, some runners
+ * will time out.
+ */
+
+#include <time.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <signal.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <sched.h>
+#include <errno.h>
+#include <sys/select.h>
+#include <sys/wait.h>
+#include "debug.h"
+#include "time-util.h"
+#include "runner.h"
+#include "globals.c"
+
+#define MILLION 1000000
+#define BILLION 1000000000
+
+static int N_RUNNERS = 100;
+/* sleep time between runner status polling */
+static long POLL_USEC = 10000;
+/* timeout for runner */
+static long TIMEOUT_USEC = 100000;
+/* random noise to subtract / add to the sleep time */
+static long NOISE_USEC = 10000;
+/*
+ * Factor to increase noise towards longer sleep times (timeouts).
+ * The actual sleep time will be in the interval
+ * [ TIMEOUT_USEC - NOISE_USEC, TIMEOUT_USEC + NOISE_USEC * NOISE_BIAS ]
+ */
+static int NOISE_BIAS = 5;
+/* number of sleep intervals the runner uses */
+static int SLEEP_STEPS = 1;
+/* time after which to kill all runners */
+static long KILL_TIMEOUT = 0;
+/* number of repeated runs */
+static int REPEAT = 10;
+/* whether to ignore cancellation signals */
+static bool IGNORE_CANCEL = false;
+
+/* gap in the payload to simulate a larger size */
+#define PAYLOAD_GAP 128
+
+struct payload {
+ long wait_nsec;
+ int steps;
+ bool ignore_cancel;
+ int start;
+ char pad[PAYLOAD_GAP];
+ int end;
+};
+
+static void wait_and_add_1(void *arg)
+{
+ struct payload *t1 = arg;
+ struct timespec wait;
+ int i, cancelstate;
+
+ wait.tv_sec = t1->wait_nsec / BILLION;
+ wait.tv_nsec = t1->wait_nsec % BILLION;
+ normalize_timespec(&wait);
+ for (i = 0; i < t1->steps; i++) {
+ if (t1->ignore_cancel)
+ pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cancelstate);
+ if (nanosleep(&wait, NULL) != 0 && errno != EINTR)
+ condlog(3, "%s: nanosleep: %s", __func__, strerror(errno));
+ if (t1->ignore_cancel) {
+ pthread_setcancelstate(cancelstate, NULL);
+ }
+ pthread_testcancel();
+ }
+ t1->end = t1->start + 1;
+}
+
+static bool payload_error(const struct payload *p)
+{
+ return p->end != p->start + 1;
+}
+
+static int check_payload(struct runner_context *ctx, bool *error)
+{
+ struct timespec now;
+ struct payload t1;
+ int st = check_runner(ctx, &t1, sizeof(t1));
+
+ if (st == RUNNER_RUNNING)
+ return st;
+
+ clock_gettime(CLOCK_MONOTONIC, &now);
+ if (st == RUNNER_DONE) {
+ if (error)
+ *error = payload_error(&t1);
+ condlog(3, "runner finished in state 'done' at %lld.%06lld, start %d end %d",
+ (long long)now.tv_sec, (long long)now.tv_nsec / 1000,
+ t1.start, t1.end);
+ } else
+ condlog(3, "runner finished in state '%s' at %lld.%06lld",
+ runner_state_name(st), (long long)now.tv_sec,
+ (long long)now.tv_nsec / 1000);
+ return st;
+}
+
+static struct runner_context *
+start_runner(long usecs, int start, long noise_range_usec, long bias,
+ int steps, bool ignore_cancel)
+{
+
+ struct payload t1;
+ struct runner_context *ctx;
+ long noise;
+
+ if (noise_range_usec > 0)
+ noise = random() % ((bias + 1) * noise_range_usec) -
+ noise_range_usec;
+ else
+ noise = 0;
+ t1.start = start;
+ t1.end = 0;
+ t1.wait_nsec = (usecs + noise) * 1000 / steps;
+ t1.wait_nsec = t1.wait_nsec > 0 ? t1.wait_nsec : 0;
+ t1.steps = steps;
+ t1.ignore_cancel = ignore_cancel;
+
+ ctx = get_runner(wait_and_add_1, &t1, sizeof(t1), usecs);
+
+ if (ctx) {
+ struct timespec tmo, finish;
+
+ clock_gettime(CLOCK_MONOTONIC, &tmo);
+ tmo.tv_sec += usecs / MILLION;
+ tmo.tv_nsec += (usecs % MILLION) * 1000;
+ finish = tmo;
+ normalize_timespec(&tmo);
+ finish.tv_sec += noise / MILLION;
+ finish.tv_nsec += (noise % MILLION) * 1000;
+ normalize_timespec(&finish);
+ condlog(4, "started runner start %d timeout %lld.%06lld, finish %lld.%06lld (noise %ld), steps %d, %signoring cancellation",
+ start, (long long)tmo.tv_sec,
+ (long long)tmo.tv_nsec / 1000, (long long)finish.tv_sec,
+ (long long)finish.tv_nsec / 1000, noise, steps,
+ ignore_cancel ? "" : "not ");
+ return ctx;
+ } else {
+ condlog(0, "failed to start runner for start %d", start);
+ return NULL;
+ }
+}
+
+static struct runner_context **context;
+static volatile bool must_stop = false;
+void int_handler(int signal)
+{
+ must_stop = true;
+}
+
+static void terminate_all(void)
+{
+ int i, count;
+
+ for (count = 0, i = 0; i < N_RUNNERS; i++)
+ if (context[i]) {
+ cancel_runner(context[i]);
+ context[i] = NULL;
+ count++;
+ }
+ condlog(3, "%s: %d runners cancelled", __func__, count);
+ /* give runners a chance to clean up */
+ sched_yield();
+}
+
+static bool test_sleep(const struct timespec *wait)
+{
+ sigset_t set;
+
+ sigfillset(&set);
+ sigdelset(&set, SIGTERM);
+ sigdelset(&set, SIGINT);
+ sigdelset(&set, SIGQUIT);
+
+ pselect(0, NULL, NULL, NULL, wait, &set);
+
+ if (!must_stop)
+ return false;
+
+ terminate_all();
+ return true;
+}
+
+static int run_test(int n)
+{
+ int i, running, done, errors;
+ const struct timespec wait = {.tv_sec = 0, .tv_nsec = 1000 * POLL_USEC};
+ struct timespec stop, now;
+ long max_wait = TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC + 10000;
+ bool killed = false;
+
+ for (i = 0; i < N_RUNNERS; i++)
+ context[i] = start_runner(TIMEOUT_USEC, i, NOISE_USEC, NOISE_BIAS,
+ SLEEP_STEPS, IGNORE_CANCEL);
+
+ clock_gettime(CLOCK_MONOTONIC, &stop);
+ stop.tv_sec += max_wait / MILLION;
+ stop.tv_nsec += (max_wait % MILLION) * 1000;
+ normalize_timespec(&stop);
+ running = N_RUNNERS;
+ done = 0;
+ errors = 0;
+ do {
+ bool err = false;
+
+ condlog(4, "%d runners active", running);
+ killed = test_sleep(&wait);
+ if (killed)
+ condlog(3, "%s: terminating on signal...", __func__);
+ for (running = 0, i = 0; i < N_RUNNERS; i++) {
+ int st;
+
+ if (!context[i])
+ continue;
+ st = check_payload(context[i], &err);
+ switch (st) {
+ case RUNNER_DONE:
+ if (err)
+ errors++;
+ done++;
+ /* fallthrough */
+ case RUNNER_CANCELLED:
+ context[i] = NULL;
+ break;
+ default:
+ running++;
+ break;
+ }
+ }
+ if (killed)
+ break;
+ clock_gettime(CLOCK_MONOTONIC, &now);
+ if (timespeccmp(&stop, &now) <= 0)
+ break;
+ } while (running);
+
+ condlog(2, "%10d%10d%10d%10d%10d", n, N_RUNNERS, N_RUNNERS - running,
+ done, errors);
+ if (killed) {
+ condlog(2, "%s: termination signal received", __func__);
+ exit(0);
+ }
+
+ if (running > 0) {
+ condlog(1, "ERROR: %d runners haven't finished", running);
+ terminate_all();
+ }
+ return running + errors;
+}
+
+static void free_ctxs(struct runner_context ***ctxs)
+{
+ if (*ctxs)
+ free(*ctxs);
+}
+
+static int setup_signal_handler(int sig, void (*handler)(int))
+{
+ sigset_t set;
+ sigfillset(&set);
+ struct sigaction sga = {.sa_handler = NULL};
+
+	sga.sa_handler = handler;
+ sga.sa_mask = set;
+ if (sigaction(sig, &sga, NULL) != 0) {
+ condlog(1, "%s: failed to install signal handler for %d: %s",
+ __func__, sig, strerror(errno));
+ return -1;
+ }
+ return 0;
+}
+
+int run_tests(void)
+{
+ int errors = 0, i;
+ struct runner_context **ctxs __attribute__((cleanup(free_ctxs))) = NULL;
+
+ if (setup_signal_handler(SIGINT, int_handler) != 0)
+ return -1;
+
+ ctxs = calloc(N_RUNNERS, sizeof(*context));
+ if (ctxs == NULL)
+ /* arbitrary number to indicate OOM error */
+ return 7000;
+ context = ctxs;
+ for (i = 0; i < REPEAT; i++) {
+ errors += run_test(i + 1);
+ }
+ return errors ? 1 : 0;
+}
+
+/* We need to register a dummy handler to avoid system call restarting in
+ * pselect() below */
+static void dummy_handler(int sig) {}
+
+static int fork_test(void)
+{
+ sigset_t set;
+ pid_t child;
+ int wstatus;
+ struct timespec wait_to_kill = {.tv_sec = 0};
+
+ /* Block all signals. termination signals will be enabled in test_sleep() */
+ sigfillset(&set);
+ pthread_sigmask(SIG_SETMASK, &set, NULL);
+
+ child = fork();
+
+ if (child < 0) {
+ condlog(0, "error in fork(), %s", strerror(errno));
+ return -1;
+ } else if (child == 0) {
+ /* child */
+ int rc = run_tests();
+ exit(rc ? 1 : 0);
+ }
+
+ setup_signal_handler(SIGCHLD, dummy_handler);
+
+ /* parent */
+ if (KILL_TIMEOUT > 0) {
+ sigset_t set;
+
+ condlog(3, "%s: == Child %d will be killed with SIGINT after %ld us",
+ __func__, child, KILL_TIMEOUT);
+ wait_to_kill.tv_sec = KILL_TIMEOUT / MILLION;
+ wait_to_kill.tv_nsec = (KILL_TIMEOUT % MILLION) * 1000;
+
+ /*
+	 * Unblock SIGCHLD in case the child terminates
+ * (child will receive SIGINT)
+ */
+ sigfillset(&set);
+ sigdelset(&set, SIGCHLD);
+ if (pselect(0, NULL, NULL, NULL, &wait_to_kill, &set) != 0) {
+ if (errno == EINTR)
+ condlog(2, "main: child terminated");
+ else
+ condlog(1, "main: error in pselect: %s",
+ strerror(errno));
+ } else
+ kill(child, SIGINT);
+ }
+
+ if (waitpid(child, &wstatus, 0) <= 0) {
+ condlog(1, "%s: failed to wait for child %d", __func__, child);
+ return -1;
+ }
+ if (WIFEXITED(wstatus)) {
+ condlog(3, "%s: child %d return code %d", __func__, child,
+ WEXITSTATUS(wstatus));
+ return WEXITSTATUS(wstatus);
+ } else if (WIFSIGNALED(wstatus)) {
+ condlog(2, "%s: child %d killed by signal code %d", __func__,
+ child, WTERMSIG(wstatus));
+ return -1;
+ } else {
+ condlog(1, "%s: unexpected status of child %d", __func__, child);
+ return -1;
+ }
+}
+
+static long parse_number(const char *arg, long factor, long deflt)
+{
+ char *ep;
+ long v;
+
+ if (*arg) {
+ v = strtol(arg, &ep, 10);
+ if (!*ep && v >= 0)
+ return factor * v;
+ }
+ condlog(1, "invalid argument: %s, using %ld", arg, deflt);
+ return deflt;
+}
+
+static int usage(const char *cmd, int opt)
+{
+#define USAGE_FMT \
+ "Usage: %s [options]\n" \
+ " -N runners: number of parallel runners\n" \
+ " -p msecs: time to sleep between status polls in main thread\n" \
+ " -t msecs: timeout for runners\n" \
+ " -n msecs: random noise for runner sleep time\n" \
+ " -b bias: noise increase factor towards sleeping longer\n" \
+ " -s n: number of steps to divide sleep time into\n" \
+ " -k msecs: timeout after which to kill all runners (0: don't kill)\n" \
+ " -r n: number of times to repeat test\n" \
+ " -i: runners ignore cancellation while sleeping\n" \
+ " -v n: set verbosity level (default 2)\n" \
+ " -h: print this help"
+ condlog(0, USAGE_FMT, cmd);
+ return opt == 'h' ? 0 : 1;
+}
+
+int main(int argc, char *argv[])
+{
+ int opt;
+ int total = 0;
+ const char *optstring = "+:N:p:t:n:b:s:k:r:v:ih";
+
+ init_test_verbosity(2);
+
+ while ((opt = getopt(argc, argv, optstring)) != -1) {
+ switch (opt) {
+ case 'N':
+ N_RUNNERS = parse_number(optarg, 1L, N_RUNNERS);
+ break;
+ case 'p':
+ POLL_USEC = parse_number(optarg, 1000L, POLL_USEC);
+ break;
+ case 't':
+ TIMEOUT_USEC = parse_number(optarg, 1000L, TIMEOUT_USEC);
+ break;
+ case 'n':
+ NOISE_USEC = parse_number(optarg, 1000L, NOISE_USEC);
+ break;
+ case 'b':
+ NOISE_BIAS = parse_number(optarg, 1L, NOISE_BIAS);
+ break;
+ case 's':
+ SLEEP_STEPS = parse_number(optarg, 1L, SLEEP_STEPS);
+ break;
+ case 'k':
+ KILL_TIMEOUT = parse_number(optarg, 1000L, KILL_TIMEOUT);
+ break;
+ case 'r':
+ REPEAT = parse_number(optarg, 1L, REPEAT);
+ break;
+ case 'i':
+ IGNORE_CANCEL = true;
+ break;
+ case 'v':
+ libmp_verbosity = parse_number(optarg, 1L, libmp_verbosity);
+ break;
+ case 'h':
+ case ':':
+ case '?':
+ return usage(argv[0], opt);
+ break;
+ }
+ }
+
+ if (optind != argc)
+ return usage(argv[0], '?');
+
+ condlog(2, "Runner: timeout=%ld, noise interval=[%ld:%ld], steps=%d",
+ TIMEOUT_USEC, TIMEOUT_USEC - NOISE_USEC,
+ TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC, SLEEP_STEPS);
+ condlog(2, "Other : poll interval=%ld, ignore cancellation=%s, runners=%d, repeat=%d, kill timeout=%ld",
+ POLL_USEC, IGNORE_CANCEL ? "YES" : "NO", N_RUNNERS, REPEAT,
+ KILL_TIMEOUT);
+ condlog(2, "%10s%10s%10s%10s%10s", "run", "total", "finished",
+ "completed", "errors");
+
+ total = fork_test();
+ if (total == -1)
+ return 130;
+ condlog(2, "== TOTAL NUMBER OF FAILED RUNS: %d", total);
+ return total ? 1 : 0;
+}
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/4] libmultipath: TUR checker: use runner threads
2026-03-19 22:13 [PATCH 0/4] multipath-tools: generic async threads for TUR checker Martin Wilck
` (2 preceding siblings ...)
2026-03-19 22:13 ` [PATCH 3/4] multipath-tools tests: add test program for " Martin Wilck
@ 2026-03-19 22:13 ` Martin Wilck
2026-03-24 6:38 ` Benjamin Marzinski
3 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-03-19 22:13 UTC (permalink / raw)
To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
Cc: Martin Wilck
Use the generic runners to simplify the TUR checker code.
The logic is the same as before, with one exception: the runner API does
not allow tracking a thread after it has been cancelled, the way our
current code does. The current code used this to determine when a previously
cancelled thread eventually did complete. The use case for doing this in
the current code was the MAX_NR_TIMEOUTS logic: the code waited "forever"
for the "last" checker to complete. This is not possible with runners;
once a runner is cancelled, it is off limits. (Waiting for a zombie thread to
write to memory belonging to the main program, as the current code does,
doesn't seem to be the best idea, anyway).
Therefore this patch implements a different logic for MAX_NR_TIMEOUTS. When
nr_timeouts reaches the limit, the last checker is spawned with an infinite
timeout. Like before, no new checker thread will be spawned until this last
checker eventually completes. One difference to the previous code is that
this last checker thread is never cancelled (after all, it never times
out). But this code is meant for the case where cancellation is not
effective, anyway. Another difference is that the new code will report
PATH_TIMEOUT state as soon as this last thread is started rather than
PATH_PENDING. That seems reasonable given that the checker previously timed
out for this thread.
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
libmultipath/checkers/tur.c | 341 +++++++++++-------------------------
1 file changed, 103 insertions(+), 238 deletions(-)
diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index ba4ca68..6f22c45 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -3,7 +3,6 @@
*
* Copyright (c) 2004 Christophe Varoqui
*/
-#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
@@ -17,13 +16,13 @@
#include <pthread.h>
#include <urcu.h>
#include <urcu/uatomic.h>
+#include <assert.h>
#include "checkers.h"
#include "debug.h"
#include "sg_include.h"
-#include "util.h"
-#include "time-util.h"
+#include "runner.h"
#define TUR_CMD_LEN 6
#define HEAVY_CHECK_COUNT 10
@@ -45,21 +44,20 @@ const char *libcheck_msgtable[] = {
NULL,
};
-struct tur_checker_context {
- dev_t devt;
- int state;
- int running; /* uatomic access only */
+struct tur_context {
int fd;
+ dev_t devt;
unsigned int timeout;
- time_t time;
- pthread_t thread;
- pthread_mutex_t lock;
- pthread_cond_t active;
- int holders; /* uatomic access only */
- int msgid;
+ int state;
+ short msgid;
+};
+
+struct tur_checker_context {
struct checker_context ctx;
+ int last_runner_state;
unsigned int nr_timeouts;
- bool checked_state;
+ struct runner_context *rtx;
+ struct tur_context tcx;
};
int libcheck_init (struct checker * c)
@@ -67,48 +65,26 @@ int libcheck_init (struct checker * c)
struct tur_checker_context *ct;
struct stat sb;
- ct = malloc(sizeof(struct tur_checker_context));
- if (!ct)
- return 1;
- memset(ct, 0, sizeof(struct tur_checker_context));
-
- ct->state = PATH_UNCHECKED;
- ct->fd = -1;
- uatomic_set(&ct->holders, 1);
- pthread_cond_init_mono(&ct->active);
- pthread_mutex_init(&ct->lock, NULL);
+ ct = calloc(1, sizeof(*ct));
+ ct->tcx.state = PATH_UNCHECKED;
+ ct->tcx.fd = -1;
if (fstat(c->fd, &sb) == 0)
- ct->devt = sb.st_rdev;
+ ct->tcx.devt = sb.st_rdev;
ct->ctx.cls = c->cls;
c->context = ct;
-
return 0;
}
-static void cleanup_context(struct tur_checker_context *ct)
-{
- pthread_mutex_destroy(&ct->lock);
- pthread_cond_destroy(&ct->active);
- free(ct);
-}
-
void libcheck_free (struct checker * c)
{
- if (c->context) {
- struct tur_checker_context *ct = c->context;
- int holders;
- int running;
+ struct tur_checker_context *tcc = c->context;
- running = uatomic_xchg(&ct->running, 0);
- if (running)
- pthread_cancel(ct->thread);
- ct->thread = 0;
- holders = uatomic_sub_return(&ct->holders, 1);
- if (!holders)
- cleanup_context(ct);
- c->context = NULL;
- }
- return;
+ if (!tcc)
+ return;
+ c->context = NULL;
+ if (tcc->rtx)
+ cancel_runner(tcc->rtx);
+ free(tcc);
}
static int
@@ -216,19 +192,6 @@ retry:
return PATH_UP;
}
-#define tur_thread_cleanup_push(ct) pthread_cleanup_push(cleanup_func, ct)
-#define tur_thread_cleanup_pop(ct) pthread_cleanup_pop(1)
-
-static void cleanup_func(void *data)
-{
- int holders;
- struct tur_checker_context *ct = data;
-
- holders = uatomic_sub_return(&ct->holders, 1);
- if (!holders)
- cleanup_context(ct);
-}
-
/*
* Test code for "zombie tur thread" handling.
* Compile e.g. with CFLAGS=-DTUR_TEST_MAJOR=8
@@ -273,110 +236,82 @@ static void tur_deep_sleep(const struct tur_checker_context *ct)
#define tur_deep_sleep(x) do {} while (0)
#endif /* TUR_TEST_MAJOR */
-void *libcheck_thread(struct checker_context *ctx)
+void runner_callback(void *arg)
{
- struct tur_checker_context *ct =
- container_of(ctx, struct tur_checker_context, ctx);
- int state, running;
- short msgid;
+ struct tur_context *tcx = arg;
+ int state;
- /* This thread can be canceled, so setup clean up */
- tur_thread_cleanup_push(ct);
+ condlog(4, "%d:%d : tur checker starting up", major(tcx->devt),
+ minor(tcx->devt));
- condlog(4, "%d:%d : tur checker starting up", major(ct->devt),
- minor(ct->devt));
-
- tur_deep_sleep(ct);
- state = tur_check(ct->fd, ct->timeout, &msgid);
+ tur_deep_sleep(tcx);
+ state = tur_check(tcx->fd, tcx->timeout, &tcx->msgid);
+ tcx->state = state;
pthread_testcancel();
-
- /* TUR checker done */
- pthread_mutex_lock(&ct->lock);
- ct->state = state;
- ct->msgid = msgid;
- pthread_cond_signal(&ct->active);
- pthread_mutex_unlock(&ct->lock);
-
- condlog(4, "%d:%d : tur checker finished, state %s", major(ct->devt),
- minor(ct->devt), checker_state_name(state));
-
- running = uatomic_xchg(&ct->running, 0);
- if (!running)
- pause();
-
- tur_thread_cleanup_pop(ct);
-
- return ((void *)0);
+ condlog(4, "%d:%d : tur checker finished, state %s", major(tcx->devt),
+ minor(tcx->devt), checker_state_name(state));
}
-static void tur_set_async_timeout(struct checker *c)
+static int check_runner_state(struct tur_checker_context *tcc)
{
- struct tur_checker_context *ct = c->context;
- struct timespec now;
+ struct runner_context *rtx = tcc->rtx;
+ int rc;
- get_monotonic_time(&now);
- ct->time = now.tv_sec + c->timeout;
-}
-
-static int tur_check_async_timeout(struct checker *c)
-{
- struct tur_checker_context *ct = c->context;
- struct timespec now;
-
- get_monotonic_time(&now);
- return (now.tv_sec > ct->time);
-}
-
-int check_pending(struct checker *c)
-{
- struct tur_checker_context *ct = c->context;
- int tur_status = PATH_PENDING;
-
- pthread_mutex_lock(&ct->lock);
-
- if (ct->state != PATH_PENDING || ct->msgid != MSG_TUR_RUNNING)
- {
- tur_status = ct->state;
- c->msgid = ct->msgid;
- }
- pthread_mutex_unlock(&ct->lock);
- if (tur_status == PATH_PENDING && c->msgid == MSG_TUR_RUNNING) {
+ rc = check_runner(rtx, &tcc->tcx, sizeof(tcc->tcx));
+ switch (rc) {
+ case RUNNER_DONE:
+ tcc->last_runner_state = rc;
+ tcc->rtx = NULL;
+ tcc->nr_timeouts = 0;
+ condlog(3, "%d:%d : tur checker finished, state %s",
+ major(tcc->tcx.devt), minor(tcc->tcx.devt),
+ checker_state_name(tcc->tcx.state));
+ break;
+ case RUNNER_CANCELLED:
+ tcc->last_runner_state = rc;
+ tcc->rtx = NULL;
+ tcc->tcx.state = PATH_TIMEOUT;
+ tcc->tcx.msgid = MSG_TUR_TIMEOUT;
+ if (tcc->nr_timeouts < MAX_NR_TIMEOUTS)
+ tcc->nr_timeouts++;
+ condlog(3, "%d:%d : tur checker timed out",
+ major(tcc->tcx.devt), minor(tcc->tcx.devt));
+ break;
+ case RUNNER_RUNNING:
condlog(4, "%d:%d : tur checker still running",
- major(ct->devt), minor(ct->devt));
- } else {
- int running = uatomic_xchg(&ct->running, 0);
- if (running)
- pthread_cancel(ct->thread);
- ct->thread = 0;
+ major(tcc->tcx.devt), minor(tcc->tcx.devt));
+ tcc->tcx.msgid = MSG_TUR_RUNNING;
+ break;
+ default:
+ assert(false);
+ break;
}
-
- ct->checked_state = true;
- return tur_status;
+ return rc;
}
bool libcheck_need_wait(struct checker *c)
{
struct tur_checker_context *ct = c->context;
- return (ct && ct->thread && uatomic_read(&ct->running) != 0 &&
- !ct->checked_state);
+
+ return ct && ct->rtx;
}
int libcheck_pending(struct checker *c)
{
struct tur_checker_context *ct = c->context;
-
/* The if path checker isn't running, just return the exiting value. */
- if (!ct || !ct->thread)
+ if (!ct || !ct->rtx)
return c->path_state;
- return check_pending(c);
+ /* This may nullify ct->rtx */
+ check_runner_state(ct);
+ c->msgid = ct->tcx.msgid;
+ return ct->tcx.state;
}
int libcheck_check(struct checker * c)
{
struct tur_checker_context *ct = c->context;
- pthread_attr_t attr;
- int tur_status, r;
if (!ct)
return PATH_UNCHECKED;
@@ -384,109 +319,39 @@ int libcheck_check(struct checker * c)
if (checker_is_sync(c))
return tur_check(c->fd, c->timeout, &c->msgid);
- /*
- * Async mode
- */
- if (ct->thread) {
- ct->checked_state = true;
- if (tur_check_async_timeout(c)) {
- int running = uatomic_xchg(&ct->running, 0);
- if (running) {
- pthread_cancel(ct->thread);
- condlog(3, "%d:%d : tur checker timeout",
- major(ct->devt), minor(ct->devt));
- c->msgid = MSG_TUR_TIMEOUT;
- tur_status = PATH_TIMEOUT;
- } else {
- pthread_mutex_lock(&ct->lock);
- tur_status = ct->state;
- c->msgid = ct->msgid;
- pthread_mutex_unlock(&ct->lock);
- }
- ct->thread = 0;
- } else if (uatomic_read(&ct->running) != 0) {
+ /* Handle the case that the checker just completed */
+ if (ct->rtx) {
+ if (check_runner_state(ct) == RUNNER_RUNNING)
condlog(3, "%d:%d : tur checker not finished",
- major(ct->devt), minor(ct->devt));
- tur_status = PATH_PENDING;
- c->msgid = MSG_TUR_RUNNING;
- } else {
- /* TUR checker done */
- ct->thread = 0;
- pthread_mutex_lock(&ct->lock);
- tur_status = ct->state;
- c->msgid = ct->msgid;
- pthread_mutex_unlock(&ct->lock);
- }
- } else {
- if (uatomic_read(&ct->holders) > 1) {
- /* The thread has been cancelled but hasn't quit. */
- if (ct->nr_timeouts == MAX_NR_TIMEOUTS) {
- condlog(2, "%d:%d : waiting for stalled tur thread to finish",
- major(ct->devt), minor(ct->devt));
- ct->nr_timeouts++;
- }
- /*
- * Don't start new threads until the last once has
- * finished.
- */
- if (ct->nr_timeouts > MAX_NR_TIMEOUTS) {
- c->msgid = MSG_TUR_TIMEOUT;
- return PATH_TIMEOUT;
- }
- ct->nr_timeouts++;
- /*
- * Start a new thread while the old one is stalled.
- * We have to prevent it from interfering with the new
- * thread. We create a new context and leave the old
- * one with the stale thread, hoping it will clean up
- * eventually.
- */
- condlog(3, "%d:%d : tur thread not responding",
- major(ct->devt), minor(ct->devt));
-
- /*
- * libcheck_init will replace c->context.
- * It fails only in OOM situations. In this case, return
- * PATH_UNCHECKED to avoid prematurely failing the path.
- */
- if (libcheck_init(c) != 0) {
- c->msgid = MSG_TUR_FAILED;
- return PATH_UNCHECKED;
- }
- ((struct tur_checker_context *)c->context)->nr_timeouts = ct->nr_timeouts;
-
- if (!uatomic_sub_return(&ct->holders, 1)) {
- /* It did terminate, eventually */
- cleanup_context(ct);
- ((struct tur_checker_context *)c->context)->nr_timeouts = 0;
- }
-
- ct = c->context;
- } else
- ct->nr_timeouts = 0;
- /* Start new TUR checker */
- pthread_mutex_lock(&ct->lock);
- tur_status = ct->state = PATH_PENDING;
- c->msgid = ct->msgid = MSG_TUR_RUNNING;
- pthread_mutex_unlock(&ct->lock);
- ct->fd = c->fd;
- ct->timeout = c->timeout;
- ct->checked_state = false;
- uatomic_add(&ct->holders, 1);
- uatomic_set(&ct->running, 1);
- tur_set_async_timeout(c);
- setup_thread_attr(&attr, 32 * 1024, 1);
- r = start_checker_thread(&ct->thread, &attr, &ct->ctx);
- pthread_attr_destroy(&attr);
- if (r) {
- uatomic_sub(&ct->holders, 1);
- uatomic_set(&ct->running, 0);
- ct->thread = 0;
- condlog(3, "%d:%d : failed to start tur thread, using"
- " sync mode", major(ct->devt), minor(ct->devt));
- return tur_check(c->fd, c->timeout, &c->msgid);
- }
+ major(ct->tcx.devt), minor(ct->tcx.devt));
+ c->msgid = ct->tcx.msgid;
+ return ct->tcx.state;
}
- return tur_status;
+ /* create new checker thread */
+ ct->tcx.fd = c->fd;
+ ct->tcx.timeout = c->timeout;
+
+ if (ct->nr_timeouts < MAX_NR_TIMEOUTS) {
+ condlog(3, "%d:%d : starting checker with timeout",
+ major(ct->tcx.devt), minor(ct->tcx.devt));
+ ct->tcx.state = PATH_PENDING;
+ ct->tcx.msgid = MSG_TUR_RUNNING;
+ ct->rtx = get_runner(runner_callback, &ct->tcx,
+ sizeof(ct->tcx), 1000000 * c->timeout);
+ } else {
+ condlog(3, "%d:%d : starting checker without timeout",
+ major(ct->tcx.devt), minor(ct->tcx.devt));
+ ct->tcx.state = PATH_TIMEOUT;
+ ct->rtx = get_runner(runner_callback, &ct->tcx, sizeof(ct->tcx), 0);
+ }
+
+ if (ct->rtx) {
+ c->msgid = ct->tcx.msgid;
+ return ct->tcx.state;
+ } else {
+ condlog(3, "%d:%d : failed to start tur thread, using sync mode",
+ major(ct->tcx.devt), minor(ct->tcx.devt));
+ return tur_check(c->fd, c->timeout, &c->msgid);
+ }
}
--
2.53.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN
2026-03-19 22:13 ` [PATCH 1/4] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
@ 2026-03-24 5:35 ` Benjamin Marzinski
0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2026-03-24 5:35 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck
On Thu, Mar 19, 2026 at 11:13:41PM +0100, Martin Wilck wrote:
> We map PATH_TIMEOUT to PATH_DOWN in pathinfo(), but not in get_new_state().
> Do it there, too, to treat the states consistently.
> This avoids logging "checker timed out" twice in update_path_state(), even
> if log_checker_err is set to "once".
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/4] libmpathutil: add generic implementation for checker thread runners
2026-03-19 22:13 ` [PATCH 2/4] libmpathutil: add generic implementation for checker thread runners Martin Wilck
@ 2026-03-24 5:44 ` Benjamin Marzinski
0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2026-03-24 5:44 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck
On Thu, Mar 19, 2026 at 11:13:42PM +0100, Martin Wilck wrote:
> This code adds a generic, abstract implementation of the kind of threads we
> need for checkers: detached threads that may time out. The design is such
> that these threads never access any memory of the calling program, strongly
> simplifying the management of lifetimes of objects.
>
> See the documentation of the API in "runner.h".
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> libmpathutil/Makefile | 2 +-
> libmpathutil/libmpathutil.version | 6 +-
> libmpathutil/runner.c | 205 ++++++++++++++++++++++++++++++
> libmpathutil/runner.h | 84 ++++++++++++
> 4 files changed, 295 insertions(+), 2 deletions(-)
> create mode 100644 libmpathutil/runner.c
> create mode 100644 libmpathutil/runner.h
>
> diff --git a/libmpathutil/Makefile b/libmpathutil/Makefile
> index e1cc2c6..bdd7063 100644
> --- a/libmpathutil/Makefile
> +++ b/libmpathutil/Makefile
> @@ -13,7 +13,7 @@ LIBDEPS += -lpthread -ldl -ludev -L$(mpathcmddir) -lmpathcmd $(SYSTEMD_LIBDEPS)
>
> # other object files
> OBJS := mt-libudev.o parser.o vector.o util.o debug.o time-util.o \
> - uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o
> + uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o runner.o
>
> all: $(DEVLIB)
>
> diff --git a/libmpathutil/libmpathutil.version b/libmpathutil/libmpathutil.version
> index 22ef994..5a7df82 100644
> --- a/libmpathutil/libmpathutil.version
> +++ b/libmpathutil/libmpathutil.version
> @@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
> put_multipath_config;
> };
>
> -LIBMPATHUTIL_5.0 {
> +LIBMPATHUTIL_6.0 {
> global:
> alloc_bitfield;
> alloc_strvec;
> @@ -62,6 +62,8 @@ global:
> cleanup_vector;
> cleanup_vector_free;
> convert_dev;
> + cancel_runner;
> + check_runner;
> dlog;
> filepresent;
> fill_strbuf;
> @@ -72,6 +74,7 @@ global:
> free_strvec;
> get_linux_version_code;
> get_monotonic_time;
> + get_runner;
> get_strbuf_buf__;
> get_next_string;
> get_strbuf_len;
> @@ -162,6 +165,7 @@ global:
> pthread_cond_init_mono;
> recv_packet;
> reset_strbuf;
> + runner_state_name;
> safe_write;
> send_packet;
> set_max_fds;
> diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
> new file mode 100644
> index 0000000..8a49d92
> --- /dev/null
> +++ b/libmpathutil/runner.c
> @@ -0,0 +1,205 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +// Copyright (c) 2026 SUSE LLC
> +#include <assert.h>
> +#include <time.h>
> +#include <pthread.h>
> +#include <urcu/uatomic.h>
> +#include "util.h"
> +#include "debug.h"
> +#include "time-util.h"
> +#include "runner.h"
> +
> +#define STACK_SIZE (4 * 1024)
> +#define MILLION 1000000
> +
> +const char *runner_state_name(int state)
> +{
> + // clang-format off
> + static const char * const state_name_[] = {
> + [RUNNER_IDLE] = "idle",
> + [RUNNER_RUNNING] = "running",
> + [RUNNER_DONE] = "done",
> + [RUNNER_CANCELLED] = "cancelled"
> + };
> + // clang-format on
> +
> + if (state < RUNNER_IDLE || state > RUNNER_CANCELLED)
> + return "unknown";
> + return state_name_[state];
> +}
> +
> +struct runner_context {
> + int refcount;
> + int status;
> + struct timespec deadline;
> + pthread_t thr;
> + void (*func)(void *data);
> + /* User data will be copied into this area */
The data array is currently pointer aligned. We might want to put a
comment mentioning that it must stay that way, or force an alignment on
it.
> + char data[];
> +};
> +
> +static void release_context(struct runner_context *ctx)
> +{
> + int n;
> +
> + n = uatomic_sub_return(&ctx->refcount, 1);
> + assert(n >= 0);
> +
> + if (n == 0)
> + free(ctx);
> +}
> +
> +static void cleanup_context(struct runner_context **ctx)
> +{
> + if (*ctx)
> + release_context(*ctx);
> +}
> +
> +static void *runner_thread(void *arg)
> +{
> + int st, refcount;
> + /*
> + * The cleanup function makes sure memory is freed if the thread is
> + * cancelled (-fexceptions).
> + */
> + struct runner_context *ctx __attribute__((cleanup(cleanup_context))) = arg;
> +
> + refcount = uatomic_add_return(&ctx->refcount, 1);
> + assert(refcount == 2);
> +
> + st = uatomic_cmpxchg(&ctx->status, RUNNER_IDLE, RUNNER_RUNNING);
> + /*
> + * The thread has already been cancelled before it was even started.
> + * In this case, cancel_runner() doesn't release the context.
> + * Do it here. See comments for RUNNER_IDLE in cancel_runner().
> + */
> + if (st != RUNNER_IDLE) {
> + release_context(ctx);
> + return NULL;
> + }
> +
> + (*ctx->func)(ctx->data);
> + uatomic_cmpxchg(&ctx->status, RUNNER_RUNNING, RUNNER_DONE);
> + return NULL;
> +}
> +
> +void cancel_runner(struct runner_context *ctx)
> +{
> + int st = uatomic_cmpxchg(&ctx->status, RUNNER_RUNNING, RUNNER_CANCELLED);
> + int level = 4, retry = 1;
> +
> +repeat:
> + switch (st) {
> + case RUNNER_IDLE:
> + /*
> + * Race with thread startup.
> + *
> + * If after the following cmpxchg st is still IDLE, the cmpxchg
> + * in runner_thread() will return CANCELLED, and the context
> + * will be released there. Otherwise, the thread has switched
> + * to RUNNING in the meantime, and we will be able to cancel
> + * it regularly if we retry.
> + */
> + if (retry--) {
> + st = uatomic_cmpxchg(&ctx->status, RUNNER_IDLE,
> + RUNNER_CANCELLED);
> + goto repeat;
> + }
> + level = 3;
> + break;
> + case RUNNER_RUNNING:
> + pthread_cancel(ctx->thr);
> + /* Caller has no interest in the data any more */
> + release_context(ctx);
> + return;
> + case RUNNER_DONE:
> + /* Caller has no interest in the data any more */
> + release_context(ctx);
> + /* fallthrough */
> + default:
> + level = 3;
> + break;
> + }
> + condlog(level, "%s: runner cancelled in state '%s', ctx=%p", __func__,
> + runner_state_name(st), ctx);
> +}
> +
> +struct runner_context *get_runner(runner_func func, void *data,
> + unsigned int size, unsigned long timeout_usec)
> +{
> + static const struct timespec time_zero = {.tv_sec = 0};
> + struct runner_context *ctx;
> + pthread_attr_t attr;
> + int rc;
> +
> + assert(func);
> + assert(data);
> + assert(size > 0);
Why asserts? We already need to deal with this function failing. I'd
prefer error messages and failure, instead of crashing later when a null
value gets dereferenced, for instance. I admit, it's unlikely that these
arguments will be set wrong, but if there is a bug that makes it into
production code, these asserts will be disabled, and a log message
telling me what went wrong would be helpful.
> +
> + ctx = malloc(sizeof(*ctx) + size);
> + if (!ctx)
> + return NULL;
> +
> + ctx->func = func;
> + uatomic_set(&ctx->refcount, 1);
> + uatomic_set(&ctx->status, RUNNER_IDLE);
> + memcpy(ctx->data, data, size);
> +
> + if (timeout_usec) {
> + get_monotonic_time(&ctx->deadline);
> + ctx->deadline.tv_sec += timeout_usec / MILLION;
> + ctx->deadline.tv_nsec += (timeout_usec % MILLION) * 1000;
> + } else
> + ctx->deadline = time_zero;
> +
Don't think we need the cmm_smp_wmb(). From
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap04.html#tag_04_15_02
"The following functions synchronize memory with respect to other threads:"
lists pthread_create().
-Ben
> + /*
> + * This pairs with the implicit barrier in uatomic_add_return()
> + * in runner_thread()
> + */
> + cmm_smp_wmb();
> +
> + setup_thread_attr(&attr, STACK_SIZE, 1);
> + rc = pthread_create(&ctx->thr, &attr, runner_thread, ctx);
> + pthread_attr_destroy(&attr);
> +
> + if (rc) {
> + condlog(1, "%s: pthread_create(): %s", __func__, strerror(rc));
> + release_context(ctx);
> + return NULL;
> + }
> + return ctx;
> +}
> +
> +int check_runner(struct runner_context *ctx, void *data, unsigned int size)
> +{
> + int st = uatomic_read(&ctx->status);
> +
> + switch (st) {
> + case RUNNER_DONE:
> + if (data)
> + /* hand back the data to the caller */
> + memcpy(data, ctx->data, size);
> + /* caller is done with this context */
> + release_context(ctx);
> + /* fallthrough */
> + case RUNNER_CANCELLED:
> + return st;
> + case RUNNER_IDLE:
> + case RUNNER_RUNNING:
> + if (ctx->deadline.tv_sec != 0 || ctx->deadline.tv_nsec != 0) {
> + struct timespec now;
> +
> + get_monotonic_time(&now);
> + if (timespeccmp(&ctx->deadline, &now) <= 0) {
> + cancel_runner(ctx);
> + return RUNNER_CANCELLED;
> + }
> + }
> + return RUNNER_RUNNING;
> + default:
> + condlog(1, "%s: runner in impossible state '%s'", __func__,
> + runner_state_name(st));
> + assert(false);
> + return st;
> + }
> +}
> diff --git a/libmpathutil/runner.h b/libmpathutil/runner.h
> new file mode 100644
> index 0000000..0c9cf80
> --- /dev/null
> +++ b/libmpathutil/runner.h
> @@ -0,0 +1,84 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +// Copyright (c) 2026 SUSE LLC
> +#ifndef RUNNER_H_INCLUDED
> +#define RUNNER_H_INCLUDED
> +
> +enum runner_status {
> + RUNNER_IDLE,
> + RUNNER_RUNNING,
> + RUNNER_DONE,
> + RUNNER_CANCELLED,
> +};
> +
> +typedef void (*runner_func)(void *data);
> +struct runner_context;
> +
> +/**
> + * runner_state_name(): helper for printing runner states
> + *
> + * @param state: a valid @runner_status value
> + * @returns: a string describing the status
> + */
> +const char *runner_state_name(int state);
> +
> +/**
> + * get_runner(): start a runner thread
> + *
> + * This function starts a runner thread that calls @func(@data).
> + * The thread is created as detached thread.
> + * @data will be copied to thread-private memory, which will be freed when
> + * the runner terminates.
> + * Output values can be retrieved with @check_runner().
> + *
> + * @param func: the worker function to invoke
> + * This parameter must not be NULL.
> + * @param data: pointer the data to pass to the function (input and output)
> + * This parameter must not be NULL.
> + * @param size: the size (in bytes) of the data passed to the function
> + * This parameter must be positive.
> + * @param timeout_usec: timeout (in microseconds) after which to cancel the runner.
> + * If it is 0, the runner will not time out.
> + * @returns: a runner context that must be passed to the functions below.
> + */
> +struct runner_context *get_runner(runner_func func, void *data,
> + unsigned int size, unsigned long timeout_usec);
> +
> +/**
> + * cancel_runner(): cancel a runner thread.
> + *
> + * This function should be called to terminate runners on program exit,
> + * or to terminate runners that have no timeout set in @get_runner,
> + * and haven't completed.
> + * Upon return of this function, @ctx becomes stale and shouldn't be accessed
> + * any more.
> + *
> + * @param ctx: the context of the runner to be cancelled.
> + */
> +void cancel_runner(struct runner_context *ctx);
> +
> +/**
> + * check_runner(): query the state of a runner thread and obtain results
> + *
> + * Check the state of a runner previously started with @get_runner. If the
> + * thread function has completed, @RUNNER_DONE will be returned, and the
> + * user data will be copied into the @data argument. If @check_runner returns
> + * anything else, @data contains no valid data. The @size argument
> + * will typically be the same as the @size passed to @get_runner (meaning
> + * that @data represents an object of the same type as the @data argument
> + * passed to @get_runner previously).
> + * If @check_runner returns @RUNNER_CANCELLED or @RUNNER_DONE, the @ctx object
> + * has become stale and must not be used any more.
> + * @RUNNER_CANCELLED is returned if the timeout set in @get_runner has passed
> + * and the worker function hasn't returned yet.
> + *
> + * @param ctx: the context of the runner to be queried
> + * @param data: memory pointer that will receive results of the worker
> + * function. Can be NULL, in which case no data will be copied.
> + * @param size: size of the memory pointed to by data. It must be no bigger
> + * than the size of the memory passed to @get_runner for this runner.
> + * It can be smaller, but no more than @size bytes will be copied.
> + * @returns: @RUNNER_RUNNING, @RUNNER_DONE, or @RUNNER_CANCELLED.
> + */
> +int check_runner(struct runner_context *ctx, void *data, unsigned int size);
> +
> +#endif /* RUNNER_H_INCLUDED */
> --
> 2.53.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] multipath-tools tests: add test program for thread runners
2026-03-19 22:13 ` [PATCH 3/4] multipath-tools tests: add test program for " Martin Wilck
@ 2026-03-24 5:47 ` Benjamin Marzinski
0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2026-03-24 5:47 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck
On Thu, Mar 19, 2026 at 11:13:43PM +0100, Martin Wilck wrote:
> Add a test program for the "runner" thread implementation from
> the previous commit. The test program runs simulated "hanging" threads that
> may time out, and optionally kills all threads at an arbitrary point in
> time. See the comments at the top of the file for details.
>
> Also add a test driver script (runner-test.sh) with a few reasonable
> combinations of command line arguments.
>
> The test program has been used to test the "runner" implementation
> extensively on different architectures (x86_64, aarch64, ppc64le), using
> both valgrind and the gcc address sanitizer (libasan) for detection of
> memory leaks and use-after-free errors.
>
> For valgrind, a suppression file needs to be added, as valgrind doesn't
> seem to capture the deallocation of thread local storage for detached
> threads in the test case where the test program is killed. The suppression
> affects only memory allocated by glibc. This leak has not been seen with
> libasan, only with valgrind.
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> tests/Makefile | 15 +-
> tests/runner-test.sh | 37 +++
> tests/runner-test.supp | 15 ++
> tests/runner.c | 530 +++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 593 insertions(+), 4 deletions(-)
> create mode 100755 tests/runner-test.sh
> create mode 100644 tests/runner-test.supp
> create mode 100644 tests/runner.c
>
> [snip]
>
> diff --git a/tests/runner.c b/tests/runner.c
> new file mode 100644
> index 0000000..21a44ca
> --- /dev/null
> +++ b/tests/runner.c
> @@ -0,0 +1,530 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/*
> + * Test reliability of the runner implementation.
> + *
> + * This test simulates "path checkers" being started through the
> + * runner code. It creates threads that run for a variable amount of time,
> + * optionally ignoring cancellation signals. The runners have a fixed
> + * timeout (TIMEOUT_USEC, "-t" option), after which they are considered
> + * "hanging" and will be cancelled. The actual runtime of the runner is random.
> + * It varies between (TIMEOUT_USEC - NOISE_USEC) and
> + * (TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC). These noise parameters are
> + * set with the "-n" and "-b" options, respectively.
> + * This allows simulating frequent races between cancellation and regular
> + * completion of threads. Note that because timers in different threads
> + * aren't started simultaneously, NOISE_USEC = 0 doesn't mean that the
> + * runners will complete exactly at the point in time when their timer
> + * expires.
> + * The runners simulate waiting checkers by simply sleeping. Optionally,
> + * they can ignore cancellation while sleeping (IGNORE_CANCEL, -i) or
> + * divide the sleep time in multiple "steps" (-s), between which they can
> + * be cancelled.
> + * Like multipathd, the main thread "polls" the status of the runners in
> + * regular time intervals that are set with POLL_USEC (-p). Because of
> + * this polling behavior, it can happen that a runner finishes after
> + * its timeout has expired. The runner code (and this test) treats this case
> + * as successful completion.
> + * If a runner completes, the result is checked against the expected value.
> + * The number of threads that have not finished (either successfully or
> + * by being cancelled), plus the number of wrong results of completed
> + * runners, is the error count.
> + * The N_RUNNERS (-N) option determines how many simultaneous threads are
> + * started.
> + * The test runs until all runners have either completed or expired, or
> + * until a maximum wait time is reached, which is calculated from the
> + * test parameters (max_wait in run_test()). The REPEAT (-r) parameter
> + * determines the number of times the entire test is repeated.
> + * The KILL_TIMEOUT (-k) parameter is for simulating a shutdown of the
> + * main program (think multipathd). When this timeout expires, all pending
> + * runners are cancelled and the program terminates.
> + *
> + * A "realistic" simulation of multipathd path checkers would use options
> + * roughly like this:
> + *
> + * runner-test -N 1000 -p 1000 -t 30000 -n 29990 -b 5 -s 1 -i -r 20 -k 300000
> + *
> + * (note that time options are in ms, whereas the code uses us), but this
> + * takes a very long time to run.
> + *
> + * Scaled down, it becomes:
> + *
> + * runner-test -N 1000 -p 100 -t 3000 -n 2999 -b 5 -s 1 -i -r 20 -k 30000
> + *
> + * A less realistic run with high likelihood of completion / cancellation races:
> + *
> + * runner-test -N 1000 -p 10 -t 3000 -n 1 -b 1 -s 1 -i -r 20
> + *
> + * Here, all runners finish in a +-1ms timer interval around the timeout.
> + * Even with -n 0 (no noise), with a sufficient number of runners, some runners
> + * will time out.
> + */
> +
> +#include <time.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <signal.h>
> +#include <pthread.h>
> +#include <stdbool.h>
> +#include <sched.h>
> +#include <errno.h>
> +#include <sys/select.h>
> +#include <sys/wait.h>
> +#include "debug.h"
> +#include "time-util.h"
> +#include "runner.h"
> +#include "globals.c"
> +
> +#define MILLION 1000000
> +#define BILLION 1000000000
> +
> +static int N_RUNNERS = 100;
> +/* sleep time between runner status polling */
> +static long POLL_USEC = 10000;
> +/* timeout for runner */
> +static long TIMEOUT_USEC = 100000;
> +/* random noise to subtract / add to the sleep time */
> +static long NOISE_USEC = 10000;
> +/*
> + * Factor to increase noise towards longer sleep times (timeouts).
> + * The actual sleep time will be in the interval
> + * [ TIMEOUT_USEC - NOISE_USEC, TIMEOUT_USEC + NOISE_USEC * NOISE_BIAS ]
> + */
> +static int NOISE_BIAS = 5;
> +/* number of sleep intervals the runner uses */
> +static int SLEEP_STEPS = 1;
> +/* time after which to kill all runners */
> +static long KILL_TIMEOUT = 0;
> +/* number of repeated runs */
> +static int REPEAT = 10;
> +/* whether to ignore cancellation signals */
> +static bool IGNORE_CANCEL = false;
> +
> +/* gap in the payload to simulate larger size */
> +#define PAYLOAD_GAP 128
> +
> +struct payload {
> + long wait_nsec;
> + int steps;
> + bool ignore_cancel;
> + int start;
> + char pad[PAYLOAD_GAP];
> + int end;
> +};
> +
> +static void wait_and_add_1(void *arg)
> +{
> + struct payload *t1 = arg;
> + struct timespec wait;
> + int i, cancelstate;
> +
> + wait.tv_sec = t1->wait_nsec / BILLION;
> + wait.tv_nsec = t1->wait_nsec % BILLION;
> + normalize_timespec(&wait);
> + for (i = 0; i < t1->steps; i++) {
> + if (t1->ignore_cancel)
> + pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cancelstate);
> + if (nanosleep(&wait, NULL) != 0 && errno != EINTR)
> + condlog(3, "%s: nanosleep: %s", __func__, strerror(errno));
> + if (t1->ignore_cancel) {
> + pthread_setcancelstate(cancelstate, NULL);
> + }
> + pthread_testcancel();
> + }
> + t1->end = t1->start + 1;
> +}
> +
> +static bool payload_error(const struct payload *p)
> +{
> + return p->end != p->start + 1;
> +}
> +
> +static int check_payload(struct runner_context *ctx, bool *error)
> +{
> + struct timespec now;
> + struct payload t1;
> + int st = check_runner(ctx, &t1, sizeof(t1));
> +
> + if (st == RUNNER_RUNNING)
> + return st;
> +
> + clock_gettime(CLOCK_MONOTONIC, &now);
> + if (st == RUNNER_DONE) {
> + if (error)
> + *error = payload_error(&t1);
> + condlog(3, "runner finished in state 'done' at %lld.%06lld, start %d end %d",
> + (long long)now.tv_sec, (long long)now.tv_nsec / 1000,
> + t1.start, t1.end);
> + } else
> + condlog(3, "runner finished in state '%s' at %lld.%06lld",
> + runner_state_name(st), (long long)now.tv_sec,
> + (long long)now.tv_nsec / 1000);
> + return st;
> +}
> +
> +static struct runner_context *
> +start_runner(long usecs, int start, long noise_range_usec, long bias,
> + int steps, bool ignore_cancel)
> +{
> +
> + struct payload t1;
> + struct runner_context *ctx;
> + long noise;
> +
> + if (noise_range_usec > 0)
> + noise = random() % ((bias + 1) * noise_range_usec) -
> + noise_range_usec;
> + else
> + noise = 0;
> + t1.start = start;
> + t1.end = 0;
> + t1.wait_nsec = (usecs + noise) * 1000 / steps;
> + t1.wait_nsec = t1.wait_nsec > 0 ? t1.wait_nsec : 0;
> + t1.steps = steps;
> + t1.ignore_cancel = ignore_cancel;
> +
> + ctx = get_runner(wait_and_add_1, &t1, sizeof(t1), usecs);
> +
> + if (ctx) {
> + struct timespec tmo, finish;
> +
> + clock_gettime(CLOCK_MONOTONIC, &tmo);
> + tmo.tv_sec += usecs / MILLION;
> + tmo.tv_nsec += (usecs % MILLION) * 1000;
> + finish = tmo;
> + normalize_timespec(&tmo);
> + finish.tv_sec += noise / MILLION;
> + finish.tv_nsec += (noise % MILLION) * 1000;
> + normalize_timespec(&finish);
> + condlog(4, "started runner start %d timeout %lld.%06lld, finish %lld.%06lld (noise %ld), steps %d, %signoring cancellation",
> + start, (long long)tmo.tv_sec,
> + (long long)tmo.tv_nsec / 1000, (long long)finish.tv_sec,
> + (long long)finish.tv_nsec / 1000, noise, steps,
> + ignore_cancel ? "" : "not ");
> + return ctx;
> + } else {
> + condlog(0, "failed to start runner for start %d", start);
> + return NULL;
> + }
> +}
> +
> +static struct runner_context **context;
> +static volatile bool must_stop = false;
> +void int_handler(int signal)
> +{
> + must_stop = true;
> +}
> +
> +static void terminate_all(void)
> +{
> + int i, count;
> +
> + for (count = 0, i = 0; i < N_RUNNERS; i++)
> + if (context[i]) {
> + cancel_runner(context[i]);
> + context[i] = NULL;
> + count++;
> + }
> + condlog(3, "%s: %d runners cancelled", __func__, count);
> + /* give runners a chance to clean up */
> + sched_yield();
> +}
> +
> +static bool test_sleep(const struct timespec *wait)
> +{
> + sigset_t set;
> +
> + sigfillset(&set);
> + sigdelset(&set, SIGTERM);
> + sigdelset(&set, SIGINT);
> + sigdelset(&set, SIGQUIT);
> +
> + pselect(0, NULL, NULL, NULL, wait, &set);
> +
> + if (!must_stop)
> + return false;
> +
> + terminate_all();
> + return true;
> +}
> +
> +static int run_test(int n)
> +{
> + int i, running, done, errors;
> + const struct timespec wait = {.tv_sec = 0, .tv_nsec = 1000 * POLL_USEC};
> + struct timespec stop, now;
> + long max_wait = TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC + 10000;
> + bool killed = false;
> +
> + for (i = 0; i < N_RUNNERS; i++)
> + context[i] = start_runner(TIMEOUT_USEC, i, NOISE_USEC, NOISE_BIAS,
> + SLEEP_STEPS, IGNORE_CANCEL);
> +
> + clock_gettime(CLOCK_MONOTONIC, &stop);
> + stop.tv_sec += max_wait / MILLION;
> + stop.tv_nsec += (max_wait % MILLION) * 1000;
> + normalize_timespec(&stop);
> + running = N_RUNNERS;
> + done = 0;
> + errors = 0;
> + do {
> + bool err = false;
> +
> + condlog(4, "%d runners active", running);
> + killed = test_sleep(&wait);
> + if (killed)
> + condlog(3, "%s: terminating on signal...", __func__);
> + for (running = 0, i = 0; i < N_RUNNERS; i++) {
> + int st;
> +
> + if (!context[i])
> + continue;
> + st = check_payload(context[i], &err);
> + switch (st) {
> + case RUNNER_DONE:
> + if (err)
> + errors++;
> + done++;
> + /* fallthrough */
> + case RUNNER_CANCELLED:
> + context[i] = NULL;
> + break;
> + default:
> + running++;
> + break;
> + }
> + }
> + if (killed)
> + break;
> + clock_gettime(CLOCK_MONOTONIC, &now);
> + if (timespeccmp(&stop, &now) <= 0)
> + break;
> + } while (running);
> +
> + condlog(2, "%10d%10d%10d%10d%10d", n, N_RUNNERS, N_RUNNERS - running,
> + done, errors);
> + if (killed) {
> + condlog(2, "%s: termination signal received", __func__);
> + exit(0);
> + }
> +
> + if (running > 0) {
> + condlog(1, "ERROR: %d runners haven't finished", running);
> + terminate_all();
> + }
> + return running + errors;
> +}
> +
> +static void free_ctxs(struct runner_context ***ctxs)
> +{
> + if (*ctxs)
> + free(*ctxs);
> +}
> +
> +static int setup_signal_handler(int sig, void (*handler)(int))
> +{
> + sigset_t set;
> + sigfillset(&set);
> + struct sigaction sga = {.sa_handler = NULL};
> +
> + sga.sa_handler = int_handler;
Typo. Since you pass in a handler, it should be
sga.sa_handler = handler;
not that it actually makes any difference.
-Ben
> + sga.sa_mask = set;
> + if (sigaction(sig, &sga, NULL) != 0) {
> + condlog(1, "%s: failed to install signal handler for %d: %s",
> + __func__, sig, strerror(errno));
> + return -1;
> + }
> + return 0;
> +}
> +
> +int run_tests(void)
> +{
> + int errors = 0, i;
> + struct runner_context **ctxs __attribute__((cleanup(free_ctxs))) = NULL;
> +
> + if (setup_signal_handler(SIGINT, int_handler) != 0)
> + return -1;
> +
> + ctxs = calloc(N_RUNNERS, sizeof(*context));
> + if (ctxs == NULL)
> + /* arbitrary number to indicate OOM error */
> + return 7000;
> + context = ctxs;
> + for (i = 0; i < REPEAT; i++) {
> + errors += run_test(i + 1);
> + }
> + return errors ? 1 : 0;
> +}
> +
> +/* We need to register a dummy handler to avoid system call restarting in
> + * pselect() below */
> +static void dummy_handler(int sig) {}
> +
> +static int fork_test(void)
> +{
> + sigset_t set;
> + pid_t child;
> + int wstatus;
> + struct timespec wait_to_kill = {.tv_sec = 0};
> +
> + /* Block all signals. Termination signals will be enabled in test_sleep() */
> + sigfillset(&set);
> + pthread_sigmask(SIG_SETMASK, &set, NULL);
> +
> + child = fork();
> +
> + if (child < 0) {
> + condlog(0, "error in fork(), %s", strerror(errno));
> + return -1;
> + } else if (child == 0) {
> + /* child */
> + int rc = run_tests();
> + exit(rc ? 1 : 0);
> + }
> +
> + setup_signal_handler(SIGCHLD, dummy_handler);
> +
> + /* parent */
> + if (KILL_TIMEOUT > 0) {
> + sigset_t set;
> +
> + condlog(3, "%s: == Child %d will be killed with SIGINT after %ld us",
> + __func__, child, KILL_TIMEOUT);
> + wait_to_kill.tv_sec = KILL_TIMEOUT / MILLION;
> + wait_to_kill.tv_nsec = (KILL_TIMEOUT % MILLION) * 1000;
> +
> + /*
> + * Unblock SIGCHLD in case the child terminates
> + * (child will receive SIGINT)
> + */
> + sigfillset(&set);
> + sigdelset(&set, SIGCHLD);
> + if (pselect(0, NULL, NULL, NULL, &wait_to_kill, &set) != 0) {
> + if (errno == EINTR)
> + condlog(2, "main: child terminated");
> + else
> + condlog(1, "main: error in pselect: %s",
> + strerror(errno));
> + } else
> + kill(child, SIGINT);
> + }
> +
> + if (waitpid(child, &wstatus, 0) <= 0) {
> + condlog(1, "%s: failed to wait for child %d", __func__, child);
> + return -1;
> + }
> + if (WIFEXITED(wstatus)) {
> + condlog(3, "%s: child %d return code %d", __func__, child,
> + WEXITSTATUS(wstatus));
> + return WEXITSTATUS(wstatus);
> + } else if (WIFSIGNALED(wstatus)) {
> + condlog(2, "%s: child %d killed by signal code %d", __func__,
> + child, WTERMSIG(wstatus));
> + return -1;
> + } else {
> + condlog(1, "%s: unexpected status of child %d", __func__, child);
> + return -1;
> + }
> +}
> +
> +static long parse_number(const char *arg, long factor, long deflt)
> +{
> + char *ep;
> + long v;
> +
> + if (*arg) {
> + v = strtol(arg, &ep, 10);
> + if (!*ep && v >= 0)
> + return factor * v;
> + }
> + condlog(1, "invalid argument: %s, using %ld", arg, deflt);
> + return deflt;
> +}
> +
> +static int usage(const char *cmd, int opt)
> +{
> +#define USAGE_FMT \
> + "Usage: %s [options]\n" \
> + " -N runners: number of parallel runners\n" \
> + " -p msecs: time to sleep between status polls in main thread\n" \
> + " -t msecs: timeout for runners\n" \
> + " -n msecs: random noise for runner sleep time\n" \
> + " -b bias: noise increase factor towards sleeping longer\n" \
> + " -s n: number of steps to divide sleep time into\n" \
> + " -k msecs: timeout after which to kill all runners (0: don't kill)\n" \
> + " -r n: number of times to repeat test\n" \
> + " -i: runners ignore cancellation while sleeping\n" \
> + " -v n: set verbosity level (default 2)\n" \
> + " -h: print this help"
> + condlog(0, USAGE_FMT, cmd);
> + return opt == 'h' ? 0 : 1;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + int opt;
> + int total = 0;
> + const char *optstring = "+:N:p:t:n:b:s:k:r:v:ih";
> +
> + init_test_verbosity(2);
> +
> + while ((opt = getopt(argc, argv, optstring)) != -1) {
> + switch (opt) {
> + case 'N':
> + N_RUNNERS = parse_number(optarg, 1L, N_RUNNERS);
> + break;
> + case 'p':
> + POLL_USEC = parse_number(optarg, 1000L, POLL_USEC);
> + break;
> + case 't':
> + TIMEOUT_USEC = parse_number(optarg, 1000L, TIMEOUT_USEC);
> + break;
> + case 'n':
> + NOISE_USEC = parse_number(optarg, 1000L, NOISE_USEC);
> + break;
> + case 'b':
> + NOISE_BIAS = parse_number(optarg, 1L, NOISE_BIAS);
> + break;
> + case 's':
> + SLEEP_STEPS = parse_number(optarg, 1L, SLEEP_STEPS);
> + break;
> + case 'k':
> + KILL_TIMEOUT = parse_number(optarg, 1000L, KILL_TIMEOUT);
> + break;
> + case 'r':
> + REPEAT = parse_number(optarg, 1L, REPEAT);
> + break;
> + case 'i':
> + IGNORE_CANCEL = true;
> + break;
> + case 'v':
> + libmp_verbosity = parse_number(optarg, 1L, libmp_verbosity);
> + break;
> + case 'h':
> + case ':':
> + case '?':
> + return usage(argv[0], opt);
> + break;
> + }
> + }
> +
> + if (optind != argc)
> + return usage(argv[0], '?');
> +
> + condlog(2, "Runner: timeout=%ld, noise interval=[%ld:%ld], steps=%d",
> + TIMEOUT_USEC, TIMEOUT_USEC - NOISE_USEC,
> + TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC, SLEEP_STEPS);
> + condlog(2, "Other : poll interval=%ld, ignore cancellation=%s, runners=%d, repeat=%d, kill timeout=%ld",
> + POLL_USEC, IGNORE_CANCEL ? "YES" : "NO", N_RUNNERS, REPEAT,
> + KILL_TIMEOUT);
> + condlog(2, "%10s%10s%10s%10s%10s", "run", "total", "finished",
> + "completed", "errors");
> +
> + total = fork_test();
> + if (total == -1)
> + return 130;
> + condlog(2, "== TOTAL NUMBER OF FAILED RUNS: %d", total);
> + return total ? 1 : 0;
> +}
> --
> 2.53.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] libmultipath: TUR checker: use runner threads
2026-03-19 22:13 ` [PATCH 4/4] libmultipath: TUR checker: use runner threads Martin Wilck
@ 2026-03-24 6:38 ` Benjamin Marzinski
2026-03-24 12:24 ` Martin Wilck
0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Marzinski @ 2026-03-24 6:38 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck
On Thu, Mar 19, 2026 at 11:13:44PM +0100, Martin Wilck wrote:
> Use the generic runners to simplify the TUR checker code.
>
> The logic is the same as before, with one exception: the runner API does
> not allow tracking a thread after it has been cancelled, like our current
> code does. The current code used this to determine when a previously
> cancelled thread eventually did complete. The use case for doing this in
> the current code was the MAX_NR_TIMEOUTS logic: the code waited "forever"
> for the "last" checker to complete. This is not possible with runners;
> once a runner is cancelled, it is off limits. (Waiting for a zombie thread to
> write to memory belonging to the main program, as the current code does,
> doesn't seem to be the best idea, anyway).
>
> Therefore this patch implements a different logic for MAX_NR_TIMEOUTS. When
> nr_timeouts reaches the limit, the last checker is spawned with an infinite
> timeout. Like before, no new checker thread will be spawned until this last
> checker eventually completes. One difference to the previous code is that
> this last checker thread is never cancelled (after all, it never times
> out). But this code is meant for the case where cancellation is not
> effective, anyway. Another difference is that the new code will report
> PATH_TIMEOUT state as soon as this last thread is started rather than
> PATH_PENDING. That seems reasonable given that the checker previously timed
> out for this thread.
I'm not super enthused about this way of handling the misbehaving
threads. It's probably o.k. But what if the issue is that the checker
isn't completing by itself at all, but it would eventually get
> cancelled, just not until after we time out. In this case, we should be
cancelling that last runner. Admittedly, it's kinda hard to believe that
a new runner would work correctly, while the old one still is stuck
needing to be cancelled, but what we had before definitely worked.
I don't really see why we can't continue with roughly what we had
before. We just need a new state for check_runner_state() to return.
Something like RUNNER_STOPPING. cancel_runner() would cmpxchg with
RUNNER_STOPPING, and not free the runner_context. In check_runner(), if
the state was RUNNER_STOPPING or you just called cancel_runner(), you
would check the ctx->refcount. If it was already 1, you would release
the context and return RUNNER_CANCELLED. This way RUNNER_CANCELLED still
means that the context has been freed, but it would only happen when the
thread has ended. If you want to stop paying attention to a runner in
RUNNER_STOPPING, you just call release_context() manually to drop your
reference without waiting for it to finish.
Then, if you still have a runner in RUNNER_STOPPING after calling
check_runner_state() at the start of libcheck_check(), you drop your
reference to it if ct->nr_timeouts < MAX_NR_TIMEOUTS. Otherwise you just
return PATH_TIMEOUT.
Thoughts? Like I said, I think what you have will probably be fine.
I just know what we had before worked.
More comments below.
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> libmultipath/checkers/tur.c | 341 +++++++++++-------------------------
> 1 file changed, 103 insertions(+), 238 deletions(-)
>
> diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
> index ba4ca68..6f22c45 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -3,7 +3,6 @@
> *
> * Copyright (c) 2004 Christophe Varoqui
> */
> -#include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> @@ -17,13 +16,13 @@
> #include <pthread.h>
> #include <urcu.h>
> #include <urcu/uatomic.h>
> +#include <assert.h>
>
> #include "checkers.h"
>
> #include "debug.h"
> #include "sg_include.h"
> -#include "util.h"
> -#include "time-util.h"
> +#include "runner.h"
>
> #define TUR_CMD_LEN 6
> #define HEAVY_CHECK_COUNT 10
> @@ -45,21 +44,20 @@ const char *libcheck_msgtable[] = {
> NULL,
> };
>
> -struct tur_checker_context {
> - dev_t devt;
> - int state;
> - int running; /* uatomic access only */
> +struct tur_context {
> int fd;
> + dev_t devt;
> unsigned int timeout;
> - time_t time;
> - pthread_t thread;
> - pthread_mutex_t lock;
> - pthread_cond_t active;
> - int holders; /* uatomic access only */
> - int msgid;
> + int state;
> + short msgid;
> +};
> +
> +struct tur_checker_context {
> struct checker_context ctx;
> + int last_runner_state;
> unsigned int nr_timeouts;
> - bool checked_state;
> + struct runner_context *rtx;
> + struct tur_context tcx;
There are lots of structures named *_context with names like *tx.
Perhaps we could rename the tur_context something like tur_data?
Also, ctx is the checker_context here, and the runner_context in the
runner.c code. I think it would be easier to follow and grep if we kept
the runner_context named rtx in both places.
> };
>
> int libcheck_init (struct checker * c)
> @@ -67,48 +65,26 @@ int libcheck_init (struct checker * c)
> struct tur_checker_context *ct;
Also the tur_checker_context is tcc in every function but this one. I
think it's helpful to always use the same name.
> struct stat sb;
>
> - ct = malloc(sizeof(struct tur_checker_context));
> - if (!ct)
> - return 1;
> - memset(ct, 0, sizeof(struct tur_checker_context));
> -
> - ct->state = PATH_UNCHECKED;
> - ct->fd = -1;
> - uatomic_set(&ct->holders, 1);
> - pthread_cond_init_mono(&ct->active);
> - pthread_mutex_init(&ct->lock, NULL);
> + ct = calloc(1, sizeof(*ct));
> + ct->tcx.state = PATH_UNCHECKED;
> + ct->tcx.fd = -1;
> if (fstat(c->fd, &sb) == 0)
> - ct->devt = sb.st_rdev;
> + ct->tcx.devt = sb.st_rdev;
> ct->ctx.cls = c->cls;
> c->context = ct;
> -
> return 0;
> }
>
> -static void cleanup_context(struct tur_checker_context *ct)
> -{
> - pthread_mutex_destroy(&ct->lock);
> - pthread_cond_destroy(&ct->active);
> - free(ct);
> -}
> -
> void libcheck_free (struct checker * c)
> {
> - if (c->context) {
> - struct tur_checker_context *ct = c->context;
> - int holders;
> - int running;
> + struct tur_checker_context *tcc = c->context;
>
> - running = uatomic_xchg(&ct->running, 0);
> - if (running)
> - pthread_cancel(ct->thread);
> - ct->thread = 0;
> - holders = uatomic_sub_return(&ct->holders, 1);
> - if (!holders)
> - cleanup_context(ct);
> - c->context = NULL;
> - }
> - return;
> + if (!tcc)
> + return;
> + c->context = NULL;
> + if (tcc->rtx)
> + cancel_runner(tcc->rtx);
> + free(tcc);
> }
>
> static int
> @@ -216,19 +192,6 @@ retry:
> return PATH_UP;
> }
>
> -#define tur_thread_cleanup_push(ct) pthread_cleanup_push(cleanup_func, ct)
> -#define tur_thread_cleanup_pop(ct) pthread_cleanup_pop(1)
> -
> -static void cleanup_func(void *data)
> -{
> - int holders;
> - struct tur_checker_context *ct = data;
> -
> - holders = uatomic_sub_return(&ct->holders, 1);
> - if (!holders)
> - cleanup_context(ct);
> -}
> -
> /*
> * Test code for "zombie tur thread" handling.
> * Compile e.g. with CFLAGS=-DTUR_TEST_MAJOR=8
> @@ -273,110 +236,82 @@ static void tur_deep_sleep(const struct tur_checker_context *ct)
> #define tur_deep_sleep(x) do {} while (0)
> #endif /* TUR_TEST_MAJOR */
>
> -void *libcheck_thread(struct checker_context *ctx)
> +void runner_callback(void *arg)
> {
> - struct tur_checker_context *ct =
> - container_of(ctx, struct tur_checker_context, ctx);
> - int state, running;
> - short msgid;
> + struct tur_context *tcx = arg;
> + int state;
>
> - /* This thread can be canceled, so setup clean up */
> - tur_thread_cleanup_push(ct);
> + condlog(4, "%d:%d : tur checker starting up", major(tcx->devt),
> + minor(tcx->devt));
>
> - condlog(4, "%d:%d : tur checker starting up", major(ct->devt),
> - minor(ct->devt));
> -
> - tur_deep_sleep(ct);
> - state = tur_check(ct->fd, ct->timeout, &msgid);
> + tur_deep_sleep(tcx);
> + state = tur_check(tcx->fd, tcx->timeout, &tcx->msgid);
> + tcx->state = state;
> pthread_testcancel();
> -
> - /* TUR checker done */
> - pthread_mutex_lock(&ct->lock);
> - ct->state = state;
> - ct->msgid = msgid;
> - pthread_cond_signal(&ct->active);
> - pthread_mutex_unlock(&ct->lock);
> -
> - condlog(4, "%d:%d : tur checker finished, state %s", major(ct->devt),
> - minor(ct->devt), checker_state_name(state));
> -
> - running = uatomic_xchg(&ct->running, 0);
> - if (!running)
> - pause();
> -
> - tur_thread_cleanup_pop(ct);
> -
> - return ((void *)0);
> + condlog(4, "%d:%d : tur checker finished, state %s", major(tcx->devt),
> + minor(tcx->devt), checker_state_name(state));
> }
>
> -static void tur_set_async_timeout(struct checker *c)
> +static int check_runner_state(struct tur_checker_context *tcc)
> {
> - struct tur_checker_context *ct = c->context;
> - struct timespec now;
> + struct runner_context *rtx = tcc->rtx;
> + int rc;
>
> - get_monotonic_time(&now);
> - ct->time = now.tv_sec + c->timeout;
> -}
> -
> -static int tur_check_async_timeout(struct checker *c)
> -{
> - struct tur_checker_context *ct = c->context;
> - struct timespec now;
> -
> - get_monotonic_time(&now);
> - return (now.tv_sec > ct->time);
> -}
> -
> -int check_pending(struct checker *c)
> -{
> - struct tur_checker_context *ct = c->context;
> - int tur_status = PATH_PENDING;
> -
> - pthread_mutex_lock(&ct->lock);
> -
> - if (ct->state != PATH_PENDING || ct->msgid != MSG_TUR_RUNNING)
> - {
> - tur_status = ct->state;
> - c->msgid = ct->msgid;
> - }
> - pthread_mutex_unlock(&ct->lock);
> - if (tur_status == PATH_PENDING && c->msgid == MSG_TUR_RUNNING) {
> + rc = check_runner(rtx, &tcc->tcx, sizeof(tcc->tcx));
> + switch (rc) {
> + case RUNNER_DONE:
> + tcc->last_runner_state = rc;
> + tcc->rtx = NULL;
> + tcc->nr_timeouts = 0;
> + condlog(3, "%d:%d : tur checker finished, state %s",
> + major(tcc->tcx.devt), minor(tcc->tcx.devt),
> + checker_state_name(tcc->tcx.state));
> + break;
> + case RUNNER_CANCELLED:
> + tcc->last_runner_state = rc;
> + tcc->rtx = NULL;
> + tcc->tcx.state = PATH_TIMEOUT;
> + tcc->tcx.msgid = MSG_TUR_TIMEOUT;
> + if (tcc->nr_timeouts < MAX_NR_TIMEOUTS)
> + tcc->nr_timeouts++;
> + condlog(3, "%d:%d : tur checker timed out",
> + major(tcc->tcx.devt), minor(tcc->tcx.devt));
> + break;
> + case RUNNER_RUNNING:
> condlog(4, "%d:%d : tur checker still running",
> - major(ct->devt), minor(ct->devt));
> - } else {
> - int running = uatomic_xchg(&ct->running, 0);
> - if (running)
> - pthread_cancel(ct->thread);
> - ct->thread = 0;
> + major(tcc->tcx.devt), minor(tcc->tcx.devt));
> + tcc->tcx.msgid = MSG_TUR_RUNNING;
> + break;
> + default:
> + assert(false);
> + break;
> }
> -
> - ct->checked_state = true;
> - return tur_status;
> + return rc;
> }
>
> bool libcheck_need_wait(struct checker *c)
> {
> struct tur_checker_context *ct = c->context;
> - return (ct && ct->thread && uatomic_read(&ct->running) != 0 &&
> - !ct->checked_state);
> +
> + return ct && ct->rtx;
> }
>
> int libcheck_pending(struct checker *c)
> {
> struct tur_checker_context *ct = c->context;
> -
> /* The if path checker isn't running, just return the exiting value. */
> - if (!ct || !ct->thread)
> + if (!ct || !ct->rtx)
> return c->path_state;
>
> - return check_pending(c);
> + /* This may nullify ct->rtx */
> + check_runner_state(ct);
> + c->msgid = ct->tcx.msgid;
> + return ct->tcx.state;
> }
>
> int libcheck_check(struct checker * c)
> {
> struct tur_checker_context *ct = c->context;
> - pthread_attr_t attr;
> - int tur_status, r;
>
> if (!ct)
> return PATH_UNCHECKED;
> @@ -384,109 +319,39 @@ int libcheck_check(struct checker * c)
> if (checker_is_sync(c))
> return tur_check(c->fd, c->timeout, &c->msgid);
>
> - /*
> - * Async mode
> - */
> - if (ct->thread) {
> - ct->checked_state = true;
> - if (tur_check_async_timeout(c)) {
> - int running = uatomic_xchg(&ct->running, 0);
> - if (running) {
> - pthread_cancel(ct->thread);
> - condlog(3, "%d:%d : tur checker timeout",
> - major(ct->devt), minor(ct->devt));
> - c->msgid = MSG_TUR_TIMEOUT;
> - tur_status = PATH_TIMEOUT;
> - } else {
> - pthread_mutex_lock(&ct->lock);
> - tur_status = ct->state;
> - c->msgid = ct->msgid;
> - pthread_mutex_unlock(&ct->lock);
> - }
> - ct->thread = 0;
> - } else if (uatomic_read(&ct->running) != 0) {
> + /* Handle the case that the checker just completed */
> + if (ct->rtx) {
> + if (check_runner_state(ct) == RUNNER_RUNNING)
> condlog(3, "%d:%d : tur checker not finished",
> - major(ct->devt), minor(ct->devt));
> - tur_status = PATH_PENDING;
> - c->msgid = MSG_TUR_RUNNING;
> - } else {
> - /* TUR checker done */
> - ct->thread = 0;
> - pthread_mutex_lock(&ct->lock);
> - tur_status = ct->state;
> - c->msgid = ct->msgid;
> - pthread_mutex_unlock(&ct->lock);
> - }
> - } else {
> - if (uatomic_read(&ct->holders) > 1) {
> - /* The thread has been cancelled but hasn't quit. */
> - if (ct->nr_timeouts == MAX_NR_TIMEOUTS) {
> - condlog(2, "%d:%d : waiting for stalled tur thread to finish",
> - major(ct->devt), minor(ct->devt));
> - ct->nr_timeouts++;
> - }
> - /*
> - * Don't start new threads until the last once has
> - * finished.
> - */
> - if (ct->nr_timeouts > MAX_NR_TIMEOUTS) {
> - c->msgid = MSG_TUR_TIMEOUT;
> - return PATH_TIMEOUT;
> - }
> - ct->nr_timeouts++;
> - /*
> - * Start a new thread while the old one is stalled.
> - * We have to prevent it from interfering with the new
> - * thread. We create a new context and leave the old
> - * one with the stale thread, hoping it will clean up
> - * eventually.
> - */
> - condlog(3, "%d:%d : tur thread not responding",
> - major(ct->devt), minor(ct->devt));
> -
> - /*
> - * libcheck_init will replace c->context.
> - * It fails only in OOM situations. In this case, return
> - * PATH_UNCHECKED to avoid prematurely failing the path.
> - */
> - if (libcheck_init(c) != 0) {
> - c->msgid = MSG_TUR_FAILED;
> - return PATH_UNCHECKED;
> - }
> - ((struct tur_checker_context *)c->context)->nr_timeouts = ct->nr_timeouts;
> -
> - if (!uatomic_sub_return(&ct->holders, 1)) {
> - /* It did terminate, eventually */
> - cleanup_context(ct);
> - ((struct tur_checker_context *)c->context)->nr_timeouts = 0;
> - }
> -
> - ct = c->context;
> - } else
> - ct->nr_timeouts = 0;
> - /* Start new TUR checker */
> - pthread_mutex_lock(&ct->lock);
> - tur_status = ct->state = PATH_PENDING;
> - c->msgid = ct->msgid = MSG_TUR_RUNNING;
> - pthread_mutex_unlock(&ct->lock);
> - ct->fd = c->fd;
> - ct->timeout = c->timeout;
> - ct->checked_state = false;
> - uatomic_add(&ct->holders, 1);
> - uatomic_set(&ct->running, 1);
> - tur_set_async_timeout(c);
> - setup_thread_attr(&attr, 32 * 1024, 1);
> - r = start_checker_thread(&ct->thread, &attr, &ct->ctx);
> - pthread_attr_destroy(&attr);
> - if (r) {
> - uatomic_sub(&ct->holders, 1);
> - uatomic_set(&ct->running, 0);
> - ct->thread = 0;
> - condlog(3, "%d:%d : failed to start tur thread, using"
> - " sync mode", major(ct->devt), minor(ct->devt));
> - return tur_check(c->fd, c->timeout, &c->msgid);
> - }
> + major(ct->tcx.devt), minor(ct->tcx.devt));
> + c->msgid = ct->tcx.msgid;
> + return ct->tcx.state;
> }
>
> - return tur_status;
> + /* create new checker thread */
> + ct->tcx.fd = c->fd;
> + ct->tcx.timeout = c->timeout;
> +
> + if (ct->nr_timeouts < MAX_NR_TIMEOUTS) {
> + condlog(3, "%d:%d : starting checker with timeout",
> + major(ct->tcx.devt), minor(ct->tcx.devt));
> + ct->tcx.state = PATH_PENDING;
> + ct->tcx.msgid = MSG_TUR_RUNNING;
> + ct->rtx = get_runner(runner_callback, &ct->tcx,
> + sizeof(ct->tcx), 1000000 * c->timeout);
> + } else {
> + condlog(3, "%d:%d : starting checker without timeout",
> + major(ct->tcx.devt), minor(ct->tcx.devt));
> + ct->tcx.state = PATH_TIMEOUT;
shouldn't we set ct->tcx.msgid = MSG_TUR_TIMEOUT here?
-Ben
> + ct->rtx = get_runner(runner_callback, &ct->tcx, sizeof(ct->tcx), 0);
> + }
> +
> + if (ct->rtx) {
> + c->msgid = ct->tcx.msgid;
> + return ct->tcx.state;
> + } else {
> + condlog(3, "%d:%d : failed to start tur thread, using sync mode",
> + major(ct->tcx.devt), minor(ct->tcx.devt));
> + return tur_check(c->fd, c->timeout, &c->msgid);
> + }
> }
> --
> 2.53.0
* Re: [PATCH 4/4] libmultipath: TUR checker: use runner threads
2026-03-24 6:38 ` Benjamin Marzinski
@ 2026-03-24 12:24 ` Martin Wilck
2026-03-24 14:46 ` Benjamin Marzinski
0 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-03-24 12:24 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, Brian Bunker, dm-devel
On Tue, 2026-03-24 at 02:38 -0400, Benjamin Marzinski wrote:
> On Thu, Mar 19, 2026 at 11:13:44PM +0100, Martin Wilck wrote:
> > Use the generic runners to simplify the TUR checker code.
> >
> > The logic is the same as before, with one exception: the runner API
> > does
> > not allow to track a thread after it has been cancelled, like our
> > current
> > code does. The current code used this to determine when a
> > previously
> > cancelled thread eventually did complete. The use case for doing
> > this in
> > the current code was the MAX_NR_RUNNERS logic: the code waited
> > "forever"
> > for the "last" checker to complete. This is not possible with
> > runners;
> > once a runner is cancelled, it is off limits. (Waiting for a zombie
> > thread to
> > write to memory belonging to the main program, as the current code
> > does,
> > doesn't seem to be the best idea, anyway).
> >
> > Therefore this patch implements a different logic for
> > MAX_NR_RUNNERS. When
> > nr_timeouts reaches the limit, the last checker is spawned with an
> > infinite
> > timeout. Like before, no new checker thread will be spawned until
> > this last
> > checker eventually completes. One difference to the previous code
> > is that
> > this last checker thread is never cancelled (after all, it never
> > times
> > out). But this code is meant for the case where cancellation is not
> > effective, anyway. Another difference is that the new code will
> > report
> > PATH_TIMEOUT state as soon as this last thread is started rather
> > than
> > PATH_PENDING. That seems reasonable given that the checker
> > previously timed
> > out for this thread.
>
> I'm not super enthused about this way of handling the misbehaving
> threads. It's probably o.k. But what if the issue is that the checker
> isn't completing by itself at all, but it would eventually get
> cancelled, just not till after we timeout. In this case, we should be
> cancelling that last runner. Admittedly, it's kinda hard to believe
> that
> a new runner would work correctly, while the old one still is stuck
> needing to be cancelled, but what we had before definitely worked.
Well, not relying on a cancelled, detached zombie thread to eventually
modify some shared memory location was a design principle of this new
code.
But of course I realize that this does represent a change for those
cases where a checker doesn't complete but can still be forced to
return by cancellation. Is this a reasonable scenario? SG_IO on sd
devices, which is what our checkers do, will go to uninterruptible
sleep in blk_execute_rq(), and thus can't be cancelled (it _should_
time out, though, unless the kernel misbehaves). It would be a
different thing if we used sg devices instead...
>
> I don't really see why we can't continue with roughly what we had
> before. We just need a new state for check_runner_state() to return.
> Something like RUNNER_STOPPING. cancel_runner() would cmpxchg with
> RUNNER_STOPPING, and not free the runner_context. In check_runner(),
> if
> the state was RUNNER_STOPPING or you just called cancel_runner(), you
> would check the ctx->refcount. If it was already 1, you would release
> the context and return RUNNER_CANCELLED. This way RUNNER_CANCELLED
> still
> means that the context has been freed, but it would only happen when
> the
> thread has ended.
That sort of logic is dangerous, IMO. I'd prefer to use only
uatomic_sub_return() on the refcount.
> If you want to stop paying attention to a runner in
> RUNNER_STOPPING, you just call release_context() manually to drop
> your
> reference without waiting for it to finish.
>
> Then, if you still have a runner in RUNNER_STOPPING after calling
> check_runner_state() at the start of libcheck_check(), you drop your
> reference to it if ct->nr_timeouts < MAX_NR_TIMEOUTS. Otherwise you
> just
> return PATH_TIMEOUT.
>
> Thoughts? Like I said, I think what you have will probably be fine.
> I just know what we had before worked.
>
I'll try to figure something out. No matter what I do, the API for the
caller will get more complicated and error-prone, because it will need
to track the lifetime of the runner's memory somehow (and if we do
that, we might as well give up the idea of just using memcpy() to avoid
shared memory issues).
> More comments below.
>
> >
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> > libmultipath/checkers/tur.c | 341 +++++++++++---------------------
> > ----
> > 1 file changed, 103 insertions(+), 238 deletions(-)
> >
> > diff --git a/libmultipath/checkers/tur.c
> > b/libmultipath/checkers/tur.c
> > index ba4ca68..6f22c45 100644
> > --- a/libmultipath/checkers/tur.c
> > +++ b/libmultipath/checkers/tur.c
> > @@ -3,7 +3,6 @@
> > *
> > * Copyright (c) 2004 Christophe Varoqui
> > */
> > -#include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/types.h>
> > @@ -17,13 +16,13 @@
> > #include <pthread.h>
> > #include <urcu.h>
> > #include <urcu/uatomic.h>
> > +#include <assert.h>
> >
> > #include "checkers.h"
> >
> > #include "debug.h"
> > #include "sg_include.h"
> > -#include "util.h"
> > -#include "time-util.h"
> > +#include "runner.h"
> >
> > #define TUR_CMD_LEN 6
> > #define HEAVY_CHECK_COUNT 10
> > @@ -45,21 +44,20 @@ const char *libcheck_msgtable[] = {
> > NULL,
> > };
> >
> > -struct tur_checker_context {
> > - dev_t devt;
> > - int state;
> > - int running; /* uatomic access only */
> > +struct tur_context {
> > int fd;
> > + dev_t devt;
> > unsigned int timeout;
> > - time_t time;
> > - pthread_t thread;
> > - pthread_mutex_t lock;
> > - pthread_cond_t active;
> > - int holders; /* uatomic access only */
> > - int msgid;
> > + int state;
> > + short msgid;
> > +};
> > +
> > +struct tur_checker_context {
> > struct checker_context ctx;
> > + int last_runner_state;
> > unsigned int nr_timeouts;
> > - bool checked_state;
> > + struct runner_context *rtx;
> > + struct tur_context tcx;
>
> There are lots of structures named *_context with names like *tx.
> Perhaps we could rename the tur_context something like tur_data?
>
> Also, ctx is the checker_context here, and the runner_context in the
> runner.c code. I think it would be easier to follow and grep if we
> kept
> the runner_context named rtx in both places.
>
> > };
> >
> > int libcheck_init (struct checker * c)
> > @@ -67,48 +65,26 @@ int libcheck_init (struct checker * c)
> > struct tur_checker_context *ct;
>
> Also the tur_checker_context is tcc in every function but this one. I
> think it's helpful to always have the same name.
>
>
Right, I wasn't so happy about the variable naming either. I kept ct
here because this is what it used to be. I'll try to make it more
consistent in v2.
> > }
> >
> > - return tur_status;
> > + /* create new checker thread */
> > + ct->tcx.fd = c->fd;
> > + ct->tcx.timeout = c->timeout;
> > +
> > + if (ct->nr_timeouts < MAX_NR_TIMEOUTS) {
> > + condlog(3, "%d:%d : starting checker with timeout",
> > + major(ct->tcx.devt), minor(ct->tcx.devt));
> > + ct->tcx.state = PATH_PENDING;
> > + ct->tcx.msgid = MSG_TUR_RUNNING;
> > + ct->rtx = get_runner(runner_callback, &ct->tcx,
> > + sizeof(ct->tcx), 1000000 * c->timeout);
> > + } else {
> > + condlog(3, "%d:%d : starting checker without timeout",
> > + major(ct->tcx.devt), minor(ct->tcx.devt));
> > + ct->tcx.state = PATH_TIMEOUT;
>
> shouldn't we set ct->tcx.msgid = MSG_TUR_TIMEOUT here?
>
This was on purpose, but of course debatable. My thinking was this: The
previous checker has timed out and we're just retrying, therefore
setting PATH_TIMEOUT correctly reflects the state. But the message,
which is mainly for the user, might as well express the fact that
another checker is indeed currently running.
If we change the timeout logic, this change will be obsolete anyway.
Regards
Martin
* Re: [PATCH 4/4] libmultipath: TUR checker: use runner threads
2026-03-24 12:24 ` Martin Wilck
@ 2026-03-24 14:46 ` Benjamin Marzinski
0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2026-03-24 14:46 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel
On Tue, Mar 24, 2026 at 01:24:45PM +0100, Martin Wilck wrote:
> On Tue, 2026-03-24 at 02:38 -0400, Benjamin Marzinski wrote:
> > On Thu, Mar 19, 2026 at 11:13:44PM +0100, Martin Wilck wrote:
> > > Use the generic runners to simplify the TUR checker code.
> > >
> > > The logic is the same as before, with one exception: the runner API
> > > does
> > > not allow to track a thread after it has been cancelled, like our
> > > current
> > > code does. The current code used this to determine when a
> > > previously
> > > cancelled thread eventually did complete. The use case for doing
> > > this in
> > > the current code was the MAX_NR_RUNNERS logic: the code waited
> > > "forever"
> > > for the "last" checker to complete. This is not possible with
> > > runners;
> > > once a runner is cancelled, it is off limits. (Waiting for a zombie
> > > thread to
> > > write to memory belonging to the main program, as the current code
> > > does,
> > > doesn't seem to be the best idea, anyway).
> > >
> > > Therefore this patch implements a different logic for
> > > MAX_NR_RUNNERS. When
> > > nr_timeouts reaches the limit, the last checker is spawned with an
> > > infinite
> > > timeout. Like before, no new checker thread will be spawned until
> > > this last
> > > checker eventually completes. One difference to the previous code
> > > is that
> > > this last checker thread is never cancelled (after all, it never
> > > times
> > > out). But this code is meant for the case where cancellation is not
> > > effective, anyway. Another difference is that the new code will
> > > report
> > > PATH_TIMEOUT state as soon as this last thread is started rather
> > > than
> > > PATH_PENDING. That seems reasonable given that the checker
> > > previously timed
> > > out for this thread.
> >
> > I'm not super enthused about this way of handling the misbehaving
> > threads. It's probably o.k. But what if the issue is that the checker
> > isn't completing by itself at all, but it would eventually get
> > cancelled, just not till after we timeout. In this case, we should be
> > cancelling that last runner. Admittedly, it's kinda hard to believe
> > that
> > a new runner would work correctly, while the old one still is stuck
> > needing to be cancelled, but what we had before definitely worked.
>
> Well, not relying on a cancelled, detached zombie thread to eventually
> modify some shared memory location was a design principle of this new
> code.
We rely on cleanup handlers all over the code. Isn't this just another case
of that?
> But of course I realize that this does represent a change for those
> cases where a checker doesn't complete but can still be forced to
> return by cancellation. Is this a reasonable scenario?
Likely not. Like I said, I think that your method is probably fine. I
just have a preference for the one that has been working without issue
for years. If my idea makes the design notably worse, I'm not strongly against
what you have.
> SG_IO on sd
> devices, which is what our checkers do, will go to uninterruptible
> sleep in blk_execute_rq(), and thus can't be cancelled (it _should_
> time out, though, unless the kernel misbehaves). It would be a
> different thing if we used sg devices instead...
>
> >
> > I don't really see why we can't continue with roughly what we had
> > before. We just need a new state for check_runner_state() to return.
> > Something like RUNNER_STOPPING. cancel_runner() would cmpxchg with
> > RUNNER_STOPPING, and not free the runner_context. In check_runner(),
> > if
> > the state was RUNNER_STOPPING or you just called cancel_runner(), you
> > would check the ctx->refcount. If it was already 1, you would release
> > the context and return RUNNER_CANCELLED. This way RUNNER_CANCELLED
> > still
> > means that the context has been freed, but it would only happen when
> > the
> > thread has ended.
>
> That sort of logic is dangerous, IMO. I'd prefer to use only
> uatomic_sub_return() on the refcount.
I concede that it's racy. You could add a memory barrier to make it
quicker to see the results of a uatomic_sub_return() from the runner
thread, but it will always be racy to check if another thread has done
something. You might just miss the update and have to wait till the next
check. But I don't see how it's dangerous. You have a reference to the
object, so it should always be safe to check the object's state.
> > If you want to stop paying attention to a runner in
> > RUNNER_STOPPING, you just call release_context() manually to drop
> > your
> > reference without waiting for it to finish.
> >
> > Then, if you still have a runner in RUNNER_STOPPING after calling
> > check_runner_state() at the start of libcheck_check(), you drop your
> > reference to it if ct->nr_timeouts < MAX_NR_TIMEOUTS. Otherwise you
> > just
> > return PATH_TIMEOUT.
> >
> > Thoughts? Like I said, I think what you have will probably be fine.
> > I just know what we had before worked.
> >
>
> I'll try to figure something out. No matter what I do, the API for the
> caller will get more complicated and error-prone, because it will need
> to track the lifetime of the runner's memory somehow (and if we do
> that, we might as well give up the idea of just using memcpy() to avoid
> shared memory issues).
It would still be true that when you got a RUNNER_DONE or
RUNNER_CANCELLED result from check_runner_state(), the rtx would be
invalid. Otherwise it would be valid. The only difference is that
RUNNER_STOPPING means that you have already cancelled the thread. You
can either hang onto the rtx until check_runner_state() returns
RUNNER_CANCELLED or drop your reference without waiting. But yes, the
runner code will be reading the refcount of the shared rtx.
I think that your memcpy design will still make things significantly
simpler than the existing code, even with the code checking the refcount
of rtx. That refcount checking only comes into play when we've cancelled
the thread, at which point we don't care about the data anymore anyways.
> > More comments below.
> >
> > >
> > > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > > ---
> > > libmultipath/checkers/tur.c | 341 +++++++++++---------------------
> > > ----
> > > 1 file changed, 103 insertions(+), 238 deletions(-)
> > >
> > > diff --git a/libmultipath/checkers/tur.c
> > > b/libmultipath/checkers/tur.c
> > > index ba4ca68..6f22c45 100644
> > > --- a/libmultipath/checkers/tur.c
> > > +++ b/libmultipath/checkers/tur.c
> > > @@ -3,7 +3,6 @@
> > > *
> > > * Copyright (c) 2004 Christophe Varoqui
> > > */
> > > -#include <stdio.h>
> > > #include <stdlib.h>
> > > #include <string.h>
> > > #include <sys/types.h>
> > > @@ -17,13 +16,13 @@
> > > #include <pthread.h>
> > > #include <urcu.h>
> > > #include <urcu/uatomic.h>
> > > +#include <assert.h>
> > >
> > > #include "checkers.h"
> > >
> > > #include "debug.h"
> > > #include "sg_include.h"
> > > -#include "util.h"
> > > -#include "time-util.h"
> > > +#include "runner.h"
> > >
> > > #define TUR_CMD_LEN 6
> > > #define HEAVY_CHECK_COUNT 10
> > > @@ -45,21 +44,20 @@ const char *libcheck_msgtable[] = {
> > > NULL,
> > > };
> > >
> > > -struct tur_checker_context {
> > > - dev_t devt;
> > > - int state;
> > > - int running; /* uatomic access only */
> > > +struct tur_context {
> > > int fd;
> > > + dev_t devt;
> > > unsigned int timeout;
> > > - time_t time;
> > > - pthread_t thread;
> > > - pthread_mutex_t lock;
> > > - pthread_cond_t active;
> > > - int holders; /* uatomic access only */
> > > - int msgid;
> > > + int state;
> > > + short msgid;
> > > +};
> > > +
> > > +struct tur_checker_context {
> > > struct checker_context ctx;
> > > + int last_runner_state;
> > > unsigned int nr_timeouts;
> > > - bool checked_state;
> > > + struct runner_context *rtx;
> > > + struct tur_context tcx;
> >
> > There are lots of structures named *_context with names like *tx.
> > Perhaps we could rename the tur_context something like tur_data?
> >
> > Also, ctx is the checker_context here, and the runner_context in the
> > runner.c code. I think it would be easier to follow and grep if we
> > kept
> > the runner_context named rtx in both places.
> >
> > > };
> > >
> > > int libcheck_init (struct checker * c)
> > > @@ -67,48 +65,26 @@ int libcheck_init (struct checker * c)
> > > struct tur_checker_context *ct;
> >
> > Also the tur_checker_context is tcc in every function but this one. I
> > think it's helpful to always have the same name.
> >
> >
>
> Right, I wasn't so happy about the variable naming either. I kept ct
> here because this is what it used to be. I'll try to make it more
> consistent in v2.
>
> > > }
> > >
> > > - return tur_status;
> > > + /* create new checker thread */
> > > + ct->tcx.fd = c->fd;
> > > + ct->tcx.timeout = c->timeout;
> > > +
> > > + if (ct->nr_timeouts < MAX_NR_TIMEOUTS) {
> > > + condlog(3, "%d:%d : starting checker with timeout",
> > > + major(ct->tcx.devt), minor(ct->tcx.devt));
> > > + ct->tcx.state = PATH_PENDING;
> > > + ct->tcx.msgid = MSG_TUR_RUNNING;
> > > + ct->rtx = get_runner(runner_callback, &ct->tcx,
> > > + sizeof(ct->tcx), 1000000 * c->timeout);
> > > + } else {
> > > + condlog(3, "%d:%d : starting checker without timeout",
> > > + major(ct->tcx.devt), minor(ct->tcx.devt));
> > > + ct->tcx.state = PATH_TIMEOUT;
> >
> > shouldn't we set ct->tcx.msgid = MSG_TUR_TIMEOUT here?
> >
>
> This was on purpose, but of course debatable. My thinking was this: The
> previous checker has timed out and we're just retrying, therefore
> setting PATH_TIMEOUT correctly reflects the state. But the message,
> which is mainly for the user, might as well express the fact that
> another checker is indeed currently running.
But won't ct->tcx.msgid most likely already be set to
MSG_TUR_TIMEOUT from the previous failed runner?
check_runner_state() will update it, but the
c->msgid = ct->tcx.msgid
line just below this will use whatever happens to be in there, since we
don't set it. It seems like we should set it to something. MSG_TUR_RUNNING
works too.
> If we change the timeout logic, this change will be obsolete anyway.
Yep. But we still haven't decided that one way or the other.
-Ben
> Regards
> Martin
end of thread, newest message: 2026-03-24 14:46 UTC
Thread overview: 11+ messages
2026-03-19 22:13 [PATCH 0/4] multipath-tools: generic async threads for TUR checker Martin Wilck
2026-03-19 22:13 ` [PATCH 1/4] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
2026-03-24 5:35 ` Benjamin Marzinski
2026-03-19 22:13 ` [PATCH 2/4] libmpathutil: add generic implementation for checker thread runners Martin Wilck
2026-03-24 5:44 ` Benjamin Marzinski
2026-03-19 22:13 ` [PATCH 3/4] multipath-tools tests: add test program for " Martin Wilck
2026-03-24 5:47 ` Benjamin Marzinski
2026-03-19 22:13 ` [PATCH 4/4] libmultipath: TUR checker: use runner threads Martin Wilck
2026-03-24 6:38 ` Benjamin Marzinski
2026-03-24 12:24 ` Martin Wilck
2026-03-24 14:46 ` Benjamin Marzinski