[PATCH v2 0/5] multipath-tools: generic async threads for TUR checker

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/5] multipath-tools: generic async threads for TUR checker
@ 2026-04-23 20:44 Martin Wilck
  2026-04-23 20:44 ` [PATCH v2 1/5] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Martin Wilck @ 2026-04-23 20:44 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
  Cc: Martin Wilck

Motivated by the recent submission of an ALUA checker for multipath-tools by
Brian Bunker, I am proposing a generic framework for asynchronous checker
threads in multipath-tools.

The first patch is a small fix I came up with while testing this.
2/5 is the actual implementation, 3/5 test code, and 4/4 modifies the
TUR checker to the new library code. 5/5 is new, it simplifies testing
the timeout handling code.

This makes the logic of the TUR checker easier to understand, and should make
it easier for Brian to implement the ALUA checker based on the same framework.

Further improvements on top of this are possible. The new TUR code, except for
the tur_check() function itself, is pretty generic and would allow abstracting
an "async checker" model with just a few changes. This would make it possible
to switch also the legacy checkers to an asynchronous mode of operation.

Comments and reviews welcome.
Martin

Changes v1 -> v2 (based on the review by Ben Marzinski):

- The design of the runner code has been changed such that no caller-side
  references are dropped automatically any more. The caller needs
  to call release_runner() explicitly to drop its reference, independent
  of the runner status. This simplifies the code and allows implementing
  the MAX_TIMEOUTS logic in tur.c as it used to be. The overall API is
  not more complex, as release_runner() replaces cancel_runner().
- An additional state RUNNER_DEAD is introduced, denoting a state in which
  the callback result is unavailable, but the thread has finished.
- The naming of variables and struct members has been cleaned up.
- 02/05: specified alignment for struct runner_context.data
- 02/05: removed most assert() statements
- 02/05: removed superfluous cmm_smp_wmb()
- 02/05: perform all state modifications in the runner's cleanup function
- 03/05: fixed signal handler name
- 03/05: reduced the run time of the runner test script
- 04/05: revert back to the original MAX_TIMEOUTS logic, keeping a ref the "last"
  runner context until the thread eventually terminates
- 04/05: consistent handling of state and checker message
- 05/05: new

Martin Wilck (5):
  multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN
  libmpathutil: add generic implementation for checker thread runners
  multipath-tools tests: add test program for thread runners
  libmultipath: TUR checker: use runner threads
  libmultipath: tur checker: improve tur_deep_sleep() test

 libmpathutil/Makefile             |   2 +-
 libmpathutil/libmpathutil.version |   7 +-
 libmpathutil/runner.c             | 221 ++++++++++++
 libmpathutil/runner.h             | 102 ++++++
 libmultipath/checkers/tur.c       | 369 +++++++-------------
 multipathd/main.c                 |   4 +-
 tests/Makefile                    |  15 +-
 tests/runner-test.sh              |  55 +++
 tests/runner-test.supp            |  15 +
 tests/runner.c                    | 552 ++++++++++++++++++++++++++++++
 10 files changed, 1081 insertions(+), 261 deletions(-)
 create mode 100644 libmpathutil/runner.c
 create mode 100644 libmpathutil/runner.h
 create mode 100755 tests/runner-test.sh
 create mode 100644 tests/runner-test.supp
 create mode 100644 tests/runner.c

-- 
2.53.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/5] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN
  2026-04-23 20:44 [PATCH v2 0/5] multipath-tools: generic async threads for TUR checker Martin Wilck
@ 2026-04-23 20:44 ` Martin Wilck
  2026-04-23 20:44 ` [PATCH v2 2/5] libmpathutil: add generic implementation for checker thread runners Martin Wilck
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2026-04-23 20:44 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
  Cc: Martin Wilck

We map PATH_TIMEOUT to PATH_DOWN in pathinfo(), but not in get_new_state().
Do it there, too, to treat the states consistently.
This avoids logging "checker timed out" twice in update_path_state(), even
if log_checker_err is set to "once".

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index d87c7b1..9bce4d6 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2503,8 +2503,10 @@ get_new_state(struct path *pp)
 	 * Wait for uevent for removed paths;
 	 * some LLDDs like zfcp keep paths unavailable
 	 * without sending uevents.
+	 * Also, map PATH_TIMEOUT to PATH_DOWN here, like we do in
+	 * pathinfo().
 	 */
-	if (newstate == PATH_REMOVED)
+	if (newstate == PATH_REMOVED || newstate == PATH_TIMEOUT)
 		newstate = PATH_DOWN;
 
 	/*
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/5] libmpathutil: add generic implementation for checker thread runners
  2026-04-23 20:44 [PATCH v2 0/5] multipath-tools: generic async threads for TUR checker Martin Wilck
  2026-04-23 20:44 ` [PATCH v2 1/5] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
@ 2026-04-23 20:44 ` Martin Wilck
  2026-04-28  0:20   ` Benjamin Marzinski
  2026-04-23 20:44 ` [PATCH v2 3/5] multipath-tools tests: add test program for " Martin Wilck
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-04-23 20:44 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
  Cc: Martin Wilck

This code adds a generic, abstract implementation of the kind of threads we
need for checkers: detached threads that may time out. The design is such
that these threads never access any memory of the calling program,
simplifying the management of lifetimes of objects.

See the documentation of the API in "runner.h".

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmpathutil/Makefile             |   2 +-
 libmpathutil/libmpathutil.version |   7 +-
 libmpathutil/runner.c             | 221 ++++++++++++++++++++++++++++++
 libmpathutil/runner.h             | 102 ++++++++++++++
 4 files changed, 330 insertions(+), 2 deletions(-)
 create mode 100644 libmpathutil/runner.c
 create mode 100644 libmpathutil/runner.h

diff --git a/libmpathutil/Makefile b/libmpathutil/Makefile
index e1cc2c6..bdd7063 100644
--- a/libmpathutil/Makefile
+++ b/libmpathutil/Makefile
@@ -13,7 +13,7 @@ LIBDEPS += -lpthread -ldl -ludev -L$(mpathcmddir) -lmpathcmd $(SYSTEMD_LIBDEPS)
 
 # other object files
 OBJS := mt-libudev.o parser.o vector.o util.o debug.o time-util.o \
-	uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o
+	uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o runner.o
 
 all:	$(DEVLIB)
 
diff --git a/libmpathutil/libmpathutil.version b/libmpathutil/libmpathutil.version
index 22ef994..c69120d 100644
--- a/libmpathutil/libmpathutil.version
+++ b/libmpathutil/libmpathutil.version
@@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
 	put_multipath_config;
 };
 
-LIBMPATHUTIL_5.0 {
+LIBMPATHUTIL_6.0 {
 global:
 	alloc_bitfield;
 	alloc_strvec;
@@ -62,6 +62,7 @@ global:
 	cleanup_vector;
 	cleanup_vector_free;
 	convert_dev;
+	check_runner;
 	dlog;
 	filepresent;
 	fill_strbuf;
@@ -72,6 +73,8 @@ global:
 	free_strvec;
 	get_linux_version_code;
 	get_monotonic_time;
+	get_persistent_runner;
+	get_runner;
 	get_strbuf_buf__;
 	get_next_string;
 	get_strbuf_len;
@@ -161,7 +164,9 @@ global:
 	process_file;
 	pthread_cond_init_mono;
 	recv_packet;
+	release_runner;
 	reset_strbuf;
+	runner_state_name;
 	safe_write;
 	send_packet;
 	set_max_fds;
diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
new file mode 100644
index 0000000..f5c5482
--- /dev/null
+++ b/libmpathutil/runner.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+// Copyright (c) 2026 SUSE LLC
+#include <assert.h>
+#include <time.h>
+#include <pthread.h>
+#include <urcu/uatomic.h>
+#include "util.h"
+#include "debug.h"
+#include "time-util.h"
+#include "runner.h"
+
+#define STACK_SIZE (4 * 1024)
+#define MILLION 1000000
+
+const char *runner_state_name(int state)
+{
+	// clang-format off
+	static const char * const state_name_[] = {
+		[RUNNER_IDLE] = "idle",
+		[RUNNER_RUNNING] = "running",
+		[RUNNER_DONE] = "done",
+		[RUNNER_CANCELLED] = "cancelled",
+		[RUNNER_DEAD] = "dead"
+	};
+	// clang-format on
+
+	if (state < RUNNER_IDLE || state > RUNNER_DEAD)
+		return "unknown";
+	return state_name_[state];
+}
+
+struct runner_context {
+	int refcount;
+	int status;
+	struct timespec deadline;
+	pthread_t thr;
+	void (*func)(void *data);
+	/* User data will be copied into this area */
+	char __attribute__((aligned(sizeof(void *)))) data[];
+};
+
+static void release_context(struct runner_context *rctx)
+{
+	int n;
+
+	n = uatomic_sub_return(&rctx->refcount, 1);
+	assert(n >= 0);
+
+	if (n == 0)
+		free(rctx);
+}
+
+static void cleanup_context(struct runner_context **prctx)
+{
+	struct runner_context *rctx = *prctx;
+	int st;
+
+	if (!rctx)
+		return;
+
+	st = uatomic_cmpxchg(&rctx->status, RUNNER_RUNNING, RUNNER_DONE);
+	if (st != RUNNER_RUNNING) {
+		uatomic_cmpxchg(&rctx->status, st, RUNNER_DEAD);
+		if (st != RUNNER_CANCELLED)
+			condlog(2, "%s: runner %p finished in state %s",
+				__func__, rctx, runner_state_name(st));
+	}
+	release_context(rctx);
+}
+
+static void *runner_thread(void *arg)
+{
+	int st, refcount;
+	/*
+	 * The cleanup function makes sure memory is freed if the thread is
+	 * cancelled (-fexceptions).
+	 */
+	struct runner_context *rctx __attribute__((cleanup(cleanup_context))) = arg;
+
+	refcount = uatomic_add_return(&rctx->refcount, 1);
+	assert(refcount == 2);
+
+	st = uatomic_cmpxchg(&rctx->status, RUNNER_IDLE, RUNNER_RUNNING);
+	/*
+	 * The thread has already been cancelled before it was even started.
+	 * In this case, cancel_runner() doesn't release the context.
+	 * Do it here. See comments for RUNNER_IDLE in cancel_runner().
+	 */
+	if (st != RUNNER_IDLE) {
+		release_context(rctx);
+		return NULL;
+	}
+
+	(*rctx->func)(rctx->data);
+	return NULL;
+}
+
+static int cancel_runner(struct runner_context *rctx)
+{
+	int st, st_new;
+	int level = 4, retry = 1;
+
+repeat:
+	st = uatomic_cmpxchg(&rctx->status, RUNNER_RUNNING, RUNNER_CANCELLED);
+	st_new = st;
+	switch (st) {
+	case RUNNER_IDLE:
+		/*
+		 * Race with thread startup.
+		 *
+		 * If after the following cmpxchg st is still IDLE, the cmpxchg
+		 * in runner_thread() will return CANCELLED, and the context
+		 * will be relased there. Otherwise, the thread has switched
+		 * to RUNNING in the meantime, and we will be able to cancel
+		 * it regularly if we retry.
+		 */
+		if (retry--) {
+			st = uatomic_cmpxchg(&rctx->status, RUNNER_IDLE,
+					     RUNNER_CANCELLED);
+			if (st == RUNNER_IDLE)
+				st_new = RUNNER_CANCELLED;
+			else
+				goto repeat;
+		}
+		break;
+	case RUNNER_RUNNING:
+		pthread_cancel(rctx->thr);
+		st_new = RUNNER_CANCELLED;
+		/* fallthrough */
+	case RUNNER_CANCELLED:
+		break;
+	case RUNNER_DONE:
+		st_new = RUNNER_DEAD;
+		/* fallthrough */
+	case RUNNER_DEAD:
+		level = 3;
+		break;
+	}
+	condlog(level, "%s: runner %p cancelled in state '%s'", __func__, rctx,
+		runner_state_name(st));
+	return st_new;
+}
+
+void release_runner(struct runner_context *rctx)
+{
+	cancel_runner(rctx);
+	release_context(rctx);
+}
+
+int check_runner(struct runner_context *rctx, void *data, unsigned int size)
+{
+	int st = uatomic_read(&rctx->status);
+
+	switch (st) {
+	case RUNNER_DONE:
+		if (data)
+			/* hand back the data to the caller */
+			memcpy(data, rctx->data, size);
+		/* fallthrough */
+	case RUNNER_DEAD:
+	case RUNNER_CANCELLED:
+		return st;
+	case RUNNER_IDLE:
+	case RUNNER_RUNNING:
+		if (rctx->deadline.tv_sec != 0 || rctx->deadline.tv_nsec != 0) {
+			struct timespec now;
+
+			get_monotonic_time(&now);
+			if (timespeccmp(&rctx->deadline, &now) <= 0)
+				return cancel_runner(rctx);
+		}
+		/* don't bother the caller with RUNNER_IDLE */
+		return RUNNER_RUNNING;
+	default:
+		condlog(1, "%s: runner in impossible state '%s'", __func__,
+			runner_state_name(st));
+		assert(false);
+		return st;
+	}
+}
+
+struct runner_context *get_runner(runner_func func, void *data,
+				  unsigned int size, unsigned long timeout_usec)
+{
+	static const struct timespec time_zero = {.tv_sec = 0};
+	struct runner_context *rctx;
+	pthread_attr_t attr;
+	int rc;
+
+	if (!func || !data || size <= 0) {
+		condlog(0, "%s: illegal arguments", __func__);
+		return NULL;
+	}
+
+	rctx = malloc(sizeof(*rctx) + size);
+	if (!rctx)
+		return NULL;
+
+	rctx->func = func;
+	uatomic_set(&rctx->refcount, 1);
+	uatomic_set(&rctx->status, RUNNER_IDLE);
+	memcpy(rctx->data, data, size);
+
+	if (timeout_usec) {
+		get_monotonic_time(&rctx->deadline);
+		rctx->deadline.tv_sec += timeout_usec / MILLION;
+		rctx->deadline.tv_nsec += (timeout_usec % MILLION) * 1000;
+	} else
+		rctx->deadline = time_zero;
+
+	setup_thread_attr(&attr, STACK_SIZE, 1);
+	rc = pthread_create(&rctx->thr, &attr, runner_thread, rctx);
+	pthread_attr_destroy(&attr);
+
+	if (rc) {
+		condlog(1, "%s: pthread_create(): %s", __func__, strerror(rc));
+		release_context(rctx);
+		return NULL;
+	}
+	return rctx;
+}
diff --git a/libmpathutil/runner.h b/libmpathutil/runner.h
new file mode 100644
index 0000000..cb2ad94
--- /dev/null
+++ b/libmpathutil/runner.h
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+// Copyright (c) 2026 SUSE LLC
+#ifndef RUNNER_H_INCLUDED
+#define RUNNER_H_INCLUDED
+
+enum runner_status {
+	/**
+	 * Initial state. Runner thread has not started yet.
+	 */
+	RUNNER_IDLE,
+	/**
+	 * The runner thread is running in @runner_func (see below).
+	 */
+	RUNNER_RUNNING,
+	/**
+	 * @runner_func has terminated. This is the only only state
+	 * in which @check_runner can obtain user data in the @data
+	 * parameter.
+	 */
+	RUNNER_DONE,
+	/**
+	 * The runner thread has been cancelled (usually because of a timeout),
+	 * but @runner_func is still running.
+	 */
+	RUNNER_CANCELLED,
+	/**
+	 * The runner thread has terminated without providing user data
+	 * (usually after a timeout).
+	 */
+	RUNNER_DEAD,
+};
+
+typedef void (*runner_func)(void *data);
+struct runner_context;
+
+/**
+ * runner_state_name(): helper for printing runner states
+ *
+ * @param state: a valid @runner_status value
+ * @returns: a string describing the status
+ */
+const char *runner_state_name(int state);
+
+/**
+ * get_runner(): start a runner thread
+ *
+ * This function starts a runner thread that calls @func(@data).
+ * The thread is created as detached thread.
+ * @data will be copied to thread-private memory, which will be freed when
+ * the runner terminates.
+ * Output values can be retrieved with @check_runner().
+ *
+ * @param func: the worker function to invoke
+ *        This parameter must not be NULL.
+ * @param data: pointer the data to pass to the function (input and output)
+ *        This parameter must not be NULL.
+ * @param size: the size (in bytes) of the data passed to the function
+ *        This parameter must be positive.
+ * @param timeout_usec: timeout (in microseconds) after which to cancel the
+ * runner. If it is 0, the runner will not time out.
+ * @returns: a runner context that must be passed to the functions below.
+ */
+struct runner_context *get_runner(runner_func func, void *data,
+				  unsigned int size, unsigned long timeout_usec);
+
+/**
+ * release_runner(): release a runner context
+ *
+ * This function should be called when the caller has no interest to
+ * operate on the given @runner_context any more.
+ * If necessary, the runner thread will be cancelled.
+ * Upon return of this function, @rctx becomes stale and shouldn't accessed
+ * any more.
+ *
+ * @param rctx: the context of the runner to be released.
+ */
+void release_runner(struct runner_context *rctx);
+
+/**
+ * check_runner(): query the state of a runner thread and obtain results
+ *
+ * Check the state of a runner previously started with @get_runner. If the
+ * thread function has completed, @RUNNER_DONE will be returned, and the
+ * user data will be copied into the @data argument. If @check_runner returns
+ * anything else, @data contains no valid data. The @size argument
+ * will typically be the same as the @size passed to @get_runner (meaning
+ * that @data represents an object of the same type as the @data argument
+ * passed to @get_runner previously).
+ *
+ * Side effect: If the runner has timed out, the thread will be cancelled.
+ *
+ * @param rctx: the context of the runner to be queried
+ * @param data: memory pointer that will receive results of the worker
+ * function. Can be NULL, in which case no data will be copied.
+ * @param size: size of the memory pointed to by data. It must be no bigger
+ *        than the size of the memory passed to @get_runner for this runner.
+ *        It can be smaller, but no more than @size bytes will be copied.
+ * @returns: @RUNNER_RUNNING, @RUNNER_DONE, @RUNNER_CANCELLED, or @RUNNER_DEAD.
+ */
+int check_runner(struct runner_context *rctx, void *data, unsigned int size);
+
+#endif /* RUNNER_H_INCLUDED */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/5] libmpathutil: add generic implementation for checker thread runners
  2026-04-23 20:44 ` [PATCH v2 2/5] libmpathutil: add generic implementation for checker thread runners Martin Wilck
@ 2026-04-28  0:20   ` Benjamin Marzinski
  2026-04-28 10:47     ` Martin Wilck
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Marzinski @ 2026-04-28  0:20 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck

On Thu, Apr 23, 2026 at 10:44:43PM +0200, Martin Wilck wrote:
> This code adds a generic, abstract implementation of the kind of threads we
> need for checkers: detached threads that may time out. The design is such
> that these threads never access any memory of the calling program,
> simplifying the management of lifetimes of objects.
> 
> See the documentation of the API in "runner.h".
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmpathutil/Makefile             |   2 +-
>  libmpathutil/libmpathutil.version |   7 +-
>  libmpathutil/runner.c             | 221 ++++++++++++++++++++++++++++++
>  libmpathutil/runner.h             | 102 ++++++++++++++
>  4 files changed, 330 insertions(+), 2 deletions(-)
>  create mode 100644 libmpathutil/runner.c
>  create mode 100644 libmpathutil/runner.h
> 
> diff --git a/libmpathutil/Makefile b/libmpathutil/Makefile
> index e1cc2c6..bdd7063 100644
> --- a/libmpathutil/Makefile
> +++ b/libmpathutil/Makefile
> @@ -13,7 +13,7 @@ LIBDEPS += -lpthread -ldl -ludev -L$(mpathcmddir) -lmpathcmd $(SYSTEMD_LIBDEPS)
>  
>  # other object files
>  OBJS := mt-libudev.o parser.o vector.o util.o debug.o time-util.o \
> -	uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o
> +	uxsock.o log_pthread.o log.o strbuf.o globals.o msort.o runner.o
>  
>  all:	$(DEVLIB)
>  
> diff --git a/libmpathutil/libmpathutil.version b/libmpathutil/libmpathutil.version
> index 22ef994..c69120d 100644
> --- a/libmpathutil/libmpathutil.version
> +++ b/libmpathutil/libmpathutil.version
> @@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
>  	put_multipath_config;
>  };
>  
> -LIBMPATHUTIL_5.0 {
> +LIBMPATHUTIL_6.0 {
>  global:
>  	alloc_bitfield;
>  	alloc_strvec;
> @@ -62,6 +62,7 @@ global:
>  	cleanup_vector;
>  	cleanup_vector_free;
>  	convert_dev;
> +	check_runner;
>  	dlog;
>  	filepresent;
>  	fill_strbuf;
> @@ -72,6 +73,8 @@ global:
>  	free_strvec;
>  	get_linux_version_code;
>  	get_monotonic_time;
> +	get_persistent_runner;
> +	get_runner;
>  	get_strbuf_buf__;
>  	get_next_string;
>  	get_strbuf_len;
> @@ -161,7 +164,9 @@ global:
>  	process_file;
>  	pthread_cond_init_mono;
>  	recv_packet;
> +	release_runner;
>  	reset_strbuf;
> +	runner_state_name;
>  	safe_write;
>  	send_packet;
>  	set_max_fds;
> diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
> new file mode 100644
> index 0000000..f5c5482
> --- /dev/null
> +++ b/libmpathutil/runner.c
> @@ -0,0 +1,221 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +// Copyright (c) 2026 SUSE LLC
> +#include <assert.h>
> +#include <time.h>
> +#include <pthread.h>
> +#include <urcu/uatomic.h>
> +#include "util.h"
> +#include "debug.h"
> +#include "time-util.h"
> +#include "runner.h"
> +
> +#define STACK_SIZE (4 * 1024)
> +#define MILLION 1000000
> +
> +const char *runner_state_name(int state)
> +{
> +	// clang-format off
> +	static const char * const state_name_[] = {
> +		[RUNNER_IDLE] = "idle",
> +		[RUNNER_RUNNING] = "running",
> +		[RUNNER_DONE] = "done",
> +		[RUNNER_CANCELLED] = "cancelled",
> +		[RUNNER_DEAD] = "dead"
> +	};
> +	// clang-format on
> +
> +	if (state < RUNNER_IDLE || state > RUNNER_DEAD)
> +		return "unknown";
> +	return state_name_[state];
> +}
> +
> +struct runner_context {
> +	int refcount;
> +	int status;
> +	struct timespec deadline;
> +	pthread_t thr;
> +	void (*func)(void *data);
> +	/* User data will be copied into this area */
> +	char __attribute__((aligned(sizeof(void *)))) data[];
> +};
> +
> +static void release_context(struct runner_context *rctx)
> +{
> +	int n;
> +
> +	n = uatomic_sub_return(&rctx->refcount, 1);
> +	assert(n >= 0);
> +
> +	if (n == 0)
> +		free(rctx);
> +}
> +
> +static void cleanup_context(struct runner_context **prctx)
> +{
> +	struct runner_context *rctx = *prctx;
> +	int st;
> +
> +	if (!rctx)
> +		return;
> +
> +	st = uatomic_cmpxchg(&rctx->status, RUNNER_RUNNING, RUNNER_DONE);
> +	if (st != RUNNER_RUNNING) {
> +		uatomic_cmpxchg(&rctx->status, st, RUNNER_DEAD);
> +		if (st != RUNNER_CANCELLED)
> +			condlog(2, "%s: runner %p finished in state %s",
> +				__func__, rctx, runner_state_name(st));
> +	}
> +	release_context(rctx);
> +}
> +
> +static void *runner_thread(void *arg)
> +{
> +	int st, refcount;
> +	/*
> +	 * The cleanup function makes sure memory is freed if the thread is
> +	 * cancelled (-fexceptions).
> +	 */
> +	struct runner_context *rctx __attribute__((cleanup(cleanup_context))) = arg;
> +
> +	refcount = uatomic_add_return(&rctx->refcount, 1);
> +	assert(refcount == 2);
> +
> +	st = uatomic_cmpxchg(&rctx->status, RUNNER_IDLE, RUNNER_RUNNING);
> +	/*
> +	 * The thread has already been cancelled before it was even started.
> +	 * In this case, cancel_runner() doesn't release the context.
> +	 * Do it here. See comments for RUNNER_IDLE in cancel_runner().
> +	 */

I'm confused about this, and AFAICS, it doesn't appear to be true. If
you call check_runner() and it returns RUNNER_CANCELLED, you have no
idea if the runner was previously in RUNNER_IDLE, which means that you
cannot touch the context anymore, or if the runner was in
RUNNER_RUNNING, in which case it's your job to release the context.
In the TUR checker, check_runner_state() just calls release_runner(),
which will cause a UAF error if the runner was in RUNNER_IDLE before
it got cancelled.

It seems to me that the correct way to do this would be to set
rctx->refcount to 2 before creating the runner thread, and not even have
a RUNNER_IDLE state. Perhaps I'm just being stupid, but I don't see the
necessity. If you set a thread to deferred cancellation, I don't believe
it can get cancelled between when you create it, and when it hits it's
first cancellation point, so you shouldn't have to worry about getting
cancelled before the runner's cleanup_context handler isset up. I'm
pretty sure that if the thread gets created successfully, it should
always drop its reference count when it exits, regardless of if/when it
gets cancelled. And if you fail to create the thread, you can just drop
the refcount back to 1 before calling release_context() in get_runner().

-Ben

> +	if (st != RUNNER_IDLE) {
> +		release_context(rctx);
> +		return NULL;
> +	}
> +
> +	(*rctx->func)(rctx->data);
> +	return NULL;
> +}
> +
> +static int cancel_runner(struct runner_context *rctx)
> +{
> +	int st, st_new;
> +	int level = 4, retry = 1;
> +
> +repeat:
> +	st = uatomic_cmpxchg(&rctx->status, RUNNER_RUNNING, RUNNER_CANCELLED);
> +	st_new = st;
> +	switch (st) {
> +	case RUNNER_IDLE:
> +		/*
> +		 * Race with thread startup.
> +		 *
> +		 * If after the following cmpxchg st is still IDLE, the cmpxchg
> +		 * in runner_thread() will return CANCELLED, and the context
> +		 * will be relased there. Otherwise, the thread has switched
> +		 * to RUNNING in the meantime, and we will be able to cancel
> +		 * it regularly if we retry.
> +		 */
> +		if (retry--) {
> +			st = uatomic_cmpxchg(&rctx->status, RUNNER_IDLE,
> +					     RUNNER_CANCELLED);
> +			if (st == RUNNER_IDLE)
> +				st_new = RUNNER_CANCELLED;
> +			else
> +				goto repeat;
> +		}
> +		break;
> +	case RUNNER_RUNNING:
> +		pthread_cancel(rctx->thr);
> +		st_new = RUNNER_CANCELLED;
> +		/* fallthrough */
> +	case RUNNER_CANCELLED:
> +		break;
> +	case RUNNER_DONE:
> +		st_new = RUNNER_DEAD;
> +		/* fallthrough */
> +	case RUNNER_DEAD:
> +		level = 3;
> +		break;
> +	}
> +	condlog(level, "%s: runner %p cancelled in state '%s'", __func__, rctx,
> +		runner_state_name(st));
> +	return st_new;
> +}
> +
> +void release_runner(struct runner_context *rctx)
> +{
> +	cancel_runner(rctx);
> +	release_context(rctx);
> +}
> +
> +int check_runner(struct runner_context *rctx, void *data, unsigned int size)
> +{
> +	int st = uatomic_read(&rctx->status);
> +
> +	switch (st) {
> +	case RUNNER_DONE:
> +		if (data)
> +			/* hand back the data to the caller */
> +			memcpy(data, rctx->data, size);
> +		/* fallthrough */
> +	case RUNNER_DEAD:
> +	case RUNNER_CANCELLED:
> +		return st;
> +	case RUNNER_IDLE:
> +	case RUNNER_RUNNING:
> +		if (rctx->deadline.tv_sec != 0 || rctx->deadline.tv_nsec != 0) {
> +			struct timespec now;
> +
> +			get_monotonic_time(&now);
> +			if (timespeccmp(&rctx->deadline, &now) <= 0)
> +				return cancel_runner(rctx);
> +		}
> +		/* don't bother the caller with RUNNER_IDLE */
> +		return RUNNER_RUNNING;
> +	default:
> +		condlog(1, "%s: runner in impossible state '%s'", __func__,
> +			runner_state_name(st));
> +		assert(false);
> +		return st;
> +	}
> +}
> +
> +struct runner_context *get_runner(runner_func func, void *data,
> +				  unsigned int size, unsigned long timeout_usec)
> +{
> +	static const struct timespec time_zero = {.tv_sec = 0};
> +	struct runner_context *rctx;
> +	pthread_attr_t attr;
> +	int rc;
> +
> +	if (!func || !data || size <= 0) {
> +		condlog(0, "%s: illegal arguments", __func__);
> +		return NULL;
> +	}
> +
> +	rctx = malloc(sizeof(*rctx) + size);
> +	if (!rctx)
> +		return NULL;
> +
> +	rctx->func = func;
> +	uatomic_set(&rctx->refcount, 1);
> +	uatomic_set(&rctx->status, RUNNER_IDLE);
> +	memcpy(rctx->data, data, size);
> +
> +	if (timeout_usec) {
> +		get_monotonic_time(&rctx->deadline);
> +		rctx->deadline.tv_sec += timeout_usec / MILLION;
> +		rctx->deadline.tv_nsec += (timeout_usec % MILLION) * 1000;
> +	} else
> +		rctx->deadline = time_zero;
> +
> +	setup_thread_attr(&attr, STACK_SIZE, 1);
> +	rc = pthread_create(&rctx->thr, &attr, runner_thread, rctx);
> +	pthread_attr_destroy(&attr);
> +
> +	if (rc) {
> +		condlog(1, "%s: pthread_create(): %s", __func__, strerror(rc));
> +		release_context(rctx);
> +		return NULL;
> +	}
> +	return rctx;
> +}
> diff --git a/libmpathutil/runner.h b/libmpathutil/runner.h
> new file mode 100644
> index 0000000..cb2ad94
> --- /dev/null
> +++ b/libmpathutil/runner.h
> @@ -0,0 +1,102 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +// Copyright (c) 2026 SUSE LLC
> +#ifndef RUNNER_H_INCLUDED
> +#define RUNNER_H_INCLUDED
> +
> +enum runner_status {
> +	/**
> +	 * Initial state. Runner thread has not started yet.
> +	 */
> +	RUNNER_IDLE,
> +	/**
> +	 * The runner thread is running in @runner_func (see below).
> +	 */
> +	RUNNER_RUNNING,
> +	/**
> +	 * @runner_func has terminated. This is the only only state
> +	 * in which @check_runner can obtain user data in the @data
> +	 * parameter.
> +	 */
> +	RUNNER_DONE,
> +	/**
> +	 * The runner thread has been cancelled (usually because of a timeout),
> +	 * but @runner_func is still running.
> +	 */
> +	RUNNER_CANCELLED,
> +	/**
> +	 * The runner thread has terminated without providing user data
> +	 * (usually after a timeout).
> +	 */
> +	RUNNER_DEAD,
> +};
> +
> +typedef void (*runner_func)(void *data);
> +struct runner_context;
> +
> +/**
> + * runner_state_name(): helper for printing runner states
> + *
> + * @param state: a valid @runner_status value
> + * @returns: a string describing the status
> + */
> +const char *runner_state_name(int state);
> +
> +/**
> + * get_runner(): start a runner thread
> + *
> + * This function starts a runner thread that calls @func(@data).
> + * The thread is created as detached thread.
> + * @data will be copied to thread-private memory, which will be freed when
> + * the runner terminates.
> + * Output values can be retrieved with @check_runner().
> + *
> + * @param func: the worker function to invoke
> + *        This parameter must not be NULL.
> + * @param data: pointer the data to pass to the function (input and output)
> + *        This parameter must not be NULL.
> + * @param size: the size (in bytes) of the data passed to the function
> + *        This parameter must be positive.
> + * @param timeout_usec: timeout (in microseconds) after which to cancel the
> + * runner. If it is 0, the runner will not time out.
> + * @returns: a runner context that must be passed to the functions below.
> + */
> +struct runner_context *get_runner(runner_func func, void *data,
> +				  unsigned int size, unsigned long timeout_usec);
> +
> +/**
> + * release_runner(): release a runner context
> + *
> + * This function should be called when the caller has no interest to
> + * operate on the given @runner_context any more.
> + * If necessary, the runner thread will be cancelled.
> + * Upon return of this function, @rctx becomes stale and shouldn't accessed
> + * any more.
> + *
> + * @param rctx: the context of the runner to be released.
> + */
> +void release_runner(struct runner_context *rctx);
> +
> +/**
> + * check_runner(): query the state of a runner thread and obtain results
> + *
> + * Check the state of a runner previously started with @get_runner. If the
> + * thread function has completed, @RUNNER_DONE will be returned, and the
> + * user data will be copied into the @data argument. If @check_runner returns
> + * anything else, @data contains no valid data. The @size argument
> + * will typically be the same as the @size passed to @get_runner (meaning
> + * that @data represents an object of the same type as the @data argument
> + * passed to @get_runner previously).
> + *
> + * Side effect: If the runner has timed out, the thread will be cancelled.
> + *
> + * @param rctx: the context of the runner to be queried
> + * @param data: memory pointer that will receive results of the worker
> + * function. Can be NULL, in which case no data will be copied.
> + * @param size: size of the memory pointed to by data. It must be no bigger
> + *        than the size of the memory passed to @get_runner for this runner.
> + *        It can be smaller, but no more than @size bytes will be copied.
> + * @returns: @RUNNER_RUNNING, @RUNNER_DONE, @RUNNER_CANCELLED, or @RUNNER_DEAD.
> + */
> +int check_runner(struct runner_context *rctx, void *data, unsigned int size);
> +
> +#endif /* RUNNER_H_INCLUDED */
> -- 
> 2.53.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/5] libmpathutil: add generic implementation for checker thread runners
  2026-04-28  0:20   ` Benjamin Marzinski
@ 2026-04-28 10:47     ` Martin Wilck
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2026-04-28 10:47 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, Brian Bunker, dm-devel

Hi Ben,

On Mon, 2026-04-27 at 20:20 -0400, Benjamin Marzinski wrote:
> On Thu, Apr 23, 2026 at 10:44:43PM +0200, Martin Wilck wrote:
> > This code adds a generic, abstract implementation of the kind of
> > threads we
> > need for checkers: detached threads that may time out. The design
> > is such
> > that these threads never access any memory of the calling program,
> > simplifying the management of lifetimes of objects.
> > 
> > See the documentation of the API in "runner.h".
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  libmpathutil/Makefile             |   2 +-
> >  libmpathutil/libmpathutil.version |   7 +-
> >  libmpathutil/runner.c             | 221
> > ++++++++++++++++++++++++++++++
> >  libmpathutil/runner.h             | 102 ++++++++++++++
> >  4 files changed, 330 insertions(+), 2 deletions(-)
> >  create mode 100644 libmpathutil/runner.c
> >  create mode 100644 libmpathutil/runner.h
> > 
> > 
[...]
> > diff --git a/libmpathutil/runner.c b/libmpathutil/runner.c
> > new file mode 100644
> > index 0000000..f5c5482
> > --- /dev/null
> > +++ b/libmpathutil/runner.c
[...]
> > +static void *runner_thread(void *arg)
> > +{
> > +	int st, refcount;
> > +	/*
> > +	 * The cleanup function makes sure memory is freed if the
> > thread is
> > +	 * cancelled (-fexceptions).
> > +	 */
> > +	struct runner_context *rctx
> > __attribute__((cleanup(cleanup_context))) = arg;
> > +
> > +	refcount = uatomic_add_return(&rctx->refcount, 1);
> > +	assert(refcount == 2);
> > +
> > +	st = uatomic_cmpxchg(&rctx->status, RUNNER_IDLE,
> > RUNNER_RUNNING);
> > +	/*
> > +	 * The thread has already been cancelled before it was
> > even started.
> > +	 * In this case, cancel_runner() doesn't release the
> > context.
> > +	 * Do it here. See comments for RUNNER_IDLE in
> > cancel_runner().
> > +	 */
> 
> I'm confused about this, and AFAICS, it doesn't appear to be true. If
> you call check_runner() and it returns RUNNER_CANCELLED, you have no
> idea if the runner was previously in RUNNER_IDLE, which means that
> you
> cannot touch the context anymore, or if the runner was in
> RUNNER_RUNNING, in which case it's your job to release the context.
> In the TUR checker, check_runner_state() just calls release_runner(),
> which will cause a UAF error if the runner was in RUNNER_IDLE before
> it got cancelled.

Right, good catch, thanks for finding it. This was a remnant of the
previous code, where the caller didn't have to release the context
after RUNNER_CANCELLED. 

Releasing the context here is just wrong, I'll drop the line.

I had expected my tests to capture an UAF like this, but I made a dumb
mistake: I used a timeout of 0 in my test code, which means "infinite"
timeout for the runner, and thus my first test case, which was intended
to capture thus this kind of issue, was ineffective. I will change my
test program now to use 1us timeout if "-t 0" is specified. I will also
add a compile-time option to artificially delay thread startup in order
to test this.

> It seems to me that the correct way to do this would be to set
> rctx->refcount to 2 before creating the runner thread, and not even
> have
> a RUNNER_IDLE state. 

Correct, too. I was totally naïve to rely on the thread itself
increasing the refcount if I allow the thread to be cancelled before
it even starts :-/

However, I think I'll keep RUNNER_IDLE state around. It doesn't hurt
and it provides possible insight into whether or not the runner
function was actually called.

> Perhaps I'm just being stupid, but I don't see the
> necessity. If you set a thread to deferred cancellation, I don't
> believe
> it can get cancelled between when you create it, and when it hits
> it's
> first cancellation point, so you shouldn't have to worry about
> getting
> cancelled before the runner's cleanup_context handler isset up. I'm
> pretty sure that if the thread gets created successfully, it should
> always drop its reference count when it exits, regardless of if/when
> it
> gets cancelled. And if you fail to create the thread, you can just
> drop
> the refcount back to 1 before calling release_context() in
> get_runner().

I believe that it's possible for pthread_create() to succeed and the
program to terminate before the thread is started. In which case the
thread's cleanup code won't be set up, and the context won't be freed.
But we explicitly don't want to wait for these threads, so I think
that's a cosmetic problem in a corner case that we don't need to bother
with.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 3/5] multipath-tools tests: add test program for thread runners
  2026-04-23 20:44 [PATCH v2 0/5] multipath-tools: generic async threads for TUR checker Martin Wilck
  2026-04-23 20:44 ` [PATCH v2 1/5] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
  2026-04-23 20:44 ` [PATCH v2 2/5] libmpathutil: add generic implementation for checker thread runners Martin Wilck
@ 2026-04-23 20:44 ` Martin Wilck
  2026-04-28  0:24   ` Benjamin Marzinski
  2026-04-23 20:44 ` [PATCH v2 4/5] libmultipath: TUR checker: use runner threads Martin Wilck
  2026-04-23 20:44 ` [PATCH v2 5/5] libmultipath: tur checker: improve tur_deep_sleep() test Martin Wilck
  4 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-04-23 20:44 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
  Cc: Martin Wilck

Add a test program for the "runner" thread implementation from the previous
commit. The test program runs simulated "hanging" threads that may time
out, and optionally kills all threads at an arbitrary point in time. See
the comments at the top of the file for details.

Also add a test driver script (runner-test.sh) with a few reasonable
combinations of command line arguments.

The test program has been used to test the "runner" implementation
extensively on different architectures (x86_64, aarch64, ppc64le, s390x),
using both valgrind and the gcc address sanitizer (libasan) for detection
of memory leaks and use-after-free errors.

For valgrind, a suppression file needs to be added, as valgrind doesn't
seem to capture the deallocation of thread local storage for detached
threads in the test case where the test program is killed. The suppression
affects only memory allocated by glibc. This leak has not been seen with
libasan, only with valgrind.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 tests/Makefile         |  15 +-
 tests/runner-test.sh   |  55 ++++
 tests/runner-test.supp |  15 ++
 tests/runner.c         | 552 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 633 insertions(+), 4 deletions(-)
 create mode 100755 tests/runner-test.sh
 create mode 100644 tests/runner-test.supp
 create mode 100644 tests/runner.c

diff --git a/tests/Makefile b/tests/Makefile
index 9f1b950..f0e5dd3 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -9,12 +9,12 @@ CFLAGS += $(BIN_CFLAGS) -Wno-unused-parameter -Wno-unused-function $(W_MISSING_I
 LIBDEPS += -L. -L $(mpathutildir) -L$(mpathcmddir) -lmultipath -lmpathutil -lmpathcmd -lcmocka
 
 TESTS := uevent parser util dmevents hwtable blacklist unaligned vpd pgpolicy \
-	 alias directio valid devt mpathvalid strbuf sysfs features cli mapinfo
+	 alias directio valid devt mpathvalid strbuf sysfs features cli mapinfo runner
 HELPERS := test-lib.o test-log.o
 
 .PRECIOUS: $(TESTS:%=%-test)
 
-all:	$(TESTS:%=%.out)
+all:	$(TESTS:%=%.out) runner-test.out
 progs:	$(TESTS:%=%-test) lib/libchecktur.so
 valgrind:	$(TESTS:%=%.vgr)
 
@@ -65,6 +65,7 @@ features-test_LIBDEPS := -ludev -lpthread
 features-test_OBJDEPS := $(mpathutildir)/mt-libudev.o
 cli-test_OBJDEPS := $(daemondir)/cli.o
 mapinfo-test_LIBDEPS = -lpthread -ldevmapper
+runner-test_LIBDEPS = -lpthread
 
 %.o: %.c
 	@echo building $@ because of $?
@@ -81,12 +82,18 @@ lib/libchecktur.so:
 %.vgr:  %-test lib/libchecktur.so
 	@echo == running valgrind for $< ==
 	@LD_LIBRARY_PATH=.:$(mpathutildir):$(mpathcmddir) \
-		valgrind --leak-check=full --error-exitcode=128 ./$< >$@ 2>&1
+		valgrind --leak-check=full --error-exitcode=128 --suppressions=./runner-test.supp ./$< >$@ 2>&1
 
 OBJS = $(TESTS:%=%.o) $(HELPERS)
 
+runner-test.out: runner-test
+	$(Q)./runner-test.sh >$@ 2>&1
+
+runner-test.vgr: runner-test
+	$(Q)VALGRIND=1 ./runner-test.sh >$@ 2>&1
+
 test_clean:
-	$(Q)$(RM) $(TESTS:%=%.out) $(TESTS:%=%.vgr) *.so*
+	$(Q)$(RM) $(TESTS:%=%.out) $(TESTS:%=%.vgr) *.so* runner-test.out runner-test.vgr
 
 valgrind_clean:
 	$(Q)$(RM) $(TESTS:%=%.vgr)
diff --git a/tests/runner-test.sh b/tests/runner-test.sh
new file mode 100755
index 0000000..7c88bdd
--- /dev/null
+++ b/tests/runner-test.sh
@@ -0,0 +1,55 @@
+#! /bin/sh
+: "${MPATHTEST_VERBOSITY:=2}"
+
+export LD_LIBRARY_PATH=../libmultipath:../libmpathutil:../libmpathcmd
+export MPATHTEST_VERBOSITY
+
+RUNNER=./runner-test
+if [ "$VALGRIND" ]; then
+    command -v valgrind >/dev/null && \
+	RUNNER="valgrind --leak-check=full --error-exitcode=128 --max-threads=5000 --suppressions=./runner-test.supp ./runner-test"
+fi
+
+# LSAN is not supported on ppc64le
+case $(uname -m) in
+    x86_64|aarch64)
+	export ASAN_OPTIONS="detect_leaks=1:detect_odr_violation=0"
+	export LSAN_OPTIONS="report_objects=1"
+	;;
+esac
+
+LONG=
+while [ $# -gt 0 ]; do
+    case $1 in
+	-l) LONG=1;;
+    esac
+    shift
+done
+
+if [ "$LONG" ]; then
+    TIME1=30000
+    TIME2=23000
+else
+    TIME1=7500
+    TIME2=4700
+fi
+
+# Test scenarios
+# 1.. timeout 0 - test runner creation / cancellation races
+# 2.. "realistic" test, scaled down by a factor 10 in time
+# 3./4. Tests with high likelihood of completion / cancellation race
+set -- \
+    "-N 100 -p 1 -t 0 -n 2 -b 1 -s 1 -i -r 20" \
+    "-N 1000 -p 100 -t 3000 -n 2999 -b 5 -s 1 -i -r 20 -k $TIME1" \
+    "-N 1000 -p 10 -t 3000 -n 1 -b 1 -s 1 -i -r 20 -k $TIME2" \
+    "-N 100 -p 1 -t 3000 -n 0 -s 1 -i -r 5"
+
+errors=0
+for args in "$@"; do
+    echo "=== $RUNNER $args"
+    # shellcheck disable=SC2086
+    $RUNNER $args || errors=$((errors+1))
+done
+
+echo "$0: ERRORS: $errors"
+[ $errors -eq 0 ]
diff --git a/tests/runner-test.supp b/tests/runner-test.supp
new file mode 100644
index 0000000..e099ec9
--- /dev/null
+++ b/tests/runner-test.supp
@@ -0,0 +1,15 @@
+{
+   glibc TLS for detached threads if program is terminated
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:calloc
+   ...
+   fun:allocate_dtv
+   fun:_dl_allocate_tls
+   fun:allocate_stack
+   ...
+   fun:get_runner
+   fun:start_runner
+   ...
+   fun:main
+}
diff --git a/tests/runner.c b/tests/runner.c
new file mode 100644
index 0000000..ff26ce0
--- /dev/null
+++ b/tests/runner.c
@@ -0,0 +1,552 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/*
+ * Test reliability of the runner implementation.
+ *
+ * This tests simulates "path checkers" being started through the
+ * runner code. It creates threads that run for a variable amount of time,
+ * optionally ignoring cancellation signals. The runners have a fixed
+ * timeout (TIMEOUT_USEC, "-t" option), after which they are considered
+ * "hanging" and will be cancelled. The actual runtime of the runner is random.
+ * It varies between (TIMEOUT_USEC - NOISE USEC) and
+ * (TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC). These noise parameters are
+ * set with the "-n" and "-b" option, respectively.
+ * This allows simulating frequent races between cancellation and regular
+ * completion of threads. Note that because timers in different threads
+ * aren't started simultaneously, NOISE_USEC = 0 doesn't mean that the
+ * runners will complete exactly at the point in time when there timer
+ * expires.
+ * The runners simulate waiting checkers by simply sleeping. Optionally,
+ * they can ignore cancellation while sleeping (IGNORE_CANCEL, -i) or
+ * divide the sleep time in multiple "steps" (-s), between which they can
+ * be cancelled.
+ * Like multipathd, the main thread "polls" the status of the runners in
+ * regular time intervals that are set with POLL_USEC (-p). Because of
+ * this polling behavior, it can happen that a runner finishes after
+ * its timeout has expired. The runner code (and this test) treats this case
+ * as successful completion.
+ * If a runner completes, the result is checked against the expected value.
+ * The number of threads that have not finished (either successfully or
+ * cancelled, plus the number of wrong results of completed runners is
+ * the error count.
+ * The N_RUNNERS (-N) option determines how many simultaneous threads are
+ * started.
+ * The test runs until all runners have either completed or expired, or
+ * until a maximum wait time is reached, which is calculated from the
+ * test parameters (max_wait in run_test()). The REPEAT (-r) parameter
+ * determines the number of times the entire test is repeated.
+ * The KILL_TIMEOUT (-k) parameter is for simulating a shutdown of the
+ * main program (think multipathd). When this timeout expires, all pending
+ * runners are cancelled and the program terminates.
+ *
+ * A "realistic" simulation of multipathd path checkers would use options
+ * roughly like this:
+ *
+ *   runner-test -N 1000 -p 1000 -t 30000 -n 29990 -b 5 -s 1 -i -r 20 -k 300000
+ *
+ * (note that time options are in ms, whereas the code uses us), but this
+ * takes a very long time to run.
+ *
+ * Scaled down, it becomes:
+ *
+ *   runner-test -N 1000 -p 100 -t 3000 -n 2999 -b 5 -s 1 -i -r 20 -k 30000
+ *
+ * A less realistic run with high likelihood of completionc / cancellation races:
+ *
+ *   runner-test -N 1000 -p 10 -t 3000 -n 1 -b 1 -s 1 -i -r 20
+ *
+ * Here, all runners finish in a +-1ms timer interval around the timeout.
+ * Even with -n 0 (no noise), with a sufficient number of runners, some runners
+ * will time out.
+ */
+
+#include <time.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <signal.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <sched.h>
+#include <errno.h>
+#include <sys/select.h>
+#include <sys/wait.h>
+#include "debug.h"
+#include "time-util.h"
+#include "runner.h"
+#include "runner.h"
+#include "globals.c"
+
+#define MILLION 1000000
+#define BILLION 1000000000
+
+static int N_RUNNERS = 100;
+/* sleep time between runner status polling */
+static long POLL_USEC = 10000;
+/* timeout for runner */
+static long TIMEOUT_USEC = 100000;
+/* random noise to subtract / add to the sleep time */
+static long NOISE_USEC = 10000;
+/*
+ * Factor to increase noise towards longer sleep times (timeouts).
+ * The actual sleep time will be in the interval
+ * [ TIMEOUT_USEC - NOISE_USEC, TIMEOUT_USEC + NOISE_USEC * NOISE_BIAS ]
+ */
+static int NOISE_BIAS = 5;
+/* number of sleep intervals the runner uses */
+static int SLEEP_STEPS = 1;
+/* time after which to kill all runners */
+static long KILL_TIMEOUT = 0;
+/* number of repeated runs */
+static int REPEAT = 10;
+/* whether to ignore cancellation signals */
+static bool IGNORE_CANCEL = false;
+
+/* gap in the paylod to similate larger size */
+#define PAYLOAD_GAP 128
+
+struct payload {
+	long wait_nsec;
+	int steps;
+	bool ignore_cancel;
+	int start;
+	char pad[PAYLOAD_GAP];
+	int end;
+};
+
+static void wait_and_add_1(void *arg)
+{
+	struct payload *t1 = arg;
+	struct timespec wait;
+	int i, cancelstate;
+
+	wait.tv_sec = t1->wait_nsec / BILLION;
+	wait.tv_nsec = t1->wait_nsec % BILLION;
+	normalize_timespec(&wait);
+	for (i = 0; i < t1->steps; i++) {
+		if (t1->ignore_cancel)
+			pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cancelstate);
+		if (nanosleep(&wait, NULL) != 0 && errno != EINTR)
+			condlog(3, "%s: nanosleep: %s", __func__, strerror(errno));
+		if (t1->ignore_cancel) {
+			pthread_setcancelstate(cancelstate, NULL);
+		}
+		pthread_testcancel();
+	}
+	t1->end = t1->start + 1;
+}
+
+static bool payload_error(const struct payload *p)
+{
+	return p->end != p->start + 1;
+}
+
+static int check_payload(struct runner_context *rctx, bool *error)
+{
+	struct timespec now;
+	struct payload t1;
+	int st = check_runner(rctx, &t1, sizeof(t1));
+
+	if (st == RUNNER_RUNNING || st == RUNNER_IDLE)
+		return st;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	if (st == RUNNER_DONE) {
+		int level = 4;
+
+		if (error) {
+			*error = payload_error(&t1);
+			if (*error)
+				level = 2;
+		}
+		condlog(level,
+			"runner %p finished in state 'done' at %lld.%06lld, start %d end %d",
+			rctx, (long long)now.tv_sec,
+			(long long)now.tv_nsec / 1000, t1.start, t1.end);
+	} else
+		condlog(4, "runner %p finished in state '%s' at %lld.%06lld",
+			rctx, runner_state_name(st), (long long)now.tv_sec,
+			(long long)now.tv_nsec / 1000);
+	return st;
+}
+
+static struct runner_context *
+start_runner(long usecs, int start, long noise_range_usec, long bias,
+	     int steps, bool ignore_cancel)
+{
+
+	struct payload t1;
+	struct runner_context *rctx;
+	long noise;
+
+	if (noise_range_usec > 0)
+		noise = random() % ((bias + 1) * noise_range_usec) -
+			noise_range_usec;
+	else
+		noise = 0;
+	t1.start = start;
+	t1.end = 0;
+	t1.wait_nsec = (usecs + noise) * 1000 / steps;
+	t1.wait_nsec = t1.wait_nsec > 0 ? t1.wait_nsec : 0;
+	t1.steps = steps;
+	t1.ignore_cancel = ignore_cancel;
+
+	rctx = get_runner(wait_and_add_1, &t1, sizeof(t1), usecs);
+
+	if (rctx) {
+		struct timespec tmo, finish;
+
+		clock_gettime(CLOCK_MONOTONIC, &tmo);
+		tmo.tv_sec += usecs / MILLION;
+		tmo.tv_nsec += (usecs % MILLION) * 1000;
+		finish = tmo;
+		normalize_timespec(&tmo);
+		finish.tv_sec += noise / MILLION;
+		finish.tv_nsec += (noise % MILLION) * 1000;
+		normalize_timespec(&finish);
+		condlog(4, "started runner start %d timeout %lld.%06lld, finish %lld.%06lld (noise %ld), steps %d, %signoring cancellation",
+			start, (long long)tmo.tv_sec,
+			(long long)tmo.tv_nsec / 1000, (long long)finish.tv_sec,
+			(long long)finish.tv_nsec / 1000, noise, steps,
+			ignore_cancel ? "" : "not ");
+		return rctx;
+	} else {
+		condlog(0, "failed to start runner for start %d", start);
+		return NULL;
+	}
+}
+
+static struct runner_context **context;
+static volatile bool must_stop = false;
+void int_handler(int signal)
+{
+	must_stop = true;
+}
+
+static void terminate_all(void)
+{
+	int i, count;
+
+	for (count = 0, i = 0; i < N_RUNNERS; i++)
+		if (context[i]) {
+			release_runner(context[i]);
+			context[i] = NULL;
+			count++;
+		}
+	condlog(3, "%s: %d runners released", __func__, count);
+	/* give runners a chance to clean up */
+	sched_yield();
+}
+
+static bool test_sleep(const struct timespec *wait)
+{
+	sigset_t set;
+
+	sigfillset(&set);
+	sigdelset(&set, SIGTERM);
+	sigdelset(&set, SIGINT);
+	sigdelset(&set, SIGQUIT);
+
+	pselect(0, NULL, NULL, NULL, wait, &set);
+
+	if (!must_stop)
+		return false;
+
+	terminate_all();
+	return true;
+}
+
+static int run_test(int n)
+{
+	int i, running, done, errors;
+	const struct timespec wait = {.tv_sec = 0, .tv_nsec = 1000 * POLL_USEC};
+	struct timespec stop, now;
+	long max_wait = TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC + 100000;
+	bool killed = false;
+
+	for (i = 0; i < N_RUNNERS; i++)
+		context[i] = start_runner(TIMEOUT_USEC, i, NOISE_USEC, NOISE_BIAS,
+					  SLEEP_STEPS, IGNORE_CANCEL);
+
+	clock_gettime(CLOCK_MONOTONIC, &stop);
+	stop.tv_sec += max_wait / MILLION;
+	stop.tv_nsec += (max_wait % MILLION) * 1000;
+	normalize_timespec(&stop);
+	running = N_RUNNERS;
+	done = 0;
+	errors = 0;
+	do {
+		bool err = false, last = false;
+
+		killed = test_sleep(&wait);
+		if (killed)
+			condlog(3, "%s: terminating on signal...", __func__);
+
+		clock_gettime(CLOCK_MONOTONIC, &now);
+		if (timespeccmp(&stop, &now) <= 0)
+			last = true;
+
+		for (running = 0, i = 0; i < N_RUNNERS; i++) {
+			int st;
+
+			if (!context[i])
+				continue;
+			st = check_payload(context[i], &err);
+			switch (st) {
+			case RUNNER_DONE:
+				if (err)
+					errors++;
+				done++;
+				/* fallthrough */
+			case RUNNER_DEAD:
+				release_runner(context[i]);
+				context[i] = NULL;
+				break;
+			case RUNNER_CANCELLED:
+				if (last) {
+					condlog(3, "%s: releasing %p",
+						__func__, context[i]);
+					release_runner(context[i]);
+					context[i] = NULL;
+				} else
+					running++;
+				break;
+			default:
+				running++;
+				if (last)
+					condlog(1, "%s: found thread in state %s, rctx=%p",
+						__func__, runner_state_name(st),
+						context[i]);
+				break;
+			}
+		}
+		if (killed || last)
+			break;
+	} while (running);
+
+	condlog(2, "%10d%10d%10d%10d%10d", n, N_RUNNERS, N_RUNNERS - running,
+		done, errors);
+
+	if (killed) {
+		condlog(2, "%s: termination signal received", __func__);
+		exit(0);
+	}
+
+	if (running > 0) {
+		condlog(1, "ERROR: %d runners haven't finished", running);
+		terminate_all();
+	}
+	return running + errors;
+}
+
+static void free_rctxs(struct runner_context ***rctxs)
+{
+	if (*rctxs)
+		free(*rctxs);
+}
+
+static int setup_signal_handler(int sig, void (*handler)(int))
+{
+	sigset_t set;
+	sigfillset(&set);
+	struct sigaction sga = {.sa_handler = NULL};
+
+	sga.sa_handler = handler;
+	sga.sa_mask = set;
+	if (sigaction(sig, &sga, NULL) != 0) {
+		condlog(1, "%s: failed to install signal handler for %d: %s",
+			__func__, sig, strerror(errno));
+		return -1;
+	}
+	return 0;
+}
+
+int run_tests(void)
+{
+	int errors = 0, i;
+	struct runner_context **rctxs __attribute__((cleanup(free_rctxs))) = NULL;
+
+	if (setup_signal_handler(SIGINT, int_handler) != 0)
+		return -1;
+
+	rctxs = calloc(N_RUNNERS, sizeof(*context));
+	if (rctxs == NULL)
+		/* arbitrary number to indicate OOM error */
+		return 7000;
+	context = rctxs;
+	for (i = 0; i < REPEAT; i++) {
+		errors += run_test(i + 1);
+	}
+	return errors ? 1 : 0;
+}
+
+/* We need to register a dummy handler to avoid system call restarting in
+ * pselect() below */
+static void dummy_handler(int sig) {}
+
+static int fork_test(void)
+{
+	sigset_t set;
+	pid_t child;
+	int wstatus;
+	struct timespec wait_to_kill = {.tv_sec = 0};
+
+	/* Block all signals. termination signals will be enabled in test_sleep() */
+	sigfillset(&set);
+	pthread_sigmask(SIG_SETMASK, &set, NULL);
+
+	child = fork();
+
+	if (child < 0) {
+		condlog(0, "error in fork(), %s", strerror(errno));
+		return -1;
+	} else if (child == 0) {
+		/* child */
+		int rc = run_tests();
+		exit(rc ? 1 : 0);
+	}
+
+	setup_signal_handler(SIGCHLD, dummy_handler);
+
+	/* parent */
+	if (KILL_TIMEOUT > 0) {
+		sigset_t set;
+
+		condlog(3, "%s: == Child %d will be killed with SIGINT after %ld us",
+			__func__, child, KILL_TIMEOUT);
+		wait_to_kill.tv_sec = KILL_TIMEOUT / MILLION;
+		wait_to_kill.tv_nsec = (KILL_TIMEOUT % MILLION) * 1000;
+
+		/*
+		 * Unblock SIGCHLD in case thild terminates
+		 * (child will receive SIGINT)
+		 */
+		sigfillset(&set);
+		sigdelset(&set, SIGCHLD);
+		if (pselect(0, NULL, NULL, NULL, &wait_to_kill, &set) != 0) {
+			if (errno == EINTR)
+				condlog(2, "main: child terminated");
+			else
+				condlog(1, "main: error in pselect: %s",
+					strerror(errno));
+		} else
+			kill(child, SIGINT);
+	}
+
+	if (waitpid(child, &wstatus, 0) <= 0) {
+		condlog(1, "%s: failed to wait for child %d", __func__, child);
+		return -1;
+	}
+	if (WIFEXITED(wstatus)) {
+		condlog(3, "%s: child %d return code %d", __func__, child,
+			WEXITSTATUS(wstatus));
+		return WEXITSTATUS(wstatus);
+	} else if (WIFSIGNALED(wstatus)) {
+		condlog(2, "%s: child %d killed by signal code %d", __func__,
+			child, WTERMSIG(wstatus));
+		return -1;
+	} else {
+		condlog(1, "%s: unexpected status of child %d", __func__, child);
+		return -1;
+	}
+}
+
+static long parse_number(const char *arg, long factor, long deflt)
+{
+	char *ep;
+	long v;
+
+	if (*arg) {
+		v = strtol(arg, &ep, 10);
+		if (!*ep && v >= 0)
+			return factor * v;
+	}
+	condlog(1, "invalid argument: %s, using %ld", arg, deflt);
+	return deflt;
+}
+
+static int usage(const char *cmd, int opt)
+{
+#define USAGE_FMT \
+	"Usage: %s [options]\n" \
+		"	-N runners:  number of parallel runners\n" \
+		"	-p msecs:    time to sleep between status polls in main thread\n" \
+		"	-t msecs:    timeout for runners\n" \
+		"	-n msecs:    random noise for runner sleep time\n" \
+		"	-b bias:     noise increase factor towards sleeping longer\n" \
+		"	-s n:        number of steps to divide sleep time into\n" \
+		"	-k msecs:    timeout after which to kill all runners (0: don't kill)\n" \
+		"	-r n:        number of times to repeat test\n" \
+		"	-i:          runners ignore cancellation while sleeping\n" \
+		"	-v n:        set verbosity level (default 2)\n" \
+		"	-h:          print this help"
+	condlog(0, USAGE_FMT, cmd);
+	return opt == 'h' ? 0 : 1;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	int total = 0;
+	const char *optstring = "+:N:p:t:n:b:s:k:r:v:ih";
+
+	init_test_verbosity(2);
+
+	while ((opt = getopt(argc, argv, optstring)) != -1) {
+		switch (opt) {
+		case 'N':
+			N_RUNNERS = parse_number(optarg, 1L, N_RUNNERS);
+			break;
+		case 'p':
+			POLL_USEC = parse_number(optarg, 1000L, POLL_USEC);
+			break;
+		case 't':
+			TIMEOUT_USEC = parse_number(optarg, 1000L, TIMEOUT_USEC);
+			break;
+		case 'n':
+			NOISE_USEC = parse_number(optarg, 1000L, NOISE_USEC);
+			break;
+		case 'b':
+			NOISE_BIAS = parse_number(optarg, 1L, NOISE_BIAS);
+			break;
+		case 's':
+			SLEEP_STEPS = parse_number(optarg, 1L, SLEEP_STEPS);
+			break;
+		case 'k':
+			KILL_TIMEOUT = parse_number(optarg, 1000L, KILL_TIMEOUT);
+			break;
+		case 'r':
+			REPEAT = parse_number(optarg, 1L, REPEAT);
+			break;
+		case 'i':
+			IGNORE_CANCEL = true;
+			break;
+		case 'v':
+			libmp_verbosity = parse_number(optarg, 1L, libmp_verbosity);
+			break;
+		case 'h':
+		case ':':
+		case '?':
+			return usage(argv[0], opt);
+			break;
+		}
+	}
+
+	if (optind != argc)
+		return usage(argv[0], '?');
+
+	condlog(2, "Runner: timeout=%ld, noise interval=[-%ld:%ld], steps=%d",
+		TIMEOUT_USEC, TIMEOUT_USEC - NOISE_USEC,
+		TIMEOUT_USEC + NOISE_BIAS * NOISE_USEC, SLEEP_STEPS);
+	condlog(2, "Other : poll interval=%ld, ignore cancellation=%s, runners=%d, repeat=%d, kill timeout=%ld",
+		POLL_USEC, IGNORE_CANCEL ? "YES" : "NO", N_RUNNERS, REPEAT,
+		KILL_TIMEOUT);
+	condlog(2, "%10s%10s%10s%10s%10s", "run", "total", "finished",
+		"completed", "errors");
+
+	total = fork_test();
+	if (total == -1)
+		return 130;
+	condlog(2, "== TOTAL NUMBER OF FAILED RUNS: %d", total);
+	return total ? 1 : 0;
+}
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/5] multipath-tools tests: add test program for thread runners
  2026-04-23 20:44 ` [PATCH v2 3/5] multipath-tools tests: add test program for " Martin Wilck
@ 2026-04-28  0:24   ` Benjamin Marzinski
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2026-04-28  0:24 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck

On Thu, Apr 23, 2026 at 10:44:44PM +0200, Martin Wilck wrote:
> Add a test program for the "runner" thread implementation from the previous
> commit. The test program runs simulated "hanging" threads that may time
> out, and optionally kills all threads at an arbitrary point in time. See
> the comments at the top of the file for details.
> 
> Also add a test driver script (runner-test.sh) with a few reasonable
> combinations of command line arguments.
> 
> The test program has been used to test the "runner" implementation
> extensively on different architectures (x86_64, aarch64, ppc64le, s390x),
> using both valgrind and the gcc address sanitizer (libasan) for detection
> of memory leaks and use-after-free errors.
> 
> For valgrind, a suppression file needs to be added, as valgrind doesn't
> seem to capture the deallocation of thread local storage for detached
> threads in the test case where the test program is killed. The suppression
> affects only memory allocated by glibc. This leak has not been seen with
> libasan, only with valgrind.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 4/5] libmultipath: TUR checker: use runner threads
  2026-04-23 20:44 [PATCH v2 0/5] multipath-tools: generic async threads for TUR checker Martin Wilck
                   ` (2 preceding siblings ...)
  2026-04-23 20:44 ` [PATCH v2 3/5] multipath-tools tests: add test program for " Martin Wilck
@ 2026-04-23 20:44 ` Martin Wilck
  2026-04-28  4:04   ` Benjamin Marzinski
  2026-04-23 20:44 ` [PATCH v2 5/5] libmultipath: tur checker: improve tur_deep_sleep() test Martin Wilck
  4 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-04-23 20:44 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
  Cc: Martin Wilck

Use the generic runners to simplify the TUR checker code.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/checkers/tur.c | 363 +++++++++++-------------------------
 1 file changed, 112 insertions(+), 251 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index ba4ca68..0a312c7 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -3,7 +3,6 @@
  *
  * Copyright (c) 2004 Christophe Varoqui
  */
-#include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/types.h>
@@ -14,16 +13,11 @@
 #include <sys/sysmacros.h>
 #include <errno.h>
 #include <sys/time.h>
-#include <pthread.h>
-#include <urcu.h>
-#include <urcu/uatomic.h>
 
 #include "checkers.h"
-
 #include "debug.h"
 #include "sg_include.h"
-#include "util.h"
-#include "time-util.h"
+#include "runner.h"
 
 #define TUR_CMD_LEN 6
 #define HEAVY_CHECK_COUNT       10
@@ -45,70 +39,47 @@ const char *libcheck_msgtable[] = {
 	NULL,
 };
 
-struct tur_checker_context {
-	dev_t devt;
-	int state;
-	int running; /* uatomic access only */
+struct tur_data {
 	int fd;
+	dev_t devt;
 	unsigned int timeout;
-	time_t time;
-	pthread_t thread;
-	pthread_mutex_t lock;
-	pthread_cond_t active;
-	int holders; /* uatomic access only */
-	int msgid;
-	struct checker_context ctx;
-	unsigned int nr_timeouts;
-	bool checked_state;
+	int state;
+	short msgid;
 };
 
-int libcheck_init (struct checker * c)
+struct tur_checker_context {
+	struct checker_context chkr;
+	int last_runner_state;
+	unsigned int nr_timeouts;
+	struct runner_context *rtx;
+	struct tur_data tdata;
+};
+
+int libcheck_init(struct checker *c)
 {
-	struct tur_checker_context *ct;
+	struct tur_checker_context *tcc;
 	struct stat sb;
 
-	ct = malloc(sizeof(struct tur_checker_context));
-	if (!ct)
-		return 1;
-	memset(ct, 0, sizeof(struct tur_checker_context));
-
-	ct->state = PATH_UNCHECKED;
-	ct->fd = -1;
-	uatomic_set(&ct->holders, 1);
-	pthread_cond_init_mono(&ct->active);
-	pthread_mutex_init(&ct->lock, NULL);
+	tcc = calloc(1, sizeof(*tcc));
+	tcc->tdata.state = PATH_UNCHECKED;
+	tcc->tdata.fd = -1;
 	if (fstat(c->fd, &sb) == 0)
-		ct->devt = sb.st_rdev;
-	ct->ctx.cls = c->cls;
-	c->context = ct;
-
+		tcc->tdata.devt = sb.st_rdev;
+	tcc->chkr.cls = c->cls;
+	c->context = tcc;
 	return 0;
 }
 
-static void cleanup_context(struct tur_checker_context *ct)
-{
-	pthread_mutex_destroy(&ct->lock);
-	pthread_cond_destroy(&ct->active);
-	free(ct);
-}
-
 void libcheck_free (struct checker * c)
 {
-	if (c->context) {
-		struct tur_checker_context *ct = c->context;
-		int holders;
-		int running;
+	struct tur_checker_context *tcc = c->context;
 
-		running = uatomic_xchg(&ct->running, 0);
-		if (running)
-			pthread_cancel(ct->thread);
-		ct->thread = 0;
-		holders = uatomic_sub_return(&ct->holders, 1);
-		if (!holders)
-			cleanup_context(ct);
-		c->context = NULL;
-	}
-	return;
+	if (!tcc)
+		return;
+	c->context = NULL;
+	if (tcc->rtx)
+		release_runner(tcc->rtx);
+	free(tcc);
 }
 
 static int
@@ -216,19 +187,6 @@ retry:
 	return PATH_UP;
 }
 
-#define tur_thread_cleanup_push(ct) pthread_cleanup_push(cleanup_func, ct)
-#define tur_thread_cleanup_pop(ct) pthread_cleanup_pop(1)
-
-static void cleanup_func(void *data)
-{
-	int holders;
-	struct tur_checker_context *ct = data;
-
-	holders = uatomic_sub_return(&ct->holders, 1);
-	if (!holders)
-		cleanup_context(ct);
-}
-
 /*
  * Test code for "zombie tur thread" handling.
  * Compile e.g. with CFLAGS=-DTUR_TEST_MAJOR=8
@@ -249,13 +207,13 @@ static void cleanup_func(void *data)
 #define TUR_SLEEP_SECS 60
 #endif
 
-static void tur_deep_sleep(const struct tur_checker_context *ct)
+static void tur_deep_sleep(const struct tur_data *tdata)
 {
 	static int sleep_cnt;
 	const struct timespec ts = { .tv_sec = TUR_SLEEP_SECS, .tv_nsec = 0 };
 	int oldstate;
 
-	if (ct->devt != makedev(TUR_TEST_MAJOR, TUR_TEST_MINOR) ||
+	if (tdata->devt != makedev(TUR_TEST_MAJOR, TUR_TEST_MINOR) ||
 	    ++sleep_cnt % TUR_SLEEP_INTERVAL != 0)
 		return;
 
@@ -273,110 +231,92 @@ static void tur_deep_sleep(const struct tur_checker_context *ct)
 #define tur_deep_sleep(x) do {} while (0)
 #endif /* TUR_TEST_MAJOR */
 
-void *libcheck_thread(struct checker_context *ctx)
+void runner_callback(void *arg)
 {
-	struct tur_checker_context *ct =
-		container_of(ctx, struct tur_checker_context, ctx);
-	int state, running;
-	short msgid;
+	struct tur_data *tdata = arg;
+	int state;
 
-	/* This thread can be canceled, so setup clean up */
-	tur_thread_cleanup_push(ct);
+	condlog(4, "%d:%d : tur checker starting up", major(tdata->devt),
+		minor(tdata->devt));
 
-	condlog(4, "%d:%d : tur checker starting up", major(ct->devt),
-		minor(ct->devt));
-
-	tur_deep_sleep(ct);
-	state = tur_check(ct->fd, ct->timeout, &msgid);
+	tur_deep_sleep(tdata);
+	state = tur_check(tdata->fd, tdata->timeout, &tdata->msgid);
+	tdata->state = state;
 	pthread_testcancel();
-
-	/* TUR checker done */
-	pthread_mutex_lock(&ct->lock);
-	ct->state = state;
-	ct->msgid = msgid;
-	pthread_cond_signal(&ct->active);
-	pthread_mutex_unlock(&ct->lock);
-
-	condlog(4, "%d:%d : tur checker finished, state %s", major(ct->devt),
-		minor(ct->devt), checker_state_name(state));
-
-	running = uatomic_xchg(&ct->running, 0);
-	if (!running)
-		pause();
-
-	tur_thread_cleanup_pop(ct);
-
-	return ((void *)0);
+	condlog(4, "%d:%d : tur checker finished, state %s", major(tdata->devt),
+		minor(tdata->devt), checker_state_name(state));
 }
 
-static void tur_set_async_timeout(struct checker *c)
+static int check_runner_state(struct tur_checker_context *tcc)
 {
-	struct tur_checker_context *ct = c->context;
-	struct timespec now;
+	struct runner_context *rtx = tcc->rtx;
+	int rc;
 
-	get_monotonic_time(&now);
-	ct->time = now.tv_sec + c->timeout;
-}
-
-static int tur_check_async_timeout(struct checker *c)
-{
-	struct tur_checker_context *ct = c->context;
-	struct timespec now;
-
-	get_monotonic_time(&now);
-	return (now.tv_sec > ct->time);
-}
-
-int check_pending(struct checker *c)
-{
-	struct tur_checker_context *ct = c->context;
-	int tur_status = PATH_PENDING;
-
-	pthread_mutex_lock(&ct->lock);
-
-	if (ct->state != PATH_PENDING || ct->msgid != MSG_TUR_RUNNING)
-	{
-		tur_status = ct->state;
-		c->msgid = ct->msgid;
-	}
-	pthread_mutex_unlock(&ct->lock);
-	if (tur_status == PATH_PENDING && c->msgid == MSG_TUR_RUNNING) {
+	rc = check_runner(rtx, &tcc->tdata, sizeof(tcc->tdata));
+	switch (rc) {
+	case RUNNER_DEAD:
+		tcc->tdata.state = PATH_TIMEOUT;
+		tcc->tdata.msgid = MSG_TUR_TIMEOUT;
+		/* fallthrough */
+	case RUNNER_DONE:
+		release_runner(tcc->rtx);
+		tcc->rtx = NULL;
+		tcc->last_runner_state = rc;
+		tcc->nr_timeouts = 0;
+		condlog(rc == RUNNER_DONE ? 4 : 3,
+			"%d:%d : tur checker finished, state %s, runner state %s",
+			major(tcc->tdata.devt), minor(tcc->tdata.devt),
+			checker_state_name(tcc->tdata.state),
+			runner_state_name(rc));
+		break;
+	case RUNNER_CANCELLED:
+		tcc->last_runner_state = rc;
+		tcc->tdata.state = PATH_TIMEOUT;
+		tcc->tdata.msgid = MSG_TUR_TIMEOUT;
+		if (tcc->nr_timeouts < MAX_NR_TIMEOUTS) {
+			condlog(3, "%d:%d : tur checker timed out, releasing it",
+				major(tcc->tdata.devt), minor(tcc->tdata.devt));
+			tcc->nr_timeouts++;
+			release_runner(tcc->rtx);
+			tcc->rtx = NULL;
+		} else if (tcc->nr_timeouts == MAX_NR_TIMEOUTS) {
+			tcc->nr_timeouts++;
+			condlog(3, "%d:%d : tur checker timed out, waiting for it",
+				major(tcc->tdata.devt), minor(tcc->tdata.devt));
+		}
+		break;
+	default:
 		condlog(4, "%d:%d : tur checker still running",
-			major(ct->devt), minor(ct->devt));
-	} else {
-		int running = uatomic_xchg(&ct->running, 0);
-		if (running)
-			pthread_cancel(ct->thread);
-		ct->thread = 0;
+			major(tcc->tdata.devt), minor(tcc->tdata.devt));
+		tcc->tdata.msgid = MSG_TUR_RUNNING;
+		break;
 	}
-
-	ct->checked_state = true;
-	return tur_status;
+	return rc;
 }
 
 bool libcheck_need_wait(struct checker *c)
 {
 	struct tur_checker_context *ct = c->context;
-	return (ct && ct->thread && uatomic_read(&ct->running) != 0 &&
-		!ct->checked_state);
+
+	return ct && ct->rtx;
 }
 
 int libcheck_pending(struct checker *c)
 {
 	struct tur_checker_context *ct = c->context;
-
 	/* The if path checker isn't running, just return the exiting value. */
-	if (!ct || !ct->thread)
+	if (!ct || !ct->rtx)
 		return c->path_state;
 
-	return check_pending(c);
+	/* This may nullify ct->rtx */
+	check_runner_state(ct);
+	c->msgid = ct->tdata.msgid;
+	return ct->tdata.state;
 }
 
 int libcheck_check(struct checker * c)
 {
 	struct tur_checker_context *ct = c->context;
-	pthread_attr_t attr;
-	int tur_status, r;
 
 	if (!ct)
 		return PATH_UNCHECKED;
@@ -384,109 +324,30 @@ int libcheck_check(struct checker * c)
 	if (checker_is_sync(c))
 		return tur_check(c->fd, c->timeout, &c->msgid);
 
-	/*
-	 * Async mode
-	 */
-	if (ct->thread) {
-		ct->checked_state = true;
-		if (tur_check_async_timeout(c)) {
-			int running = uatomic_xchg(&ct->running, 0);
-			if (running) {
-				pthread_cancel(ct->thread);
-				condlog(3, "%d:%d : tur checker timeout",
-					major(ct->devt), minor(ct->devt));
-				c->msgid = MSG_TUR_TIMEOUT;
-				tur_status = PATH_TIMEOUT;
-			} else {
-				pthread_mutex_lock(&ct->lock);
-				tur_status = ct->state;
-				c->msgid = ct->msgid;
-				pthread_mutex_unlock(&ct->lock);
-			}
-			ct->thread = 0;
-		} else if (uatomic_read(&ct->running) != 0) {
-			condlog(3, "%d:%d : tur checker not finished",
-				major(ct->devt), minor(ct->devt));
-			tur_status = PATH_PENDING;
-			c->msgid = MSG_TUR_RUNNING;
-		} else {
-			/* TUR checker done */
-			ct->thread = 0;
-			pthread_mutex_lock(&ct->lock);
-			tur_status = ct->state;
-			c->msgid = ct->msgid;
-			pthread_mutex_unlock(&ct->lock);
-		}
-	} else {
-		if (uatomic_read(&ct->holders) > 1) {
-			/* The thread has been cancelled but hasn't quit. */
-			if (ct->nr_timeouts == MAX_NR_TIMEOUTS) {
-				condlog(2, "%d:%d : waiting for stalled tur thread to finish",
-					major(ct->devt), minor(ct->devt));
-				ct->nr_timeouts++;
-			}
-			/*
-			 * Don't start new threads until the last once has
-			 * finished.
-			 */
-			if (ct->nr_timeouts > MAX_NR_TIMEOUTS) {
-				c->msgid = MSG_TUR_TIMEOUT;
-				return PATH_TIMEOUT;
-			}
-			ct->nr_timeouts++;
-			/*
-			 * Start a new thread while the old one is stalled.
-			 * We have to prevent it from interfering with the new
-			 * thread. We create a new context and leave the old
-			 * one with the stale thread, hoping it will clean up
-			 * eventually.
-			 */
-			condlog(3, "%d:%d : tur thread not responding",
-				major(ct->devt), minor(ct->devt));
-
-			/*
-			 * libcheck_init will replace c->context.
-			 * It fails only in OOM situations. In this case, return
-			 * PATH_UNCHECKED to avoid prematurely failing the path.
-			 */
-			if (libcheck_init(c) != 0) {
-				c->msgid = MSG_TUR_FAILED;
-				return PATH_UNCHECKED;
-			}
-			((struct tur_checker_context *)c->context)->nr_timeouts = ct->nr_timeouts;
-
-			if (!uatomic_sub_return(&ct->holders, 1)) {
-				/* It did terminate, eventually */
-				cleanup_context(ct);
-				((struct tur_checker_context *)c->context)->nr_timeouts = 0;
-			}
-
-			ct = c->context;
-		} else
-			ct->nr_timeouts = 0;
-		/* Start new TUR checker */
-		pthread_mutex_lock(&ct->lock);
-		tur_status = ct->state = PATH_PENDING;
-		c->msgid = ct->msgid = MSG_TUR_RUNNING;
-		pthread_mutex_unlock(&ct->lock);
-		ct->fd = c->fd;
-		ct->timeout = c->timeout;
-		ct->checked_state = false;
-		uatomic_add(&ct->holders, 1);
-		uatomic_set(&ct->running, 1);
-		tur_set_async_timeout(c);
-		setup_thread_attr(&attr, 32 * 1024, 1);
-		r = start_checker_thread(&ct->thread, &attr, &ct->ctx);
-		pthread_attr_destroy(&attr);
-		if (r) {
-			uatomic_sub(&ct->holders, 1);
-			uatomic_set(&ct->running, 0);
-			ct->thread = 0;
-			condlog(3, "%d:%d : failed to start tur thread, using"
-				" sync mode", major(ct->devt), minor(ct->devt));
-			return tur_check(c->fd, c->timeout, &c->msgid);
-		}
+	/* Handle the case that the checker just completed */
+	if (ct->rtx) {
+		check_runner_state(ct);
+		c->msgid = ct->tdata.msgid;
+		return ct->tdata.state;
 	}
 
-	return tur_status;
+	/* create new checker thread */
+	ct->tdata.fd = c->fd;
+	ct->tdata.timeout = c->timeout;
+
+	ct->tdata.state = PATH_PENDING;
+	ct->tdata.msgid = MSG_TUR_RUNNING;
+	condlog(3, "%d:%d : starting checker", major(ct->tdata.devt),
+		minor(ct->tdata.devt));
+	ct->rtx = get_runner(runner_callback, &ct->tdata, sizeof(ct->tdata),
+			     1000000 * c->timeout);
+
+	if (ct->rtx) {
+		c->msgid = ct->tdata.msgid;
+		return ct->tdata.state;
+	} else {
+		condlog(3, "%d:%d : failed to start tur thread, using sync mode",
+			major(ct->tdata.devt), minor(ct->tdata.devt));
+		return tur_check(c->fd, c->timeout, &c->msgid);
+	}
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 4/5] libmultipath: TUR checker: use runner threads
  2026-04-23 20:44 ` [PATCH v2 4/5] libmultipath: TUR checker: use runner threads Martin Wilck
@ 2026-04-28  4:04   ` Benjamin Marzinski
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2026-04-28  4:04 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck

On Thu, Apr 23, 2026 at 10:44:45PM +0200, Martin Wilck wrote:
> Use the generic runners to simplify the TUR checker code.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>

This is a really nice cleanup.

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 5/5] libmultipath: tur checker: improve tur_deep_sleep() test
  2026-04-23 20:44 [PATCH v2 0/5] multipath-tools: generic async threads for TUR checker Martin Wilck
                   ` (3 preceding siblings ...)
  2026-04-23 20:44 ` [PATCH v2 4/5] libmultipath: TUR checker: use runner threads Martin Wilck
@ 2026-04-23 20:44 ` Martin Wilck
  2026-04-28  4:05   ` Benjamin Marzinski
  4 siblings, 1 reply; 11+ messages in thread
From: Martin Wilck @ 2026-04-23 20:44 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski, Brian Bunker, dm-devel
  Cc: Martin Wilck

The tur_deep_sleep() test serves to test the behavior of the tur checker
when checkers time out and cannot be cancelled.

Up to now the checker timed out on every TUR_SLEEP_INTERVALth
invocation. That made it impossible to test the MAX_TIMEOUTS logic, which
requires multiple consecutive timeouts. To fix it, invert the logic, so
that the checker just succeeds on every TUR_SLEEP_INTERVALth call.

Also, decrease loglevel of the messages from tur_deep_sleep().

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/checkers/tur.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 0a312c7..6791172 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -214,15 +214,15 @@ static void tur_deep_sleep(const struct tur_data *tdata)
 	int oldstate;
 
 	if (tdata->devt != makedev(TUR_TEST_MAJOR, TUR_TEST_MINOR) ||
-	    ++sleep_cnt % TUR_SLEEP_INTERVAL != 0)
+	    ++sleep_cnt % TUR_SLEEP_INTERVAL == 0)
 		return;
 
-	condlog(1, "tur thread going to sleep for %ld seconds", ts.tv_sec);
+	condlog(3, "tur thread going to sleep for %ld seconds", ts.tv_sec);
 	if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate) != 0)
 		condlog(0, "pthread_setcancelstate: %m");
 	if (nanosleep(&ts, NULL) != 0)
 		condlog(0, "nanosleep: %m");
-	condlog(1, "tur zombie thread woke up");
+	condlog(3, "tur zombie thread woke up");
 	if (pthread_setcancelstate(oldstate, NULL) != 0)
 		condlog(0, "pthread_setcancelstate (2): %m");
 	pthread_testcancel();
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 5/5] libmultipath: tur checker: improve tur_deep_sleep() test
  2026-04-23 20:44 ` [PATCH v2 5/5] libmultipath: tur checker: improve tur_deep_sleep() test Martin Wilck
@ 2026-04-28  4:05   ` Benjamin Marzinski
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2026-04-28  4:05 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, Brian Bunker, dm-devel, Martin Wilck

On Thu, Apr 23, 2026 at 10:44:46PM +0200, Martin Wilck wrote:
> The tur_deep_sleep() test serves to test the behavior of the tur checker
> when checkers time out and cannot be cancelled.
> 
> Up to now the checker timed out on every TUR_SLEEP_INTERVALth
> invocation. That made it impossible to test the MAX_TIMEOUTS logic, which
> requires multiple consecutive timeouts. To fix it, invert the logic, so
> that the checker just succeeds on every TUR_SLEEP_INTERVALth call.
> 
> Also, decrease loglevel of the messages from tur_deep_sleep().
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-04-28 10:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-23 20:44 [PATCH v2 0/5] multipath-tools: generic async threads for TUR checker Martin Wilck
2026-04-23 20:44 ` [PATCH v2 1/5] multipathd: get_new_state: map PATH_TIMEOUT to PATH_DOWN Martin Wilck
2026-04-23 20:44 ` [PATCH v2 2/5] libmpathutil: add generic implementation for checker thread runners Martin Wilck
2026-04-28  0:20   ` Benjamin Marzinski
2026-04-28 10:47     ` Martin Wilck
2026-04-23 20:44 ` [PATCH v2 3/5] multipath-tools tests: add test program for " Martin Wilck
2026-04-28  0:24   ` Benjamin Marzinski
2026-04-23 20:44 ` [PATCH v2 4/5] libmultipath: TUR checker: use runner threads Martin Wilck
2026-04-28  4:04   ` Benjamin Marzinski
2026-04-23 20:44 ` [PATCH v2 5/5] libmultipath: tur checker: improve tur_deep_sleep() test Martin Wilck
2026-04-28  4:05   ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.