* [PATCH 00/15] Yet Another path checker refactor
@ 2024-08-28 22:17 Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 01/15] libmultipath: store checker_check() result in checker struct Benjamin Marzinski
` (15 more replies)
0 siblings, 16 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
This patchset cleans up some of the ugly code from my last refactor, and
speeds up multipathd path checking, by refactoring how paths are
checked.
multipathd previously ran the checker on a path, waited briefly for it
to complete, and if it did, updated the path and the priorities for the
multipath device and possibly reloaded it.
This had multiple issues.
1. All the paths waited for their checkers to complete serially. This
wasted time with no benefit. It also meant that it wasn't that uncommon
for a checker to not finish in time, and get marked pending.
2. The prioritiers could get run on paths multiple times during a
checker loop if multiple paths were restored at the same time, which is
a common occurance.
3. In an effort to keep a multipath device's state consistent despite
the delays introduced by waiting for the path checkers, paths were
checked by multipath device. However checking the paths could involve
reloading a multipath device table, or even removing a multipath device.
This added some ugly backout and retry logic to the path check code.
With this patchset, the code now first starts the path checkers on all
devices due for a path check. Then it goes back and updates the state of
all the path devices, waiting at most a total of 1ms if there are
checkers that need waiting for. Once all of the paths have been updated,
it goes through each multipath device, updating path priorities and
reloading the device as necessary.
This allows multipathd to spend less time and do less redundant work
while checking paths, while making paths slightly more likely to not
spend a checker cycle marked as pending.
Since there isn't a delay waiting for the previous checker before
starting the next one, the path checker code has reverted to checking
all paths in pathvec order, instead of by multipath device. Updating the
paths in pathvec order simplifies the code, since the multipath devices
can change during path updates. Starting the checkers in pathvec order
keeps a path from having its checker started towards the end of the list
of paths but checking for the result towards the start of the list of
paths. However, it's possible to start the checkers by multipath device
without adding much complexity, if people think that the benfit of
starting the checkers as close together as possible outweighs the
benefit of giving each checker as close to the same amount of time as
possible to complete.
Benjamin Marzinski (15):
libmultipath: store checker_check() result in checker struct
libmultipath: add missing checker function prototypes
libmultipath: split out the code to wait for pending checkers
libmultipath: remove pending wait code from libcheck_check calls
multipath-tools tests: fix up directio tests
libmultipath: split get_state into two functions
libmultipath: change path_offline to path_sysfs_state
multipathd: split check_path_state into two functions
multipathd: split do_checker_path
multipathd: split check_path into two functions
multipathd: split handle_uninitialized_path into two functions
multipathd: split check_paths into two functions
multipathd: update priority once after updating all paths
multipathd: remove pointless check
multipathd: fix deferred_failback_tick for reload removes
libmultipath/checkers.c | 35 +--
libmultipath/checkers.h | 8 +-
libmultipath/checkers/directio.c | 129 ++++++----
libmultipath/checkers/tur.c | 66 +++--
libmultipath/discovery.c | 90 ++++---
libmultipath/discovery.h | 6 +-
libmultipath/libmultipath.version | 3 +-
libmultipath/print.c | 2 +-
libmultipath/propsel.c | 1 +
libmultipath/structs.h | 21 +-
multipathd/main.c | 385 ++++++++++++++++++------------
tests/directio.c | 133 +++++++----
12 files changed, 546 insertions(+), 333 deletions(-)
--
2.45.0
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH 01/15] libmultipath: store checker_check() result in checker struct
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 16:13 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 02/15] libmultipath: add missing checker function prototypes Benjamin Marzinski
` (14 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
checker_check() is now a void function that stores the path state in
the checker struct. It can be retrieved later using checker_get_state().
Right now, this is called immediately after checker_check(), but in
the future, it can be deferred to another time.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/checkers.c | 26 +++++++++++++-------------
libmultipath/checkers.h | 4 +++-
libmultipath/discovery.c | 3 ++-
3 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
index fdb91e17..c4918d28 100644
--- a/libmultipath/checkers.c
+++ b/libmultipath/checkers.c
@@ -299,28 +299,28 @@ void checker_put (struct checker * dst)
free_checker_class(src);
}
-int checker_check (struct checker * c, int path_state)
+int checker_get_state(struct checker *c)
{
- int r;
+ return c ? c->path_state : PATH_UNCHECKED;
+}
+void checker_check (struct checker * c, int path_state)
+{
if (!c)
- return PATH_WILD;
+ return;
c->msgid = CHECKER_MSGID_NONE;
if (c->disable) {
c->msgid = CHECKER_MSGID_DISABLED;
- return PATH_UNCHECKED;
- }
- if (!strncmp(c->cls->name, NONE, 4))
- return path_state;
-
- if (c->fd < 0) {
+ c->path_state = PATH_UNCHECKED;
+ } else if (!strncmp(c->cls->name, NONE, 4)) {
+ c->path_state = path_state;
+ } else if (c->fd < 0) {
c->msgid = CHECKER_MSGID_NO_FD;
- return PATH_WILD;
+ c->path_state = PATH_WILD;
+ } else {
+ c->path_state = c->cls->check(c);
}
- r = c->cls->check(c);
-
- return r;
}
const char *checker_name(const struct checker *c)
diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index 102351f6..6e54d8f0 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -123,6 +123,7 @@ struct checker {
int fd;
unsigned int timeout;
int disable;
+ int path_state;
short msgid; /* checker-internal extra status */
void * context; /* store for persistent data */
void ** mpcontext; /* store for persistent data shared
@@ -169,7 +170,8 @@ struct checker_context {
};
int start_checker_thread (pthread_t *thread, const pthread_attr_t *attr,
struct checker_context *ctx);
-int checker_check (struct checker *, int);
+int checker_get_state(struct checker *c);
+void checker_check (struct checker *, int);
int checker_is_sync(const struct checker *);
const char *checker_name (const struct checker *);
void reset_checker_classes(void);
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index e94705bf..5648be60 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -2007,7 +2007,8 @@ get_state (struct path * pp, struct config *conf, int daemon, int oldstate)
checker_set_async(c);
else
checker_set_sync(c);
- state = checker_check(c, oldstate);
+ checker_check(c, oldstate);
+ state = checker_get_state(c);
condlog(3, "%s: %s state = %s", pp->dev,
checker_name(c), checker_state_name(state));
if (state != PATH_UP && state != PATH_GHOST &&
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 02/15] libmultipath: add missing checker function prototypes
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 01/15] libmultipath: store checker_check() result in checker struct Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 16:13 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 03/15] libmultipath: split out the code to wait for pending checkers Benjamin Marzinski
` (13 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/checkers.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index 6e54d8f0..fb1160af 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -188,6 +188,9 @@ int libcheck_check(struct checker *);
int libcheck_init(struct checker *);
void libcheck_free(struct checker *);
void *libcheck_thread(struct checker_context *ctx);
+void libcheck_reset(void);
+int libcheck_mp_init(struct checker *);
+
/*
* msgid => message map.
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 03/15] libmultipath: split out the code to wait for pending checkers
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 01/15] libmultipath: store checker_check() result in checker struct Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 02/15] libmultipath: add missing checker function prototypes Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 15:01 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls Benjamin Marzinski
` (12 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
This patch adds a new optional symbol for the dynamic path checker
libraries, libcheck_pending. This is currently unused, but can be called
on pending checkers to check if they've completed and return their value.
The "tur" and "directio" checkers are the only ones which can return
PATH_PENDING. They now implement libcheck_pending() as a wrapper around
the code that libcheck_check uses to wait for pending paths, which has
been broken out into its own function.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/checkers.c | 4 +-
libmultipath/checkers.h | 1 +
libmultipath/checkers/directio.c | 86 ++++++++++++++++++++++----------
libmultipath/checkers/tur.c | 65 ++++++++++++++++--------
4 files changed, 109 insertions(+), 47 deletions(-)
diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
index c4918d28..298aec78 100644
--- a/libmultipath/checkers.c
+++ b/libmultipath/checkers.c
@@ -26,6 +26,7 @@ struct checker_class {
void (*free)(struct checker *); /* to free the context */
void (*reset)(void); /* to reset the global variables */
void *(*thread)(void *); /* async thread entry point */
+ int (*pending)(struct checker *); /* to recheck pending paths */
const char **msgtable;
short msgtable_size;
};
@@ -180,7 +181,8 @@ static struct checker_class *add_checker_class(const char *name)
c->mp_init = (int (*)(struct checker *)) dlsym(c->handle, "libcheck_mp_init");
c->reset = (void (*)(void)) dlsym(c->handle, "libcheck_reset");
c->thread = (void *(*)(void*)) dlsym(c->handle, "libcheck_thread");
- /* These 3 functions can be NULL. call dlerror() to clear out any
+ c->pending = (int (*)(struct checker *)) dlsym(c->handle, "libcheck_pending");
+ /* These 4 functions can be NULL. call dlerror() to clear out any
* error string */
dlerror();
diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
index fb1160af..b2342a1b 100644
--- a/libmultipath/checkers.h
+++ b/libmultipath/checkers.h
@@ -190,6 +190,7 @@ void libcheck_free(struct checker *);
void *libcheck_thread(struct checker_context *ctx);
void libcheck_reset(void);
int libcheck_mp_init(struct checker *);
+int libcheck_pending(struct checker *c);
/*
diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
index 8e87878b..3f88b40d 100644
--- a/libmultipath/checkers/directio.c
+++ b/libmultipath/checkers/directio.c
@@ -288,14 +288,36 @@ get_events(struct aio_group *aio_grp, struct timespec *timeout)
return got_events;
}
+static void
+check_pending(struct directio_context *ct, struct timespec endtime)
+{
+ int r;
+ struct timespec currtime, timeout;
+
+ while(1) {
+ get_monotonic_time(&currtime);
+ timespecsub(&endtime, &currtime, &timeout);
+ if (timeout.tv_sec < 0)
+ timeout.tv_sec = timeout.tv_nsec = 0;
+
+ r = get_events(ct->aio_grp, &timeout);
+
+ if (ct->req->state != PATH_PENDING) {
+ ct->running = 0;
+ return;
+ } else if (r == 0 ||
+ (timeout.tv_sec == 0 && timeout.tv_nsec == 0))
+ return;
+ }
+}
+
static int
check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
{
struct timespec timeout = { .tv_nsec = 1000 };
struct stat sb;
int rc;
- long r;
- struct timespec currtime, endtime;
+ struct timespec endtime;
if (fstat(fd, &sb) == 0) {
LOG(4, "called for %x", (unsigned) sb.st_rdev);
@@ -330,21 +352,11 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
endtime.tv_sec += timeout.tv_sec;
endtime.tv_nsec += timeout.tv_nsec;
normalize_timespec(&endtime);
- while(1) {
- r = get_events(ct->aio_grp, &timeout);
- if (ct->req->state != PATH_PENDING) {
- ct->running = 0;
- return ct->req->state;
- } else if (r == 0 ||
- (timeout.tv_sec == 0 && timeout.tv_nsec == 0))
- break;
+ check_pending(ct, endtime);
+ if (ct->req->state != PATH_PENDING)
+ return ct->req->state;
- get_monotonic_time(&currtime);
- timespecsub(&endtime, &currtime, &timeout);
- if (timeout.tv_sec < 0)
- timeout.tv_sec = timeout.tv_nsec = 0;
- }
if (ct->running > timeout_secs || sync) {
struct io_event event;
@@ -360,17 +372,9 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
return rc;
}
-int libcheck_check (struct checker * c)
+static void set_msgid(struct checker *c, int state)
{
- int ret;
- struct directio_context * ct = (struct directio_context *)c->context;
-
- if (!ct)
- return PATH_UNCHECKED;
-
- ret = check_state(c->fd, ct, checker_is_sync(c), c->timeout);
-
- switch (ret)
+ switch (state)
{
case PATH_UNCHECKED:
c->msgid = MSG_DIRECTIO_UNKNOWN;
@@ -387,5 +391,37 @@ int libcheck_check (struct checker * c)
default:
break;
}
+}
+
+int libcheck_pending(struct checker *c)
+{
+ struct timespec endtime;
+ struct directio_context *ct = (struct directio_context *)c->context;
+
+ /* The if path checker isn't running, just return the exiting value. */
+ if (!ct || !ct->running)
+ return c->path_state;
+
+ if (ct->req->state == PATH_PENDING) {
+ get_monotonic_time(&endtime);
+ check_pending(ct, endtime);
+ } else
+ ct->running = 0;
+ set_msgid(c, ct->req->state);
+
+ return ct->req->state;
+}
+
+int libcheck_check (struct checker * c)
+{
+ int ret;
+ struct directio_context * ct = (struct directio_context *)c->context;
+
+ if (!ct)
+ return PATH_UNCHECKED;
+
+ ret = check_state(c->fd, ct, checker_is_sync(c), c->timeout);
+ set_msgid(c, ret);
+
return ret;
}
diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index a2905af5..95af5214 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -323,6 +323,49 @@ static int tur_check_async_timeout(struct checker *c)
return (now.tv_sec > ct->time);
}
+int check_pending(struct checker *c, struct timespec endtime)
+{
+ struct tur_checker_context *ct = c->context;
+ int r, tur_status = PATH_PENDING;
+
+ pthread_mutex_lock(&ct->lock);
+
+ for (r = 0;
+ r == 0 && ct->state == PATH_PENDING &&
+ ct->msgid == MSG_TUR_RUNNING;
+ r = pthread_cond_timedwait(&ct->active, &ct->lock, &endtime));
+
+ if (!r) {
+ tur_status = ct->state;
+ c->msgid = ct->msgid;
+ }
+ pthread_mutex_unlock(&ct->lock);
+ if (tur_status == PATH_PENDING && c->msgid == MSG_TUR_RUNNING) {
+ condlog(4, "%d:%d : tur checker still running",
+ major(ct->devt), minor(ct->devt));
+ } else {
+ int running = uatomic_xchg(&ct->running, 0);
+ if (running)
+ pthread_cancel(ct->thread);
+ ct->thread = 0;
+ }
+
+ return tur_status;
+}
+
+int libcheck_pending(struct checker *c)
+{
+ struct timespec endtime;
+ struct tur_checker_context *ct = c->context;
+
+ /* The if path checker isn't running, just return the exiting value. */
+ if (!ct || !ct->thread)
+ return c->path_state;
+
+ get_monotonic_time(&endtime);
+ return check_pending(c, endtime);
+}
+
int libcheck_check(struct checker * c)
{
struct tur_checker_context *ct = c->context;
@@ -437,27 +480,7 @@ int libcheck_check(struct checker * c)
return tur_check(c->fd, c->timeout, &c->msgid);
}
tur_timeout(&tsp);
- pthread_mutex_lock(&ct->lock);
-
- for (r = 0;
- r == 0 && ct->state == PATH_PENDING &&
- ct->msgid == MSG_TUR_RUNNING;
- r = pthread_cond_timedwait(&ct->active, &ct->lock, &tsp));
-
- if (!r) {
- tur_status = ct->state;
- c->msgid = ct->msgid;
- }
- pthread_mutex_unlock(&ct->lock);
- if (tur_status == PATH_PENDING && c->msgid == MSG_TUR_RUNNING) {
- condlog(4, "%d:%d : tur checker still running",
- major(ct->devt), minor(ct->devt));
- } else {
- int running = uatomic_xchg(&ct->running, 0);
- if (running)
- pthread_cancel(ct->thread);
- ct->thread = 0;
- }
+ tur_status = check_pending(c, tsp);
}
return tur_status;
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (2 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 03/15] libmultipath: split out the code to wait for pending checkers Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 20:05 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 05/15] multipath-tools tests: fix up directio tests Benjamin Marzinski
` (11 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
When the tur and directio checkers start an asynchronous checker, they
now immediately return with the path in PATH_PENDING, instead of
waiting in checker_check(). Instead the wait now happens in
checker_get_state().
Additionally, the directio checker now waits for 1 ms, like the tur
checker does. Also like the tur checker it now only waits once. If it
is still pending after the first call to checker_get_state(). It will
not wait at all on future calls, and will just process the already
completed IOs.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/checkers.c | 7 +++-
libmultipath/checkers/directio.c | 57 +++++++++++++++++---------------
libmultipath/checkers/tur.c | 13 +++-----
3 files changed, 41 insertions(+), 36 deletions(-)
diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
index 298aec78..b44b856a 100644
--- a/libmultipath/checkers.c
+++ b/libmultipath/checkers.c
@@ -303,7 +303,12 @@ void checker_put (struct checker * dst)
int checker_get_state(struct checker *c)
{
- return c ? c->path_state : PATH_UNCHECKED;
+ if (!c)
+ return PATH_UNCHECKED;
+ if (c->path_state != PATH_PENDING || !c->cls->pending)
+ return c->path_state;
+ c->path_state = c->cls->pending(c);
+ return c->path_state;
}
void checker_check (struct checker * c, int path_state)
diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
index 3f88b40d..904e3071 100644
--- a/libmultipath/checkers/directio.c
+++ b/libmultipath/checkers/directio.c
@@ -60,10 +60,11 @@ const char *libcheck_msgtable[] = {
#define LOG(prio, fmt, args...) condlog(prio, "directio: " fmt, ##args)
struct directio_context {
- int running;
+ unsigned int running;
int reset_flags;
struct aio_group *aio_grp;
struct async_req *req;
+ struct timespec endtime;
};
static struct aio_group *
@@ -314,19 +315,16 @@ check_pending(struct directio_context *ct, struct timespec endtime)
static int
check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
{
- struct timespec timeout = { .tv_nsec = 1000 };
struct stat sb;
int rc;
+ struct io_event event;
struct timespec endtime;
if (fstat(fd, &sb) == 0) {
LOG(4, "called for %x", (unsigned) sb.st_rdev);
}
- if (sync > 0) {
+ if (sync > 0)
LOG(4, "called in synchronous mode");
- timeout.tv_sec = timeout_secs;
- timeout.tv_nsec = 0;
- }
if (ct->running) {
if (ct->req->state != PATH_PENDING) {
@@ -345,31 +343,26 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
LOG(3, "io_submit error %i", -rc);
return PATH_UNCHECKED;
}
+ get_monotonic_time(&ct->endtime);
+ ct->endtime.tv_nsec += 1000 * 1000;
+ normalize_timespec(&ct->endtime);
}
ct->running++;
+ if (!sync)
+ return PATH_PENDING;
get_monotonic_time(&endtime);
- endtime.tv_sec += timeout.tv_sec;
- endtime.tv_nsec += timeout.tv_nsec;
+ endtime.tv_sec += timeout_secs;
normalize_timespec(&endtime);
check_pending(ct, endtime);
if (ct->req->state != PATH_PENDING)
return ct->req->state;
- if (ct->running > timeout_secs || sync) {
- struct io_event event;
-
- LOG(3, "abort check on timeout");
-
- io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
- rc = PATH_DOWN;
- } else {
- LOG(4, "async io pending");
- rc = PATH_PENDING;
- }
+ LOG(3, "abort check on timeout");
- return rc;
+ io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
+ return PATH_DOWN;
}
static void set_msgid(struct checker *c, int state)
@@ -395,21 +388,31 @@ static void set_msgid(struct checker *c, int state)
int libcheck_pending(struct checker *c)
{
- struct timespec endtime;
+ int rc;
+ struct io_event event;
struct directio_context *ct = (struct directio_context *)c->context;
/* The if path checker isn't running, just return the exiting value. */
if (!ct || !ct->running)
return c->path_state;
- if (ct->req->state == PATH_PENDING) {
- get_monotonic_time(&endtime);
- check_pending(ct, endtime);
- } else
+ if (ct->req->state == PATH_PENDING)
+ check_pending(ct, ct->endtime);
+ else
ct->running = 0;
- set_msgid(c, ct->req->state);
+ rc = ct->req->state;
+ if (rc == PATH_PENDING) {
+ if (ct->running > c->timeout) {
+ LOG(3, "abort check on timeout");
+ io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
+ rc = PATH_DOWN;
+ }
+ else
+ LOG(4, "async io pending");
+ }
+ set_msgid(c, rc);
- return ct->req->state;
+ return rc;
}
int libcheck_check (struct checker * c)
diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 95af5214..81db565b 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -58,6 +58,7 @@ struct tur_checker_context {
int msgid;
struct checker_context ctx;
unsigned int nr_timeouts;
+ struct timespec endtime;
};
int libcheck_init (struct checker * c)
@@ -323,7 +324,7 @@ static int tur_check_async_timeout(struct checker *c)
return (now.tv_sec > ct->time);
}
-int check_pending(struct checker *c, struct timespec endtime)
+int check_pending(struct checker *c)
{
struct tur_checker_context *ct = c->context;
int r, tur_status = PATH_PENDING;
@@ -333,7 +334,7 @@ int check_pending(struct checker *c, struct timespec endtime)
for (r = 0;
r == 0 && ct->state == PATH_PENDING &&
ct->msgid == MSG_TUR_RUNNING;
- r = pthread_cond_timedwait(&ct->active, &ct->lock, &endtime));
+ r = pthread_cond_timedwait(&ct->active, &ct->lock, &ct->endtime));
if (!r) {
tur_status = ct->state;
@@ -355,21 +356,18 @@ int check_pending(struct checker *c, struct timespec endtime)
int libcheck_pending(struct checker *c)
{
- struct timespec endtime;
struct tur_checker_context *ct = c->context;
/* The if path checker isn't running, just return the exiting value. */
if (!ct || !ct->thread)
return c->path_state;
- get_monotonic_time(&endtime);
- return check_pending(c, endtime);
+ return check_pending(c);
}
int libcheck_check(struct checker * c)
{
struct tur_checker_context *ct = c->context;
- struct timespec tsp;
pthread_attr_t attr;
int tur_status, r;
@@ -479,8 +477,7 @@ int libcheck_check(struct checker * c)
" sync mode", major(ct->devt), minor(ct->devt));
return tur_check(c->fd, c->timeout, &c->msgid);
}
- tur_timeout(&tsp);
- tur_status = check_pending(c, tsp);
+ tur_timeout(&ct->endtime);
}
return tur_status;
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 05/15] multipath-tools tests: fix up directio tests
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (3 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 16:12 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 06/15] libmultipath: split get_state into two functions Benjamin Marzinski
` (10 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Make the directio tests work with libcheck_pending() being separate from
libcheck_check
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
tests/directio.c | 133 +++++++++++++++++++++++++++++------------------
1 file changed, 82 insertions(+), 51 deletions(-)
diff --git a/tests/directio.c b/tests/directio.c
index 763929e5..2e22f831 100644
--- a/tests/directio.c
+++ b/tests/directio.c
@@ -219,13 +219,22 @@ static void return_io_getevents_nr(struct timespec *ts, int nr,
ev_off += i;
}
-void do_check_state(struct checker *c, int sync, int timeout, int chk_state)
+void do_check_state(struct checker *c, int sync, int chk_state)
{
struct directio_context * ct = (struct directio_context *)c->context;
if (!ct->running)
will_return(__wrap_io_submit, 1);
- assert_int_equal(check_state(test_fd, ct, sync, timeout), chk_state);
+ assert_int_equal(check_state(test_fd, ct, sync, c->timeout), chk_state);
+ if (sync) {
+ assert_int_equal(ev_off, 0);
+ memset(mock_events, 0, sizeof(mock_events));
+ }
+}
+
+void do_libcheck_pending(struct checker *c, int chk_state)
+{
+ assert_int_equal(libcheck_pending(c), chk_state);
assert_int_equal(ev_off, 0);
memset(mock_events, 0, sizeof(mock_events));
}
@@ -245,12 +254,13 @@ void do_libcheck_reset(int nr_aio_grps)
assert_int_equal(ioctx_count, 0);
}
-static void do_libcheck_init(struct checker *c, int blocksize,
+static void do_libcheck_init(struct checker *c, int blocksize, int timeout,
struct async_req **req)
{
struct directio_context * ct;
c->fd = test_fd;
+ c->timeout = timeout;
wrap_will_return(WRAP_IOCTL, blocksize);
assert_int_equal(libcheck_init(c), 0);
ct = (struct directio_context *)c->context;
@@ -305,7 +315,7 @@ static void test_init_reset_init(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
- do_libcheck_init(&c, 4096, NULL);
+ do_libcheck_init(&c, 4096, 0, NULL);
aio_grp = get_aio_grp(&c);
check_aio_grp(aio_grp, 1, 0);
list_for_each_entry(tmp_grp, &aio_grp_list, node)
@@ -314,7 +324,7 @@ static void test_init_reset_init(void **state)
check_aio_grp(aio_grp, 0, 0);
do_libcheck_reset(1);
will_return(__wrap_io_setup, 0);
- do_libcheck_init(&c, 4096, NULL);
+ do_libcheck_init(&c, 4096, 0, NULL);
aio_grp = get_aio_grp(&c);
check_aio_grp(aio_grp, 1, 0);
list_for_each_entry(tmp_grp, &aio_grp_list, node)
@@ -340,11 +350,11 @@ static void test_init_free(void **state)
struct directio_context * ct;
if (i % 3 == 0)
- do_libcheck_init(&c[i], 512, NULL);
+ do_libcheck_init(&c[i], 512, 0, NULL);
else if (i % 3 == 1)
- do_libcheck_init(&c[i], 1024, NULL);
+ do_libcheck_init(&c[i], 1024, 0, NULL);
else
- do_libcheck_init(&c[i], 4096, NULL);
+ do_libcheck_init(&c[i], 4096, 0, NULL);
ct = (struct directio_context *)c[i].context;
assert_non_null(ct->aio_grp);
if ((i & 1023) == 0)
@@ -385,7 +395,7 @@ static void test_multi_init_free(void **state)
for (count = 0, i = 0; i < 4096; count++) {
/* usually init, but occasionally free checkers */
if (count == 0 || (count % 5 != 0 && count % 7 != 0)) {
- do_libcheck_init(&c[i], 4096, NULL);
+ do_libcheck_init(&c[i], 4096, 0, NULL);
i++;
} else {
i--;
@@ -404,7 +414,7 @@ static void test_multi_init_free(void **state)
i--;
libcheck_free(&c[i]);
} else {
- do_libcheck_init(&c[i], 4096, NULL);
+ do_libcheck_init(&c[i], 4096, 0, NULL);
i++;
}
}
@@ -420,9 +430,9 @@ static void test_check_state_simple(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
- do_libcheck_init(&c, 4096, &req);
+ do_libcheck_init(&c, 4096, 30, &req);
return_io_getevents_nr(NULL, 1, &req, &res);
- do_check_state(&c, 1, 30, PATH_UP);
+ do_check_state(&c, 1, PATH_UP);
libcheck_free(&c);
do_libcheck_reset(1);
}
@@ -435,10 +445,10 @@ static void test_check_state_timeout(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
- do_libcheck_init(&c, 4096, NULL);
+ do_libcheck_init(&c, 4096, 30, NULL);
aio_grp = get_aio_grp(&c);
return_io_getevents_none();
- do_check_state(&c, 1, 30, PATH_DOWN);
+ do_check_state(&c, 1, PATH_DOWN);
check_aio_grp(aio_grp, 1, 0);
libcheck_free(&c);
do_libcheck_reset(1);
@@ -452,16 +462,20 @@ static void test_check_state_async_timeout(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
- do_libcheck_init(&c, 4096, NULL);
+ do_libcheck_init(&c, 4096, 3, NULL);
aio_grp = get_aio_grp(&c);
+ do_check_state(&c, 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c, 0, 3, PATH_PENDING);
+ do_libcheck_pending(&c, PATH_PENDING);
+ do_check_state(&c, 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c, 0, 3, PATH_PENDING);
+ do_libcheck_pending(&c, PATH_PENDING);
+ do_check_state(&c, 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c, 0, 3, PATH_PENDING);
+ do_libcheck_pending(&c, PATH_PENDING);
+ do_check_state(&c, 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c, 0, 3, PATH_DOWN);
+ do_libcheck_pending(&c, PATH_DOWN);
check_aio_grp(aio_grp, 1, 0);
libcheck_free(&c);
do_libcheck_reset(1);
@@ -477,14 +491,16 @@ static void test_free_with_pending(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
- do_libcheck_init(&c[0], 4096, &req);
- do_libcheck_init(&c[1], 4096, NULL);
+ do_libcheck_init(&c[0], 4096, 30, &req);
+ do_libcheck_init(&c[1], 4096, 30, NULL);
aio_grp = get_aio_grp(c);
+ do_check_state(&c[0], 0, PATH_PENDING);
+ do_check_state(&c[1], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[0], 0, 30, PATH_PENDING);
+ do_libcheck_pending(&c[0], PATH_PENDING);
return_io_getevents_nr(NULL, 1, &req, &res);
return_io_getevents_none();
- do_check_state(&c[1], 0, 30, PATH_PENDING);
+ do_libcheck_pending(&c[1], PATH_PENDING);
assert_true(is_checker_running(&c[0]));
assert_true(is_checker_running(&c[1]));
check_aio_grp(aio_grp, 2, 0);
@@ -505,9 +521,10 @@ static void test_orphaned_aio_group(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
for (i = 0; i < AIO_GROUP_SIZE; i++) {
- do_libcheck_init(&c[i], 4096, NULL);
+ do_libcheck_init(&c[i], 4096, 30, NULL);
+ do_check_state(&c[i], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[i], 0, 30, PATH_PENDING);
+ do_libcheck_pending(&c[i], PATH_PENDING);
}
aio_grp = get_aio_grp(c);
check_aio_grp(aio_grp, AIO_GROUP_SIZE, 0);
@@ -539,19 +556,19 @@ static void test_timeout_cancel_failed(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
for (i = 0; i < 2; i++)
- do_libcheck_init(&c[i], 4096, &reqs[i]);
+ do_libcheck_init(&c[i], 4096, 30, &reqs[i]);
aio_grp = get_aio_grp(c);
return_io_getevents_none();
- do_check_state(&c[0], 1, 30, PATH_DOWN);
+ do_check_state(&c[0], 1, PATH_DOWN);
assert_true(is_checker_running(&c[0]));
check_aio_grp(aio_grp, 2, 0);
return_io_getevents_none();
- do_check_state(&c[0], 1, 30, PATH_DOWN);
+ do_check_state(&c[0], 1, PATH_DOWN);
assert_true(is_checker_running(&c[0]));
return_io_getevents_nr(NULL, 1, &reqs[0], &res[0]);
return_io_getevents_nr(NULL, 1, &reqs[1], &res[1]);
- do_check_state(&c[1], 1, 30, PATH_UP);
- do_check_state(&c[0], 1, 30, PATH_UP);
+ do_check_state(&c[1], 1, PATH_UP);
+ do_check_state(&c[0], 1, PATH_UP);
for (i = 0; i < 2; i++) {
assert_false(is_checker_running(&c[i]));
libcheck_free(&c[i]);
@@ -571,28 +588,37 @@ static void test_async_timeout_cancel_failed(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
for (i = 0; i < 2; i++)
- do_libcheck_init(&c[i], 4096, &reqs[i]);
+ do_libcheck_init(&c[i], 4096, 2, &reqs[i]);
+ do_check_state(&c[0], 0, PATH_PENDING);
+ do_check_state(&c[1], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[0], 0, 2, PATH_PENDING);
+ do_libcheck_pending(&c[0], PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[1], 0, 2, PATH_PENDING);
+ do_libcheck_pending(&c[1], PATH_PENDING);
+ do_check_state(&c[0], 0, PATH_PENDING);
+ do_check_state(&c[1], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[0], 0, 2, PATH_PENDING);
+ do_libcheck_pending(&c[0], PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[1], 0, 2, PATH_PENDING);
+ do_libcheck_pending(&c[1], PATH_PENDING);
+ do_check_state(&c[0], 0, PATH_PENDING);
+ do_check_state(&c[1], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[0], 0, 2, PATH_DOWN);
+ do_libcheck_pending(&c[0], PATH_DOWN);
if (!test_dev) {
/* can't pick which even gets returned on real devices */
return_io_getevents_nr(NULL, 1, &reqs[1], &res[1]);
- do_check_state(&c[1], 0, 2, PATH_UP);
+ do_libcheck_pending(&c[1], PATH_UP);
}
+ do_check_state(&c[0], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[0], 0, 2, PATH_DOWN);
+ do_libcheck_pending(&c[0], PATH_DOWN);
assert_true(is_checker_running(&c[0]));
+ do_check_state(&c[1], 0, PATH_PENDING);
+ do_check_state(&c[0], 0, PATH_PENDING);
return_io_getevents_nr(NULL, 2, reqs, res);
- do_check_state(&c[1], 0, 2, PATH_UP);
- do_check_state(&c[0], 0, 2, PATH_UP);
+ do_libcheck_pending(&c[1], PATH_UP);
+ do_libcheck_pending(&c[0], PATH_UP);
for (i = 0; i < 2; i++) {
assert_false(is_checker_running(&c[i]));
libcheck_free(&c[i]);
@@ -612,15 +638,17 @@ static void test_orphan_checker_cleanup(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
for (i = 0; i < 2; i++)
- do_libcheck_init(&c[i], 4096, &reqs[i]);
+ do_libcheck_init(&c[i], 4096, 30, &reqs[i]);
aio_grp = get_aio_grp(c);
+ do_check_state(&c[0], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[0], 0, 30, PATH_PENDING);
+ do_libcheck_pending(&c[0], PATH_PENDING);
check_aio_grp(aio_grp, 2, 0);
libcheck_free(&c[0]);
check_aio_grp(aio_grp, 2, 1);
+ do_check_state(&c[1], 0, PATH_PENDING);
return_io_getevents_nr(NULL, 2, reqs, res);
- do_check_state(&c[1], 0, 2, PATH_UP);
+ do_libcheck_pending(&c[1], PATH_UP);
check_aio_grp(aio_grp, 1, 0);
libcheck_free(&c[1]);
check_aio_grp(aio_grp, 0, 0);
@@ -636,10 +664,11 @@ static void test_orphan_reset_cleanup(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
- do_libcheck_init(&c, 4096, NULL);
+ do_libcheck_init(&c, 4096, 30, NULL);
orphan_aio_grp = get_aio_grp(&c);
+ do_check_state(&c, 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c, 0, 30, PATH_PENDING);
+ do_libcheck_pending(&c, PATH_PENDING);
check_aio_grp(orphan_aio_grp, 1, 0);
libcheck_free(&c);
check_aio_grp(orphan_aio_grp, 1, 1);
@@ -671,10 +700,10 @@ static void test_check_state_blksize(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
for (i = 0; i < 3; i++)
- do_libcheck_init(&c[i], blksize[i], &reqs[i]);
+ do_libcheck_init(&c[i], blksize[i], 30, &reqs[i]);
for (i = 0; i < 3; i++) {
return_io_getevents_nr(NULL, 1, &reqs[i], &res[i]);
- do_check_state(&c[i], 1, 30, chk_state[i]);
+ do_check_state(&c[i], 1, chk_state[i]);
}
for (i = 0; i < 3; i++) {
assert_false(is_checker_running(&c[i]));
@@ -695,19 +724,21 @@ static void test_check_state_async(void **state)
assert_true(list_empty(&aio_grp_list));
will_return(__wrap_io_setup, 0);
for (i = 0; i < 257; i++)
- do_libcheck_init(&c[i], 4096, &reqs[i]);
+ do_libcheck_init(&c[i], 4096, 30, &reqs[i]);
for (i = 0; i < 256; i++) {
+ do_check_state(&c[i], 0, PATH_PENDING);
return_io_getevents_none();
- do_check_state(&c[i], 0, 30, PATH_PENDING);
+ do_libcheck_pending(&c[i], PATH_PENDING);
assert_true(is_checker_running(&c[i]));
}
+ do_check_state(&c[256], 0, PATH_PENDING);
return_io_getevents_nr(&full_timeout, 256, reqs, res);
return_io_getevents_nr(NULL, 1, &reqs[256], &res[256]);
- do_check_state(&c[256], 0, 30, PATH_UP);
+ do_libcheck_pending(&c[256], PATH_UP);
assert_false(is_checker_running(&c[256]));
libcheck_free(&c[256]);
for (i = 0; i < 256; i++) {
- do_check_state(&c[i], 0, 30, PATH_UP);
+ do_check_state(&c[i], 0, PATH_UP);
assert_false(is_checker_running(&c[i]));
libcheck_free(&c[i]);
}
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 06/15] libmultipath: split get_state into two functions
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (4 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 05/15] multipath-tools tests: fix up directio tests Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 15:14 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 07/15] libmultipath: change path_offline to path_sysfs_state Benjamin Marzinski
` (9 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
get_state() is now split into start_checker(), which runs the checker
but doesn't wait for async checkers, and get_state(), which returns the
state from the checker, after waiting for async checkers if necessary.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/discovery.c | 22 ++++++++++++++++------
libmultipath/discovery.h | 4 +++-
libmultipath/propsel.c | 1 +
multipathd/main.c | 4 +++-
4 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 5648be60..e0f46ff2 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1974,30 +1974,29 @@ cciss_ioctl_pathinfo(struct path *pp)
}
int
-get_state (struct path * pp, struct config *conf, int daemon, int oldstate)
+start_checker (struct path * pp, struct config *conf, int daemon, int oldstate)
{
struct checker * c = &pp->checker;
- int state;
if (!checker_selected(c)) {
if (daemon) {
if (pathinfo(pp, conf, DI_SYSFS) != PATHINFO_OK) {
condlog(3, "%s: couldn't get sysfs pathinfo",
pp->dev);
- return PATH_UNCHECKED;
+ return -1;
}
}
select_detect_checker(conf, pp);
select_checker(conf, pp);
if (!checker_selected(c)) {
condlog(3, "%s: No checker selected", pp->dev);
- return PATH_UNCHECKED;
+ return -1;
}
checker_set_fd(c, pp->fd);
if (checker_init(c, pp->mpp?&pp->mpp->mpcontext:NULL)) {
checker_clear(c);
condlog(3, "%s: checker init failed", pp->dev);
- return PATH_UNCHECKED;
+ return -1;
}
}
if (pp->mpp && !c->mpcontext)
@@ -2008,6 +2007,15 @@ get_state (struct path * pp, struct config *conf, int daemon, int oldstate)
else
checker_set_sync(c);
checker_check(c, oldstate);
+ return 0;
+}
+
+int
+get_state (struct path * pp)
+{
+ struct checker * c = &pp->checker;
+ int state;
+
state = checker_get_state(c);
condlog(3, "%s: %s state = %s", pp->dev,
checker_name(c), checker_state_name(state));
@@ -2455,7 +2463,9 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
if (mask & DI_CHECKER) {
if (path_state == PATH_UP) {
- int newstate = get_state(pp, conf, 0, path_state);
+ int newstate = PATH_UNCHECKED;
+ if (start_checker(pp, conf, 0, path_state) == 0)
+ newstate = get_state(pp);
if (newstate != PATH_PENDING ||
pp->state == PATH_UNCHECKED ||
pp->state == PATH_WILD)
diff --git a/libmultipath/discovery.h b/libmultipath/discovery.h
index c93abf1c..f3e0c618 100644
--- a/libmultipath/discovery.h
+++ b/libmultipath/discovery.h
@@ -34,7 +34,9 @@ int path_discovery (vector pathvec, int flag);
int path_get_tpgs(struct path *pp); /* This function never returns TPGS_UNDEF */
int do_tur (char *);
int path_offline (struct path *);
-int get_state (struct path * pp, struct config * conf, int daemon, int state);
+int start_checker(struct path * pp, struct config * conf, int daemon,
+ int state);
+int get_state(struct path * pp);
int get_vpd_sgio (int fd, int pg, int vend_id, char * str, int maxlen);
int pathinfo (struct path * pp, struct config * conf, int mask);
int alloc_path_with_pathinfo (struct config *conf, struct udev_device *udevice,
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index a3fce203..ad771d35 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -700,6 +700,7 @@ out:
condlog(3, "%s: path_checker = %s %s", pp->dev,
checker_name(c), origin);
c->timeout = pp->checker_timeout;
+ c->path_state = PATH_UNCHECKED;
return 0;
}
diff --git a/multipathd/main.c b/multipathd/main.c
index 1b7fd04f..4f752adb 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2324,7 +2324,9 @@ check_path_state(struct path *pp)
if (newstate == PATH_UP) {
conf = get_multipath_config();
pthread_cleanup_push(put_multipath_config, conf);
- newstate = get_state(pp, conf, 1, newstate);
+ newstate = PATH_UNCHECKED;
+ if (start_checker(pp, conf, 1, newstate) == 0)
+ newstate = get_state(pp);
pthread_cleanup_pop(1);
} else {
checker_clear_message(&pp->checker);
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 07/15] libmultipath: change path_offline to path_sysfs_state
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (5 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 06/15] libmultipath: split get_state into two functions Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 15:31 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 08/15] multipathd: split check_path_state into two functions Benjamin Marzinski
` (8 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Instead of pp->offline being a binary value, change it to show the
actual result of looking at the sysfs state, and rename it to
pp->sysfs_state. Also change the function name from path_offline() to
path_sysfs_state(). Adapt the tests for pp->offline. This should not
change how multipath currently works. pp->sysfs_state will be used in
future patches.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/discovery.c | 65 ++++++++++++++++---------------
libmultipath/discovery.h | 2 +-
libmultipath/libmultipath.version | 3 +-
libmultipath/print.c | 2 +-
libmultipath/structs.h | 2 +-
multipathd/main.c | 4 +-
6 files changed, 41 insertions(+), 37 deletions(-)
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index e0f46ff2..6ccdfa0b 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1076,18 +1076,16 @@ detect_alua(struct path * pp)
return;
}
- if (pp->fd == -1 || pp->offline)
+ if (pp->fd == -1 || pp->sysfs_state == PATH_DOWN)
return;
ret = get_target_port_group(pp);
if (ret < 0 || get_asymmetric_access_state(pp, ret) < 0) {
- int state;
-
if (ret == -RTPG_INQUIRY_FAILED)
return;
- state = path_offline(pp);
- if (state != PATH_UP)
+ path_sysfs_state(pp);
+ if (pp->sysfs_state != PATH_UP)
return;
pp->tpgs = TPGS_NONE;
@@ -1800,7 +1798,7 @@ common_sysfs_pathinfo (struct path * pp)
}
int
-path_offline (struct path * pp)
+path_sysfs_state(struct path * pp)
{
struct udev_device * parent;
char buff[SCSI_STATE_SIZE];
@@ -1814,7 +1812,8 @@ path_offline (struct path * pp)
subsys_type = "nvme";
}
else {
- return PATH_UP;
+ pp->sysfs_state = PATH_UP;
+ goto out;
}
parent = pp->udev;
@@ -1827,16 +1826,18 @@ path_offline (struct path * pp)
if (!parent) {
condlog(1, "%s: failed to get sysfs information", pp->dev);
- return PATH_REMOVED;
+ pp->sysfs_state = PATH_REMOVED;
+ goto out;
}
memset(buff, 0x0, SCSI_STATE_SIZE);
err = sysfs_attr_get_value(parent, "state", buff, sizeof(buff));
if (!sysfs_attr_value_ok(err, sizeof(buff))) {
if (err == -ENXIO)
- return PATH_REMOVED;
+ pp->sysfs_state = PATH_REMOVED;
else
- return PATH_DOWN;
+ pp->sysfs_state = PATH_DOWN;
+ goto out;
}
@@ -1844,31 +1845,34 @@ path_offline (struct path * pp)
if (pp->bus == SYSFS_BUS_SCSI) {
if (!strncmp(buff, "offline", 7)) {
- pp->offline = 1;
- return PATH_DOWN;
+ pp->sysfs_state = PATH_DOWN;
+ goto out;
+ } else if (!strncmp(buff, "blocked", 7) ||
+ !strncmp(buff, "quiesce", 7)) {
+ pp->sysfs_state = PATH_PENDING;
+ goto out;
+ } else if (!strncmp(buff, "running", 7)) {
+ pp->sysfs_state = PATH_UP;
+ goto out;
}
- pp->offline = 0;
- if (!strncmp(buff, "blocked", 7) ||
- !strncmp(buff, "quiesce", 7))
- return PATH_PENDING;
- else if (!strncmp(buff, "running", 7))
- return PATH_UP;
}
else if (pp->bus == SYSFS_BUS_NVME) {
if (!strncmp(buff, "dead", 4)) {
- pp->offline = 1;
- return PATH_DOWN;
+ pp->sysfs_state = PATH_DOWN;
+ goto out;
+ } else if (!strncmp(buff, "new", 3) ||
+ !strncmp(buff, "deleting", 8)) {
+ pp->sysfs_state = PATH_PENDING;
+ goto out;
+ } else if (!strncmp(buff, "live", 4)) {
+ pp->sysfs_state = PATH_UP;
+ goto out;
}
- pp->offline = 0;
- if (!strncmp(buff, "new", 3) ||
- !strncmp(buff, "deleting", 8))
- return PATH_PENDING;
- else if (!strncmp(buff, "live", 4))
- return PATH_UP;
}
-
- return PATH_DOWN;
+ pp->sysfs_state = PATH_DOWN;
+out:
+ return pp->sysfs_state;
}
static int
@@ -2052,8 +2056,7 @@ get_prio (struct path * pp)
old_prio = pp->priority;
pp->priority = prio_getprio(p, pp);
if (pp->priority < 0) {
- /* this changes pp->offline, but why not */
- int state = path_offline(pp);
+ int state = path_sysfs_state(pp);
if (state == PATH_DOWN || state == PATH_PENDING) {
pp->priority = old_prio;
@@ -2424,7 +2427,7 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
return PATHINFO_SKIPPED;
}
- path_state = path_offline(pp);
+ path_state = path_sysfs_state(pp);
if (path_state == PATH_REMOVED)
goto blank;
else if (mask & DI_NOIO) {
diff --git a/libmultipath/discovery.h b/libmultipath/discovery.h
index f3e0c618..7d42eae5 100644
--- a/libmultipath/discovery.h
+++ b/libmultipath/discovery.h
@@ -33,7 +33,7 @@ struct config;
int path_discovery (vector pathvec, int flag);
int path_get_tpgs(struct path *pp); /* This function never returns TPGS_UNDEF */
int do_tur (char *);
-int path_offline (struct path *);
+int path_sysfs_state(struct path *);
int start_checker(struct path * pp, struct config * conf, int daemon,
int state);
int get_state(struct path * pp);
diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index 21d48da6..6439d3a7 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -146,7 +146,7 @@ global:
path_discovery;
path_get_tpgs;
pathinfo;
- path_offline;
+ path_sysfs_state;
print_all_paths;
print_foreign_topology;
print_multipath_topology__;
@@ -160,6 +160,7 @@ global:
remove_wwid;
replace_wwids;
reset_checker_classes;
+ start_checker;
select_all_tg_pt;
select_action;
select_find_multipaths_timeout;
diff --git a/libmultipath/print.c b/libmultipath/print.c
index e536c5c0..00c03ace 100644
--- a/libmultipath/print.c
+++ b/libmultipath/print.c
@@ -514,7 +514,7 @@ snprint_offline (struct strbuf *buff, const struct path * pp)
{
if (!pp || !pp->mpp)
return append_strbuf_str(buff, "unknown");
- else if (pp->offline)
+ else if (pp->sysfs_state == PATH_DOWN)
return append_strbuf_str(buff, "offline");
else
return append_strbuf_str(buff, "running");
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 074faca6..d8231e95 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -362,7 +362,7 @@ struct path {
unsigned int tick;
unsigned int pending_ticks;
int bus;
- int offline;
+ int sysfs_state;
int state;
int dmstate;
int chkrstate;
diff --git a/multipathd/main.c b/multipathd/main.c
index 4f752adb..33a57041 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -96,7 +96,7 @@ void * mpath_pr_event_handler_fn (void * );
do { \
if (pp->mpp && checker_selected(&pp->checker) && \
lvl <= libmp_verbosity) { \
- if (pp->offline) \
+ if (pp->sysfs_state == PATH_DOWN) \
condlog(lvl, "%s: %s - path offline", \
pp->mpp->alias, pp->dev); \
else { \
@@ -2320,7 +2320,7 @@ check_path_state(struct path *pp)
int newstate;
struct config *conf;
- newstate = path_offline(pp);
+ newstate = path_sysfs_state(pp);
if (newstate == PATH_UP) {
conf = get_multipath_config();
pthread_cleanup_push(put_multipath_config, conf);
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 08/15] multipathd: split check_path_state into two functions
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (6 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 07/15] libmultipath: change path_offline to path_sysfs_state Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 09/15] multipathd: split do_checker_path Benjamin Marzinski
` (7 subsequent siblings)
15 siblings, 0 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
check_path_state() is now split into start_path_check(), which calls
path_sysfs_state() and if the path is up also calls start_checker(), and
check_path_state() which gets the new state from either pp->sysfs_state
or get_state().
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 33a57041..d157e3c2 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2314,25 +2314,31 @@ should_skip_path(struct path *pp){
return 0;
}
-static int
-check_path_state(struct path *pp)
+static void
+start_path_check(struct path *pp)
{
- int newstate;
struct config *conf;
- newstate = path_sysfs_state(pp);
- if (newstate == PATH_UP) {
+ if (path_sysfs_state(pp) == PATH_UP) {
conf = get_multipath_config();
pthread_cleanup_push(put_multipath_config, conf);
- newstate = PATH_UNCHECKED;
- if (start_checker(pp, conf, 1, newstate) == 0)
- newstate = get_state(pp);
+ start_checker(pp, conf, 1, PATH_UNCHECKED);
pthread_cleanup_pop(1);
} else {
checker_clear_message(&pp->checker);
condlog(3, "%s: state %s, checker not called",
- pp->dev, checker_state_name(newstate));
+ pp->dev, checker_state_name(pp->sysfs_state));
}
+}
+
+static int
+check_path_state(struct path *pp)
+{
+ int newstate = pp->sysfs_state;
+ struct config *conf;
+
+ if (newstate == PATH_UP)
+ newstate = get_state(pp);
/*
* Wait for uevent for removed paths;
* some LLDDs like zfcp keep paths unavailable
@@ -2413,6 +2419,7 @@ do_check_path (struct vectors * vecs, struct path * pp)
pp->checkint = checkint;
};
+ start_path_check(pp);
newstate = check_path_state(pp);
if (newstate == PATH_WILD || newstate == PATH_UNCHECKED)
return CHECK_PATH_SKIPPED;
@@ -2752,6 +2759,7 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
}
}
+ start_path_check(pp);
newstate = check_path_state(pp);
if (!strlen(pp->wwid) &&
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 09/15] multipathd: split do_checker_path
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (7 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 08/15] multipathd: split check_path_state into two functions Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 10/15] multipathd: split check_path into two functions Benjamin Marzinski
` (6 subsequent siblings)
15 siblings, 0 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Move the code that starts the path checker from do_check_path() into
check_path(), rename the remainder of do_check_path() to
update_path_state() and call that from check_path().
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index d157e3c2..94d4e421 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2396,7 +2396,7 @@ enum check_path_return {
};
static int
-do_check_path (struct vectors * vecs, struct path * pp)
+update_path_state (struct vectors * vecs, struct path * pp)
{
int newstate;
int new_path_up = 0;
@@ -2414,12 +2414,6 @@ do_check_path (struct vectors * vecs, struct path * pp)
marginal_pathgroups = conf->marginal_pathgroups;
put_multipath_config(conf);
- if (pp->checkint == CHECKINT_UNDEF) {
- condlog(0, "%s: BUG: checkint is not set", pp->dev);
- pp->checkint = checkint;
- };
-
- start_path_check(pp);
newstate = check_path_state(pp);
if (newstate == PATH_WILD || newstate == PATH_UNCHECKED)
return CHECK_PATH_SKIPPED;
@@ -2639,7 +2633,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks,
time_t start_secs)
{
int r;
- unsigned int adjust_int, max_checkint;
+ unsigned int adjust_int, checkint, max_checkint;
struct config *conf;
time_t next_idx, goal_idx;
@@ -2652,14 +2646,21 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks,
return CHECK_PATH_SKIPPED;
conf = get_multipath_config();
+ checkint = conf->checkint;
max_checkint = conf->max_checkint;
adjust_int = conf->adjust_int;
put_multipath_config(conf);
- r = do_check_path(vecs, pp);
+ if (pp->checkint == CHECKINT_UNDEF) {
+ condlog(0, "%s: BUG: checkint is not set", pp->dev);
+ pp->checkint = checkint;
+ }
+
+ start_path_check(pp);
+ r = update_path_state(vecs, pp);
/*
- * do_check_path() removed or orphaned the path.
+ * update_path() removed or orphaned the path.
*/
if (r == CHECK_PATH_REMOVED || !pp->mpp)
return r;
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 10/15] multipathd: split check_path into two functions
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (8 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 09/15] multipathd: split do_checker_path Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 15:38 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 11/15] multipathd: split handle_uninitialized_path " Benjamin Marzinski
` (5 subsequent siblings)
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Split out the code that updates a path's state and sets up the next
check time into its own function, update_path().
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 94d4e421..300f8247 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2390,6 +2390,7 @@ sync_mpp(struct vectors * vecs, struct multipath *mpp, unsigned int ticks)
}
enum check_path_return {
+ CHECK_PATH_STARTED,
CHECK_PATH_CHECKED,
CHECK_PATH_SKIPPED,
CHECK_PATH_REMOVED,
@@ -2629,13 +2630,10 @@ update_path_state (struct vectors * vecs, struct path * pp)
}
static int
-check_path (struct vectors * vecs, struct path * pp, unsigned int ticks,
- time_t start_secs)
+check_path (struct path * pp, unsigned int ticks)
{
- int r;
- unsigned int adjust_int, checkint, max_checkint;
+ unsigned int checkint;
struct config *conf;
- time_t next_idx, goal_idx;
if (pp->initialized == INIT_REMOVED)
return CHECK_PATH_SKIPPED;
@@ -2647,8 +2645,6 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks,
conf = get_multipath_config();
checkint = conf->checkint;
- max_checkint = conf->max_checkint;
- adjust_int = conf->adjust_int;
put_multipath_config(conf);
if (pp->checkint == CHECKINT_UNDEF) {
@@ -2657,6 +2653,17 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks,
}
start_path_check(pp);
+ return CHECK_PATH_STARTED;
+}
+
+static int
+update_path(struct vectors * vecs, struct path * pp, time_t start_secs)
+{
+ int r;
+ unsigned int adjust_int, max_checkint;
+ struct config *conf;
+ time_t next_idx, goal_idx;
+
r = update_path_state(vecs, pp);
/*
@@ -2685,6 +2692,10 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks,
if (pp->tick == 1)
return r;
+ conf = get_multipath_config();
+ max_checkint = conf->max_checkint;
+ adjust_int = conf->adjust_int;
+ put_multipath_config(conf);
/*
* every mpp has a goal_idx in the range of
* 0 <= goal_idx < conf->max_checkint
@@ -2818,8 +2829,10 @@ check_paths(struct vectors *vecs, unsigned int ticks, int *num_paths_p)
if (!pp->mpp || pp->is_checked)
continue;
pp->is_checked = true;
- rc = check_path(vecs, pp, ticks,
- start_time.tv_sec);
+ rc = check_path(pp, ticks);
+ if (rc == CHECK_PATH_STARTED)
+ rc = update_path(vecs, pp,
+ start_time.tv_sec);
if (rc == CHECK_PATH_CHECKED)
(*num_paths_p)++;
if (++paths_checked % 128 == 0)
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 11/15] multipathd: split handle_uninitialized_path into two functions
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (9 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 10/15] multipathd: split check_path into two functions Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 12/15] multipathd: split check_paths " Benjamin Marzinski
` (4 subsequent siblings)
15 siblings, 0 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Split handle_uninitialized_path() into check_uninitialized_path, which
handles udev retriggers for INIT_MISSING_UDEV paths and starts the path
checker for INIT_FAILED and INIT_NEW paths, and
update_uninitialized_path() which gets the path checker result and
reruns pathinfo if the path is up.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 300f8247..91263a10 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2718,13 +2718,10 @@ update_path(struct vectors * vecs, struct path * pp, time_t start_secs)
}
static int
-handle_uninitialized_path(struct vectors * vecs, struct path * pp,
- unsigned int ticks)
+check_uninitialized_path(struct path * pp, unsigned int ticks)
{
- int newstate;
int retrigger_tries;
struct config *conf;
- int ret;
if (pp->initialized != INIT_NEW && pp->initialized != INIT_FAILED &&
pp->initialized != INIT_MISSING_UDEV)
@@ -2772,6 +2769,15 @@ handle_uninitialized_path(struct vectors * vecs, struct path * pp,
}
start_path_check(pp);
+ return CHECK_PATH_STARTED;
+}
+
+static int
+update_uninitialized_path(struct vectors * vecs, struct path * pp)
+{
+ int newstate, ret;
+ struct config *conf;
+
newstate = check_path_state(pp);
if (!strlen(pp->wwid) &&
@@ -2862,7 +2868,9 @@ next_mpp:
continue;
pp->is_checked = true;
- rc = handle_uninitialized_path(vecs, pp, ticks);
+ rc = check_uninitialized_path(pp, ticks);
+ if (rc == CHECK_PATH_STARTED)
+ rc = update_uninitialized_path(vecs, pp);
if (rc == CHECK_PATH_REMOVED)
i--;
else if (rc == CHECK_PATH_CHECKED)
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 12/15] multipathd: split check_paths into two functions
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (10 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 11/15] multipathd: split handle_uninitialized_path " Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 13/15] multipathd: update priority once after updating all paths Benjamin Marzinski
` (3 subsequent siblings)
15 siblings, 0 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Instead of checking and updating each path, the checkerloop now starts
the checkers on all the paths in check_paths(), and then goes back and
updates all the paths in update_paths(). Since the async checkers use an
absolute time to wait for before returning PATH_PENDING, only one
checker actually needs to be waited for in update_paths(). The rest will
already have reached their timeout when update_path() is called for
them.
The check_paths() and update_paths() loop over the pathvec instead of
looping through the multipath device paths to avoid having to restart
checking of the device paths when the multipath device needs to be
resynced while updating the paths.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/structs.h | 10 +++-
multipathd/main.c | 105 +++++++++++++++++++----------------------
2 files changed, 57 insertions(+), 58 deletions(-)
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index d8231e95..af8e31e9 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -314,6 +314,14 @@ enum recheck_wwid_states {
RECHECK_WWID_ON = YNU_YES,
};
+enum check_path_states {
+ CHECK_PATH_UNCHECKED,
+ CHECK_PATH_STARTED,
+ CHECK_PATH_CHECKED,
+ CHECK_PATH_SKIPPED,
+ CHECK_PATH_REMOVED,
+};
+
struct vpd_vendor_page {
int pg;
const char *name;
@@ -395,7 +403,7 @@ struct path {
int fast_io_fail;
unsigned int dev_loss;
int eh_deadline;
- bool is_checked;
+ enum check_path_states is_checked;
bool can_use_env_uid;
unsigned int checker_timeout;
/* configlet pointers */
diff --git a/multipathd/main.c b/multipathd/main.c
index 91263a10..2bcc6066 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2389,13 +2389,6 @@ sync_mpp(struct vectors * vecs, struct multipath *mpp, unsigned int ticks)
do_sync_mpp(vecs, mpp);
}
-enum check_path_return {
- CHECK_PATH_STARTED,
- CHECK_PATH_CHECKED,
- CHECK_PATH_SKIPPED,
- CHECK_PATH_REMOVED,
-};
-
static int
update_path_state (struct vectors * vecs, struct path * pp)
{
@@ -2735,6 +2728,7 @@ check_uninitialized_path(struct path * pp, unsigned int ticks)
conf = get_multipath_config();
retrigger_tries = conf->retrigger_tries;
pp->tick = conf->max_checkint;
+ pp->checkint = conf->checkint;
put_multipath_config(conf);
if (pp->initialized == INIT_MISSING_UDEV) {
@@ -2778,6 +2772,10 @@ update_uninitialized_path(struct vectors * vecs, struct path * pp)
int newstate, ret;
struct config *conf;
+ if (pp->initialized != INIT_NEW && pp->initialized != INIT_FAILED &&
+ pp->initialized != INIT_MISSING_UDEV)
+ return CHECK_PATH_SKIPPED;
+
newstate = check_path_state(pp);
if (!strlen(pp->wwid) &&
@@ -2807,80 +2805,69 @@ update_uninitialized_path(struct vectors * vecs, struct path * pp)
enum checker_state {
CHECKER_STARTING,
- CHECKER_RUNNING,
+ CHECKER_CHECKING_PATHS,
+ CHECKER_UPDATING_PATHS,
CHECKER_FINISHED,
};
static enum checker_state
-check_paths(struct vectors *vecs, unsigned int ticks, int *num_paths_p)
+check_paths(struct vectors *vecs, unsigned int ticks)
{
unsigned int paths_checked = 0;
struct timespec diff_time, start_time, end_time;
- struct multipath *mpp;
struct path *pp;
- int i, rc;
+ int i;
get_monotonic_time(&start_time);
- vector_foreach_slot(vecs->mpvec, mpp, i) {
- struct pathgroup *pgp;
- struct path *pp;
- int j, k;
- bool check_for_waiters = false;
- /* maps can be rechecked, so this is not always 0 */
- int synced_count = mpp->synced_count;
-
- vector_foreach_slot (mpp->pg, pgp, j) {
- vector_foreach_slot (pgp->paths, pp, k) {
- if (!pp->mpp || pp->is_checked)
- continue;
- pp->is_checked = true;
- rc = check_path(pp, ticks);
- if (rc == CHECK_PATH_STARTED)
- rc = update_path(vecs, pp,
- start_time.tv_sec);
- if (rc == CHECK_PATH_CHECKED)
- (*num_paths_p)++;
- if (++paths_checked % 128 == 0)
- check_for_waiters = true;
- /*
- * mpp has been removed or resynced. Path may
- * have been removed.
- */
- if (VECTOR_SLOT(vecs->mpvec, i) != mpp ||
- synced_count != mpp->synced_count) {
- i--;
- goto next_mpp;
- }
- }
- }
-next_mpp:
- if (check_for_waiters &&
+ vector_foreach_slot(vecs->pathvec, pp, i) {
+ if (pp->is_checked != CHECK_PATH_UNCHECKED)
+ continue;
+ if (pp->mpp)
+ pp->is_checked = check_path(pp, ticks);
+ else
+ pp->is_checked = check_uninitialized_path(pp, ticks);
+ if (++paths_checked % 128 == 0 &&
(lock_has_waiters(&vecs->lock) || waiting_clients())) {
get_monotonic_time(&end_time);
timespecsub(&end_time, &start_time, &diff_time);
if (diff_time.tv_sec > 0)
- return CHECKER_RUNNING;
+ return CHECKER_CHECKING_PATHS;
}
}
+ return CHECKER_UPDATING_PATHS;
+}
+
+static enum checker_state
+update_paths(struct vectors *vecs, int *num_paths_p, time_t start_secs)
+{
+ unsigned int paths_checked = 0;
+ struct timespec diff_time, start_time, end_time;
+ struct path *pp;
+ int i, rc;
+
+ get_monotonic_time(&start_time);
+
vector_foreach_slot(vecs->pathvec, pp, i) {
- if (pp->mpp || pp->is_checked)
+ if (pp->is_checked != CHECK_PATH_STARTED)
continue;
- pp->is_checked = true;
-
- rc = check_uninitialized_path(pp, ticks);
- if (rc == CHECK_PATH_STARTED)
+ if (pp->mpp)
+ rc = update_path(vecs, pp, start_secs);
+ else
rc = update_uninitialized_path(vecs, pp);
if (rc == CHECK_PATH_REMOVED)
i--;
- else if (rc == CHECK_PATH_CHECKED)
- (*num_paths_p)++;
+ else {
+ pp->is_checked = rc;
+ if (rc == CHECK_PATH_CHECKED)
+ (*num_paths_p)++;
+ }
if (++paths_checked % 128 == 0 &&
(lock_has_waiters(&vecs->lock) || waiting_clients())) {
get_monotonic_time(&end_time);
timespecsub(&end_time, &start_time, &diff_time);
if (diff_time.tv_sec > 0)
- return CHECKER_RUNNING;
+ return CHECKER_UPDATING_PATHS;
}
}
return CHECKER_FINISHED;
@@ -2948,10 +2935,14 @@ checkerloop (void *ap)
vector_foreach_slot(vecs->mpvec, mpp, i)
sync_mpp(vecs, mpp, ticks);
vector_foreach_slot(vecs->pathvec, pp, i)
- pp->is_checked = false;
- checker_state = CHECKER_RUNNING;
+ pp->is_checked = CHECK_PATH_UNCHECKED;
+ checker_state = CHECKER_CHECKING_PATHS;
}
- checker_state = check_paths(vecs, ticks, &num_paths);
+ if (checker_state == CHECKER_CHECKING_PATHS)
+ checker_state = check_paths(vecs, ticks);
+ if (checker_state == CHECKER_UPDATING_PATHS)
+ checker_state = update_paths(vecs, &num_paths,
+ start_time.tv_sec);
lock_cleanup_pop(vecs->lock);
if (checker_state != CHECKER_FINISHED) {
/* Yield to waiters */
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 13/15] multipathd: update priority once after updating all paths
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (11 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 12/15] multipathd: split check_paths " Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 15:06 ` Martin Wilck
2024-09-04 18:57 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 14/15] multipathd: remove pointless check Benjamin Marzinski
` (2 subsequent siblings)
15 siblings, 2 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Instead of updating the path priorities and possibly reloading the
multipath device when each path is updated, wait till all paths
have been updated, and then go through the multipath devices updating
the priorities once, reloading if necessary.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
libmultipath/structs.h | 9 +++
multipathd/main.c | 169 ++++++++++++++++++++++++++---------------
2 files changed, 118 insertions(+), 60 deletions(-)
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index af8e31e9..1f531d30 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -318,6 +318,7 @@ enum check_path_states {
CHECK_PATH_UNCHECKED,
CHECK_PATH_STARTED,
CHECK_PATH_CHECKED,
+ CHECK_PATH_NEW_UP,
CHECK_PATH_SKIPPED,
CHECK_PATH_REMOVED,
};
@@ -421,6 +422,13 @@ enum prflag_value {
PRFLAG_SET,
};
+enum prio_update_type {
+ PRIO_UPDATE_NONE,
+ PRIO_UPDATE_NORMAL,
+ PRIO_UPDATE_NEW_PATH,
+ PRIO_UPDATE_MARGINAL,
+};
+
struct multipath {
char wwid[WWID_SIZE];
char alias_old[WWID_SIZE];
@@ -464,6 +472,7 @@ struct multipath {
int queue_mode;
unsigned int sync_tick;
int synced_count;
+ enum prio_update_type prio_update;
uid_t uid;
gid_t gid;
mode_t mode;
diff --git a/multipathd/main.c b/multipathd/main.c
index 2bcc6066..1511f701 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1996,15 +1996,13 @@ mpvec_garbage_collector (struct vectors * vecs)
* best pathgroup, and this is the first path in the pathgroup to come back
* up, then switch to this pathgroup */
static int
-followover_should_failback(struct path * pp)
+do_followover_should_failback(struct path * pp)
{
struct pathgroup * pgp;
struct path *pp1;
int i;
- if (pp->mpp->pgfailback != -FAILBACK_FOLLOWOVER ||
- !pp->mpp->pg || !pp->pgindex ||
- pp->pgindex != pp->mpp->bestpg)
+ if (pp->pgindex != pp->mpp->bestpg)
return 0;
pgp = VECTOR_SLOT(pp->mpp->pg, pp->pgindex - 1);
@@ -2017,6 +2015,26 @@ followover_should_failback(struct path * pp)
return 1;
}
+static int
+followover_should_failback(struct multipath *mpp)
+{
+ struct path *pp;
+ struct pathgroup * pgp;
+ int i, j;
+
+ if (mpp->pgfailback != -FAILBACK_FOLLOWOVER || !mpp->pg || !mpp->bestpg)
+ return 0;
+
+ vector_foreach_slot (mpp->pg, pgp, i) {
+ vector_foreach_slot (pgp->paths, pp, j) {
+ if (pp->is_checked == CHECK_PATH_NEW_UP &&
+ do_followover_should_failback(pp))
+ return 1;
+ }
+ }
+ return 0;
+}
+
static void
missing_uev_wait_tick(struct vectors *vecs)
{
@@ -2132,41 +2150,53 @@ partial_retrigger_tick(vector pathvec)
}
}
-static int update_prio(struct path *pp, int force_refresh_all)
+static bool update_prio(struct multipath *mpp, bool refresh_all)
{
int oldpriority;
- struct path *pp1;
+ struct path *pp;
struct pathgroup * pgp;
- int i, j, changed = 0;
+ int i, j;
+ bool changed = false;
+ bool skipped_path = false;
struct config *conf;
- oldpriority = pp->priority;
- if (pp->state != PATH_DOWN) {
- conf = get_multipath_config();
- pthread_cleanup_push(put_multipath_config, conf);
- pathinfo(pp, conf, DI_PRIO);
- pthread_cleanup_pop(1);
+ vector_foreach_slot (mpp->pg, pgp, i) {
+ vector_foreach_slot (pgp->paths, pp, j) {
+ if (pp->state != PATH_UP && pp->state != PATH_GHOST)
+ continue;
+ if (!refresh_all &&
+ pp->is_checked != CHECK_PATH_CHECKED) {
+ skipped_path = true;
+ continue;
+ }
+ oldpriority = pp->priority;
+ conf = get_multipath_config();
+ pthread_cleanup_push(put_multipath_config, conf);
+ pathinfo(pp, conf, DI_PRIO);
+ pthread_cleanup_pop(1);
+ if (pp->priority != oldpriority)
+ changed = true;
+ }
}
-
- if (pp->priority != oldpriority)
- changed = 1;
- else if (!force_refresh_all)
- return 0;
-
- vector_foreach_slot (pp->mpp->pg, pgp, i) {
- vector_foreach_slot (pgp->paths, pp1, j) {
- if (pp1 == pp || pp1->state == PATH_DOWN)
+ if (!changed || !skipped_path)
+ return changed;
+ /*
+ * If a path changed priorities, refresh the priorities of any
+ * paths we skipped
+ */
+ vector_foreach_slot (mpp->pg, pgp, i) {
+ vector_foreach_slot (pgp->paths, pp, j) {
+ if (pp->state != PATH_UP && pp->state != PATH_GHOST)
+ continue;
+ if (pp->is_checked == CHECK_PATH_CHECKED)
continue;
- oldpriority = pp1->priority;
conf = get_multipath_config();
pthread_cleanup_push(put_multipath_config, conf);
- pathinfo(pp1, conf, DI_PRIO);
+ pathinfo(pp, conf, DI_PRIO);
pthread_cleanup_pop(1);
- if (pp1->priority != oldpriority)
- changed = 1;
}
}
- return changed;
+ return true;
}
static int reload_map(struct vectors *vecs, struct multipath *mpp,
@@ -2393,14 +2423,12 @@ static int
update_path_state (struct vectors * vecs, struct path * pp)
{
int newstate;
- int new_path_up = 0;
int chkr_new_path_up = 0;
int disable_reinstate = 0;
int oldchkrstate = pp->chkrstate;
unsigned int checkint, max_checkint;
struct config *conf;
- int marginal_pathgroups, marginal_changed = 0;
- bool need_reload;
+ int marginal_pathgroups;
conf = get_multipath_config();
checkint = conf->checkint;
@@ -2462,7 +2490,7 @@ update_path_state (struct vectors * vecs, struct path * pp)
}
if (!pp->marginal) {
pp->marginal = 1;
- marginal_changed = 1;
+ pp->mpp->prio_update = PRIO_UPDATE_MARGINAL;
}
} else {
if (pp->marginal || pp->state == PATH_DELAYED)
@@ -2470,7 +2498,7 @@ update_path_state (struct vectors * vecs, struct path * pp)
pp->dev);
if (marginal_pathgroups && pp->marginal) {
pp->marginal = 0;
- marginal_changed = 1;
+ pp->mpp->prio_update = PRIO_UPDATE_MARGINAL;
}
}
}
@@ -2537,7 +2565,8 @@ update_path_state (struct vectors * vecs, struct path * pp)
*/
if (!disable_reinstate)
reinstate_path(pp);
- new_path_up = 1;
+ if (pp->mpp->prio_update != PRIO_UPDATE_MARGINAL)
+ pp->mpp->prio_update = PRIO_UPDATE_NEW_PATH;
if (oldchkrstate != PATH_UP && oldchkrstate != PATH_GHOST)
chkr_new_path_up = 1;
@@ -2588,38 +2617,48 @@ update_path_state (struct vectors * vecs, struct path * pp)
LOG_MSG(2, pp);
}
}
-
+ if (pp->mpp->prio_update == PRIO_UPDATE_NONE &&
+ (newstate == PATH_UP || newstate == PATH_GHOST))
+ pp->mpp->prio_update = PRIO_UPDATE_NORMAL;
pp->state = newstate;
+ return chkr_new_path_up ? CHECK_PATH_NEW_UP : CHECK_PATH_CHECKED;
+}
- if (pp->mpp->wait_for_udev)
- return CHECK_PATH_CHECKED;
- /*
- * path prio refreshing
- */
- condlog(4, "path prio refresh");
-
- if (marginal_changed) {
- update_prio(pp, 1);
- reload_and_sync_map(pp->mpp, vecs);
- } else if (update_prio(pp, new_path_up) &&
- pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio &&
- pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
+static int
+update_mpp_prio(struct vectors *vecs, struct multipath *mpp)
+{
+ bool need_reload, changed;
+ enum prio_update_type prio_update = mpp->prio_update;
+ mpp->prio_update = PRIO_UPDATE_NONE;
+
+ if (mpp->wait_for_udev || prio_update == PRIO_UPDATE_NONE)
+ return 0;
+ condlog(4, "prio refresh");
+
+ changed = update_prio(mpp, prio_update != PRIO_UPDATE_NORMAL);
+ if (prio_update == PRIO_UPDATE_MARGINAL)
+ return reload_and_sync_map(mpp, vecs);
+ if (changed && mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio &&
+ mpp->pgfailback == -FAILBACK_IMMEDIATE) {
condlog(2, "%s: path priorities changed. reloading",
- pp->mpp->alias);
- reload_and_sync_map(pp->mpp, vecs);
- } else if (need_switch_pathgroup(pp->mpp, &need_reload)) {
- if (pp->mpp->pgfailback > 0 &&
- (new_path_up || pp->mpp->failback_tick <= 0))
- pp->mpp->failback_tick = pp->mpp->pgfailback + 1;
- else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE ||
- (chkr_new_path_up && followover_should_failback(pp))) {
+ mpp->alias);
+ return reload_and_sync_map(mpp, vecs);
+ }
+ if (need_switch_pathgroup(mpp, &need_reload)) {
+ if (mpp->pgfailback > 0 &&
+ (prio_update == PRIO_UPDATE_NEW_PATH ||
+ mpp->failback_tick <= 0))
+ mpp->failback_tick = mpp->pgfailback + 1;
+ else if (mpp->pgfailback == -FAILBACK_IMMEDIATE ||
+ (prio_update == PRIO_UPDATE_NEW_PATH &&
+ followover_should_failback(mpp))) {
if (need_reload)
- reload_and_sync_map(pp->mpp, vecs);
+ return reload_and_sync_map(mpp, vecs);
else
- switch_pathgroup(pp->mpp);
+ switch_pathgroup(mpp);
}
}
- return CHECK_PATH_CHECKED;
+ return 0;
}
static int
@@ -2859,7 +2898,7 @@ update_paths(struct vectors *vecs, int *num_paths_p, time_t start_secs)
i--;
else {
pp->is_checked = rc;
- if (rc == CHECK_PATH_CHECKED)
+ if (rc == CHECK_PATH_CHECKED || CHECK_PATH_NEW_UP)
(*num_paths_p)++;
}
if (++paths_checked % 128 == 0 &&
@@ -2932,8 +2971,10 @@ checkerloop (void *ap)
vector_foreach_slot(vecs->mpvec, mpp, i)
mpp->synced_count = 0;
if (checker_state == CHECKER_STARTING) {
- vector_foreach_slot(vecs->mpvec, mpp, i)
+ vector_foreach_slot(vecs->mpvec, mpp, i) {
sync_mpp(vecs, mpp, ticks);
+ mpp->prio_update = PRIO_UPDATE_NONE;
+ }
vector_foreach_slot(vecs->pathvec, pp, i)
pp->is_checked = CHECK_PATH_UNCHECKED;
checker_state = CHECKER_CHECKING_PATHS;
@@ -2943,6 +2984,14 @@ checkerloop (void *ap)
if (checker_state == CHECKER_UPDATING_PATHS)
checker_state = update_paths(vecs, &num_paths,
start_time.tv_sec);
+ if (checker_state == CHECKER_FINISHED) {
+ vector_foreach_slot(vecs->mpvec, mpp, i) {
+ if (update_mpp_prio(vecs, mpp) == 2) {
+ /* multipath device deleted */
+ i--;
+ }
+ }
+ }
lock_cleanup_pop(vecs->lock);
if (checker_state != CHECKER_FINISHED) {
/* Yield to waiters */
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 14/15] multipathd: remove pointless check
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (12 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 13/15] multipathd: update priority once after updating all paths Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 16:07 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 15/15] multipathd: fix deferred_failback_tick for reload removes Benjamin Marzinski
2024-09-04 20:07 ` [PATCH 00/15] Yet Another path checker refactor Martin Wilck
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
Since we just returned if newstate wasn't PATH_UP or PATH_GHOST, it
must obviously be PATH_UP or PATH_GHOST at this point in the code. We
don't even wrap all the code for dealing with a path that just came up
in this if-block, It's only the persistent resrvation code.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index 1511f701..f9b0ebaf 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2545,19 +2545,19 @@ update_path_state (struct vectors * vecs, struct path * pp)
return CHECK_PATH_CHECKED;
}
- if (newstate == PATH_UP || newstate == PATH_GHOST) {
- if (pp->mpp->prflag != PRFLAG_UNSET) {
- int prflag = pp->mpp->prflag;
- /*
- * Check Persistent Reservation.
- */
- condlog(2, "%s: checking persistent "
- "reservation registration", pp->dev);
- mpath_pr_event_handle(pp);
- if (pp->mpp->prflag == PRFLAG_SET &&
- prflag != PRFLAG_SET)
- pr_register_active_paths(pp->mpp);
- }
+ /* newstate == PATH_UP || newstate == PATH_GHOST */
+
+ if (pp->mpp->prflag != PRFLAG_UNSET) {
+ int prflag = pp->mpp->prflag;
+ /*
+ * Check Persistent Reservation.
+ */
+ condlog(2, "%s: checking persistent "
+ "reservation registration", pp->dev);
+ mpath_pr_event_handle(pp);
+ if (pp->mpp->prflag == PRFLAG_SET &&
+ prflag != PRFLAG_SET)
+ pr_register_active_paths(pp->mpp);
}
/*
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 15/15] multipathd: fix deferred_failback_tick for reload removes
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (13 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 14/15] multipathd: remove pointless check Benjamin Marzinski
@ 2024-08-28 22:17 ` Benjamin Marzinski
2024-09-04 16:10 ` Martin Wilck
2024-09-04 20:07 ` [PATCH 00/15] Yet Another path checker refactor Martin Wilck
15 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-08-28 22:17 UTC (permalink / raw)
To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck
If reload_and_sync_map() removes the multipath device,
deferred_failback_tick() needs to decrement the counter so that it
doesn't skip the following device.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
multipathd/main.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/multipathd/main.c b/multipathd/main.c
index f9b0ebaf..9e930b53 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2098,9 +2098,12 @@ deferred_failback_tick (struct vectors *vecs)
if (!mpp->failback_tick &&
need_switch_pathgroup(mpp, &need_reload)) {
- if (need_reload)
- reload_and_sync_map(mpp, vecs);
- else
+ if (need_reload) {
+ if (reload_and_sync_map(mpp, vecs) == 2) {
+ /* multipath device removed */
+ i--;
+ }
+ } else
switch_pathgroup(mpp);
}
}
--
2.45.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH 03/15] libmultipath: split out the code to wait for pending checkers
2024-08-28 22:17 ` [PATCH 03/15] libmultipath: split out the code to wait for pending checkers Benjamin Marzinski
@ 2024-09-04 15:01 ` Martin Wilck
2024-09-04 18:16 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 15:01 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> This patch adds a new optional symbol for the dynamic path checker
> libraries, libcheck_pending. This is currently unused, but can be
> called
> on pending checkers to check if they've completed and return their
> value.
> The "tur" and "directio" checkers are the only ones which can return
> PATH_PENDING. They now implement libcheck_pending() as a wrapper
> around
> the code that libcheck_check uses to wait for pending paths, which
> has
> been broken out into its own function.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/checkers.c | 4 +-
> libmultipath/checkers.h | 1 +
> libmultipath/checkers/directio.c | 86 ++++++++++++++++++++++--------
> --
> libmultipath/checkers/tur.c | 65 ++++++++++++++++--------
> 4 files changed, 109 insertions(+), 47 deletions(-)
>
> diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
> index c4918d28..298aec78 100644
> --- a/libmultipath/checkers.c
> +++ b/libmultipath/checkers.c
> @@ -26,6 +26,7 @@ struct checker_class {
> void (*free)(struct checker *); /* to free the context
> */
> void (*reset)(void); /* to reset the global
> variables */
> void *(*thread)(void *); /* async thread entry
> point */
> + int (*pending)(struct checker *); /* to recheck pending
> paths */
> const char **msgtable;
> short msgtable_size;
> };
> @@ -180,7 +181,8 @@ static struct checker_class
> *add_checker_class(const char *name)
> c->mp_init = (int (*)(struct checker *)) dlsym(c->handle,
> "libcheck_mp_init");
> c->reset = (void (*)(void)) dlsym(c->handle,
> "libcheck_reset");
> c->thread = (void *(*)(void*)) dlsym(c->handle,
> "libcheck_thread");
> - /* These 3 functions can be NULL. call dlerror() to clear
> out any
> + c->pending = (int (*)(struct checker *)) dlsym(c->handle,
> "libcheck_pending");
> + /* These 4 functions can be NULL. call dlerror() to clear
> out any
> * error string */
> dlerror();
>
> diff --git a/libmultipath/checkers.h b/libmultipath/checkers.h
> index fb1160af..b2342a1b 100644
> --- a/libmultipath/checkers.h
> +++ b/libmultipath/checkers.h
> @@ -190,6 +190,7 @@ void libcheck_free(struct checker *);
> void *libcheck_thread(struct checker_context *ctx);
> void libcheck_reset(void);
> int libcheck_mp_init(struct checker *);
> +int libcheck_pending(struct checker *c);
>
>
> /*
> diff --git a/libmultipath/checkers/directio.c
> b/libmultipath/checkers/directio.c
> index 8e87878b..3f88b40d 100644
> --- a/libmultipath/checkers/directio.c
> +++ b/libmultipath/checkers/directio.c
> @@ -288,14 +288,36 @@ get_events(struct aio_group *aio_grp, struct
> timespec *timeout)
> return got_events;
> }
>
> +static void
> +check_pending(struct directio_context *ct, struct timespec endtime)
> +{
> + int r;
> + struct timespec currtime, timeout;
> +
> + while(1) {
> + get_monotonic_time(&currtime);
> + timespecsub(&endtime, &currtime, &timeout);
> + if (timeout.tv_sec < 0)
> + timeout.tv_sec = timeout.tv_nsec = 0;
> +
> + r = get_events(ct->aio_grp, &timeout);
> +
> + if (ct->req->state != PATH_PENDING) {
> + ct->running = 0;
> + return;
> + } else if (r == 0 ||
> + (timeout.tv_sec == 0 && timeout.tv_nsec
> == 0))
> + return;
> + }
> +}
> +
> static int
> check_state(int fd, struct directio_context *ct, int sync, int
> timeout_secs)
> {
> struct timespec timeout = { .tv_nsec = 1000 };
> struct stat sb;
> int rc;
> - long r;
> - struct timespec currtime, endtime;
> + struct timespec endtime;
>
> if (fstat(fd, &sb) == 0) {
> LOG(4, "called for %x", (unsigned) sb.st_rdev);
> @@ -330,21 +352,11 @@ check_state(int fd, struct directio_context
> *ct, int sync, int timeout_secs)
> endtime.tv_sec += timeout.tv_sec;
> endtime.tv_nsec += timeout.tv_nsec;
> normalize_timespec(&endtime);
> - while(1) {
> - r = get_events(ct->aio_grp, &timeout);
>
> - if (ct->req->state != PATH_PENDING) {
> - ct->running = 0;
> - return ct->req->state;
> - } else if (r == 0 ||
> - (timeout.tv_sec == 0 && timeout.tv_nsec
> == 0))
> - break;
> + check_pending(ct, endtime);
> + if (ct->req->state != PATH_PENDING)
> + return ct->req->state;
>
> - get_monotonic_time(&currtime);
> - timespecsub(&endtime, &currtime, &timeout);
> - if (timeout.tv_sec < 0)
> - timeout.tv_sec = timeout.tv_nsec = 0;
> - }
> if (ct->running > timeout_secs || sync) {
> struct io_event event;
>
> @@ -360,17 +372,9 @@ check_state(int fd, struct directio_context *ct,
> int sync, int timeout_secs)
> return rc;
> }
>
> -int libcheck_check (struct checker * c)
> +static void set_msgid(struct checker *c, int state)
> {
> - int ret;
> - struct directio_context * ct = (struct directio_context *)c-
> >context;
> -
> - if (!ct)
> - return PATH_UNCHECKED;
> -
> - ret = check_state(c->fd, ct, checker_is_sync(c), c-
> >timeout);
> -
> - switch (ret)
> + switch (state)
> {
> case PATH_UNCHECKED:
> c->msgid = MSG_DIRECTIO_UNKNOWN;
> @@ -387,5 +391,37 @@ int libcheck_check (struct checker * c)
> default:
> break;
> }
> +}
> +
> +int libcheck_pending(struct checker *c)
> +{
> + struct timespec endtime;
> + struct directio_context *ct = (struct directio_context *)c-
> >context;
> +
> + /* The if path checker isn't running, just return the
> exiting value. */
> + if (!ct || !ct->running)
> + return c->path_state;
> +
> + if (ct->req->state == PATH_PENDING) {
> + get_monotonic_time(&endtime);
> + check_pending(ct, endtime);
> + } else
> + ct->running = 0;
> + set_msgid(c, ct->req->state);
> +
> + return ct->req->state;
> +}
> +
> +int libcheck_check (struct checker * c)
> +{
> + int ret;
> + struct directio_context * ct = (struct directio_context *)c-
> >context;
> +
> + if (!ct)
> + return PATH_UNCHECKED;
> +
> + ret = check_state(c->fd, ct, checker_is_sync(c), c-
> >timeout);
> + set_msgid(c, ret);
> +
> return ret;
> }
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index a2905af5..95af5214 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -323,6 +323,49 @@ static int tur_check_async_timeout(struct
> checker *c)
> return (now.tv_sec > ct->time);
> }
>
> +int check_pending(struct checker *c, struct timespec endtime)
> +{
> + struct tur_checker_context *ct = c->context;
> + int r, tur_status = PATH_PENDING;
> +
> + pthread_mutex_lock(&ct->lock);
> +
> + for (r = 0;
> + r == 0 && ct->state == PATH_PENDING &&
> + ct->msgid == MSG_TUR_RUNNING;
> + r = pthread_cond_timedwait(&ct->active, &ct->lock,
> &endtime));
> +
> + if (!r) {
> + tur_status = ct->state;
> + c->msgid = ct->msgid;
> + }
> + pthread_mutex_unlock(&ct->lock);
> + if (tur_status == PATH_PENDING && c->msgid ==
> MSG_TUR_RUNNING) {
> + condlog(4, "%d:%d : tur checker still running",
> + major(ct->devt), minor(ct->devt));
> + } else {
> + int running = uatomic_xchg(&ct->running, 0);
> + if (running)
> + pthread_cancel(ct->thread);
> + ct->thread = 0;
> + }
> +
> + return tur_status;
> +}
> +
> +int libcheck_pending(struct checker *c)
> +{
> + struct timespec endtime;
> + struct tur_checker_context *ct = c->context;
> +
> + /* The if path checker isn't running, just return the
> exiting value. */
> + if (!ct || !ct->thread)
> + return c->path_state;
> +
> + get_monotonic_time(&endtime);
> + return check_pending(c, endtime);
Does this work?
https://pubs.opengroup.org/onlinepubs/009695299/functions/pthread_cond_timedwait.html
says that "an error is returned [...] if the absolute time specified by
abstime has already been passed at the time of the call".
Which would always be the case if you pass a timestamp that you
fetched previously unmodified, meaning that you'd always get ETIMEDOUT.
Or am I misreading the docs?
Martin
> +}
> +
> int libcheck_check(struct checker * c)
> {
> struct tur_checker_context *ct = c->context;
> @@ -437,27 +480,7 @@ int libcheck_check(struct checker * c)
> return tur_check(c->fd, c->timeout, &c-
> >msgid);
> }
> tur_timeout(&tsp);
> - pthread_mutex_lock(&ct->lock);
> -
> - for (r = 0;
> - r == 0 && ct->state == PATH_PENDING &&
> - ct->msgid == MSG_TUR_RUNNING;
> - r = pthread_cond_timedwait(&ct->active, &ct-
> >lock, &tsp));
> -
> - if (!r) {
> - tur_status = ct->state;
> - c->msgid = ct->msgid;
> - }
> - pthread_mutex_unlock(&ct->lock);
> - if (tur_status == PATH_PENDING && c->msgid ==
> MSG_TUR_RUNNING) {
> - condlog(4, "%d:%d : tur checker still
> running",
> - major(ct->devt), minor(ct->devt));
> - } else {
> - int running = uatomic_xchg(&ct->running, 0);
> - if (running)
> - pthread_cancel(ct->thread);
> - ct->thread = 0;
> - }
> + tur_status = check_pending(c, tsp);
> }
>
> return tur_status;
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 13/15] multipathd: update priority once after updating all paths
2024-08-28 22:17 ` [PATCH 13/15] multipathd: update priority once after updating all paths Benjamin Marzinski
@ 2024-09-04 15:06 ` Martin Wilck
2024-09-04 18:54 ` Benjamin Marzinski
2024-09-04 18:57 ` Martin Wilck
1 sibling, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 15:06 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> Instead of updating the path priorities and possibly reloading the
> multipath device when each path is updated, wait till all paths
> have been updated, and then go through the multipath devices updating
> the priorities once, reloading if necessary.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/structs.h | 9 +++
> multipathd/main.c | 169 ++++++++++++++++++++++++++-------------
> --
> 2 files changed, 118 insertions(+), 60 deletions(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 2bcc6066..1511f701 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
>
> @@ -2859,7 +2898,7 @@ update_paths(struct vectors *vecs, int
> *num_paths_p, time_t start_secs)
> i--;
> else {
> pp->is_checked = rc;
> - if (rc == CHECK_PATH_CHECKED)
> + if (rc == CHECK_PATH_CHECKED ||
> CHECK_PATH_NEW_UP)
This causes compilation errors. I suppose you meant (rc ==
CHECK_PATH_CHECKED || rc == CHECK_PATH_NEW_UP)
(fixed this up for now on my "tip" branch).
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 06/15] libmultipath: split get_state into two functions
2024-08-28 22:17 ` [PATCH 06/15] libmultipath: split get_state into two functions Benjamin Marzinski
@ 2024-09-04 15:14 ` Martin Wilck
2024-09-04 18:43 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 15:14 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> get_state() is now split into start_checker(), which runs the checker
> but doesn't wait for async checkers, and get_state(), which returns
> the
> state from the checker, after waiting for async checkers if
> necessary.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/discovery.c | 22 ++++++++++++++++------
> libmultipath/discovery.h | 4 +++-
> libmultipath/propsel.c | 1 +
> multipathd/main.c | 4 +++-
> 4 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 5648be60..e0f46ff2 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1974,30 +1974,29 @@ cciss_ioctl_pathinfo(struct path *pp)
> }
>
> int
> -get_state (struct path * pp, struct config *conf, int daemon, int
> oldstate)
> +start_checker (struct path * pp, struct config *conf, int daemon,
> int oldstate)
> {
> struct checker * c = &pp->checker;
> - int state;
>
> if (!checker_selected(c)) {
> if (daemon) {
> if (pathinfo(pp, conf, DI_SYSFS) !=
> PATHINFO_OK) {
> condlog(3, "%s: couldn't get sysfs
> pathinfo",
> pp->dev);
> - return PATH_UNCHECKED;
> + return -1;
> }
> }
> select_detect_checker(conf, pp);
> select_checker(conf, pp);
> if (!checker_selected(c)) {
> condlog(3, "%s: No checker selected", pp-
> >dev);
> - return PATH_UNCHECKED;
> + return -1;
> }
> checker_set_fd(c, pp->fd);
> if (checker_init(c, pp->mpp?&pp->mpp-
> >mpcontext:NULL)) {
> checker_clear(c);
> condlog(3, "%s: checker init failed", pp-
> >dev);
> - return PATH_UNCHECKED;
> + return -1;
> }
> }
> if (pp->mpp && !c->mpcontext)
> @@ -2008,6 +2007,15 @@ get_state (struct path * pp, struct config
> *conf, int daemon, int oldstate)
> else
> checker_set_sync(c);
> checker_check(c, oldstate);
> + return 0;
> +}
> +
> +int
> +get_state (struct path * pp)
> +{
> + struct checker * c = &pp->checker;
> + int state;
> +
> state = checker_get_state(c);
> condlog(3, "%s: %s state = %s", pp->dev,
> checker_name(c), checker_state_name(state));
> @@ -2455,7 +2463,9 @@ int pathinfo(struct path *pp, struct config
> *conf, int mask)
>
> if (mask & DI_CHECKER) {
> if (path_state == PATH_UP) {
> - int newstate = get_state(pp, conf, 0,
> path_state);
> + int newstate = PATH_UNCHECKED;
> + if (start_checker(pp, conf, 0, path_state)
> == 0)
> + newstate = get_state(pp);
> if (newstate != PATH_PENDING ||
> pp->state == PATH_UNCHECKED ||
> pp->state == PATH_WILD)
> diff --git a/libmultipath/discovery.h b/libmultipath/discovery.h
> index c93abf1c..f3e0c618 100644
> --- a/libmultipath/discovery.h
> +++ b/libmultipath/discovery.h
> @@ -34,7 +34,9 @@ int path_discovery (vector pathvec, int flag);
> int path_get_tpgs(struct path *pp); /* This function never returns
> TPGS_UNDEF */
> int do_tur (char *);
> int path_offline (struct path *);
> -int get_state (struct path * pp, struct config * conf, int daemon,
> int state);
> +int start_checker(struct path * pp, struct config * conf, int
> daemon,
> + int state);
> +int get_state(struct path * pp);
> int get_vpd_sgio (int fd, int pg, int vend_id, char * str, int
> maxlen);
> int pathinfo (struct path * pp, struct config * conf, int mask);
> int alloc_path_with_pathinfo (struct config *conf, struct
> udev_device *udevice,
> diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
> index a3fce203..ad771d35 100644
> --- a/libmultipath/propsel.c
> +++ b/libmultipath/propsel.c
> @@ -700,6 +700,7 @@ out:
> condlog(3, "%s: path_checker = %s %s", pp->dev,
> checker_name(c), origin);
> c->timeout = pp->checker_timeout;
> + c->path_state = PATH_UNCHECKED;
Should this be in 01/15 already?
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 07/15] libmultipath: change path_offline to path_sysfs_state
2024-08-28 22:17 ` [PATCH 07/15] libmultipath: change path_offline to path_sysfs_state Benjamin Marzinski
@ 2024-09-04 15:31 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 15:31 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> Instead of pp->offline being a binary value, change it to show the
> actual result of looking at the sysfs state, and rename it to
> pp->sysfs_state. Also change the function name from path_offline() to
> path_sysfs_state(). Adapt the tests for pp->offline. This should not
> change how multipath currently works. pp->sysfs_state will be used in
> future patches.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/discovery.c | 65 ++++++++++++++++-------------
> --
> libmultipath/discovery.h | 2 +-
> libmultipath/libmultipath.version | 3 +-
> libmultipath/print.c | 2 +-
> libmultipath/structs.h | 2 +-
> multipathd/main.c | 4 +-
> 6 files changed, 41 insertions(+), 37 deletions(-)
>
> diff --git a/libmultipath/libmultipath.version
> b/libmultipath/libmultipath.version
> index 21d48da6..6439d3a7 100644
> --- a/libmultipath/libmultipath.version
> +++ b/libmultipath/libmultipath.version
> @@ -146,7 +146,7 @@ global:
> path_discovery;
> path_get_tpgs;
> pathinfo;
> - path_offline;
> + path_sysfs_state;
> print_all_paths;
> print_foreign_topology;
> print_multipath_topology__;
> @@ -160,6 +160,7 @@ global:
> remove_wwid;
> replace_wwids;
> reset_checker_classes;
> + start_checker;
> select_all_tg_pt;
> select_action;
> select_find_multipaths_timeout;
Nit: This hunk belongs in 06/15. Otherwise, LGTM.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 10/15] multipathd: split check_path into two functions
2024-08-28 22:17 ` [PATCH 10/15] multipathd: split check_path into two functions Benjamin Marzinski
@ 2024-09-04 15:38 ` Martin Wilck
2024-09-04 18:51 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 15:38 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> Split out the code that updates a path's state and sets up the next
> check time into its own function, update_path().
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> multipathd/main.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 94d4e421..300f8247 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2390,6 +2390,7 @@ sync_mpp(struct vectors * vecs, struct
> multipath *mpp, unsigned int ticks)
> }
>
> enum check_path_return {
> + CHECK_PATH_STARTED,
> CHECK_PATH_CHECKED,
> CHECK_PATH_SKIPPED,
> CHECK_PATH_REMOVED,
> @@ -2629,13 +2630,10 @@ update_path_state (struct vectors * vecs,
> struct path * pp)
> }
>
> static int
> -check_path (struct vectors * vecs, struct path * pp, unsigned int
> ticks,
> - time_t start_secs)
> +check_path (struct path * pp, unsigned int ticks)
check_path() used to be one of our core functions, and you now re-
introduce it with quite different semantics.
Perhaps choose a new name?
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 14/15] multipathd: remove pointless check
2024-08-28 22:17 ` [PATCH 14/15] multipathd: remove pointless check Benjamin Marzinski
@ 2024-09-04 16:07 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 16:07 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> Since we just returned if newstate wasn't PATH_UP or PATH_GHOST, it
> must obviously be PATH_UP or PATH_GHOST at this point in the code. We
> don't even wrap all the code for dealing with a path that just came
> up
> in this if-block, It's only the persistent resrvation code.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 15/15] multipathd: fix deferred_failback_tick for reload removes
2024-08-28 22:17 ` [PATCH 15/15] multipathd: fix deferred_failback_tick for reload removes Benjamin Marzinski
@ 2024-09-04 16:10 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 16:10 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> If reload_and_sync_map() removes the multipath device,
> deferred_failback_tick() needs to decrement the counter so that it
> doesn't skip the following device.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
One day we should add symbolic return values for reload_and_sync_map().
Other than that:
Reviewed-by: Martin Wilck <mwilck@suse.com>
> ---
> multipathd/main.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/multipathd/main.c b/multipathd/main.c
> index f9b0ebaf..9e930b53 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2098,9 +2098,12 @@ deferred_failback_tick (struct vectors *vecs)
>
> if (!mpp->failback_tick &&
> need_switch_pathgroup(mpp,
> &need_reload)) {
> - if (need_reload)
> - reload_and_sync_map(mpp,
> vecs);
> - else
> + if (need_reload) {
> + if (reload_and_sync_map(mpp,
> vecs) == 2) {
> + /* multipath device
> removed */
> + i--;
> + }
> + } else
> switch_pathgroup(mpp);
> }
> }
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 05/15] multipath-tools tests: fix up directio tests
2024-08-28 22:17 ` [PATCH 05/15] multipath-tools tests: fix up directio tests Benjamin Marzinski
@ 2024-09-04 16:12 ` Martin Wilck
2024-09-04 18:29 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 16:12 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> Make the directio tests work with libcheck_pending() being separate
> from
> libcheck_check
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
There's still something wrong with this test. I'm seeing lots of CI
errors with your complete series applied.
https://github.com/openSUSE/multipath-tools/actions?query=branch%3Atip
https://github.com/openSUSE/multipath-tools/actions/runs/10704501258/job/29677643779
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 01/15] libmultipath: store checker_check() result in checker struct
2024-08-28 22:17 ` [PATCH 01/15] libmultipath: store checker_check() result in checker struct Benjamin Marzinski
@ 2024-09-04 16:13 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 16:13 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> checker_check() is now a void function that stores the path state in
> the checker struct. It can be retrieved later using
> checker_get_state().
> Right now, this is called immediately after checker_check(), but in
> the future, it can be deferred to another time.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 02/15] libmultipath: add missing checker function prototypes
2024-08-28 22:17 ` [PATCH 02/15] libmultipath: add missing checker function prototypes Benjamin Marzinski
@ 2024-09-04 16:13 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 16:13 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 03/15] libmultipath: split out the code to wait for pending checkers
2024-09-04 15:01 ` Martin Wilck
@ 2024-09-04 18:16 ` Benjamin Marzinski
2024-09-04 19:48 ` Martin Wilck
0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 18:16 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 05:01:39PM +0200, Martin Wilck wrote:
> On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
<snip>
> > diff --git a/libmultipath/checkers/tur.c
> > b/libmultipath/checkers/tur.c
> > index a2905af5..95af5214 100644
> > --- a/libmultipath/checkers/tur.c
> > +++ b/libmultipath/checkers/tur.c
> > @@ -323,6 +323,49 @@ static int tur_check_async_timeout(struct
> > checker *c)
> > return (now.tv_sec > ct->time);
> > }
> >
> > +int check_pending(struct checker *c, struct timespec endtime)
> > +{
> > + struct tur_checker_context *ct = c->context;
> > + int r, tur_status = PATH_PENDING;
> > +
> > + pthread_mutex_lock(&ct->lock);
> > +
> > + for (r = 0;
> > + r == 0 && ct->state == PATH_PENDING &&
> > + ct->msgid == MSG_TUR_RUNNING;
> > + r = pthread_cond_timedwait(&ct->active, &ct->lock,
> > &endtime));
> > +
> > + if (!r) {
> > + tur_status = ct->state;
> > + c->msgid = ct->msgid;
> > + }
> > + pthread_mutex_unlock(&ct->lock);
> > + if (tur_status == PATH_PENDING && c->msgid ==
> > MSG_TUR_RUNNING) {
> > + condlog(4, "%d:%d : tur checker still running",
> > + major(ct->devt), minor(ct->devt));
> > + } else {
> > + int running = uatomic_xchg(&ct->running, 0);
> > + if (running)
> > + pthread_cancel(ct->thread);
> > + ct->thread = 0;
> > + }
> > +
> > + return tur_status;
> > +}
> > +
> > +int libcheck_pending(struct checker *c)
> > +{
> > + struct timespec endtime;
> > + struct tur_checker_context *ct = c->context;
> > +
> > + /* The if path checker isn't running, just return the
> > exiting value. */
> > + if (!ct || !ct->thread)
> > + return c->path_state;
> > +
> > + get_monotonic_time(&endtime);
> > + return check_pending(c, endtime);
>
> Does this work?
>
> https://pubs.opengroup.org/onlinepubs/009695299/functions/pthread_cond_timedwait.html
> says that "an error is returned [...] if the absolute time specified by
> abstime has already been passed at the time of the call".
>
> Which would always be the case if you pass a timestamp that you
> fetched previously unmodified, meaning that you'd always get ETIMEDOUT.
> Or am I misreading the docs?
No. You are reading the docs right. I'm just don't understand why
that's a problem. When you start the checker, you set a timeout that
you want to wait for, to give the checker a chance to complete in this
checker loop. Once that timeout has passed, if the thread is still
running, then libcheck_pending should return PATH_PENDING.
pthread_cond_timedwait() doesn't get called unless the thread is
running. So I would argue that pthread_cond_timedwait() always failing
once that timeout has passed is the correct thing to do.
Or are you arguing for having check_pending() manually check if the
timeout has passed, and skipping the call to pthread_cond_timedwait() in
that case?
With the directio checker, we need to call get_events() even once our
timeout has passed, since we need to handle any async IOs that have
completed since the last call to check_pending(). But in there's no need
to call pthread_cond_timedwait() for the tur checker, so we could skip
it if we saw that the timeout had already passed. I would argue that
this is just duplicating the check from pthread_cond_timedwait() in
check_pending() for no real benfit though.
Or am I missing something here?
-Ben
> Martin
>
>
>
>
>
>
>
> > +}
> > +
> > int libcheck_check(struct checker * c)
> > {
> > struct tur_checker_context *ct = c->context;
> > @@ -437,27 +480,7 @@ int libcheck_check(struct checker * c)
> > return tur_check(c->fd, c->timeout, &c-
> > >msgid);
> > }
> > tur_timeout(&tsp);
> > - pthread_mutex_lock(&ct->lock);
> > -
> > - for (r = 0;
> > - r == 0 && ct->state == PATH_PENDING &&
> > - ct->msgid == MSG_TUR_RUNNING;
> > - r = pthread_cond_timedwait(&ct->active, &ct-
> > >lock, &tsp));
> > -
> > - if (!r) {
> > - tur_status = ct->state;
> > - c->msgid = ct->msgid;
> > - }
> > - pthread_mutex_unlock(&ct->lock);
> > - if (tur_status == PATH_PENDING && c->msgid ==
> > MSG_TUR_RUNNING) {
> > - condlog(4, "%d:%d : tur checker still
> > running",
> > - major(ct->devt), minor(ct->devt));
> > - } else {
> > - int running = uatomic_xchg(&ct->running, 0);
> > - if (running)
> > - pthread_cancel(ct->thread);
> > - ct->thread = 0;
> > - }
> > + tur_status = check_pending(c, tsp);
> > }
> >
> > return tur_status;
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 05/15] multipath-tools tests: fix up directio tests
2024-09-04 16:12 ` Martin Wilck
@ 2024-09-04 18:29 ` Benjamin Marzinski
2024-09-04 19:36 ` Martin Wilck
0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 18:29 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 06:12:37PM +0200, Martin Wilck wrote:
> On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > Make the directio tests work with libcheck_pending() being separate
> > from
> > libcheck_check
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> There's still something wrong with this test. I'm seeing lots of CI
> errors with your complete series applied.
>
> https://github.com/openSUSE/multipath-tools/actions?query=branch%3Atip
> https://github.com/openSUSE/multipath-tools/actions/runs/10704501258/job/29677643779
It looks like your "tip" brach is missing:
[PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls
-Ben
>
> Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 06/15] libmultipath: split get_state into two functions
2024-09-04 15:14 ` Martin Wilck
@ 2024-09-04 18:43 ` Benjamin Marzinski
0 siblings, 0 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 18:43 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 05:14:32PM +0200, Martin Wilck wrote:
> On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > get_state() is now split into start_checker(), which runs the checker
> > but doesn't wait for async checkers, and get_state(), which returns
> > the
> > state from the checker, after waiting for async checkers if
> > necessary.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > libmultipath/discovery.c | 22 ++++++++++++++++------
> > libmultipath/discovery.h | 4 +++-
> > libmultipath/propsel.c | 1 +
> > multipathd/main.c | 4 +++-
> > 4 files changed, 23 insertions(+), 8 deletions(-)
> >
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 5648be60..e0f46ff2 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -1974,30 +1974,29 @@ cciss_ioctl_pathinfo(struct path *pp)
> > }
> >
> > int
> > -get_state (struct path * pp, struct config *conf, int daemon, int
> > oldstate)
> > +start_checker (struct path * pp, struct config *conf, int daemon,
> > int oldstate)
> > {
> > struct checker * c = &pp->checker;
> > - int state;
> >
> > if (!checker_selected(c)) {
> > if (daemon) {
> > if (pathinfo(pp, conf, DI_SYSFS) !=
> > PATHINFO_OK) {
> > condlog(3, "%s: couldn't get sysfs
> > pathinfo",
> > pp->dev);
> > - return PATH_UNCHECKED;
> > + return -1;
> > }
> > }
> > select_detect_checker(conf, pp);
> > select_checker(conf, pp);
> > if (!checker_selected(c)) {
> > condlog(3, "%s: No checker selected", pp-
> > >dev);
> > - return PATH_UNCHECKED;
> > + return -1;
> > }
> > checker_set_fd(c, pp->fd);
> > if (checker_init(c, pp->mpp?&pp->mpp-
> > >mpcontext:NULL)) {
> > checker_clear(c);
> > condlog(3, "%s: checker init failed", pp-
> > >dev);
> > - return PATH_UNCHECKED;
> > + return -1;
> > }
> > }
> > if (pp->mpp && !c->mpcontext)
> > @@ -2008,6 +2007,15 @@ get_state (struct path * pp, struct config
> > *conf, int daemon, int oldstate)
> > else
> > checker_set_sync(c);
> > checker_check(c, oldstate);
> > + return 0;
> > +}
> > +
> > +int
> > +get_state (struct path * pp)
> > +{
> > + struct checker * c = &pp->checker;
> > + int state;
> > +
> > state = checker_get_state(c);
> > condlog(3, "%s: %s state = %s", pp->dev,
> > checker_name(c), checker_state_name(state));
> > @@ -2455,7 +2463,9 @@ int pathinfo(struct path *pp, struct config
> > *conf, int mask)
> >
> > if (mask & DI_CHECKER) {
> > if (path_state == PATH_UP) {
> > - int newstate = get_state(pp, conf, 0,
> > path_state);
> > + int newstate = PATH_UNCHECKED;
> > + if (start_checker(pp, conf, 0, path_state)
> > == 0)
> > + newstate = get_state(pp);
> > if (newstate != PATH_PENDING ||
> > pp->state == PATH_UNCHECKED ||
> > pp->state == PATH_WILD)
> > diff --git a/libmultipath/discovery.h b/libmultipath/discovery.h
> > index c93abf1c..f3e0c618 100644
> > --- a/libmultipath/discovery.h
> > +++ b/libmultipath/discovery.h
> > @@ -34,7 +34,9 @@ int path_discovery (vector pathvec, int flag);
> > int path_get_tpgs(struct path *pp); /* This function never returns
> > TPGS_UNDEF */
> > int do_tur (char *);
> > int path_offline (struct path *);
> > -int get_state (struct path * pp, struct config * conf, int daemon,
> > int state);
> > +int start_checker(struct path * pp, struct config * conf, int
> > daemon,
> > + int state);
> > +int get_state(struct path * pp);
> > int get_vpd_sgio (int fd, int pg, int vend_id, char * str, int
> > maxlen);
> > int pathinfo (struct path * pp, struct config * conf, int mask);
> > int alloc_path_with_pathinfo (struct config *conf, struct
> > udev_device *udevice,
> > diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
> > index a3fce203..ad771d35 100644
> > --- a/libmultipath/propsel.c
> > +++ b/libmultipath/propsel.c
> > @@ -700,6 +700,7 @@ out:
> > condlog(3, "%s: path_checker = %s %s", pp->dev,
> > checker_name(c), origin);
> > c->timeout = pp->checker_timeout;
> > + c->path_state = PATH_UNCHECKED;
>
> Should this be in 01/15 already?
Yeah. That makes sense.
-Ben
>
> Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 10/15] multipathd: split check_path into two functions
2024-09-04 15:38 ` Martin Wilck
@ 2024-09-04 18:51 ` Benjamin Marzinski
2024-09-05 19:02 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 18:51 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 05:38:52PM +0200, Martin Wilck wrote:
> On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > Split out the code that updates a path's state and sets up the next
> > check time into its own function, update_path().
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > multipathd/main.c | 31 ++++++++++++++++++++++---------
> > 1 file changed, 22 insertions(+), 9 deletions(-)
> >
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 94d4e421..300f8247 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2390,6 +2390,7 @@ sync_mpp(struct vectors * vecs, struct
> > multipath *mpp, unsigned int ticks)
> > }
> >
> > enum check_path_return {
> > + CHECK_PATH_STARTED,
> > CHECK_PATH_CHECKED,
> > CHECK_PATH_SKIPPED,
> > CHECK_PATH_REMOVED,
> > @@ -2629,13 +2630,10 @@ update_path_state (struct vectors * vecs,
> > struct path * pp)
> > }
> >
> > static int
> > -check_path (struct vectors * vecs, struct path * pp, unsigned int
> > ticks,
> > - time_t start_secs)
> > +check_path (struct path * pp, unsigned int ticks)
>
> check_path() used to be one of our core functions, and you now re-
> introduce it with quite different semantics.
>
> Perhaps choose a new name?
Sure. Although the new check_path() is just the beginning part of the
old check_path(), where we actually run the checker, so it seems
reasonable to me. But your objection is also reasonable. I was just
getting sick of coming up with new function names by this point.
-Ben
>
> Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 13/15] multipathd: update priority once after updating all paths
2024-09-04 15:06 ` Martin Wilck
@ 2024-09-04 18:54 ` Benjamin Marzinski
0 siblings, 0 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 18:54 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 05:06:23PM +0200, Martin Wilck wrote:
> On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > Instead of updating the path priorities and possibly reloading the
> > multipath device when each path is updated, wait till all paths
> > have been updated, and then go through the multipath devices updating
> > the priorities once, reloading if necessary.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > libmultipath/structs.h | 9 +++
> > multipathd/main.c | 169 ++++++++++++++++++++++++++-------------
> > --
> > 2 files changed, 118 insertions(+), 60 deletions(-)
> >
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 2bcc6066..1511f701 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> >
> > @@ -2859,7 +2898,7 @@ update_paths(struct vectors *vecs, int
> > *num_paths_p, time_t start_secs)
> > i--;
> > else {
> > pp->is_checked = rc;
> > - if (rc == CHECK_PATH_CHECKED)
> > + if (rc == CHECK_PATH_CHECKED ||
> > CHECK_PATH_NEW_UP)
>
> This causes compilation errors. I suppose you meant (rc ==
> CHECK_PATH_CHECKED || rc == CHECK_PATH_NEW_UP)
D'oh! Yep. Thanks.
-Ben
>
> (fixed this up for now on my "tip" branch).
>
> Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 13/15] multipathd: update priority once after updating all paths
2024-08-28 22:17 ` [PATCH 13/15] multipathd: update priority once after updating all paths Benjamin Marzinski
2024-09-04 15:06 ` Martin Wilck
@ 2024-09-04 18:57 ` Martin Wilck
2024-09-04 19:51 ` Benjamin Marzinski
1 sibling, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 18:57 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> Instead of updating the path priorities and possibly reloading the
> multipath device when each path is updated, wait till all paths
> have been updated, and then go through the multipath devices updating
> the priorities once, reloading if necessary.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> libmultipath/structs.h | 9 +++
> multipathd/main.c | 169 ++++++++++++++++++++++++++-------------
> --
> 2 files changed, 118 insertions(+), 60 deletions(-)
>
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index af8e31e9..1f531d30 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -318,6 +318,7 @@ enum check_path_states {
> CHECK_PATH_UNCHECKED,
> CHECK_PATH_STARTED,
> CHECK_PATH_CHECKED,
> + CHECK_PATH_NEW_UP,
> CHECK_PATH_SKIPPED,
> CHECK_PATH_REMOVED,
> };
> @@ -421,6 +422,13 @@ enum prflag_value {
> PRFLAG_SET,
> };
>
> +enum prio_update_type {
> + PRIO_UPDATE_NONE,
> + PRIO_UPDATE_NORMAL,
> + PRIO_UPDATE_NEW_PATH,
> + PRIO_UPDATE_MARGINAL,
> +};
> +
> struct multipath {
> char wwid[WWID_SIZE];
> char alias_old[WWID_SIZE];
> @@ -464,6 +472,7 @@ struct multipath {
> int queue_mode;
> unsigned int sync_tick;
> int synced_count;
> + enum prio_update_type prio_update;
> uid_t uid;
> gid_t gid;
> mode_t mode;
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 2bcc6066..1511f701 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1996,15 +1996,13 @@ mpvec_garbage_collector (struct vectors *
> vecs)
> * best pathgroup, and this is the first path in the pathgroup to
> come back
> * up, then switch to this pathgroup */
> static int
> -followover_should_failback(struct path * pp)
> +do_followover_should_failback(struct path * pp)
> {
> struct pathgroup * pgp;
> struct path *pp1;
> int i;
>
> - if (pp->mpp->pgfailback != -FAILBACK_FOLLOWOVER ||
> - !pp->mpp->pg || !pp->pgindex ||
> - pp->pgindex != pp->mpp->bestpg)
> + if (pp->pgindex != pp->mpp->bestpg)
> return 0;
What happened to the !pp->pgindex test? Is it not necessary any more?
>
> pgp = VECTOR_SLOT(pp->mpp->pg, pp->pgindex - 1);
> @@ -2017,6 +2015,26 @@ followover_should_failback(struct path * pp)
> return 1;
> }
>
> +static int
> +followover_should_failback(struct multipath *mpp)
> +{
> + struct path *pp;
> + struct pathgroup * pgp;
> + int i, j;
> +
> + if (mpp->pgfailback != -FAILBACK_FOLLOWOVER || !mpp->pg ||
> !mpp->bestpg)
> + return 0;
> +
> + vector_foreach_slot (mpp->pg, pgp, i) {
> + vector_foreach_slot (pgp->paths, pp, j) {
> + if (pp->is_checked == CHECK_PATH_NEW_UP &&
> + do_followover_should_failback(pp))
> + return 1;
> + }
> + }
> + return 0;
> +}
> +
> static void
> missing_uev_wait_tick(struct vectors *vecs)
> {
> @@ -2132,41 +2150,53 @@ partial_retrigger_tick(vector pathvec)
> }
> }
>
> -static int update_prio(struct path *pp, int force_refresh_all)
> +static bool update_prio(struct multipath *mpp, bool refresh_all)
> {
> int oldpriority;
> - struct path *pp1;
> + struct path *pp;
> struct pathgroup * pgp;
> - int i, j, changed = 0;
> + int i, j;
> + bool changed = false;
> + bool skipped_path = false;
> struct config *conf;
>
> - oldpriority = pp->priority;
> - if (pp->state != PATH_DOWN) {
> - conf = get_multipath_config();
> - pthread_cleanup_push(put_multipath_config, conf);
> - pathinfo(pp, conf, DI_PRIO);
> - pthread_cleanup_pop(1);
> + vector_foreach_slot (mpp->pg, pgp, i) {
> + vector_foreach_slot (pgp->paths, pp, j) {
> + if (pp->state != PATH_UP && pp->state !=
> PATH_GHOST)
> + continue;
This test used to be pp->state == PATH_DOWN (see below).
Just trying to confirm that you did this on purpose. It might justify a
separate patch, with rationale. After all, AFAICS, we will now keep the
old priority for all path states except PATH_UP and PATH_GHOST, while
previously we attempted to fetch the prio for most of them, and only
used the previous value if fetching the prio failed.
> + if (!refresh_all &&
> + pp->is_checked != CHECK_PATH_CHECKED) {
> + skipped_path = true;
> + continue;
> + }
> + oldpriority = pp->priority;
> + conf = get_multipath_config();
> + pthread_cleanup_push(put_multipath_config,
> conf);
> + pathinfo(pp, conf, DI_PRIO);
> + pthread_cleanup_pop(1);
> + if (pp->priority != oldpriority)
> + changed = true;
> + }
> }
> -
> - if (pp->priority != oldpriority)
> - changed = 1;
> - else if (!force_refresh_all)
> - return 0;
> -
> - vector_foreach_slot (pp->mpp->pg, pgp, i) {
> - vector_foreach_slot (pgp->paths, pp1, j) {
> - if (pp1 == pp || pp1->state == PATH_DOWN)
> + if (!changed || !skipped_path)
> + return changed;
> + /*
> + * If a path changed priorities, refresh the priorities of
> any
> + * paths we skipped
> + */
> + vector_foreach_slot (mpp->pg, pgp, i) {
> + vector_foreach_slot (pgp->paths, pp, j) {
> + if (pp->state != PATH_UP && pp->state !=
> PATH_GHOST)
> + continue;
> + if (pp->is_checked == CHECK_PATH_CHECKED)
> continue;
> - oldpriority = pp1->priority;
> conf = get_multipath_config();
> pthread_cleanup_push(put_multipath_config,
> conf);
> - pathinfo(pp1, conf, DI_PRIO);
> + pathinfo(pp, conf, DI_PRIO);
> pthread_cleanup_pop(1);
> - if (pp1->priority != oldpriority)
> - changed = 1;
> }
> }
> - return changed;
> + return true;
> }
>
> static int reload_map(struct vectors *vecs, struct multipath *mpp,
> @@ -2393,14 +2423,12 @@ static int
> update_path_state (struct vectors * vecs, struct path * pp)
> {
> int newstate;
> - int new_path_up = 0;
> int chkr_new_path_up = 0;
> int disable_reinstate = 0;
> int oldchkrstate = pp->chkrstate;
> unsigned int checkint, max_checkint;
> struct config *conf;
> - int marginal_pathgroups, marginal_changed = 0;
> - bool need_reload;
> + int marginal_pathgroups;
>
> conf = get_multipath_config();
> checkint = conf->checkint;
> @@ -2462,7 +2490,7 @@ update_path_state (struct vectors * vecs,
> struct path * pp)
> }
> if (!pp->marginal) {
> pp->marginal = 1;
> - marginal_changed = 1;
> + pp->mpp->prio_update =
> PRIO_UPDATE_MARGINAL;
> }
> } else {
> if (pp->marginal || pp->state ==
> PATH_DELAYED)
> @@ -2470,7 +2498,7 @@ update_path_state (struct vectors * vecs,
> struct path * pp)
> pp->dev);
> if (marginal_pathgroups && pp->marginal) {
> pp->marginal = 0;
> - marginal_changed = 1;
> + pp->mpp->prio_update =
> PRIO_UPDATE_MARGINAL;
> }
> }
> }
> @@ -2537,7 +2565,8 @@ update_path_state (struct vectors * vecs,
> struct path * pp)
> */
> if (!disable_reinstate)
> reinstate_path(pp);
> - new_path_up = 1;
> + if (pp->mpp->prio_update != PRIO_UPDATE_MARGINAL)
> + pp->mpp->prio_update = PRIO_UPDATE_NEW_PATH;
>
> if (oldchkrstate != PATH_UP && oldchkrstate !=
> PATH_GHOST)
> chkr_new_path_up = 1;
> @@ -2588,38 +2617,48 @@ update_path_state (struct vectors * vecs,
> struct path * pp)
> LOG_MSG(2, pp);
> }
> }
> -
> + if (pp->mpp->prio_update == PRIO_UPDATE_NONE &&
> + (newstate == PATH_UP || newstate == PATH_GHOST))
> + pp->mpp->prio_update = PRIO_UPDATE_NORMAL;
> pp->state = newstate;
> + return chkr_new_path_up ? CHECK_PATH_NEW_UP :
> CHECK_PATH_CHECKED;
> +}
>
> - if (pp->mpp->wait_for_udev)
> - return CHECK_PATH_CHECKED;
> - /*
> - * path prio refreshing
> - */
> - condlog(4, "path prio refresh");
> -
> - if (marginal_changed) {
> - update_prio(pp, 1);
> - reload_and_sync_map(pp->mpp, vecs);
> - } else if (update_prio(pp, new_path_up) &&
> - pp->mpp->pgpolicyfn == (pgpolicyfn
> *)group_by_prio &&
> - pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> +static int
> +update_mpp_prio(struct vectors *vecs, struct multipath *mpp)
> +{
> + bool need_reload, changed;
> + enum prio_update_type prio_update = mpp->prio_update;
> + mpp->prio_update = PRIO_UPDATE_NONE;
> +
> + if (mpp->wait_for_udev || prio_update == PRIO_UPDATE_NONE)
> + return 0;
> + condlog(4, "prio refresh");
> +
> + changed = update_prio(mpp, prio_update !=
> PRIO_UPDATE_NORMAL);
> + if (prio_update == PRIO_UPDATE_MARGINAL)
> + return reload_and_sync_map(mpp, vecs);
> + if (changed && mpp->pgpolicyfn == (pgpolicyfn
> *)group_by_prio &&
> + mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> condlog(2, "%s: path priorities changed. reloading",
> - pp->mpp->alias);
> - reload_and_sync_map(pp->mpp, vecs);
> - } else if (need_switch_pathgroup(pp->mpp, &need_reload)) {
> - if (pp->mpp->pgfailback > 0 &&
> - (new_path_up || pp->mpp->failback_tick <= 0))
> - pp->mpp->failback_tick = pp->mpp->pgfailback
> + 1;
> - else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE
> ||
> - (chkr_new_path_up &&
> followover_should_failback(pp))) {
> + mpp->alias);
> + return reload_and_sync_map(mpp, vecs);
> + }
> + if (need_switch_pathgroup(mpp, &need_reload)) {
> + if (mpp->pgfailback > 0 &&
> + (prio_update == PRIO_UPDATE_NEW_PATH ||
> + mpp->failback_tick <= 0))
> + mpp->failback_tick = mpp->pgfailback + 1;
> + else if (mpp->pgfailback == -FAILBACK_IMMEDIATE ||
> + (prio_update == PRIO_UPDATE_NEW_PATH &&
> + followover_should_failback(mpp))) {
> if (need_reload)
> - reload_and_sync_map(pp->mpp, vecs);
> + return reload_and_sync_map(mpp,
> vecs);
> else
> - switch_pathgroup(pp->mpp);
> + switch_pathgroup(mpp);
> }
> }
> - return CHECK_PATH_CHECKED;
> + return 0;
> }
>
> static int
> @@ -2859,7 +2898,7 @@ update_paths(struct vectors *vecs, int
> *num_paths_p, time_t start_secs)
> i--;
> else {
> pp->is_checked = rc;
> - if (rc == CHECK_PATH_CHECKED)
> + if (rc == CHECK_PATH_CHECKED ||
> CHECK_PATH_NEW_UP)
> (*num_paths_p)++;
> }
> if (++paths_checked % 128 == 0 &&
> @@ -2932,8 +2971,10 @@ checkerloop (void *ap)
> vector_foreach_slot(vecs->mpvec, mpp, i)
> mpp->synced_count = 0;
> if (checker_state == CHECKER_STARTING) {
> - vector_foreach_slot(vecs->mpvec,
> mpp, i)
> + vector_foreach_slot(vecs->mpvec,
> mpp, i) {
> sync_mpp(vecs, mpp, ticks);
> + mpp->prio_update =
> PRIO_UPDATE_NONE;
> + }
> vector_foreach_slot(vecs->pathvec,
> pp, i)
> pp->is_checked =
> CHECK_PATH_UNCHECKED;
> checker_state =
> CHECKER_CHECKING_PATHS;
> @@ -2943,6 +2984,14 @@ checkerloop (void *ap)
> if (checker_state == CHECKER_UPDATING_PATHS)
> checker_state = update_paths(vecs,
> &num_paths,
>
> start_time.tv_sec);
> + if (checker_state == CHECKER_FINISHED) {
> + vector_foreach_slot(vecs->mpvec,
> mpp, i) {
> + if (update_mpp_prio(vecs,
> mpp) == 2) {
> + /* multipath device
> deleted */
> + i--;
> + }
> + }
> + }
> lock_cleanup_pop(vecs->lock);
> if (checker_state != CHECKER_FINISHED) {
> /* Yield to waiters */
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 05/15] multipath-tools tests: fix up directio tests
2024-09-04 18:29 ` Benjamin Marzinski
@ 2024-09-04 19:36 ` Martin Wilck
2024-09-04 19:43 ` Martin Wilck
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 19:36 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Wed, 2024-09-04 at 14:29 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 06:12:37PM +0200, Martin Wilck wrote:
> > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > Make the directio tests work with libcheck_pending() being
> > > separate
> > > from
> > > libcheck_check
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> >
> > There's still something wrong with this test. I'm seeing lots of CI
> > errors with your complete series applied.
> >
> > https://github.com/openSUSE/multipath-tools/actions?query=branch%3Atip
> > https://github.com/openSUSE/multipath-tools/actions/runs/10704501258/job/29677643779
>
> It looks like your "tip" brach is missing:
> [PATCH 04/15] libmultipath: remove pending wait code from
> libcheck_check calls
Yeah. That patch ended up in a different mail folder, and I didn't
notice. Weird. CI looks much better now.
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 05/15] multipath-tools tests: fix up directio tests
2024-09-04 19:36 ` Martin Wilck
@ 2024-09-04 19:43 ` Martin Wilck
2024-09-04 22:53 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 19:43 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Wed, 2024-09-04 at 21:36 +0200, Martin Wilck wrote:
> On Wed, 2024-09-04 at 14:29 -0400, Benjamin Marzinski wrote:
> > On Wed, Sep 04, 2024 at 06:12:37PM +0200, Martin Wilck wrote:
> > > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > > Make the directio tests work with libcheck_pending() being
> > > > separate
> > > > from
> > > > libcheck_check
> > > >
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > >
> > > There's still something wrong with this test. I'm seeing lots of
> > > CI
> > > errors with your complete series applied.
> > >
> > > https://github.com/openSUSE/multipath-tools/actions?query=branch%3Atip
> > > https://github.com/openSUSE/multipath-tools/actions/runs/10704501258/job/29677643779
> >
> > It looks like your "tip" brach is missing:
> > [PATCH 04/15] libmultipath: remove pending wait code from
> > libcheck_check calls
>
> Yeah. That patch ended up in a different mail folder, and I didn't
> notice. Weird. CI looks much better now.
But some issues remain, e.g.
https://github.com/openSUSE/multipath-tools/actions/runs/10708349169/job/29690448105
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 03/15] libmultipath: split out the code to wait for pending checkers
2024-09-04 18:16 ` Benjamin Marzinski
@ 2024-09-04 19:48 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 19:48 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Wed, 2024-09-04 at 14:16 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 05:01:39PM +0200, Martin Wilck wrote:
> > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
>
> <snip>
>
> > > diff --git a/libmultipath/checkers/tur.c
> > > b/libmultipath/checkers/tur.c
> > > index a2905af5..95af5214 100644
> > > --- a/libmultipath/checkers/tur.c
> > > +++ b/libmultipath/checkers/tur.c
> > > @@ -323,6 +323,49 @@ static int tur_check_async_timeout(struct
> > > checker *c)
> > > return (now.tv_sec > ct->time);
> > > }
> > >
> > > +int check_pending(struct checker *c, struct timespec endtime)
> > > +{
> > > + struct tur_checker_context *ct = c->context;
> > > + int r, tur_status = PATH_PENDING;
> > > +
> > > + pthread_mutex_lock(&ct->lock);
> > > +
> > > + for (r = 0;
> > > + r == 0 && ct->state == PATH_PENDING &&
> > > + ct->msgid == MSG_TUR_RUNNING;
> > > + r = pthread_cond_timedwait(&ct->active, &ct->lock,
> > > &endtime));
> > > +
> > > + if (!r) {
> > > + tur_status = ct->state;
> > > + c->msgid = ct->msgid;
> > > + }
> > > + pthread_mutex_unlock(&ct->lock);
> > > + if (tur_status == PATH_PENDING && c->msgid ==
> > > MSG_TUR_RUNNING) {
> > > + condlog(4, "%d:%d : tur checker still running",
> > > + major(ct->devt), minor(ct->devt));
> > > + } else {
> > > + int running = uatomic_xchg(&ct->running, 0);
> > > + if (running)
> > > + pthread_cancel(ct->thread);
> > > + ct->thread = 0;
> > > + }
> > > +
> > > + return tur_status;
> > > +}
> > > +
> > > +int libcheck_pending(struct checker *c)
> > > +{
> > > + struct timespec endtime;
> > > + struct tur_checker_context *ct = c->context;
> > > +
> > > + /* The if path checker isn't running, just return the
> > > exiting value. */
> > > + if (!ct || !ct->thread)
> > > + return c->path_state;
> > > +
> > > + get_monotonic_time(&endtime);
> > > + return check_pending(c, endtime);
> >
> > Does this work?
> >
> > https://pubs.opengroup.org/onlinepubs/009695299/functions/pthread_cond_timedwait.html
> > says that "an error is returned [...] if the absolute time
> > specified by
> > abstime has already been passed at the time of the call".
> >
> > Which would always be the case if you pass a timestamp that you
> > fetched previously unmodified, meaning that you'd always get
> > ETIMEDOUT.
> > Or am I misreading the docs?
>
> No. You are reading the docs right. I'm just don't understand why
> that's a problem. When you start the checker, you set a timeout that
> you want to wait for, to give the checker a chance to complete in
> this
> checker loop.
>
> Once that timeout has passed, if the thread is still
> running, then libcheck_pending should return PATH_PENDING.
> pthread_cond_timedwait() doesn't get called unless the thread is
> running. So I would argue that pthread_cond_timedwait() always
> failing
> once that timeout has passed is the correct thing to do.
>
> Or are you arguing for having check_pending() manually check if the
> timeout has passed, and skipping the call to pthread_cond_timedwait()
> in
> that case?
I was confused by the missing patch 04/15. So let me read that one and
come back to you.
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 13/15] multipathd: update priority once after updating all paths
2024-09-04 18:57 ` Martin Wilck
@ 2024-09-04 19:51 ` Benjamin Marzinski
2024-09-04 20:14 ` Martin Wilck
0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 19:51 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 08:57:46PM +0200, Martin Wilck wrote:
> On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > Instead of updating the path priorities and possibly reloading the
> > multipath device when each path is updated, wait till all paths
> > have been updated, and then go through the multipath devices updating
> > the priorities once, reloading if necessary.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > libmultipath/structs.h | 9 +++
> > multipathd/main.c | 169 ++++++++++++++++++++++++++-------------
> > --
> > 2 files changed, 118 insertions(+), 60 deletions(-)
> >
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index af8e31e9..1f531d30 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -318,6 +318,7 @@ enum check_path_states {
> > CHECK_PATH_UNCHECKED,
> > CHECK_PATH_STARTED,
> > CHECK_PATH_CHECKED,
> > + CHECK_PATH_NEW_UP,
> > CHECK_PATH_SKIPPED,
> > CHECK_PATH_REMOVED,
> > };
> > @@ -421,6 +422,13 @@ enum prflag_value {
> > PRFLAG_SET,
> > };
> >
> > +enum prio_update_type {
> > + PRIO_UPDATE_NONE,
> > + PRIO_UPDATE_NORMAL,
> > + PRIO_UPDATE_NEW_PATH,
> > + PRIO_UPDATE_MARGINAL,
> > +};
> > +
> > struct multipath {
> > char wwid[WWID_SIZE];
> > char alias_old[WWID_SIZE];
> > @@ -464,6 +472,7 @@ struct multipath {
> > int queue_mode;
> > unsigned int sync_tick;
> > int synced_count;
> > + enum prio_update_type prio_update;
> > uid_t uid;
> > gid_t gid;
> > mode_t mode;
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 2bcc6066..1511f701 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1996,15 +1996,13 @@ mpvec_garbage_collector (struct vectors *
> > vecs)
> > * best pathgroup, and this is the first path in the pathgroup to
> > come back
> > * up, then switch to this pathgroup */
> > static int
> > -followover_should_failback(struct path * pp)
> > +do_followover_should_failback(struct path * pp)
> > {
> > struct pathgroup * pgp;
> > struct path *pp1;
> > int i;
> >
> > - if (pp->mpp->pgfailback != -FAILBACK_FOLLOWOVER ||
> > - !pp->mpp->pg || !pp->pgindex ||
> > - pp->pgindex != pp->mpp->bestpg)
> > + if (pp->pgindex != pp->mpp->bestpg)
> > return 0;
>
> What happened to the !pp->pgindex test? Is it not necessary any more?
In followover_should_failback(), before calling this function, we test
if mpp->bestpg is 0 and return early if it is (because 0 is not a valid
pathgroup id). Thus, if pp->pgindex equals mpp->bestpg (and we should
failback), it's obviously not zero. If it's not equal to mpp->bestpg,
then it doesn't matter what it is, because we aren't doing the
failback anyway. Right?
>
>
> >
> > pgp = VECTOR_SLOT(pp->mpp->pg, pp->pgindex - 1);
> > @@ -2017,6 +2015,26 @@ followover_should_failback(struct path * pp)
> > return 1;
> > }
> >
> > +static int
> > +followover_should_failback(struct multipath *mpp)
> > +{
> > + struct path *pp;
> > + struct pathgroup * pgp;
> > + int i, j;
> > +
> > + if (mpp->pgfailback != -FAILBACK_FOLLOWOVER || !mpp->pg ||
> > !mpp->bestpg)
> > + return 0;
> > +
> > + vector_foreach_slot (mpp->pg, pgp, i) {
> > + vector_foreach_slot (pgp->paths, pp, j) {
> > + if (pp->is_checked == CHECK_PATH_NEW_UP &&
> > + do_followover_should_failback(pp))
> > + return 1;
> > + }
> > + }
> > + return 0;
> > +}
> > +
> > static void
> > missing_uev_wait_tick(struct vectors *vecs)
> > {
> > @@ -2132,41 +2150,53 @@ partial_retrigger_tick(vector pathvec)
> > }
> > }
> >
> > -static int update_prio(struct path *pp, int force_refresh_all)
> > +static bool update_prio(struct multipath *mpp, bool refresh_all)
> > {
> > int oldpriority;
> > - struct path *pp1;
> > + struct path *pp;
> > struct pathgroup * pgp;
> > - int i, j, changed = 0;
> > + int i, j;
> > + bool changed = false;
> > + bool skipped_path = false;
> > struct config *conf;
> >
> > - oldpriority = pp->priority;
> > - if (pp->state != PATH_DOWN) {
> > - conf = get_multipath_config();
> > - pthread_cleanup_push(put_multipath_config, conf);
> > - pathinfo(pp, conf, DI_PRIO);
> > - pthread_cleanup_pop(1);
> > + vector_foreach_slot (mpp->pg, pgp, i) {
> > + vector_foreach_slot (pgp->paths, pp, j) {
> > + if (pp->state != PATH_UP && pp->state !=
> > PATH_GHOST)
> > + continue;
>
> This test used to be pp->state == PATH_DOWN (see below).
>
> Just trying to confirm that you did this on purpose. It might justify a
> separate patch, with rationale. After all, AFAICS, we will now keep the
> old priority for all path states except PATH_UP and PATH_GHOST, while
> previously we attempted to fetch the prio for most of them, and only
> used the previous value if fetching the prio failed.
In my understanding, we don't try getting the priority of failed paths
because some of the prioritizers can hang. On this grounds, if we are
excluding PATH_DOWN we should definitely also exclude PATH_SHAKY and
PATH_TIMEOUT and probably also PATH_DELAYED.
Also, since only paths that are in PATH_UP or PATH_GHOST are usable by
the kernel and contribute to the pathgroup's priority, not attempting to
update the priority of these other paths avoids the risk of hanging whil
not hurting anything (except that they might end up in the wrong
pathgroup during the time when they aren't usable, if you have
group_by_prio set).
Lastly, The way do_check_path() worked before this patchset was
inconsistent. If a path changed state to anything other than PATH_UP or
PATH_GHOST we skipped the prio update. However if the path kept the same
state we did the prio update, even if it wasn't PATH_UP or PATH_GHOST,
as long as it wasn't PATH_DOWN. That seems wrong.
But I'm fine with making these changes in a separate patch.
-Ben
> > + if (!refresh_all &&
> > + pp->is_checked != CHECK_PATH_CHECKED) {
> > + skipped_path = true;
> > + continue;
> > + }
> > + oldpriority = pp->priority;
> > + conf = get_multipath_config();
> > + pthread_cleanup_push(put_multipath_config,
> > conf);
> > + pathinfo(pp, conf, DI_PRIO);
> > + pthread_cleanup_pop(1);
> > + if (pp->priority != oldpriority)
> > + changed = true;
> > + }
> > }
> > -
> > - if (pp->priority != oldpriority)
> > - changed = 1;
> > - else if (!force_refresh_all)
> > - return 0;
> > -
> > - vector_foreach_slot (pp->mpp->pg, pgp, i) {
> > - vector_foreach_slot (pgp->paths, pp1, j) {
> > - if (pp1 == pp || pp1->state == PATH_DOWN)
> > + if (!changed || !skipped_path)
> > + return changed;
> > + /*
> > + * If a path changed priorities, refresh the priorities of
> > any
> > + * paths we skipped
> > + */
> > + vector_foreach_slot (mpp->pg, pgp, i) {
> > + vector_foreach_slot (pgp->paths, pp, j) {
> > + if (pp->state != PATH_UP && pp->state !=
> > PATH_GHOST)
> > + continue;
> > + if (pp->is_checked == CHECK_PATH_CHECKED)
> > continue;
> > - oldpriority = pp1->priority;
> > conf = get_multipath_config();
> > pthread_cleanup_push(put_multipath_config,
> > conf);
> > - pathinfo(pp1, conf, DI_PRIO);
> > + pathinfo(pp, conf, DI_PRIO);
> > pthread_cleanup_pop(1);
> > - if (pp1->priority != oldpriority)
> > - changed = 1;
> > }
> > }
> > - return changed;
> > + return true;
> > }
> >
> > static int reload_map(struct vectors *vecs, struct multipath *mpp,
> > @@ -2393,14 +2423,12 @@ static int
> > update_path_state (struct vectors * vecs, struct path * pp)
> > {
> > int newstate;
> > - int new_path_up = 0;
> > int chkr_new_path_up = 0;
> > int disable_reinstate = 0;
> > int oldchkrstate = pp->chkrstate;
> > unsigned int checkint, max_checkint;
> > struct config *conf;
> > - int marginal_pathgroups, marginal_changed = 0;
> > - bool need_reload;
> > + int marginal_pathgroups;
> >
> > conf = get_multipath_config();
> > checkint = conf->checkint;
> > @@ -2462,7 +2490,7 @@ update_path_state (struct vectors * vecs,
> > struct path * pp)
> > }
> > if (!pp->marginal) {
> > pp->marginal = 1;
> > - marginal_changed = 1;
> > + pp->mpp->prio_update =
> > PRIO_UPDATE_MARGINAL;
> > }
> > } else {
> > if (pp->marginal || pp->state ==
> > PATH_DELAYED)
> > @@ -2470,7 +2498,7 @@ update_path_state (struct vectors * vecs,
> > struct path * pp)
> > pp->dev);
> > if (marginal_pathgroups && pp->marginal) {
> > pp->marginal = 0;
> > - marginal_changed = 1;
> > + pp->mpp->prio_update =
> > PRIO_UPDATE_MARGINAL;
> > }
> > }
> > }
> > @@ -2537,7 +2565,8 @@ update_path_state (struct vectors * vecs,
> > struct path * pp)
> > */
> > if (!disable_reinstate)
> > reinstate_path(pp);
> > - new_path_up = 1;
> > + if (pp->mpp->prio_update != PRIO_UPDATE_MARGINAL)
> > + pp->mpp->prio_update = PRIO_UPDATE_NEW_PATH;
> >
> > if (oldchkrstate != PATH_UP && oldchkrstate !=
> > PATH_GHOST)
> > chkr_new_path_up = 1;
> > @@ -2588,38 +2617,48 @@ update_path_state (struct vectors * vecs,
> > struct path * pp)
> > LOG_MSG(2, pp);
> > }
> > }
> > -
> > + if (pp->mpp->prio_update == PRIO_UPDATE_NONE &&
> > + (newstate == PATH_UP || newstate == PATH_GHOST))
> > + pp->mpp->prio_update = PRIO_UPDATE_NORMAL;
> > pp->state = newstate;
> > + return chkr_new_path_up ? CHECK_PATH_NEW_UP :
> > CHECK_PATH_CHECKED;
> > +}
> >
> > - if (pp->mpp->wait_for_udev)
> > - return CHECK_PATH_CHECKED;
> > - /*
> > - * path prio refreshing
> > - */
> > - condlog(4, "path prio refresh");
> > -
> > - if (marginal_changed) {
> > - update_prio(pp, 1);
> > - reload_and_sync_map(pp->mpp, vecs);
> > - } else if (update_prio(pp, new_path_up) &&
> > - pp->mpp->pgpolicyfn == (pgpolicyfn
> > *)group_by_prio &&
> > - pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> > +static int
> > +update_mpp_prio(struct vectors *vecs, struct multipath *mpp)
> > +{
> > + bool need_reload, changed;
> > + enum prio_update_type prio_update = mpp->prio_update;
> > + mpp->prio_update = PRIO_UPDATE_NONE;
> > +
> > + if (mpp->wait_for_udev || prio_update == PRIO_UPDATE_NONE)
> > + return 0;
> > + condlog(4, "prio refresh");
> > +
> > + changed = update_prio(mpp, prio_update !=
> > PRIO_UPDATE_NORMAL);
> > + if (prio_update == PRIO_UPDATE_MARGINAL)
> > + return reload_and_sync_map(mpp, vecs);
> > + if (changed && mpp->pgpolicyfn == (pgpolicyfn
> > *)group_by_prio &&
> > + mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> > condlog(2, "%s: path priorities changed. reloading",
> > - pp->mpp->alias);
> > - reload_and_sync_map(pp->mpp, vecs);
> > - } else if (need_switch_pathgroup(pp->mpp, &need_reload)) {
> > - if (pp->mpp->pgfailback > 0 &&
> > - (new_path_up || pp->mpp->failback_tick <= 0))
> > - pp->mpp->failback_tick = pp->mpp->pgfailback
> > + 1;
> > - else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE
> > ||
> > - (chkr_new_path_up &&
> > followover_should_failback(pp))) {
> > + mpp->alias);
> > + return reload_and_sync_map(mpp, vecs);
> > + }
> > + if (need_switch_pathgroup(mpp, &need_reload)) {
> > + if (mpp->pgfailback > 0 &&
> > + (prio_update == PRIO_UPDATE_NEW_PATH ||
> > + mpp->failback_tick <= 0))
> > + mpp->failback_tick = mpp->pgfailback + 1;
> > + else if (mpp->pgfailback == -FAILBACK_IMMEDIATE ||
> > + (prio_update == PRIO_UPDATE_NEW_PATH &&
> > + followover_should_failback(mpp))) {
> > if (need_reload)
> > - reload_and_sync_map(pp->mpp, vecs);
> > + return reload_and_sync_map(mpp,
> > vecs);
> > else
> > - switch_pathgroup(pp->mpp);
> > + switch_pathgroup(mpp);
> > }
> > }
> > - return CHECK_PATH_CHECKED;
> > + return 0;
> > }
> >
> > static int
> > @@ -2859,7 +2898,7 @@ update_paths(struct vectors *vecs, int
> > *num_paths_p, time_t start_secs)
> > i--;
> > else {
> > pp->is_checked = rc;
> > - if (rc == CHECK_PATH_CHECKED)
> > + if (rc == CHECK_PATH_CHECKED ||
> > CHECK_PATH_NEW_UP)
> > (*num_paths_p)++;
> > }
> > if (++paths_checked % 128 == 0 &&
> > @@ -2932,8 +2971,10 @@ checkerloop (void *ap)
> > vector_foreach_slot(vecs->mpvec, mpp, i)
> > mpp->synced_count = 0;
> > if (checker_state == CHECKER_STARTING) {
> > - vector_foreach_slot(vecs->mpvec,
> > mpp, i)
> > + vector_foreach_slot(vecs->mpvec,
> > mpp, i) {
> > sync_mpp(vecs, mpp, ticks);
> > + mpp->prio_update =
> > PRIO_UPDATE_NONE;
> > + }
> > vector_foreach_slot(vecs->pathvec,
> > pp, i)
> > pp->is_checked =
> > CHECK_PATH_UNCHECKED;
> > checker_state =
> > CHECKER_CHECKING_PATHS;
> > @@ -2943,6 +2984,14 @@ checkerloop (void *ap)
> > if (checker_state == CHECKER_UPDATING_PATHS)
> > checker_state = update_paths(vecs,
> > &num_paths,
> >
> > start_time.tv_sec);
> > + if (checker_state == CHECKER_FINISHED) {
> > + vector_foreach_slot(vecs->mpvec,
> > mpp, i) {
> > + if (update_mpp_prio(vecs,
> > mpp) == 2) {
> > + /* multipath device
> > deleted */
> > + i--;
> > + }
> > + }
> > + }
> > lock_cleanup_pop(vecs->lock);
> > if (checker_state != CHECKER_FINISHED) {
> > /* Yield to waiters */
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls
2024-08-28 22:17 ` [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls Benjamin Marzinski
@ 2024-09-04 20:05 ` Martin Wilck
2024-09-04 21:17 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 20:05 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> When the tur and directio checkers start an asynchronous checker,
> they
> now immediately return with the path in PATH_PENDING, instead of
> waiting in checker_check(). Instead the wait now happens in
> checker_get_state().
>
> Additionally, the directio checker now waits for 1 ms, like the tur
> checker does. Also like the tur checker it now only waits once. If it
> is still pending after the first call to checker_get_state(). It will
> not wait at all on future calls, and will just process the already
> completed IOs.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Now I understand what you're doing, but I find it somewhat confusing. I
believe this can be simplified after the split between starting the
checkers and looking at their result.
Why not just fire off the checkers, wait 1ms for _all_ the just started
checkers in checkerloop() [*], and then do the pending test like it is
now (but just in polling mode, with no c->timeout)?
Regards
Martin
[*] or maybe even a little more, 5ms, say: we'd still be waiting far
less time than we used to, but greatly increase the odds that most
checkers actually complete.
> ---
> libmultipath/checkers.c | 7 +++-
> libmultipath/checkers/directio.c | 57 +++++++++++++++++-------------
> --
> libmultipath/checkers/tur.c | 13 +++-----
> 3 files changed, 41 insertions(+), 36 deletions(-)
>
> diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
> index 298aec78..b44b856a 100644
> --- a/libmultipath/checkers.c
> +++ b/libmultipath/checkers.c
> @@ -303,7 +303,12 @@ void checker_put (struct checker * dst)
>
> int checker_get_state(struct checker *c)
> {
> - return c ? c->path_state : PATH_UNCHECKED;
> + if (!c)
> + return PATH_UNCHECKED;
> + if (c->path_state != PATH_PENDING || !c->cls->pending)
> + return c->path_state;
> + c->path_state = c->cls->pending(c);
> + return c->path_state;
> }
>
> void checker_check (struct checker * c, int path_state)
> diff --git a/libmultipath/checkers/directio.c
> b/libmultipath/checkers/directio.c
> index 3f88b40d..904e3071 100644
> --- a/libmultipath/checkers/directio.c
> +++ b/libmultipath/checkers/directio.c
> @@ -60,10 +60,11 @@ const char *libcheck_msgtable[] = {
> #define LOG(prio, fmt, args...) condlog(prio, "directio: " fmt,
> ##args)
>
> struct directio_context {
> - int running;
> + unsigned int running;
> int reset_flags;
> struct aio_group *aio_grp;
> struct async_req *req;
> + struct timespec endtime;
> };
>
> static struct aio_group *
> @@ -314,19 +315,16 @@ check_pending(struct directio_context *ct,
> struct timespec endtime)
> static int
> check_state(int fd, struct directio_context *ct, int sync, int
> timeout_secs)
> {
> - struct timespec timeout = { .tv_nsec = 1000 };
> struct stat sb;
> int rc;
> + struct io_event event;
> struct timespec endtime;
>
> if (fstat(fd, &sb) == 0) {
> LOG(4, "called for %x", (unsigned) sb.st_rdev);
> }
> - if (sync > 0) {
> + if (sync > 0)
> LOG(4, "called in synchronous mode");
> - timeout.tv_sec = timeout_secs;
> - timeout.tv_nsec = 0;
> - }
>
> if (ct->running) {
> if (ct->req->state != PATH_PENDING) {
> @@ -345,31 +343,26 @@ check_state(int fd, struct directio_context
> *ct, int sync, int timeout_secs)
> LOG(3, "io_submit error %i", -rc);
> return PATH_UNCHECKED;
> }
> + get_monotonic_time(&ct->endtime);
> + ct->endtime.tv_nsec += 1000 * 1000;
> + normalize_timespec(&ct->endtime);
> }
> ct->running++;
> + if (!sync)
> + return PATH_PENDING;
>
> get_monotonic_time(&endtime);
> - endtime.tv_sec += timeout.tv_sec;
> - endtime.tv_nsec += timeout.tv_nsec;
> + endtime.tv_sec += timeout_secs;
> normalize_timespec(&endtime);
>
> check_pending(ct, endtime);
> if (ct->req->state != PATH_PENDING)
> return ct->req->state;
>
> - if (ct->running > timeout_secs || sync) {
> - struct io_event event;
> -
> - LOG(3, "abort check on timeout");
> -
> - io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
> - rc = PATH_DOWN;
> - } else {
> - LOG(4, "async io pending");
> - rc = PATH_PENDING;
> - }
> + LOG(3, "abort check on timeout");
>
> - return rc;
> + io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
> + return PATH_DOWN;
> }
>
> static void set_msgid(struct checker *c, int state)
> @@ -395,21 +388,31 @@ static void set_msgid(struct checker *c, int
> state)
>
> int libcheck_pending(struct checker *c)
> {
> - struct timespec endtime;
> + int rc;
> + struct io_event event;
> struct directio_context *ct = (struct directio_context *)c-
> >context;
>
> /* The if path checker isn't running, just return the
> exiting value. */
> if (!ct || !ct->running)
> return c->path_state;
>
> - if (ct->req->state == PATH_PENDING) {
> - get_monotonic_time(&endtime);
> - check_pending(ct, endtime);
> - } else
> + if (ct->req->state == PATH_PENDING)
> + check_pending(ct, ct->endtime);
> + else
> ct->running = 0;
> - set_msgid(c, ct->req->state);
> + rc = ct->req->state;
> + if (rc == PATH_PENDING) {
> + if (ct->running > c->timeout) {
> + LOG(3, "abort check on timeout");
> + io_cancel(ct->aio_grp->ioctx, &ct->req->io,
> &event);
> + rc = PATH_DOWN;
> + }
> + else
> + LOG(4, "async io pending");
> + }
> + set_msgid(c, rc);
>
> - return ct->req->state;
> + return rc;
> }
>
> int libcheck_check (struct checker * c)
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index 95af5214..81db565b 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -58,6 +58,7 @@ struct tur_checker_context {
> int msgid;
> struct checker_context ctx;
> unsigned int nr_timeouts;
> + struct timespec endtime;
> };
>
> int libcheck_init (struct checker * c)
> @@ -323,7 +324,7 @@ static int tur_check_async_timeout(struct checker
> *c)
> return (now.tv_sec > ct->time);
> }
>
> -int check_pending(struct checker *c, struct timespec endtime)
> +int check_pending(struct checker *c)
> {
> struct tur_checker_context *ct = c->context;
> int r, tur_status = PATH_PENDING;
> @@ -333,7 +334,7 @@ int check_pending(struct checker *c, struct
> timespec endtime)
> for (r = 0;
> r == 0 && ct->state == PATH_PENDING &&
> ct->msgid == MSG_TUR_RUNNING;
> - r = pthread_cond_timedwait(&ct->active, &ct->lock,
> &endtime));
> + r = pthread_cond_timedwait(&ct->active, &ct->lock, &ct-
> >endtime));
>
> if (!r) {
> tur_status = ct->state;
> @@ -355,21 +356,18 @@ int check_pending(struct checker *c, struct
> timespec endtime)
>
> int libcheck_pending(struct checker *c)
> {
> - struct timespec endtime;
> struct tur_checker_context *ct = c->context;
>
> /* The if path checker isn't running, just return the
> exiting value. */
> if (!ct || !ct->thread)
> return c->path_state;
>
> - get_monotonic_time(&endtime);
> - return check_pending(c, endtime);
> + return check_pending(c);
> }
>
> int libcheck_check(struct checker * c)
> {
> struct tur_checker_context *ct = c->context;
> - struct timespec tsp;
> pthread_attr_t attr;
> int tur_status, r;
>
> @@ -479,8 +477,7 @@ int libcheck_check(struct checker * c)
> " sync mode", major(ct->devt),
> minor(ct->devt));
> return tur_check(c->fd, c->timeout, &c-
> >msgid);
> }
> - tur_timeout(&tsp);
> - tur_status = check_pending(c, tsp);
> + tur_timeout(&ct->endtime);
> }
>
> return tur_status;
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 00/15] Yet Another path checker refactor
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
` (14 preceding siblings ...)
2024-08-28 22:17 ` [PATCH 15/15] multipathd: fix deferred_failback_tick for reload removes Benjamin Marzinski
@ 2024-09-04 20:07 ` Martin Wilck
15 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 20:07 UTC (permalink / raw)
To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development
On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
>
> With this patchset, the code now first starts the path checkers on
> all
> devices due for a path check. Then it goes back and updates the state
> of
> all the path devices, waiting at most a total of 1ms if there are
> checkers that need waiting for. Once all of the paths have been
> updated,
> it goes through each multipath device, updating path priorities and
> reloading the device as necessary.
>
> This allows multipathd to spend less time and do less redundant work
> while checking paths, while making paths slightly more likely to not
> spend a checker cycle marked as pending.
while I've already responded to several of the patches: Thanks a lot
for this!! I've been thinking about a change like this for a long time,
and this is definitely a change for the better.
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 13/15] multipathd: update priority once after updating all paths
2024-09-04 19:51 ` Benjamin Marzinski
@ 2024-09-04 20:14 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-04 20:14 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Wed, 2024-09-04 at 15:51 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 08:57:46PM +0200, Martin Wilck wrote:
> > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > Instead of updating the path priorities and possibly reloading
> > > the
> > > multipath device when each path is updated, wait till all paths
> > > have been updated, and then go through the multipath devices
> > > updating
> > > the priorities once, reloading if necessary.
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > > libmultipath/structs.h | 9 +++
> > > multipathd/main.c | 169 ++++++++++++++++++++++++++---------
> > > ----
> > > --
> > > 2 files changed, 118 insertions(+), 60 deletions(-)
> > >
> > > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > > index af8e31e9..1f531d30 100644
> > > --- a/libmultipath/structs.h
> > > +++ b/libmultipath/structs.h
> > > @@ -318,6 +318,7 @@ enum check_path_states {
> > > CHECK_PATH_UNCHECKED,
> > > CHECK_PATH_STARTED,
> > > CHECK_PATH_CHECKED,
> > > + CHECK_PATH_NEW_UP,
> > > CHECK_PATH_SKIPPED,
> > > CHECK_PATH_REMOVED,
> > > };
> > > @@ -421,6 +422,13 @@ enum prflag_value {
> > > PRFLAG_SET,
> > > };
> > >
> > > +enum prio_update_type {
> > > + PRIO_UPDATE_NONE,
> > > + PRIO_UPDATE_NORMAL,
> > > + PRIO_UPDATE_NEW_PATH,
> > > + PRIO_UPDATE_MARGINAL,
> > > +};
> > > +
> > > struct multipath {
> > > char wwid[WWID_SIZE];
> > > char alias_old[WWID_SIZE];
> > > @@ -464,6 +472,7 @@ struct multipath {
> > > int queue_mode;
> > > unsigned int sync_tick;
> > > int synced_count;
> > > + enum prio_update_type prio_update;
> > > uid_t uid;
> > > gid_t gid;
> > > mode_t mode;
> > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > index 2bcc6066..1511f701 100644
> > > --- a/multipathd/main.c
> > > +++ b/multipathd/main.c
> > > @@ -1996,15 +1996,13 @@ mpvec_garbage_collector (struct vectors *
> > > vecs)
> > > * best pathgroup, and this is the first path in the pathgroup
> > > to
> > > come back
> > > * up, then switch to this pathgroup */
> > > static int
> > > -followover_should_failback(struct path * pp)
> > > +do_followover_should_failback(struct path * pp)
> > > {
> > > struct pathgroup * pgp;
> > > struct path *pp1;
> > > int i;
> > >
> > > - if (pp->mpp->pgfailback != -FAILBACK_FOLLOWOVER ||
> > > - !pp->mpp->pg || !pp->pgindex ||
> > > - pp->pgindex != pp->mpp->bestpg)
> > > + if (pp->pgindex != pp->mpp->bestpg)
> > > return 0;
> >
> > What happened to the !pp->pgindex test? Is it not necessary any
> > more?
>
> In followover_should_failback(), before calling this function, we
> test
> if mpp->bestpg is 0 and return early if it is (because 0 is not a
> valid
> pathgroup id). Thus, if pp->pgindex equals mpp->bestpg (and we
> should
> failback), it's obviously not zero. If it's not equal to mpp->bestpg,
> then it doesn't matter what it is, because we aren't doing the
> failback anyway. Right?
Ok, thanks for clarifying it ;-)
> >
> > This test used to be pp->state == PATH_DOWN (see below).
> >
> > Just trying to confirm that you did this on purpose. It might
> > justify a
> > separate patch, with rationale. After all, AFAICS, we will now keep
> > the
> > old priority for all path states except PATH_UP and PATH_GHOST,
> > while
> > previously we attempted to fetch the prio for most of them, and
> > only
> > used the previous value if fetching the prio failed.
>
> In my understanding, we don't try getting the priority of failed
> paths
> because some of the prioritizers can hang. On this grounds, if we are
> excluding PATH_DOWN we should definitely also exclude PATH_SHAKY and
> PATH_TIMEOUT and probably also PATH_DELAYED.
>
> Also, since only paths that are in PATH_UP or PATH_GHOST are usable
> by
> the kernel and contribute to the pathgroup's priority, not attempting
> to
> update the priority of these other paths avoids the risk of hanging
> whil
> not hurting anything (except that they might end up in the wrong
> pathgroup during the time when they aren't usable, if you have
> group_by_prio set).
>
> Lastly, The way do_check_path() worked before this patchset was
> inconsistent. If a path changed state to anything other than PATH_UP
> or
> PATH_GHOST we skipped the prio update. However if the path kept the
> same
> state we did the prio update, even if it wasn't PATH_UP or
> PATH_GHOST,
> as long as it wasn't PATH_DOWN. That seems wrong.
>
> But I'm fine with making these changes in a separate patch.
That would be great, and please include the above rationale.
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls
2024-09-04 20:05 ` Martin Wilck
@ 2024-09-04 21:17 ` Benjamin Marzinski
2024-09-05 7:53 ` Martin Wilck
0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 21:17 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 10:05:30PM +0200, Martin Wilck wrote:
> On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > When the tur and directio checkers start an asynchronous checker,
> > they
> > now immediately return with the path in PATH_PENDING, instead of
> > waiting in checker_check(). Instead the wait now happens in
> > checker_get_state().
> >
> > Additionally, the directio checker now waits for 1 ms, like the tur
> > checker does. Also like the tur checker it now only waits once. If it
> > is still pending after the first call to checker_get_state(). It will
> > not wait at all on future calls, and will just process the already
> > completed IOs.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
>
> Now I understand what you're doing, but I find it somewhat confusing. I
> believe this can be simplified after the split between starting the
> checkers and looking at their result.
>
> Why not just fire off the checkers, wait 1ms for _all_ the just started
> checkers in checkerloop() [*], and then do the pending test like it is
> now (but just in polling mode, with no c->timeout)?
That's probably possible. You would also need to put that wait in
pathinfo().
The big reason I did it this way was so nothing outside of the checker
itself needed to care about things like if the checker was set to async
and whether or not the checker was actually async capable and if this
was a newly started async check or if the checker had already been
pending for a cycle, where waiting another 1ms was pointless.
This way, you just call checker_get_state() on all the paths were you
ran the path checker, and if at least one of them needs to wait, then
the first one that does will wait, and the rest won't. If none of them
need to wait, then none of them will.
-Ben
>
> Regards
> Martin
>
> [*] or maybe even a little more, 5ms, say: we'd still be waiting far
> less time than we used to, but greatly increase the odds that most
> checkers actually complete.
>
>
> > ---
> > libmultipath/checkers.c | 7 +++-
> > libmultipath/checkers/directio.c | 57 +++++++++++++++++-------------
> > --
> > libmultipath/checkers/tur.c | 13 +++-----
> > 3 files changed, 41 insertions(+), 36 deletions(-)
> >
> > diff --git a/libmultipath/checkers.c b/libmultipath/checkers.c
> > index 298aec78..b44b856a 100644
> > --- a/libmultipath/checkers.c
> > +++ b/libmultipath/checkers.c
> > @@ -303,7 +303,12 @@ void checker_put (struct checker * dst)
> >
> > int checker_get_state(struct checker *c)
> > {
> > - return c ? c->path_state : PATH_UNCHECKED;
> > + if (!c)
> > + return PATH_UNCHECKED;
> > + if (c->path_state != PATH_PENDING || !c->cls->pending)
> > + return c->path_state;
> > + c->path_state = c->cls->pending(c);
> > + return c->path_state;
> > }
> >
> > void checker_check (struct checker * c, int path_state)
> > diff --git a/libmultipath/checkers/directio.c
> > b/libmultipath/checkers/directio.c
> > index 3f88b40d..904e3071 100644
> > --- a/libmultipath/checkers/directio.c
> > +++ b/libmultipath/checkers/directio.c
> > @@ -60,10 +60,11 @@ const char *libcheck_msgtable[] = {
> > #define LOG(prio, fmt, args...) condlog(prio, "directio: " fmt,
> > ##args)
> >
> > struct directio_context {
> > - int running;
> > + unsigned int running;
> > int reset_flags;
> > struct aio_group *aio_grp;
> > struct async_req *req;
> > + struct timespec endtime;
> > };
> >
> > static struct aio_group *
> > @@ -314,19 +315,16 @@ check_pending(struct directio_context *ct,
> > struct timespec endtime)
> > static int
> > check_state(int fd, struct directio_context *ct, int sync, int
> > timeout_secs)
> > {
> > - struct timespec timeout = { .tv_nsec = 1000 };
> > struct stat sb;
> > int rc;
> > + struct io_event event;
> > struct timespec endtime;
> >
> > if (fstat(fd, &sb) == 0) {
> > LOG(4, "called for %x", (unsigned) sb.st_rdev);
> > }
> > - if (sync > 0) {
> > + if (sync > 0)
> > LOG(4, "called in synchronous mode");
> > - timeout.tv_sec = timeout_secs;
> > - timeout.tv_nsec = 0;
> > - }
> >
> > if (ct->running) {
> > if (ct->req->state != PATH_PENDING) {
> > @@ -345,31 +343,26 @@ check_state(int fd, struct directio_context
> > *ct, int sync, int timeout_secs)
> > LOG(3, "io_submit error %i", -rc);
> > return PATH_UNCHECKED;
> > }
> > + get_monotonic_time(&ct->endtime);
> > + ct->endtime.tv_nsec += 1000 * 1000;
> > + normalize_timespec(&ct->endtime);
> > }
> > ct->running++;
> > + if (!sync)
> > + return PATH_PENDING;
> >
> > get_monotonic_time(&endtime);
> > - endtime.tv_sec += timeout.tv_sec;
> > - endtime.tv_nsec += timeout.tv_nsec;
> > + endtime.tv_sec += timeout_secs;
> > normalize_timespec(&endtime);
> >
> > check_pending(ct, endtime);
> > if (ct->req->state != PATH_PENDING)
> > return ct->req->state;
> >
> > - if (ct->running > timeout_secs || sync) {
> > - struct io_event event;
> > -
> > - LOG(3, "abort check on timeout");
> > -
> > - io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
> > - rc = PATH_DOWN;
> > - } else {
> > - LOG(4, "async io pending");
> > - rc = PATH_PENDING;
> > - }
> > + LOG(3, "abort check on timeout");
> >
> > - return rc;
> > + io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
> > + return PATH_DOWN;
> > }
> >
> > static void set_msgid(struct checker *c, int state)
> > @@ -395,21 +388,31 @@ static void set_msgid(struct checker *c, int
> > state)
> >
> > int libcheck_pending(struct checker *c)
> > {
> > - struct timespec endtime;
> > + int rc;
> > + struct io_event event;
> > struct directio_context *ct = (struct directio_context *)c-
> > >context;
> >
> > /* The if path checker isn't running, just return the
> > exiting value. */
> > if (!ct || !ct->running)
> > return c->path_state;
> >
> > - if (ct->req->state == PATH_PENDING) {
> > - get_monotonic_time(&endtime);
> > - check_pending(ct, endtime);
> > - } else
> > + if (ct->req->state == PATH_PENDING)
> > + check_pending(ct, ct->endtime);
> > + else
> > ct->running = 0;
> > - set_msgid(c, ct->req->state);
> > + rc = ct->req->state;
> > + if (rc == PATH_PENDING) {
> > + if (ct->running > c->timeout) {
> > + LOG(3, "abort check on timeout");
> > + io_cancel(ct->aio_grp->ioctx, &ct->req->io,
> > &event);
> > + rc = PATH_DOWN;
> > + }
> > + else
> > + LOG(4, "async io pending");
> > + }
> > + set_msgid(c, rc);
> >
> > - return ct->req->state;
> > + return rc;
> > }
> >
> > int libcheck_check (struct checker * c)
> > diff --git a/libmultipath/checkers/tur.c
> > b/libmultipath/checkers/tur.c
> > index 95af5214..81db565b 100644
> > --- a/libmultipath/checkers/tur.c
> > +++ b/libmultipath/checkers/tur.c
> > @@ -58,6 +58,7 @@ struct tur_checker_context {
> > int msgid;
> > struct checker_context ctx;
> > unsigned int nr_timeouts;
> > + struct timespec endtime;
> > };
> >
> > int libcheck_init (struct checker * c)
> > @@ -323,7 +324,7 @@ static int tur_check_async_timeout(struct checker
> > *c)
> > return (now.tv_sec > ct->time);
> > }
> >
> > -int check_pending(struct checker *c, struct timespec endtime)
> > +int check_pending(struct checker *c)
> > {
> > struct tur_checker_context *ct = c->context;
> > int r, tur_status = PATH_PENDING;
> > @@ -333,7 +334,7 @@ int check_pending(struct checker *c, struct
> > timespec endtime)
> > for (r = 0;
> > r == 0 && ct->state == PATH_PENDING &&
> > ct->msgid == MSG_TUR_RUNNING;
> > - r = pthread_cond_timedwait(&ct->active, &ct->lock,
> > &endtime));
> > + r = pthread_cond_timedwait(&ct->active, &ct->lock, &ct-
> > >endtime));
> >
> > if (!r) {
> > tur_status = ct->state;
> > @@ -355,21 +356,18 @@ int check_pending(struct checker *c, struct
> > timespec endtime)
> >
> > int libcheck_pending(struct checker *c)
> > {
> > - struct timespec endtime;
> > struct tur_checker_context *ct = c->context;
> >
> > /* The if path checker isn't running, just return the
> > exiting value. */
> > if (!ct || !ct->thread)
> > return c->path_state;
> >
> > - get_monotonic_time(&endtime);
> > - return check_pending(c, endtime);
> > + return check_pending(c);
> > }
> >
> > int libcheck_check(struct checker * c)
> > {
> > struct tur_checker_context *ct = c->context;
> > - struct timespec tsp;
> > pthread_attr_t attr;
> > int tur_status, r;
> >
> > @@ -479,8 +477,7 @@ int libcheck_check(struct checker * c)
> > " sync mode", major(ct->devt),
> > minor(ct->devt));
> > return tur_check(c->fd, c->timeout, &c-
> > >msgid);
> > }
> > - tur_timeout(&tsp);
> > - tur_status = check_pending(c, tsp);
> > + tur_timeout(&ct->endtime);
> > }
> >
> > return tur_status;
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 05/15] multipath-tools tests: fix up directio tests
2024-09-04 19:43 ` Martin Wilck
@ 2024-09-04 22:53 ` Benjamin Marzinski
2024-09-05 7:57 ` Martin Wilck
0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-04 22:53 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 09:43:59PM +0200, Martin Wilck wrote:
> On Wed, 2024-09-04 at 21:36 +0200, Martin Wilck wrote:
> > On Wed, 2024-09-04 at 14:29 -0400, Benjamin Marzinski wrote:
> > > On Wed, Sep 04, 2024 at 06:12:37PM +0200, Martin Wilck wrote:
> > > > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > > > Make the directio tests work with libcheck_pending() being
> > > > > separate
> > > > > from
> > > > > libcheck_check
> > > > >
> > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > >
> > > > There's still something wrong with this test. I'm seeing lots of
> > > > CI
> > > > errors with your complete series applied.
> > > >
> > > > https://github.com/openSUSE/multipath-tools/actions?query=branch%3Atip
> > > > https://github.com/openSUSE/multipath-tools/actions/runs/10704501258/job/29677643779
> > >
> > > It looks like your "tip" brach is missing:
> > > [PATCH 04/15] libmultipath: remove pending wait code from
> > > libcheck_check calls
> >
> > Yeah. That patch ended up in a different mail folder, and I didn't
> > notice. Weird. CI looks much better now.
>
> But some issues remain, e.g.
>
> https://github.com/openSUSE/multipath-tools/actions/runs/10708349169/job/29690448105
I'm pretty sure that due to valgrind and virtual machine induced delays,
we end up waiting more than 1ms in test_check_state_async() between
starting the checker at
do_check_state(&c[256], 0, PATH_PENDING);
and calling libcheck_pending at
do_libcheck_pending(&c[256], PATH_UP);
This means that we will only call get_events() once, and we won't get
the IO for the c[256] which the test returns on the second call to
get_events(). This would cause the error from the github CI runs (I
haven't been able to reproduce this myself locally, but I haven't tried
on an Ubuntu VM):
[ RUN ] test_check_state_async
[ ERROR ] --- 0x6 != 0x3
[ LINE ] --- directio.c:237: error: Failure!
[ FAILED ] test_check_state_async
Since the time it takes the test program to run is out of our hands and
the checker wait time isn't configurable, I'm not sure that we can
guarantee that this test will always run correctly while testing this
code path without being a little hacky and manually bumping up
ct->endtime so that we're sure it hasn't already passed when we call
libcheck_pending().
Obviously if we took your route and did the waiting outside of
libcheck_pending(), then this code path wouldn't exist and the problem
would go away. I'll think on this a bit.
-Ben
>
> Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls
2024-09-04 21:17 ` Benjamin Marzinski
@ 2024-09-05 7:53 ` Martin Wilck
2024-09-06 17:26 ` Benjamin Marzinski
0 siblings, 1 reply; 46+ messages in thread
From: Martin Wilck @ 2024-09-05 7:53 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Wed, 2024-09-04 at 17:17 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 10:05:30PM +0200, Martin Wilck wrote:
> > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > When the tur and directio checkers start an asynchronous checker,
> > > they
> > > now immediately return with the path in PATH_PENDING, instead of
> > > waiting in checker_check(). Instead the wait now happens in
> > > checker_get_state().
> > >
> > > Additionally, the directio checker now waits for 1 ms, like the
> > > tur
> > > checker does. Also like the tur checker it now only waits once.
> > > If it
> > > is still pending after the first call to checker_get_state(). It
> > > will
> > > not wait at all on future calls, and will just process the
> > > already
> > > completed IOs.
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> >
> > Now I understand what you're doing, but I find it somewhat
> > confusing. I
> > believe this can be simplified after the split between starting the
> > checkers and looking at their result.
> >
> > Why not just fire off the checkers, wait 1ms for _all_ the just
> > started
> > checkers in checkerloop() [*], and then do the pending test like it
> > is
> > now (but just in polling mode, with no c->timeout)?
>
> That's probably possible. You would also need to put that wait in
> pathinfo().
pathinfo() is tricky because it's written with the assumption that the
result is obtained synchronously. Maybe in pathinfo we should just fire
up the checker and set the status to PENDING. But we might as well not
run the checker at all, just set the checker interval to minimum, and
let the checkers do their work. (I like that idea, actually).
Another idea would be to define an exception for pathinfo(), and
actually wait for completion there, not until c->timeout expires, but
somewhat longer than we currently do.
> The big reason I did it this way was so nothing outside of the
> checker
> itself needed to care about things like if the checker was set to
> async
> and whether or not the checker was actually async capable and if this
> was a newly started async check or if the checker had already been
> pending for a cycle, where waiting another 1ms was pointless.
Right, the sync property should be hidden. If sync checkers are in use,
none of the recent changes will have much of a positive effect. I'd
like to convert all checkers to async mode. Just need some time to do
it :-/ AFAICS, the only sync checker that still matters is rdac.
Below is some brainstorming from my side.
Ignore it if it makes no sense.
We currently use the path state, in particular PATH_PENDING, to infer
the state of the checker (thread, aio request), but that doesn't give
us the full picture.
We should be able to distinguish the path state and the state of the
checker itself, which can be IDLE, RUNNING, or DONE. The checker
provides an API checker_state() (I can't think of a better name) to
retrieve its state. The path state can (only) be retrieved in DONE
checker state. When the path state is retrieved, the checker state
switches to IDLE. checker_start() switches from IDLE to RUNNING.
checker_wait(timeout) waits until the RUNNING checker reaches DONE, or
the passed timeout is reached. A timed-out checker
(wrt c->timeout) has state DONE with path state PATH_TIMEOUT.
Sync checkers would have a degenerated API where checker_start() and
checker_state() would be noops.
> This way, you just call checker_get_state() on all the paths were you
> ran the path checker, and if at least one of them needs to wait, then
> the first one that does will wait, and the rest won't. If none of
> them
> need to wait, then none of them will.
This makes sense. But the "wait" you're talking about is the 1ms delay,
after which the paths may well still be in PATH_PENDING. It's actually
not much different from not waiting at all, we've just decreased the
likelihood of PATH_PENDIN. 1ms is just an estimate, and, in my
experience, too short in many environments.
The real benefit of your patch set is that we now start all (async)
checkers in parallel. Which means that we can wait longer, increasing
the likelihood that the checkers have actually finished, while spending
less total time waiting. "Waiting longer" is a bit hand-waving; IMO it
should be handled further up in the stack rather than down in the
checker code.
We could achieve the behavior you describe in checkerloop, like this:
- record start_timestamp
- fire off all handlers
- calculate end_time as start_timestamp + some_delay
- use that end_time for waiting for results of all fired-off checkers
Like in your example, we'd really wait for no more than 1 path.
We could do this for the entire set of paths, or for each multipath map
separately.
Regards
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 05/15] multipath-tools tests: fix up directio tests
2024-09-04 22:53 ` Benjamin Marzinski
@ 2024-09-05 7:57 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-05 7:57 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Wed, 2024-09-04 at 18:53 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 09:43:59PM +0200, Martin Wilck wrote:
> >
> > But some issues remain, e.g.
> >
> > https://github.com/openSUSE/multipath-tools/actions/runs/10708349169/job/29690448105
>
> I'm pretty sure that due to valgrind and virtual machine induced
> delays,
> we end up waiting more than 1ms in test_check_state_async() between
> starting the checker at
>
> do_check_state(&c[256], 0, PATH_PENDING);
>
> and calling libcheck_pending at
>
> do_libcheck_pending(&c[256], PATH_UP);
>
> This means that we will only call get_events() once, and we won't get
> the IO for the c[256] which the test returns on the second call to
> get_events(). This would cause the error from the github CI runs (I
> haven't been able to reproduce this myself locally, but I haven't
> tried
> on an Ubuntu VM):
>
> [ RUN ] test_check_state_async
> [ ERROR ] --- 0x6 != 0x3
> [ LINE ] --- directio.c:237: error: Failure!
> [ FAILED ] test_check_state_async
>
> Since the time it takes the test program to run is out of our hands
> and
> the checker wait time isn't configurable, I'm not sure that we can
> guarantee that this test will always run correctly while testing this
> code path without being a little hacky and manually bumping up
> ct->endtime so that we're sure it hasn't already passed when we call
> libcheck_pending().
Thanks for having a look. What you write makes sense.
Whatever we do, we will either need to disable the test, or find a way
to fine-tune the timeout such that the CI succeeds (most of the time,
at least). Constantly failing CI is no CI at all.
> Obviously if we took your route and did the waiting outside of
> libcheck_pending(), then this code path wouldn't exist and the
> problem
> would go away. I'll think on this a bit.
Thanks!
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 10/15] multipathd: split check_path into two functions
2024-09-04 18:51 ` Benjamin Marzinski
@ 2024-09-05 19:02 ` Benjamin Marzinski
2024-09-06 7:45 ` Martin Wilck
0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-05 19:02 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Wed, Sep 04, 2024 at 02:51:19PM -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 05:38:52PM +0200, Martin Wilck wrote:
> > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > Split out the code that updates a path's state and sets up the next
> > > check time into its own function, update_path().
> > >
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > > multipathd/main.c | 31 ++++++++++++++++++++++---------
> > > 1 file changed, 22 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > index 94d4e421..300f8247 100644
> > > --- a/multipathd/main.c
> > > +++ b/multipathd/main.c
> > > @@ -2390,6 +2390,7 @@ sync_mpp(struct vectors * vecs, struct
> > > multipath *mpp, unsigned int ticks)
> > > }
> > >
> > > enum check_path_return {
> > > + CHECK_PATH_STARTED,
> > > CHECK_PATH_CHECKED,
> > > CHECK_PATH_SKIPPED,
> > > CHECK_PATH_REMOVED,
> > > @@ -2629,13 +2630,10 @@ update_path_state (struct vectors * vecs,
> > > struct path * pp)
> > > }
> > >
> > > static int
> > > -check_path (struct vectors * vecs, struct path * pp, unsigned int
> > > ticks,
> > > - time_t start_secs)
> > > +check_path (struct path * pp, unsigned int ticks)
> >
> > check_path() used to be one of our core functions, and you now re-
> > introduce it with quite different semantics.
> >
> > Perhaps choose a new name?
>
> Sure. Although the new check_path() is just the beginning part of the
> old check_path(), where we actually run the checker, so it seems
> reasonable to me. But your objection is also reasonable. I was just
> getting sick of coming up with new function names by this point.
>
> -Ben
Do you have ideas for the name, because I can't think of anything that
makes more sense then check_path() in the code paths. Here's the code
paths
checkerloop (initialized paths)
-----------------------------------------
check_paths update_paths
check_path update_path
start_path_check update_path_state
start_checker check_path_state
checker_check get_state
checker_get_state
checkerloop (uninitialized paths)
-----------------------------------------
check_paths update_paths
check_uninitialized_path update_uninitialized_path
start_path_check check_path_state
start_checker get_state
checker_check checker_get_state
pathinfo
----------------
start_checker get_state
checker_check checker_get_state
The only function that stands out to me as misnamed is
"check_path_state", which I think I'm going to change to
"get_updated_state", since that's a better description of what it's
doing, and it avoids using "check" in the update code path.
Also, like I said before, if you're looking for the function that gets
run to see if a path is due for checking, and runs the path checker if
it is, that's check_path(), just like it used to be. It just no longer
updates the path and mpp state based on the checker result like it used
to. That's now done by update_path() and update_mpp_prio().
>
> >
> > Martin
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 10/15] multipathd: split check_path into two functions
2024-09-05 19:02 ` Benjamin Marzinski
@ 2024-09-06 7:45 ` Martin Wilck
0 siblings, 0 replies; 46+ messages in thread
From: Martin Wilck @ 2024-09-06 7:45 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development
On Thu, 2024-09-05 at 15:02 -0400, Benjamin Marzinski wrote:
> On Wed, Sep 04, 2024 at 02:51:19PM -0400, Benjamin Marzinski wrote:
> > On Wed, Sep 04, 2024 at 05:38:52PM +0200, Martin Wilck wrote:
> > > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
> > > > Split out the code that updates a path's state and sets up the
> > > > next
> > > > check time into its own function, update_path().
> > > >
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > ---
> > > > multipathd/main.c | 31 ++++++++++++++++++++++---------
> > > > 1 file changed, 22 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > > index 94d4e421..300f8247 100644
> > > > --- a/multipathd/main.c
> > > > +++ b/multipathd/main.c
> > > > @@ -2390,6 +2390,7 @@ sync_mpp(struct vectors * vecs, struct
> > > > multipath *mpp, unsigned int ticks)
> > > > }
> > > >
> > > > enum check_path_return {
> > > > + CHECK_PATH_STARTED,
> > > > CHECK_PATH_CHECKED,
> > > > CHECK_PATH_SKIPPED,
> > > > CHECK_PATH_REMOVED,
> > > > @@ -2629,13 +2630,10 @@ update_path_state (struct vectors *
> > > > vecs,
> > > > struct path * pp)
> > > > }
> > > >
> > > > static int
> > > > -check_path (struct vectors * vecs, struct path * pp, unsigned
> > > > int
> > > > ticks,
> > > > - time_t start_secs)
> > > > +check_path (struct path * pp, unsigned int ticks)
> > >
> > > check_path() used to be one of our core functions, and you now
> > > re-
> > > introduce it with quite different semantics.
> > >
> > > Perhaps choose a new name?
> >
> > Sure. Although the new check_path() is just the beginning part of
> > the
> > old check_path(), where we actually run the checker, so it seems
> > reasonable to me. But your objection is also reasonable. I was just
> > getting sick of coming up with new function names by this point.
> >
> > -Ben
>
> Do you have ideas for the name, because I can't think of anything
> that
> makes more sense then check_path() in the code paths. Here's the code
> paths
Ok then, keep the name. I was hoping we could find something better
because (as you correctly show) there's already quite some confusion
because of multiple functions with very similar names.
But if we can't, so be it.
Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls
2024-09-05 7:53 ` Martin Wilck
@ 2024-09-06 17:26 ` Benjamin Marzinski
0 siblings, 0 replies; 46+ messages in thread
From: Benjamin Marzinski @ 2024-09-06 17:26 UTC (permalink / raw)
To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development
On Thu, Sep 05, 2024 at 09:53:52AM +0200, Martin Wilck wrote:
> On Wed, 2024-09-04 at 17:17 -0400, Benjamin Marzinski wrote:
> > On Wed, Sep 04, 2024 at 10:05:30PM +0200, Martin Wilck wrote:
> > > On Wed, 2024-08-28 at 18:17 -0400, Benjamin Marzinski wrote:
<snip>
> > This way, you just call checker_get_state() on all the paths were you
> > ran the path checker, and if at least one of them needs to wait, then
> > the first one that does will wait, and the rest won't. If none of
> > them
> > need to wait, then none of them will.
>
> This makes sense. But the "wait" you're talking about is the 1ms delay,
> after which the paths may well still be in PATH_PENDING. It's actually
> not much different from not waiting at all, we've just decreased the
> likelihood of PATH_PENDIN. 1ms is just an estimate, and, in my
> experience, too short in many environments.
>
> The real benefit of your patch set is that we now start all (async)
> checkers in parallel. Which means that we can wait longer, increasing
> the likelihood that the checkers have actually finished, while spending
> less total time waiting. "Waiting longer" is a bit hand-waving; IMO it
> should be handled further up in the stack rather than down in the
> checker code.
>
> We could achieve the behavior you describe in checkerloop, like this:
>
> - record start_timestamp
> - fire off all handlers
> - calculate end_time as start_timestamp + some_delay
> - use that end_time for waiting for results of all fired-off checkers
>
> Like in your example, we'd really wait for no more than 1 path.
> We could do this for the entire set of paths, or for each multipath map
> separately.
Actually, now that I think about it, the biggest benefit of doing the
wait in checkerloop itself is that the code already handles the case
where we drop the vecs lock while there are paths that have started the
checker, but not yet updated the device based on the results. Given
that, we can just do the wait for pending paths without holding the vecs
lock. It's sort of embarassing that I did all this work to cut down on
the time we're waiting but didn't think about the fact that with just a
bit more work, we could avoid sleeping while holding the lock
altogether.
I already did some testing before to make sure that things like
reconfigures, adding/removing paths and manually failing/restoring paths
didn't mess things up if they happened in that window where we drop the
lock in the middle of checking the paths, but I'll probably need to do
some more tests and recheck that code if it's going to be non-rare
occurance for other threads to run in the middle of checking paths. But
this seems like too big of a benefit to avoid doing.
-Ben
> Regards
> Martin
^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2024-09-06 17:26 UTC | newest]
Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-28 22:17 [PATCH 00/15] Yet Another path checker refactor Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 01/15] libmultipath: store checker_check() result in checker struct Benjamin Marzinski
2024-09-04 16:13 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 02/15] libmultipath: add missing checker function prototypes Benjamin Marzinski
2024-09-04 16:13 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 03/15] libmultipath: split out the code to wait for pending checkers Benjamin Marzinski
2024-09-04 15:01 ` Martin Wilck
2024-09-04 18:16 ` Benjamin Marzinski
2024-09-04 19:48 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 04/15] libmultipath: remove pending wait code from libcheck_check calls Benjamin Marzinski
2024-09-04 20:05 ` Martin Wilck
2024-09-04 21:17 ` Benjamin Marzinski
2024-09-05 7:53 ` Martin Wilck
2024-09-06 17:26 ` Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 05/15] multipath-tools tests: fix up directio tests Benjamin Marzinski
2024-09-04 16:12 ` Martin Wilck
2024-09-04 18:29 ` Benjamin Marzinski
2024-09-04 19:36 ` Martin Wilck
2024-09-04 19:43 ` Martin Wilck
2024-09-04 22:53 ` Benjamin Marzinski
2024-09-05 7:57 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 06/15] libmultipath: split get_state into two functions Benjamin Marzinski
2024-09-04 15:14 ` Martin Wilck
2024-09-04 18:43 ` Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 07/15] libmultipath: change path_offline to path_sysfs_state Benjamin Marzinski
2024-09-04 15:31 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 08/15] multipathd: split check_path_state into two functions Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 09/15] multipathd: split do_checker_path Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 10/15] multipathd: split check_path into two functions Benjamin Marzinski
2024-09-04 15:38 ` Martin Wilck
2024-09-04 18:51 ` Benjamin Marzinski
2024-09-05 19:02 ` Benjamin Marzinski
2024-09-06 7:45 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 11/15] multipathd: split handle_uninitialized_path " Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 12/15] multipathd: split check_paths " Benjamin Marzinski
2024-08-28 22:17 ` [PATCH 13/15] multipathd: update priority once after updating all paths Benjamin Marzinski
2024-09-04 15:06 ` Martin Wilck
2024-09-04 18:54 ` Benjamin Marzinski
2024-09-04 18:57 ` Martin Wilck
2024-09-04 19:51 ` Benjamin Marzinski
2024-09-04 20:14 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 14/15] multipathd: remove pointless check Benjamin Marzinski
2024-09-04 16:07 ` Martin Wilck
2024-08-28 22:17 ` [PATCH 15/15] multipathd: fix deferred_failback_tick for reload removes Benjamin Marzinski
2024-09-04 16:10 ` Martin Wilck
2024-09-04 20:07 ` [PATCH 00/15] Yet Another path checker refactor Martin Wilck
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.