* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
[not found] <20260224010058.35337-1-brian@purestorage.com>
@ 2026-02-24 13:26 ` Martin Wilck
2026-02-25 20:41 ` Benjamin Marzinski
0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2026-02-24 13:26 UTC (permalink / raw)
To: Brian Bunker, bmarzins; +Cc: dm-devel
Hi Brian,
On Mon, 2026-02-23 at 17:00 -0800, Brian Bunker wrote:
> Fix a race condition in libcheck_check() for both the TUR and ALUA
> async checkers where a completed result could be missed, returning
> PATH_PENDING unnecessarily.
First of all, you should have cc'd the dm-devel mailing list.
Second, you call this a race condition, but what negative effect does
it actually have? AFAIU this "race condition" just delays returning the
new checker state by a single tick (i.e. 1 second), which is a rather
benign problem for a race, and generally not a big problem for
multipath, where paths can be offline for much longer periods of time.
Finally, before going into any detail, I am highly reluctant to make
changes in this area. Ben and I have fought with deadlocks and races in
corner cases for a long time, and we seem to have solved them for good;
at least we haven't seen any more bugs in this area for quite some
time. For a change like this, figuring out possible side effects for
all possible cases by code inspection alone is extremely hard. I'd
rather not risk a regression just to detect a status change 1 second
earlier than we currently do, in rare cases.
>
> Race window
> -----------
>
> In libcheck_thread() / alua_thread():
>
> pthread_mutex_lock(&ct->lock);
> ct->state = state;
> ct->msgid = msgid;
> pthread_cond_signal(&ct->active);
> pthread_mutex_unlock(&ct->lock);
> /* <-- race window here --> */
> uatomic_set(&ct->running, 0);
>
You seem to be looking at old code. The code I am seeing has this:
running = uatomic_xchg(&ct->running, 0);
if (!running)
pause();
Anyway, I think we had very good reasons to set the running state after
releasing the lock. I would need to dig deeper into the history in
order to figure out exactly why we did it this way. It probably has
something to do with the actual fatal races that we observed between
checkers timing out / being cancelled and finishing.
> When libcheck_check() observes ct->running != 0 during this window,
> the result is already available under lock, but the previous code
> could fail to collect it.
>
> Why the old check was broken
> ----------------------------
>
> The ALUA checker used:
>
> if (ct->state != PATH_PENDING || ct->msgid != MSG_ALUA_RUNNING)
>
> This conflated two different meanings of PATH_PENDING:
> 1. Thread still in progress (no result yet)
> 2. ALUA state is transitioning (valid completed result with
> MSG_ALUA_TRANSITIONING)
True, but what's the actual problem with this? In both cases, the
checker is indeed not yet able to figure out if the path is usable,
which is what CHECKER_PENDING means.
>
> The TUR checker had similar issues since PATH_PENDING with
> MSG_TUR_TRANSITIONING is a valid result.
>
> The fix
> -------
>
> Check ct->msgid alone under lock. The msgid is set to MSG_*_RUNNING
> only at async check start and atomically updated with state when
> complete:
>
> if (ct->msgid != MSG_TUR_RUNNING) {
> tur_status = ct->state;
> c->msgid = ct->msgid;
> ct->thread = 0;
> }
You probably realize that with this change, you basically imply that
the test for ct->running is superfluous. If this were true, we could
actually just skip it and rely solely on ct->msgid. What you're
basically saying is that ct->running is meaningless. But it isn't. It
means that the thread is "dead", which it isn't as long as it holds
the mutex.
I'll wait for Ben's opinion, but I am inclined to nack this, sorry.
Martin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
2026-02-24 13:26 ` [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection Martin Wilck
@ 2026-02-25 20:41 ` Benjamin Marzinski
2026-02-25 22:00 ` Martin Wilck
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Marzinski @ 2026-02-25 20:41 UTC (permalink / raw)
To: Martin Wilck; +Cc: Brian Bunker, dm-devel
On Tue, Feb 24, 2026 at 02:26:22PM +0100, Martin Wilck wrote:
> Hi Brian,
>
> On Mon, 2026-02-23 at 17:00 -0800, Brian Bunker wrote:
> > Fix a race condition in libcheck_check() for both the TUR and ALUA
> > async checkers where a completed result could be missed, returning
> > PATH_PENDING unnecessarily.
>
> First of all, you should have cc'd the dm-devel mailing list.
>
> Second, you call this a race condition, but what negative effect does
> it actually have? AFAIU this "race condition" just delays returning the
> new checker state by a single tick (i.e. 1 second), which is a rather
> benign problem for a race, and generally not a big problem for
> multipath, where paths can be offline for much longer periods of time.
>
> Finally, before going into any detail, I am highly reluctant to make
> changes in this area. Ben and I have fought with deadlocks and races in
> corner cases for a long time, and we seem to have solved them for good;
> at least we haven't seen any more bugs in this area for quite some
> time. For a change like this, figuring out possible side effects for
> all possible cases by code inspection alone is extremely hard. I'd
> rather not risk a regression just to detect a status change 1 second
> earlier than we currently do, in rare cases.
I pretty much agree with Martin here. I do think this is likely safe,
but I don't think this is really a race. We don't return the status of
the thread until the thread completes, and the thread signals its
completion by clearing ct->running. That's the same definition of
successful thread completion that we use for deciding to kill the
thread, and I don't see why it's not a valid one.
Yes we can get the thread result slightly before the thread has
completed. But by trusting that the thread will complete once pp->msgid
has changed, we are giving up the ability to kill it if it does hang
later, for instance in that condlog() call. On systemd systems, this is
just some stdio print functions, but on non-systemd systems, this uses
its own pthread locking, and could legitimately hang (although the
argument certainly can be made that if this is hanging, multipathd has
bigger problems to worry about than an extra tur checker thread lying
around).
> >
> > Race window
> > -----------
> >
> > In libcheck_thread() / alua_thread():
> >
> > pthread_mutex_lock(&ct->lock);
> > ct->state = state;
> > ct->msgid = msgid;
> > pthread_cond_signal(&ct->active);
> > pthread_mutex_unlock(&ct->lock);
> > /* <-- race window here --> */
> > uatomic_set(&ct->running, 0);
> >
>
> You seem to be looking at old code. The code I am seeing has this:
>
> running = uatomic_xchg(&ct->running, 0);
> if (!running)
> pause();
>
> Anyway, I think we had very good reasons to set the running state after
> releasing the lock. I would need to dig deeper into the history in
> order to figure out exactly why we did it this way. It probably has
> something to do with the actual fatal races that we observed between
> checkers timing out / being cancelled and finishing.
>
>
> > When libcheck_check() observes ct->running != 0 during this window,
> > the result is already available under lock, but the previous code
> > could fail to collect it.
> >
> > Why the old check was broken
> > ----------------------------
> >
> > The ALUA checker used:
> >
> > if (ct->state != PATH_PENDING || ct->msgid != MSG_ALUA_RUNNING)
> >
> > This conflated two different meanings of PATH_PENDING:
> > 1. Thread still in progress (no result yet)
> > 2. ALUA state is transitioning (valid completed result with
> > MSG_ALUA_TRANSITIONING)
>
> True, but what's the actual problem with this? In both cases, the
> checker is indeed not yet able to figure out if the path is usable,
> which is what CHECKER_PENDING means.
This is where I'm stuck too. If there is a legitimate problem with the
current code, then that would certainly weigh against our caution here.
But AFAICS, this is just multipathd checking the thread and seeing that
it hasn't completed yet (to multipathd's definition of completed), and
waiting a second to recheck.
-Ben
>
> >
> > The TUR checker had similar issues since PATH_PENDING with
> > MSG_TUR_TRANSITIONING is a valid result.
> >
> > The fix
> > -------
> >
> > Check ct->msgid alone under lock. The msgid is set to MSG_*_RUNNING
> > only at async check start and atomically updated with state when
> > complete:
> >
> > if (ct->msgid != MSG_TUR_RUNNING) {
> > tur_status = ct->state;
> > c->msgid = ct->msgid;
> > ct->thread = 0;
> > }
>
> You probably realize that with this change, you basically imply that
> the test for ct->running is superfluous. If this were true, we could
> actually just skip it and rely solely on ct->msgid. What you're
> basically saying is that ct->running is meaningless. But it isn't. It
> means that the thread is "dead", which it isn't as long as it holds
> the mutex.
>
> I'll wait for Ben's opinion, but I am inclined to nack this, sorry.
>
> Martin
* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
2026-02-25 20:41 ` Benjamin Marzinski
@ 2026-02-25 22:00 ` Martin Wilck
2026-02-26 18:48 ` Benjamin Marzinski
0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2026-02-25 22:00 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Brian Bunker, dm-devel
On Wed, 2026-02-25 at 15:41 -0500, Benjamin Marzinski wrote:
>
> I pretty much agree with Martin here. I do think this is likely safe,
> but I don't think this is really a race. We don't return the status
> of
> the thread until the thread completes, and the thread signals its
> completion by clearing ct->running. That's the same definition of
> successful thread completion that we use for deciding to kill the
> thread, and I don't see why it's not a valid one.
>
> Yes we can get the thread result slightly before the thread has
> completed. But by trusting that the thread will complete once
> pp->msgid has changed, we are giving up the ability to kill it if it
> does hang later, for instance in that condlog() call. On systemd
> systems, this is just some stdio print functions, but on non-systemd
> systems, this uses its own pthread locking, and could legitimately
> hang (although the argument certainly can be made that if this is
> hanging, multipathd has bigger problems to worry about than an extra
> tur checker thread lying around).
After some further pondering, I wonder if we could do this:
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -267,6 +267,7 @@ void *libcheck_thread(struct checker_context *ctx)
 	struct tur_checker_context *ct =
 		container_of(ctx, struct tur_checker_context, ctx);
 	int state, running;
+	dev_t devt;
 	short msgid;

 	/* This thread can be canceled, so setup clean up */
@@ -284,17 +285,17 @@ void *libcheck_thread(struct checker_context *ctx)
 	ct->state = state;
 	ct->msgid = msgid;
 	pthread_cond_signal(&ct->active);
+	running = uatomic_xchg(&ct->running, 0);
 	pthread_mutex_unlock(&ct->lock);

-	condlog(4, "%d:%d : tur checker finished, state %s", major(ct->devt),
-		minor(ct->devt), checker_state_name(state));
-
-	running = uatomic_xchg(&ct->running, 0);
+	devt = ct->devt;
 	if (!running)
 		pause();

 	tur_thread_cleanup_pop(ct);

+	condlog(4, "%d:%d : tur checker finished, state %s", major(devt),
+		minor(devt), checker_state_name(state));
 	return ((void *)0);
 }

@@ -356,7 +357,7 @@ int libcheck_check(struct checker * c)
 			pthread_mutex_unlock(&ct->lock);
 		}
 		ct->thread = 0;
-	} else if (uatomic_read(&ct->running) != 0) {
+	} else if (uatomic_add_return(&ct->running, 0) != 0) {
 		condlog(3, "%d:%d : tur checker not finished",
 			major(ct->devt), minor(ct->devt));
 		tur_status = PATH_PENDING;
This way the thread would "lie" about itself being not running any more
by setting ct->running to 0 before releasing the lock. AFAICS, in the
worst case, this can cause the main thread not to cancel it even though
it has timed out. But at this point it has already passed the point at
which it's most likely to hang. The only point at which it could still
block is the condlog(). By releasing ct before calling condlog() we can
make sure that even if the thread now blocks, it can't prevent freeing
of ct any more. We can basically forget about it at this point.
Replacing the uatomic_read() by a uatomic_add_return(..., 0) is a hack
that adds a memory barrier, which, together with the mutex in
libcheck_thread(), should make sure that the values of ct->running and
ct->msgid are consistent at this point in libcheck_check().
Together, these changes should fix Brian's issue.
The added memory barrier might make libcheck_check() a bit slower. I'm
not sure whether the difference would be noticeable. I'm not sure why
we didn't use a barrier there to begin with; perhaps to keep this
check as light-weight as possible. The uatomic_read() was added in
commit c3a839c ("libmultipath: cleanup tur locking").
I'm still undecided if it's worth changing this code in order to
reinstate some paths 1s earlier in certain situations.
Thoughts?
Martin
* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
2026-02-25 22:00 ` Martin Wilck
@ 2026-02-26 18:48 ` Benjamin Marzinski
2026-02-27 8:36 ` Martin Wilck
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Marzinski @ 2026-02-26 18:48 UTC (permalink / raw)
To: Martin Wilck; +Cc: Brian Bunker, dm-devel
On Wed, Feb 25, 2026 at 11:00:44PM +0100, Martin Wilck wrote:
> On Wed, 2026-02-25 at 15:41 -0500, Benjamin Marzinski wrote:
> >
> > I pretty much agree with Martin here. I do think this is likely safe,
> > but I don't think this is really a race. We don't return the status
> > of
> > the thread until the thread completes, and the thread signals its
> > completion by clearing ct->running. That's the same definition of
> > successful thread completion that we use for deciding to kill the
> > thread, and I don't see why it's not a valid one.
> >
> > Yes we can get the thread result slightly before the thread has
> > completed. But by trusting that the thread will complete once
> > pp->msgid has changed, we are giving up the ability to kill it if
> > it does hang later, for instance in that condlog() call. On systemd
> > systems, this is just some stdio print functions, but on
> > non-systemd systems, this uses its own pthread locking, and could
> > legitimately hang (although the argument certainly can be made that
> > if this is hanging, multipathd has bigger problems to worry about
> > than an extra tur checker thread lying around).
>
> After some further pondering, I wonder if we could do this:
>
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -267,6 +267,7 @@ void *libcheck_thread(struct checker_context *ctx)
>  	struct tur_checker_context *ct =
>  		container_of(ctx, struct tur_checker_context, ctx);
>  	int state, running;
> +	dev_t devt;
>  	short msgid;
>
>  	/* This thread can be canceled, so setup clean up */
> @@ -284,17 +285,17 @@ void *libcheck_thread(struct checker_context *ctx)
>  	ct->state = state;
>  	ct->msgid = msgid;
>  	pthread_cond_signal(&ct->active);
> +	running = uatomic_xchg(&ct->running, 0);
>  	pthread_mutex_unlock(&ct->lock);
>
> -	condlog(4, "%d:%d : tur checker finished, state %s", major(ct->devt),
> -		minor(ct->devt), checker_state_name(state));
> -
> -	running = uatomic_xchg(&ct->running, 0);
> +	devt = ct->devt;
>  	if (!running)
>  		pause();
>
>  	tur_thread_cleanup_pop(ct);
>
> +	condlog(4, "%d:%d : tur checker finished, state %s", major(devt),
> +		minor(devt), checker_state_name(state));
>  	return ((void *)0);
>  }
>
> @@ -356,7 +357,7 @@ int libcheck_check(struct checker * c)
>  			pthread_mutex_unlock(&ct->lock);
>  		}
>  		ct->thread = 0;
> -	} else if (uatomic_read(&ct->running) != 0) {
> +	} else if (uatomic_add_return(&ct->running, 0) != 0) {
>  		condlog(3, "%d:%d : tur checker not finished",
>  			major(ct->devt), minor(ct->devt));
>  		tur_status = PATH_PENDING;
>
>
> This way the thread would "lie" about itself being not running any more
> by setting ct->running to 0 before releasing the lock. AFAICS, in the
> worst case, this can cause the main thread not to cancel it even though
> it has timed out. But at this point it has already passed the point at
> which it's most likely to hang. The only point at which it could still
> block is the condlog(). By releasing ct before calling condlog() we can
> make sure that even if the thread now blocks, it can't prevent freeing
> of ct any more. We can basically forget about it at this point.
>
> Replacing the uatomic_read() by a uatomic_add_return(..., 0) is a hack
> that adds a memory barrier, which, together with the mutex in
> libcheck_thread(), should make sure that the values of ct->running and
> ct->msgid are consistent at this point in libcheck_check().
>
> Together, these changes should fix Brian's issue.
>
> The added memory barrier might make libcheck_check() a bit slower. I'm
> not sure whether the difference would be noticeable. I'm not sure why
> we didn't use a barrier there to begin with, perhaps to make this check
> as light-weight as possible. The uatomic_read has been added in c3a839c
> ("libmultipath: cleanup tur locking").
>
> I'm still undecided if it's worth changing this code in order to
> reinstate some paths 1s earlier in certain situations.
>
> Thoughts?
I don't see any problems with this. But I also didn't see any real
problems with Brian's version (although this version does remove most
of my theoretical issues; like I said before, if we hang in the
condlog, we've got bigger problems than a stray checker thread). If we
do this, we should probably document the uatomic_add_return() call, or
just use cmm_smp_mb() directly. But I'm still not sure what we get out
of this, other than a very slight chance to finish the checker a
second sooner. I don't think that really makes a difference, but I
don't really see much risk in the patch either, so I guess I'm fine
either way.
-Ben
>
> Martin
* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
2026-02-26 18:48 ` Benjamin Marzinski
@ 2026-02-27 8:36 ` Martin Wilck
2026-03-04 17:38 ` Brian Bunker
0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2026-02-27 8:36 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: Brian Bunker, dm-devel
On Thu, 2026-02-26 at 13:48 -0500, Benjamin Marzinski wrote:
> On Wed, Feb 25, 2026 at 11:00:44PM +0100, Martin Wilck wrote:
> >
> > I'm still undecided if it's worth changing this code in order to
> > reinstate some paths 1s earlier in certain situations.
> >
> > Thoughts?
>
> I don't see any problems with this. But I also didn't see any real
> problems with Brian's version (although this version does remove most
> of my theoretical issues; like I said before, if we hang in the
> condlog, we've got bigger problems than a stray checker thread). If
> we do this, we should probably document the uatomic_add_return()
> call,
Of course, this was not an official patch submission, just an idea how
we might improve this code. It's not exactly the same thing; Brian's
patch meant simply ignoring ct->running, while mine would make the
ct->running check reliable, albeit with changed semantics.
> or just use
> cmm_smp_mb() directly. But I'm still not sure what we get out of this,
> other than a very slight chance to finish the checker a second
> sooner.
Yes. But it's also a fact that the simple atomic_read() test in the
current code isn't reliable. It can well return true when ct->running
has already been set to false. I guess the implementation was kept as
light-weight as possible, avoiding a memory barrier (or taking a lock
with the side effect of a memory barrier) in the "fast path". That
comes at the price that we may miss an already finished checker
thread.
> I don't think that really makes a difference, but I don't really see
> much risk in the patch either, so I guess I'm fine either way.
If the risk was truly zero, I would agree. But experience tells that
it's very easy to get these things wrong for corner cases, and I tend
to want to avoid even a minimal risk just for "very slight chance to
finish the checker a second sooner", as you've expressed it very
nicely.
@Brian, feel free to try to convince us otherwise. It might be helpful
if we understood your use case better.
Martin
* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
2026-02-27 8:36 ` Martin Wilck
@ 2026-03-04 17:38 ` Brian Bunker
2026-03-04 21:15 ` Martin Wilck
2026-03-06 23:30 ` Benjamin Marzinski
0 siblings, 2 replies; 8+ messages in thread
From: Brian Bunker @ 2026-03-04 17:38 UTC (permalink / raw)
To: Martin Wilck; +Cc: Benjamin Marzinski, dm-devel
Ben and Martin,
Sorry for the delayed response. I sent this to you both initially instead of
to the list because I thought you might have some insight into the issue.
I have an 'alua' checker, which I will post, where I found this issue.
I didn't actually find it in the 'tur' checker, but I could see it is
prone to the same issue. I have a test I created to show the issue,
called test_tur.c. I will send this too if it helps.
The problem when I didn't use this patch was that the entire checker
thread for my checker would stall. It didn't just correct itself.
Until other paths failed, the checker thread was deadlocked somehow.
In the 'tur' checker without this change, the 'running != 0' check
will lead to PATH_PENDING and MSG_TUR_RUNNING without looking at what
the actual values are. It might actually be finished, but running
hasn't changed to 0 yet, and ct->thread is not 0 on returning. I am
not sure what extra conditions were true when my checker thread
deadlocked, but with this change the deadlock never happens.
Thanks,
Brian
On Fri, Feb 27, 2026 at 12:36 AM Martin Wilck <mwilck@suse.com> wrote:
>
> On Thu, 2026-02-26 at 13:48 -0500, Benjamin Marzinski wrote:
> > On Wed, Feb 25, 2026 at 11:00:44PM +0100, Martin Wilck wrote:
> > >
> > > I'm still undecided if it's worth changing this code in order to
> > > reinstate some paths 1s earlier in certain situations.
> > >
> > > Thoughts?
> >
> > I don't see any problems with this. But I also didn't see any real
> > problems with Brian's version (although this version does remove most
> > of
> > my theoretic issues. Like I said before. If we hang in the condlog,
> > we've got bigger problems than a stray checker thread). If we do
> > this,
> > we should probably document the uatomic_add_return() call,
>
> Of course, this was not an official patch submission, just an idea how
> we might improve this code. It's not exactly the same thing; Brian's
> patch meant simply ignoring ct->running, while mine would make the
> ct->running check reliable, albeit with changed semantics.
>
> > or just use
> > cmm_smp_mb() directly. But I'm still not sure what we get out of this,
> > other than a very slight chance to finish the checker a second
> > sooner.
>
> Yes. But it's also a fact that the simple atomic_read() test in the
> current code isn't reliable. It can well return true when ct->running
> has already been set to false. I guess the implementation tends to be
> as light-weight as possible, avoiding a memory barrier (or taking a
> lock with the side effect of a memory barrier) in the "fast path". That
> comes with the price that we may miss an already finished checker
> thread.
>
> > I don't think that really makes a difference, but I don't really
> > see much risk in the patch either, so I guess I'm fine either way.
>
> If the risk was truly zero, I would agree. But experience tells that
> it's very easy to get these things wrong for corner cases, and I tend
> to want to avoid even a minimal risk just for "very slight chance to
> finish the checker a second sooner", as you've expressed it very
> nicely.
>
> @Brian, feel free to try to convince us otherwise. It might be helpful
> if we understood your use case better.
>
> Martin
--
Brian Bunker
PURE Storage, Inc.
brian@purestorage.com
* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
2026-03-04 17:38 ` Brian Bunker
@ 2026-03-04 21:15 ` Martin Wilck
2026-03-06 23:30 ` Benjamin Marzinski
1 sibling, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2026-03-04 21:15 UTC (permalink / raw)
To: Brian Bunker; +Cc: Benjamin Marzinski, dm-devel
On Wed, 2026-03-04 at 09:38 -0800, Brian Bunker wrote:
> Ben and Martin,
>
> Sorry for the delayed response. I sent this to you both initially
> instead of to the list because I thought you might have some insight
> into the issue.
> I have an 'alua' checker, which I will post, where I found this
> issue. I didn't actually find it in the 'tur' checker, but I could
> see it is prone to the same issue. I have a test I created to show
> the issue, called test_tur.c. I will send this too if it helps.
> The problem when I didn't use this patch was that the entire checker
> thread for my checker would stall. It didn't just correct itself.
> Until other paths failed, the checker thread was deadlocked somehow.
>
> In the 'tur' checker without this change, the 'running != 0' check
> will lead to PATH_PENDING and MSG_TUR_RUNNING without looking at what
> the actual values are. It might actually be finished, but running
> hasn't changed to 0 yet, and ct->thread is not 0 on returning. I am
> not sure what extra conditions were true when my checker thread
> deadlocked, but with this change the deadlock never happens.
>
This is a stronger reason to make a change than 1 second gained in
multipathd for a path reinstatement. IIRC you made an analogous change
for your ALUA checker to the one you sent us for the TUR checker, and
that solved the deadlock issue for you? It's hard for me to understand
with the TUR checker code in my mind. My guess is that your checker
does not work exactly like the TUR checker, and that you have some
other, more general issue. But again, without examining the new
checker very carefully and possibly testing, it's impossible to tell.
Could you try the approach I sent in my other email instead of yours,
and see if it makes a difference?
In general, I'm not surprised that you saw deadlocks when trying to
write a new checker. As I said previously, we've had our share of them
in the past with the TUR checker. This isn't easy to get right.
I suggest that you leave the TUR patch aside for now, make your checker
work, and send patches for review. Maybe when we see the code we will
be able to understand better what's going on.
Regards
Martin
* Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
2026-03-04 17:38 ` Brian Bunker
2026-03-04 21:15 ` Martin Wilck
@ 2026-03-06 23:30 ` Benjamin Marzinski
1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2026-03-06 23:30 UTC (permalink / raw)
To: Brian Bunker; +Cc: Martin Wilck, dm-devel
On Wed, Mar 04, 2026 at 09:38:50AM -0800, Brian Bunker wrote:
> Ben and Martin,
>
> Sorry for the delayed response. I sent this to you both initially instead of
> to the list because I thought you might have some insight into the issue.
> I have an 'alua' checker, which I will post, where I found this
> issue. I didn't actually find it in the 'tur' checker, but I could
> see it is prone to the same issue. I have a test I created to show
> the issue, called test_tur.c. I will send this too if it helps.
If you have a test that can trigger the issue you are seeing, that would
be really helpful in seeing what's going on here.
-Ben
> The problem when I didn't use this patch was that the entire checker
> thread for my checker would stall. It didn't just correct itself.
> Until other paths failed, the checker thread was deadlocked somehow.
>
> In the 'tur' checker without this change, the 'running != 0' check
> will lead to PATH_PENDING and MSG_TUR_RUNNING without looking at what
> the actual values are. It might actually be finished, but running
> hasn't changed to 0 yet, and ct->thread is not 0 on returning. I am
> not sure what extra conditions were true when my checker thread
> deadlocked, but with this change the deadlock never happens.
>
> Thanks,
> Brian
>
> On Fri, Feb 27, 2026 at 12:36 AM Martin Wilck <mwilck@suse.com> wrote:
> >
> > On Thu, 2026-02-26 at 13:48 -0500, Benjamin Marzinski wrote:
> > > On Wed, Feb 25, 2026 at 11:00:44PM +0100, Martin Wilck wrote:
> > > >
> > > > I'm still undecided if it's worth changing this code in order to
> > > > reinstate some paths 1s earlier in certain situations.
> > > >
> > > > Thoughts?
> > >
> > > I don't see any problems with this. But I also didn't see any real
> > > problems with Brian's version (although this version does remove
> > > most of my theoretical issues; like I said before, if we hang in
> > > the condlog, we've got bigger problems than a stray checker
> > > thread). If we do this, we should probably document the
> > > uatomic_add_return() call,
> >
> > Of course, this was not an official patch submission, just an idea how
> > we might improve this code. It's not exactly the same thing; Brian's
> > patch meant simply ignoring ct->running, while mine would make the
> > ct->running check reliable, albeit with changed semantics.
> >
> > > or just use
> > > cmm_smp_mb() directly. But I'm still not sure what we get out of this,
> > > other than a very slight chance to finish the checker a second
> > > sooner.
> >
> > Yes. But it's also a fact that the simple atomic_read() test in the
> > current code isn't reliable. It can well return true when ct->running
> > has already been set to false. I guess the implementation was kept as
> > light-weight as possible, avoiding a memory barrier (or taking a lock
> > with the side effect of a memory barrier) in the "fast path". That
> > comes at the price that we may miss an already finished checker
> > thread.
> >
> > > I don't think that really makes a difference, but I don't really
> > > see much risk in the patch either, so I guess I'm fine either way.
> >
> > If the risk was truly zero, I would agree. But experience tells that
> > it's very easy to get these things wrong for corner cases, and I tend
> > to want to avoid even a minimal risk just for "very slight chance to
> > finish the checker a second sooner", as you've expressed it very
> > nicely.
> >
> > @Brian, feel free to try to convince us otherwise. It might be helpful
> > if we understood your use case better.
> >
> > Martin
>
>
>
> --
> Brian Bunker
> PURE Storage, Inc.
> brian@purestorage.com
end of thread, other threads:[~2026-03-06 23:30 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260224010058.35337-1-brian@purestorage.com>
2026-02-24 13:26 ` [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection Martin Wilck
2026-02-25 20:41 ` Benjamin Marzinski
2026-02-25 22:00 ` Martin Wilck
2026-02-26 18:48 ` Benjamin Marzinski
2026-02-27 8:36 ` Martin Wilck
2026-03-04 17:38 ` Brian Bunker
2026-03-04 21:15 ` Martin Wilck
2026-03-06 23:30 ` Benjamin Marzinski