public inbox for dm-devel@redhat.com
 help / color / mirror / Atom feed
From: Benjamin Marzinski <bmarzins@redhat.com>
To: Brian Bunker <brian@purestorage.com>
Cc: Martin Wilck <mwilck@suse.com>, dm-devel@lists.linux.dev
Subject: Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection
Date: Fri, 6 Mar 2026 18:30:41 -0500	[thread overview]
Message-ID: <aatjoS5MGnXDqcfL@redhat.com> (raw)
In-Reply-To: <CAHZQxyLzqG-N5jOOrODL-O+RRLmwHiNO1wSUGC=gvYpVkQTbnA@mail.gmail.com>

On Wed, Mar 04, 2026 at 09:38:50AM -0800, Brian Bunker wrote:
> Ben and Martin,
> 
> Sorry for the delayed response. I sent this to you both initially instead of
> to the list because I thought you might have some insight into the issue.
> I have an 'alua' checker that I will post where I found this issue. I
> didn't actually
> find it in the 'tur' checker but I could see it is prone to the same
> issue. I have a
> test I created to show the issue called test_tur.c. I will send this
> too if it helps.

If you have a test that can trigger the issue you are seeing, that would
be really helpful it seeing what's going on here.

-Ben

> The problem when I didn't use this patch was that my checker the entire
> checker thread would stall. It didn't just correct itself. Until other
> paths failed
> the checker thread was deadlocked somehow.
> 
> In the 'tur' checker without this change, the 'running != 0' will lead
> to PATH_PENDING
> and MSG_TUR_RUNNING without looking at what those actual values are.
> It might actually
> be finished, but running hasn't changed to 0 yet, and ct->thread is not
> 0 on returning. I am not sure what extra conditions were true when my checker
> thread deadlocks, but with this change the deadlock never happens.
> 
> Thanks,
> Brian
> 
> On Fri, Feb 27, 2026 at 12:36 AM Martin Wilck <mwilck@suse.com> wrote:
> >
> > On Thu, 2026-02-26 at 13:48 -0500, Benjamin Marzinski wrote:
> > > On Wed, Feb 25, 2026 at 11:00:44PM +0100, Martin Wilck wrote:
> > > >
> > > > I'm still undecided if it's worth changing this code in order to
> > > > reinstate some paths 1s earlier in certain sitautions.
> > > >
> > > > Thoughts?
> > >
> > > I don't see any problems with this. But I also didn't see any real
> > > problems with Brian's version (although this version does remove most
> > > of
> > > my theoretic issues. Like I said before. If we hang in the condlog,
> > > we've got bigger problems than a stray checker thread). If we do
> > > this,
> > > we should probably document the uatomic_add_return() call,
> >
> > Of course, this was not an official patch submission, just an idea how
> > we might improve this code. It's not exactly the same thing; Brian's
> > patch meant simply ignoring ct->running, while mine would make the ct-
> > >running check reliable, albeit with changed semantics.
> >
> > >  or just use
> > > cmm_smp_mb() directly. But I still not sure what we get out of this,
> > > other than a very slight chance to finish the checker a second
> > > sooner.
> >
> > Yes. But it's also a fact that the simple atomic_read() test in the
> > current code isn't reliable. It can well return true when ct->running
> > has already been set to false. I guess the implementation tends to be
> > as light-weight as possible, avoiding a memory barrier (or taking a
> > lock with the side effect of a memory barrier) in the "fast path". That
> > comes with the price that we may miss an already finished checker
> > thread.
> >
> > > don't think that really makes a difference, but I don't really see
> > > much
> > > risk in the patch either, so I guess I'm fine either way.
> >
> > If the risk was truly zero, I would agree. But experience tells that
> > it's very easy to get these things wrong for corner cases, and I tend
> > to want to avoid even a minimal risk just for "very slight chance to
> > finish the checker a second sooner", as you've expressed it very
> > nicely.
> >
> > @Brian, feel free to try to convince us otherwise. It might be helpful
> > if we understood your use case better.
> >
> > Martin
> 
> 
> 
> -- 
> Brian Bunker
> PURE Storage, Inc.
> brian@purestorage.com


      parent reply	other threads:[~2026-03-06 23:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260224010058.35337-1-brian@purestorage.com>
2026-02-24 13:26 ` [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection Martin Wilck
2026-02-25 20:41   ` Benjamin Marzinski
2026-02-25 22:00     ` Martin Wilck
2026-02-26 18:48       ` Benjamin Marzinski
2026-02-27  8:36         ` Martin Wilck
2026-03-04 17:38           ` Brian Bunker
2026-03-04 21:15             ` Martin Wilck
2026-03-06 23:30             ` Benjamin Marzinski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aatjoS5MGnXDqcfL@redhat.com \
    --to=bmarzins@redhat.com \
    --cc=brian@purestorage.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=mwilck@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox