All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Marzinski <bmarzins@redhat.com>
To: Brian Bunker <brian@purestorage.com>
Cc: Martin Wilck <martin.wilck@suse.com>,
	dm-devel@lists.linux.dev,
	Krishna Kant <krishna.kant@purestorage.com>
Subject: Re: [PATCH] Add purge capability to multipath-tools
Date: Thu, 4 Dec 2025 15:03:49 -0500	[thread overview]
Message-ID: <aTHpJVhhDKrJxhBU@redhat.com> (raw)
In-Reply-To: <CAHZQxyJ0Y7Tx4UC4je40s6dQKDE942qo-OwSpnFUeJcocozL-Q@mail.gmail.com>

On Thu, Dec 04, 2025 at 08:55:58AM -0800, Brian Bunker wrote:
> On Wed, Dec 3, 2025 at 11:37 AM Benjamin Marzinski <bmarzins@redhat.com> wrote:
> >
> > On Thu, Nov 27, 2025 at 12:31:12PM +0100, Martin Wilck wrote:
> > > On Mon, 2025-11-24 at 15:11 -0500, Benjamin Marzinski wrote:
> > > > On Fri, Nov 21, 2025 at 02:35:23PM -0800, Brian Bunker wrote:
> > > >
> > > > >
> > > > > I added the purge thread because I didn't want to starve the
> > > > > checker thread at a large disconnect volume scale. I noticed
> > > > > that the number of devices if I purged inline with the check that
> > > > > it didn't scale linearly after a point and seemed to be
> > > > > significantly
> > > > > starving the checker thread. Doing the purge in another thread
> > > > > seemed to relieve that pressure.
> > > >
> > > > That might be because the check thread has safeguards to try to avoid
> > > > starving the other threads. Since a lot of multipathd's work is gated
> > > > by
> > > > the vecs lock, there's only so much parallelism that can happen with
> > > > multiple threads. If deleting the devices is taking a long time, it
> > > > might
> > > > be better for this to get interrupted, so that other threads can run.
> > > >
> > > > Since purging the paths is fairly low priority, we could continue to
> > > > run
> > > > it in its own thread, but instead of running through all the devices
> > > > at
> > > > once, purgeloop could lock the vecs->lock, handle one device, and
> > > > then
> > > > unlock and loop. This means it would need to search through the list
> > > > from the start each time it regrabbed the lock, since the list could
> > > > have changed while it wasn't holding it. When purgeloop makes it all
> > > > the
> > > > way through the list without finding any paths that are due for a
> > > > delete
> > > > attempt, it can sleep or block waiting for more paths.
> > > >
> > > > This would mean that paths would need to remember if they had already
> > > > been handled in a cycle. That could be done by purgeloop keeping
> > > > track
> > > > of which cycle it was on, and the paths storing the cycle number
> > > > whenever they were checked.
> > >
> > > As I wrote in my other post, wish that this thread wouldn't hold the
> > > vecs lock at all. multipathd could simply use a pipe to write dev_t's
> > > of to-be-purged paths to the purge thread (or process :-) ).
> >
> > This seems like a good idea. Obviously, writing to sysfs while the vecs
> > lock is being held means that multipath will continue to have that scsi
> > device open, so there's no chance of the device name getting reused if
> > the device is removed from the system. But this other thread/process
> > could simply open the scsi device, double check that it's still
> > disconnected, write to sysfs, and then close the device. That would
> > offer the same guarantee, without the need to interact with anything
> > else.
> >
> > > Martin
> >
> Martin,

Actually, this wasn't Martin, it was me.

> If you look at the v3 implementation that I sent up yesterday, you will
> see we are no longer holding the vecs lock during the sysfs delete
> operation. We also made the delete operation non-blocking.
> 
> The two places in the purgeloop where we do hold the lock are for two
> very short operations:
> 
> 1. We scan the vector to find one path that needs purging. We update
> pp->purge_cycle and increment the retry counter pp->purge_retries++.
> Then we copy the necessary data for the purge into the purge_info
> data structure. Then we release the lock. There shouldn't be much
> contention here with these operations.
> 
> 2. We also re-acquire the lock to make sure the path still exists in
> the pathvec and clear the purge_path and reset purge_retries. This should
> also be very fast.
> 
> We can't take the approach to open the device since it is disconnected.
> I think it will just fail or hang. We have the udev reference to make sure
> the device object is alive and not freed underneath us.

That makes sense. Unfortunately, I checked, and holding a reference on
the udev device does not keep the scsi device from being deleted, or a
new device from being created with the same devnode. 

My comments on your v3 patch talk about using a dup() to copy the
existing pp->fd to deal with this, which should work, regardless of the
path state.

-Ben
 
> Thanks,
> Brian
> 
> -- 
> Brian Bunker
> PURE Storage, Inc.
> brian@purestorage.com


      reply	other threads:[~2025-12-04 20:03 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-13 18:43 [PATCH] Add purge capability to multipath-tools Brian Bunker
2025-11-21 19:07 ` Benjamin Marzinski
2025-11-21 22:35   ` Brian Bunker
2025-11-24 20:11     ` Benjamin Marzinski
2025-11-27 11:31       ` Martin Wilck
2025-12-03 19:37         ` Benjamin Marzinski
2025-12-04 16:55           ` Brian Bunker
2025-12-04 20:03             ` Benjamin Marzinski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aTHpJVhhDKrJxhBU@redhat.com \
    --to=bmarzins@redhat.com \
    --cc=brian@purestorage.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=krishna.kant@purestorage.com \
    --cc=martin.wilck@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.