Re: [PATCH 03/13] multipathd: allow map removal in do_sync_mpp()

From: Benjamin Marzinski <bmarzins@redhat.com>
To: Martin Wilck <mwilck@suse.com>
Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>,
	dm-devel@lists.linux.dev
Subject: Re: [PATCH 03/13] multipathd: allow map removal in do_sync_mpp()
Date: Wed, 11 Dec 2024 12:09:38 -0500
Message-ID: <Z1nHUk8o3_qqPwED@redhat.com>
In-Reply-To: <ee6fcbda31fd1f13969653782417fbed748f5bc7.camel@suse.com>

On Wed, Dec 11, 2024 at 01:06:46PM +0100, Martin Wilck wrote:
> On Tue, 2024-12-10 at 18:30 -0500, Benjamin Marzinski wrote:
> > On Sat, Dec 07, 2024 at 12:36:07AM +0100, Martin Wilck wrote:
> > > We previously didn't allow map removal inside the checker loop. But
> > > with the late updates to the checkerloop code, it should be safe to
> > > orphan paths and delete maps even in this situation. We remove such
> > > maps everywhere else in the code already, whenever
> > > refresh_multipath() or setup_multipath() is called.
> > 
> > Actually, thinking about this more, what do we get by proactively
> > deleting the multipath device if something goes wrong in the checker?
> > If we successfully reload a device, but can't sync it with the kernel,
> > that's one thing. But that was triggered by a change in the device,
> > and we know that when we reloaded the device, device-mapper was
> > working. I'm leery of possibly deleting the map because of a transient
> > device-mapper issue. I'm not sure if on a check that we do repeatedly,
> > we should delete the device on an error. We haven't in the past, and
> > as far as I know, it doesn't cause problems.
> 
> I don't disagree. But the same can be said for basically all call
> chains where setup_multipath() is called for an existing map. I was
> just following the pattern that we use e.g. in ev_add_path(), or in
> update_mpp_prio(). Why would we treat the checker and path addition
> differently in this respect?

I'm confused here. ev_add_path() doesn't remove the device if the reload
fails. If a reload fails, the table should stay the same. That's why I
said that in other cases where we delete the device, we know that when
we just reloaded the device, device-mapper was working. Looking at the
code, that isn't really true. After failed reloads, we still call
setup_multipath to update our state, and we will delete the device if
that fails.

> If we look at this pragmatically (assuming that multipathd gets the
> parameters right), the most probable reason for a map reload failure is
> failure to open a path device in bdev_open(), either because the device
> doesn't exist, or because it's busy or otherwise unavailable. If this
> happens in ev_add_path(), the likely reason is that the path just added
> was busy, and the smartest action upon such a failure would probably be
> to just undo that addition. We currently don't do that; we remove the
> entire map, which is questionable, as you state correctly.

This is why we call setup_multipath after failed reloads, to make sure
multipathd's view of the multipath device resyncs with the kernel's,
which hasn't changed from what it was before the reload failed.
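
To make the resync argument concrete, here is a toy sketch in C. It is
not multipathd code: struct toy_map and the toy_*() functions are made
up for illustration, and it assumes the daemon updates its own view
before asking the kernel to reload. A rejected reload then leaves the
kernel table as it was, and the resync step brings the daemon's view
back in line with it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_TABLE_LEN 64

/* Toy model (hypothetical names): one string for the table the kernel
 * holds, one for the daemon's idea of that table. */
struct toy_map {
	char kernel_table[TOY_TABLE_LEN];
	char daemon_view[TOY_TABLE_LEN];
};

/* Attempt a reload. Assumption: the daemon records the new table in its
 * own view first, so a rejected reload leaves the two out of step. */
static bool toy_reload(struct toy_map *m, const char *new_table,
		       bool kernel_accepts)
{
	snprintf(m->daemon_view, sizeof(m->daemon_view), "%s", new_table);
	if (!kernel_accepts)
		return false;	/* kernel table is left untouched */
	snprintf(m->kernel_table, sizeof(m->kernel_table), "%s", new_table);
	return true;
}

/* Resync: overwrite the daemon's view with whatever the kernel holds. */
static void toy_resync(struct toy_map *m)
{
	snprintf(m->daemon_view, sizeof(m->daemon_view), "%s",
		 m->kernel_table);
}

/* True if the daemon's view matches the kernel table. */
static bool toy_in_sync(const struct toy_map *m)
{
	return strcmp(m->daemon_view, m->kernel_table) == 0;
}
```

In this model, calling toy_resync() after a failed toy_reload() is all
that is needed to recover; nothing about the failure requires removing
the map.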

> In the checker, this can't happen. Obviously, no other process can grab
> a path device while the device mapper is holding it, so -EBUSY won't
> occur if we reload an existing map. Even device deletion doesn't cause
> failure on reload. It is possible to delete a SCSI device while it's
> mapped, and execute a table reload / suspend / resume cycle on the map
> while referencing the deleted device. The kernel keeps holding the
> reference to the deleted device, and will simply mark it as
> failed. This holds also if the mapped paths are re-grouped or
> re-ordered in the table. Failure occurs only if we temporarily remove
> the device from the map and re-add it, because as soon as the device
> is removed from the map's dm table, its refcount drops to zero, and
> it's gone for good.
> 
> IOW, reloading a map with a table containing only already-mapped
> devices will never fail, except in extreme situations like kernel OOM.

Maybe I should clarify my position a bit. I am fine with reloading the
device in the checkerloop if something has changed. This obviously
does run a very small risk of something going wrong and a device getting
removed unnecessarily, but we know that we need to reload the device, so
we should.

What I would rather avoid is reloading the device because we failed to
get its state in do_sync_mpp(). We don't do this because we know that
something has changed. We do this as a safety measure to deal with
corner cases where our state doesn't match the kernel's and we didn't
get an event. Double checking this each time we check a path in a
device saves having to catch all these corner cases elsewhere. But it's
almost always completely unnecessary, and we're doing it on every
multipath device every couple of seconds, unlike reloading a device,
which is rare.

> Thus, AFAICS, the only relevant scenario where a reload would fail
> is trying to add a path device that was not previously mapped, and
> that's either busy (perhaps in another map) or has been deleted, IOW
> only when we reload after calling adopt_paths(). This is where we could
> improve. If we fail to reload after adopting new paths, we could fall
> back to the existing table, and perhaps try to add paths one by one.
> Again, this is post-0.11 material.
> 
> OTOH, practically impossible is not totally impossible, so we need to be
> prepared for map reload failure either way. IMO the best thing we can do
> in this case is to keep using the kernel's map, and retry reloading
> later.

I'm not actually worried about the kernel so much as libdevmapper. It is
not designed for multi-threaded processes, and that has bitten us in the
past. For instance, it's why we don't delete devices in dmevent_loop() on
libdevmapper errors. dm_get_events() just waits and retries if getting
the device list fails, and for each device, it calls dm_is_mpath() and
will only delete a device on DM_IS_MPATH_NO, which is what I suggested
for the cleanup function.
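
The policy described above can be sketched roughly as follows. This is
an illustration only, not the actual multipathd code: the enum values
and decide_map_action() are hypothetical names, and only the
"definitely not multipath" case corresponds to the DM_IS_MPATH_NO result
mentioned above. The point is that a failed query is treated as "don't
know" and never as a reason to delete.

```c
/* Hypothetical stand-ins for the possible answers of a
 * dm_is_mpath()-style query. */
enum mpath_query {
	QUERY_IS_MPATH_NO,	/* definite answer: not a multipath map */
	QUERY_IS_MPATH_YES,	/* definite answer: still a multipath map */
	QUERY_IS_MPATH_ERR,	/* the query itself failed, maybe transiently */
};

/* What the caller should do with the map (hypothetical names). */
enum map_action {
	MAP_KEEP,
	MAP_RETRY_LATER,
	MAP_REMOVE,
};

/* Remove the map only on a definite "not multipath" answer. An error
 * from the query leaves the map alone so a later pass can retry. */
static enum map_action decide_map_action(enum mpath_query res)
{
	switch (res) {
	case QUERY_IS_MPATH_NO:
		return MAP_REMOVE;
	case QUERY_IS_MPATH_ERR:
		return MAP_RETRY_LATER;
	case QUERY_IS_MPATH_YES:
	default:
		return MAP_KEEP;
	}
}
```

Since the check runs every couple of seconds anyway, deferring on error
costs little: the next pass simply asks again.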

I'm pretty sure we've handled all of the known issues here, with fixes
like:
02d4bf07 ("libmultipath: protect racy libdevmapper calls with a mutex")
34e01d2f ("multipath-tools: don't call dm_lib_release() any more")

I'd rather not risk having missed some issue that could cause a
temporary error in a function that we call every couple of seconds
(almost always unnecessarily).

-Ben

> The only critical situation is WWID change of path devices. We must try
> to fix this situation ASAP when we detect it. I'm unsure what the best
> action is if a reload fails in that situation, though (other than
> failing the path, as we already do).
> 
> Martin
