All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Marzinski <bmarzins@redhat.com>
To: Martin Wilck <martin.wilck@suse.com>
Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>,
	device-mapper development <dm-devel@lists.linux.dev>
Subject: Re: [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry()
Date: Tue, 2 Jul 2024 14:10:53 -0400	[thread overview]
Message-ID: <ZoRCrVoNbIntNH__@redhat.com> (raw)
In-Reply-To: <d9c5652f9f0a17db466354d4b5d10ca04a959215.camel@suse.com>

On Tue, Jul 02, 2024 at 09:34:17AM +0200, Martin Wilck wrote:
> On Mon, 2024-07-01 at 14:41 -0400, Benjamin Marzinski wrote:
> > 
> > 
> > 
> > My purpose in writing 0f850db wasn't to cut down on ioctls. It was to
> > keep set_no_path_retry() from incorrectly handling queueing in cases
> > where no_path_retry was set to a number, and the device had no usable
> > paths.  My purpose in this patch is also to make sure
> > set_no_path_retry() does the right thing. Obviously, this one is more
> > of
> > a corner case, and if instead we were occasionally resyncing with the
> > kernel even when a multipath device had no paths, that would solve
> > this
> > issue, and I'd be fine with dropping this patch.
> 
> That sounds good.
> 
> > So unless it conflicts with work you're already doing, I'll send a
> > new
> > patchset that changes checkerloop() to pull the multipath device
> > updating work out of check_path(). I'm not sure that it makes a lot
> > of
> > sense to change how we loop through the paths, since check_path()
> > does
> > work on some paths that aren't assigned to a multipath device. 
> > Plus
> > different paths in a multipath device will need checking at different
> > times depending on their states anyway, so you can't just go through
> > all
> > the paths at once.
> 
> While I agree that syncing the map state just once per checkerloop and
> map is the right thing, I believe we should do it as soon as we have
> checked all paths that need to be checked for the given map. And that's
> most easily done by looping over the mpath vector, like this:
> 
> for each mpp
>     for each path in mpp
>         if path needs to be checked:
>             check path
>     sync mpp

Huh? Are you suggesting syncing the mpp state every second? That would
be an increase in the number of ioctls we're doing in many cases. For
instance, a 4 path device with all paths working and a default
configuration will sync the map 4 times for every 20 seconds.
Unfortunately, it's possible that all these syncs would happend in a
clump and then we'd wait 20 seconds to rapidly sync 4 times again.  An 8
path device with all paths down would sync 8 times every 5 seconds. In
this case syncing once a second would be less work, but in general this
looks like multipathd doing more ioctls. 

The primary purpose of these periodic syncs is to protect us from
external changes that don't trigger a dm event. This is a rare enough
thing that syncing every max_checkint seems often enough. So is "sync
mpp" just shorthand for "sync mpp if it needs to be synced", or is there
some benefit that I'm overlooking to sync every second?

> 
> for each path
>     if (!pp->mpp and path needs to be checked)
>           check path
> 
> Of course we could do this in multiple steps and move the mpp sync out
> of the path loop first, like you suggested. But if we do this, there
> may be a considerable time delay between checking the paths of a given
> map and syncing the related mpp state, especially on systems with a
> large number of devices, and that might be problematic.
>
> Path status changes are asynchronous by nature, and thus whatever we do
> will not be perfect. IMO it makes sense to do all checks related to a
> given map in as short a time interval as possible. This way we are most
> likely to get a consistent picture of the current state of the map in
> question in the kernel.

I'm not against your idea, but the way to code is now, even doing your
mpp loop doesn't really guarantee that. Just because all the paths get
checked every 20 seconds, doesn't mean that they will be ready to be
checked on the same second. For paths that get added after the multipath
device is created, this is often not the case. And even if the paths
start in sync, there are multiple reasons why they can drop out of sync.

But it we wanted to, we probably could fix that and space the checkers
out better at the same time. I don't think it would be that hard to make
the paths change their next check tick by a second every so often so that
all the paths in the same state from a given multipath device run their
checkers at the same time. We could even make the different multipath
devices spread out when they are checked, so that they aren't trying to
all run their checkers at once.

I'll take a stab at this in my updated patchset.

-Ben

> In the long run, I think what we should do is use asynchronous checkers
> exclusively. And instead of waiting a millisecond for them to complete
> synchronously, we should trigger the path check on all paths (that need
> to be checked) in one go, then wait for all checkers to complete (this
> is the difficult part, I know), and then go over all the results, again
> without waiting for anything.
> 
> Another project I've been wanting to do for a long time :-/
> 
> Regards
> Martin
> 


  reply	other threads:[~2024-07-02 18:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05 23:22 [PATCH 0/7] multipath-tools: cli_add_map cleanup and misc fixes Benjamin Marzinski
2024-06-05 23:22 ` [PATCH 1/7] multipathd: fix flush check in flush_map() Benjamin Marzinski
2024-06-27  9:31   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 2/7] libmultipath: check for not PATH_UP in detect_alua Benjamin Marzinski
2024-06-27  9:31   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 3/7] multipathd: refresh multipath before calling set_no_path_retry() Benjamin Marzinski
2024-06-27  7:47   ` Martin Wilck
2024-06-27 18:32     ` Benjamin Marzinski
2024-06-27 18:39       ` Benjamin Marzinski
2024-06-27 21:31         ` Martin Wilck
2024-07-01 18:41           ` Benjamin Marzinski
2024-07-02  7:34             ` Martin Wilck
2024-07-02 18:10               ` Benjamin Marzinski [this message]
2024-07-04  7:19                 ` Martin Wilck
2024-06-05 23:22 ` [PATCH 4/7] libmultipath: add name and minor outputs for dm_map_present_by_uuid() Benjamin Marzinski
2024-06-27  9:27   ` Martin Wilck
2024-06-27 18:48     ` Benjamin Marzinski
2024-06-27 20:42       ` Martin Wilck
2024-06-05 23:22 ` [PATCH 5/7] multipath-tools: Makefile.inc: compile with -fexceptions Benjamin Marzinski
2024-06-27  9:32   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 6/7] multipathd: free alias if cli_add_map() is cancelled Benjamin Marzinski
2024-06-27  9:32   ` Martin Wilck
2024-06-05 23:22 ` [PATCH 7/7] multipathd: make cli_add_map() handle adding maps by WWID correctly Benjamin Marzinski
2024-06-27  9:33   ` Martin Wilck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZoRCrVoNbIntNH__@redhat.com \
    --to=bmarzins@redhat.com \
    --cc=christophe.varoqui@opensvc.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=martin.wilck@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.