From: Lan <transter@gmail.com>
To: device-mapper development <dm-devel@redhat.com>
Subject: Re: what is the current utility in testing active paths from multipathd?
Date: Wed, 27 Apr 2005 11:36:27 -0700
Message-ID: <ac71172a05042711367e80e21d@mail.gmail.com>
In-Reply-To: <20050427170710.GU4431@marowsky-bree.de>

On 4/27/05, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2005-04-27T12:27:32, "goggin, edward" <egoggin@emc.com> wrote:
>
> > Although I know it sounds a bit radical and counter intuitive,
> > but I'm not sure of the utility gained in the current multipathing
> > implementation by multipathd periodically testing paths which
> > are known to be in an active state in the multipath target driver.
> > Possibly someone can convince me otherwise.
>
> Because user-space doesn't know whether any IO has actually gone down a
> given path, and that would be the only time the kernel would detect the
> error.
>
> > If not, it may be possible to significantly reduce the cpu&io
> > resource utilization consumed by multipathd path testing on
> > enterprise scale configurations by only testing those paths
> > which the kernel thinks are in a failed state -- obviously a
> > much smaller set of paths.
>
> I could see not testing paths if we knew IO was hitting them; as an
> approximation, the active paths from the active PG might be omitted.
> However, the paths in the inactive PG all need to be tested, or else you
> are never going to find out that the paths have gone bad on you until
> you try to failover.
>
> The best way to minimize path (re-)testing needed is to figure in the
> hierarchy of components involved; as long as the FC switch is still bad,
> there's no point testing any target which we could reach through it,
> etc; testing whether the switch itself is healthy would round-robin
> through our various connections to the switch, to make sure we don't
> declare the switch down because we got hung up on one failed path.
>
> Another option would be to not mechanically test every N seconds, but to
> retest a failed path after 1s - 2s - 4s - ... 32s max as a cascading
> back-off, and maybe start at 2 - 64s for paths in inactive PGs.
>
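
To make the cascading back-off above concrete, here is a minimal sketch
in Python, for illustration only. The function name and its parameters
are assumptions of mine, not anything multipathd provides; the delays
are the ones suggested above (1s doubling to a 32s cap, and 2s doubling
to a 64s cap for paths in inactive PGs).

```python
from itertools import islice


def backoff_delays(start=1, cap=32):
    """Yield retest delays in seconds: start, 2*start, 4*start, ...

    Once the delay reaches `cap` it stays there. Both parameters are
    hypothetical knobs, not existing multipathd settings.
    """
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)


# First eight retest delays for a failed path in the active PG.
active_pg = list(islice(backoff_delays(1, 32), 8))

# First eight retest delays for a path in an inactive PG.
inactive_pg = list(islice(backoff_delays(2, 64), 8))
```

Once the cap is reached, the path is simply retested every `cap`
seconds until it recovers.
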
> Not testing paths however isn't a real option.
>
I think it's a good idea to distinguish between testing paths for
probing (i.e. checking that active paths have not gone dead) and testing
them for reclamation (i.e. checking whether failed paths have come
back). This might mean having two separate testing threads, so that
users could choose a policy for each type of testing independently.

Some users may not care much about probing. For example, users with
large configurations who are willing to trade immediate knowledge of
system degradation for saved cycles may decide to do no probing at all
and live with paths being failed only when I/O on them fails. Or they
could use a probing policy that consumes fewer resources, e.g. a lower
testing frequency for probing than for reclamation.

Reclamation is more crucial, I think, and would be of more concern to
users. Letting users determine the reclamation policy, e.g. the testing
frequency or whether to enable cascading back-off, would be good, since
that decision depends on what users know about their own configuration
and data load.
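
As a rough sketch of what separate probing and reclamation policies
could look like (Python, for illustration only; the Policy and Path
types, their fields, and the device names in the usage note are all
hypothetical and do not correspond to existing multipathd code or
configuration options):

```python
from dataclasses import dataclass


@dataclass
class Policy:
    enabled: bool           # False means this kind of testing is skipped
    interval: int           # base seconds between tests
    backoff: bool = False   # double the delay after each failed test
    max_interval: int = 32  # cap on the backed-off delay, in seconds


@dataclass
class Path:
    name: str
    failed: bool            # path state as the kernel sees it (assumed input)
    next_check: int = 0     # time of the next scheduled test, in seconds
    fail_count: int = 0     # consecutive failed tests


def due_paths(paths, now, probing, reclamation):
    """Return the names of the paths that should be tested at time `now`.

    Failed paths follow the reclamation policy, all others the probing
    policy, so either kind of testing can be tuned or disabled alone.
    """
    due = []
    for p in paths:
        policy = reclamation if p.failed else probing
        if policy.enabled and now >= p.next_check:
            due.append(p.name)
    return due


def reschedule(path, now, test_ok, probing, reclamation):
    """Record a test result and schedule the path's next test."""
    path.failed = not test_ok
    path.fail_count = 0 if test_ok else path.fail_count + 1
    policy = reclamation if path.failed else probing
    delay = policy.interval
    if path.failed and policy.backoff:
        # 1x, 2x, 4x, ... the base interval, capped at max_interval.
        delay = min(policy.interval << (path.fail_count - 1),
                    policy.max_interval)
    path.next_check = now + delay
```

For instance, a large configuration could set
probing = Policy(enabled=False, interval=60) and
reclamation = Policy(enabled=True, interval=1, backoff=True). Given
paths "sda" (active) and "sdb" (failed), due_paths() would then return
only "sdb", and each failed retest of "sdb" would push its next check
out by 1s, 2s, 4s and so on.
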
> > multipathd, this will no longer be true. This seems unlikely
> > apparently due to the difficulty in implementing consistently
> > accurate path testing in user space.
>
> Uh? How is path testing in user-space difficult?
>
> Sincerely,
> Lars Marowsky-Brée <lmb@suse.de>
>
> --
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business