* what is the current utility in testing active paths from multipathd?
@ 2005-04-27 16:27 goggin, edward
2005-04-27 17:02 ` Alasdair G Kergon
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: goggin, edward @ 2005-04-27 16:27 UTC (permalink / raw)
To: device-mapper development
Although I know it sounds a bit radical and counterintuitive,
I'm not sure of the utility gained in the current multipathing
implementation by multipathd periodically testing paths which
are known to be in an active state in the multipath target driver.
Possibly someone can convince me otherwise.
If not, it may be possible to significantly reduce the cpu&io
resource utilization consumed by multipathd path testing on
enterprise scale configurations by only testing those paths
which the kernel thinks are in a failed state -- obviously a
much smaller set of paths. Paths known to be in a failed
state in the multipath target driver must be tested since it is
currently the sole responsibility of multipathd-initiated
invocations of multipath to make these paths usable in the
kernel again by changing their state to active in the multipath
target driver.
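A minimal sketch of the proposed selection step, in Python for brevity rather than multipathd's C. The `Path` record and its `kernel_active` field are hypothetical stand-ins for the per-path state reported by the multipath target driver; `test_active=True` reproduces the current behaviour of testing everything.

```python
from dataclasses import dataclass


@dataclass
class Path:
    dev: str              # device name, e.g. "sda" (illustrative)
    kernel_active: bool   # hypothetical: path state per the multipath target driver


def paths_to_test(paths, test_active=False):
    """Return the paths to check in this pass.

    With test_active=False (the proposal), paths the kernel already
    considers active are skipped; only failed paths are tested, since
    only multipathd-initiated multipath runs can reinstate them.
    """
    return [p for p in paths if test_active or not p.kernel_active]
```

On a large configuration where most paths are healthy, the returned list shrinks to the handful of failed paths.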
The path testing is done in checkerloop in multipathd/main.c.
This function is really only interested in cases where the
multipathd view on a path's state has changed, that is, from
active to failed or failed to active. The other two cases are
uninteresting.
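The four cases can be made explicit with a small classifier. This is a hypothetical Python sketch; `State` and `classify` are illustrative names, not functions from multipathd/main.c.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    FAILED = "failed"


def classify(old, new):
    """Classify a change in multipathd's view of a path's state.

    Returns None for the two uninteresting cases (no change).
    """
    if old is new:
        return None            # active->active or failed->failed
    if new is State.ACTIVE:
        return "reinstate"     # failed->active: path must be reactivated in the kernel
    return "mark_failed"       # active->failed
```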
Furthermore, while the checkerloop function reacts immediately
to a multipathd state transition from failed to active, the code
appears to do little (other than updating the multipathd
path state to failed) when the multipathd path
state changes from active to failed.
Certainly, the risk is that the multipathd path state will not be
updated periodically to reflect path test failures on paths which
incur little to no io traffic. Paths that see any io after a path failure
will have the multipathd path state updated to reflect the
kernel's path state via mark_failed_path invoked from a device
mapper io event callback.
Yet, unlike the multipath configurator, the multipathd code
currently appears to have little utility for keeping its own path
state separate from the kernel's. This makes me believe that
there is little to no utility currently gained by having multipathd
test paths which the kernel thinks are active. Certainly, if the
multipathd/multipath code changes to update kernel path state
from active to failed as a result of failed path tests done by
multipathd, this will no longer be true. This seems unlikely
apparently due to the difficulty in implementing consistently
accurate path testing in user space.
* Re: what is the current utility in testing active paths from multipathd?
2005-04-27 16:27 what is the current utility in testing active paths from multipathd? goggin, edward
@ 2005-04-27 17:02 ` Alasdair G Kergon
2005-04-27 17:07 ` Lars Marowsky-Bree
2005-04-28 16:37 ` Lars Marowsky-Bree
2 siblings, 0 replies; 8+ messages in thread
From: Alasdair G Kergon @ 2005-04-27 17:02 UTC (permalink / raw)
To: device-mapper development
On receiving an event from the kernel, I think the daemon should:
- compare the current status with the previous status to see what changed;
- add newly-failed paths to its list of failed paths that need retesting;
- immediately test newly-failed paths and schedule further regular
  testing for them;
- if the number of active paths in the currently active PG is getting low,
  immediately retest other failed paths in that PG.
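The event-handling steps listed above can be sketched as one function. This is a minimal Python sketch under stated assumptions: status snapshots are dicts mapping a path name to `(pg, is_active)`, and `low_water` is a hypothetical threshold standing in for "getting low".

```python
def handle_event(prev, curr, failed, active_pg, low_water=1):
    """React to a kernel event, following the steps listed above.

    prev, curr: dict path -> (pg, is_active), previous and current status
    failed:     set of paths the daemon is retesting (updated in place)
    low_water:  hypothetical threshold for "active paths getting low"
    Returns the paths to test immediately.
    """
    # 1. Compare current with previous status to find newly-failed paths.
    newly_failed = [p for p, (pg, up) in curr.items()
                    if not up and prev.get(p, (pg, True))[1]]
    # 2. Remember them for regular retesting; forget paths that are back.
    failed.update(newly_failed)
    failed.difference_update(p for p, (_, up) in curr.items() if up)
    # 3. Test newly-failed paths right away.
    to_test = list(newly_failed)
    # 4. If the active PG is running short of paths, retest its other failed paths.
    active_in_pg = sum(1 for pg, up in curr.values() if pg == active_pg and up)
    if active_in_pg <= low_water:
        to_test += sorted(p for p in failed
                          if p in curr and curr[p][0] == active_pg
                          and p not in newly_failed)
    return to_test
```

The periodic wake-up would then walk the `failed` set on its own timer.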
Periodically the daemon should wake itself up to test the failed paths
it knows about.
It should also test active paths to preempt problems: if an
inactive path has failed, the admin needs to know the system is
running in a degraded state.
The frequency of the testing should be configurable in several
ways - it could depend on the size of the system, the
load, the time since each path was last tested, time since
last successful test of the path, type of hardware (what's the
performance impact of a test? e.g. trespass issues), structure containing
the path (more important to test failed paths in the current PG than other PGs;
if you're using round-robin and you know there's always I/O happening to the
device, it's more important to test active paths in other PGs) etc.
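One way to combine several of these factors is a per-path score, highest tested first. A minimal sketch; the function name and the weights are hypothetical and untuned, chosen only to encode the PG-related examples above.

```python
def probe_priority(failed, in_current_pg, io_flowing, secs_since_test):
    """Score a path for testing; higher means test sooner.

    Weights are illustrative only: failed paths in the current PG matter
    most, and with round-robin and steady I/O the active paths in the
    current PG are already exercised, so active paths in other PGs
    deserve probing instead.
    """
    score = secs_since_test
    if failed and in_current_pg:
        score *= 4
    elif not failed and not in_current_pg and io_flowing:
        score *= 2
    elif not failed and in_current_pg and io_flowing:
        score *= 0.5   # I/O is already testing this path
    return score
```

Size, load, and hardware cost could enter the same way, as further multipliers or as a global rate limit on how many top-scored paths get tested per wake-up.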
Alasdair
--
agk@redhat.com
* Re: what is the current utility in testing active paths from multipathd?
2005-04-27 16:27 what is the current utility in testing active paths from multipathd? goggin, edward
2005-04-27 17:02 ` Alasdair G Kergon
@ 2005-04-27 17:07 ` Lars Marowsky-Bree
2005-04-27 18:17 ` Mike Anderson
2005-04-27 18:36 ` Lan
2005-04-28 16:37 ` Lars Marowsky-Bree
2 siblings, 2 replies; 8+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-27 17:07 UTC (permalink / raw)
To: device-mapper development
On 2005-04-27T12:27:32, "goggin, edward" <egoggin@emc.com> wrote:
> Although I know it sounds a bit radical and counterintuitive,
> I'm not sure of the utility gained in the current multipathing
> implementation by multipathd periodically testing paths which
> are known to be in an active state in the multipath target driver.
> Possibly someone can convince me otherwise.
Because user-space doesn't know whether any IO has actually gone down a
given path, and that would be the only time the kernel would detect the
error.
> If not, it may be possible to significantly reduce the cpu&io
> resource utilization consumed by multipathd path testing on
> enterprise scale configurations by only testing those paths
> which the kernel thinks are in a failed state -- obviously a
> much smaller set of paths.
I could see not testing paths if we knew IO was hitting them; as an
approximation, the active paths from the active PG might be omitted.
However, the paths in the inactive PG all need to be tested, or else you
are never going to find out that the paths have gone bad on you until
you try to failover.
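The approximation can be written as a filter. A minimal sketch, assuming paths are given as hypothetical `(name, pg, is_active)` tuples; it assumes, as the approximation does, that active paths in the active PG see I/O.

```python
def paths_to_probe(paths, active_pg):
    """paths: iterable of (name, pg, is_active).

    Active paths in the active PG carry I/O, so the kernel will notice
    if they fail; every other path (failed ones, and all paths in
    inactive PGs) is tested.
    """
    return [name for name, pg, up in paths if not (up and pg == active_pg)]
```

Unlike testing only failed paths, this still probes healthy paths in inactive PGs, which is the failover concern.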
The best way to minimize path (re-)testing needed is to figure in the
hierarchy of components involved; as long as the FC switch is still bad,
there's no point testing any target which we could reach through it,
etc; testing whether the switch itself is healthy would round-robin
through our various connections to the switch, to make sure we don't
declare the switch down because we got hung up on one failed path.
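A sketch of topology-aware planning under simplifying assumptions: the hierarchy is reduced to a hypothetical map from switch to the paths reached through it, and switch health is already known. Behind a bad switch only one connection is probed per pass, rotating round-robin.

```python
from itertools import cycle


def plan_probes(paths_by_switch, switches_down, rr_state):
    """Choose which paths to test in one pass.

    paths_by_switch: dict switch -> list of paths reached through it
    switches_down:   set of switches currently believed to be bad
    rr_state:        dict switch -> round-robin iterator (kept between passes)
    """
    plan = []
    for switch, paths in paths_by_switch.items():
        if not paths:
            continue
        if switch in switches_down:
            # Skip the targets behind a bad switch; probe the switch itself
            # through one connection, rotating between passes so a single
            # hung path cannot keep the switch marked down.
            it = rr_state.setdefault(switch, cycle(paths))
            plan.append(next(it))
        else:
            plan.extend(paths)
    return plan
```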
Another option would be to not mechanically test every N seconds, but to
retest a failed path after 1s - 2s - 4s - ... 32s max as a cascading
back-off, and maybe start at 2 - 64s for paths in inactive PGs.
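The cascading back-off reads naturally as a delay generator. A minimal sketch; it interprets "2 - 64s" as start at 2s and cap at 64s, which is an assumption.

```python
def retest_delays(in_inactive_pg=False):
    """Yield successive retest delays (seconds) for one failed path."""
    start, cap = (2, 64) if in_inactive_pg else (1, 32)
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)  # double each time, never past the cap
```

A path that comes back would simply drop its generator; a fresh failure starts again at the short end.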
Not testing paths however isn't a real option.
> multipathd, this will no longer be true. This seems unlikely
> apparently due to the difficulty in implementing consistently
> accurate path testing in user space.
Uh? How is path testing in user-space difficult?
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
* Re: what is the current utility in testing active paths from multipathd?
2005-04-27 17:07 ` Lars Marowsky-Bree
@ 2005-04-27 18:17 ` Mike Anderson
2005-04-27 20:10 ` Lars Marowsky-Bree
2005-04-27 18:36 ` Lan
1 sibling, 1 reply; 8+ messages in thread
From: Mike Anderson @ 2005-04-27 18:17 UTC (permalink / raw)
To: device-mapper development
Lars Marowsky-Bree [lmb@suse.de] wrote:
> On 2005-04-27T12:27:32, "goggin, edward" <egoggin@emc.com> wrote:
> > If not, it may be possible to significantly reduce the cpu&io
> > resource utilization consumed by multipathd path testing on
> > enterprise scale configurations by only testing those paths
> > which the kernel thinks are in a failed state -- obviously a
> > much smaller set of paths.
>
> I could see not testing paths if we knew IO was hitting them; as an
> approximation, the active paths from the active PG might be omitted.
> However, the paths in the inactive PG all need to be tested, or else you
> are never going to find out that the paths have gone bad on you until
> you try to failover.
>
> The best way to minimize path (re-)testing needed is to figure in the
> hierarchy of components involved; as long as the FC switch is still bad,
> there's no point testing any target which we could reach through it,
> etc; testing whether the switch itself is healthy would round-robin
> through our various connections to the switch, to make sure we don't
> declare the switch down because we got hung up on one failed path.
>
Once support is completed and used, the fc_transport class should
provide data on the link state and the port state, which could provide
an indication of path health for deciding whether to send a path check command. This
would add complication to the tester, as each new transport would need some
type of handler.
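The per-transport handler idea can be sketched as a dispatch table. Everything here is hypothetical: the `info` keys `link_up` and `port_online` stand in for whatever link and port state the transport class would expose, and transports without a handler fall back to always checking.

```python
def fc_worth_checking(info):
    # Hypothetical FC handler: skip the path check while the link or
    # the remote port is known to be down.
    return bool(info.get("link_up", True) and info.get("port_online", True))


# One handler per transport; each new transport needs its own entry.
TRANSPORT_HANDLERS = {"fc": fc_worth_checking}


def should_send_check(transport, info):
    handler = TRANSPORT_HANDLERS.get(transport)
    if handler is None:
        return True  # no transport knowledge: fall back to always checking
    return handler(info)
```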
> Another option would be to not mechanically test every N seconds, but to
> retest a failed path after 1s - 2s - 4s - ... 32s max as a cascading
> back-off, and maybe start at 2 - 64s for paths in inactive PGs.
>
A cascading backoff / staggered timer would require less topology
knowledge than the above path health testing method and would provide the
reduced I/O load desired (depending on how high a user was willing to go
on setting the delta between path tests).
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: what is the current utility in testing active paths from multipathd?
2005-04-27 17:07 ` Lars Marowsky-Bree
2005-04-27 18:17 ` Mike Anderson
@ 2005-04-27 18:36 ` Lan
1 sibling, 0 replies; 8+ messages in thread
From: Lan @ 2005-04-27 18:36 UTC (permalink / raw)
To: device-mapper development
On 4/27/05, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2005-04-27T12:27:32, "goggin, edward" <egoggin@emc.com> wrote:
>
> > Although I know it sounds a bit radical and counterintuitive,
> > I'm not sure of the utility gained in the current multipathing
> > implementation by multipathd periodically testing paths which
> > are known to be in an active state in the multipath target driver.
> > Possibly someone can convince me otherwise.
>
> Because user-space doesn't know whether any IO has actually gone down a
> given path, and that would be the only time the kernel would detect the
> error.
>
> > If not, it may be possible to significantly reduce the cpu&io
> > resource utilization consumed by multipathd path testing on
> > enterprise scale configurations by only testing those paths
> > which the kernel thinks are in a failed state -- obviously a
> > much smaller set of paths.
>
> I could see not testing paths if we knew IO was hitting them; as an
> approximation, the active paths from the active PG might be omitted.
> However, the paths in the inactive PG all need to be tested, or else you
> are never going to find out that the paths have gone bad on you until
> you try to failover.
>
> The best way to minimize path (re-)testing needed is to figure in the
> hierarchy of components involved; as long as the FC switch is still bad,
> there's no point testing any target which we could reach through it,
> etc; testing whether the switch itself is healthy would round-robin
> through our various connections to the switch, to make sure we don't
> declare the switch down because we got hung up on one failed path.
>
> Another option would be to not mechanically test every N seconds, but to
> retest a failed path after 1s - 2s - 4s - ... 32s max as a cascading
> back-off, and maybe start at 2 - 64s for paths in inactive PGs.
>
> Not testing paths however isn't a real option.
>
I think it's a good idea to make a distinction between testing paths
for probing (i.e. making sure they have not gone dead) and for
reclamation. Possibly this would mean having two separate testing
threads. This way users could decide which policies they would want to
use for each type of testing. Some users may not care so much for
probing. For example, if they have large configurations and are
willing to trade off immediate knowledge of system degradation for
saved cycles, then they may decide to not have probing at all, and can
live with having paths fail due to failed I/O. Or they could use a probing policy
that doesn't consume so many resources, e.g. a lower probing
frequency than for reclamation testing. Reclamation is more crucial, I
think, and would be of more concern for users. Enabling users to
determine the policy for reclamation, e.g. the testing frequency or
whether to enable cascading back-off, would be good, since the factors for this
decision would be based on knowledge users have of their own
configuration and data load.
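Separate probing and reclamation policies could share one small configuration type. A minimal sketch; `CheckPolicy` and its fields are hypothetical, and `interval=None` models a site that turns probing off entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckPolicy:
    interval: Optional[float]   # seconds between tests; None disables this kind of test
    backoff: bool = False       # double the interval after each consecutive failed test
    max_interval: float = 60.0  # cap when backoff is enabled

    def next_delay(self, consecutive_failures=0):
        if self.interval is None:
            return None
        if not self.backoff:
            return self.interval
        return min(self.interval * 2 ** consecutive_failures, self.max_interval)


# Example settings: a site with a large configuration turns probing of
# active paths off, but reclaims failed paths quickly with back-off.
probing = CheckPolicy(interval=None)
reclamation = CheckPolicy(interval=1.0, backoff=True, max_interval=32.0)
```

Each policy would drive its own thread or timer, so the two kinds of testing can be tuned independently.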
> > multipathd, this will no longer be true. This seems unlikely
> > apparently due to the difficulty in implementing consistently
> > accurate path testing in user space.
>
> Uh? How is path testing in user-space difficult?
>
> Sincerely,
> Lars Marowsky-Brée <lmb@suse.de>
>
> --
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: what is the current utility in testing active paths from multipathd?
2005-04-27 18:17 ` Mike Anderson
@ 2005-04-27 20:10 ` Lars Marowsky-Bree
2005-04-27 20:23 ` christophe varoqui
0 siblings, 1 reply; 8+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-27 20:10 UTC (permalink / raw)
To: device-mapper development
On 2005-04-27T11:17:02, Mike Anderson <andmike@us.ibm.com> wrote:
> Once support is completed and used, the fc_transport class should
> provide data on the link state and the port state, which could provide
> an indication of path health for deciding whether to send a path check command. This
> would add complication to the tester, as each new transport would need some
> type of handler.
ACK. Yes, this is part of the additional information to use I was
referring to. As long as the port is down, why bother...
> > Another option would be to not mechanically test every N seconds, but to
> > retest a failed path after 1s - 2s - 4s - ... 32s max as a cascading
> > back-off, and maybe start at 2 - 64s for paths in inactive PGs.
> >
> A cascading backoff / staggered timer would require less topology
> knowledge than the above path health testing method and would provide the
> reduced I/O load desired (depending on how high a user was willing to go
> on setting the delta between path tests).
Yes, it's easier, but it also slows down responsiveness and path
reactivation of course. One can argue that the combination of the two
works; we only retest each path every N seconds, but we interleave
them, so that with M paths we essentially test one path every N/M seconds; and as soon
as one path finds a state change, we shorten the timers for all paths so
they get all tested faster.
That's probably a pretty sophisticated heuristic which would work
reasonably well w/o any additional configuration.
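A sketch of that interleaved scheme, under assumptions not in the mail: times are plain floats, `fast_period` is a hypothetical shortened period used after a state change, and `relax()` is a hypothetical hook for returning to the normal period.

```python
class InterleavedScheduler:
    """Retest each of M paths every `period` seconds, staggered so one test
    falls due roughly every period/M seconds. A detected state change
    switches all paths to `fast_period` until relax() is called."""

    def __init__(self, paths, period, fast_period, now=0.0):
        self.paths = list(paths)
        self.period = period
        self.fast_period = fast_period
        self.current_period = period
        self.due = self._stagger(now, period)

    def _stagger(self, now, period):
        m = len(self.paths)
        return {p: now + i * period / m for i, p in enumerate(self.paths)}

    def due_now(self, now):
        return [p for p in self.paths if self.due[p] <= now]

    def record_result(self, path, now, state_changed):
        if state_changed:
            # Shorten every path's timer so they all get retested sooner.
            self.current_period = self.fast_period
            self.due = self._stagger(now, self.fast_period)
            self.due[path] = now + self.fast_period
        else:
            self.due[path] = now + self.current_period

    def relax(self):
        # Return to the normal period; takes effect as paths are next tested.
        self.current_period = self.period
```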
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
* Re: what is the current utility in testing active paths from multipathd?
2005-04-27 20:10 ` Lars Marowsky-Bree
@ 2005-04-27 20:23 ` christophe varoqui
0 siblings, 0 replies; 8+ messages in thread
From: christophe varoqui @ 2005-04-27 20:23 UTC (permalink / raw)
To: device-mapper development
On Wed, 2005-04-27 at 22:10 +0200, Lars Marowsky-Bree wrote:
> On 2005-04-27T11:17:02, Mike Anderson <andmike@us.ibm.com> wrote:
>
> > Once support is completed and used, the fc_transport class should
> > provide data on the link state and the port state, which could provide
> > an indication of path health for deciding whether to send a path check command. This
> > would add complication to the tester, as each new transport would need some
> > type of handler.
>
> ACK. Yes, this is part of the additional information to use I was
> referring to. As long as the port is down, why bother...
>
> > > Another option would be to not mechanically test every N seconds, but to
> > > retest a failed path after 1s - 2s - 4s - ... 32s max as a cascading
> > > back-off, and maybe start at 2 - 64s for paths in inactive PGs.
> > >
> > A cascading backoff / staggered timer would require less topology
> > knowledge than the above path health testing method and would provide the
> > reduced I/O load desired (depending on how high a user was willing to go
> > on setting the delta between path tests).
>
> Yes, it's easier, but it also slows down responsiveness and path
> reactivation of course. One can argue that the combination of the two
> works; we only retest each path every N seconds, but we interleave
> them, so that with M paths we essentially test one path every N/M seconds; and as soon
> as one path finds a state change, we shorten the timers for all paths so
> they get all tested faster.
>
> That's probably a pretty sophisticated heuristic which would work
> reasonably well w/o any additional configuration.
>
I'm in the process of plugging a uevent listener into a new daemon
thread.
Right now, the goal is to trap sysfs block add/remove events so that I
can get rid of the multipath -> multipathd signaling.
I can see a bright future where the sysfs transport classes will
broadcast over that bus too. And we'll be ready to play with these new
events.
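Kernel uevents arrive on Linux over a netlink socket of protocol NETLINK_KOBJECT_UEVENT, as a datagram of the form `action@devpath` followed by NUL-separated `KEY=VALUE` pairs. The sketch below covers only the parsing and the block add/remove filter, assuming that layout; socket setup is left out, and both function names are hypothetical.

```python
def parse_uevent(buf):
    """Split a kernel uevent datagram into (action, devpath, env).

    Assumed layout: b"action@devpath\\0KEY=VALUE\\0KEY=VALUE\\0..."
    """
    header, *pairs = buf.split(b"\0")
    action, _, devpath = header.decode().partition("@")
    env = {}
    for pair in pairs:
        if pair:  # skip the empty field after the trailing NUL
            key, _, value = pair.decode().partition("=")
            env[key] = value
    return action, devpath, env


def is_block_add_remove(action, env):
    # The events wanted for now: block devices appearing or vanishing.
    return env.get("SUBSYSTEM") == "block" and action in ("add", "remove")
```

Transport-class events would go through the same parser, differing only in `SUBSYSTEM` and the keys present.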
Regards,
--
christophe varoqui <christophe.varoqui@free.fr>
* Re: what is the current utility in testing active paths from multipathd?
2005-04-27 16:27 what is the current utility in testing active paths from multipathd? goggin, edward
2005-04-27 17:02 ` Alasdair G Kergon
2005-04-27 17:07 ` Lars Marowsky-Bree
@ 2005-04-28 16:37 ` Lars Marowsky-Bree
2 siblings, 0 replies; 8+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-28 16:37 UTC (permalink / raw)
To: device-mapper development
On 2005-04-27T12:27:32, "goggin, edward" <egoggin@emc.com> wrote:
> Although I know it sounds a bit radical and counterintuitive,
> I'm not sure of the utility gained in the current multipathing
> implementation by multipathd periodically testing paths which
> are known to be in an active state in the multipath target driver.
> Possibly someone can convince me otherwise.
Ed, I misread your mail and missed the point. Your more succinct summary
on the conf call today made it all clear to me; indeed this doesn't just
need optimization, it needs fixing...
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156280
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
Thread overview: 8+ messages
2005-04-27 16:27 what is the current utility in testing active paths from multipathd? goggin, edward
2005-04-27 17:02 ` Alasdair G Kergon
2005-04-27 17:07 ` Lars Marowsky-Bree
2005-04-27 18:17 ` Mike Anderson
2005-04-27 20:10 ` Lars Marowsky-Bree
2005-04-27 20:23 ` christophe varoqui
2005-04-27 18:36 ` Lan
2005-04-28 16:37 ` Lars Marowsky-Bree