From: Martin Wilck <mwilck@suse.com>
To: Guan Junxiong <guanjunxiong@huawei.com>, Hannes Reinecke <hare@suse.de>
Cc: dm-devel@redhat.com, chengjike.cheng@huawei.com,
niuhaoxin@huawei.com, shenhong09@huawei.com
Subject: Re: [PATCH] multipath-tools: intermittent IO error accounting to improve reliability
Date: Mon, 28 Aug 2017 13:13:04 +0200
Message-ID: <1503918784.4546.24.camel@suse.com>
In-Reply-To: <508b849f-026c-7365-7a4c-f2e8b2ef01a4@huawei.com>
On Thu, 2017-08-24 at 17:59 +0800, Guan Junxiong wrote:
> Hi, Hannes
> Thanks for your comments. My reply inline.
>
> On 2017/8/22 23:37, Hannes Reinecke wrote:
> > - As we now have advanced path selectors the overall consensus is that
> > those selectors _should_ be able to handle these situations; ie for a
> > flaky path the path selector should switch away from it and move the
> > load to other, unaffected paths.
> > Have you checked if the existing path selectors are able to cope with
> > this situation? If not, why not?
>
> The existing path selectors in kernel space are able to fail_path
> the flaky path when certain IO errors occur. However, only the
> user-space multipathd checkers can detect whether the path is up.
> Therefore, for a path with long-term intermittent IO errors (a flaky
> path), the path selectors suffer from taking the path in and out
> _again_ _and_ _again_.
> Even if san_path_err_threshold, san_path_err_forget_rate and
> san_path_err_recovery_time are turned on, the sampling interval of the
> path checkers is so coarse that they don't see what happens in the
> middle of the sample interval.
I have the concern that we are introducing too many different
regulation algorithms. We have path selectors, path checkers,
san_path_err_XXX, and now path_io_err_XXX as well. We must be certain
that these play together in a well-defined fashion (most importantly,
one mechanism must not activate a path while another is in the
process of tearing it down). We must also avoid causing user
confusion, as multipath configuration is already a daunting task for
many. Your new algorithm should be mutually exclusive with
san_path_err_XXX. Perhaps we should even consider dropping the
san_path_err_XXX options entirely if we choose to adopt your new
approach.
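
To illustrate what "mutually exclusive" could mean in practice, here is a minimal sketch in Python (not multipath-tools code). It assumes options are available as a name-to-integer mapping and that a value greater than 0 means "enabled"; both are assumptions made for this example, and the helper names are hypothetical.

```python
# Illustrative sketch only; not part of multipath-tools.
SAN_PREFIX = "san_path_err_"   # existing option family
IO_PREFIX = "path_io_err_"     # option family proposed in this patch


def enabled_families(options):
    """Return the option-family prefixes that have at least one option > 0.

    Assumption: a value greater than 0 means the option is turned on.
    """
    families = set()
    for name, value in options.items():
        for prefix in (SAN_PREFIX, IO_PREFIX):
            if name.startswith(prefix) and value > 0:
                families.add(prefix)
    return families


def check_mutually_exclusive(options):
    """Raise ValueError if both option families are enabled at once."""
    if enabled_families(options) == {SAN_PREFIX, IO_PREFIX}:
        raise ValueError(
            "san_path_err_XXX and path_io_err_XXX must not both be enabled"
        )
```

Rejecting the combination outright is only one possible policy; silently preferring one family and logging a warning would be another.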
> > - However flaky path detection is implemented, it will work most
> > efficiently when moving I/O _away_ from the flaky path. However, in
> > doing so we don't have a mechanism to figure out if and when the path
> > is usable again (as we're not sending I/O to it, and the TUR or any
> > other path checker might not be affected by the flaky behaviour).
> > So when should we declare a path as 'good' again?
>
> In this patch, the flaky path will stay failed for only
> path_io_err_recovery_time seconds if there is more than one active
> path. After path_io_err_recovery_time seconds, the flaky path returns
> to normal handling, which means that when the path checker detects it
> is up, it will be reinstated as a usable path.
>
> However, how about we schedule the intermittent IO checking process
> again when the path_io_err_recovery_time seconds expire? If the number
> of IO errors is less than path_io_err_num_threshold, we declare the
> path as 'good' again.
That sounds like a reasonable improvement over the original patch.
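
As I understand the proposal, the re-check logic could be modelled roughly as below. This is a sketch in Python, not the patch itself: the class and method names are hypothetical, and restarting the recovery timer after a failed re-check is my assumption, as the proposal does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class FlakyPath:
    """Toy model of a path that has been marked flaky (illustration only)."""
    recovery_time: float   # path_io_err_recovery_time, in seconds
    num_threshold: int     # path_io_err_num_threshold
    flaky_since: float     # timestamp when the path was last marked flaky
    state: str = "flaky"   # either "flaky" or "good"

    def recheck_due(self, now):
        """True once the recovery time has expired for a flaky path."""
        return self.state == "flaky" and now - self.flaky_since >= self.recovery_time

    def finish_recheck(self, io_errors, now):
        """Apply the result of one intermittent-IO checking round."""
        if io_errors < self.num_threshold:
            self.state = "good"      # fewer errors than the threshold
        else:
            self.flaky_since = now   # assumption: restart the recovery timer
        return self.state
```

For example, with recovery_time=60 and num_threshold=3, a re-check becomes due 60 seconds after the path was marked flaky; a round with 5 errors keeps the path flaky and schedules the next re-check 60 seconds later, while a round with 2 errors declares the path 'good'.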
Regards,
Martin
--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel