From: Lars Marowsky-Bree <lmb@suse.de>
To: device-mapper development <dm-devel@redhat.com>,
Andreas Herrmann <aherrman@de.ibm.com>
Cc: Linux SCSI <linux-scsi@vger.kernel.org>
Subject: Re: Re: fastfail operation and retries
Date: Fri, 22 Apr 2005 00:16:15 +0200
Message-ID: <20050421221614.GX17315@marowsky-bree.de>
In-Reply-To: <C2EEB4E538D3DC48BF57F391F422779321A927@SRMANNING.eng.emc.com>

On 2005-04-21T18:01:04, "goggin, edward" <egoggin@emc.com> wrote:
> > If we can't differentiate in the kernel where we have the IO error
> > details available, then how would user-space? You're not solving the
> > problem ;-)
> Maybe not completely, but at least an inquiry of page 83 will not trip
> over media errors. Also, why use a different test for determining path
> success than the one used for path failure?
If the kernel sees an error, it needs to take action. It has immediate
knowledge of the error, while the further user-space diagnosis (or even
further in-kernel diagnosis; where this is actually implemented doesn't
matter) obviously lags behind.

I think the aim is to react immediately and re-route IO to reduce the
interruption to upper layers. In principle, if we have healthy paths,
rerouting is always safe; only if we know for sure it's a media error
(as indicated by appropriate sense data) do we immediately report the IO
error to upper layers, or switch priority groups (PGs) instead of
failing the path, etc.

This is a pessimistic approach: take a potentially failed path out of
service as soon as possible.

What also happens, though, is that an event is sent to user-space;
user-space "immediately" retests the path and, if it finds it healthy,
reinstates it.

I believe this is correct behaviour.
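
As a rough illustration of the policy described above (a sketch only, not
the actual dm-multipath code): the names ErrorClass, Path, on_io_error and
retest_paths are hypothetical, the decoding of sense data into an error
class is assumed to have happened already, and the "no_healthy_path"
result is an assumption added to make the example complete.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ErrorClass(Enum):
    """Hypothetical result of decoding the sense data for a failed IO."""
    MEDIA_ERROR = auto()   # the device itself reports bad media
    SWITCH_PG = auto()     # the array calls for a priority-group switch
    UNKNOWN = auto()       # anything else: suspect the path


@dataclass
class Path:
    name: str
    failed: bool = False


def on_io_error(path, error, paths):
    """Kernel-side reaction: act at once, leave diagnosis for later."""
    if error is ErrorClass.MEDIA_ERROR:
        # Rerouting cannot help; report the error to upper layers.
        return "report_io_error"
    if error is ErrorClass.SWITCH_PG:
        # Switch priority groups instead of failing the path.
        return "switch_pg"
    # Pessimistic default: take the suspect path out of service.
    path.failed = True
    if any(not p.failed for p in paths):
        return "reroute"
    return "no_healthy_path"


def retest_paths(paths, is_healthy):
    """User-space side: retest failed paths, reinstate the healthy ones."""
    reinstated = []
    for p in paths:
        if p.failed and is_healthy(p):
            p.failed = False
            reinstated.append(p.name)
    return reinstated
```

The kernel-side function never waits for a diagnosis; reinstating a path
is left entirely to the retest step, which mirrors the split between
in-kernel reaction and user-space checking described above.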
> > According to my docs, the only EMC array which does fail all paths
> > during a software update (by doing a "Warm Reboot") is a FC4500 array.
> > Not sure whether this also includes the AX-series, though, my doc
> > doesn't mention it. The FC4500 might not respond to IO for up to 50
> > seconds; in which case the queue_if_no_path and user-space retesting
> > provide adequate (as good as possible) coverage to reinstate
> > the paths.
>
> I am seeing an all-paths-down time period whenever I perform an NDU
> for a CX300 while running 1 (async write behind) dd thread per
> mapped device for 16 mapped devices.
Are you already running the code with the sense data decoding enabled,
for example a _very_ recent SLES9 SP2 beta kernel (basically, as of a
couple hours ago) or one with all patches applied from the multipath
bugzilla + multipath-tools pre18, and are you connected to both SPs?
If not, it's possible that that combo kernel didn't correctly handle
that case, because it didn't know about triggering a switch_pg etc.
And, if the CX300 indeed fails all paths during NDU at the same time, it
is behaving contrary to the published CX-series specification; in which
case it is an EMC (and not ours! ;-) bug and needs to be fixed in the
firmware ;-)
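
For reference, the queue_if_no_path behaviour mentioned in the quoted text
can be pictured roughly as follows. This is a toy model, not the
dm-multipath implementation; MultipathQueue and its methods are
hypothetical names chosen for the example.

```python
from collections import deque


class MultipathQueue:
    """Toy model of queue_if_no_path: hold IO while every path is down."""

    def __init__(self, paths):
        self.healthy = set(paths)   # paths currently usable
        self.queued = deque()       # IO held while no path is usable
        self.completed = []         # (request, path) pairs, in order

    def submit(self, request):
        if self.healthy:
            # Deterministic path choice keeps the example reproducible.
            self.completed.append((request, min(self.healthy)))
        else:
            # No path available: queue instead of returning an IO error.
            self.queued.append(request)

    def fail_path(self, path):
        self.healthy.discard(path)

    def reinstate_path(self, path):
        """Called once user-space retesting finds the path healthy again."""
        self.healthy.add(path)
        while self.queued:
            self.completed.append((self.queued.popleft(), path))
```

In this model an all-paths-down window only delays IO: requests submitted
during the window are held and then issued in order once a path is
reinstated.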
> > (The fact that no writes/reads complete should automatically throttle
> > the IO, too; however, this might not be true for certain write
> > patterns, and in particular async IO (how could we possibly throttle
> > _that_?). IO throttling in this case remains a problem which we
> > might need to address.)
> This is the problem I am referring to.
Well, I don't think so. This is an additional problem, but not one you
should be running into.
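
To make the throttling point from the quoted text concrete, here is a
minimal sketch under stated assumptions: requests_queued is a hypothetical
helper, and the model simply assumes that no IO completes while all paths
are down.

```python
def requests_queued(attempts, max_outstanding=None):
    """Count requests that pile up while no IO completes.

    max_outstanding=1 models a synchronous writer, which blocks until its
    single request completes; None models async submission with no limit.
    """
    outstanding = 0
    for _ in range(attempts):
        if max_outstanding is not None and outstanding >= max_outstanding:
            break  # submitter blocks: throttled by the lack of completions
        outstanding += 1
    return outstanding
```

With no completions, the synchronous writer stops at one outstanding
request, while the unbounded async writer queues every request it
attempts; that unbounded growth is the throttling problem in question.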
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business