public inbox for linux-scsi@vger.kernel.org
* fastfail operation and retries
@ 2005-04-19 17:19 Andreas Herrmann
  2005-04-21 16:42 ` Patrick Mansfield
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2005-04-19 17:19 UTC (permalink / raw)
  To: Linux SCSI

Hi,

I have question(s) regarding the fastfail operation of the SCSI stack.

While performing multipath tests with an IBM ESS I encountered
problems.  During certain operations on an ESS (quiesce/resume and
such), requests on all paths fail temporarily with a data underrun
(resid is set in the FCP response).  In another situation, abort
sequences occur (see FC-FS).

In both cases it is not a path failure; rather, the device (ESS)
temporarily reports error conditions (for some seconds).

Now, on an error on the first path, the multipath layer initiates
failover to the other available path(s), where requests will
immediately fail as well.

With linux-2.4 and LVM such problems did not occur: there were enough
retries (5 per path) to handle such situations.

Now if the FASTFAIL flag is set the SCSI stack prevents retries for
failed SCSI commands.

The problem is that the multipath layer cannot distinguish between path
and device failures (and won't do any retries for the failed request on
the same path anyway).

How can an LLD force the SCSI stack to retry a failed SCSI command
(without using DID_REQUEUE or DID_IMM_RETRY, which both leave the retry
counter unchanged)?

What about a DID_FORCE_RETRY?  Or is there any outlook on when there
will be a better interface between the SCSI stack and the multipath
layer to properly handle retries?
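To make the retry-accounting question concrete, here is a small
user-space simulation (not kernel code); decide(), struct cmd and the
DID_FORCE_RETRY value are invented for illustration, loosely modeled on
how the existing DID_* codes and the fastfail flag interact:

```c
#include <assert.h>

/* Host-byte result codes as discussed; DID_FORCE_RETRY is hypothetical. */
enum host_byte { DID_OK, DID_ERROR, DID_REQUEUE, DID_IMM_RETRY,
                 DID_FORCE_RETRY };

struct cmd { int retries; int allowed; };

/* Returns 1 if the command is retried, 0 if the failure goes up the
 * stack.  DID_REQUEUE/DID_IMM_RETRY retry without touching the retry
 * counter, so a device that errors indefinitely would loop forever;
 * the hypothetical DID_FORCE_RETRY consumes a retry and therefore
 * terminates after 'allowed' attempts, even with fastfail set. */
int decide(struct cmd *c, enum host_byte h, int fastfail)
{
    switch (h) {
    case DID_OK:
        return 0;                      /* completed, nothing to retry */
    case DID_REQUEUE:
    case DID_IMM_RETRY:
        return 1;                      /* retried, counter unchanged */
    case DID_FORCE_RETRY:              /* hypothetical: counted retry */
        return c->retries++ < c->allowed;
    default:                           /* DID_ERROR and friends */
        if (fastfail)
            return 0;                  /* fastfail: no retries at all */
        return c->retries++ < c->allowed;
    }
}
```

The point of the sketch is only the bookkeeping: a counted retry
disposition fails over cleanly after a bounded number of attempts,
which neither DID_REQUEUE nor DID_IMM_RETRY can guarantee.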


Regards,

Andreas


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: Re: fastfail operation and retries
@ 2005-04-21 21:02 goggin, edward
  0 siblings, 0 replies; 17+ messages in thread
From: goggin, edward @ 2005-04-21 21:02 UTC (permalink / raw)
  To: 'Lars Marowsky-Bree', device-mapper development,
	Andreas Herrmann
  Cc: Linux SCSI

On Thursday, April 21, 2005 3:55 PM,  Lars Marowsky-Bree wrote:
> Together with the "queue_if_no_path" feature flag for dm-mpath that
> should do what you need to handle this (arguably broken) array
> behaviour: It'll queue until the error goes away and 
> multipathd retests
> and reactivates the paths. That ought to work, but given that I don't
> have an IBM ESS accessible, please confirm that.

Depending on the "queue_if_no_path" feature currently has the
undesirable side-effect of requiring intervention by the user space
multipath components to reinstate at least one of the paths to a usable
state in the multipath target driver.  This dependency creates the
potential for deadlock scenarios, since neither the user space
multipath components nor the kernel are currently architected to avoid
them.

I think for now it may be better to avoid failing a path when it is
possible that an I/O error is not path related.

* RE: Re: fastfail operation and retries
@ 2005-04-21 21:31 goggin, edward
  2005-04-21 21:49 ` Lars Marowsky-Bree
  0 siblings, 1 reply; 17+ messages in thread
From: goggin, edward @ 2005-04-21 21:31 UTC (permalink / raw)
  To: 'Lars Marowsky-Bree', device-mapper development,
	Andreas Herrmann
  Cc: Linux SCSI

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org 
> [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Lars 
> Marowsky-Bree
> Sent: Thursday, April 21, 2005 5:19 PM
> To: device-mapper development; Andreas Herrmann
> Cc: Linux SCSI
> Subject: Re: [dm-devel] Re: fastfail operation and retries
> 
> On 2005-04-21T17:02:44, "goggin, edward" <egoggin@emc.com> wrote:
> 
> > Depending on the "queue_if_no_path" feature currently has the
> > undesirable side-effect of requiring intervention by the user space
> > multipath components to reinstate at least one of the paths to a
> > usable state in the multipath target driver.  This dependency creates
> > the potential for deadlock scenarios, since neither the user space
> > multipath components nor the kernel are currently architected to
> > avoid them.
> 
> multipath-tools is, to a certain degree, architected to avoid them.
> And the kernel is meant to be, too - there are bugs and known FIXMEs,
> but those are just bugs and we're taking patches gladly ;-)
> 
> > I think for now it may be better to avoid failing a path when it is
> > possible that an I/O error is not path related.
> 
> No. Basically every timeout error creates a "dunno why" error right
> now - could be the storage system itself, could be the network in
> between.
>

I was really thinking of the code where the sense key/asc/ascq makes it
into the bio.
 
> A failover to another path is the obvious remedy; take for example the
> CX series where even if it's not the path, it's the SP, and 
> failing over
> to the other SP will cure the problem.
> 
> If the storage at least rejects the IO with a specific error code, it
> can be worked around by a specific hw handler which doesn't fail the
> path but just causes the IO to be queued and retried; that's a pretty
> simple hardware handler to write.

I agree we and likely other storage vendors could do a better job here.
But that said, the multipathing code could also avoid failing the path
just because an io error occurred on that path.  Instead, this could be
the sole responsibility of path testing (from user space) which could
reduce the likelihood of media errors being confused with path
connectivity ones.
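As a sketch of the kind of error discrimination being discussed
(illustrative only: the enum names are invented, and real dm hardware
handlers work on kernel structures, not raw buffers), a classifier for
fixed-format sense data might look like this:

```c
#include <assert.h>
#include <stddef.h>

enum verdict { PATH_OK, FAIL_PATH, RETRY_SAME_PATH };  /* invented names */

/* Decode fixed-format sense data (response codes 0x70/0x71) and decide
 * whether an error looks like a transient device condition rather than
 * a dead path.  Sense key 02h with ASC 04h is NOT READY / logical unit
 * not ready - e.g. the 02/04/03 case mentioned below for the CLARiiON. */
enum verdict classify_sense(const unsigned char *sense, size_t len)
{
    unsigned char rc;

    if (sense == NULL || len < 14)
        return FAIL_PATH;           /* no sense data: assume transport */
    rc = sense[0] & 0x7f;
    if (rc != 0x70 && rc != 0x71)
        return FAIL_PATH;           /* not fixed-format sense */
    if ((sense[2] & 0x0f) == 0x02 && sense[12] == 0x04)
        return RETRY_SAME_PATH;     /* device not ready: don't fail path */
    if ((sense[2] & 0x0f) == 0x00)
        return PATH_OK;             /* NO SENSE */
    return FAIL_PATH;
}
```

The interesting property is the first branch: with no sense data at all,
there is nothing to distinguish a dead path from a sulking device, which
is exactly the ambiguity described above.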

> 
> But quite frankly, storage subsystems which _reject_ all IO for a
> given time are just broken for reliable configurations. What good are
> they in multipath configurations if they fail _all_ paths at the same
> time? How can they even dare claim redundancy? We can build more or
> less smelly kludges around them, but it remains a problem to be fixed
> at the storage subsystem level IMNSHO.

I agree that it's unfortunate that the CLARiiON is failing all paths
during NDU, even if only for a limited time.  Even so, it must be
dealt with as is.

> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>
> 
> -- 
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business
> 
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

* Re: [dm-devel] Re: fastfail operation and retries
@ 2005-04-21 21:33 Andreas Herrmann
  2005-04-21 22:24 ` Lars Marowsky-Bree
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2005-04-21 21:33 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, Linux SCSI, linux-scsi-owner

On 21.04.2005 21:54, Lars Marowsky-Bree <lmb@suse.de> wrote:
 
> On 2005-04-21T09:42:05, Patrick Mansfield <patmans@us.ibm.com> wrote:

> > On Tue, Apr 19, 2005 at 07:19:53PM +0200, Andreas Herrmann wrote:

  <snip>

> > 
> > We need a patch like Mike Christie had, this:
> > 
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=107961883914541&w=2
> > 
> > The scsi core should decode the sense data and pass up the result,
> > then dm need not decode sense data, and we don't need sense data
> > passed around via the block layer.

> The most recent udm patchset has a patch by Jens Axboe and myself to
> pass up sense data / error codes in the bio so the dm mpath module can
> deal with it. 

> The only remaining issue is that the SCSI midlayer generates just a
> single "EIO" code, also for timeouts; however, that pretty much means
> it's a transport error, because if it was a media error, we'd be
> getting sense data ;-)

Well, there are various situations when all paths to the ESS are
"temporarily unavailable". In some cases TASK_SET_FULL/BUSY is
reported as it should be. In other cases we just encounter data
underruns or exchange sequences are aborted and finally it might be
that requests just time out. BTW, it is not only the ESS where I have
seen such (broken) behaviour.

> Together with the "queue_if_no_path" feature flag for dm-mpath that
> should do what you need to handle this (arguably broken) array
> behaviour: It'll queue until the error goes away and multipathd retests
> and reactivates the paths. That ought to work, but given that I don't
> have an IBM ESS accessible, please confirm that.

Sounds good. Will make some tests using the "queue_if_no_path" feature.
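(For concreteness, a minimal dm-mpath table with the feature enabled
might look like the following; the map name, size, and major:minor path
numbers are placeholders for a hypothetical two-path setup, so treat
this as a sketch rather than a tested recipe:)

```shell
# Hypothetical two-path map; 2097152 sectors, paths 8:16 and 8:32 are
# placeholders.  "1 queue_if_no_path" supplies the single feature arg:
# queue I/O instead of failing it when no usable path remains, until
# multipathd (or a dmsetup message) reinstates a path.
dmsetup create mpath_ess --table \
  "0 2097152 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:16 1000 8:32 1000"

# The behaviour can also be toggled at runtime:
dmsetup message mpath_ess 0 queue_if_no_path
dmsetup message mpath_ess 0 fail_if_no_path
```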

> It is possible that to fully support them a dm mpath hardware handler
> (like for the EMC CX family) might be required, too.

For the time being I hope the "queue_if_no_path" feature is sufficient
to successfully pass our tests ;-)

> (For easier testing, you'll find that all this functionality is
> available in the latest SLES9 SP2 betas, to which you ought to have
> access at IBM, and the kernels are also available via
> ftp://ftp.suse.com/pub/projects/kernel/kotd/.)

> > scsi core could be changed to handle device specific decoding via
> > sense tables that can be modified via sysfs, similar to devinfo code
> > (well, devinfo still lacks a sysfs interface).

> dm-mpath's capabilities go a bit beyond just the error decoding (which
> for generic devices is also provided for in a generic
> dm_scsi_err_handler()); for example you can code special initialization
> commands and behaviour an array might need.

> Maybe this could indeed be abstracted further to download the command
> and/or specific decoding tables from user-space via sysfs or configfs by
> a generic user-space customizable dm-hw-handler-generic.[ch] plugin; I
> think patches are being accepted ;-)

Thanks for the information.


Regards,

Andreas


* RE: Re: fastfail operation and retries
@ 2005-04-21 22:01 goggin, edward
  2005-04-21 22:16 ` Lars Marowsky-Bree
  0 siblings, 1 reply; 17+ messages in thread
From: goggin, edward @ 2005-04-21 22:01 UTC (permalink / raw)
  To: 'Lars Marowsky-Bree', device-mapper development,
	Andreas Herrmann
  Cc: Linux SCSI

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org 
> [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Lars 
> Marowsky-Bree
> Sent: Thursday, April 21, 2005 5:50 PM
> To: device-mapper development; Andreas Herrmann
> Cc: Linux SCSI
> Subject: Re: [dm-devel] Re: fastfail operation and retries
> 
> On 2005-04-21T17:31:46, "goggin, edward" <egoggin@emc.com> wrote:
> 
> > > No. Basically every timeout error creates a "dunno why" error
> > > right now - could be the storage system itself, could be the
> > > network in between.
> > >
> > I was really thinking of the code where the sense key/asc/ascq makes
> > it into the bio.
> 
> We don't get sense data for transport errors and certain storage
> failures, though.
> 
> > I agree we and likely other storage vendors could do a better job
> > here.  But that said, the multipathing code could also avoid failing
> > the path just because an io error occurred on that path.  Instead,
> > this could be the sole responsibility of path testing (from user
> > space) which could reduce the likelihood of media errors being
> > confused with path connectivity ones.
> 
> If we can't differentiate in the kernel where we have the IO error
> details available, then how would user-space? You're not solving the
> problem ;-)

Maybe not completely, but at least an INQUIRY of VPD page 0x83 will not
trip over media errors.  Also, why use a different test for determining
path success than the one used for path failure?
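(As a concrete aside: such a path test boils down to a six-byte INQUIRY
CDB with the EVPD bit set and page code 0x83, which a user-space checker
could submit via SG_IO without ever touching the medium. The helper
below is an illustrative sketch; the actual device submission is
omitted:)

```c
#include <assert.h>

/* Build a SCSI INQUIRY CDB requesting the device-identification VPD
 * page (0x83).  Reading a VPD page never touches the medium, so this
 * probe tests path connectivity without tripping over media errors. */
void build_inquiry_vpd83(unsigned char cdb[6], unsigned char alloc_len)
{
    cdb[0] = 0x12;       /* INQUIRY opcode */
    cdb[1] = 0x01;       /* EVPD bit: return a vital product data page */
    cdb[2] = 0x83;       /* page code: device identification */
    cdb[3] = 0x00;
    cdb[4] = alloc_len;  /* allocation length */
    cdb[5] = 0x00;       /* control byte */
}
```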

> 
> > I agree that it's unfortunate that the CLARiiON is failing all
> > paths during NDU, even if only for a limited time.  Even so, it
> > must be dealt with as is.
> 
> It does? According to my documentation, for the CX-family, the
> FC4700(-2) and likely the Symmetrix, NDU is a rolling update, so that
> one Service-Processor always remains accessible, with enough delay in
> between them that path retesting will have reenabled the path.
> 
> We get an 02/04/03 Path Not Ready error code for this case, which in
> the dm-emc.c handler is translated to an immediate switch_pg.
> 
> In fact, the user-space testing code will receive pre-notification of
> a pending NDU by the LUN Operations field being set to 1, which will
> cause user-space to flag that path as down, even if there's no
> in-flight IO.
> 
> This combined ought to cover the NDU case pretty well and is
> implemented already. (And supposedly works in SLES9 SP2 beta3.)
> 
> According to my docs, the only EMC array which does fail all paths
> during a software update (by doing a "Warm Reboot") is a FC4500 array.
> Not sure whether this also includes the AX-series, though, my doc
> doesn't mention it. The FC4500 might not respond to IO for up to 50
> seconds; in which case the queue_if_no_path and user-space retesting
> provides adequate (as good as possible) coverage to reinstate the
> paths.

I am seeing an all-paths-down period whenever I perform an NDU for a
CX300 while running one (async write-behind) dd thread per mapped
device for 16 mapped devices.

> 
> (The fact that no writes/reads complete should automatically throttle
> the IO, too; however, this might not be true for certain write
> patterns, and in particular async IO (how could we possibly throttle
> _that_?). IO throttling in this case remains a problem which we might
> need to address.)

This is the problem I am referring to.

> 
> I guess you get what you pay for: The arrays which _do_ have this
> misbehaviour _will_ be problematic in certain configurations; putting
> swap on them comes to mind.
> 
> As this allows EMC and other vendors to sell their higher end arrays,
> I can't see how you could possibly complain ;-)
> 
> I stand by my point that any array which does have this behaviour does
> not qualify as high-end storage.
> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>
> 
> -- 
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business
> 


end of thread, other threads:[~2005-05-03 11:11 UTC | newest]

Thread overview: 17+ messages
2005-04-19 17:19 fastfail operation and retries Andreas Herrmann
2005-04-21 16:42 ` Patrick Mansfield
2005-04-21 19:54   ` Lars Marowsky-Bree
2005-04-21 22:13     ` Patrick Mansfield
2005-04-21 22:52       ` Lars Marowsky-Bree
2005-04-22  0:22         ` Patrick Mansfield
  -- strict thread matches above, loose matches on Subject: below --
2005-04-21 21:02 goggin, edward
2005-04-21 21:31 goggin, edward
2005-04-21 21:49 ` Lars Marowsky-Bree
2005-04-21 21:33 [dm-devel] " Andreas Herrmann
2005-04-21 22:24 ` Lars Marowsky-Bree
2005-04-22 19:13   ` Lan
2005-04-25 23:56     ` [dm-devel] " Tim Pepper
2005-04-27 14:44       ` Lars Marowsky-Bree
2005-04-27 22:57         ` Tim Pepper
2005-05-03 11:11           ` Lars Marowsky-Bree
2005-04-26  9:55     ` Lars Marowsky-Bree
2005-04-21 22:01 goggin, edward
2005-04-21 22:16 ` Lars Marowsky-Bree
