public inbox for linux-scsi@vger.kernel.org
* Re: [dm-devel] Re: fastfail operation and retries
@ 2005-04-21 21:33 Andreas Herrmann
  2005-04-21 21:38 ` David S. Miller
  2005-04-21 22:24 ` Lars Marowsky-Bree
  0 siblings, 2 replies; 10+ messages in thread
From: Andreas Herrmann @ 2005-04-21 21:33 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, Linux SCSI, linux-scsi-owner

On 2005-04-21 21:54, Lars Marowsky-Bree <lmb@suse.de> wrote:
 
> On 2005-04-21T09:42:05, Patrick Mansfield <patmans@us.ibm.com> wrote:

> > On Tue, Apr 19, 2005 at 07:19:53PM +0200, Andreas Herrmann wrote:

  <snip>

> > 
> > We need a patch like Mike Christie had, this:
> > 
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=107961883914541&w=2
> > 
> > The scsi core should decode the sense data and pass up the result, then dm
> > need not decode sense data, and we don't need sense data passed around via
> > the block layer.

> The most recent udm patchset has a patch by Jens Axboe and myself to
> pass up sense data / error codes in the bio so the dm mpath module can
> deal with it. 

> Only issue still is that the SCSI midlayer does only generate a single
> "EIO" code also for timeouts; however, that pretty much means it's a
> transport error, because if it was a media error, we'd be getting sense
> data ;-)

Well, there are various situations when all paths to the ESS are
"temporarily unavailable". In some cases TASK_SET_FULL/BUSY is
reported as it should be. In other cases we just encounter data
underruns or exchange sequences are aborted and finally it might be
that requests just time out. BTW, it is not only ESS where I have seen
such (broken) behaviour.

> Together with the "queue_if_no_path" feature flag for dm-mpath that
> should do what you need to handle this (arguably broken) array
> behaviour: It'll queue until the error goes away and multipathd retests
> and reactivates the paths. That ought to work, but given that I don't
> have an IBM ESS accessible, please confirm that.

Sounds good. Will make some tests using the "queue_if_no_path" feature.
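
To make the expected behaviour concrete, here is a minimal sketch of the
"queue_if_no_path" semantics described above. It is a toy user-space model,
not dm-mpath code: the class ToyMultipath and its methods are hypothetical
names invented for illustration, and path selection is simplified to "first
active path in sorted order".

```python
from collections import deque


class ToyMultipath:
    """Toy model of queue_if_no_path semantics (hypothetical, not kernel code)."""

    def __init__(self, paths, queue_if_no_path=True):
        self.active = set(paths)          # paths currently considered usable
        self.queue_if_no_path = queue_if_no_path
        self.queued = deque()             # I/O held back while no path is usable
        self.completed = []               # (io, path) pairs that were dispatched
        self.failed = []                  # I/O failed back to the caller

    def submit(self, io):
        if self.active:
            # Simplified path selection: first active path in sorted order.
            self.completed.append((io, sorted(self.active)[0]))
        elif self.queue_if_no_path:
            # No usable path: hold the I/O instead of returning an error.
            self.queued.append(io)
        else:
            self.failed.append(io)

    def fail_path(self, path):
        self.active.discard(path)

    def reinstate_path(self, path):
        # Models the path checker finding a path usable again:
        # mark it active, then resubmit everything that was queued.
        self.active.add(path)
        while self.queued:
            self.submit(self.queued.popleft())


mp = ToyMultipath(["sda", "sdb"])
mp.fail_path("sda")
mp.fail_path("sdb")
mp.submit("write-1")        # queued, not failed, because no path is active
mp.reinstate_path("sdb")    # queued I/O is dispatched on the reinstated path
```

In this model the queued I/O only makes progress once something calls
reinstate_path(), which is the role multipathd plays in the description above.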

> It is possible that to fully support them a dm mpath hardware handler
> (like for the EMC CX family) might be required, too.

For the time being I hope the "queue_if_no_path" feature is sufficient
to successfully pass our tests ;-)

> (For easier testing, you'll find that all this functionality is
> available in the latest SLES9 SP2 betas, to which you ought to have
> access at IBM, and the kernels are also available via
> ftp://ftp.suse.com/pub/projects/kernel/kotd/.)

> > scsi core could be changed to handle device specific decoding via sense
> > tables that can be modified via sysfs, similar to devinfo code (well,
> > devinfo still lacks a sysfs interface).

> dm-path's capabilities go a bit beyond just the error decoding (which
> for generic devices is also provided for in a generic
> dm_scsi_err_handler()); for example you can code special initialization
> commands and behaviour an array might need.

> Maybe this could indeed be abstracted further to download the command
> and/or specific decoding tables from user-space via sysfs or configfs by
> a generic user-space customizable dm-hw-handler-generic.[ch] plugin; I
> think patches are being accepted ;-)

Thanks for the information.


Regards,

Andreas


* RE: Re: fastfail operation and retries
@ 2005-04-21 21:02 goggin, edward
  2005-04-21 21:18 ` [dm-devel] " Lars Marowsky-Bree
  0 siblings, 1 reply; 10+ messages in thread
From: goggin, edward @ 2005-04-21 21:02 UTC (permalink / raw)
  To: 'Lars Marowsky-Bree', device-mapper development,
	Andreas Herrmann
  Cc: Linux SCSI

On Thursday, April 21, 2005 3:55 PM,  Lars Marowsky-Bree wrote:
> Together with the "queue_if_no_path" feature flag for dm-mpath that
> should do what you need to handle this (arguably broken) array
> behaviour: It'll queue until the error goes away and multipathd retests
> and reactivates the paths. That ought to work, but given that I don't
> have an IBM ESS accessible, please confirm that.

Depending on the "queue_if_no_path" feature currently has the undesirable
side effect of requiring the user space multipath components to intervene and
reinstate at least one of the paths to a usable state in the multipath target
driver.  This dependency creates the potential for deadlock scenarios, since
neither the user space multipath components nor the kernel, for that matter,
is currently architected to avoid them.

I think for now it may be better to try to avoid having to fail a path if it
is possible that an io error is not path related.
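
As a rough illustration of that idea, the sketch below separates errors that
look path related from those that do not. It is only a sketch under stated
assumptions: the function classify_failed_io() and the Action names are
hypothetical, and the policy is one possible reading of this thread, not the
behaviour of the SCSI midlayer or dm-mpath. The only non-hypothetical values
are the standard SCSI status codes (BUSY, TASK SET FULL) and sense keys
(MEDIUM ERROR, ILLEGAL REQUEST).

```python
from enum import Enum
from typing import Optional

# Standard SCSI status codes.
BUSY = 0x08
TASK_SET_FULL = 0x28

# Standard SCSI sense keys.
MEDIUM_ERROR = 0x03
ILLEGAL_REQUEST = 0x05


class Action(Enum):
    RETRY_SAME_PATH = "retry_same_path"  # device is busy, path is fine
    FAIL_IO = "fail_io"                  # another path would not help
    FAIL_PATH = "fail_path"              # looks like a transport problem


def classify_failed_io(scsi_status: int, sense_key: Optional[int]) -> Action:
    """Hypothetical policy; call only for I/O that has already failed."""
    if scsi_status in (BUSY, TASK_SET_FULL):
        # The target answered, so the path itself works.
        return Action.RETRY_SAME_PATH
    if sense_key is None:
        # No sense data (e.g. a timeout): assume a transport error.
        return Action.FAIL_PATH
    if sense_key in (MEDIUM_ERROR, ILLEGAL_REQUEST):
        # The device reported the problem itself; it is not path related.
        return Action.FAIL_IO
    # Assumption: other sense keys are treated as retryable on the same path.
    return Action.RETRY_SAME_PATH
```

With a policy of this shape, a path is failed only when nothing suggests the
error came from the device itself, which reduces how often all paths end up
failed and "queue_if_no_path" has to take over.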

