Re: [dm-devel] Re: fastfail operation and retries

public inbox for linux-scsi@vger.kernel.org

* Re: [dm-devel] Re: fastfail operation and retries
  2005-04-21 21:02 goggin, edward
@ 2005-04-21 21:18 ` Lars Marowsky-Bree
  0 siblings, 0 replies; 10+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-21 21:18 UTC (permalink / raw)
  To: device-mapper development, Andreas Herrmann; +Cc: Linux SCSI

On 2005-04-21T17:02:44, "goggin, edward" <egoggin@emc.com> wrote:

> Depending on the "queue_if_no_path" feature has the current undesirable
> side-effect of requiring intervention of the user space multipath components
> to reinstate at least one of the paths to a useable state in the multipath
> target driver.  This dependency currently creates the potential for deadlock
> scenarios since neither the user space multipath components nor the kernel
> (for that matter) are currently architected to avoid them.

multipath-tools is, to a certain degree, architected to avoid them. And
the kernel is meant to be, too - there's bugs and known FIXME's, but
those are just bugs and we're taking patches gladly ;-)

> I think for now it may be better to try to avoid having to fail a path if it
> is possible that an io error is not path related.

No. Basically every time out error creates a "dunno why" error right now
- could be the storage system itself, could be the network in between.

A failover to another path is the obvious remedy; take for example the
CX series where even if it's not the path, it's the SP, and failing over
to the other SP will cure the problem.

If the storage at least rejects the IO with a specific error code, it
can be worked around by a specific hw handler which doesn't fail the
path but just causes the IO to be queued and retried; that's a pretty
simple hardware handler to write.
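A rough sketch of the decision such a handler would make, written in Python only for readability (a real dm-mpath hardware handler is kernel C). Everything here is an assumption for illustration: the `Action` names, the `classify()` helper, and the table of "array is busy" sense codes are hypothetical and not taken from any vendor spec or from the dm code.

```python
from enum import Enum


class Action(Enum):
    FAIL_PATH = "fail_path"    # mark the path failed, retry the IO elsewhere
    REQUEUE = "requeue"        # keep the path, queue the IO and retry later
    REPORT_ERROR = "report"    # pass the error up, the path stays usable


# Hypothetical table of (sense key, ASC, ASCQ) triples that an array is
# assumed to return while it temporarily rejects IO. Illustrative only.
ARRAY_BUSY_CODES = {(0x02, 0x04, 0x01)}  # NOT READY, becoming ready

MEDIUM_ERROR = 0x03  # SCSI sense key for media errors


def classify(sense, timed_out=False):
    """Decide what to do with a failed IO.

    sense is a (key, asc, ascq) tuple, or None if no sense data came back.
    """
    if sense is None or timed_out:
        # "dunno why" error: assume the transport and fail over
        return Action.FAIL_PATH
    if sense in ARRAY_BUSY_CODES:
        # the array said "not now": do not fail the path, just retry
        return Action.REQUEUE
    if sense[0] == MEDIUM_ERROR:
        # media problem: another path would not help
        return Action.REPORT_ERROR
    # anything else is treated as path-related in this sketch
    return Action.FAIL_PATH
```

The point is only that the array-specific part boils down to a small lookup table; the default branch for unknown sense data is a choice made for this sketch.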

But quite frankly, storage subsystems which _reject_ all IO for a given
time are just broken for reliable configurations. What good are they in
multipath configurations if they fail _all_ paths at the same time? How
can they even dare claim redundancy? We can build more or less smelly
kludges around them, but it remains a problem to be fixed at the storage
subsystem level IMNSHO.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business


* Re: [dm-devel] Re: fastfail operation and retries
@ 2005-04-21 21:33 Andreas Herrmann
  2005-04-21 21:38 ` David S. Miller
  2005-04-21 22:24 ` Lars Marowsky-Bree
  0 siblings, 2 replies; 10+ messages in thread
From: Andreas Herrmann @ 2005-04-21 21:33 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, Linux SCSI, linux-scsi-owner

On 21.04.2005 21:54, Lars Marowsky-Bree <lmb@suse.de> wrote:

> On 2005-04-21T09:42:05, Patrick Mansfield <patmans@us.ibm.com> wrote:

> > On Tue, Apr 19, 2005 at 07:19:53PM +0200, Andreas Herrmann wrote:

  <snip>

> > 
> > We need a patch like Mike Christie had, this:
> > 
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=107961883914541&w=2
> > 
> > The scsi core should decode the sense data and pass up the result, then dm
> > need not decode sense data, and we don't need sense data passed around via
> > the block layer.

> The most recent udm patchset has a patch by Jens Axboe and myself to
> pass up sense data / error codes in the bio so the dm mpath module can
> deal with it. 

> Only issue still is that the SCSI midlayer does only generate a single
> "EIO" code also for timeouts; however, that pretty much means it's a
> transport error, because if it was a media error, we'd be getting sense
> data ;-)

Well, there are various situations when all paths to the ESS are
"temporarily unavailable". In some cases TASK_SET_FULL/BUSY is
reported as it should be. In other cases we just encounter data
underruns or exchange sequences are aborted and finally it might be
that requests just time out. BTW, it is not only ESS where I have seen
such (broken) behaviour.

> Together with the "queue_if_no_path" feature flag for dm-mpath that
> should do what you need to handle this (arguably broken) array
> behaviour: It'll queue until the error goes away and multipathd retests
> and reactivates the paths. That ought to work, but given that I don't
> have an IBM ESS accessible, please confirm that.

Sounds good. Will make some tests using the "queue_if_no_path" feature.

> It is possible that to fully support them a dm mpath hardware handler
> (like for the EMC CX family) might be required, too.

For the time being I hope "queue_if_no_path" feature is sufficient
to successfully pass our tests ;-)

> (For easier testing, you'll find that all this functionality is
> available in the latest SLES9 SP2 betas, to which you ought to have
> access at IBM, and the kernels are also available via
> ftp://ftp.suse.com/pub/projects/kernel/kotd/.)

> > scsi core could be changed to handle device specific decoding via sense
> > tables that can be modified via sysfs, similar to devinfo code (well,
> > devinfo still lacks a sysfs interface).

> dm-mpath's capabilities go a bit beyond just the error decoding (which
> for generic devices is also provided for in a generic
> dm_scsi_err_handler()); for example you can code special initialization
> commands and behaviour an array might need.

> Maybe this could indeed be abstracted further to download the command
> and/or specific decoding tables from user-space via sysfs or configfs by
> a generic user-space customizable dm-hw-handler-generic.[ch] plugin; I
> think patches are being accepted ;-)

Thanks for the information.


Regards,

Andreas


* Re: [dm-devel] Re: fastfail operation and retries
  2005-04-21 21:33 [dm-devel] Re: fastfail operation and retries Andreas Herrmann
@ 2005-04-21 21:38 ` David S. Miller
  2005-04-21 22:24 ` Lars Marowsky-Bree
  1 sibling, 0 replies; 10+ messages in thread
From: David S. Miller @ 2005-04-21 21:38 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: lmb, dm-devel, linux-scsi, linux-scsi-owner

Please don't add "linux-scsi-owner" to the CC: list like that.
That goes to the list administrator (currently me), not the
linux-scsi mailing list.

There seems to be a rather prominent influx of people sending
posts to the *-owner address lately, I wonder why as nothing has
materially changed in the outgoing headers.

* Re: Re: fastfail operation and retries
  2005-04-21 21:33 [dm-devel] Re: fastfail operation and retries Andreas Herrmann
  2005-04-21 21:38 ` David S. Miller
@ 2005-04-21 22:24 ` Lars Marowsky-Bree
  2005-04-22 19:13   ` Lan
  1 sibling, 1 reply; 10+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-21 22:24 UTC (permalink / raw)
  To: device-mapper development; +Cc: Linux SCSI, aherrman

On 2005-04-21T23:33:57, Andreas Herrmann <aherrman@de.ibm.com> wrote:

> Well, there are various situations when all paths to the ESS are
> "temporarily unavailable". In some cases TASK_SET_FULL/BUSY is
> reported as it should be.

Not sure whether this sense data is decoded and handled correctly in
dm-mpath yet. I don't have detailed specs, nor a feature request to
allocate time to work on making sure it really does. I recommend that
someone at IBM takes the real specs for the ESS and makes sure that it
all works, by a combination of the right defaults in the multipath-tools
hwtable and, if need be, a dm-ess plugin to handle this.

This would be much appreciated.

> underruns or exchange sequences are aborted and finally it might be
> that requests just time out. BTW, it is not only ESS where I have seen
> such (broken) behaviour.

Well, what can I say. Broken behaviour needs to be documented and worked
around, but obviously only as far as that is possible.

> > It is possible that to fully support them a dm mpath hardware handler
> > (like for the EMC CX family) might be required, too.
> For the time being I hope "queue_if_no_path" feature is sufficient
> to successfully pass our tests ;-)

If it is sufficient, you might at least wish to update the
multipath-tools hwtable entry so that it is automagically set for your
arrays.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

* Re: Re: fastfail operation and retries
  2005-04-21 22:24 ` Lars Marowsky-Bree
@ 2005-04-22 19:13   ` Lan
  2005-04-25 23:56     ` [dm-devel] " Tim Pepper
  2005-04-26  9:55     ` Lars Marowsky-Bree
  0 siblings, 2 replies; 10+ messages in thread
From: Lan @ 2005-04-22 19:13 UTC (permalink / raw)
  To: device-mapper development; +Cc: Linux SCSI, aherrman

On 4/21/05, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2005-04-21T23:33:57, Andreas Herrmann <aherrman@de.ibm.com> wrote:
> 
> > Well, there are various situations when all paths to the ESS are
> > "temporarily unavailable". In some cases TASK_SET_FULL/BUSY is
> > reported as it should be.
> 
> Not sure whether this sense data is decoded and handled correctly in
> dm-mpath yet. I don't have detailed specs, nor a feature request to
> allocate time to work on making sure it really does. I recommend that
> someone at IBM takes the real specs for the ESS and makes sure that it
> all works, by a combination of the right defaults in the multipath-tools
> hwtable and, if need be, a dm-ess plugin to handle this.
> 
> This would be much appreciated.
>

Please correct me if my assumption is wrong, but I would think that
transient errors are expected, especially in a SAN, from both the
fabric and media. A storage device may have to return retryable status
conditions at certain points, and such retryable conditions are not
necessarily specific to a storage device. For example, a QUEUE_FULL or
BUSY implies that the device is congested. Wouldn't most storage
devices reasonably expect that I/O which failed due to this condition
will be retried? [Such a congestion handling mechanism, I would think,
would not have to be storage-specific, although the policy for
handling congestion might be?]  So in order to deal with transient
conditions given that the failfast flag is set, queue_if_no_path must
be used; I'm not sure why any dm-multipath storage users would not
want to turn on queue_if_no_path by default?

As far as I know, ESS does not require any special handling of special
sense information, besides various sense data status conditions that
it expects would be retried. (Aren't data underruns also an expected
retryable condition?)  I'm not so familiar with all the various
possible transport and media errors/conditions, but I would think that
most could/would want to be handled generically by storage devices
(which is why the scsi core has generic error handling i'd imagine).
But I agree that more testing should be done with ESS and its spec to
verify that a special dm-ess error handler is actually not needed. 
And at the least, a hw entry should be added to dm to turn on
queue_if_no_path by default for ESS, and any other necessary defaults.
Although, it seems we need to add to multipath-tools the ability to set
a timeout limit on how long an I/O is queued and retried (otherwise in
a permanent failure, I think the I/O could be queued for quite
a while, e.g. until the system runs out of memory).

Also, what do you think about allowing a configurable threshold on I/O
failures in dm-multipath before deciding to set a path dead; 1 is
kinda low, and has no tolerance at all for transient errors. I think
it will lessen the dependency on waiting for multipath-tools to
reinstate a path that has been set dead due to a transient condition.
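To make the threshold idea concrete, here is a minimal sketch in Python (the real logic would live in the dm-multipath kernel code). The `PathState` class, its `threshold` parameter, and the choice to count consecutive errors rather than errors in a time window are all assumptions made for illustration; none of this exists in dm-multipath as described in this thread.

```python
class PathState:
    """Tracks consecutive IO errors on one path (hypothetical helper)."""

    def __init__(self, name, threshold=3):
        self.name = name
        self.threshold = threshold      # errors tolerated before failing
        self.consecutive_errors = 0
        self.failed = False

    def io_done(self, ok):
        """Record one IO completion; return True if the path is now failed."""
        if ok:
            # a successful IO suggests the earlier errors were transient
            self.consecutive_errors = 0
        else:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.threshold:
                self.failed = True
        return self.failed


# Usage: with threshold=3, two errors followed by a success do not fail
# the path; three errors in a row do.
path = PathState("path0", threshold=3)
for ok in (False, False, True, False, False, False):
    path.io_done(ok)
print(path.failed)
```

With `threshold=1` this reduces to failing the path on the first error, which is the behaviour being questioned above.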

Thanks!
Lan

* Re: [dm-devel] Re: fastfail operation and retries
  2005-04-22 19:13   ` Lan
@ 2005-04-25 23:56     ` Tim Pepper
  2005-04-27 14:44       ` Lars Marowsky-Bree
  2005-04-26  9:55     ` Lars Marowsky-Bree
  1 sibling, 1 reply; 10+ messages in thread
From: Tim Pepper @ 2005-04-25 23:56 UTC (permalink / raw)
  To: tranlan; +Cc: device-mapper development, Linux SCSI, aherrman

On 4/22/05, Lan <transter@gmail.com> wrote:
>
> queue_if_no_path must be used; I'm not sure why any dm-multipath
> storage users would not want to turn on queue_if_no_path by default?

What protection is there against long term queueing and running the
machine out of memory?

* Re: Re: fastfail operation and retries
  2005-04-22 19:13   ` Lan
  2005-04-25 23:56     ` [dm-devel] " Tim Pepper
@ 2005-04-26  9:55     ` Lars Marowsky-Bree
  1 sibling, 0 replies; 10+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-26  9:55 UTC (permalink / raw)
  To: tranlan, device-mapper development; +Cc: Linux SCSI, aherrman

On 2005-04-22T12:13:53, Lan <transter@gmail.com> wrote:

> Although, it seems we need to add to multipath-tools the ability to set
> a timeout limit on how long an I/O is queued and retried (otherwise in
> a permanent failure, I think the I/O could be queued for quite
> a while, e.g. until the system runs out of memory).

This can actually be implemented in user-space. If the paths stay down
for N seconds, remove the queue_if_no_path feature flag, and all IO will
be failed.
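A minimal sketch of that user-space timeout, assuming a daemon that periodically learns how many paths are usable. The `QueueTimeout` class and its `stop_queueing` callback are hypothetical names; how the callback actually drops the queue_if_no_path flag (for example by reloading the table without it) is left out and is not something this sketch prescribes.

```python
import time


class QueueTimeout:
    """Stop queueing once all paths have been down for limit_seconds."""

    def __init__(self, limit_seconds, stop_queueing, clock=time.monotonic):
        self.limit = limit_seconds
        self.stop_queueing = stop_queueing  # hypothetical callback
        self.clock = clock                  # injectable for testing
        self.down_since = None              # when the last path went away
        self.fired = False                  # callback already called?

    def poll(self, active_paths):
        """Call periodically with the current number of usable paths."""
        if active_paths > 0:
            # at least one path is back: reset the timer
            self.down_since = None
            self.fired = False
            return
        now = self.clock()
        if self.down_since is None:
            self.down_since = now
        elif not self.fired and now - self.down_since >= self.limit:
            # paths stayed down for N seconds: let queued IO fail
            self.stop_queueing()
            self.fired = True
```

Re-enabling queueing once a path returns would be a separate step that this sketch does not perform; it only resets its own timer.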

> Also, what do you think about allowing a configurable threshold on I/O
> failures in dm-multipath before deciding to set a path dead; 1 is
> kinda low, and has no tolerance at all for transient errors.

That might be a good idea. 

Note however that DM mpath already distinguishes between path failures
and media failures for example: A media failure will not cause a path to
be failed.

And there's also a trade-off: As long as the path is not failed, it'll
receive more IO. If the error turns out not to be transient, we have to
wait for that IO to fail as well, and it then has to be requeued and
retried somewhere else. This causes delays.

Failing the path on the first error potentially attributable to the
transport will cause an immediate retry on another path though; and if
it turns out to be a transient error, the path will be returned into
operation within a couple of seconds by user-space.

> I think it will lessen the dependency on waiting for multipath-tools
> to reinstate a path that has been set dead due to a transient
> condition.

True, but this is actually by current design, because we want to
redirect IO to healthy paths as quickly as possible.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

* Re: Re: fastfail operation and retries
  2005-04-25 23:56     ` [dm-devel] " Tim Pepper
@ 2005-04-27 14:44       ` Lars Marowsky-Bree
  2005-04-27 22:57         ` Tim Pepper
  0 siblings, 1 reply; 10+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-27 14:44 UTC (permalink / raw)
  To: Tim Pepper, device-mapper development, tranlan; +Cc: Linux SCSI, aherrman

On 2005-04-25T16:56:56, Tim Pepper <tpepper@gmail.com> wrote:

> > queue_if_no_path must be used; I'm not sure why any dm-multipath
> > storage users would not want to turn on queue_if_no_path by default?
> What protection is there against long term queueing and running the
> machine out of memory?

User-space needs to take action and tell us when to stop queuing.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

* Re: Re: fastfail operation and retries
  2005-04-27 14:44       ` Lars Marowsky-Bree
@ 2005-04-27 22:57         ` Tim Pepper
  2005-05-03 11:11           ` Lars Marowsky-Bree
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Pepper @ 2005-04-27 22:57 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, tranlan, Linux SCSI, aherrman

On 4/27/05, Lars Marowsky-Bree <lmb@suse.de> wrote:
> User-space needs to take action and tell us when to stop queuing.

Is there any risk of priority inversion?  I can't think of a specific
issue beyond the userspace daemon process simply not existing that
wouldn't hopefully settle out over time and I haven't looked closely
at this aspect of 2.6, but it used to be easy to get/keep the cpu busy
enough on flushing IO to disk to hurt userspace response times (fibre
pulls during heavy buffered, filesystem IO effectively DoSing the
machine for a long period).  If that sort of thing is still possible,
it seems risky relying on a userspace application for
timely/meaningful recovery of the resources consumed by the IO.

* Re: Re: fastfail operation and retries
  2005-04-27 22:57         ` Tim Pepper
@ 2005-05-03 11:11           ` Lars Marowsky-Bree
  0 siblings, 0 replies; 10+ messages in thread
From: Lars Marowsky-Bree @ 2005-05-03 11:11 UTC (permalink / raw)
  To: Tim Pepper, device-mapper development; +Cc: tranlan, Linux SCSI, aherrman

On 2005-04-27T15:57:09, Tim Pepper <tpepper@gmail.com> wrote:

> > User-space needs to take action and tell us when to stop queuing.
> Is there any risk of priority inversion?

That risk of course always exists (and it'd exist in the kernel too).
The code in question needs to be audited to make sure this case is
taken care of.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

"Ignorance more frequently begets confidence than does knowledge"
	 -- Charles Darwin
