* RE: Re: fastfail operation and retries
@ 2005-04-21 21:02 goggin, edward
2005-04-21 21:18 ` [dm-devel] " Lars Marowsky-Bree
0 siblings, 1 reply; 5+ messages in thread
From: goggin, edward @ 2005-04-21 21:02 UTC (permalink / raw)
To: 'Lars Marowsky-Bree', device-mapper development,
Andreas Herrmann
Cc: Linux SCSI
On Thursday, April 21, 2005 3:55 PM, Lars Marowsky-Bree wrote:
> Together with the "queue_if_no_path" feature flag for dm-mpath that
> should do what you need to handle this (arguably broken) array
> behaviour: It'll queue until the error goes away and multipathd retests
> and reactivates the paths. That ought to work, but given that I don't
> have an IBM ESS accessible, please confirm that.
Depending on the "queue_if_no_path" feature has the current undesirable
side-effect of requiring intervention of the user space multipath components
to reinstate at least one of the paths to a useable state in the multipath
target driver. This dependency currently creates the potential for deadlock
scenarios since neither the user space multipath components (nor the kernel for
that matter) are currently architected to avoid them.
I think for now it may be better to try to avoid having to fail a path if it
is possible that an io error is not path related.
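[Editorial sketch] The dependency described above can be illustrated with a toy model. This is a minimal sketch, not the dm-mpath implementation: the class MultipathMap and its methods are hypothetical names, and reinstate_path() merely stands in for the user space components reinstating a path. It shows that with "queue_if_no_path" set, queued IO makes no progress until something outside the map reinstates a path.

```python
from collections import deque


class MultipathMap:
    """Toy model of a multipath map (hypothetical, for illustration only)."""

    def __init__(self, paths, queue_if_no_path):
        self.active = set(paths)   # paths currently considered usable
        self.failed = set()        # paths failed after an IO error
        self.queue_if_no_path = queue_if_no_path
        self.queued = deque()      # IO held while no path is usable
        self.completed = []        # (io, path) pairs that were dispatched
        self.errored = []          # IO returned to the caller with an error

    def fail_path(self, path):
        self.active.discard(path)
        self.failed.add(path)

    def submit(self, io):
        if self.active:
            # Pick any usable path; sorted() only keeps the choice deterministic.
            self.completed.append((io, sorted(self.active)[0]))
        elif self.queue_if_no_path:
            # Held with no time limit until a path is reinstated.
            self.queued.append(io)
        else:
            self.errored.append(io)

    def reinstate_path(self, path):
        """Stands in for user space reinstating a path after a successful test."""
        self.failed.discard(path)
        self.active.add(path)
        while self.queued:
            self.submit(self.queued.popleft())
```

If whatever calls reinstate_path() is itself blocked waiting on one of the queued requests, nothing in this model ever drains the queue, which is the deadlock concern raised above.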
* Re: [dm-devel] Re: fastfail operation and retries
2005-04-21 21:02 Re: fastfail operation and retries goggin, edward
@ 2005-04-21 21:18 ` Lars Marowsky-Bree
0 siblings, 0 replies; 5+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-21 21:18 UTC (permalink / raw)
To: device-mapper development, Andreas Herrmann; +Cc: Linux SCSI
On 2005-04-21T17:02:44, "goggin, edward" <egoggin@emc.com> wrote:
> Depending on the "queue_if_no_path" feature has the current undesirable
> side-effect of requiring intervention of the user space multipath components
> to reinstate at least one of the paths to a useable state in the multipath
> target driver. This dependency currently creates the potential for deadlock
> scenarios since neither the user space multipath components (nor the kernel for
> that matter) are currently architected to avoid them.
multipath-tools is, to a certain degree, architected to avoid them. And
the kernel is meant to be, too - there's bugs and known FIXME's, but
those are just bugs and we're taking patches gladly ;-)
> I think for now it may be better to try to avoid having to fail a path if it
> is possible that an io error is not path related.
No. Basically every time out error creates a "dunno why" error right now
- could be the storage system itself, could be the network in between.
A failover to another path is the obvious remedy; take for example the
CX series where even if it's not the path, it's the SP, and failing over
to the other SP will cure the problem.
If the storage at least rejects the IO with a specific error code, it
can be worked around by a specific hw handler which doesn't fail the
path but just causes the IO to be queued and retried; that's a pretty
simple hardware handler to write.
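[Editorial sketch] The decision such a handler would make can be sketched as follows. This is only an illustration under stated assumptions: the names Action, ARRAY_TEMPORARILY_UNAVAILABLE and handle_error are hypothetical, not the kernel's hardware handler interface, and the error code is a placeholder for whatever specific code a given array returns.

```python
from enum import Enum


class Action(Enum):
    FAIL_PATH = "fail_path"   # mark the path failed and fail over
    REQUEUE = "requeue"       # keep the path, queue the IO and retry it later


# Hypothetical placeholder for an array-specific "rejecting IO for now" code.
ARRAY_TEMPORARILY_UNAVAILABLE = "array_temporarily_unavailable"


def handle_error(error_code):
    """Decide what to do with a failed IO, given the error the array reported."""
    if error_code == ARRAY_TEMPORARILY_UNAVAILABLE:
        # The array identified the condition itself, so the path is not at fault.
        return Action.REQUEUE
    # Anything unspecific (for example a timeout) is treated as a possible
    # path problem, so failing over remains the default remedy.
    return Action.FAIL_PATH
```

The default branch reflects the point made above: without a specific error code, a timeout cannot be attributed to the array rather than the path.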
But quite frankly, storage subsystems which _reject_ all IO for a given
time are just broken for reliable configurations. What good are they in
multipath configurations if they fail _all_ paths at the same time? How
can they even dare claim redundancy? We can build more or less smelly
kludges around them, but it remains a problem to be fixed at the storage
subsystem level IMNSHO.
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
* Re: [dm-devel] Re: fastfail operation and retries
@ 2005-04-21 21:33 Andreas Herrmann
2005-04-21 21:38 ` David S. Miller
2005-04-21 22:24 ` Lars Marowsky-Bree
0 siblings, 2 replies; 5+ messages in thread
From: Andreas Herrmann @ 2005-04-21 21:33 UTC (permalink / raw)
To: Lars Marowsky-Bree
Cc: device-mapper development, Linux SCSI, linux-scsi-owner
On 21.04.2005 21:54, Lars Marowsky-Bree <lmb@suse.de> wrote:
> On 2005-04-21T09:42:05, Patrick Mansfield <patmans@us.ibm.com> wrote:
> > On Tue, Apr 19, 2005 at 07:19:53PM +0200, Andreas Herrmann wrote:
<snip>
> >
> > We need a patch like Mike Christie had, this:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=107961883914541&w=2
> >
> > The scsi core should decode the sense data and pass up the result, then dm
> > need not decode sense data, and we don't need sense data passed around via
> > the block layer.
> The most recent udm patchset has a patch by Jens Axboe and myself to
> pass up sense data / error codes in the bio so the dm mpath module can
> deal with it.
> Only issue still is that the SCSI midlayer does only generate a single
> "EIO" code also for timeouts; however, that pretty much means it's a
> transport error, because if it was a media error, we'd be getting sense
> data ;-)
Well, there are various situations when all paths to the ESS are
"temporarily unavailable". In some cases TASK_SET_FULL/BUSY is
reported as it should be. In other cases we just encounter data
underruns or exchange sequences are aborted and finally it might be
that requests just time out. BTW, it is not only ESS where I have seen
such (broken) behaviour.
> Together with the "queue_if_no_path" feature flag for dm-mpath that
> should do what you need to handle this (arguably broken) array
> behaviour: It'll queue until the error goes away and multipathd retests
> and reactivates the paths. That ought to work, but given that I don't
> have an IBM ESS accessible, please confirm that.
Sounds good. Will make some tests using the "queue_if_no_path" feature.
> It is possible that to fully support them a dm mpath hardware handler
> (like for the EMC CX family) might be required, too.
For the time being I hope the "queue_if_no_path" feature is sufficient
to successfully pass our tests ;-)
> (For easier testing, you'll find that all this functionality is
> available in the latest SLES9 SP2 betas, to which you ought to have
> access at IBM, and the kernels are also available via
> ftp://ftp.suse.com/pub/projects/kernel/kotd/.)
> > scsi core could be changed to handle device specific decoding via sense
> > tables that can be modified via sysfs, similar to devinfo code (well,
> > devinfo still lacks a sysfs interface).
> dm-mpath's capabilities go a bit beyond just the error decoding (which
> for generic devices is also provided for in a generic
> dm_scsi_err_handler()); for example you can code special initialization
> commands and behaviour an array might need.
> Maybe this could indeed be abstracted further to download the command
> and/or specific decoding tables from user-space via sysfs or configfs by
> a generic user-space customizable dm-hw-handler-generic.[ch] plugin; I
> think patches are being accepted ;-)
Thanks for the information.
Regards,
Andreas
* Re: [dm-devel] Re: fastfail operation and retries
2005-04-21 21:33 Andreas Herrmann
@ 2005-04-21 21:38 ` David S. Miller
2005-04-21 22:24 ` Lars Marowsky-Bree
1 sibling, 0 replies; 5+ messages in thread
From: David S. Miller @ 2005-04-21 21:38 UTC (permalink / raw)
To: Andreas Herrmann; +Cc: lmb, dm-devel, linux-scsi, linux-scsi-owner
Please don't add "linux-scsi-owner" to the CC: list like that.
That goes to the list administrator (currently me), not the
linux-scsi mailing list.
There seems to be a rather prominent influx of people sending
posts to the *-owner address lately, I wonder why as nothing has
materially changed in the outgoing headers.
* Re: [dm-devel] Re: fastfail operation and retries
2005-04-22 19:13 ` Lan
@ 2005-04-25 23:56 ` Tim Pepper
0 siblings, 0 replies; 5+ messages in thread
From: Tim Pepper @ 2005-04-25 23:56 UTC (permalink / raw)
To: tranlan; +Cc: device-mapper development, Linux SCSI, aherrman
On 4/22/05, Lan <transter@gmail.com> wrote:
>
> queue_if_no_path must be used; I'm not sure why any dm-multipath
> storage users would not want to turn on queue_if_no_path by default?
What protection is there against long term queueing and running the
machine out of memory?
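[Editorial sketch] The messages shown here do not answer this question. Purely to make the concern concrete, one conceivable guard is a queue bounded in both size and age. This is a hypothetical sketch, not a description of what dm-mpath or multipath-tools do; BoundedNoPathQueue, max_items and max_age_seconds are invented names.

```python
from collections import deque


class BoundedNoPathQueue:
    """Hypothetical queue that refuses to grow or to hold IO without limit."""

    def __init__(self, max_items, max_age_seconds):
        self.max_items = max_items
        self.max_age_seconds = max_age_seconds
        self.items = deque()   # (enqueue_time, io), oldest first

    def enqueue(self, io, now):
        """Queue the IO, or return False so the caller can fail it instead."""
        if len(self.items) >= self.max_items:
            return False
        self.items.append((now, io))
        return True

    def expire(self, now):
        """Remove and return IO that has waited longer than the age limit."""
        expired = []
        while self.items and now - self.items[0][0] > self.max_age_seconds:
            expired.append(self.items.popleft()[1])
        return expired
```

The caller passes the current time explicitly, so the sketch needs no clock of its own; expired or refused IO would be returned to the submitter with an error rather than held.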