linux-scsi.vger.kernel.org archive mirror
* mpath: don't fail paths on first error
@ 2008-06-05 17:25 Mike Christie
  2008-06-06 14:18 ` Hannes Reinecke
  0 siblings, 1 reply; 2+ messages in thread
From: Mike Christie @ 2008-06-05 17:25 UTC
  To: device-mapper development, SCSI Mailing List

A problem we see a lot at Red Hat is that drivers fail a command with
DID_BUS_BUSY or DID_ERROR for something like an underrun, or even for
transient path problems. We can normally recover from these pretty
quickly, so we do not need to switch path groups.

queue_if_no_path/no_path_retry will prevent IO from being failed upwards,
but just switching paths can cause a lot of strain on the target, so we 
might want to prevent path switching when we do not need to. If we are 
using a box that requires manual failover or a box that does not use 
manual failover but still has to shift resources between storage 
controllers when switching paths, we most likely do not want to mark 
paths failed for these transient errors.

The attached patch allows us to wait X seconds before marking a path as
failed. If we do not see an IO complete successfully within X seconds of
the first IO error, we mark the path as failed. This patch works best
with the fail fast enhancements, where for a lot of path problems the
fast io fail / recovery timeout will fail IO to us quickly and the test
IOs do not get stuck, and where some errors like DID_ERROR are not even
failed fast.
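
A minimal sketch of the grace-period idea in plain C, not taken from the
attached patch. The names path_state, path_io_done and grace_secs are
hypothetical, and the check runs only when an IO completes, whereas a
real implementation would more likely use a timer:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-path state; not the structures used by dm-mpath. */
struct path_state {
    bool error_pending; /* an IO error was seen and no IO has succeeded since */
    long first_error;   /* time in seconds of the first error in this window */
    bool failed;        /* the path has been marked failed */
};

/*
 * Record one completed IO on a path.
 * ok         - true if the IO completed successfully
 * now        - current time in seconds
 * grace_secs - the "X seconds" to wait before marking the path failed
 */
static void path_io_done(struct path_state *p, bool ok, long now,
                         long grace_secs)
{
    assert(grace_secs >= 0);

    if (p->failed)
        return;

    if (ok) {
        /* A successful IO ends the error window. */
        p->error_pending = false;
        return;
    }

    if (!p->error_pending) {
        /* First error: start the window instead of failing the path. */
        p->error_pending = true;
        p->first_error = now;
    }

    /* Still seeing errors X seconds after the first one: fail the path. */
    if (now - p->first_error >= grace_secs)
        p->failed = true;
}
```

With grace_secs set to 0 the sketch marks the path failed on the first
error.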

The patch should apply over Linus's tree or scsi-misc.

[-- Attachment #2: 0001-dm-mpath-don-t-fail-paths-on-first-error.patch --]
[-- Type: text/x-patch; name="0001-dm-mpath-don-t-fail-paths-on-first-error.patch", Size: 0 bytes --]

* Re: mpath: don't fail paths on first error
  2008-06-05 17:25 mpath: don't fail paths on first error Mike Christie
@ 2008-06-06 14:18 ` Hannes Reinecke
  0 siblings, 0 replies; 2+ messages in thread
From: Hannes Reinecke @ 2008-06-06 14:18 UTC
  To: Mike Christie; +Cc: device-mapper development, SCSI Mailing List

Hi Mike,

Mike Christie wrote:
> A problem we see a lot at Red Hat is that drivers fail a command with
> DID_BUS_BUSY or DID_ERROR for something like an underrun, or even for
> transient path problems. We can normally recover from these pretty
> quickly, so we do not need to switch path groups.
> 
Yeah, I thought about this, too.
> queue_if_no_path/no_path_retry will prevent IO from being failed upwards,
> but just switching paths can cause a lot of strain on the target, so we 
> might want to prevent path switching when we do not need to. If we are 
> using a box that requires manual failover or a box that does not use 
> manual failover but still has to shift resources between storage 
> controllers when switching paths, we most likely do not want to mark 
> paths failed for these transient errors.
> 
Well, the original design idea was that it will always be quicker or
less error-prone to just move the I/O to the next path.
Seeing that this is not always the case, this approach is probably
better.

> The attached patch allows us to wait X seconds before marking a path as
> failed. If we do not see an IO complete successfully within X seconds of
> the first IO error, we mark the path as failed. This patch works best
> with the fail fast enhancements, where for a lot of path problems the
> fast io fail / recovery timeout will fail IO to us quickly and the test
> IOs do not get stuck, and where some errors like DID_ERROR are not even
> failed fast.
> 
> The patch should apply over Linus's tree or scsi-misc.
> 
Thanks for this, Mike.

Signed-off-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)