public inbox for linux-scsi@vger.kernel.org
* Summary of the Multi-Path BOF at OLS and future directions
@ 2003-08-05  3:54 James Bottomley
  2003-08-05 16:48 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: James Bottomley @ 2003-08-05  3:54 UTC (permalink / raw)
  To: SCSI Mailing List

[-- Attachment #1: Type: text/plain, Size: 3690 bytes --]

Hi All,

For those of you who couldn't attend OLS, I thought a short summary of
what went on might be useful.

Multi-path was a hot topic throughout both the Kernel Summit and OLS.
Things began with a requirements input panel of vendors identifying
multi-path as one of their primary problems, followed by an invited
discussion with Lars Marowsky-Brée and Mike Anderson on multi-path. At
OLS, there was a paper presentation by Mike and Patrick Mansfield on the
IBM SCSI layer multi-pathing solution and finally there was the BOF
session which tried to pick a way forwards for us in 2.6/2.7.

What I'd like to summarise is what I think the conclusions we reached
are:

1. Multi-path is relevant to more layers of the I/O stack than just
SCSI. Thus, it makes sense to do it at the layer just above bio.  This
would either be md/multipath or the Device Mapper multi-path module.

2. Doing multi-path at that level is not easy without fast failure
indications.

2a. On discussion of this, it was decided that on each bio/request, the
upper layers would like to indicate which failures they wish to be fast
and which they wish not to know about.  The two principal ones were
transport errors (relevant to multi-path) and medium errors (relevant to
software raid).

2b. Upwards, on fast failure, we would send back the raw sense data
(probably encoded in the sense request) plus a translated indication of
what the problem was.  The translations would probably be a combination
of (fatal|retryable) and (driver error (card out of
resources/failure)|transport error|medium error).

3.  It was noted that symmetric active multi-path in this scheme is not
possible without the ability to place a proper elevator above the
multi-pathing driver (and have a simple queue only noop elevator
below).  This should help alleviate the current fragmentation issues
where symmetric active multi-path produces I/O in decidedly non-optimal
page sized chunks.

4. Configuration of this solution would be extremely important.  The
idea here is to rely on the udev solution currently making its way into
the kernel and essentially have a vendor specific multi-path
configuration as a udev plug-in.

5. Vendor value add for specific devices could be encoded both as
configuration (udev) pieces and plug-ins to the upper layer multi-path
driver to activate any proprietary vendor specific configuration options
that may be needed for specific solutions.

6. Ownership.  This wasn't exactly discussed, but in light of the
problems with even SCSI-3 reservations, it is becoming clear that
storage ownership in a multi-path configuration is getting impossible to
maintain from user level.  Therefore, I at least will be giving thought
to an ownership API that could be used to manage storage ownership from
the kernel in the face of path fail overs.
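
As a rough illustration of points 2a and 2b above, the per-request
fast fail mask and the translated failure indication could be sketched
in user-space C as follows.  Every name here (the ff_ prefix, the enums,
the struct and the helper) is hypothetical and does not exist in the
kernel, and the rule that a fatal error is always reported immediately
is an assumption of this sketch, not something decided at the BOF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical failure kinds, taken from the list in point 2b. */
enum ff_kind {
	FF_DRIVER    = 1 << 0,	/* driver error: card out of resources/failure */
	FF_TRANSPORT = 1 << 1,	/* transport error, relevant to multi-path */
	FF_MEDIUM    = 1 << 2,	/* medium error, relevant to software raid */
};
#define FF_ALL (FF_DRIVER | FF_TRANSPORT | FF_MEDIUM)

enum ff_severity { FF_RETRYABLE, FF_FATAL };

/* Translated indication that would travel upwards with the raw sense data. */
struct ff_result {
	enum ff_severity severity;
	enum ff_kind kind;
};

/*
 * Decide whether the lower layer should fail the request upwards at once.
 * fast_mask is the set of kinds the upper layer asked to hear about
 * quickly (point 2a).  Assumption: fatal errors are never retried.
 */
bool ff_should_fail_fast(unsigned int fast_mask, struct ff_result res)
{
	assert((fast_mask & ~FF_ALL) == 0);	/* only known kinds may be set */
	if (res.severity == FF_FATAL)
		return true;
	return (fast_mask & res.kind) != 0;
}
```

Under these assumptions a multi-path driver would pass FF_TRANSPORT in
the mask and software raid would pass FF_MEDIUM, so each one sees only
the retryable failures it can act on.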

As far as the beginnings of implementation go, we already have
md/multi-path.  Joe Thornber of Sistina will shortly be releasing the
code to do multi-path over the device mapper interface, and our trusty
block layer maintainer, Jens Axboe, has done the skeleton of a fast fail
infrastructure for us (in 2.6.0-test2).  The attached patch should add
the fast fail capability to SCSI (although without the upwards/downwards
failure indications) and we should be able to build the rest of the
infrastructure on this framework.
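
The retry decision that the attached patch changes in scsi_error.c can
be modelled in user space roughly as below.  This is a simplified
sketch, not the kernel code: struct cmd_model, its noretry field (which
stands in for blk_noretry_request(scmd->request)) and the FAIL_UPWARDS
value are made up for illustration, and the earlier requeue for
QUEUE_FULL or BUSY is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* NEEDS_RETRY mirrors the name in the patch; FAIL_UPWARDS is hypothetical. */
enum decision { NEEDS_RETRY, FAIL_UPWARDS };

/* Hypothetical stand-in for the fields of the command that matter here. */
struct cmd_model {
	int retries;	/* retries performed so far */
	int allowed;	/* retries permitted for this command */
	bool noretry;	/* models blk_noretry_request(scmd->request) */
};

/*
 * Same condition as the patched maybe_retry path: retry only while the
 * retry budget is not exhausted and the request is not marked fast fail.
 */
enum decision maybe_retry(struct cmd_model *c)
{
	assert(c != NULL);
	if (++c->retries < c->allowed && !c->noretry)
		return NEEDS_RETRY;
	return FAIL_UPWARDS;
}
```

A command with noretry set is therefore reported upwards on its first
retryable error, however large its retry budget is.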

As far as errors and omissions go, I found KS/OLS to go rather fast and
be a bit blurry, so hopefully those who were also present can chime in
on this thread to amplify/correct the points I actually managed to grasp
and summarise the ones I missed.

Thanks,

James




[-- Attachment #2: tmp.diff --]
[-- Type: text/plain, Size: 1198 bytes --]

===== scsi_error.c 1.60 vs edited =====
--- 1.60/drivers/scsi/scsi_error.c	Thu Jul 31 07:32:18 2003
+++ edited/scsi_error.c	Mon Aug  4 14:20:24 2003
@@ -1285,7 +1285,12 @@
 
       maybe_retry:
 
-	if ((++scmd->retries) < scmd->allowed) {
+	/* we requeue for retry because the error was retryable, and
+	 * the request was not marked fast fail.  Note that above,
+	 * even if the request is marked fast fail, we still requeue
+	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
+	if ((++scmd->retries) < scmd->allowed 
+	    && !blk_noretry_request(scmd->request)) {
 		return NEEDS_RETRY;
 	} else {
 		/*
===== scsi_lib.c 1.108 vs edited =====
--- 1.108/drivers/scsi/scsi_lib.c	Sat Aug  2 10:18:20 2003
+++ edited/scsi_lib.c	Mon Aug  4 14:26:46 2003
@@ -497,6 +497,13 @@
 	struct request *req = cmd->request;
 	unsigned long flags;
 
+	/* If failfast is enabled, override the number of completed
+	 * sectors to make sure the entire request is finished right
+	 * now */
+	if(blk_noretry_request(req)) {
+		sectors = req->hard_nr_sectors;
+	}
+
 	/*
 	 * If there are blocks left over at the end, set up the command
 	 * to queue the remainder of them.

* RE: Summary of the Multi-Path BOF at OLS and future directions
@ 2003-08-08 12:13 jansen, frank
  2003-08-08 12:15 ` Christoph Hellwig
  2003-08-08 12:21 ` Josef Möllers
  0 siblings, 2 replies; 15+ messages in thread
From: jansen, frank @ 2003-08-08 12:13 UTC (permalink / raw)
  To: 'Josef Möllers', Christoph Hellwig
  Cc: James Bottomley, SCSI Mailing List

> 
> Christoph Hellwig wrote:
> 
> > > 4. Configuration of this solution would be extremely important.  The
> > > idea here is to rely on the udev solution currently making its way into
> > > the kernel and essentially have a vendor specific multi-path
> > > configuration as a udev plug-in.
> > >
> > > 5. Vendor value add for specific devices could be encoded both as
> > > configuration (udev) pieces and plug-ins to the upper layer multi-path
> > > driver to activate any proprietary vendor specific configuration options
> > > that may be needed for specific solutions.
> > 
> > What are examples of such value add?
> 
> Some older EMC RAID boxes need a special "TRESPASS" command (implemented
> as a MODE SELECT) to switch the ownership of a LUN from one path
> ("Storage Processor") to the other. The newer ones do this
> automatically, though.

The distinction between the EMC boxes that require a "TRESPASS" command
and those that do not lies not with their age, but rather their family.
The CLARiiON family of Storage Arrays requires a "TRESPASS" command to
move a LUN from one redundant Storage Processor (SP) to another.  On the
other hand, Symmetrix Storage Arrays have no such requirement.  In
a nutshell, the Symmetrix is active on all paths on which the LUN is
presented, whereas the CLARiiON is active on one SP and passive on the
other SP.

> Without the TRESPASS, you'd see the LUN (INQUIRY), but you can't do
> anything with it, even a simple READ CAPACITY fails.

This is correct.  Note that it is not necessarily desirable to TRESPASS
the LUN, as there may be an active path from the other SP to this or
another HBA on the host system.

> -- 
> Josef Möllers (Pinguinpfleger bei FSC)
> 	If failure had no penalty success would not be a prize
> 						-- T.  Pratchett
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

* RE: Summary of the Multi-Path BOF at OLS and future directions
@ 2003-08-08 12:28 jansen, frank
  2003-08-08 13:27 ` Josef Möllers
  0 siblings, 1 reply; 15+ messages in thread
From: jansen, frank @ 2003-08-08 12:28 UTC (permalink / raw)
  To: 'Josef Möllers', jansen, frank
  Cc: Christoph Hellwig, James Bottomley, SCSI Mailing List

> Josef Möllers wrote
> 
> With MultiPath it _is_ necessary to TRESPASS a CLARiiON box if each SP
> is connected to a separate path.
> 
To access the non-active path, this is correct.  What I was trying to say
is that the MultiPath layer should be somewhat intelligent about this and
be aware that there may be another active path.  It is much quicker to go
down an active path than trespass a LUN and then do the I/O.  The other
part is the risk of excessive trespassing, where a LUN just gets bounced
back and forth between SPs for each I/O; this is an absolute worst case 
that would bring performance to a standstill.
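
To make the point concrete, here is a minimal user-space sketch of the
kind of path selection I have in mind: prefer any usable path to the SP
that currently owns the LUN, and only ask for a trespass when no such
path exists.  The names (struct mp_path, mp_select_path, owner_sp) are
invented for this example and do not correspond to any existing
multi-path driver or to EMC software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one path from the host to the array. */
struct mp_path {
	int sp;		/* storage processor this path reaches */
	bool usable;	/* path has not failed */
};

/*
 * Return the index of the path to use, or -1 if no path is usable.
 * *need_trespass is set only when the chosen path reaches an SP that
 * does not currently own the LUN, i.e. when no active path is left.
 */
int mp_select_path(const struct mp_path *paths, size_t n, int owner_sp,
		   bool *need_trespass)
{
	int fallback = -1;

	assert(need_trespass != NULL);
	*need_trespass = false;

	for (size_t i = 0; i < n; i++) {
		if (!paths[i].usable)
			continue;
		if (paths[i].sp == owner_sp)
			return (int)i;		/* active path: no trespass */
		if (fallback < 0)
			fallback = (int)i;	/* remember first passive path */
	}

	if (fallback >= 0)
		*need_trespass = true;		/* last resort: move the LUN */
	return fallback;
}
```

Because a trespass is requested only when every path to the owning SP
has failed, a single failed path cannot make the LUN bounce between SPs
on every I/O.
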


end of thread, other threads:[~2003-08-08 13:25 UTC | newest]

Thread overview: 15+ messages
2003-08-05  3:54 Summary of the Multi-Path BOF at OLS and future directions James Bottomley
2003-08-05 16:48 ` Alan Cox
2003-08-05 17:06   ` James Bottomley
2003-08-07 11:00     ` Alan Cox
2003-08-06  0:14 ` Patrick Mansfield
2003-08-06 20:26   ` Steven Dake
2003-08-07  7:38     ` Lars Marowsky-Bree
2003-08-07 16:20 ` Christoph Hellwig
2003-08-07 23:54   ` Tim Pepper
2003-08-08  6:45   ` Josef Möllers
  -- strict thread matches above, loose matches on Subject: below --
2003-08-08 12:13 jansen, frank
2003-08-08 12:15 ` Christoph Hellwig
2003-08-08 12:21 ` Josef Möllers
2003-08-08 12:28 jansen, frank
2003-08-08 13:27 ` Josef Möllers
