public inbox for linux-scsi@vger.kernel.org
From: James Bottomley <James.Bottomley@steeleye.com>
To: SCSI Mailing List <linux-scsi@vger.kernel.org>
Subject: Summary of the Multi-Path BOF at OLS and future directions
Date: 04 Aug 2003 20:54:55 -0700
Message-ID: <1060042082.1985.53.camel@fuzzy>

[-- Attachment #1: Type: text/plain, Size: 3690 bytes --]

Hi All,

For those of you who couldn't attend OLS, I thought a short summary of
what went on might be useful.

Multi-path was a hot topic throughout both the Kernel Summit and OLS.
Things began with a requirements input panel of vendors identifying
multi-path as one of their primary problems, followed by an invited
discussion with Lars Marowsky-Brée and Mike Anderson on multi-path. At
OLS, there was a paper presentation by Mike and Patrick Mansfield on the
IBM SCSI layer multi-pathing solution, and finally there was the BOF
session, which tried to pick a way forwards for us in 2.6/2.7.

What I'd like to summarise is what I think are the conclusions we
reached:

1. Multi-path is relevant to more layers of the I/O stack than just
SCSI. Thus, it makes sense to do it at the layer just above bio.  This
would either be md/multipath or the Device Mapper multi-path module.

2. Doing multi-path at that level is not easy without fast failure
indications.

2a. On discussion of this, it was decided that on each bio/request, the
upper layers would like to indicate which failures they wish to be fast
and which they wish not to know about.  The two principal ones were
transport errors (relevant to multi-path) and medium errors (relevant to
software raid).

2b. Upwards, on fast failure, we would send back the raw sense data
(probably encoded in the sense request) plus a translated indication of
what the problem was.  The translations would probably be a combination
of (fatal|retryable) and (driver error (card out of
resources/failure)|transport error|medium error).
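
For illustration only, points 2a and 2b could be sketched in C roughly
as follows.  Nothing here is an existing kernel interface: the ff_*
names, the 96 byte sense buffer and the bitmask layout are all
assumptions made up for the sketch.  The upper layer passes down a mask
of the failure classes it wants reported quickly (2a), and the lower
layer hands back the raw sense data plus a translated
(fatal|retryable) x (driver|transport|medium) indication (2b).

```c
#include <stddef.h>
#include <string.h>

#define FF_SENSE_MAX 96	/* assumed size of the raw sense buffer */

/* Hypothetical failure classes from point 2b, usable as a bitmask (2a) */
enum ff_class {
	FF_CLASS_DRIVER    = 1 << 0,	/* card out of resources/failure */
	FF_CLASS_TRANSPORT = 1 << 1,	/* relevant to multi-path */
	FF_CLASS_MEDIUM    = 1 << 2	/* relevant to software raid */
};

enum ff_severity {
	FF_RETRYABLE,
	FF_FATAL
};

/* What would travel back upwards on a fast failure (point 2b) */
struct ff_result {
	enum ff_severity severity;
	enum ff_class cls;
	unsigned char sense[FF_SENSE_MAX];	/* raw sense data */
	size_t sense_len;
};

/* Point 2a: did the upper layer ask for this class to fail fast? */
int ff_should_fail_fast(unsigned int wanted, enum ff_class seen)
{
	return (wanted & (unsigned int)seen) != 0;
}

/* Point 2b: bundle the translation together with the raw sense data */
struct ff_result ff_make_result(enum ff_severity sev, enum ff_class cls,
				const unsigned char *sense, size_t len)
{
	struct ff_result res;

	memset(&res, 0, sizeof(res));
	res.severity = sev;
	res.cls = cls;
	if (sense != NULL && len > 0) {
		/* truncate rather than overflow the fixed buffer */
		res.sense_len = len < FF_SENSE_MAX ? len : FF_SENSE_MAX;
		memcpy(res.sense, sense, res.sense_len);
	}
	return res;
}
```

In this sketch a multi-path driver would pass FF_CLASS_TRANSPORT as its
mask and software raid would pass FF_CLASS_MEDIUM, so each only sees the
fast failures it can act on.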

3.  It was noted that symmetric active multi-path in this scheme is not
possible without the ability to place a proper elevator above the
multi-pathing driver (and have a simple queue only noop elevator
below).  This should help alleviate the current fragmentation issues
where symmetric active multi-path produces I/O in decidedly non-optimal
page sized chunks.

4. Configuration of this solution would be extremely important.  The
idea here is to rely on the udev solution currently making its way into
the kernel and essentially have a vendor specific multi-path
configuration as a udev plug-in.

5. Vendor value add for specific devices could be encoded both as
configuration (udev) pieces and plug-ins to the upper layer multi-path
driver to activate any proprietary vendor specific configuration options
that may be needed for specific solutions.

6. Ownership.  This wasn't exactly discussed, but in light of the
problems with even SCSI-3 reservations, it is becoming clear that
storage ownership in a multi-path configuration is getting impossible to
maintain from user level.  Therefore, I at least will be giving thought
to an ownership API that could be used to manage storage ownership from
the kernel in the face of path failovers.

As far as the beginnings of implementation go, we already have
md/multipath.  Joe Thornber of Sistina will shortly be releasing the
code to do multi-path over the device mapper interface, and our trusty
block layer maintainer, Jens Axboe, has done the skeleton of a fast fail
infrastructure for us (in 2.6.0-test2).  The attached patch should add
the fast fail capability to SCSI (although without the upwards/downwards
failure indications) and we should be able to build the rest of the
infrastructure on this framework.
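
As a rough, self-contained model of what the attached patch changes
(not the kernel code itself): the struct, the enum values and the
function names below are invented for the sketch, and the real logic
lives in scsi_error.c and scsi_lib.c.  The first function models the
retry decision, including the comment's note that queue congestion
(QUEUE_FULL or BUSY) still requeues a fast fail request; the second
models completing the whole request at once when fast fail is set.

```c
#include <stdbool.h>

/* Hypothetical dispositions; the kernel uses its own return codes */
enum eh_disposition {
	EH_REQUEUE,	/* congestion: put the command back on the queue */
	EH_NEEDS_RETRY,	/* retryable error, retries remain */
	EH_GIVE_UP	/* hand the failure to the upper layer */
};

/* Stand-in for the fields of the command the decision looks at */
struct cmd_model {
	int retries;	/* retries used so far */
	int allowed;	/* retry limit */
	bool failfast;	/* models blk_noretry_request() being true */
};

enum eh_disposition eh_decide(struct cmd_model *cmd, bool congested)
{
	/* Congestion is requeued even for fast fail requests */
	if (congested)
		return EH_REQUEUE;

	/* Retry only while retries remain and fast fail is not set */
	if (++cmd->retries < cmd->allowed && !cmd->failfast)
		return EH_NEEDS_RETRY;

	return EH_GIVE_UP;
}

/* On fast fail, finish the entire request, not just the done part */
unsigned long sectors_to_finish(bool failfast, unsigned long done,
				unsigned long total)
{
	return failfast ? total : done;
}
```

Note that, as in the patch, the retry counter is incremented before the
fast fail check, so a fast fail command still counts the attempt.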

As far as errors and omissions go, I found KS/OLS to go rather fast and
be a bit blurry, so hopefully those who were also present can chime in
on this thread to amplify/correct the points I actually managed to grasp
and summarise the ones I missed.

Thanks,

James




[-- Attachment #2: tmp.diff --]
[-- Type: text/plain, Size: 1198 bytes --]

===== scsi_error.c 1.60 vs edited =====
--- 1.60/drivers/scsi/scsi_error.c	Thu Jul 31 07:32:18 2003
+++ edited/scsi_error.c	Mon Aug  4 14:20:24 2003
@@ -1285,7 +1285,12 @@
 
       maybe_retry:
 
-	if ((++scmd->retries) < scmd->allowed) {
+	/* we requeue for retry because the error was retryable, and
+	 * the request was not marked fast fail.  Note that above,
+	 * even if the request is marked fast fail, we still requeue
+	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
+	if ((++scmd->retries) < scmd->allowed 
+	    && !blk_noretry_request(scmd->request)) {
 		return NEEDS_RETRY;
 	} else {
 		/*
===== scsi_lib.c 1.108 vs edited =====
--- 1.108/drivers/scsi/scsi_lib.c	Sat Aug  2 10:18:20 2003
+++ edited/scsi_lib.c	Mon Aug  4 14:26:46 2003
@@ -497,6 +497,13 @@
 	struct request *req = cmd->request;
 	unsigned long flags;
 
+	/* If failfast is enabled, override the number of completed
+	 * sectors to make sure the entire request is finished right
+	 * now */
+	if(blk_noretry_request(req)) {
+		sectors = req->hard_nr_sectors;
+	}
+
 	/*
 	 * If there are blocks left over at the end, set up the command
 	 * to queue the remainder of them.
