block and scsi fail fast fixes

linux-scsi.vger.kernel.org archive mirror

* block and scsi fail fast fixes
@ 2008-06-05  1:41 michaelc
  2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi

The following patches fix two problems I have been seeing in Red Hat
bugzillas. The patches are made over scsi-misc, but except for
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
they could also apply over scsi-rc-fixes or linus's tree.
0006-block-and-drivers-separate-failfast-into-multiple-b.patch has a patch
to convert the scsi dh modules so that is why it does not apply to
the other kernels.

The first problem is that when a transport problem is detected and
the classes/drivers block the scsi_devices, there is IO in the driver
and IO in the scsi_device queues. For fibre we have the fast IO fail
tmo infrastructure to allow us to get IO in the driver up to multipath,
but IO in the queues remains until the dev_loss_tmo fires. The
difference between the timers can be minutes, so it looks like a hang to
the application. iSCSI has something similar to FC's fast io fail
tmo, but it is called the replacement timeout. With this we will fail
all IO that is in the driver or queued, as well as any incoming IO.

The first 5 patches try to provide common behavior:
0001-scsi-add-transport-host-byte-errors-v2.patch
0002-iscsi-class-libiscsi-and-qla4xxx-convert-to-new-tr.patch
0003-fc-class-Add-support-for-new-transport-errors.patch
0004-qla2xxx-use-new-host-byte-transport-errors.patch
0005-lpfc-start-to-use-new-trasnport-errors.patch

Basically, when we block a device we fail IO with DID_TRANSPORT_DISRUPTED.
When the fast io transport timer fires we fail IO with DID_TRANSPORT_FAILFAST.

I converted qla2xxx and tried to convert lpfc (I was not sure about
some of the errors). zfcp and mpt need to be converted, but it looked
like they would be ok with the patches below. I could only test qla2xxx
and lpfc though.

The second problem is that multipath is not really good at handling many
kinds of errors. It just retries every error on a different path, so for
transport errors it makes a lot of sense to send them up to multipath pretty
quickly. But for device errors, driver errors, or odd ones in between, the
scsi layer is better at handling them, because the multipath layer does not
know anything about scsi details.

The patches:
0006-block-and-drivers-separate-failfast-into-multiple-b.patch
0007-scsi-Support-fail-fast-bits.patch

are really simple and just break up the FAILFAST bits into device, driver
and transport bits, so the upper layer can ask the lower layers to only
fail fast certain types of errors. For multipath we only set the transport
fail fast bit, and I thought that in the future something like RAID might
set the device fail fast bit and not want transport errors failed
fast to it.
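
To make the intent of the split concrete, here is a minimal sketch of the
decision a lower layer could make once the bits are separate. The names
(sk_error_class, SK_FAILFAST_*, sk_should_fail_fast) are made up for this
illustration and are not the identifiers used in the patches; the bit
positions are arbitrary.

```c
/* Hypothetical error classes a lower layer might distinguish. */
enum sk_error_class { SK_ERR_DEVICE, SK_ERR_TRANSPORT, SK_ERR_DRIVER };

/* Hypothetical per-class fail fast bits; each one is a separate bit. */
#define SK_FAILFAST_DEV       (1u << 0)
#define SK_FAILFAST_TRANSPORT (1u << 1)
#define SK_FAILFAST_DRIVER    (1u << 2)

/*
 * Return 1 if an error of class 'err' should be failed upwards at once,
 * 0 if the lower layer should apply its normal retry handling.
 */
static int sk_should_fail_fast(unsigned int flags, enum sk_error_class err)
{
	switch (err) {
	case SK_ERR_DEVICE:
		return (flags & SK_FAILFAST_DEV) != 0;
	case SK_ERR_TRANSPORT:
		return (flags & SK_FAILFAST_TRANSPORT) != 0;
	case SK_ERR_DRIVER:
		return (flags & SK_FAILFAST_DRIVER) != 0;
	}
	return 0;
}
```

A multipath-like caller would pass only SK_FAILFAST_TRANSPORT, so device
and driver errors stay with the lower layer's retry logic, while a
RAID-like caller could pass only SK_FAILFAST_DEV.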


* [PATCH 1/7] scsi: add transport host byte errors (v2)
  2008-06-05  1:41 block and scsi fail fast fixes michaelc
@ 2008-06-05  1:41 ` michaelc
  2008-06-05  1:41   ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

Currently, if there is a transport problem the iscsi drivers will return
outstanding commands (commands being executed by the driver/fw/hw) with
DID_BUS_BUSY and block the session so no new commands can be queued.
Commands that are caught between the failure handling and blocking are
failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
When the recovery_timeout fires, the iscsi drivers then fail IO with
DID_NO_CONNECT.

For fcp, some drivers will fail some outstanding IO (disk but possibly not
tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
and hits the scsi_error.c failfast check, block the rport, and commands
caught in the race are failed with DID_IMM_RETRY. Other drivers will hold
onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk to be
called.

The following patches attempt to unify what upper layers will see, so that
drivers like multipath can make a good guess about how to handle a failure.
This relies on drivers being hooked into their transport class.

This first patch just defines two new host byte errors so drivers can
return the same value for when a rport/session is blocked and for
when the fast_io_fail_tmo fires.

The idea is that if the LLD/class detects a problem and is going to block
a rport/session, then if the LLD wants or must return the command to scsi-ml,
then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
the IO into the same scsi queue it came from, until the fast io fail timer
fires and the class decides what to do.

When using multipath, once the fast_io_fail_tmo fires the class
can fail commands with DID_TRANSPORT_FAILFAST, or drivers can use
DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
the equivalent in iscsi if we ever implement more advanced recovery methods.
An LLD, like lpfc, could continue to return DID_ERROR and then it will hit
the normal failfast path. The point of the patches is that upper layers will
not see a failure that could be recovered from while the rport/session is
blocked until fast_io_fail_tmo/recovery_timeout fires.
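
As a rough illustration of how the two new values are meant to be treated,
the sketch below pulls the host byte out of a result word and maps it to a
disposition. The DID_* values are the ones this patch defines; the
sketch_* names are invented stand-ins for the mid-layer return codes, and
the host byte position is taken from the "<< 16" convention drivers use
when they set the result.

```c
/* Host byte values, as defined in include/scsi/scsi.h by this patch. */
#define DID_REQUEUE             0x0d
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* Drivers store the host byte with "DID_xxx << 16"; extract it again. */
#define sketch_host_byte(result) (((result) >> 16) & 0xff)

/* Hypothetical stand-ins for the mid-layer disposition codes. */
enum sketch_disposition {
	SKETCH_COMPLETE_UPWARDS,	/* like SUCCESS: pass the result up */
	SKETCH_REQUEUE,			/* like ADD_TO_MLQUEUE: queue it again */
	SKETCH_OTHER			/* anything this sketch does not model */
};

static enum sketch_disposition sketch_decide(int result)
{
	switch (sketch_host_byte(result)) {
	case DID_REQUEUE:
	case DID_TRANSPORT_DISRUPTED:
		/* Port/session is blocked: hold the IO in its queue. */
		return SKETCH_REQUEUE;
	case DID_TRANSPORT_FAILFAST:
		/* Fast fail timer fired: let the upper layer see it now. */
		return SKETCH_COMPLETE_UPWARDS;
	default:
		return SKETCH_OTHER;
	}
}
```

The low bytes of the result (status, message) do not influence the
outcome for these two host bytes in this sketch.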

V2
Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
DID_TRANSPORT_DISRUPTED.
V1
initial patch.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/constants.c  |    3 ++-
 drivers/scsi/scsi_error.c |   18 +++++++++++++++++-
 include/scsi/scsi.h       |    5 +++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
index 9785d73..4003dee 100644
--- a/drivers/scsi/constants.c
+++ b/drivers/scsi/constants.c
@@ -1364,7 +1364,8 @@ EXPORT_SYMBOL(scsi_print_sense);
 static const char * const hostbyte_table[]={
 "DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT", "DID_BAD_TARGET",
 "DID_ABORT", "DID_PARITY", "DID_ERROR", "DID_RESET", "DID_BAD_INTR",
-"DID_PASSTHROUGH", "DID_SOFT_ERROR", "DID_IMM_RETRY", "DID_REQUEUE"};
+"DID_PASSTHROUGH", "DID_SOFT_ERROR", "DID_IMM_RETRY", "DID_REQUEUE",
+"DID_TRANSPORT_DISRUPTED", "DID_TRANSPORT_FAILFAST" };
 #define NUM_HOSTBYTE_STRS ARRAY_SIZE(hostbyte_table)

 static const char * const driverbyte_table[]={
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 006a959..d257210 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1343,7 +1343,23 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)

 	case DID_REQUEUE:
 		return ADD_TO_MLQUEUE;
-
+	case DID_TRANSPORT_DISRUPTED:
+		/*
+		 * LLD/transport was disrupted during processing of the IO.
+		 * The transport class is now blocked/blocking,
+		 * and the transport will decide what to do with the IO
+		 * based on its timers and recovery capabilities.
+		 *
+		 * TODO: When the target block code is merged we can block
+		 * entire target instead of just this device.
+		 */
+		return ADD_TO_MLQUEUE;
+	case DID_TRANSPORT_FAILFAST:
+		/*
+		 * The transport decided to failfast the IO (most likely
+		 * the fast io fail tmo fired), so send IO directly upwards.
+		 */
+		return SUCCESS;
 	case DID_ERROR:
 		if (msg_byte(scmd->result) == COMMAND_COMPLETE &&
 		    status_byte(scmd->result) == RESERVATION_CONFLICT)
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 2b5b935..df2c775 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -363,6 +363,11 @@ struct scsi_lun {
 #define DID_IMM_RETRY   0x0c	/* Retry without decrementing retry count  */
 #define DID_REQUEUE	0x0d	/* Requeue command (no immediate retry) also
 				 * without decrementing the retry count	   */
+#define DID_TRANSPORT_DISRUPTED 0x0e /* Transport error disrupted execution
+				      * and the driver blocked the port to
+				      * recover the link. Transport class will
+				      * retry or fail IO */
+#define DID_TRANSPORT_FAILFAST	0x0f /* Transport class fastfailed the io */
 #define DRIVER_OK       0x00	/* Driver status                           */

 /*
-- 
1.5.4.1


* [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values
  2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
@ 2008-06-05  1:41   ` michaelc
  2008-06-05  1:41     ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This patch converts the iscsi drivers to the new host byte values.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c             |    6 +++---
 drivers/scsi/qla4xxx/ql4_isr.c      |    4 ++--
 drivers/scsi/scsi_transport_iscsi.c |    4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 010c1b9..055c196 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1132,7 +1132,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 		switch (session->state) {
 		case ISCSI_STATE_IN_RECOVERY:
 			reason = FAILURE_SESSION_IN_RECOVERY;
-			sc->result = DID_IMM_RETRY << 16;
+			sc->result = DID_TRANSPORT_DISRUPTED << 16;
 			break;
 		case ISCSI_STATE_LOGGING_OUT:
 			reason = FAILURE_SESSION_LOGGING_OUT;
@@ -1140,7 +1140,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 			break;
 		case ISCSI_STATE_RECOVERY_FAILED:
 			reason = FAILURE_SESSION_RECOVERY_TIMEOUT;
-			sc->result = DID_NO_CONNECT << 16;
+			sc->result = DID_TRANSPORT_FAILFAST << 16;
 			break;
 		case ISCSI_STATE_TERMINATE:
 			reason = FAILURE_SESSION_TERMINATE;
@@ -2233,7 +2233,7 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 	 */
 	spin_lock_bh(&session->lock);
 	fail_all_commands(conn, -1,
-			STOP_CONN_RECOVER ? DID_BUS_BUSY : DID_ERROR);
+			STOP_CONN_RECOVER ? DID_TRANSPORT_DISRUPTED : DID_ERROR);
 	flush_control_queues(session, conn);
 	spin_unlock_bh(&session->lock);
 	mutex_unlock(&session->eh_mutex);
diff --git a/drivers/scsi/qla4xxx/ql4_isr.c b/drivers/scsi/qla4xxx/ql4_isr.c
index a91a57c..799120f 100644
--- a/drivers/scsi/qla4xxx/ql4_isr.c
+++ b/drivers/scsi/qla4xxx/ql4_isr.c
@@ -139,7 +139,7 @@ static void qla4xxx_status_entry(struct scsi_qla_host *ha,
 			      ha->host_no, cmd->device->channel,
 			      cmd->device->id, cmd->device->lun));
 
-		cmd->result = DID_BUS_BUSY << 16;
+		cmd->result = DID_TRANSPORT_DISRUPTED << 16;
 
 		/*
 		 * Mark device missing so that we won't continue to send
@@ -243,7 +243,7 @@ static void qla4xxx_status_entry(struct scsi_qla_host *ha,
 		if (atomic_read(&ddb_entry->state) == DDB_STATE_ONLINE)
 			qla4xxx_mark_device_missing(ha, ddb_entry);
 
-		cmd->result = DID_BUS_BUSY << 16;
+		cmd->result = DID_TRANSPORT_DISRUPTED << 16;
 		break;
 
 	case SCS_QUEUE_FULL:
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 65d1737..b5a529c 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -258,10 +258,10 @@ int iscsi_session_chkready(struct iscsi_cls_session *session)
 		err = 0;
 		break;
 	case ISCSI_SESSION_FAILED:
-		err = DID_IMM_RETRY << 16;
+		err = DID_TRANSPORT_DISRUPTED << 16;
 		break;
 	case ISCSI_SESSION_FREE:
-		err = DID_NO_CONNECT << 16;
+		err = DID_TRANSPORT_FAILFAST << 16;
 		break;
 	default:
 		err = DID_NO_CONNECT << 16;
-- 
1.5.4.1



* [PATCH 3/7] fc class: Add support for new transport errors
  2008-06-05  1:41   ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
@ 2008-06-05  1:41     ` michaelc
  2008-06-05  1:41       ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
  2008-08-19 15:35       ` [PATCH 3/7] fc class: Add support for new transport errors James Smart
  0 siblings, 2 replies; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

When we block a rport and the driver implements the terminate
callback, we will quickly fail IO that was running. However,
IO that was in the scsi_device/block queue sits there until
the dev_loss_tmo fires, and this can make it look like IO is
lost because new IO will get executed but the IO stuck in
the blocked queue sits there for some time longer.

With this patch, when the fast io fail tmo fires we will
fail the blocked IO and any new IO. This patch also allows
all drivers to partially support the fast io fail tmo. If the
terminate io callback is not implemented, we will still fail blocked
IO and any new IO, so multipath can handle that. This means that
drivers like lpfc and qla2xxx (the latter seems to fail the IO when the
error is first detected) will have the IO flushed to the upper layers
when the fast io fail tmo fires.

This patch also allows the fc and iscsi classes to implement the
same behavior. The timers are just unfortunately named differently.

The next patches will convert the drivers to support this.

This patch has been lightly tested with lpfc and qla2xxx. I am not able
to test the role change handling.
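
The behavior described above can be sketched as a small state check. This
is a simplified model, not the fc class code: the sk_* names are invented,
the online case ignores port roles, and the flag values are the sketch's
own. The one thing it relies on is that every flag is a distinct bit, so
testing the timed-out flag never matches some other pending flag.

```c
#define DID_NO_CONNECT          0x01
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* Hypothetical rport flags: one bit each, so they can be tested alone. */
#define SK_DEVLOSS_PENDING    0x01
#define SK_SCAN_PENDING       0x02
#define SK_FAST_FAIL_TIMEDOUT 0x04

enum sk_port_state { SK_ONLINE, SK_BLOCKED, SK_NOTPRESENT };

struct sk_rport {
	enum sk_port_state state;
	unsigned int flags;
};

/* Result a queuecommand path would use for new IO to this port. */
static int sk_chkready(const struct sk_rport *rport)
{
	switch (rport->state) {
	case SK_ONLINE:
		return 0;
	case SK_BLOCKED:
		if (rport->flags & SK_FAST_FAIL_TIMEDOUT)
			return DID_TRANSPORT_FAILFAST << 16;
		return DID_TRANSPORT_DISRUPTED << 16;
	default:
		return DID_NO_CONNECT << 16;
	}
}

/* Fast io fail timer expiry: only meaningful while still blocked. */
static void sk_fast_io_fail_fired(struct sk_rport *rport)
{
	if (rport->state != SK_BLOCKED)
		return;
	rport->flags |= SK_FAST_FAIL_TIMEDOUT;
}

/* Port came back: clear the timed-out flag, leave other flags alone. */
static void sk_port_returned(struct sk_rport *rport)
{
	rport->flags &= ~SK_FAST_FAIL_TIMEDOUT;
	rport->state = SK_ONLINE;
}
```

Before the timer fires, new IO to a blocked port is requeued; after it
fires, new IO is failed upwards until the port returns.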

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_transport_fc.c |   15 ++++++++++-----
 include/scsi/scsi_transport_fc.h |    8 ++++++--
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 5fd64e7..ea4906c 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2156,8 +2156,7 @@ fc_attach_transport(struct fc_function_template *ft)
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(roles);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_state);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(scsi_target_id);
-	if (ft->terminate_rport_io)
-		SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(fast_io_fail_tmo);
+	SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(fast_io_fail_tmo);
 
 	BUG_ON(count > FC_RPORT_NUM_ATTRS);
 
@@ -2662,6 +2661,7 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
 
 				spin_lock_irqsave(shost->host_lock, flags);
 
+				rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 				rport->flags &= ~FC_RPORT_DEVLOSS_PENDING;
 
 				/* if target, initiate a scan */
@@ -2725,6 +2725,7 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
 			rport->port_id = ids->port_id;
 			rport->roles = ids->roles;
 			rport->port_state = FC_PORTSTATE_ONLINE;
+			rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 
 			if (fci->f->dd_fcrport_size)
 				memset(rport->dd_data, 0,
@@ -2807,7 +2808,6 @@ void
 fc_remote_port_delete(struct fc_rport  *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
-	struct fc_internal *i = to_fc_internal(shost->transportt);
 	int timeout = rport->dev_loss_tmo;
 	unsigned long flags;
 
@@ -2853,7 +2853,7 @@ fc_remote_port_delete(struct fc_rport  *rport)
 
 	/* see if we need to kill io faster than waiting for device loss */
 	if ((rport->fast_io_fail_tmo != -1) &&
-	    (rport->fast_io_fail_tmo < timeout) && (i->f->terminate_rport_io))
+	    (rport->fast_io_fail_tmo < timeout))
 		fc_queue_devloss_work(shost, &rport->fail_io_work,
 					rport->fast_io_fail_tmo * HZ);
 
@@ -2930,6 +2930,7 @@ fc_remote_port_rolechg(struct fc_rport  *rport, u32 roles)
 
 		spin_lock_irqsave(shost->host_lock, flags);
 		rport->flags &= ~FC_RPORT_DEVLOSS_PENDING;
+		rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 
 		/* ensure any stgt delete functions are done */
@@ -3024,6 +3025,7 @@ fc_timeout_deleted_rport(struct work_struct *work)
 	rport->supported_classes = FC_COS_UNSPECIFIED;
 	rport->roles = FC_PORT_ROLE_UNKNOWN;
 	rport->port_state = FC_PORTSTATE_NOTPRESENT;
+	rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 
 	/* remove the identifiers that aren't used in the consisting binding */
 	switch (fc_host->tgtid_bind_type) {
@@ -3072,7 +3074,10 @@ fc_timeout_fail_rport_io(struct work_struct *work)
 	if (rport->port_state != FC_PORTSTATE_BLOCKED)
 		return;
 
-	i->f->terminate_rport_io(rport);
+	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
+	if (i->f->terminate_rport_io)
+		i->f->terminate_rport_io(rport);
+	scsi_target_unblock(&rport->dev);
 }
 
 /**
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index 06f72ba..4cf6fb0 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -338,6 +338,7 @@ struct fc_rport {	/* aka fc_starget_attrs */
 /* bit field values for struct fc_rport "flags" field: */
 #define FC_RPORT_DEVLOSS_PENDING	0x01
 #define FC_RPORT_SCAN_PENDING		0x02
+#define FC_RPORT_FAST_FAIL_TIMEDOUT	0x04
 
 #define	dev_to_rport(d)				\
 	container_of(d, struct fc_rport, dev)
@@ -659,12 +660,15 @@ fc_remote_port_chkready(struct fc_rport *rport)
 		if (rport->roles & FC_PORT_ROLE_FCP_TARGET)
 			result = 0;
 		else if (rport->flags & FC_RPORT_DEVLOSS_PENDING)
-			result = DID_IMM_RETRY << 16;
+			result = DID_TRANSPORT_DISRUPTED << 16;
 		else
 			result = DID_NO_CONNECT << 16;
 		break;
 	case FC_PORTSTATE_BLOCKED:
-		result = DID_IMM_RETRY << 16;
+		if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
+			result = DID_TRANSPORT_FAILFAST << 16;
+		else
+			result = DID_TRANSPORT_DISRUPTED << 16;
 		break;
 	default:
 		result = DID_NO_CONNECT << 16;
-- 
1.5.4.1


* [PATCH 4/7] qla2xxx: use new host byte transport errors.
  2008-06-05  1:41     ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
@ 2008-06-05  1:41       ` michaelc
  2008-06-05  1:41         ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
  2008-08-19 15:35       ` [PATCH 3/7] fc class: Add support for new transport errors James Smart
  1 sibling, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This has qla2xxx use the new transport error values instead of
DID_BUS_BUSY. I am not sure if all the errors I changed
in qla_isr.c are transport related. We end up blocking/deleting
the rport for all of them, so it is ok to use the new transport error since
the fc class will decide when to fail the IO.

With this patch, if I pull a cable then IO that had reached
the driver will be failed with DID_TRANSPORT_DISRUPTED. The
fc class will then fail the IO when the fast io fail tmo
has fired.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/qla2xxx/qla_isr.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 5d9a64a..b1dec6e 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1222,7 +1222,12 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		    cp->serial_number, comp_status,
 		    atomic_read(&fcport->state)));
 
-		cp->result = DID_BUS_BUSY << 16;
+		/*
+		 * We are going to have the fc class block the rport
+		 * while we try to recover so instruct the mid layer
+		 * to requeue until the class decides how to handle this.
+		 */
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 		if (atomic_read(&fcport->state) == FCS_ONLINE) {
 			qla2x00_mark_device_lost(ha, fcport, 1, 1);
 		}
@@ -1250,7 +1255,12 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		break;
 
 	case CS_TIMEOUT:
-		cp->result = DID_BUS_BUSY << 16;
+		/*
+		 * We are going to have the fc class block the rport
+		 * while we try to recover so instruct the mid layer
+		 * to requeue until the class decides how to handle this.
+		 */
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 
 		if (IS_FWI2_CAPABLE(ha)) {
 			DEBUG2(printk(KERN_INFO
-- 
1.5.4.1


* [PATCH 5/7] lpfc: start to use new trasnport errors.
  2008-06-05  1:41       ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
@ 2008-06-05  1:41         ` michaelc
  2008-06-05  1:41           ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This is only a test patch to get lpfc going. For the case I changed,
it looked like the rport is deleted and then we fail these IOs with
DID_BUS_BUSY, so using DID_TRANSPORT_DISRUPTED was correct. In testing
the driver by stopping the fcp service on the target, this worked.

I was not sure if maybe this bus busy:
                case IOSTAT_NPORT_BSY:
                case IOSTAT_FABRIC_BSY:
                        cmd->result = ScsiResult(DID_BUS_BUSY, 0);
should also be converted. For qla2xxx I thought we blocked the
rport for similar errors (at least the names sounded similar :)) and so I
used DID_TRANSPORT_DISRUPTED, but for lpfc I could not
hit this code and was not sure by just looking at it if it was exactly
the same, so I did not touch it in this patch.

I was also not sure about some cases. If I just unplugged a cable,
I would sometimes get IOSTAT_LOCAL_REJECT with IOERR_DEFAULT, so it seemed
like DID_ERROR was right for that, but I had seen that there is also
an IOERR_LINK_DOWN value. Maybe for that, if we end up deleting the rport,
we should be returning DID_TRANSPORT_DISRUPTED, but I was not able to
hit that case and was not able to tell from the code when I should, so
I did not touch it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/lpfc/lpfc_scsi.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 0910a9a..83f7e43 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -590,7 +590,14 @@ lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pIocbIn,
 
 		if (!pnode || !NLP_CHK_NODE_ACT(pnode)
 		    || (pnode->nlp_state != NLP_STE_MAPPED_NODE))
-			cmd->result = ScsiResult(DID_BUS_BUSY, SAM_STAT_BUSY);
+			/*
+			 * Port is not setup so fail IO with
+			 * DID_TRANSPORT_DISRUPTED, and allow the fc
+			 * class to determine what to do with it when
+			 * its timers fire.
+			 */
+			cmd->result = ScsiResult(DID_TRANSPORT_DISRUPTED,
+						 SAM_STAT_BUSY);
 	} else {
 		cmd->result = ScsiResult(DID_OK, 0);
 	}
-- 
1.5.4.1


* [PATCH 6/7] block and drivers: separate failfast into multiple bits.
  2008-06-05  1:41         ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
@ 2008-06-05  1:41           ` michaelc
  2008-06-05  1:41             ` [PATCH 7/7] scsi: Support fail fast bits michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi; +Cc: Mike Christie, Jens Axboe

From: Mike Christie <michaelc@cs.wisc.edu>

Multipath is best at handling transport errors. If it gets a device
error then there is not much the multipath layer can do. It will just
access the same device but from a different path. RAID is best at
handling device errors. If it gets a transport error it is going to
do the same thing the lower level would have done - retry it on the
same path.

This patch breaks up failfast into device, transport and driver errors.
The multipath layers (md and dm multipath) only ask the lower levels to
fast fail transport errors, but read ahead will ask to fast fail
on all errors.

Note that blk_noretry_request will return true if any failfast bit
is set. This allows drivers that do not support the multipath failfast
bits to continue to fail on any failfast error like before. As a result I
was thinking blk_noretry_request should have a different name like
blk_noretry_any_error or something, but I will do the rename changes
in a different patch.
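
A small sketch of the translation and of the "any bit" check described
above. The bit numbers follow the bio.h and rq_flag_bits changes in this
patch; the SK_ names, the mask definitions and the helper functions are
assumptions made for the illustration, not the kernel's own definitions.
Note that the bio side uses bit numbers and the request side uses masks,
so a bit number has to be shifted before it can be ORed into flags.

```c
/* Bit numbers in bi_rw, following the new layout in bio.h. */
#define SK_BIO_RW_AHEAD           1
#define SK_BIO_FAILFAST_DEV       5
#define SK_BIO_FAILFAST_TRANSPORT 6
#define SK_BIO_FAILFAST_DRIVER    7

/* Request flag masks, assuming (1 << bit) of the rq_flag_bits order. */
#define SK_REQ_FAILFAST_DEV       (1u << 1)
#define SK_REQ_FAILFAST_TRANSPORT (1u << 2)
#define SK_REQ_FAILFAST_DRIVER    (1u << 3)
#define SK_REQ_FAILFAST_ANY \
	(SK_REQ_FAILFAST_DEV | SK_REQ_FAILFAST_TRANSPORT | \
	 SK_REQ_FAILFAST_DRIVER)

/* Derive request fail fast flags from the bio's bi_rw word. */
static unsigned int sk_req_flags_from_bio(unsigned long bi_rw)
{
	unsigned int flags = 0;

	/* Read-ahead is speculative, so fail fast on every error class. */
	if (bi_rw & (1ul << SK_BIO_RW_AHEAD))
		flags |= SK_REQ_FAILFAST_ANY;
	if (bi_rw & (1ul << SK_BIO_FAILFAST_DEV))
		flags |= SK_REQ_FAILFAST_DEV;
	if (bi_rw & (1ul << SK_BIO_FAILFAST_TRANSPORT))
		flags |= SK_REQ_FAILFAST_TRANSPORT;
	if (bi_rw & (1ul << SK_BIO_FAILFAST_DRIVER))
		flags |= SK_REQ_FAILFAST_DRIVER;
	return flags;
}

/* True if any fail fast bit is set, for drivers that do not split them. */
static int sk_noretry_request(unsigned int cmd_flags)
{
	return (cmd_flags & SK_REQ_FAILFAST_ANY) != 0;
}
```

A driver that has not been converted can keep calling the "any bit" check
and behave as it did with the single failfast flag.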

Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 block/blk-core.c                            |   11 +++++++++--
 drivers/md/dm-mpath.c                       |    2 +-
 drivers/md/multipath.c                      |    4 ++--
 drivers/s390/block/dasd_diag.c              |    2 +-
 drivers/s390/block/dasd_eckd.c              |    2 +-
 drivers/s390/block/dasd_fba.c               |    2 +-
 drivers/scsi/device_handler/scsi_dh_emc.c   |    3 ++-
 drivers/scsi/device_handler/scsi_dh_hp_sw.c |    3 ++-
 drivers/scsi/device_handler/scsi_dh_rdac.c  |    3 ++-
 drivers/scsi/scsi_transport_spi.c           |    4 +++-
 include/linux/bio.h                         |   26 +++++++++++++++++---------
 include/linux/blkdev.h                      |   15 ++++++++++++---
 12 files changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b754a4a..7fefda4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1062,8 +1062,15 @@ void init_request_from_bio(struct request *req, struct bio *bio)
 	/*
 	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
 	 */
-	if (bio_rw_ahead(bio) || bio_failfast(bio))
-		req->cmd_flags |= REQ_FAILFAST;
+	if (bio_rw_ahead(bio))
+		req->cmd_flags |= (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+				   REQ_FAILFAST_DRIVER);
+	if (bio_failfast_dev(bio))
+		req->cmd_flags |= REQ_FAILFAST_DEV;
+	if (bio_failfast_transport(bio))
+		req->cmd_flags |= REQ_FAILFAST_TRANSPORT;
+	if (bio_failfast_driver(bio))
+		req->cmd_flags |= REQ_FAILFAST_DRIVER;
 
 	/*
 	 * REQ_BARRIER implies no merging, but lets make it explicit
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index e8f704a..f29ab80 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -808,7 +808,7 @@ static int multipath_map(struct dm_target *ti, struct bio *bio,
 	dm_bio_record(&mpio->details, bio);
 
 	map_context->ptr = mpio;
-	bio->bi_rw |= (1 << BIO_RW_FAILFAST);
+	bio->bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);
 	r = map_io(m, bio, mpio, 0);
 	if (r < 0 || r == DM_MAPIO_REQUEUE)
 		mempool_free(mpio, m->mpio_pool);
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 42ee1a2..a8030d6 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -172,7 +172,7 @@ static int multipath_make_request (struct request_queue *q, struct bio * bio)
 	mp_bh->bio = *bio;
 	mp_bh->bio.bi_sector += multipath->rdev->data_offset;
 	mp_bh->bio.bi_bdev = multipath->rdev->bdev;
-	mp_bh->bio.bi_rw |= (1 << BIO_RW_FAILFAST);
+	mp_bh->bio.bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);
 	mp_bh->bio.bi_end_io = multipath_end_request;
 	mp_bh->bio.bi_private = mp_bh;
 	generic_make_request(&mp_bh->bio);
@@ -390,7 +390,7 @@ static void multipathd (mddev_t *mddev)
 			*bio = *(mp_bh->master_bio);
 			bio->bi_sector += conf->multipaths[mp_bh->path].rdev->data_offset;
 			bio->bi_bdev = conf->multipaths[mp_bh->path].rdev->bdev;
-			bio->bi_rw |= (1 << BIO_RW_FAILFAST);
+			bio->bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);
 			bio->bi_end_io = multipath_end_request;
 			bio->bi_private = mp_bh;
 			generic_make_request(bio);
diff --git a/drivers/s390/block/dasd_diag.c b/drivers/s390/block/dasd_diag.c
index d91df38..60102ce 100644
--- a/drivers/s390/block/dasd_diag.c
+++ b/drivers/s390/block/dasd_diag.c
@@ -533,7 +533,7 @@ static struct dasd_ccw_req *dasd_diag_build_cp(struct dasd_device *memdev,
 	}
 	cqr->retries = DIAG_MAX_RETRIES;
 	cqr->buildclk = get_clock();
-	if (req->cmd_flags & REQ_FAILFAST)
+	if (blk_noretry_request(req))
 		set_bit(DASD_CQR_FLAGS_FAILFAST, &cqr->flags);
 	cqr->startdev = memdev;
 	cqr->memdev = memdev;
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index a0edae0..4779e2c 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -1604,7 +1604,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp(struct dasd_device *startdev,
 			recid++;
 		}
 	}
-	if (req->cmd_flags & REQ_FAILFAST)
+	if (blk_noretry_request(req))
 		set_bit(DASD_CQR_FLAGS_FAILFAST, &cqr->flags);
 	cqr->startdev = startdev;
 	cqr->memdev = startdev;
diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c
index 1166115..6125041 100644
--- a/drivers/s390/block/dasd_fba.c
+++ b/drivers/s390/block/dasd_fba.c
@@ -350,7 +350,7 @@ static struct dasd_ccw_req *dasd_fba_build_cp(struct dasd_device * memdev,
 			recid++;
 		}
 	}
-	if (req->cmd_flags & REQ_FAILFAST)
+	if (blk_noretry_request(req))
 		set_bit(DASD_CQR_FLAGS_FAILFAST, &cqr->flags);
 	cqr->startdev = memdev;
 	cqr->memdev = memdev;
diff --git a/drivers/scsi/device_handler/scsi_dh_emc.c b/drivers/scsi/device_handler/scsi_dh_emc.c
index ed53f14..376322b 100644
--- a/drivers/scsi/device_handler/scsi_dh_emc.c
+++ b/drivers/scsi/device_handler/scsi_dh_emc.c
@@ -294,7 +294,8 @@ static struct request *get_req(struct scsi_device *sdev, int cmd)
 
 	rq->cmd[4] = len;
 	rq->cmd_type = REQ_TYPE_BLOCK_PC;
-	rq->cmd_flags |= REQ_FAILFAST;
+	rq->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+			 REQ_FAILFAST_DRIVER;
 	rq->timeout = CLARIION_TIMEOUT;
 	rq->retries = CLARIION_RETRIES;
 
diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
index 12ceab7..95be4b3 100644
--- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
+++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
@@ -89,7 +89,8 @@ static int hp_sw_activate(struct scsi_device *sdev)
 	sdev_printk(KERN_INFO, sdev, "sending START_STOP.");
 
 	req->cmd_type = REQ_TYPE_BLOCK_PC;
-	req->cmd_flags |= REQ_FAILFAST;
+	req->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+			  REQ_FAILFAST_DRIVER;
 	req->cmd_len = COMMAND_SIZE(START_STOP);
 	memset(req->cmd, 0, MAX_COMMAND_SIZE);
 	req->cmd[0] = START_STOP;
diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 6fff077..8117674 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -220,7 +220,8 @@ static struct request *get_rdac_req(struct scsi_device *sdev,
 	rq->sense_len = 0;
 
 	rq->cmd_type = REQ_TYPE_BLOCK_PC;
-	rq->cmd_flags |= REQ_FAILFAST | REQ_NOMERGE;
+	rq->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+			 REQ_FAILFAST_DRIVER;
 	rq->retries = RDAC_RETRIES;
 	rq->timeout = RDAC_TIMEOUT;
 
diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
index 75a64a6..b39e12e 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -109,7 +109,9 @@ static int spi_execute(struct scsi_device *sdev, const void *cmd,
 	for(i = 0; i < DV_RETRIES; i++) {
 		result = scsi_execute(sdev, cmd, dir, buffer, bufflen,
 				      sense, DV_TIMEOUT, /* retries */ 1,
-				      REQ_FAILFAST);
+				      REQ_FAILFAST_DEV |
+				      REQ_FAILFAST_TRANSPORT |
+				      REQ_FAILFAST_DRIVER);
 		if (result & DRIVER_SENSE) {
 			struct scsi_sense_hdr sshdr_tmp;
 			if (!sshdr)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 61c15ea..b6bbad6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -143,15 +143,20 @@ struct bio {
  * bit 0 -- read (not set) or write (set)
  * bit 1 -- rw-ahead when set
  * bit 2 -- barrier
- * bit 3 -- fail fast, don't want low level driver retries
- * bit 4 -- synchronous I/O hint: the block layer will unplug immediately
+ * bit 3 -- synchronous I/O hint: the block layer will unplug immediately
+ * bit 4 -- meta data
+ * bit 5 -- fail fast device errors
+ * bit 6 -- fail fast transport errors
+ * bit 7 -- fail fast driver errors
  */
-#define BIO_RW		0
-#define BIO_RW_AHEAD	1
-#define BIO_RW_BARRIER	2
-#define BIO_RW_FAILFAST	3
-#define BIO_RW_SYNC	4
-#define BIO_RW_META	5
+#define BIO_RW				0
+#define BIO_RW_AHEAD			1
+#define BIO_RW_BARRIER			2
+#define BIO_RW_SYNC			3
+#define BIO_RW_META			4
+#define BIO_RW_FAILFAST_DEV		5
+#define BIO_RW_FAILFAST_TRANSPORT	6
+#define BIO_RW_FAILFAST_DRIVER		7
 
 /*
  * upper 16 bits of bi_rw define the io priority of this bio
@@ -178,7 +183,10 @@ struct bio {
 #define bio_sectors(bio)	((bio)->bi_size >> 9)
 #define bio_barrier(bio)	((bio)->bi_rw & (1 << BIO_RW_BARRIER))
 #define bio_sync(bio)		((bio)->bi_rw & (1 << BIO_RW_SYNC))
-#define bio_failfast(bio)	((bio)->bi_rw & (1 << BIO_RW_FAILFAST))
+#define bio_failfast_dev(bio)	((bio)->bi_rw &	(1 << BIO_RW_FAILFAST_DEV))
+#define bio_failfast_transport(bio)	\
+	((bio)->bi_rw & (1 << BIO_RW_FAILFAST_TRANSPORT))
+#define bio_failfast_driver(bio) ((bio)->bi_rw & (1 << BIO_RW_FAILFAST_DRIVER))
 #define bio_rw_ahead(bio)	((bio)->bi_rw & (1 << BIO_RW_AHEAD))
 #define bio_rw_meta(bio)	((bio)->bi_rw & (1 << BIO_RW_META))
 #define bio_empty_barrier(bio)	(bio_barrier(bio) && !(bio)->bi_size)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d2a1b71..4abaa3a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -95,7 +95,9 @@ enum {
  */
 enum rq_flag_bits {
 	__REQ_RW,		/* not set, read. set, write */
-	__REQ_FAILFAST,		/* no low level driver retries */
+	__REQ_FAILFAST_DEV,	/* no driver retries of device errors */
+	__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
+	__REQ_FAILFAST_DRIVER,	/* no driver retries of driver errors */
 	__REQ_SORTED,		/* elevator knows about this request */
 	__REQ_SOFTBARRIER,	/* may not be passed by ioscheduler */
 	__REQ_HARDBARRIER,	/* may not be passed by drive either */
@@ -117,7 +119,9 @@ enum rq_flag_bits {
 };
 
 #define REQ_RW		(1 << __REQ_RW)
-#define REQ_FAILFAST	(1 << __REQ_FAILFAST)
+#define REQ_FAILFAST_DEV	(1 << __REQ_FAILFAST_DEV)
+#define REQ_FAILFAST_TRANSPORT	(1 << __REQ_FAILFAST_TRANSPORT)
+#define REQ_FAILFAST_DRIVER	(1 << __REQ_FAILFAST_DRIVER)
 #define REQ_SORTED	(1 << __REQ_SORTED)
 #define REQ_SOFTBARRIER	(1 << __REQ_SOFTBARRIER)
 #define REQ_HARDBARRIER	(1 << __REQ_HARDBARRIER)
@@ -495,7 +499,12 @@ enum {
 #define blk_special_request(rq)	((rq)->cmd_type == REQ_TYPE_SPECIAL)
 #define blk_sense_request(rq)	((rq)->cmd_type == REQ_TYPE_SENSE)
 
-#define blk_noretry_request(rq)	((rq)->cmd_flags & REQ_FAILFAST)
+#define blk_failfast_dev(rq)	((rq)->cmd_flags & REQ_FAILFAST_DEV)
+#define blk_failfast_transport(rq) ((rq)->cmd_flags & REQ_FAILFAST_TRANSPORT)
+#define blk_failfast_driver(rq)	((rq)->cmd_flags & REQ_FAILFAST_DRIVER)
+#define blk_noretry_request(rq)	(blk_failfast_dev(rq) ||	\
+				 blk_failfast_transport(rq) ||	\
+				 blk_failfast_driver(rq))
 #define blk_rq_started(rq)	((rq)->cmd_flags & REQ_STARTED)
 
 #define blk_account_rq(rq)	(blk_rq_started(rq) && blk_fs_request(rq))
-- 
1.5.4.1
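
A minimal sketch of the flag split made by this patch, written as standalone C so it can be compiled outside the kernel. The `sketch_` and `SK_` names and `struct sketch_request` are hypothetical stand-ins, not kernel identifiers. Only the bit order (read/write bit first, then device, transport, driver) and the OR in the combined no-retry test follow the blkdev.h hunk above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum rq_flag_bits; order mirrors the diff. */
enum sketch_flag_bits {
	SKETCH_RW,			/* not set: read, set: write */
	SKETCH_FAILFAST_DEV,		/* no retries of device errors */
	SKETCH_FAILFAST_TRANSPORT,	/* no retries of transport errors */
	SKETCH_FAILFAST_DRIVER,		/* no retries of driver errors */
};

#define SK_FAILFAST_DEV		(1u << SKETCH_FAILFAST_DEV)
#define SK_FAILFAST_TRANSPORT	(1u << SKETCH_FAILFAST_TRANSPORT)
#define SK_FAILFAST_DRIVER	(1u << SKETCH_FAILFAST_DRIVER)

/* Hypothetical request type carrying only the flags word. */
struct sketch_request {
	unsigned int cmd_flags;
};

/* Per-class test, in the spirit of blk_failfast_transport(). */
static bool sketch_failfast_transport(const struct sketch_request *rq)
{
	return (rq->cmd_flags & SK_FAILFAST_TRANSPORT) != 0;
}

/* Combined test, in the spirit of blk_noretry_request(): any class set. */
static bool sketch_noretry(const struct sketch_request *rq)
{
	return (rq->cmd_flags & (SK_FAILFAST_DEV |
				 SK_FAILFAST_TRANSPORT |
				 SK_FAILFAST_DRIVER)) != 0;
}
```

A request that sets only the transport bit still counts as a no-retry request under the combined test, while the per-class test lets a driver tell the three cases apart.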

* [PATCH 7/7] scsi: Support fail fast bits
  2008-06-05  1:41           ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
@ 2008-06-05  1:41             ` michaelc
  0 siblings, 0 replies; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC (permalink / raw)
  To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This converts scsi_decide_disposition to handle the different
types of failfast that can be requested.

I was not sure if some of these were device or driver or transport
errors. For example I made DID_PARITY a device error, but I thought
maybe this could be a device or transport error. Also DID_ERROR seems
to be used for lots of different errors, so I was not sure how
to classify it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_error.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d257210..555085a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1288,6 +1288,7 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 int scsi_decide_disposition(struct scsi_cmnd *scmd)
 {
 	int rtn;
+	int retry_flag = 0;
 
 	/*
 	 * if the device is offline, then we clearly just pass the result back
@@ -1337,6 +1338,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		 * and not get stuck in a loop.
 		 */
 	case DID_SOFT_ERROR:
+		retry_flag = REQ_FAILFAST_DRIVER;
 		goto maybe_retry;
 	case DID_IMM_RETRY:
 		return NEEDS_RETRY;
@@ -1368,10 +1370,13 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 			 * lower down
 			 */
 			break;
-		/* fallthrough */
-
+		retry_flag = REQ_FAILFAST_DRIVER;
+		goto maybe_retry;
 	case DID_BUS_BUSY:
+		retry_flag = REQ_FAILFAST_TRANSPORT;
+		goto maybe_retry;
 	case DID_PARITY:
+		retry_flag = REQ_FAILFAST_DEV;
 		goto maybe_retry;
 	case DID_TIME_OUT:
 		/*
@@ -1420,8 +1425,10 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		return SUCCESS;
 	case CHECK_CONDITION:
 		rtn = scsi_check_sense(scmd);
-		if (rtn == NEEDS_RETRY)
+		if (rtn == NEEDS_RETRY) {
+			retry_flag = REQ_FAILFAST_DEV;
 			goto maybe_retry;
+		}
 		/* if rtn == FAILED, we have no sense information;
 		 * returning FAILED will wake the error handler thread
 		 * to collect the sense and redo the decide
@@ -1451,8 +1458,8 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 	 * the request was not marked fast fail.  Note that above,
 	 * even if the request is marked fast fail, we still requeue
 	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
-	if ((++scmd->retries) <= scmd->allowed
-	    && !blk_noretry_request(scmd->request)) {
+	if ((++scmd->retries) <= scmd->allowed &&
+	    !(scmd->request->cmd_flags & retry_flag)) {
 		return NEEDS_RETRY;
 	} else {
 		/*
-- 
1.5.4.1

* Re: [PATCH 3/7] fc class: Add support for new transport errors
  2008-06-05  1:41     ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
  2008-06-05  1:41       ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
@ 2008-08-19 15:35       ` James Smart
  1 sibling, 0 replies; 9+ messages in thread
From: James Smart @ 2008-08-19 15:35 UTC (permalink / raw)
  To: device-mapper development; +Cc: Mike Christie, linux-scsi

Ack.

Although, I have the personal style preference of:
   rport->flags &= ~(FC_RPORT_FAST_FAIL_TIMEDOUT |
                     FC_RPORT_DEVLOSS_PENDING);
over
  >
  > +	rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
  >  	rport->flags &= ~FC_RPORT_DEVLOSS_PENDING;
  >

-- james s
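
For readers comparing the two styles, a small standalone sketch showing that they clear the same bits. The `SK_` flag values and function names are hypothetical and are not taken from scsi_transport_fc.h.

```c
/* Hypothetical flag values for illustration only. */
#define SK_RPORT_DEVLOSS_PENDING	0x01u
#define SK_RPORT_FAST_FAIL_TIMEDOUT	0x04u

/* Style used in the patch: one statement per flag. */
static unsigned int sketch_clear_separately(unsigned int flags)
{
	flags &= ~SK_RPORT_FAST_FAIL_TIMEDOUT;
	flags &= ~SK_RPORT_DEVLOSS_PENDING;
	return flags;
}

/* Preferred style: both flags cleared with a single combined mask. */
static unsigned int sketch_clear_combined(unsigned int flags)
{
	flags &= ~(SK_RPORT_FAST_FAIL_TIMEDOUT | SK_RPORT_DEVLOSS_PENDING);
	return flags;
}
```

Both functions return the same value for any input, so the choice between them is purely one of readability.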

michaelc@cs.wisc.edu wrote:
> From: Mike Christie <michaelc@cs.wisc.edu>
> 
> When we block a rport and the driver implements the terminate
> callback, we will quickly fail IO that was running. However
> IO that was in the scsi_device/block queue sits there until
> the dev_loss_tmo fires, and this can make it look like IO is
> lost because new IO will get executed but that IO stuck in
> the blocked queue sits there for some time longer.
> 
> With this patch when the fast io fail tmo fires, we will
> fail the blocked IO and any new IO. This patch also allows
> all drivers to partially support the fast io fail tmo. If the
> terminate io callback is not implemented, we will still fail blocked
> IO and any new IO, so multipath can handle that. This means that for
> drivers like qla2xxx which seem to fail the IO when the error is first
> detected this will then allow drivers like lpfc and qla2xxx to have the
> IO flushed to the upper layers when the fast io fail tmo is fired.
> 
> This patch also allows the fc and iscsi classes to implement the
> same behavior. The timers are just unfortunately named differently.
> 
> The next patches will convert the drivers to support this.
> 
> This patch has been lightly tested with lpfc and qla2xxx. I am not able
> to test the role change handling.
> 
> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
> ---
>  drivers/scsi/scsi_transport_fc.c |   15 ++++++++++-----
>  include/scsi/scsi_transport_fc.h |    8 ++++++--
>  2 files changed, 16 insertions(+), 7 deletions(-)


Thread overview: 9+ messages
2008-06-05  1:41 block and scsi fail fast fixes michaelc
2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
2008-06-05  1:41   ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
2008-06-05  1:41     ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
2008-06-05  1:41       ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
2008-06-05  1:41         ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
2008-06-05  1:41           ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
2008-06-05  1:41             ` [PATCH 7/7] scsi: Support fail fast bits michaelc
2008-08-19 15:35       ` [PATCH 3/7] fc class: Add support for new transport errors James Smart
