* block and scsi fail fast fixes
@ 2008-06-05  1:41 michaelc
  2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
To: dm-devel, linux-scsi

The following patches fix two problems I have been seeing in Red Hat
bugzillas. The patches are made over scsi-misc, but except for
0006-block-and-drivers-separate-failfast-into-multiple-b.patch they
could also apply over scsi-rc-fixes or Linus's tree.
0006-block-and-drivers-separate-failfast-into-multiple-b.patch also
converts the scsi dh modules, which is why it does not apply to the
other kernels.

The first problem is that when a transport problem is detected and the
classes/drivers block the scsi_devices, there is IO in the driver and
IO in the scsi_device queues. For fibre channel we have the fast IO
fail tmo infrastructure to allow us to get IO in the driver up to
multipath, but IO in the queues remains until the dev_loss_tmo fires.
The difference between the timers can be minutes, so it looks like a
hang to the application. iSCSI has something similar to FC's fast io
fail tmo, but it is called the replacement timeout. With this we will
fail all IO that is in the driver or queued, as well as any incoming IO.

The first 5 patches try to provide common behavior:

0001-scsi-add-transport-host-byte-errors-v2.patch
0002-iscsi-class-libiscsi-and-qla4xxx-convert-to-new-tr.patch
0003-fc-class-Add-support-for-new-transport-errors.patch
0004-qla2xxx-use-new-host-byte-transport-errors.patch
0005-lpfc-start-to-use-new-trasnport-errors.patch

Basically, when we block a device we fail IO with
DID_TRANSPORT_DISRUPTED. When the fast io transport timer fires we fail
IO with DID_TRANSPORT_FAILFAST. I converted qla2xxx and tried to convert
lpfc (I was not sure about some of the errors). zfcp and mpt need to be
converted, but it looked like they would be ok with the patches below.
I could only test qla2xxx and lpfc though.

The second problem is that multipath is not really good at handling a
lot of errors. It just retries all errors on a different path, so for
transport errors it makes a lot of sense to send them up to us pretty
quickly. But for device errors, driver errors or weird ones in between,
the scsi layer is better at handling them because the multipath layer
does not know anything about scsi details. The patches:

0006-block-and-drivers-separate-failfast-into-multiple-b.patch
0007-scsi-Support-fail-fast-bits.patch

are really simple and just break up the FAILFAST bits into device,
driver and transport bits, so the upper layer can ask the lower layers
to fail fast only certain types of errors. For multipath we only set
the transport fail fast bit, and I thought in the future maybe something
like RAID would set the device failfast bit and not want transport
errors failed fast to it.

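To make the first problem concrete, here is a rough userspace model of what happens to IO sitting in a blocked scsi_device queue, as the series intends it. It is only a sketch: `struct timers`, `queued_io_outcome()` and the outcome names are hypothetical, and the "-1 means disabled" and "only armed when shorter than dev_loss_tmo" rules are taken from the fc class patch in this series.

```c
#include <assert.h>

/* What happens to IO waiting in a blocked scsi_device queue. */
enum outcome {
	IO_STAYS_QUEUED,	/* still requeued, application sees a stall */
	IO_FAILED_FAST,		/* handed up so multipath can switch paths */
	IO_FAILED_DEV_LOSS	/* dev_loss_tmo fired, device is gone */
};

/* Hypothetical container for the two timers, in seconds. */
struct timers {
	int fast_io_fail_tmo;	/* -1 means the timer is disabled */
	int dev_loss_tmo;
};

static enum outcome queued_io_outcome(const struct timers *t, int secs_blocked)
{
	assert(secs_blocked >= 0);

	if (secs_blocked >= t->dev_loss_tmo)
		return IO_FAILED_DEV_LOSS;

	/* The fast fail timer is only armed when it would fire first. */
	if (t->fast_io_fail_tmo != -1 &&
	    t->fast_io_fail_tmo < t->dev_loss_tmo &&
	    secs_blocked >= t->fast_io_fail_tmo)
		return IO_FAILED_FAST;

	return IO_STAYS_QUEUED;
}

/* Usage: with a 5s fast fail timer, queued IO no longer waits 60s. */
void blocked_queue_example(void)
{
	struct timers t = { .fast_io_fail_tmo = 5, .dev_loss_tmo = 60 };

	assert(queued_io_outcome(&t, 3) == IO_STAYS_QUEUED);
	assert(queued_io_outcome(&t, 10) == IO_FAILED_FAST);

	t.fast_io_fail_tmo = -1;	/* timer disabled: the stall is back */
	assert(queued_io_outcome(&t, 10) == IO_STAYS_QUEUED);
}
```

The gap between the two timers is exactly the window in which the application appears to hang when queued IO is not failed fast.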
* [PATCH 1/7] scsi: add transport host byte errors (v2)
  2008-06-05  1:41 block and scsi fail fast fixes michaelc
@ 2008-06-05  1:41 ` michaelc
  2008-06-05  1:41 ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

Currently, if there is a transport problem the iscsi drivers will return
outstanding commands (commands being executed by the driver/fw/hw) with
DID_BUS_BUSY and block the session so no new commands can be queued.
Commands that are caught between the failure handling and blocking are
failed with DID_IMM_RETRY or one of the scsi ml queuecommand return
values. When the recovery_timeout fires, the iscsi drivers then fail IO
with DID_NO_CONNECT.

For fcp, some drivers will fail some outstanding IO (disk but possibly
not tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes
a retry and hits the scsi_error.c failfast check, block the rport, and
commands caught in the race are failed with DID_IMM_RETRY. Other drivers
will hold onto all IO and wait for the terminate_rport_io or
dev_loss_tmo_callbk to be called.

The following patches attempt to unify what upper layers will see, so
that drivers like multipath can make a good guess. This relies on
drivers being hooked into their transport class.

This first patch just defines two new host byte errors so drivers can
return the same value for when a rport/session is blocked and for when
the fast_io_fail_tmo fires.

The idea is that if the LLD/class detects a problem and is going to
block a rport/session, and the LLD wants to or must return the command
to scsi-ml, then it can return it with DID_TRANSPORT_DISRUPTED. This
will requeue the IO into the same scsi queue it came from, until the
fast io fail timer fires and the class decides what to do.

When using multipath and the fast_io_fail_tmo fires then the class
can fail commands with DID_TRANSPORT_FAILFAST or drivers can use
DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
the equivalent in iscsi if we ever implement more advanced recovery
methods. A LLD, like lpfc, could continue to return DID_ERROR and
then it will hit the normal failfast path. The point of the patches
is that upper layers will not see a failure that could be recovered
from while the rport/session is blocked until
fast_io_fail_tmo/recovery_timeout fires.

V2
Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
DID_TRANSPORT_DISRUPTED.
V1
initial patch.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/constants.c  |    3 ++-
 drivers/scsi/scsi_error.c |   18 +++++++++++++++++-
 include/scsi/scsi.h       |    5 +++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
index 9785d73..4003dee 100644
--- a/drivers/scsi/constants.c
+++ b/drivers/scsi/constants.c
@@ -1364,7 +1364,8 @@ EXPORT_SYMBOL(scsi_print_sense);
 static const char * const hostbyte_table[]={
 "DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT", "DID_BAD_TARGET",
 "DID_ABORT", "DID_PARITY", "DID_ERROR", "DID_RESET", "DID_BAD_INTR",
-"DID_PASSTHROUGH", "DID_SOFT_ERROR", "DID_IMM_RETRY", "DID_REQUEUE"};
+"DID_PASSTHROUGH", "DID_SOFT_ERROR", "DID_IMM_RETRY", "DID_REQUEUE",
+"DID_TRANSPORT_DISRUPTED", "DID_TRANSPORT_FAILFAST" };
 #define NUM_HOSTBYTE_STRS ARRAY_SIZE(hostbyte_table)
 
 static const char * const driverbyte_table[]={
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 006a959..d257210 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1343,7 +1343,23 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 
 	case DID_REQUEUE:
 		return ADD_TO_MLQUEUE;
-
+	case DID_TRANSPORT_DISRUPTED:
+		/*
+		 * LLD/transport was disrupted during processing of the IO.
+		 * The transport class is now blocked/blocking,
+		 * and the transport will decide what to do with the IO
+		 * based on its timers and recovery capabilities.
+		 *
+		 * TODO: When the target block code is merged we can block
+		 * entire target instead of just this device.
+		 */
+		return ADD_TO_MLQUEUE;
+	case DID_TRANSPORT_FAILFAST:
+		/*
+		 * The transport decided to failfast the IO (most likely
+		 * the fast io fail tmo fired), so send IO directly upwards.
+		 */
+		return SUCCESS;
 	case DID_ERROR:
 		if (msg_byte(scmd->result) == COMMAND_COMPLETE &&
 		    status_byte(scmd->result) == RESERVATION_CONFLICT)
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 2b5b935..df2c775 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -363,6 +363,11 @@ struct scsi_lun {
 #define DID_IMM_RETRY   0x0c	/* Retry without decrementing retry count  */
 #define DID_REQUEUE	0x0d	/* Requeue command (no immediate retry) also
 				 * without decrementing the retry count	   */
+#define DID_TRANSPORT_DISRUPTED 0x0e /* Transport error disrupted execution
+				      * and the driver blocked the port to
+				      * recover the link. Transport class will
+				      * retry or fail IO */
+#define DID_TRANSPORT_FAILFAST	0x0f /* Transport class fastfailed the io */
 #define DRIVER_OK       0x00	/* Driver status                           */
 
 /*
-- 
1.5.4.1

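The effect of the two new host bytes on the disposition decision can be sketched in userspace as follows. This is a simplified illustration, not the mid-layer code: the `DISP_*` names and `decide()` are hypothetical stand-ins for SUCCESS, ADD_TO_MLQUEUE and scsi_decide_disposition(), and it assumes the host byte occupies bits 16..23 of the result word, which is what the `<< 16` in the driver patches implies.

```c
#include <assert.h>

/* Host byte values as defined by this patch in include/scsi/scsi.h. */
#define DID_REQUEUE             0x0d
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* Hypothetical stand-ins for the mid-layer disposition codes. */
enum disposition { DISP_SUCCESS, DISP_ADD_TO_MLQUEUE, DISP_OTHER };

/* Extract the host byte from a result word. */
static unsigned int host_byte_of(unsigned int result)
{
	return (result >> 16) & 0xff;
}

static enum disposition decide(unsigned int result)
{
	switch (host_byte_of(result)) {
	case DID_REQUEUE:
	case DID_TRANSPORT_DISRUPTED:
		/* Put the command back on the queue it came from. */
		return DISP_ADD_TO_MLQUEUE;
	case DID_TRANSPORT_FAILFAST:
		/* Complete the command upwards with its error intact. */
		return DISP_SUCCESS;
	default:
		return DISP_OTHER;	/* all other cases omitted here */
	}
}

/* Usage: the same command, before and after the fast fail timer. */
void host_byte_example(void)
{
	assert(decide(DID_TRANSPORT_DISRUPTED << 16) == DISP_ADD_TO_MLQUEUE);
	assert(decide(DID_TRANSPORT_FAILFAST << 16) == DISP_SUCCESS);
}
```

Returning the "success" disposition for DID_TRANSPORT_FAILFAST does not mean the IO succeeded; it means the command is finished from the mid layer's point of view and the error is passed to the upper layer.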
* [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values
  2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
@ 2008-06-05  1:41 ` michaelc
  2008-06-05  1:41 ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This patch converts the iscsi drivers to the new host byte values.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c             |    6 +++---
 drivers/scsi/qla4xxx/ql4_isr.c      |    4 ++--
 drivers/scsi/scsi_transport_iscsi.c |    4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 010c1b9..055c196 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1132,7 +1132,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 		switch (session->state) {
 		case ISCSI_STATE_IN_RECOVERY:
 			reason = FAILURE_SESSION_IN_RECOVERY;
-			sc->result = DID_IMM_RETRY << 16;
+			sc->result = DID_TRANSPORT_DISRUPTED << 16;
 			break;
 		case ISCSI_STATE_LOGGING_OUT:
 			reason = FAILURE_SESSION_LOGGING_OUT;
@@ -1140,7 +1140,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 			break;
 		case ISCSI_STATE_RECOVERY_FAILED:
 			reason = FAILURE_SESSION_RECOVERY_TIMEOUT;
-			sc->result = DID_NO_CONNECT << 16;
+			sc->result = DID_TRANSPORT_FAILFAST << 16;
 			break;
 		case ISCSI_STATE_TERMINATE:
 			reason = FAILURE_SESSION_TERMINATE;
@@ -2233,7 +2233,7 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 	 */
 	spin_lock_bh(&session->lock);
 	fail_all_commands(conn, -1,
-			STOP_CONN_RECOVER ? DID_BUS_BUSY : DID_ERROR);
+			STOP_CONN_RECOVER ? DID_TRANSPORT_DISRUPTED : DID_ERROR);
 	flush_control_queues(session, conn);
 	spin_unlock_bh(&session->lock);
 	mutex_unlock(&session->eh_mutex);
diff --git a/drivers/scsi/qla4xxx/ql4_isr.c b/drivers/scsi/qla4xxx/ql4_isr.c
index a91a57c..799120f 100644
--- a/drivers/scsi/qla4xxx/ql4_isr.c
+++ b/drivers/scsi/qla4xxx/ql4_isr.c
@@ -139,7 +139,7 @@ static void qla4xxx_status_entry(struct scsi_qla_host *ha,
 			      ha->host_no, cmd->device->channel,
 			      cmd->device->id, cmd->device->lun));
 
-		cmd->result = DID_BUS_BUSY << 16;
+		cmd->result = DID_TRANSPORT_DISRUPTED << 16;
 
 		/*
 		 * Mark device missing so that we won't continue to send
@@ -243,7 +243,7 @@ static void qla4xxx_status_entry(struct scsi_qla_host *ha,
 		if (atomic_read(&ddb_entry->state) == DDB_STATE_ONLINE)
 			qla4xxx_mark_device_missing(ha, ddb_entry);
 
-		cmd->result = DID_BUS_BUSY << 16;
+		cmd->result = DID_TRANSPORT_DISRUPTED << 16;
 		break;
 
 	case SCS_QUEUE_FULL:
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 65d1737..b5a529c 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -258,10 +258,10 @@ int iscsi_session_chkready(struct iscsi_cls_session *session)
 		err = 0;
 		break;
 	case ISCSI_SESSION_FAILED:
-		err = DID_IMM_RETRY << 16;
+		err = DID_TRANSPORT_DISRUPTED << 16;
 		break;
 	case ISCSI_SESSION_FREE:
-		err = DID_NO_CONNECT << 16;
+		err = DID_TRANSPORT_FAILFAST << 16;
 		break;
 	default:
 		err = DID_NO_CONNECT << 16;
-- 
1.5.4.1

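The session-state mapping that iscsi_session_chkready() ends up with after this patch can be summarized by the small model below. It is a sketch under assumptions: the `SESS_*` states and `sess_chkready()` are hypothetical simplifications of the class session states, and only the host byte values come from the series.

```c
#include <assert.h>

#define DID_NO_CONNECT          0x01
#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* Hypothetical, simplified session states for illustration only. */
enum sess_state { SESS_LOGGED_IN, SESS_FAILED, SESS_FREE, SESS_OTHER };

/* Returns 0 when IO may proceed, else a result word with the host byte set. */
static int sess_chkready(enum sess_state state)
{
	switch (state) {
	case SESS_LOGGED_IN:
		return 0;
	case SESS_FAILED:	/* recovery in progress: requeue the IO */
		return DID_TRANSPORT_DISRUPTED << 16;
	case SESS_FREE:		/* recovery gave up: fail the IO upwards */
		return DID_TRANSPORT_FAILFAST << 16;
	default:
		return DID_NO_CONNECT << 16;
	}
}

/* Usage: a session that fails and then exhausts its recovery window. */
void sess_chkready_example(void)
{
	assert(sess_chkready(SESS_LOGGED_IN) == 0);
	assert(sess_chkready(SESS_FAILED) == (DID_TRANSPORT_DISRUPTED << 16));
	assert(sess_chkready(SESS_FREE) == (DID_TRANSPORT_FAILFAST << 16));
}
```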
* [PATCH 3/7] fc class: Add support for new transport errors
  2008-06-05  1:41 ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
@ 2008-06-05  1:41 ` michaelc
  2008-06-05  1:41 ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
  2008-08-19 15:35 ` [PATCH 3/7] fc class: Add support for new transport errors James Smart
  0 siblings, 2 replies; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

When we block a rport and the driver implements the terminate callback
we will quickly fail IO that was running. However, IO that was in the
scsi_device/block queue sits there until the dev_loss_tmo fires, and
this can make it look like IO is lost because new IO will get executed
but the IO stuck in the blocked queue sits there for some time longer.

With this patch when the fast io fail tmo fires, we will fail the
blocked IO and any new IO.

This patch also allows all drivers to partially support the fast io fail
tmo. If the terminate io callback is not implemented, we will still
fail blocked IO and any new IO, so multipath can handle that. This means
that for drivers like qla2xxx, which seem to fail the IO when the error
is first detected, this will then allow drivers like lpfc and qla2xxx to
have the IO flushed to the upper layers when the fast io fail tmo is
fired.

This patch also allows the fc and iscsi classes to implement the same
behavior. The timers are just unfortunately named differently.

The next patches will convert the drivers to support this.

This patch has been lightly tested with lpfc and qla2xxx. I am not able
to test the role change handling.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_transport_fc.c |   15 ++++++++++-----
 include/scsi/scsi_transport_fc.h |    8 ++++++--
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 5fd64e7..ea4906c 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2156,8 +2156,7 @@ fc_attach_transport(struct fc_function_template *ft)
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(roles);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_state);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(scsi_target_id);
-	if (ft->terminate_rport_io)
-		SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(fast_io_fail_tmo);
+	SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(fast_io_fail_tmo);
 
 	BUG_ON(count > FC_RPORT_NUM_ATTRS);
 
@@ -2662,6 +2661,7 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
 
 			spin_lock_irqsave(shost->host_lock, flags);
 
+			rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 			rport->flags &= ~FC_RPORT_DEVLOSS_PENDING;
 
 			/* if target, initiate a scan */
@@ -2725,6 +2725,7 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
 			rport->port_id = ids->port_id;
 			rport->roles = ids->roles;
 			rport->port_state = FC_PORTSTATE_ONLINE;
+			rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 
 			if (fci->f->dd_fcrport_size)
 				memset(rport->dd_data, 0,
@@ -2807,7 +2808,6 @@ void
 fc_remote_port_delete(struct fc_rport *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
-	struct fc_internal *i = to_fc_internal(shost->transportt);
 	int timeout = rport->dev_loss_tmo;
 	unsigned long flags;
 
@@ -2853,7 +2853,7 @@ fc_remote_port_delete(struct fc_rport *rport)
 
 	/* see if we need to kill io faster than waiting for device loss */
 	if ((rport->fast_io_fail_tmo != -1) &&
-	    (rport->fast_io_fail_tmo < timeout) && (i->f->terminate_rport_io))
+	    (rport->fast_io_fail_tmo < timeout))
 		fc_queue_devloss_work(shost, &rport->fail_io_work,
 					rport->fast_io_fail_tmo * HZ);
 
@@ -2930,6 +2930,7 @@ fc_remote_port_rolechg(struct fc_rport *rport, u32 roles)
 
 		spin_lock_irqsave(shost->host_lock, flags);
 		rport->flags &= ~FC_RPORT_DEVLOSS_PENDING;
+		rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 
 		/* ensure any stgt delete functions are done */
@@ -3024,6 +3025,7 @@ fc_timeout_deleted_rport(struct work_struct *work)
 	rport->supported_classes = FC_COS_UNSPECIFIED;
 	rport->roles = FC_PORT_ROLE_UNKNOWN;
 	rport->port_state = FC_PORTSTATE_NOTPRESENT;
+	rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
 
 	/* remove the identifiers that aren't used in the consisting binding */
 	switch (fc_host->tgtid_bind_type) {
@@ -3072,7 +3074,10 @@ fc_timeout_fail_rport_io(struct work_struct *work)
 	if (rport->port_state != FC_PORTSTATE_BLOCKED)
 		return;
 
-	i->f->terminate_rport_io(rport);
+	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
+	if (i->f->terminate_rport_io)
+		i->f->terminate_rport_io(rport);
+	scsi_target_unblock(&rport->dev);
 }
 
 /**
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index 06f72ba..4cf6fb0 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -338,6 +338,7 @@ struct fc_rport {	/* aka fc_starget_attrs */
 /* bit field values for struct fc_rport "flags" field: */
 #define FC_RPORT_DEVLOSS_PENDING		0x01
 #define FC_RPORT_SCAN_PENDING			0x02
+#define FC_RPORT_FAST_FAIL_TIMEDOUT		0x04
 
 #define dev_to_rport(d)				\
 	container_of(d, struct fc_rport, dev)
@@ -659,12 +660,15 @@ fc_remote_port_chkready(struct fc_rport *rport)
 		if (rport->roles & FC_PORT_ROLE_FCP_TARGET)
 			result = 0;
 		else if (rport->flags & FC_RPORT_DEVLOSS_PENDING)
-			result = DID_IMM_RETRY << 16;
+			result = DID_TRANSPORT_DISRUPTED << 16;
 		else
 			result = DID_NO_CONNECT << 16;
 		break;
 	case FC_PORTSTATE_BLOCKED:
-		result = DID_IMM_RETRY << 16;
+		if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
+			result = DID_TRANSPORT_FAILFAST << 16;
+		else
+			result = DID_TRANSPORT_DISRUPTED << 16;
 		break;
 	default:
 		result = DID_NO_CONNECT << 16;
-- 
1.5.4.1

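The blocked-state branch of fc_remote_port_chkready() and the flag handling around it can be modelled in a few lines. This is a userspace sketch, not the class code: the `RPORT_*` names and `blocked_rport_result()` are hypothetical, and it assumes the fast fail flag occupies a bit of its own (0x04 here), since a value that overlaps 0x01 and 0x02 would also test true whenever either of those flags is set.

```c
#include <assert.h>

#define DID_TRANSPORT_DISRUPTED 0x0e
#define DID_TRANSPORT_FAILFAST  0x0f

/* Flags are tested with '&', so each one needs a bit of its own. */
#define RPORT_DEVLOSS_PENDING     0x01u
#define RPORT_SCAN_PENDING        0x02u
#define RPORT_FAST_FAIL_TIMEDOUT  0x04u	/* assumed value, see above */

/* Result for new IO that arrives while the rport is blocked. */
static int blocked_rport_result(unsigned int flags)
{
	if (flags & RPORT_FAST_FAIL_TIMEDOUT)
		return DID_TRANSPORT_FAILFAST << 16;
	return DID_TRANSPORT_DISRUPTED << 16;
}

/* Usage: the life cycle of the flag across a link bounce. */
void blocked_rport_example(void)
{
	unsigned int flags = RPORT_DEVLOSS_PENDING | RPORT_SCAN_PENDING;

	/* Before the fast io fail timer fires: requeue. */
	assert(blocked_rport_result(flags) == (DID_TRANSPORT_DISRUPTED << 16));

	/* The timer handler sets the flag: fail fast from now on. */
	flags |= RPORT_FAST_FAIL_TIMEDOUT;
	assert(blocked_rport_result(flags) == (DID_TRANSPORT_FAILFAST << 16));

	/* The rport came back: clearing the flag leaves the others intact. */
	flags &= ~RPORT_FAST_FAIL_TIMEDOUT;
	assert(flags == (RPORT_DEVLOSS_PENDING | RPORT_SCAN_PENDING));
}
```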
* [PATCH 4/7] qla2xxx: use new host byte transport errors.
  2008-06-05  1:41 ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
@ 2008-06-05  1:41 ` michaelc
  2008-06-05  1:41 ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
  2008-08-19 15:35 ` [PATCH 3/7] fc class: Add support for new transport errors James Smart
  1 sibling, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This has qla2xxx use the new transport error values instead of
DID_BUS_BUSY. I am not sure if all the errors in qla_isr.c I changed
are transport related. We end up blocking/deleting the rport for all of
them so it is ok to use the new transport error since the fc class
will decide when to fail the IO.

With this patch if I pull a cable then IO that had reached the driver
will be failed with DID_TRANSPORT_DISRUPTED. The fc class will then
fail the IO when the fast io fail tmo has fired.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/qla2xxx/qla_isr.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 5d9a64a..b1dec6e 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1222,7 +1222,12 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		    cp->serial_number, comp_status,
 		    atomic_read(&fcport->state)));
 
-		cp->result = DID_BUS_BUSY << 16;
+		/*
+		 * We are going to have the fc class block the rport
+		 * while we try to recover so instruct the mid layer
+		 * to requeue until the class decides how to handle this.
+		 */
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 		if (atomic_read(&fcport->state) == FCS_ONLINE) {
 			qla2x00_mark_device_lost(ha, fcport, 1, 1);
 		}
@@ -1250,7 +1255,12 @@ qla2x00_status_entry(scsi_qla_host_t *ha, void *pkt)
 		break;
 
 	case CS_TIMEOUT:
-		cp->result = DID_BUS_BUSY << 16;
+		/*
+		 * We are going to have the fc class block the rport
+		 * while we try to recover so instruct the mid layer
+		 * to requeue until the class decides how to handle this.
+		 */
+		cp->result = DID_TRANSPORT_DISRUPTED << 16;
 
 		if (IS_FWI2_CAPABLE(ha)) {
 			DEBUG2(printk(KERN_INFO
-- 
1.5.4.1

* [PATCH 5/7] lpfc: start to use new trasnport errors.
  2008-06-05  1:41 ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
@ 2008-06-05  1:41 ` michaelc
  2008-06-05  1:41 ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This is only a test patch to get lpfc going. For the case I changed it
looked like the rport is deleted and then we fail these IOs with
DID_BUS_BUSY, so using DID_TRANSPORT_DISRUPTED was correct. In testing
the driver by stopping the fcp service on the target this worked.

I was not sure if maybe this bus busy:

	case IOSTAT_NPORT_BSY:
	case IOSTAT_FABRIC_BSY:
		cmd->result = ScsiResult(DID_BUS_BUSY, 0);

should also be converted. For qla2xxx I thought we blocked the rport for
similar errors (at least the names sounded similar :)) and so I used
DID_TRANSPORT_DISRUPTED, but for lpfc I could not hit this code and was
not sure by just looking at it if it was exactly the same, so I did not
touch it in this patch.

I was also not sure about some cases where I just unplugged a cable. I
would sometimes get IOSTAT_LOCAL_REJECT with IOERR_DEFAULT, so it seemed
like DID_ERROR was right for that, but I had seen that there is also an
IOERR_LINK_DOWN value. Maybe for that, if we end up deleting the rport,
we should be returning DID_TRANSPORT_DISRUPTED, but I was not able to
hit that case and was not able to tell from the code when I should, so I
did not touch it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/lpfc/lpfc_scsi.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 0910a9a..83f7e43 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -590,7 +590,14 @@ lpfc_scsi_cmd_iocb_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pIocbIn,
 
 		if (!pnode || !NLP_CHK_NODE_ACT(pnode)
 		    || (pnode->nlp_state != NLP_STE_MAPPED_NODE))
-			cmd->result = ScsiResult(DID_BUS_BUSY, SAM_STAT_BUSY);
+			/*
+			 * Port is not setup so fail IO with
+			 * DID_TRANSPORT_DISRUPTED, and allow the fc
+			 * class to determine what to do with it when
+			 * its timers fire.
+			 */
+			cmd->result = ScsiResult(DID_TRANSPORT_DISRUPTED,
+						 SAM_STAT_BUSY);
 	} else {
 		cmd->result = ScsiResult(DID_OK, 0);
 	}
-- 
1.5.4.1

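To see what the lpfc change does to the result word, the sketch below packs and unpacks it in userspace. The `ScsiResult()` definition shown is an assumption (host code in bits 16..23, SCSI status in the low byte), consistent with the `<< 16` used by the other drivers in this series; the `result_*` helpers are hypothetical names.

```c
#include <assert.h>

#define DID_BUS_BUSY            0x02
#define DID_TRANSPORT_DISRUPTED 0x0e
#define SAM_STAT_BUSY           0x08

/* Assumed definition: host code in bits 16..23, SCSI status in the low byte. */
#define ScsiResult(host_code, scsi_code) (((host_code) << 16) | (scsi_code))

static unsigned int result_host_byte(unsigned int result)
{
	return (result >> 16) & 0xff;
}

static unsigned int result_status(unsigned int result)
{
	return result & 0xff;
}

/* Usage: compare the result word before and after the patch. */
void lpfc_result_example(void)
{
	unsigned int before = ScsiResult(DID_BUS_BUSY, SAM_STAT_BUSY);
	unsigned int after = ScsiResult(DID_TRANSPORT_DISRUPTED, SAM_STAT_BUSY);

	/* Only the host byte changes; the SCSI status is carried as is. */
	assert(result_host_byte(before) == DID_BUS_BUSY);
	assert(result_host_byte(after) == DID_TRANSPORT_DISRUPTED);
	assert(result_status(before) == result_status(after));
}
```

Because the mid layer keys its requeue decision on the host byte, the SAM_STAT_BUSY status that rides along does not change how DID_TRANSPORT_DISRUPTED is handled in this model.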
* [PATCH 6/7] block and drivers: separate failfast into multiple bits.
  2008-06-05  1:41 ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
@ 2008-06-05  1:41 ` michaelc
  2008-06-05  1:41 ` [PATCH 7/7] scsi: Support fail fast bits michaelc
  0 siblings, 1 reply; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
To: dm-devel, linux-scsi; +Cc: Mike Christie, Jens Axboe

From: Mike Christie <michaelc@cs.wisc.edu>

Multipath is best at handling transport errors. If it gets a device
error then there is not much the multipath layer can do. It will just
access the same device but from a different path.

RAID is best at handling device errors. If it gets a transport error
it is going to do the same thing the lower level would have done -
retry it on the same path.

This patch breaks up failfast into device, transport and driver errors.
The multipath layers (md and dm multipath) only ask the lower levels to
fast fail transport errors, but read ahead will ask to fast fail on all
errors.

Note that blk_noretry_request will return true if any failfast bit
is set. This allows drivers that do not support the multipath failfast
bits to continue to fail on any failfast error like before. As a result
I was thinking blk_noretry_request should have a different name like
blk_noretry_any_error or something, but I will do the rename changes in
a different patch.

Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 block/blk-core.c                            |   11 +++++++++--
 drivers/md/dm-mpath.c                       |    2 +-
 drivers/md/multipath.c                      |    4 ++--
 drivers/s390/block/dasd_diag.c              |    2 +-
 drivers/s390/block/dasd_eckd.c              |    2 +-
 drivers/s390/block/dasd_fba.c               |    2 +-
 drivers/scsi/device_handler/scsi_dh_emc.c   |    3 ++-
 drivers/scsi/device_handler/scsi_dh_hp_sw.c |    3 ++-
 drivers/scsi/device_handler/scsi_dh_rdac.c  |    3 ++-
 drivers/scsi/scsi_transport_spi.c           |    4 +++-
 include/linux/bio.h                         |   26 +++++++++++++++++---------
 include/linux/blkdev.h                      |   15 ++++++++++++---
 12 files changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b754a4a..7fefda4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1062,8 +1062,15 @@ void init_request_from_bio(struct request *req, struct bio *bio)
 	/*
 	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
 	 */
-	if (bio_rw_ahead(bio) || bio_failfast(bio))
-		req->cmd_flags |= REQ_FAILFAST;
+	if (bio_rw_ahead(bio))
+		req->cmd_flags |= (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+				   REQ_FAILFAST_DRIVER);
+	if (bio_failfast_dev(bio))
+		req->cmd_flags |= REQ_FAILFAST_DEV;
+	if (bio_failfast_transport(bio))
+		req->cmd_flags |= REQ_FAILFAST_TRANSPORT;
+	if (bio_failfast_driver(bio))
+		req->cmd_flags |= REQ_FAILFAST_DRIVER;
 
 	/*
 	 * REQ_BARRIER implies no merging, but lets make it explicit
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index e8f704a..f29ab80 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -808,7 +808,7 @@ static int multipath_map(struct dm_target *ti, struct bio *bio,
 	dm_bio_record(&mpio->details, bio);
 
 	map_context->ptr = mpio;
-	bio->bi_rw |= (1 << BIO_RW_FAILFAST);
+	bio->bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);
 	r = map_io(m, bio, mpio, 0);
 	if (r < 0 || r == DM_MAPIO_REQUEUE)
 		mempool_free(mpio, m->mpio_pool);
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 42ee1a2..a8030d6 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -172,7 +172,7 @@ static int multipath_make_request (struct request_queue *q, struct bio * bio)
 	mp_bh->bio = *bio;
 	mp_bh->bio.bi_sector += multipath->rdev->data_offset;
 	mp_bh->bio.bi_bdev = multipath->rdev->bdev;
-	mp_bh->bio.bi_rw |= (1 << BIO_RW_FAILFAST);
+	mp_bh->bio.bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);
 	mp_bh->bio.bi_end_io = multipath_end_request;
 	mp_bh->bio.bi_private = mp_bh;
 	generic_make_request(&mp_bh->bio);
@@ -390,7 +390,7 @@ static void multipathd (mddev_t *mddev)
 			*bio = *(mp_bh->master_bio);
 			bio->bi_sector += conf->multipaths[mp_bh->path].rdev->data_offset;
 			bio->bi_bdev = conf->multipaths[mp_bh->path].rdev->bdev;
-			bio->bi_rw |= (1 << BIO_RW_FAILFAST);
+			bio->bi_rw |= (1 << BIO_RW_FAILFAST_TRANSPORT);
 			bio->bi_end_io = multipath_end_request;
 			bio->bi_private = mp_bh;
 			generic_make_request(bio);
diff --git a/drivers/s390/block/dasd_diag.c b/drivers/s390/block/dasd_diag.c
index d91df38..60102ce 100644
--- a/drivers/s390/block/dasd_diag.c
+++ b/drivers/s390/block/dasd_diag.c
@@ -533,7 +533,7 @@ static struct dasd_ccw_req *dasd_diag_build_cp(struct dasd_device *memdev,
 	}
 	cqr->retries = DIAG_MAX_RETRIES;
 	cqr->buildclk = get_clock();
-	if (req->cmd_flags & REQ_FAILFAST)
+	if (blk_noretry_request(req))
 		set_bit(DASD_CQR_FLAGS_FAILFAST, &cqr->flags);
 	cqr->startdev = memdev;
 	cqr->memdev = memdev;
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index a0edae0..4779e2c 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -1604,7 +1604,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp(struct dasd_device *startdev,
 			recid++;
 		}
 	}
-	if (req->cmd_flags & REQ_FAILFAST)
+	if (blk_noretry_request(req))
 		set_bit(DASD_CQR_FLAGS_FAILFAST, &cqr->flags);
 	cqr->startdev = startdev;
 	cqr->memdev = startdev;
diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c
index 1166115..6125041 100644
--- a/drivers/s390/block/dasd_fba.c
+++ b/drivers/s390/block/dasd_fba.c
@@ -350,7 +350,7 @@ static struct dasd_ccw_req *dasd_fba_build_cp(struct dasd_device * memdev,
 			recid++;
 		}
 	}
-	if (req->cmd_flags & REQ_FAILFAST)
+	if (blk_noretry_request(req))
 		set_bit(DASD_CQR_FLAGS_FAILFAST, &cqr->flags);
 	cqr->startdev = memdev;
 	cqr->memdev = memdev;
diff --git a/drivers/scsi/device_handler/scsi_dh_emc.c b/drivers/scsi/device_handler/scsi_dh_emc.c
index ed53f14..376322b 100644
--- a/drivers/scsi/device_handler/scsi_dh_emc.c
+++ b/drivers/scsi/device_handler/scsi_dh_emc.c
@@ -294,7 +294,8 @@ static struct request *get_req(struct scsi_device *sdev, int cmd)
 
 	rq->cmd[4] = len;
 	rq->cmd_type = REQ_TYPE_BLOCK_PC;
-	rq->cmd_flags |= REQ_FAILFAST;
+	rq->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+			 REQ_FAILFAST_DRIVER;
 	rq->timeout = CLARIION_TIMEOUT;
 	rq->retries = CLARIION_RETRIES;
 
diff --git a/drivers/scsi/device_handler/scsi_dh_hp_sw.c b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
index 12ceab7..95be4b3 100644
--- a/drivers/scsi/device_handler/scsi_dh_hp_sw.c
+++ b/drivers/scsi/device_handler/scsi_dh_hp_sw.c
@@ -89,7 +89,8 @@ static int hp_sw_activate(struct scsi_device *sdev)
 	sdev_printk(KERN_INFO, sdev, "sending START_STOP.");
 
 	req->cmd_type = REQ_TYPE_BLOCK_PC;
-	req->cmd_flags |= REQ_FAILFAST;
+	req->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+			  REQ_FAILFAST_DRIVER;
 	req->cmd_len = COMMAND_SIZE(START_STOP);
 	memset(req->cmd, 0, MAX_COMMAND_SIZE);
 	req->cmd[0] = START_STOP;
diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 6fff077..8117674 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -220,7 +220,8 @@ static struct request *get_rdac_req(struct scsi_device *sdev,
 	rq->sense_len = 0;
 
 	rq->cmd_type = REQ_TYPE_BLOCK_PC;
-	rq->cmd_flags |= REQ_FAILFAST | REQ_NOMERGE;
+	rq->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+			 REQ_FAILFAST_DRIVER;
 	rq->retries = RDAC_RETRIES;
 	rq->timeout = RDAC_TIMEOUT;
diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
index 75a64a6..b39e12e 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -109,7 +109,9 @@ static int spi_execute(struct scsi_device *sdev, const void *cmd,
 	for(i = 0; i < DV_RETRIES; i++) {
 		result = scsi_execute(sdev, cmd, dir, buffer, bufflen,
 				      sense, DV_TIMEOUT, /* retries */ 1,
-				      REQ_FAILFAST);
+				      REQ_FAILFAST_DEV |
+				      REQ_FAILFAST_TRANSPORT |
+				      REQ_FAILFAST_DRIVER);
 		if (result & DRIVER_SENSE) {
 			struct scsi_sense_hdr sshdr_tmp;
 			if (!sshdr)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 61c15ea..b6bbad6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -143,15 +143,20 @@ struct bio {
  * bit 0 -- read (not set) or write (set)
  * bit 1 -- rw-ahead when set
  * bit 2 -- barrier
- * bit 3 -- fail fast, don't want low level driver retries
- * bit 4 -- synchronous I/O hint: the block layer will unplug immediately
+ * bit 3 -- synchronous I/O hint: the block layer will unplug immediately
+ * bit 4 -- meta data
+ * bit 5 -- fail fast device errors
+ * bit 6 -- fail fast transport errors
+ * bit 7 -- fail fast driver errors
  */
-#define BIO_RW		0
-#define BIO_RW_AHEAD	1
-#define BIO_RW_BARRIER	2
-#define BIO_RW_FAILFAST	3
-#define BIO_RW_SYNC	4
-#define BIO_RW_META	5
+#define BIO_RW				0
+#define BIO_RW_AHEAD			1
+#define BIO_RW_BARRIER			2
+#define BIO_RW_SYNC			3
+#define BIO_RW_META			4
+#define BIO_RW_FAILFAST_DEV		5
+#define BIO_RW_FAILFAST_TRANSPORT	6
+#define BIO_RW_FAILFAST_DRIVER		7
 
 /*
  * upper 16 bits of bi_rw define the io priority of this bio
@@ -178,7 +183,10 @@ struct bio {
 #define bio_sectors(bio)	((bio)->bi_size >> 9)
 #define bio_barrier(bio)	((bio)->bi_rw & (1 << BIO_RW_BARRIER))
 #define bio_sync(bio)		((bio)->bi_rw & (1 << BIO_RW_SYNC))
-#define bio_failfast(bio)	((bio)->bi_rw & (1 << BIO_RW_FAILFAST))
+#define bio_failfast_dev(bio)	((bio)->bi_rw & (1 << BIO_RW_FAILFAST_DEV))
+#define bio_failfast_transport(bio)	\
+	((bio)->bi_rw & (1 << BIO_RW_FAILFAST_TRANSPORT))
+#define bio_failfast_driver(bio) ((bio)->bi_rw & (1 << BIO_RW_FAILFAST_DRIVER))
 #define bio_rw_ahead(bio)	((bio)->bi_rw & (1 << BIO_RW_AHEAD))
 #define bio_rw_meta(bio)	((bio)->bi_rw & (1 << BIO_RW_META))
 #define bio_empty_barrier(bio)	(bio_barrier(bio) && !(bio)->bi_size)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d2a1b71..4abaa3a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -95,7 +95,9 @@ enum {
  */
 enum rq_flag_bits {
 	__REQ_RW,		/* not set, read. set, write */
-	__REQ_FAILFAST,		/* no low level driver retries */
+	__REQ_FAILFAST_DEV,	/* no driver retries of device errors */
+	__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
+	__REQ_FAILFAST_DRIVER,	/* no driver retries of driver errors */
 	__REQ_SORTED,		/* elevator knows about this request */
 	__REQ_SOFTBARRIER,	/* may not be passed by ioscheduler */
 	__REQ_HARDBARRIER,	/* may not be passed by drive either */
@@ -117,7 +119,9 @@ enum rq_flag_bits {
 };
 
 #define REQ_RW		(1 << __REQ_RW)
-#define REQ_FAILFAST	(1 << __REQ_FAILFAST)
+#define REQ_FAILFAST_DEV	(1 << __REQ_FAILFAST_DEV)
+#define REQ_FAILFAST_TRANSPORT	(1 << __REQ_FAILFAST_TRANSPORT)
+#define REQ_FAILFAST_DRIVER	(1 << __REQ_FAILFAST_DRIVER)
 #define REQ_SORTED	(1 << __REQ_SORTED)
 #define REQ_SOFTBARRIER	(1 << __REQ_SOFTBARRIER)
 #define REQ_HARDBARRIER	(1 << __REQ_HARDBARRIER)
@@ -495,7 +499,12 @@ enum {
 #define blk_special_request(rq)	((rq)->cmd_type == REQ_TYPE_SPECIAL)
 #define blk_sense_request(rq)	((rq)->cmd_type == REQ_TYPE_SENSE)
 
-#define blk_noretry_request(rq)	((rq)->cmd_flags & REQ_FAILFAST)
+#define blk_failfast_dev(rq)	((rq)->cmd_flags & REQ_FAILFAST_DEV)
+#define blk_failfast_transport(rq) ((rq)->cmd_flags & REQ_FAILFAST_TRANSPORT)
+#define blk_failfast_driver(rq)	((rq)->cmd_flags & REQ_FAILFAST_DRIVER)
+#define blk_noretry_request(rq)	(blk_failfast_dev(rq) ||	\
+				 blk_failfast_transport(rq) ||	\
+				 blk_failfast_driver(rq))
 #define blk_rq_started(rq)	((rq)->cmd_flags & REQ_STARTED)
 
 #define blk_account_rq(rq)	(blk_rq_started(rq) && blk_fs_request(rq))
-- 
1.5.4.1
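
For readers who want to experiment with the split flags outside the kernel, the following is a minimal userspace sketch of the blkdev.h part of patch 6/7. It is not kernel code: the trimmed struct request, the BIT_REQ_* enum names and the two helper functions are assumptions made for illustration. Only the REQ_FAILFAST_* names and the blk_failfast_*() / blk_noretry_request() macros mirror the diff.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel's rq_flag_bits; the order follows the patch. */
enum rq_flag_bits {
	BIT_REQ_RW,
	BIT_REQ_FAILFAST_DEV,
	BIT_REQ_FAILFAST_TRANSPORT,
	BIT_REQ_FAILFAST_DRIVER,
};

#define REQ_FAILFAST_DEV	(1u << BIT_REQ_FAILFAST_DEV)
#define REQ_FAILFAST_TRANSPORT	(1u << BIT_REQ_FAILFAST_TRANSPORT)
#define REQ_FAILFAST_DRIVER	(1u << BIT_REQ_FAILFAST_DRIVER)

/* Hypothetical, trimmed-down request: only the flags word is modelled. */
struct request {
	unsigned int cmd_flags;
};

#define blk_failfast_dev(rq)	   ((rq)->cmd_flags & REQ_FAILFAST_DEV)
#define blk_failfast_transport(rq) ((rq)->cmd_flags & REQ_FAILFAST_TRANSPORT)
#define blk_failfast_driver(rq)	   ((rq)->cmd_flags & REQ_FAILFAST_DRIVER)
/* True if any of the three failfast classes was requested. */
#define blk_noretry_request(rq)	(blk_failfast_dev(rq) ||	\
				 blk_failfast_transport(rq) ||	\
				 blk_failfast_driver(rq))

/* A multipath-like caller: only transport errors should fail fast. */
static struct request multipath_style_request(void)
{
	struct request rq = { .cmd_flags = REQ_FAILFAST_TRANSPORT };

	assert(blk_noretry_request(&rq));
	return rq;
}

/* The spi_execute() conversion in the patch: all three classes at once. */
static struct request all_failfast_request(void)
{
	struct request rq = {
		.cmd_flags = REQ_FAILFAST_DEV |
			     REQ_FAILFAST_TRANSPORT |
			     REQ_FAILFAST_DRIVER,
	};

	assert(blk_noretry_request(&rq));
	return rq;
}
```

ORing all three bits reproduces the old single-bit REQ_FAILFAST behaviour, which is why converted callers such as spi_execute() keep their semantics, while a caller that sets a single bit opts into only one class of fast failure.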
* [PATCH 7/7] scsi: Support fail fast bits
  2008-06-05  1:41 ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
@ 2008-06-05  1:41 ` michaelc
  0 siblings, 0 replies; 9+ messages in thread
From: michaelc @ 2008-06-05  1:41 UTC
  To: dm-devel, linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This converts scsi_decide_disposition() to handle the different
types of failfast that can be requested.

I was not sure whether some of these were device, driver or
transport errors. For example, I made DID_PARITY a device error,
but it could arguably be either a device or a transport error.
Also, DID_ERROR seems to be used for lots of different errors,
so I was not sure how to classify it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_error.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d257210..555085a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1288,6 +1288,7 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 int scsi_decide_disposition(struct scsi_cmnd *scmd)
 {
 	int rtn;
+	int retry_flag = 0;
 
 	/*
 	 * if the device is offline, then we clearly just pass the result back
@@ -1337,6 +1338,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		 * and not get stuck in a loop.
 		 */
 	case DID_SOFT_ERROR:
+		retry_flag = REQ_FAILFAST_DRIVER;
 		goto maybe_retry;
 	case DID_IMM_RETRY:
 		return NEEDS_RETRY;
@@ -1368,10 +1370,13 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 			 * lower down
 			 */
 			break;
-		/* fallthrough */
-
+		retry_flag = REQ_FAILFAST_DRIVER;
+		goto maybe_retry;
 	case DID_BUS_BUSY:
+		retry_flag = REQ_FAILFAST_TRANSPORT;
+		goto maybe_retry;
 	case DID_PARITY:
+		retry_flag = REQ_FAILFAST_DEV;
 		goto maybe_retry;
 	case DID_TIME_OUT:
 		/*
@@ -1420,8 +1425,10 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		return SUCCESS;
 	case CHECK_CONDITION:
 		rtn = scsi_check_sense(scmd);
-		if (rtn == NEEDS_RETRY)
+		if (rtn == NEEDS_RETRY) {
+			retry_flag = REQ_FAILFAST_DEV;
 			goto maybe_retry;
+		}
 		/* if rtn == FAILED, we have no sense information;
 		 * returning FAILED will wake the error handler thread
 		 * to collect the sense and redo the decide
@@ -1451,8 +1458,8 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 	 * the request was not marked fast fail.  Note that above,
 	 * even if the request is marked fast fail, we still requeue
 	 * for queue congestion conditions (QUEUE_FULL or BUSY) */
-	if ((++scmd->retries) <= scmd->allowed
-	    && !blk_noretry_request(scmd->request)) {
+	if ((++scmd->retries) <= scmd->allowed &&
+	    !(scmd->request->cmd_flags & retry_flag)) {
 		return NEEDS_RETRY;
 	} else {
 		/*
-- 
1.5.4.1
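
The retry decision in patch 7/7 can be modelled in userspace roughly as below. This is a sketch under stated assumptions, not the kernel function: the host_error enum, struct cmd_state, failfast_class() and needs_retry() are hypothetical names, the flag bit positions are placeholders, and only the error classification and the final retry test are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit positions; the kernel derives these from enum rq_flag_bits. */
#define REQ_FAILFAST_DEV	(1u << 1)
#define REQ_FAILFAST_TRANSPORT	(1u << 2)
#define REQ_FAILFAST_DRIVER	(1u << 3)

/* Hypothetical subset of error sources; not the kernel's DID_* values. */
enum host_error {
	ERR_SOFT_ERROR,		/* models DID_SOFT_ERROR */
	ERR_ERROR,		/* models DID_ERROR */
	ERR_BUS_BUSY,		/* models DID_BUS_BUSY */
	ERR_PARITY,		/* models DID_PARITY */
};

/* Map an error to the failfast class the patch assigns to it. */
static unsigned int failfast_class(enum host_error err)
{
	switch (err) {
	case ERR_SOFT_ERROR:
	case ERR_ERROR:
		return REQ_FAILFAST_DRIVER;
	case ERR_BUS_BUSY:
		return REQ_FAILFAST_TRANSPORT;
	case ERR_PARITY:
		return REQ_FAILFAST_DEV;
	}
	return 0;	/* unclassified: no failfast bit can match */
}

/* Hypothetical stand-in for the fields of scsi_cmnd and request that are used. */
struct cmd_state {
	int retries;
	int allowed;
	unsigned int cmd_flags;
};

/*
 * Mirrors the maybe_retry test: retry only while the retry budget lasts and
 * the submitter did not ask to fail fast for this class of error.
 */
static bool needs_retry(struct cmd_state *cmd, enum host_error err)
{
	unsigned int retry_flag = failfast_class(err);

	assert(cmd->allowed >= 0);
	return (++cmd->retries) <= cmd->allowed &&
	       !(cmd->cmd_flags & retry_flag);
}
```

With only the transport bit set, as the cover letter describes for multipath, a bus-busy style error is returned to the caller immediately, while a parity or driver error is still retried by the SCSI layer until the retry count is exhausted.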
* Re: [PATCH 3/7] fc class: Add support for new transport errors
  2008-06-05  1:41 ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
  2008-06-05  1:41 ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
@ 2008-08-19 15:35 ` James Smart
  1 sibling, 0 replies; 9+ messages in thread
From: James Smart @ 2008-08-19 15:35 UTC
  To: device-mapper development; +Cc: Mike Christie, linux-scsi

Ack.

Although, I have the personal style preference of:
   rport->flags &= ~(FC_RPORT_FAST_FAIL_TIMEDOUT | FC_RPORT_DEVLOSS_PENDING);
over
>
> +	rport->flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
>  	rport->flags &= ~FC_RPORT_DEVLOSS_PENDING;
>

-- james s

michaelc@cs.wisc.edu wrote:
> From: Mike Christie <michaelc@cs.wisc.edu>
>
> When we block a rport and the driver implements the terminate
> callback we will fail IO that was running quickly. However
> IO that was in the scsi_device/block queue sits there until
> the dev_loss_tmo fires, and this can make it look like IO is
> lost because new IO will get executed but that IO stuck in
> the blocked queue sits there for some time longer.
>
> With this patch when the fast io fail tmo fires, we will
> fail the blocked IO and any new IO. This patch also allows
> all drivers to partially support the fast io fail tmo. If the
> terminate io callback is not implemented, we will still fail blocked
> IO and any new IO, so multipath can handle that. This means that for
> drivers like qla2xxx which seem to fail the IO when the error is first
> detected this will then allow drivers like lpfc and qla2xxx to have the
> IO flushed to the upper layers when the fast io fail tmo is fired.
>
> This patch also allows the fc and iscsi classes to implement the
> same behavior. The timers are just unfortunately named differently.
>
> The next patches will convert the drivers to support this.
>
> This patch has been lightly tested with lpfc and qla2xxx. I am not able
> to test the role change handling.
>
> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
> ---
>  drivers/scsi/scsi_transport_fc.c |   15 ++++++++++-----
>  include/scsi/scsi_transport_fc.h |    8 ++++++--
>  2 files changed, 16 insertions(+), 7 deletions(-)
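
The two spellings discussed in the reply above clear the same bits. A small userspace check, assuming placeholder flag values (the real definitions live in include/scsi/scsi_transport_fc.h and may differ) and hypothetical helper names, could look like this:

```c
#include <assert.h>

/* Placeholder values for illustration only; not taken from the kernel header. */
#define FC_RPORT_DEVLOSS_PENDING	0x01u
#define FC_RPORT_FAST_FAIL_TIMEDOUT	0x04u

/* The form used in the patch: one statement per flag. */
static unsigned int clear_separately(unsigned int flags)
{
	flags &= ~FC_RPORT_FAST_FAIL_TIMEDOUT;
	flags &= ~FC_RPORT_DEVLOSS_PENDING;
	return flags;
}

/* The form preferred in the review: one combined mask. */
static unsigned int clear_combined(unsigned int flags)
{
	return flags & ~(FC_RPORT_FAST_FAIL_TIMEDOUT | FC_RPORT_DEVLOSS_PENDING);
}

/* Check every 8-bit flag pattern to show the two forms agree. */
static void check_equivalent(void)
{
	unsigned int flags;

	for (flags = 0; flags < 256; flags++)
		assert(clear_separately(flags) == clear_combined(flags));
}
```

Since both forms produce the same result, the choice is purely stylistic, which matches the reply's framing of it as a personal preference alongside the Ack.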
end of thread, other threads:[~2008-08-19 15:35 UTC]

Thread overview: 9+ messages
2008-06-05  1:41 block and scsi fail fast fixes michaelc
2008-06-05  1:41 ` [PATCH 1/7] scsi: add transport host byte errors (v2) michaelc
2008-06-05  1:41   ` [PATCH 2/7] iscsi class, libiscsi and qla4xxx: convert to new transport host byte values michaelc
2008-06-05  1:41     ` [PATCH 3/7] fc class: Add support for new transport errors michaelc
2008-06-05  1:41       ` [PATCH 4/7] qla2xxx: use new host byte " michaelc
2008-06-05  1:41         ` [PATCH 5/7] lpfc: start to use new trasnport errors michaelc
2008-06-05  1:41           ` [PATCH 6/7] block and drivers: separate failfast into multiple bits michaelc
2008-06-05  1:41             ` [PATCH 7/7] scsi: Support fail fast bits michaelc
2008-08-19 15:35       ` [PATCH 3/7] fc class: Add support for new transport errors James Smart