* [patch 0/2] Stop scsi eh when fast_io_fail_tmo fires
@ 2010-03-24 15:50 Christof Schmitt

From: Christof Schmitt @ 2010-03-24 15:50 UTC
To: linux-scsi

This is a follow-up to the draft mentioned in
http://marc.info/?l=linux-scsi&m=126926716605409&w=2

The patches allow the scsi eh to exit early when the fast_io_fail_tmo
fires. This fixes the problem that the scsi eh will block until the
dev_loss_tmo fires, potentially blocking the scsi host for a very long
time.

Comments?

--
Christof Schmitt
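The timing difference described in the cover letter can be sketched as simple arithmetic. This is a hypothetical userspace model, not kernel code. Times are in seconds, counted from the moment the remote port becomes blocked. NEVER is an invented constant that stands for a port that never returns or for a disabled fast_io_fail_tmo. The model also ignores the one-second polling granularity of the real wait loop (msleep(1000)).

```c
#include <assert.h>

/* Invented marker for "this event never happens" (port never returns,
 * or fast_io_fail_tmo disabled). Not a kernel constant. */
#define NEVER (-1)

/* Earlier of two event times, ignoring events that never happen. */
static int earliest(int a, int b)
{
	if (a == NEVER)
		return b;
	if (b == NEVER)
		return a;
	return a < b ? a : b;
}

/* Without the patches: the scsi eh waits until the remote port
 * returns or dev_loss_tmo fires. */
int eh_wait_without_patch(int port_returns, int dev_loss_tmo)
{
	assert(dev_loss_tmo >= 0);
	return earliest(port_returns, dev_loss_tmo);
}

/* With the patches: an expired fast_io_fail_tmo also ends the wait. */
int eh_wait_with_patch(int port_returns, int dev_loss_tmo,
		       int fast_io_fail_tmo)
{
	/* Sketch assumption: an enabled fast_io_fail_tmo is shorter than
	 * dev_loss_tmo, as in a multipathing setup. */
	assert(fast_io_fail_tmo == NEVER || fast_io_fail_tmo < dev_loss_tmo);
	return earliest(eh_wait_without_patch(port_returns, dev_loss_tmo),
			fast_io_fail_tmo);
}
```

For example, take dev_loss_tmo = 600, fast_io_fail_tmo = 5, and a port that never returns. In that case eh_wait_without_patch(NEVER, 600) is 600, while eh_wait_with_patch(NEVER, 600, 5) is 5.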

* [patch 1/2] scsi: Allow FC LLD to fast-fail scsi eh
@ 2010-03-24 15:50 Christof Schmitt

From: Christof Schmitt @ 2010-03-24 15:50 UTC
To: linux-scsi; +Cc: Christof Schmitt

[-- Attachment #1: fast-fail-2.diff --]
[-- Type: text/plain, Size: 6237 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

If the scsi eh is running and then a FC LLD calls
fc_remote_port_delete, the SCSI commands sent from the eh will fail.
To prevent this, a FC LLD can call fc_block_scsi_eh from the eh
callback, blocking the eh thread until the dev_loss_tmo fires or the
remote port is available again.

If (e.g. for a multipathing setup) the dev_loss_tmo is set to a very
large value, thus preventing the scsi device removal, the scsi eh can
block for a long time. For multipathing, the fast_io_fail_tmo is then
set to a low value to detect path problems sooner.

This patch introduces a new return code FAST_IO_FAIL. The function
fc_block_scsi_eh now returns FAST_IO_FAIL when the fast_io_fail_tmo
fires. This indicates that the LLD terminated all pending I/O requests
and there are no more pending SCSI commands for the scsi eh to wait
for. This return code can be passed back to the scsi eh to stop the
escalation and finish the recovery process for this device.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---
 drivers/scsi/scsi_error.c        |   15 ++++++++++-----
 drivers/scsi/scsi_transport_fc.c |   20 +++++++++++++++-----
 include/scsi/scsi.h              |    1 +
 include/scsi/scsi_transport_fc.h |    2 +-
 4 files changed, 27 insertions(+), 11 deletions(-)

--- a/drivers/scsi/scsi_error.c	2010-03-23 11:25:24.000000000 +0100
+++ b/drivers/scsi/scsi_error.c	2010-03-23 11:25:26.000000000 +0100
@@ -956,9 +956,10 @@ static int scsi_eh_abort_cmds(struct lis
 						  "0x%p\n", current->comm,
 						  scmd));
 		rtn = scsi_try_to_abort_cmd(scmd);
-		if (rtn == SUCCESS) {
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
 			if (!scsi_device_online(scmd->device) ||
+			    rtn == FAST_IO_FAIL ||
 			    !scsi_eh_tur(scmd)) {
 				scsi_eh_finish_cmd(scmd, done_q);
 			}
@@ -1085,8 +1086,9 @@ static int scsi_eh_bus_device_reset(stru
 						  " 0x%p\n", current->comm,
 						  sdev));
 		rtn = scsi_try_bus_device_reset(bdr_scmd);
-		if (rtn == SUCCESS) {
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			if (!scsi_device_online(sdev) ||
+			    rtn == FAST_IO_FAIL ||
 			    !scsi_eh_tur(bdr_scmd)) {
 				list_for_each_entry_safe(scmd, next,
 							 work_q, eh_entry) {
@@ -1149,10 +1151,11 @@ static int scsi_eh_target_reset(struct S
 						  "to target %d\n",
 						  current->comm, id));
 		rtn = scsi_try_target_reset(tgtr_scmd);
-		if (rtn == SUCCESS) {
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 				if (id == scmd_id(scmd))
 					if (!scsi_device_online(scmd->device) ||
+					    rtn == FAST_IO_FAIL ||
 					    !scsi_eh_tur(tgtr_scmd))
 						scsi_eh_finish_cmd(scmd,
 								   done_q);
@@ -1208,10 +1211,11 @@ static int scsi_eh_bus_reset(struct Scsi
 						  " %d\n", current->comm,
 						  channel));
 		rtn = scsi_try_bus_reset(chan_scmd);
-		if (rtn == SUCCESS) {
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 				if (channel == scmd_channel(scmd))
 					if (!scsi_device_online(scmd->device) ||
+					    rtn == FAST_IO_FAIL ||
 					    !scsi_eh_tur(scmd))
 						scsi_eh_finish_cmd(scmd,
 								   done_q);
@@ -1245,9 +1249,10 @@ static int scsi_eh_host_reset(struct lis
 						  , current->comm));
 
 		rtn = scsi_try_host_reset(scmd);
-		if (rtn == SUCCESS) {
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 				if (!scsi_device_online(scmd->device) ||
+				    rtn == FAST_IO_FAIL ||
 				    (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
 				    !scsi_eh_tur(scmd))
 					scsi_eh_finish_cmd(scmd, done_q);
--- a/drivers/scsi/scsi_transport_fc.c	2010-03-23 11:25:24.000000000 +0100
+++ b/drivers/scsi/scsi_transport_fc.c	2010-03-23 11:26:16.000000000 +0100
@@ -3190,23 +3190,33 @@ fc_scsi_scan_rport(struct work_struct *w
  *
  * This routine can be called from a FC LLD scsi_eh callback. It
  * blocks the scsi_eh thread until the fc_rport leaves the
- * FC_PORTSTATE_BLOCKED. This is necessary to avoid the scsi_eh
- * failing recovery actions for blocked rports which would lead to
- * offlined SCSI devices.
+ * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
+ * necessary to avoid the scsi_eh failing recovery actions for blocked
+ * rports which would lead to offlined SCSI devices.
+ *
+ * Returns: 0 if the fc_rport left the state FC_PORTSTATE_BLOCKED.
+ *          FAST_IO_FAIL if the fast_io_fail_tmo fired, this should be
+ *          passed back to scsi_eh.
  */
-void fc_block_scsi_eh(struct scsi_cmnd *cmnd)
+int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
 {
 	struct Scsi_Host *shost = cmnd->device->host;
 	struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	while (rport->port_state == FC_PORTSTATE_BLOCKED) {
+	while (rport->port_state == FC_PORTSTATE_BLOCKED &&
+	       !(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		msleep(1000);
 		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
+		return FAST_IO_FAIL;
+
+	return 0;
 }
 EXPORT_SYMBOL(fc_block_scsi_eh);
 
--- a/include/scsi/scsi.h	2010-03-23 11:25:24.000000000 +0100
+++ b/include/scsi/scsi.h	2010-03-23 11:25:26.000000000 +0100
@@ -423,6 +423,7 @@ static inline int scsi_is_wlun(unsigned
 #define ADD_TO_MLQUEUE  0x2006
 #define TIMEOUT_ERROR   0x2007
 #define SCSI_RETURN_NOT_HANDLED   0x2008
+#define FAST_IO_FAIL	0x2009
 
 /*
  * Midlevel queue return values.
--- a/include/scsi/scsi_transport_fc.h	2010-03-23 11:25:24.000000000 +0100
+++ b/include/scsi/scsi_transport_fc.h	2010-03-23 11:25:26.000000000 +0100
@@ -807,6 +807,6 @@ void fc_host_post_vendor_event(struct Sc
 struct fc_vport *fc_vport_create(struct Scsi_Host *shost, int channel,
 		struct fc_vport_identifiers *);
 int fc_vport_terminate(struct fc_vport *vport);
-void fc_block_scsi_eh(struct scsi_cmnd *cmnd);
+int fc_block_scsi_eh(struct scsi_cmnd *cmnd);
 
 #endif /* SCSI_TRANSPORT_FC_H */
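The way the patched scsi_error.c code treats the new return code can be summarized as a small decision function. This is a hypothetical userspace sketch, not a kernel API. eh_after_step() and enum eh_action are invented names, and the SUCCESS and FAILED values are assumed to match include/scsi/scsi.h. The sketch also leaves out the START STOP UNIT attempt in the host reset path.

```c
#include <assert.h>
#include <stdbool.h>

/* Return codes from include/scsi/scsi.h. SUCCESS and FAILED values are
 * assumed; FAST_IO_FAIL is the value added by this patch. */
#define SUCCESS      0x2002
#define FAILED       0x2003
#define FAST_IO_FAIL 0x2009

/* What happens to a command after one recovery step (abort, device
 * reset, target reset, bus reset or host reset). */
enum eh_action {
	EH_ESCALATE,  /* step failed: command stays on work_q for the next step */
	EH_FINISH,    /* move the command to done_q without further checks */
	EH_SEND_TUR,  /* finish only if TEST UNIT READY succeeds */
};

/* Simplified model of the patched condition
 *   if (rtn == SUCCESS || rtn == FAST_IO_FAIL)
 *           if (!scsi_device_online(...) || rtn == FAST_IO_FAIL ||
 *               !scsi_eh_tur(...))
 *                   scsi_eh_finish_cmd(...);
 * Because || short-circuits, FAST_IO_FAIL never sends a TUR. */
enum eh_action eh_after_step(int rtn, bool device_online)
{
	/* This sketch only models the three codes defined above. */
	assert(rtn == SUCCESS || rtn == FAILED || rtn == FAST_IO_FAIL);

	if (rtn != SUCCESS && rtn != FAST_IO_FAIL)
		return EH_ESCALATE;
	if (!device_online || rtn == FAST_IO_FAIL)
		return EH_FINISH;
	return EH_SEND_TUR;
}
```

The design point this model mirrors is the evaluation order. The test rtn == FAST_IO_FAIL comes before scsi_eh_tur(), so no TEST UNIT READY is sent through a port whose I/O has already been failed by the fast_io_fail_tmo.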

* Re: [patch 1/2] scsi: Allow FC LLD to fast-fail scsi eh
@ 2010-03-25 19:36 Mike Christie

From: Mike Christie @ 2010-03-25 19:36 UTC
To: Christof Schmitt; +Cc: linux-scsi

On 03/24/2010 10:50 AM, Christof Schmitt wrote:
> From: Christof Schmitt <christof.schmitt@de.ibm.com>
>
> If the scsi eh is running and then a FC LLD calls
> fc_remote_port_delete, the SCSI commands sent from the eh will fail.
> To prevent this, a FC LLD can call fc_block_scsi_eh from the eh
> callback, blocking the eh thread until the dev_loss_tmo fires or the
> remote port is available again.
>
> If (e.g. for a multipathing setup) the dev_loss_tmo is set to a very
> large value, thus preventing the scsi device removal, the scsi eh can
> block for a long time. For multipathing, the fast_io_fail_tmo is then
> set to a low value to detect path problems sooner.
>
> This patch introduces a new return code FAST_IO_FAIL. The function
> fc_block_scsi_eh now returns FAST_IO_FAIL when the fast_io_fail_tmo
> fires. This indicates that the LLD terminated all pending I/O requests
> and there are no more pending SCSI commands for the scsi eh to wait
> for. This return code can be passed back to the scsi eh to stop the
> escalation and finish the recovery process for this device.
>

Patch seems ok to me. It fixes the bug I was trying to fix with my
other patch, but I think your patch is nicer with the new error code.

* [patch 2/2] zfcp: Pass return code from fc_block_scsi_eh to scsi eh
@ 2010-03-24 15:50 Christof Schmitt

From: Christof Schmitt @ 2010-03-24 15:50 UTC
To: linux-scsi; +Cc: Christof Schmitt

[-- Attachment #1: fast-fail-2-zfcp.diff --]
[-- Type: text/plain, Size: 2221 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

The return code FAST_IO_FAIL from fc_block_scsi_eh indicates that the
pending I/O requests have been terminated as a result of the
fast_io_fail_tmo. Pass this return code back to the scsi eh to stop
the scsi eh in this case.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---
 drivers/s390/scsi/zfcp_scsi.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

--- a/drivers/s390/scsi/zfcp_scsi.c	2010-03-23 10:59:09.000000000 +0100
+++ b/drivers/s390/scsi/zfcp_scsi.c	2010-03-23 11:21:10.000000000 +0100
@@ -174,7 +174,7 @@ static int zfcp_scsi_eh_abort_handler(st
 	struct zfcp_fsf_req *old_req, *abrt_req;
 	unsigned long flags;
 	unsigned long old_reqid = (unsigned long) scpnt->host_scribble;
-	int retval = SUCCESS;
+	int retval = SUCCESS, ret;
 	int retry = 3;
 	char *dbf_tag;
 
@@ -199,7 +199,9 @@ static int zfcp_scsi_eh_abort_handler(st
 			break;
 
 		zfcp_erp_wait(adapter);
-		fc_block_scsi_eh(scpnt);
+		ret = fc_block_scsi_eh(scpnt);
+		if (ret)
+			return ret;
 		if (!(atomic_read(&adapter->status) &
 		      ZFCP_STATUS_COMMON_RUNNING)) {
 			zfcp_dbf_scsi_abort("nres", adapter->dbf, scpnt, NULL,
@@ -230,7 +232,7 @@ static int zfcp_task_mgmt_function(struc
 	struct zfcp_unit *unit = scpnt->device->hostdata;
 	struct zfcp_adapter *adapter = unit->port->adapter;
 	struct zfcp_fsf_req *fsf_req = NULL;
-	int retval = SUCCESS;
+	int retval = SUCCESS, ret;
 	int retry = 3;
 
 	while (retry--) {
@@ -239,7 +241,10 @@ static int zfcp_task_mgmt_function(struc
 			break;
 
 		zfcp_erp_wait(adapter);
-		fc_block_scsi_eh(scpnt);
+		ret = fc_block_scsi_eh(scpnt);
+		if (ret)
+			return ret;
+
 		if (!(atomic_read(&adapter->status) &
 		      ZFCP_STATUS_COMMON_RUNNING)) {
 			zfcp_dbf_scsi_devreset("nres", tm_flags, unit, scpnt);
@@ -275,10 +280,13 @@ static int zfcp_scsi_eh_host_reset_handl
 {
 	struct zfcp_unit *unit = scpnt->device->hostdata;
 	struct zfcp_adapter *adapter = unit->port->adapter;
+	int ret;
 
 	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1", scpnt);
 	zfcp_erp_wait(adapter);
-	fc_block_scsi_eh(scpnt);
+	ret = fc_block_scsi_eh(scpnt);
+	if (ret)
+		return ret;
 
 	return SUCCESS;
 }
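All three zfcp handlers follow the same pattern on the LLD side. Each one calls fc_block_scsi_eh() and, if the result is nonzero, returns it unchanged. Below is a hypothetical userspace model of the retry loop in zfcp_scsi_eh_abort_handler(). struct model_port and the model_* functions are invented for illustration, and the SUCCESS and FAILED values are assumed. The model also omits two parts of the real handler: the wait for the abort to complete and the "adapter not running" branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUCCESS      0x2002   /* assumed value, see include/scsi/scsi.h */
#define FAILED       0x2003   /* assumed value, see include/scsi/scsi.h */
#define FAST_IO_FAIL 0x2009   /* value added by patch 1/2 */

/* Hypothetical scenario description replacing the real adapter/rport. */
struct model_port {
	int failed_abort_attempts; /* abort requests that cannot be issued */
	bool fast_io_fail_fired;   /* fc_block_scsi_eh() would see the flag */
};

/* Stand-in for fc_block_scsi_eh() after patch 1/2. */
static int model_fc_block_scsi_eh(const struct model_port *p)
{
	return p->fast_io_fail_fired ? FAST_IO_FAIL : 0;
}

/* Sketch of the retry loop in zfcp_scsi_eh_abort_handler() after
 * patch 2/2; *tries reports how many abort attempts were made. */
int model_abort_handler(const struct model_port *p, int *tries)
{
	int retry = 3, ret;

	assert(p != NULL && tries != NULL);
	*tries = 0;
	while (retry--) {
		(*tries)++;
		if (*tries > p->failed_abort_attempts)
			return SUCCESS;   /* abort request issued (simplified) */

		/* zfcp_erp_wait() would run here */
		ret = model_fc_block_scsi_eh(p);
		if (ret)
			return ret;       /* FAST_IO_FAIL: tell the scsi eh to stop */
	}
	return FAILED;                    /* no abort request could be issued */
}
```

In this model, an expired fast_io_fail_tmo makes the handler return FAST_IO_FAIL after the first failed attempt instead of retrying. Passing that code up is what lets the scsi eh stop escalating.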

* [patch 0/2] [PATCH/RESEND] Stop scsi eh when fast_io_fail_tmo fires
@ 2010-04-01 11:48 Christof Schmitt

From: Christof Schmitt @ 2010-04-01 11:48 UTC
To: James Bottomley; +Cc: linux-scsi, Mike Christie, Hannes Reinecke

James,

These are the same two patches I submitted to linux-scsi before. The
initial submission went only to linux-scsi, so here they are again
with the appropriate cc list.

The patches solve the problem of the scsi eh blocking the host until
the dev_loss_tmo fires by allowing the scsi eh to stop as soon as the
fast_io_fail_tmo fires. Hannes and Mike agree with this approach.

The patches apply to both the current scsi-misc and the current
scsi-rc-fixes tree; I am not quite sure which tree would be more
appropriate.

--
Christof

* [patch 1/2] scsi: Allow FC LLD to fast-fail scsi eh
@ 2010-04-01 11:48 Christof Schmitt

From: Christof Schmitt @ 2010-04-01 11:48 UTC
To: James Bottomley
Cc: linux-scsi, Mike Christie, Hannes Reinecke, Christof Schmitt