* [PATCH 0/5] SCSI EH cleanup
@ 2015-12-03 7:17 Hannes Reinecke
From: Hannes Reinecke @ 2015-12-03 7:17 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke
Hi all,
here's a small patchset for cleaning up SCSI EH.
Primary goal is to make asynchronous aborts mandatory; there hasn't
been a single report so far where asynchronous abort won't work, so
the 'no_async_abort' flag has never been used and will be removed
with this patchset.
Additionally there's a cleanup for handling failed EH commands, and
for detecting retries of failed commands.
As usual, comments and reviews are welcome.

Christoph Hellwig (1):
  libsas: allow async aborts

Hannes Reinecke (4):
  scsi: make scsi_eh_scmd_add() always succeed
  scsi: make eh_eflags persistent
  scsi: make asynchronous aborts mandatory
  scsi_error: do not escalate failed EH command

 Documentation/scsi/scsi_eh.txt      |  31 ++++-----
 drivers/scsi/libsas/sas_scsi_host.c |   3 -
 drivers/scsi/scsi_error.c           | 125 ++++++------------------------------
 drivers/scsi/scsi_lib.c             |   4 +-
 drivers/scsi/scsi_priv.h            |   3 +-
 include/scsi/scsi_eh.h              |   1 +
 include/scsi/scsi_host.h            |   5 --
 7 files changed, 35 insertions(+), 137 deletions(-)

--
1.8.5.6

* [PATCH 1/5] libsas: allow async aborts
From: Hannes Reinecke @ 2015-12-03 7:17 UTC
To: Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi

From: Christoph Hellwig <hch@lst.de>

We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason. Allow async aborts.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/libsas/sas_scsi_host.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 519dac4..37a2a84 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
 	struct Scsi_Host *host = cmd->device->host;
 	struct sas_internal *i = to_sas_internal(host->transportt);
 
-	if (current != host->ehandler)
-		return FAILED;
-
 	if (!i->dft->lldd_abort_task)
 		return FAILED;
 
--
1.8.5.6

* Re: [PATCH 1/5] libsas: allow async aborts
From: Johannes Thumshirn @ 2015-12-03 8:17 UTC
To: Hannes Reinecke, Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi

On Thu, 2015-12-03 at 08:17 +0100, Hannes Reinecke wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> We now first try to call ->eh_abort_handler from a work queue, but libsas
> was always failing that for no good reason. Allow async aborts.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/libsas/sas_scsi_host.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c
> b/drivers/scsi/libsas/sas_scsi_host.c
> index 519dac4..37a2a84 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
>  	struct Scsi_Host *host = cmd->device->host;
>  	struct sas_internal *i = to_sas_internal(host->transportt);
>
> -	if (current != host->ehandler)
> -		return FAILED;
> -
>  	if (!i->dft->lldd_abort_task)
>  		return FAILED;
>

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

* [PATCH 2/5] scsi: make scsi_eh_scmd_add() always succeed
From: Hannes Reinecke @ 2015-12-03 7:17 UTC
To: Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke

scsi_eh_scmd_add() currently will only fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition,
chances are we cannot fix things up with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_eh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 39 +++++++++++++--------------------------
 drivers/scsi/scsi_lib.c   |  4 ++--
 drivers/scsi/scsi_priv.h  |  2 +-
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 984ddcb..deb35737 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,13 +162,7 @@ scmd_eh_abort_handler(struct work_struct *work)
 		}
 	}
 
-	if (!scsi_eh_scmd_add(scmd, 0)) {
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_WARNING, scmd,
-				    "terminate aborted command\n"));
-		set_host_byte(scmd, DID_TIME_OUT);
-		scsi_finish_command(scmd);
-	}
+	scsi_eh_scmd_add(scmd, 0);
 }
 
 /**
@@ -224,37 +218,32 @@ scsi_abort_command(struct scsi_cmnd *scmd)
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
  * @eh_flag:	optional SCSI_EH flag.
- *
- * Return value:
- *	0 on failure.
  */
-int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
-	int ret = 0;
 
-	if (!shost->ehandler)
-		return 0;
+	WARN_ON(!shost->ehandler);
 
 	spin_lock_irqsave(shost->host_lock, flags);
+	WARN_ON(shost->shost_state != SHOST_RUNNING &&
+		shost->shost_state != SHOST_CANCEL &&
+		shost->shost_state != SHOST_RECOVERY &&
+		shost->shost_state != SHOST_CANCEL_RECOVERY);
 	if (scsi_host_set_state(shost, SHOST_RECOVERY))
-		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
-			goto out_unlock;
+		scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
 
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	ret = 1;
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
- out_unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	return ret;
 }
 
 /**
@@ -285,13 +274,11 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (!host->hostt->no_async_abort &&
-		    scsi_abort_command(scmd) == SUCCESS)
-			return BLK_EH_NOT_HANDLED;
-
-		set_host_byte(scmd, DID_TIME_OUT);
-		if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
-			rtn = BLK_EH_HANDLED;
+		if (host->hostt->no_async_abort ||
+		    scsi_abort_command(scmd) != SUCCESS) {
+			set_host_byte(scmd, DID_TIME_OUT);
+			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+		}
 	}
 
 	return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index fa6b2c4..2dd7d0a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1647,8 +1647,8 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			if (!scsi_eh_scmd_add(cmd, 0))
-				scsi_finish_command(cmd);
+			scsi_eh_scmd_add(cmd, 0);
+			break;
 	}
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 27b4d0a..8c26823 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -71,7 +71,7 @@ extern enum blk_eh_timer_return scsi_times_out(struct request *req);
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
--
1.8.5.6

* Re: [PATCH 2/5] scsi: make scsi_eh_scmd_add() always succeed 2015-12-03 7:17 ` [PATCH 2/5] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke @ 2015-12-03 9:07 ` Johannes Thumshirn 2015-12-03 16:51 ` Christoph Hellwig 1 sibling, 0 replies; 15+ messages in thread From: Johannes Thumshirn @ 2015-12-03 9:07 UTC (permalink / raw) To: Hannes Reinecke, Martin K. Petersen Cc: Christoph Hellwig, James Bottomley, linux-scsi On Thu, 2015-12-03 at 08:17 +0100, Hannes Reinecke wrote: > scsi_eh_scmd_add() currently only will fail if no > error handler thread is started (which will never be the > case) or if the state machine encounters an illegal transition. > > But if we're encountering an invalid state transition > chances is we cannot fixup things with the error handler. > So better add a WARN_ON for illegal host states and > make scsi_dh_scmd_add() a void function. > > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/scsi/scsi_error.c | 39 +++++++++++++-------------------------- > drivers/scsi/scsi_lib.c | 4 ++-- > drivers/scsi/scsi_priv.h | 2 +- > 3 files changed, 16 insertions(+), 29 deletions(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 984ddcb..deb35737 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -162,13 +162,7 @@ scmd_eh_abort_handler(struct work_struct *work) > } > } > > - if (!scsi_eh_scmd_add(scmd, 0)) { > - SCSI_LOG_ERROR_RECOVERY(3, > - scmd_printk(KERN_WARNING, scmd, > - "terminate aborted command\n")); > - set_host_byte(scmd, DID_TIME_OUT); > - scsi_finish_command(scmd); > - } > + scsi_eh_scmd_add(scmd, 0); > } > > /** > @@ -224,37 +218,32 @@ scsi_abort_command(struct scsi_cmnd *scmd) > * scsi_eh_scmd_add - add scsi cmd to error handling. > * @scmd: scmd to run eh on. > * @eh_flag: optional SCSI_EH flag. > - * > - * Return value: > - * 0 on failure. 
> */ > -int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) > +void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) > { > struct Scsi_Host *shost = scmd->device->host; > unsigned long flags; > - int ret = 0; > > - if (!shost->ehandler) > - return 0; > + WARN_ON(!shost->ehandler); > > spin_lock_irqsave(shost->host_lock, flags); > + WARN_ON(shost->shost_state != SHOST_RUNNING && > + shost->shost_state != SHOST_CANCEL && > + shost->shost_state != SHOST_RECOVERY && > + shost->shost_state != SHOST_CANCEL_RECOVERY); > if (scsi_host_set_state(shost, SHOST_RECOVERY)) > - if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) > - goto out_unlock; > + scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY); > > if (shost->eh_deadline != -1 && !shost->last_reset) > shost->last_reset = jiffies; > > - ret = 1; > if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) > eh_flag &= ~SCSI_EH_CANCEL_CMD; > scmd->eh_eflags |= eh_flag; > list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); > shost->host_failed++; > scsi_eh_wakeup(shost); > - out_unlock: > spin_unlock_irqrestore(shost->host_lock, flags); > - return ret; > } > > /** > @@ -285,13 +274,11 @@ enum blk_eh_timer_return scsi_times_out(struct request > *req) > rtn = host->hostt->eh_timed_out(scmd); > > if (rtn == BLK_EH_NOT_HANDLED) { > - if (!host->hostt->no_async_abort && > - scsi_abort_command(scmd) == SUCCESS) > - return BLK_EH_NOT_HANDLED; > - > - set_host_byte(scmd, DID_TIME_OUT); > - if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)) > - rtn = BLK_EH_HANDLED; > + if (host->hostt->no_async_abort || > + scsi_abort_command(scmd) != SUCCESS) { > + set_host_byte(scmd, DID_TIME_OUT); > + scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD); > + } > } > > return rtn; > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index fa6b2c4..2dd7d0a 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1647,8 +1647,8 @@ static void scsi_softirq_done(struct request *rq) > scsi_queue_insert(cmd, 
SCSI_MLQUEUE_DEVICE_BUSY); > break; > default: > - if (!scsi_eh_scmd_add(cmd, 0)) > - scsi_finish_command(cmd); > + scsi_eh_scmd_add(cmd, 0); > + break; > } > } > > diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h > index 27b4d0a..8c26823 100644 > --- a/drivers/scsi/scsi_priv.h > +++ b/drivers/scsi/scsi_priv.h > @@ -71,7 +71,7 @@ extern enum blk_eh_timer_return scsi_times_out(struct > request *req); > extern int scsi_error_handler(void *host); > extern int scsi_decide_disposition(struct scsi_cmnd *cmd); > extern void scsi_eh_wakeup(struct Scsi_Host *shost); > -extern int scsi_eh_scmd_add(struct scsi_cmnd *, int); > +extern void scsi_eh_scmd_add(struct scsi_cmnd *, int); > void scsi_eh_ready_devs(struct Scsi_Host *shost, > struct list_head *work_q, > struct list_head *done_q); Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] scsi: make scsi_eh_scmd_add() always succeed
From: Christoph Hellwig @ 2015-12-03 16:51 UTC
To: Hannes Reinecke
Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley, linux-scsi

On Thu, Dec 03, 2015 at 08:17:40AM +0100, Hannes Reinecke wrote:
> scsi_eh_scmd_add() currently will only fail if no
> error handler thread is started (which will never be the
> case) or if the state machine encounters an illegal transition.
>
> But if we're encountering an invalid state transition,
> chances are we cannot fix things up with the error handler.
> So better add a WARN_ON for illegal host states and
> make scsi_eh_scmd_add() a void function.

The ehandler part looks trivially correct, but I'm a little worried
about the state transition.

The states that we can't transition from are: SHOST_CREATED, SHOST_DEL
and SHOST_DEL_RECOVERY.

We initialize the state to SHOST_CREATED in scsi_host_alloc and
transition away from it in scsi_add_host_with_dma, so that's a true
"should be impossible" condition.

We transition to SHOST_DEL or SHOST_DEL_RECOVERY in scsi_remove_host,
and the host remains in that state until the final reference is dropped.
Given that we wait for all pending I/O in blk_cleanup_queue, called from
__scsi_remove_device, this should be fine as well.

So:

Reviewed-by: Christoph Hellwig <hch@lst.de>

But preferably with an updated changelog that explains things better.

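
The host-state reasoning in this review can be sketched as a small
userspace model. This is only an illustration, not kernel code: the enum
and both helper names are hypothetical stand-ins for the SHOST_* states,
the accepted set copies the WARN_ON condition from the patch, and the
resulting-state rule is an assumption about how the two
scsi_host_set_state() calls behave.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mirror of the host states named in the thread. */
enum host_state {
	HS_CREATED,
	HS_RUNNING,
	HS_CANCEL,
	HS_DEL,
	HS_RECOVERY,
	HS_CANCEL_RECOVERY,
	HS_DEL_RECOVERY,
};

/* True for exactly the states the patch's WARN_ON treats as legal. */
bool can_enter_recovery(enum host_state s)
{
	switch (s) {
	case HS_RUNNING:
	case HS_CANCEL:
	case HS_RECOVERY:
	case HS_CANCEL_RECOVERY:
		return true;
	case HS_CREATED:	/* host not added yet, no I/O possible */
	case HS_DEL:		/* host removed, pending I/O already drained */
	case HS_DEL_RECOVERY:
		return false;
	}
	return false;
}

/*
 * Assumed outcome of "try RECOVERY, else CANCEL_RECOVERY": a running or
 * recovering host ends up in RECOVERY, a cancelling one in CANCEL_RECOVERY.
 */
enum host_state enter_recovery(enum host_state s)
{
	assert(can_enter_recovery(s));
	if (s == HS_RUNNING || s == HS_RECOVERY)
		return HS_RECOVERY;
	return HS_CANCEL_RECOVERY;
}
```

In this model the three rejected states are the ones argued above to be
unreachable while commands are still in flight, which is why a warning
rather than an error return is considered sufficient.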
* [PATCH 3/5] scsi: make eh_eflags persistent
From: Hannes Reinecke @ 2015-12-03 7:17 UTC
To: Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke

To detect if a failed command has been retried we must not
clear scmd->eh_eflags when EH finishes.
The flag should be persistent throughout the lifetime
of the command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 Documentation/scsi/scsi_eh.txt | 3 ---
 drivers/scsi/scsi_error.c      | 4 ++--
 include/scsi/scsi_eh.h         | 1 +
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 8638f61..745eed5 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -264,7 +264,6 @@ scmd->allowed.
  3. scmd recovered
     ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
 	- shost->host_failed--
-	- clear scmd->eh_eflags
 	- scsi_setup_cmd_retry()
 	- move from local eh_work_q to local eh_done_q
     LOCKING: none
@@ -452,8 +451,6 @@ except for #1 must be implemented by eh_strategy_handler().
 
  - shost->host_failed is zero.
 
- - Each scmd's eh_eflags field is cleared.
-
  - Each scmd is in such a state that scsi_setup_cmd_retry() on the
    scmd doesn't make any difference.
 
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index deb35737..eb0f19f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -182,7 +182,6 @@ scsi_abort_command(struct scsi_cmnd *scmd)
 		/*
 		 * Retry after abort failed, escalate to next level.
 		 */
-		scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "previous abort failed\n"));
@@ -919,6 +918,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 	ses->result = scmd->result;
 	ses->underflow = scmd->underflow;
 	ses->prot_op = scmd->prot_op;
+	ses->eh_eflags = scmd->eh_eflags;
 
 	scmd->prot_op = SCSI_PROT_NORMAL;
 	scmd->eh_eflags = 0;
@@ -982,6 +982,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
 	scmd->result = ses->result;
 	scmd->underflow = ses->underflow;
 	scmd->prot_op = ses->prot_op;
+	scmd->eh_eflags = ses->eh_eflags;
 }
 EXPORT_SYMBOL(scsi_eh_restore_cmnd);
 
@@ -1115,7 +1116,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
 	scmd->device->host->host_failed--;
-	scmd->eh_eflags = 0;
 	list_move_tail(&scmd->eh_entry, done_q);
 }
 EXPORT_SYMBOL(scsi_eh_finish_cmd);
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index dbb8c64..f2f876c 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -30,6 +30,7 @@ extern int scsi_ioctl_reset(struct scsi_device *, int __user *);
 struct scsi_eh_save {
 	/* saved state */
 	int result;
+	int eh_eflags;
 	enum dma_data_direction data_direction;
 	unsigned underflow;
 	unsigned char cmd_len;
--
1.8.5.6

* Re: [PATCH 3/5] scsi: make eh_eflags persistent 2015-12-03 7:17 ` [PATCH 3/5] scsi: make eh_eflags persistent Hannes Reinecke @ 2015-12-03 9:08 ` Johannes Thumshirn 2015-12-03 16:55 ` Christoph Hellwig 1 sibling, 0 replies; 15+ messages in thread From: Johannes Thumshirn @ 2015-12-03 9:08 UTC (permalink / raw) To: Hannes Reinecke, Martin K. Petersen Cc: Christoph Hellwig, James Bottomley, linux-scsi On Thu, 2015-12-03 at 08:17 +0100, Hannes Reinecke wrote: > To detect if a failed command has been retried we must not > clear scmd->eh_eflags when EH finishes. > The flag should be persistent throughout the lifetime > of the command. > > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > Documentation/scsi/scsi_eh.txt | 3 --- > drivers/scsi/scsi_error.c | 4 ++-- > include/scsi/scsi_eh.h | 1 + > 3 files changed, 3 insertions(+), 5 deletions(-) > > diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt > index 8638f61..745eed5 100644 > --- a/Documentation/scsi/scsi_eh.txt > +++ b/Documentation/scsi/scsi_eh.txt > @@ -264,7 +264,6 @@ scmd->allowed. > 3. scmd recovered > ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd > - shost->host_failed-- > - - clear scmd->eh_eflags > - scsi_setup_cmd_retry() > - move from local eh_work_q to local eh_done_q > LOCKING: none > @@ -452,8 +451,6 @@ except for #1 must be implemented by > eh_strategy_handler(). > > - shost->host_failed is zero. > > - - Each scmd's eh_eflags field is cleared. > - > - Each scmd is in such a state that scsi_setup_cmd_retry() on the > scmd doesn't make any difference. > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index deb35737..eb0f19f 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -182,7 +182,6 @@ scsi_abort_command(struct scsi_cmnd *scmd) > /* > * Retry after abort failed, escalate to next level. 
> */ > - scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED; > SCSI_LOG_ERROR_RECOVERY(3, > scmd_printk(KERN_INFO, scmd, > "previous abort failed\n")); > @@ -919,6 +918,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct > scsi_eh_save *ses, > ses->result = scmd->result; > ses->underflow = scmd->underflow; > ses->prot_op = scmd->prot_op; > + ses->eh_eflags = scmd->eh_eflags; > > scmd->prot_op = SCSI_PROT_NORMAL; > scmd->eh_eflags = 0; > @@ -982,6 +982,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct > scsi_eh_save *ses) > scmd->result = ses->result; > scmd->underflow = ses->underflow; > scmd->prot_op = ses->prot_op; > + scmd->eh_eflags = ses->eh_eflags; > } > EXPORT_SYMBOL(scsi_eh_restore_cmnd); > > @@ -1115,7 +1116,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int > rtn) > void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q) > { > scmd->device->host->host_failed--; > - scmd->eh_eflags = 0; > list_move_tail(&scmd->eh_entry, done_q); > } > EXPORT_SYMBOL(scsi_eh_finish_cmd); > diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h > index dbb8c64..f2f876c 100644 > --- a/include/scsi/scsi_eh.h > +++ b/include/scsi/scsi_eh.h > @@ -30,6 +30,7 @@ extern int scsi_ioctl_reset(struct scsi_device *, int > __user *); > struct scsi_eh_save { > /* saved state */ > int result; > + int eh_eflags; > enum dma_data_direction data_direction; > unsigned underflow; > unsigned char cmd_len; Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/5] scsi: make eh_eflags persistent
From: Christoph Hellwig @ 2015-12-03 16:55 UTC
To: Hannes Reinecke
Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley, linux-scsi

On Thu, Dec 03, 2015 at 08:17:41AM +0100, Hannes Reinecke wrote:
> To detect if a failed command has been retried we must not
> clear scmd->eh_eflags when EH finishes.
> The flag should be persistent throughout the lifetime
> of the command.

So we save away eh_eflags before potentially reusing the command and
then restore it. Seems fine, but an explanation of what this fixes
would be very helpful, as the behavior seems to have been around
basically forever.

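
The save-and-restore behaviour summarised in this reply can be sketched
in plain C. This is a minimal model under stated assumptions: the two
structs and the helpers eh_prep() and eh_restore() are hypothetical
stand-ins for scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd(), reduced to
the two fields relevant here; only the flag value 0x0002 is taken from
the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as SCSI_EH_ABORT_SCHEDULED in the thread; the name is local. */
#define EH_ABORT_SCHEDULED 0x0002

/* Hypothetical, heavily reduced stand-ins for scsi_cmnd and scsi_eh_save. */
struct cmd_model {
	int result;
	int eh_eflags;
};

struct eh_save_model {
	int result;
	int eh_eflags;
};

/* Save the fields EH is about to overwrite, then clear the flags. */
void eh_prep(struct cmd_model *cmd, struct eh_save_model *ses)
{
	assert(cmd != NULL && ses != NULL);
	ses->result = cmd->result;
	ses->eh_eflags = cmd->eh_eflags;
	cmd->eh_eflags = 0;	/* the internal EH command starts clean */
}

/* Put the saved fields back once the internal EH command is done. */
void eh_restore(struct cmd_model *cmd, const struct eh_save_model *ses)
{
	assert(cmd != NULL && ses != NULL);
	cmd->result = ses->result;
	cmd->eh_eflags = ses->eh_eflags;
}
```

With the flags restored rather than dropped, a later check for
EH_ABORT_SCHEDULED on the same command still sees that an abort was
already attempted, which is the retry detection the changelog refers to.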
* [PATCH 4/5] scsi: make asynchronous aborts mandatory 2015-12-03 7:17 [PATCH 0/5] SCSI EH cleanup Hannes Reinecke ` (2 preceding siblings ...) 2015-12-03 7:17 ` [PATCH 3/5] scsi: make eh_eflags persistent Hannes Reinecke @ 2015-12-03 7:17 ` Hannes Reinecke 2015-12-03 9:13 ` Johannes Thumshirn 2015-12-03 7:17 ` [PATCH 5/5] scsi_error: do not escalate failed EH command Hannes Reinecke 4 siblings, 1 reply; 15+ messages in thread From: Hannes Reinecke @ 2015-12-03 7:17 UTC (permalink / raw) To: Martin K. Petersen Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke There hasn't been any reports for HBAs where asynchronous abort would not work, so we should make it mandatory and remove the fallback. Signed-off-by: Hannes Reinecke <hare@suse.de> --- Documentation/scsi/scsi_eh.txt | 28 +++++++-------- drivers/scsi/scsi_error.c | 81 ++++-------------------------------------- drivers/scsi/scsi_lib.c | 2 +- drivers/scsi/scsi_priv.h | 3 +- include/scsi/scsi_host.h | 5 --- 5 files changed, 22 insertions(+), 97 deletions(-) diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt index 745eed5..6e07245fb 100644 --- a/Documentation/scsi/scsi_eh.txt +++ b/Documentation/scsi/scsi_eh.txt @@ -70,7 +70,7 @@ with the command. scmd is requeued to blk queue. - otherwise - scsi_eh_scmd_add(scmd, 0) is invoked for the command. See + scsi_eh_scmd_add(scmd) is invoked for the command. See [1-3] for details of this function. @@ -103,13 +103,15 @@ function eh_timed_out() callback did not handle the command. Step #2 is taken. - 2. If the host supports asynchronous completion (as indicated by the - no_async_abort setting in the host template) scsi_abort_command() - is invoked to schedule an asynchrous abort. If that fails - Step #3 is taken. + 2. scsi_abort_command() is invoked to schedule an asynchrous abort + (Seee [1-3] for more information). 
+ Asynchronous abort are not invoked for commands which have + SCSI_EH_ABORT_SCHEDULED set (this indicates that the command + already had been aborted once, and this is a retry which failed), + or when the EH deadline is expired. In these case Step #3 is taken. - 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the - command. See [1-3] for more information. + 3. scsi_eh_scmd_add(scmd) is invoked for the + command. See [1-4] for more information. [1-3] Asynchronous command aborts @@ -124,16 +126,13 @@ function scmds enter EH via scsi_eh_scmd_add(), which does the following. - 1. Turns on scmd->eh_eflags as requested. It's 0 for error - completions and SCSI_EH_CANCEL_CMD for timeouts. + 1. Links scmd->eh_entry to shost->eh_cmd_q - 2. Links scmd->eh_entry to shost->eh_cmd_q + 2. Sets SHOST_RECOVERY bit in shost->shost_state - 3. Sets SHOST_RECOVERY bit in shost->shost_state + 3. Increments shost->host_failed - 4. Increments shost->host_failed - - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed As can be seen above, once any scmd is added to shost->eh_cmd_q, SHOST_RECOVERY shost_state bit is turned on. This prevents any new @@ -249,7 +248,6 @@ scmd->allowed. 1. Error completion / time out ACTION: scsi_eh_scmd_add() is invoked for scmd - - set scmd->eh_eflags - add scmd to shost->eh_cmd_q - set SHOST_RECOVERY - shost->host_failed++ diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index eb0f19f..cf47b81 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -162,7 +162,7 @@ scmd_eh_abort_handler(struct work_struct *work) } } - scsi_eh_scmd_add(scmd, 0); + scsi_eh_scmd_add(scmd); } /** @@ -216,9 +216,8 @@ scsi_abort_command(struct scsi_cmnd *scmd) /** * scsi_eh_scmd_add - add scsi cmd to error handling. * @scmd: scmd to run eh on. - * @eh_flag: optional SCSI_EH flag. 
*/ -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) +void scsi_eh_scmd_add(struct scsi_cmnd *scmd) { struct Scsi_Host *shost = scmd->device->host; unsigned long flags; @@ -236,9 +235,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) if (shost->eh_deadline != -1 && !shost->last_reset) shost->last_reset = jiffies; - if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) - eh_flag &= ~SCSI_EH_CANCEL_CMD; - scmd->eh_eflags |= eh_flag; list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); shost->host_failed++; scsi_eh_wakeup(shost); @@ -273,10 +269,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) rtn = host->hostt->eh_timed_out(scmd); if (rtn == BLK_EH_NOT_HANDLED) { - if (host->hostt->no_async_abort || - scsi_abort_command(scmd) != SUCCESS) { + if (scsi_abort_command(scmd) != SUCCESS) { set_host_byte(scmd, DID_TIME_OUT); - scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD); + scsi_eh_scmd_add(scmd); } } @@ -329,7 +324,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost, list_for_each_entry(scmd, work_q, eh_entry) { if (scmd->device == sdev) { ++total_failures; - if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) + if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ++cmd_cancel; else ++cmd_failed; @@ -1152,8 +1147,7 @@ int scsi_eh_get_sense(struct list_head *work_q, * should not get sense. */ list_for_each_entry_safe(scmd, next, work_q, eh_entry) { - if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || - (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || + if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || SCSI_SENSE_VALID(scmd)) continue; @@ -1293,61 +1287,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list, return list_empty(work_q); } - -/** - * scsi_eh_abort_cmds - abort pending commands. - * @work_q: &list_head for pending commands. - * @done_q: &list_head for processed commands. - * - * Decription: - * Try and see whether or not it makes sense to try and abort the - * running command. 
This only works out to be the case if we have one - * command that has timed out. If the command simply failed, it makes - * no sense to try and abort the command, since as far as the shost - * adapter is concerned, it isn't running. - */ -static int scsi_eh_abort_cmds(struct list_head *work_q, - struct list_head *done_q) -{ - struct scsi_cmnd *scmd, *next; - LIST_HEAD(check_list); - int rtn; - struct Scsi_Host *shost; - - list_for_each_entry_safe(scmd, next, work_q, eh_entry) { - if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD)) - continue; - shost = scmd->device->host; - if (scsi_host_eh_past_deadline(shost)) { - list_splice_init(&check_list, work_q); - SCSI_LOG_ERROR_RECOVERY(3, - scmd_printk(KERN_INFO, scmd, - "%s: skip aborting cmd, past eh deadline\n", - current->comm)); - return list_empty(work_q); - } - SCSI_LOG_ERROR_RECOVERY(3, - scmd_printk(KERN_INFO, scmd, - "%s: aborting cmd\n", current->comm)); - rtn = scsi_try_to_abort_cmd(shost->hostt, scmd); - if (rtn == FAILED) { - SCSI_LOG_ERROR_RECOVERY(3, - scmd_printk(KERN_INFO, scmd, - "%s: aborting cmd failed\n", - current->comm)); - list_splice_init(&check_list, work_q); - return list_empty(work_q); - } - scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD; - if (rtn == FAST_IO_FAIL) - scsi_eh_finish_cmd(scmd, done_q); - else - list_move_tail(&scmd->eh_entry, &check_list); - } - - return scsi_eh_test_devices(&check_list, work_q, done_q, 0); -} - /** * scsi_eh_try_stu - Send START_UNIT to device. * @scmd: &scsi_cmnd to send START_UNIT @@ -1690,11 +1629,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q, sdev_printk(KERN_INFO, scmd->device, "Device offlined - " "not ready after error recovery\n"); scsi_device_set_state(scmd->device, SDEV_OFFLINE); - if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) { - /* - * FIXME: Handle lost cmds. 
-			 */
-		}
 		scsi_eh_finish_cmd(scmd, done_q);
 	}
 	return;
@@ -2138,8 +2072,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
 
 	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
-		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
-			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
+		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (shost->eh_deadline != -1)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2dd7d0a..616e074 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1647,7 +1647,7 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			scsi_eh_scmd_add(cmd, 0);
+			scsi_eh_scmd_add(cmd);
 			break;
 	}
 }
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 8c26823..5534825 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -18,7 +18,6 @@ struct scsi_nl_hdr;
 /*
  * Scsi Error Handler Flags
  */
-#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
 #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
 
 #define SCSI_SENSE_VALID(scmd) \
@@ -71,7 +70,7 @@ extern enum blk_eh_timer_return scsi_times_out(struct request *req);
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index ed52712..ef7fff6 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -444,11 +444,6 @@ struct scsi_host_template {
 	unsigned no_write_same:1;
 
 	/*
-	 * True if asynchronous aborts are not supported
-	 */
-	unsigned no_async_abort:1;
-
-	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
 	unsigned int max_host_blocked;
--
1.8.5.6
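
The timeout path that remains after this patch can be sketched in user space as follows. This is only a toy model, not kernel code: every `toy_*` name and the `TOY_EH_ABORT_SCHEDULED` constant are hypothetical stand-ins, and locking, work queues and the real host state machine are left out. It assumes the behaviour described in the updated scsi_eh.txt text: an asynchronous abort is tried first, and the command goes straight to EH when an abort was already scheduled once or the EH deadline has expired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SCSI_EH_ABORT_SCHEDULED. */
#define TOY_EH_ABORT_SCHEDULED 0x0002

struct toy_host {
	int host_busy;       /* commands currently in flight */
	int host_failed;     /* commands handed over to EH */
	bool in_recovery;    /* models the SHOST_RECOVERY bit */
	bool eh_woken;       /* models waking the EH thread */
	bool past_deadline;  /* models an expired EH deadline */
};

struct toy_cmd {
	struct toy_host *host;
	int eh_eflags;
	bool on_eh_queue;    /* models linkage into shost->eh_cmd_q */
};

/* Models scsi_eh_scmd_add() after the patch: no flag argument. */
static void toy_eh_scmd_add(struct toy_cmd *cmd)
{
	struct toy_host *h = cmd->host;

	cmd->on_eh_queue = true;                 /* 1. link into the EH queue */
	h->in_recovery = true;                   /* 2. set the recovery state */
	h->host_failed++;                        /* 3. count the failure */
	assert(h->host_failed <= h->host_busy);  /* sanity check of the model */
	if (h->host_busy == h->host_failed)      /* 4. wake EH once all are in */
		h->eh_woken = true;
}

/* Models scsi_abort_command(): true if an async abort was scheduled. */
static bool toy_abort_command(struct toy_cmd *cmd)
{
	if (cmd->eh_eflags & TOY_EH_ABORT_SCHEDULED)
		return false;   /* already aborted once; the retry failed */
	if (cmd->host->past_deadline)
		return false;   /* EH deadline expired */
	cmd->eh_eflags |= TOY_EH_ABORT_SCHEDULED;
	return true;
}

/* Models the timeout path: async abort first, EH as the fallback. */
static void toy_times_out(struct toy_cmd *cmd)
{
	if (!toy_abort_command(cmd))
		toy_eh_scmd_add(cmd);
}
```

In this model a first timeout only schedules the abort, and a second timeout of the same command lands it on the EH queue, because the abort flag is still set.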
* Re: [PATCH 4/5] scsi: make asynchronous aborts mandatory
From: Johannes Thumshirn @ 2015-12-03 9:13 UTC
To: Hannes Reinecke, Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi

On Thu, 2015-12-03 at 08:17 +0100, Hannes Reinecke wrote:
> There hasn't been any reports for HBAs where asynchronous abort
> would not work, so we should make it mandatory and remove
> the fallback.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> [...]

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 5/5] scsi_error: do not escalate failed EH command
From: Hannes Reinecke @ 2015-12-03 7:17 UTC
To: Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke

When a command is sent as part of the error handling there
is no point whatsoever to start EH escalation when that
command fails; we are _already_ in the error handler,
and the escalation is about to commence anyway.
So just call 'scsi_try_to_abort_cmd()' to abort outstanding
commands and let the main EH routine handle the rest.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index cf47b81..0159498 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -870,15 +870,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
 	return hostt->eh_abort_handler(scmd);
 }
 
-static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
-{
-	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
-		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
-			if (scsi_try_target_reset(scmd) != SUCCESS)
-				if (scsi_try_bus_reset(scmd) != SUCCESS)
-					scsi_try_host_reset(scmd);
-}
-
 /**
  * scsi_eh_prep_cmnd - Save a scsi command info as part of error recovery
  * @scmd:       SCSI command structure to hijack
@@ -1063,7 +1054,7 @@ retry:
 			break;
 		}
 	} else if (rtn != FAILED) {
-		scsi_abort_eh_cmnd(scmd);
+		scsi_try_to_abort_cmd(shost->hostt, scmd);
 		rtn = FAILED;
 	}
 
--
1.8.5.6
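
To make the behavioural difference concrete, here is a small user-space sketch. It assumes nothing beyond the diff above; the `toy_*` names and the result array are hypothetical. The removed helper walked the whole escalation ladder until one step succeeded, whereas the replacement issues a single abort and leaves any further escalation to the main EH routine.

```c
#include <assert.h>
#include <stddef.h>

enum toy_rtn { TOY_SUCCESS, TOY_FAILED };

/* Ladder order used by the removed scsi_abort_eh_cmnd(). */
enum {
	TOY_ABORT,
	TOY_DEVICE_RESET,
	TOY_TARGET_RESET,
	TOY_BUS_RESET,
	TOY_HOST_RESET,
	TOY_STEPS
};

/*
 * Old behaviour: try each step in turn and stop at the first success.
 * The result of the last step (host reset) is ignored, as in the
 * removed code. Returns the number of steps that were attempted.
 */
static int toy_old_abort_eh_cmnd(const enum toy_rtn results[TOY_STEPS])
{
	int attempted = 0;

	assert(results != NULL);
	for (int i = 0; i < TOY_STEPS; i++) {
		attempted++;
		if (results[i] == TOY_SUCCESS)
			break;
	}
	return attempted;
}

/*
 * New behaviour: only the abort is attempted, whatever its result.
 * Returns the number of steps that were attempted.
 */
static int toy_new_abort_eh_cmnd(const enum toy_rtn results[TOY_STEPS])
{
	assert(results != NULL);
	(void)results[TOY_ABORT];   /* the abort result is not acted upon */
	return 1;
}
```

With every step failing, the old helper would have issued five recovery actions, including a host reset, from inside the error handler; the new code issues one.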
* Re: [PATCH 5/5] scsi_error: do not escalate failed EH command
From: Johannes Thumshirn @ 2015-12-03 9:15 UTC
To: Hannes Reinecke, Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi

On Thu, 2015-12-03 at 08:17 +0100, Hannes Reinecke wrote:
> When a command is sent as part of the error handling there
> is no point whatsoever to start EH escalation when that
> command fails; we are _already_ in the error handler,
> and the escalation is about to commence anyway.
> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> commands and let the main EH routine handle the rest.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> [...]

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
* [PATCH 0/5] SCSI EH cleanup
From: Hannes Reinecke @ 2016-06-20 9:35 UTC
To: Martin K. Petersen
Cc: James Bottomley, Christoph Hellwig, linux-scsi, Hannes Reinecke

Hi all,

here's a patchset to clean up SCSI EH.
The main point is that we should finally drop the no_async_abort
flag; up to now we haven't had any issues with the asynchronous
aborts, and the flag was never used.

As usual, comments and reviews are welcome.

Christoph Hellwig (1):
  libsas: allow async aborts

Hannes Reinecke (4):
  scsi: make scsi_eh_scmd_add() always succeed
  scsi: make eh_eflags persistent
  scsi: make asynchronous aborts mandatory
  scsi: Do not escalate failed EH command

 Documentation/scsi/scsi_eh.txt      |  31 ++++-----
 drivers/scsi/libsas/sas_scsi_host.c |   3 -
 drivers/scsi/scsi_error.c           | 127 +++++-------------------------------
 drivers/scsi/scsi_lib.c             |   4 +-
 drivers/scsi/scsi_priv.h            |   3 +-
 include/scsi/scsi_eh.h              |   1 +
 include/scsi/scsi_host.h            |   5 --
 7 files changed, 35 insertions(+), 139 deletions(-)

--
1.8.5.6
* [PATCH 2/5] scsi: make scsi_eh_scmd_add() always succeed
From: Hannes Reinecke @ 2016-06-20 9:35 UTC
To: Martin K. Petersen
Cc: James Bottomley, Christoph Hellwig, linux-scsi, Hannes Reinecke

scsi_eh_scmd_add() currently will only fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition,
chances are we cannot fix things up with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_eh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 39 +++++++++++++--------------------------
 drivers/scsi/scsi_lib.c   |  4 ++--
 drivers/scsi/scsi_priv.h  |  2 +-
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 984ddcb..deb35737 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,13 +162,7 @@ scmd_eh_abort_handler(struct work_struct *work)
 		}
 	}
 
-	if (!scsi_eh_scmd_add(scmd, 0)) {
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_WARNING, scmd,
-				    "terminate aborted command\n"));
-		set_host_byte(scmd, DID_TIME_OUT);
-		scsi_finish_command(scmd);
-	}
+	scsi_eh_scmd_add(scmd, 0);
 }
 
 /**
@@ -224,37 +218,32 @@ scsi_abort_command(struct scsi_cmnd *scmd)
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
  * @eh_flag:	optional SCSI_EH flag.
- *
- * Return value:
- *	0 on failure.
  */
-int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
-	int ret = 0;
 
-	if (!shost->ehandler)
-		return 0;
+	WARN_ON(!shost->ehandler);
 
 	spin_lock_irqsave(shost->host_lock, flags);
+	WARN_ON(shost->shost_state != SHOST_RUNNING &&
+		shost->shost_state != SHOST_CANCEL &&
+		shost->shost_state != SHOST_RECOVERY &&
+		shost->shost_state != SHOST_CANCEL_RECOVERY);
 	if (scsi_host_set_state(shost, SHOST_RECOVERY))
-		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
-			goto out_unlock;
+		scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
 
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	ret = 1;
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
- out_unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	return ret;
 }
 
 /**
@@ -285,13 +274,11 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (!host->hostt->no_async_abort &&
-		    scsi_abort_command(scmd) == SUCCESS)
-			return BLK_EH_NOT_HANDLED;
-
-		set_host_byte(scmd, DID_TIME_OUT);
-		if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
-			rtn = BLK_EH_HANDLED;
+		if (host->hostt->no_async_abort ||
+		    scsi_abort_command(scmd) != SUCCESS) {
+			set_host_byte(scmd, DID_TIME_OUT);
+			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+		}
 	}
 
 	return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b2e332a..7a8c9ad 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1559,8 +1559,8 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			if (!scsi_eh_scmd_add(cmd, 0))
-				scsi_finish_command(cmd);
+			scsi_eh_scmd_add(cmd, 0);
+			break;
 	}
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 57a4b99..de937ba 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -71,7 +71,7 @@ extern enum blk_eh_timer_return scsi_times_out(struct request *req);
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
--
1.8.5.6
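
The "warn, but always queue" idea in this patch can be modelled in user space roughly as below. All `toy_*` names are hypothetical, the warning is reduced to a counter, and the transition table in `toy_set_state()` is a simplified assumption made for this sketch rather than a copy of scsi_host_set_state().

```c
#include <assert.h>
#include <stddef.h>

enum toy_state {
	TOY_CREATED,
	TOY_RUNNING,
	TOY_CANCEL,
	TOY_DEL,
	TOY_RECOVERY,
	TOY_CANCEL_RECOVERY
};

struct toy_shost {
	enum toy_state state;
	int host_failed;
	int warnings;   /* stands in for WARN_ON() backtraces */
};

/* Returns 0 on success, -1 on a transition the assumed table rejects. */
static int toy_set_state(struct toy_shost *h, enum toy_state next)
{
	if (h->state == next)
		return 0;
	if (next == TOY_RECOVERY && h->state == TOY_RUNNING)
		goto ok;
	if (next == TOY_CANCEL_RECOVERY &&
	    (h->state == TOY_CANCEL || h->state == TOY_RECOVERY))
		goto ok;
	return -1;
ok:
	h->state = next;
	return 0;
}

/* Void like the patched function: it warns, but never refuses the command. */
static void toy_eh_scmd_add(struct toy_shost *h)
{
	assert(h != NULL);
	if (h->state != TOY_RUNNING && h->state != TOY_CANCEL &&
	    h->state != TOY_RECOVERY && h->state != TOY_CANCEL_RECOVERY)
		h->warnings++;
	if (toy_set_state(h, TOY_RECOVERY))
		toy_set_state(h, TOY_CANCEL_RECOVERY);
	h->host_failed++;
}
```

Under the assumed table, the four states the patch checks for are exactly those from which one of the two transitions succeeds or is a no-op. From any other state the model warns and still counts the command as failed.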
* Re: [PATCH 2/5] scsi: make scsi_eh_scmd_add() always succeed
From: Christoph Hellwig @ 2016-06-22 13:28 UTC
To: Hannes Reinecke
Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig, linux-scsi

Agreed, I think trying to handle these sorts of errors isn't going
to be helpful, while the WARN_ON at least gives us a chance to
diagnose the issue if it ever happened.

> +	WARN_ON(!shost->ehandler);
>
>  	spin_lock_irqsave(shost->host_lock, flags);
> +	WARN_ON(shost->shost_state != SHOST_RUNNING &&
> +		shost->shost_state != SHOST_CANCEL &&
> +		shost->shost_state != SHOST_RECOVERY &&
> +		shost->shost_state != SHOST_CANCEL_RECOVERY);

Use WARN_ON_ONCE to avoid repeated backtraces for the same condition.

>  	if (scsi_host_set_state(shost, SHOST_RECOVERY))
> -		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
> -			goto out_unlock;
> +		scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);

No warn_on or early return here?
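
The point of the WARN_ON_ONCE suggestion can be illustrated with a user-space stand-in. This is a sketch only: `TOY_WARN_ON_ONCE`, `toy_warnings` and `toy_eh_add()` are hypothetical names, and unlike the kernel macro this version does not evaluate to the condition's value. Each call site keeps its own flag, so a condition that stays true is reported once instead of on every command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int toy_warnings;   /* number of reports actually emitted */

/* Report a true condition only the first time this call site sees it. */
#define TOY_WARN_ON_ONCE(cond)                                    \
	do {                                                      \
		static bool warned_;                              \
		if ((cond) && !warned_) {                         \
			warned_ = true;                           \
			toy_warnings++;                           \
			fprintf(stderr, "WARNING: %s\n", #cond);  \
		}                                                 \
	} while (0)

struct toy_shost {
	bool has_ehandler;
	int host_failed;
};

/* Every command is still queued; only the reporting is rate-limited. */
static void toy_eh_add(struct toy_shost *h)
{
	assert(h != NULL);
	TOY_WARN_ON_ONCE(!h->has_ehandler);
	h->host_failed++;
}
```

Calling `toy_eh_add()` repeatedly on a host without an error handler bumps `host_failed` each time but emits a single warning.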