* [Patch] scsi_error: should not get sense for timeout IO in scsi error handler
@ 2015-07-31 9:52 jiang.biao2
2015-07-31 13:17 ` Hannes Reinecke
2015-08-27 0:31 ` James Bottomley
0 siblings, 2 replies; 5+ messages in thread
From: jiang.biao2 @ 2015-07-31 9:52 UTC (permalink / raw)
To: linux-scsi, JBottomley
scsi_error: should not get sense for timeout IO in scsi error handler
When an IO timeout occurs, the IO will be aborted in
scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because
of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add().
So when scsi error handler starts, it will get sense for this
timeout IO and the scmd of the IO request will be reused. In that
case, the scmd may be double released when racing with io_done(),
which will result in crash.
SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense.
The bug maybe reproduced when the link between host and disk is
unstable.
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Long Chun <long.chun@zte.com.cn>
Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn>
Reviewed-by: Cai Qu <cai.qu@zte.com.cn>
diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c
--- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800
+++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800
@@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head *
struct Scsi_Host *shost;
int rtn;
+ /*
+ * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO,
+ * should not get sense.
+ */
list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
- SCSI_SENSE_VALID(scmd))
+ (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
+ SCSI_SENSE_VALID(scmd))
continue;
shost = scmd->device->host;
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler 2015-07-31 9:52 [Patch] scsi_error: should not get sense for timeout IO in scsi error handler jiang.biao2 @ 2015-07-31 13:17 ` Hannes Reinecke 2015-08-01 4:39 ` 答复: " jiang.biao2 [not found] ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn> 2015-08-27 0:31 ` James Bottomley 1 sibling, 2 replies; 5+ messages in thread From: Hannes Reinecke @ 2015-07-31 13:17 UTC (permalink / raw) To: jiang.biao2, linux-scsi, JBottomley On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote: > scsi_error: should not get sense for timeout IO in scsi error handler > > When an IO timeout occurs, the IO will be aborted in > scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because > of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add(). > So when scsi error handler starts, it will get sense for this > timeout IO and the scmd of the IO request will be reused. In that > case, the scmd may be double released when racing with io_done(), > which will result in crash. > SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense. > The bug maybe reproduced when the link between host and disk is > unstable. > > Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> > Signed-off-by: Long Chun <long.chun@zte.com.cn> > Reviewed-by: Tan Hu <tan.hu@zte.com.cn> > Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn> > Reviewed-by: Cai Qu <cai.qu@zte.com.cn> > > diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c > --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 > +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 > @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * > struct Scsi_Host *shost; > int rtn; > > + /* > + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, > + * should not get sense. > + */ > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || > - SCSI_SENSE_VALID(scmd)) > + (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || > + SCSI_SENSE_VALID(scmd)) > continue; > > shost = scmd->device->host; > -- _Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_ SCSI_EH_ABORT_SCHEDULED. Not every driver is required to implement and/or support asynchronous command aborts, and those will be setting SCSI_EH_CANCEL_CMD even though they've run into a timeout. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* 答复: Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler 2015-07-31 13:17 ` Hannes Reinecke @ 2015-08-01 4:39 ` jiang.biao2 [not found] ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn> 1 sibling, 0 replies; 5+ messages in thread From: jiang.biao2 @ 2015-08-01 4:39 UTC (permalink / raw) To: Hannes Reinecke, linux-scsi, linux-scsi-owner, JBottomley linux-scsi-owner@vger.kernel.org wrote on 2015/07/31 21:17:33: > Hannes Reinecke <hare@suse.de> > 发件人: linux-scsi-owner@vger.kernel.org > > 2015/07/31 21:17 > > 收件人 > > jiang.biao2@zte.com.cn, linux-scsi@vger.kernel.org, JBottomley@odin.com, > > 抄送 > > 主题 > > Re: [Patch] scsi_error: should not get sense for timeout IO in scsi > error handler > > On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote: > > scsi_error: should not get sense for timeout IO in scsi error handler > > > > When an IO timeout occurs, the IO will be aborted in > > scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because > > of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add(). > > So when scsi error handler starts, it will get sense for this > > timeout IO and the scmd of the IO request will be reused. In that > > case, the scmd may be double released when racing with io_done(), > > which will result in crash. > > SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense. > > The bug maybe reproduced when the link between host and disk is > > unstable. > > > > Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> > > Signed-off-by: Long Chun <long.chun@zte.com.cn> > > Reviewed-by: Tan Hu <tan.hu@zte.com.cn> > > Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn> > > Reviewed-by: Cai Qu <cai.qu@zte.com.cn> > > > > diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c > > --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 > > +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 > > @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * > > struct Scsi_Host *shost; > > int rtn; > > > > + /* > > + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, > > + * should not get sense. > > + */ > > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > > if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || > > - SCSI_SENSE_VALID(scmd)) > > + (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || > > + SCSI_SENSE_VALID(scmd)) > > continue; > > > > shost = scmd->device->host; > > -- > _Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_ > SCSI_EH_ABORT_SCHEDULED. > Not every driver is required to implement and/or support > asynchronous command aborts, and those will be setting > SCSI_EH_CANCEL_CMD even though they've run into a timeout. > That's right, but SCSI_EH_CANCEL_CMD _has_ already been tested in the current code, so there's no need to add in the patch. After patched, both SCSI_EH_CANCEL_CMD _and_ SCSI_EH_ABORT_SCHEDULED are tested here, that'll ensure no getting sense for *timeout io*. Thanks. ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn>]
* Re: 答复: Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler [not found] ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn> @ 2015-08-01 7:37 ` Hannes Reinecke 0 siblings, 0 replies; 5+ messages in thread From: Hannes Reinecke @ 2015-08-01 7:37 UTC (permalink / raw) To: jiang.biao2 Cc: JBottomley, linux-scsi, cai.qu, long.chun, tan.hu, chen.donghai On 08/01/2015 04:26 AM, jiang.biao2@zte.com.cn wrote: > Hannes Reinecke <hare@suse.de> 写于 2015/07/31 21:17:33: > >> Hannes Reinecke <hare@suse.de> >> 2015/07/31 21:17 >> >> 收件人 >> >> jiang.biao2@zte.com.cn, linux-scsi@vger.kernel.org, JBottomley@odin.com, > >> >> 抄送 >> >> 主题 >> >> Re: [Patch] scsi_error: should not get sense for timeout IO in scsi >> error handler >> >> On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote: >>> scsi_error: should not get sense for timeout IO in scsi error handler >>> >>> When an IO timeout occurs, the IO will be aborted in >>> scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because >>> of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add(). >>> So when scsi error handler starts, it will get sense for this >>> timeout IO and the scmd of the IO request will be reused. In that >>> case, the scmd may be double released when racing with io_done(), >>> which will result in crash. >>> SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense. >>> The bug maybe reproduced when the link between host and disk is >>> unstable. >>> >>> Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> >>> Signed-off-by: Long Chun <long.chun@zte.com.cn> >>> Reviewed-by: Tan Hu <tan.hu@zte.com.cn> >>> Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn> >>> Reviewed-by: Cai Qu <cai.qu@zte.com.cn> >>> >>> diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c >>> --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 >>> +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 >>> @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * >>> struct Scsi_Host *shost; >>> int rtn; >>> >>> + /* >>> + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, >>> + * should not get sense. >>> + */ >>> list_for_each_entry_safe(scmd, next, work_q, eh_entry) { >>> if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) || >>> - SCSI_SENSE_VALID(scmd)) >>> + (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) || >>> + SCSI_SENSE_VALID(scmd)) >>> continue; >>> >>> shost = scmd->device->host; >>> -- >> _Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_ >> SCSI_EH_ABORT_SCHEDULED. >> Not every driver is required to implement and/or support >> asynchronous command aborts, and those will be setting >> SCSI_EH_CANCEL_CMD even though they've run into a timeout. > That's right, but SCSI_EH_CANCEL_CMD _has_ already been tested > in the current code, so there's no need to add in the patch. > After patched, both SCSI_EH_CANCEL_CMD _and_ > SCSI_EH_ABORT_SCHEDULED are tested here, that'll ensure no > getting sense for timeout io. > Ah, right. I've misread the patch. Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler 2015-07-31 9:52 [Patch] scsi_error: should not get sense for timeout IO in scsi error handler jiang.biao2 2015-07-31 13:17 ` Hannes Reinecke @ 2015-08-27 0:31 ` James Bottomley 1 sibling, 0 replies; 5+ messages in thread From: James Bottomley @ 2015-08-27 0:31 UTC (permalink / raw) To: jiang.biao2; +Cc: linux-scsi On Fri, 2015-07-31 at 17:52 +0800, jiang.biao2@zte.com.cn wrote: > scsi_error: should not get sense for timeout IO in scsi error handler > > When an IO timeout occurs, the IO will be aborted in > scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because > of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add(). > So when scsi error handler starts, it will get sense for this > timeout IO and the scmd of the IO request will be reused. In that > case, the scmd may be double released when racing with io_done(), > which will result in crash. > SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense. > The bug maybe reproduced when the link between host and disk is > unstable. > > Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> > Signed-off-by: Long Chun <long.chun@zte.com.cn> > Reviewed-by: Tan Hu <tan.hu@zte.com.cn> > Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn> > Reviewed-by: Cai Qu <cai.qu@zte.com.cn> > > diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c to apply easily, diffs need to start at the top of the tree, please. > --- scsi/scsi_error.c 2015-07-31 16:03:18.000000000 +0800 > +++ scsi_new/scsi_error.c 2015-07-31 16:29:25.000000000 +0800 > @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head * > struct Scsi_Host *shost; > int rtn; > > + /* > + * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO, > + * should not get sense. > + */ > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { and here all the tabs have been converted to spaces; you need to read Documentation/email-clients.txt for details on how to avoid this. I managed to fix it up this time, but won't again. James ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-08-27 0:31 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-31 9:52 [Patch] scsi_error: should not get sense for timeout IO in scsi error handler jiang.biao2
2015-07-31 13:17 ` Hannes Reinecke
2015-08-01 4:39 ` 答复: " jiang.biao2
[not found] ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn>
2015-08-01 7:37 ` Hannes Reinecke
2015-08-27 0:31 ` James Bottomley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).