[Patch] scsi_error: should not get sense for timeout IO in scsi error handler

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [Patch] scsi_error: should not get sense for timeout IO in scsi error handler
@ 2015-07-31  9:52 jiang.biao2
  2015-07-31 13:17 ` Hannes Reinecke
  2015-08-27  0:31 ` James Bottomley
  0 siblings, 2 replies; 5+ messages in thread
From: jiang.biao2 @ 2015-07-31  9:52 UTC (permalink / raw)
  To: linux-scsi, JBottomley

scsi_error: should not get sense for timeout IO in scsi error handler

When an IO timeout occurs, the IO will be aborted in
scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because
of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add().
So when scsi error handler starts, it will get sense for this
timeout IO and the scmd of the IO request will be reused. In that
case, the scmd may be double released when racing with io_done(),
which will result in crash.
SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense.
The bug maybe reproduced when the link between host and disk is
unstable.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Long Chun <long.chun@zte.com.cn>
Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn>
Reviewed-by: Cai Qu <cai.qu@zte.com.cn>

diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c
--- scsi/scsi_error.c   2015-07-31 16:03:18.000000000 +0800
+++ scsi_new/scsi_error.c       2015-07-31 16:29:25.000000000 +0800
@@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head *
        struct Scsi_Host *shost;
        int rtn;

+       /*
+        * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO,
+        * should not get sense.
+        */
        list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
                if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
-                   SCSI_SENSE_VALID(scmd))
+                   (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
+                    SCSI_SENSE_VALID(scmd))
                        continue;

                shost = scmd->device->host;

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler
  2015-07-31  9:52 [Patch] scsi_error: should not get sense for timeout IO in scsi error handler jiang.biao2
@ 2015-07-31 13:17 ` Hannes Reinecke
  2015-08-01  4:39   ` 答复: " jiang.biao2
       [not found]   ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn>
  2015-08-27  0:31 ` James Bottomley
  1 sibling, 2 replies; 5+ messages in thread
From: Hannes Reinecke @ 2015-07-31 13:17 UTC (permalink / raw)
  To: jiang.biao2, linux-scsi, JBottomley

On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote:
> scsi_error: should not get sense for timeout IO in scsi error handler
> 
> When an IO timeout occurs, the IO will be aborted in
> scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because
> of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add().
> So when scsi error handler starts, it will get sense for this
> timeout IO and the scmd of the IO request will be reused. In that
> case, the scmd may be double released when racing with io_done(),
> which will result in crash.
> SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense.
> The bug maybe reproduced when the link between host and disk is
> unstable.
> 
> Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
> Signed-off-by: Long Chun <long.chun@zte.com.cn>
> Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
> Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn>
> Reviewed-by: Cai Qu <cai.qu@zte.com.cn>
> 
> diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c
> --- scsi/scsi_error.c   2015-07-31 16:03:18.000000000 +0800
> +++ scsi_new/scsi_error.c       2015-07-31 16:29:25.000000000 +0800
> @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head *
>         struct Scsi_Host *shost;
>         int rtn;
> 
> +       /*
> +        * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO,
> +        * should not get sense.
> +        */
>         list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>                 if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
> -                   SCSI_SENSE_VALID(scmd))
> +                   (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
> +                    SCSI_SENSE_VALID(scmd))
>                         continue;
> 
>                 shost = scmd->device->host;
> --
_Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_
SCSI_EH_ABORT_SCHEDULED.
Not every driver is required to implement and/or support
asynchronous command aborts, and those will be setting
SCSI_EH_CANCEL_CMD even though they've run into a timeout.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* 答复: Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler
  2015-07-31 13:17 ` Hannes Reinecke
@ 2015-08-01  4:39   ` jiang.biao2
       [not found]   ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn>
  1 sibling, 0 replies; 5+ messages in thread
From: jiang.biao2 @ 2015-08-01  4:39 UTC (permalink / raw)
  To: Hannes Reinecke, linux-scsi, linux-scsi-owner, JBottomley

linux-scsi-owner@vger.kernel.org wrote on 2015/07/31 21:17:33:

> Hannes Reinecke <hare@suse.de> 
> 发件人:  linux-scsi-owner@vger.kernel.org
> 
> 2015/07/31 21:17
> 
> 收件人
> 
> jiang.biao2@zte.com.cn, linux-scsi@vger.kernel.org, JBottomley@odin.com, 

> 
> 抄送
> 
> 主题
> 
> Re: [Patch] scsi_error: should not get sense for timeout IO in scsi 
> error handler
> 
> On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote:
> > scsi_error: should not get sense for timeout IO in scsi error handler
> > 
> > When an IO timeout occurs, the IO will be aborted in
> > scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because
> > of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add().
> > So when scsi error handler starts, it will get sense for this
> > timeout IO and the scmd of the IO request will be reused. In that
> > case, the scmd may be double released when racing with io_done(),
> > which will result in crash.
> > SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense.
> > The bug maybe reproduced when the link between host and disk is
> > unstable.
> > 
> > Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
> > Signed-off-by: Long Chun <long.chun@zte.com.cn>
> > Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
> > Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn>
> > Reviewed-by: Cai Qu <cai.qu@zte.com.cn>
> > 
> > diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c
> > --- scsi/scsi_error.c   2015-07-31 16:03:18.000000000 +0800
> > +++ scsi_new/scsi_error.c       2015-07-31 16:29:25.000000000 +0800
> > @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head *
> >         struct Scsi_Host *shost;
> >         int rtn;
> > 
> > +       /*
> > +        * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO,
> > +        * should not get sense.
> > +        */
> >         list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> >                 if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
> > -                   SCSI_SENSE_VALID(scmd))
> > +                   (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
> > +                    SCSI_SENSE_VALID(scmd))
> >                         continue;
> > 
> >                 shost = scmd->device->host;
> > --
> _Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_
> SCSI_EH_ABORT_SCHEDULED.
> Not every driver is required to implement and/or support
> asynchronous command aborts, and those will be setting
> SCSI_EH_CANCEL_CMD even though they've run into a timeout.
> 
That's right, but SCSI_EH_CANCEL_CMD _has_ already been tested 
in the current code, so there's no need to add in the patch. 
After patched, both SCSI_EH_CANCEL_CMD _and_ 
SCSI_EH_ABORT_SCHEDULED are tested here, that'll ensure no 
getting sense for *timeout io*.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 答复: Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler
       [not found]   ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn>
@ 2015-08-01  7:37     ` Hannes Reinecke
  0 siblings, 0 replies; 5+ messages in thread
From: Hannes Reinecke @ 2015-08-01  7:37 UTC (permalink / raw)
  To: jiang.biao2
  Cc: JBottomley, linux-scsi, cai.qu, long.chun, tan.hu, chen.donghai

On 08/01/2015 04:26 AM, jiang.biao2@zte.com.cn wrote:
> Hannes Reinecke <hare@suse.de> 写于 2015/07/31 21:17:33:
> 
>> Hannes Reinecke <hare@suse.de> 
>> 2015/07/31 21:17
>>
>> 收件人
>>
>> jiang.biao2@zte.com.cn, linux-scsi@vger.kernel.org, JBottomley@odin.com, 
> 
>>
>> 抄送
>>
>> 主题
>>
>> Re: [Patch] scsi_error: should not get sense for timeout IO in scsi 
>> error handler
>>
>> On 07/31/2015 11:52 AM, jiang.biao2@zte.com.cn wrote:
>>> scsi_error: should not get sense for timeout IO in scsi error handler
>>>
>>> When an IO timeout occurs, the IO will be aborted in
>>> scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because
>>> of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add().
>>> So when scsi error handler starts, it will get sense for this
>>> timeout IO and the scmd of the IO request will be reused. In that
>>> case, the scmd may be double released when racing with io_done(),
>>> which will result in crash.
>>> SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense.
>>> The bug maybe reproduced when the link between host and disk is
>>> unstable.
>>>
>>> Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
>>> Signed-off-by: Long Chun <long.chun@zte.com.cn>
>>> Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
>>> Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn>
>>> Reviewed-by: Cai Qu <cai.qu@zte.com.cn>
>>>
>>> diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c
>>> --- scsi/scsi_error.c   2015-07-31 16:03:18.000000000 +0800
>>> +++ scsi_new/scsi_error.c       2015-07-31 16:29:25.000000000 +0800
>>> @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head *
>>>         struct Scsi_Host *shost;
>>>         int rtn;
>>>
>>> +       /*
>>> +        * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO,
>>> +        * should not get sense.
>>> +        */
>>>         list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>>>                 if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
>>> -                   SCSI_SENSE_VALID(scmd))
>>> +                   (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
>>> +                    SCSI_SENSE_VALID(scmd))
>>>                         continue;
>>>
>>>                 shost = scmd->device->host;
>>> --
>> _Actually_ you need to test for both, SCSI_EH_CANCEL_CMD _and_
>> SCSI_EH_ABORT_SCHEDULED.
>> Not every driver is required to implement and/or support
>> asynchronous command aborts, and those will be setting
>> SCSI_EH_CANCEL_CMD even though they've run into a timeout.
> That's right, but SCSI_EH_CANCEL_CMD _has_ already been tested 
> in the current code, so there's no need to add in the patch. 
> After patched, both SCSI_EH_CANCEL_CMD _and_ 
> SCSI_EH_ABORT_SCHEDULED are tested here, that'll ensure no 
> getting sense for timeout io.
> 
Ah, right. I've misread the patch.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Patch] scsi_error: should not get sense for timeout IO in scsi error handler
  2015-07-31  9:52 [Patch] scsi_error: should not get sense for timeout IO in scsi error handler jiang.biao2
  2015-07-31 13:17 ` Hannes Reinecke
@ 2015-08-27  0:31 ` James Bottomley
  1 sibling, 0 replies; 5+ messages in thread
From: James Bottomley @ 2015-08-27  0:31 UTC (permalink / raw)
  To: jiang.biao2; +Cc: linux-scsi

On Fri, 2015-07-31 at 17:52 +0800, jiang.biao2@zte.com.cn wrote:
> scsi_error: should not get sense for timeout IO in scsi error handler
> 
> When an IO timeout occurs, the IO will be aborted in
> scsi_abort_command() and SCSI_EH_ABORT_SCHEDULED will be set. Because
> of that, the SCSI_EH_CANCEL_CMD will be clear in scsi_eh_scmd_add().
> So when scsi error handler starts, it will get sense for this
> timeout IO and the scmd of the IO request will be reused. In that
> case, the scmd may be double released when racing with io_done(),
> which will result in crash.
> SO SCSI_EH_ABORT_SCHEDULED should also be checked when getting sense.
> The bug maybe reproduced when the link between host and disk is
> unstable.
> 
> Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
> Signed-off-by: Long Chun <long.chun@zte.com.cn>
> Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
> Reviewed-by: Chen Donghai <chen.donghai@zte.com.cn>
> Reviewed-by: Cai Qu <cai.qu@zte.com.cn>
> 
> diff -uprN drivers/scsi/scsi_error.c drivers_new/scsi/scsi_error.c

to apply easily, diffs need to start at the top of the tree, please.

> --- scsi/scsi_error.c   2015-07-31 16:03:18.000000000 +0800
> +++ scsi_new/scsi_error.c       2015-07-31 16:29:25.000000000 +0800
> @@ -1156,9 +1156,14 @@ int scsi_eh_get_sense(struct list_head *
>         struct Scsi_Host *shost;
>         int rtn;
> 
> +       /*
> +        * If SCSI_EH_ABORT_SCHEDULED has been set, it is timeout IO,
> +        * should not get sense.
> +        */
>         list_for_each_entry_safe(scmd, next, work_q, eh_entry) {

and here all the tabs have been converted to spaces; you need to read
Documentation/email-clients.txt for details on how to avoid this.

I managed to fix it up this time, but won't again.

James



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-08-27  0:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-31  9:52 [Patch] scsi_error: should not get sense for timeout IO in scsi error handler jiang.biao2
2015-07-31 13:17 ` Hannes Reinecke
2015-08-01  4:39   ` 答复: " jiang.biao2
     [not found]   ` <OF7187F435.4453DB49-ON48257E94.000CC72C-48257E94.000D6550@zte.com.cn>
2015-08-01  7:37     ` Hannes Reinecke
2015-08-27  0:31 ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).