* [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 21:53 UTC
To: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert
Cc: linux-scsi, linux-block, Roland Dreier

When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
Specifically, the bug is not setting result field of scsi_request correctly when
the dispatch of the command has been failed. Since the upper layer code
including the sg_io ioctl expects to receive any error status from result field
of scsi_request, the error is silently ignored and this could cause data
corruptions for some applications. This commit also fixes another bug that the
result field is not initialized when scsi_request is allocated.

Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
---
 block/scsi_ioctl.c      | 1 +
 drivers/scsi/scsi_lib.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 533f4ae..f2d7979 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
 	req->cmd = req->__cmd;
 	req->cmd_len = BLK_MAX_CDB;
 	req->sense_len = 0;
+	req->result = 0;
 }
 EXPORT_SYMBOL(scsi_req_init);

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2018967..af1488d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 			ret = BLK_STS_DEV_RESOURCE;
 		break;
 	default:
+		scsi_req(req)->result = DID_NO_CONNECT << 16;
 		/*
 		 * Make sure to release all allocated ressources when
 		 * we hit an error, as we will never see this command
--
2.7.4
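For readers unfamiliar with the `DID_NO_CONNECT << 16` expression in the patch above, the following is a minimal sketch of how a host code is packed into, and read back from, a 32-bit result word. The helper names `make_host_result` and `host_byte_of` are hypothetical and exist only for illustration; the numeric codes are copied in by hand (0x01 and 0x0e are the Host_status values that sg_write_same reports elsewhere in this thread, 0x07 is assumed from the kernel's SCSI headers).

```c
#include <stdint.h>

/*
 * Host byte codes, copied here for illustration only. 0x01 and 0x0e match
 * the Host_status values printed by sg_write_same in this thread; 0x07 for
 * DID_ERROR is an assumption taken from the kernel's SCSI headers.
 */
#define DID_OK                  0x00
#define DID_NO_CONNECT          0x01
#define DID_ERROR               0x07
#define DID_TRANSPORT_DISRUPTED 0x0e

/* Hypothetical helper: place a host code in bits 16..23 of a result word. */
static inline uint32_t make_host_result(uint32_t host_code)
{
	return host_code << 16;
}

/* Hypothetical helper: recover the host code from a result word. */
static inline uint32_t host_byte_of(uint32_t result)
{
	return (result >> 16) & 0xff;
}
```

With these definitions, `host_byte_of(make_host_result(DID_NO_CONNECT))` yields `DID_NO_CONNECT` again, which is the value a pass-through caller would see as its host status.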
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 21:57 UTC
To: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert
Cc: linux-scsi, linux-block, Roland Dreier

Hello,

These are the test results.

0. Kernel config

   Version: 5.1-rc1
   Boot parameters: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y

1. Normal state:
   (As expected) The command succeeded

   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   $

2. Immediately after bringing down the iSCSI interface at the target:
   (As expected) Failed with DID_TRANSPORT_DISRUPTED after a few seconds

   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   Write same: transport: Host_status=0x0e [DID_TRANSPORT_DISRUPTED] Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]
   Write same(10) command failed

3. Immediately after the DID_TRANSPORT_DISRUPTED error:
   (As expected) Failed with DID_NO_CONNECT after a few seconds

   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   Write same: transport: Host_status=0x01 [DID_NO_CONNECT] Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]
   Write same(10) command failed

4. Issued IO again:
   (As expected) The command failed

   $ sg_write_same --lba=100 --xferlen=512 /dev/sg5
   Write same: pass through os error: No such device or address
   Write same(10) command failed

Thanks,

Jaesoo Lee.
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Bart Van Assche @ 2019-04-09 22:14 UTC
To: Jaesoo Lee, James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert
Cc: linux-scsi, linux-block, Roland Dreier

On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> Specifically, the bug is not setting result field of scsi_request correctly when
> the dispatch of the command has been failed. Since the upper layer code
> including the sg_io ioctl expects to receive any error status from result field
> of scsi_request, the error is silently ignored and this could cause data
> corruptions for some applications. This commit also fixes another bug that the
> result field is not initialized when scsi_request is allocated.
>
> Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> ---
>  block/scsi_ioctl.c      | 1 +
>  drivers/scsi/scsi_lib.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 533f4ae..f2d7979 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
>  	req->cmd = req->__cmd;
>  	req->cmd_len = BLK_MAX_CDB;
>  	req->sense_len = 0;
> +	req->result = 0;
>  }
>  EXPORT_SYMBOL(scsi_req_init);

What makes you think that this assignment is necessary?

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2018967..af1488d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  			ret = BLK_STS_DEV_RESOURCE;
>  		break;
>  	default:
> +		scsi_req(req)->result = DID_NO_CONNECT << 16;
>  		/*
>  		 * Make sure to release all allocated ressources when
>  		 * we hit an error, as we will never see this command

What leads you to the conclusion that (ret != BLK_STS_OK &&
ret != BLK_STS_RESOURCE) means that there is a connectivity issue?

Thanks,

Bart.
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-09 23:29 UTC
To: Bart Van Assche
Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert, linux-scsi, linux-block, Roland Dreier

Let me comment in line.

On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> > Specifically, the bug is not setting result field of scsi_request correctly when
> > the dispatch of the command has been failed. Since the upper layer code
> > including the sg_io ioctl expects to receive any error status from result field
> > of scsi_request, the error is silently ignored and this could cause data
> > corruptions for some applications. This commit also fixes another bug that the
> > result field is not initialized when scsi_request is allocated.
> >
> > Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> > ---
> >  block/scsi_ioctl.c      | 1 +
> >  drivers/scsi/scsi_lib.c | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > index 533f4ae..f2d7979 100644
> > --- a/block/scsi_ioctl.c
> > +++ b/block/scsi_ioctl.c
> > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> >  	req->cmd = req->__cmd;
> >  	req->cmd_len = BLK_MAX_CDB;
> >  	req->sense_len = 0;
> > +	req->result = 0;
> >  }
> >  EXPORT_SYMBOL(scsi_req_init);
>
> What makes you think that this assignment is necessary?
>

Actually, I discovered this before fixing this bug and we might not
see this problem anymore once this bug is fixed.

Previously, since we are not setting scsi_req(req)->result in
scsi_queue_rq, I found that the application could receive another
DID_TRANSPORT_DISRUPTED host_status again if the same 'struct request'
is allocated for the IO.

Please let me know if I need to remove this change.

> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 2018967..af1488d 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> >  			ret = BLK_STS_DEV_RESOURCE;
> >  		break;
> >  	default:
> > +		scsi_req(req)->result = DID_NO_CONNECT << 16;
> >  		/*
> >  		 * Make sure to release all allocated ressources when
> >  		 * we hit an error, as we will never see this command
>
> What leads you to the conclusion that (ret != BLK_STS_OK &&
> ret != BLK_STS_RESOURCE) means that there is a connectivity issue?

I found this is what we are doing for the legacy queue case; I referred to
the scsi_prep_return() and scsi_kill_request() code, where we always
return DID_NO_CONNECT.

However, I think proper return code handling should be something like:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2018967..21e516e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 			ret = BLK_STS_DEV_RESOURCE;
 		break;
 	default:
+		if (unlikely(!scsi_device_online(sdev)))
+			scsi_req(req)->result = DID_NO_CONNECT << 16;
+		else
+			scsi_req(req)->result = DID_ERROR << 16;
 		/*
 		 * Make sure to release all allocated ressources when
 		 * we hit an error, as we will never see this command

>
> Thanks,
>
> Bart.

Thanks,

Jaesoo.
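The stale-result effect described in the message above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not kernel code: `struct fake_scsi_request`, `fail_dispatch` and `reported_host_status` are hypothetical names, and the host codes are copied in by hand from the Host_status values shown in this thread.

```c
#include <stdbool.h>

#define DID_NO_CONNECT          0x01
#define DID_TRANSPORT_DISRUPTED 0x0e

/* Hypothetical stand-in for the one scsi_request field discussed here. */
struct fake_scsi_request {
	int result;
};

/*
 * Model of a dispatch that fails before the command reaches a low-level
 * driver. When set_result_on_failure is false the result field is left
 * untouched, which is the behaviour the patch description complains about.
 */
static void fail_dispatch(struct fake_scsi_request *rq,
			  bool set_result_on_failure)
{
	if (set_result_on_failure)
		rq->result = DID_NO_CONNECT << 16;
}

/* Host status as a pass-through caller would extract it from result. */
static int reported_host_status(const struct fake_scsi_request *rq)
{
	return (rq->result >> 16) & 0xff;
}
```

If a request object that last completed with `DID_TRANSPORT_DISRUPTED << 16` is reused and the dispatch fails without touching `result`, the caller reads the old host status again; once the failure path writes `result`, the caller sees the new code instead.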
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Bart Van Assche @ 2019-04-09 23:44 UTC
To: Jaesoo Lee
Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert, linux-scsi, linux-block, Roland Dreier

On Tue, 2019-04-09 at 16:29 -0700, Jaesoo Lee wrote:
> Let me comment in line.
>
> On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > > When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> > > Specifically, the bug is not setting result field of scsi_request correctly when
> > > the dispatch of the command has been failed. Since the upper layer code
> > > including the sg_io ioctl expects to receive any error status from result field
> > > of scsi_request, the error is silently ignored and this could cause data
> > > corruptions for some applications. This commit also fixes another bug that the
> > > result field is not initialized when scsi_request is allocated.
> > >
> > > Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
> > > ---
> > >  block/scsi_ioctl.c      | 1 +
> > >  drivers/scsi/scsi_lib.c | 1 +
> > >  2 files changed, 2 insertions(+)
> > >
> > > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > > index 533f4ae..f2d7979 100644
> > > --- a/block/scsi_ioctl.c
> > > +++ b/block/scsi_ioctl.c
> > > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> > >  	req->cmd = req->__cmd;
> > >  	req->cmd_len = BLK_MAX_CDB;
> > >  	req->sense_len = 0;
> > > +	req->result = 0;
> > >  }
> > >  EXPORT_SYMBOL(scsi_req_init);
> >
> > What makes you think that this assignment is necessary?
> >
>
> Actually, I discovered this before fixing this bug and we might not
> see this problem anymore once this bug is fixed.
>
> Previously, since we are not setting scsi_req(req)->result in
> scsi_queue_rq, I found that the application could receive another
> DID_TRANSPORT_DISRUPTED host_status again if the same 'struct request'
> is allocated for the IO.
>
> Please let me know if I need to remove this change.

Since SCSI LLDs have to set that result variable anyway if a request
completes successfully, I'd prefer not to add that assignment.

> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index 2018967..af1488d 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > >  			ret = BLK_STS_DEV_RESOURCE;
> > >  		break;
> > >  	default:
> > > +		scsi_req(req)->result = DID_NO_CONNECT << 16;
> > >  		/*
> > >  		 * Make sure to release all allocated ressources when
> > >  		 * we hit an error, as we will never see this command
> >
> > What leads you to the conclusion that (ret != BLK_STS_OK &&
> > ret != BLK_STS_RESOURCE) means that there is a connectivity issue?
>
> I found this is what we are doing for the legacy queue case; I referred to
> the scsi_prep_return() and scsi_kill_request() code, where we always
> return DID_NO_CONNECT.
>
> However, I think proper return code handling should be something like:
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2018967..21e516e 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  			ret = BLK_STS_DEV_RESOURCE;
>  		break;
>  	default:
> +		if (unlikely(!scsi_device_online(sdev)))
> +			scsi_req(req)->result = DID_NO_CONNECT << 16;
> +		else
> +			scsi_req(req)->result = DID_ERROR << 16;
>  		/*
>  		 * Make sure to release all allocated ressources when
>  		 * we hit an error, as we will never see this command

The above looks better to me than the original patch.

Thanks,

Bart.
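The selection logic in the revised hunk that Bart prefers can be isolated as a small pure function. The sketch below is a user-space approximation under stated assumptions: `dispatch_failure_result` is a hypothetical name, the boolean argument stands in for the kernel's `scsi_device_online(sdev)` check, and the host codes are copied in by hand (0x07 for DID_ERROR is assumed from the kernel's SCSI headers).

```c
#include <stdbool.h>

#define DID_NO_CONNECT 0x01
#define DID_ERROR      0x07

/*
 * Hypothetical helper: choose the result word for a command that could not
 * be dispatched. An offline device is reported as a connectivity problem;
 * any other dispatch failure is reported as a generic error.
 */
static int dispatch_failure_result(bool device_online)
{
	if (!device_online)
		return DID_NO_CONNECT << 16;
	return DID_ERROR << 16;
}
```

The point of the split is that DID_NO_CONNECT is reserved for the case where the device is actually offline, which addresses the objection that not every dispatch failure is a connectivity issue.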
* Re: [PATCH] scsi: core: set result when the command cannot be dispatched
From: Jaesoo Lee @ 2019-04-10 0:02 UTC
To: Bart Van Assche
Cc: James E.J. Bottomley, Martin K. Petersen, Jens Axboe, Douglas Gilbert, linux-scsi, linux-block, Roland Dreier

Let me send v2 addressing your comments.

Thanks,

Jaesoo Lee.