* [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion @ 2020-12-02 10:04 Ming Lei 2020-12-02 18:10 ` Bart Van Assche 2020-12-08 4:56 ` Martin K. Petersen 0 siblings, 2 replies; 6+ messages in thread From: Ming Lei @ 2020-12-02 10:04 UTC (permalink / raw) To: linux-scsi, Martin K . Petersen Cc: Ming Lei, Hannes Reinecke, Sumit Saxena, Kashyap Desai, Bart Van Assche, Ewan Milne, Long Li, chenxiang (M), John Garry When queuing IO request to LLD, STS_RESOURCE may be returned because: - host in recovery or blocked - target queue throttling or blocked - LLD rejection Any one of the above doesn't happen frequently enough. BLK_STS_DEV_RESOURCE is returned to block layer for avoiding unnecessary re-run queue, and it is just one small optimization. However, all in-flight requests originated from this scsi device may be completed just after reading 'sdev->device_busy', so BLK_STS_DEV_RESOURCE is returned to block layer. And the current failed IO won't get chance to be queued any more, since it is invisible at that time for either scsi_run_queue_async() or blk-mq's RESTART. Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation. Cc: Hannes Reinecke <hare@suse.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan Milne <emilne@redhat.com> Cc: Long Li <longli@microsoft.com> Tested-by: "chenxiang (M)" <chenxiang66@hisilicon.com> Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/scsi/scsi_lib.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 60c7a7d74852..03c6d0620bfd 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, break; case BLK_STS_RESOURCE: case BLK_STS_ZONE_RESOURCE: - if (atomic_read(&sdev->device_busy) || - scsi_device_blocked(sdev)) + if (scsi_device_blocked(sdev)) ret = BLK_STS_DEV_RESOURCE; break; default: -- 2.28.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion 2020-12-02 10:04 [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion Ming Lei @ 2020-12-02 18:10 ` Bart Van Assche 2020-12-03 1:03 ` Ming Lei 2020-12-08 4:56 ` Martin K. Petersen 1 sibling, 1 reply; 6+ messages in thread From: Bart Van Assche @ 2020-12-02 18:10 UTC (permalink / raw) To: Ming Lei, linux-scsi, Martin K . Petersen Cc: Hannes Reinecke, Sumit Saxena, Kashyap Desai, Ewan Milne, Long Li, chenxiang (M), John Garry On 12/2/20 2:04 AM, Ming Lei wrote: > When queuing IO request to LLD, STS_RESOURCE may be returned because: > > - host in recovery or blocked > - target queue throttling or blocked > - LLD rejection > > Any one of the above doesn't happen frequently enough. > > BLK_STS_DEV_RESOURCE is returned to block layer for avoiding unnecessary > re-run queue, and it is just one small optimization. However, all > in-flight requests originated from this scsi device may be completed > just after reading 'sdev->device_busy', so BLK_STS_DEV_RESOURCE is > returned to block layer. And the current failed IO won't get chance > to be queued any more, since it is invisible at that time for either > scsi_run_queue_async() or blk-mq's RESTART. > > Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation. > > Cc: Hannes Reinecke <hare@suse.com> > Cc: Sumit Saxena <sumit.saxena@broadcom.com> > Cc: Kashyap Desai <kashyap.desai@broadcom.com> > Cc: Bart Van Assche <bvanassche@acm.org> > Cc: Ewan Milne <emilne@redhat.com> > Cc: Long Li <longli@microsoft.com> > Tested-by: "chenxiang (M)" <chenxiang66@hisilicon.com> > Reported-by: John Garry <john.garry@huawei.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > drivers/scsi/scsi_lib.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 60c7a7d74852..03c6d0620bfd 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, > break; > case BLK_STS_RESOURCE: > case BLK_STS_ZONE_RESOURCE: > - if (atomic_read(&sdev->device_busy) || > - scsi_device_blocked(sdev)) > + if (scsi_device_blocked(sdev)) > ret = BLK_STS_DEV_RESOURCE; > break; > default: Since this patch modifies code introduced in commit 86ff7c2a80cd ("blk-mq: introduce BLK_STS_DEV_RESOURCE"), does this patch perhaps needs a Fixes: tag? Thanks, Bart. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion 2020-12-02 18:10 ` Bart Van Assche @ 2020-12-03 1:03 ` Ming Lei 2020-12-07 3:32 ` chenxiang (M) 0 siblings, 1 reply; 6+ messages in thread From: Ming Lei @ 2020-12-03 1:03 UTC (permalink / raw) To: Bart Van Assche Cc: linux-scsi, Martin K . Petersen, Hannes Reinecke, Sumit Saxena, Kashyap Desai, Ewan Milne, Long Li, chenxiang (M), John Garry On Wed, Dec 02, 2020 at 10:10:30AM -0800, Bart Van Assche wrote: > On 12/2/20 2:04 AM, Ming Lei wrote: > > When queuing IO request to LLD, STS_RESOURCE may be returned because: > > > > - host in recovery or blocked > > - target queue throttling or blocked > > - LLD rejection > > > > Any one of the above doesn't happen frequently enough. > > > > BLK_STS_DEV_RESOURCE is returned to block layer for avoiding unnecessary > > re-run queue, and it is just one small optimization. However, all > > in-flight requests originated from this scsi device may be completed > > just after reading 'sdev->device_busy', so BLK_STS_DEV_RESOURCE is > > returned to block layer. And the current failed IO won't get chance > > to be queued any more, since it is invisible at that time for either > > scsi_run_queue_async() or blk-mq's RESTART. > > > > Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation. > > > > Cc: Hannes Reinecke <hare@suse.com> > > Cc: Sumit Saxena <sumit.saxena@broadcom.com> > > Cc: Kashyap Desai <kashyap.desai@broadcom.com> > > Cc: Bart Van Assche <bvanassche@acm.org> > > Cc: Ewan Milne <emilne@redhat.com> > > Cc: Long Li <longli@microsoft.com> > > Tested-by: "chenxiang (M)" <chenxiang66@hisilicon.com> > > Reported-by: John Garry <john.garry@huawei.com> > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > drivers/scsi/scsi_lib.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index 60c7a7d74852..03c6d0620bfd 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, > > break; > > case BLK_STS_RESOURCE: > > case BLK_STS_ZONE_RESOURCE: > > - if (atomic_read(&sdev->device_busy) || > > - scsi_device_blocked(sdev)) > > + if (scsi_device_blocked(sdev)) > > ret = BLK_STS_DEV_RESOURCE; > > break; > > default: > > Since this patch modifies code introduced in commit 86ff7c2a80cd ("blk-mq: > introduce BLK_STS_DEV_RESOURCE"), does this patch perhaps needs a Fixes: > tag? This same race exists before commit 86ff7c2a80cd, so I think the 'Fixes:' tag is misleading. Thanks, Ming ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion 2020-12-03 1:03 ` Ming Lei @ 2020-12-07 3:32 ` chenxiang (M) 2020-12-07 7:51 ` Ming Lei 0 siblings, 1 reply; 6+ messages in thread From: chenxiang (M) @ 2020-12-07 3:32 UTC (permalink / raw) To: Ming Lei, Bart Van Assche Cc: linux-scsi, Martin K . Petersen, Hannes Reinecke, Sumit Saxena, Kashyap Desai, Ewan Milne, Long Li, John Garry 在 2020/12/3 9:03, Ming Lei 写道: > On Wed, Dec 02, 2020 at 10:10:30AM -0800, Bart Van Assche wrote: >> On 12/2/20 2:04 AM, Ming Lei wrote: >>> When queuing IO request to LLD, STS_RESOURCE may be returned because: >>> >>> - host in recovery or blocked >>> - target queue throttling or blocked >>> - LLD rejection >>> >>> Any one of the above doesn't happen frequently enough. >>> >>> BLK_STS_DEV_RESOURCE is returned to block layer for avoiding unnecessary >>> re-run queue, and it is just one small optimization. However, all >>> in-flight requests originated from this scsi device may be completed >>> just after reading 'sdev->device_busy', so BLK_STS_DEV_RESOURCE is >>> returned to block layer. And the current failed IO won't get chance >>> to be queued any more, since it is invisible at that time for either >>> scsi_run_queue_async() or blk-mq's RESTART. >>> >>> Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation. >>> >>> Cc: Hannes Reinecke <hare@suse.com> >>> Cc: Sumit Saxena <sumit.saxena@broadcom.com> >>> Cc: Kashyap Desai <kashyap.desai@broadcom.com> >>> Cc: Bart Van Assche <bvanassche@acm.org> >>> Cc: Ewan Milne <emilne@redhat.com> >>> Cc: Long Li <longli@microsoft.com> >>> Tested-by: "chenxiang (M)" <chenxiang66@hisilicon.com> >>> Reported-by: John Garry <john.garry@huawei.com> >>> Signed-off-by: Ming Lei <ming.lei@redhat.com> >>> --- >>> drivers/scsi/scsi_lib.c | 3 +-- >>> 1 file changed, 1 insertion(+), 2 deletions(-) >>> >>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >>> index 60c7a7d74852..03c6d0620bfd 100644 >>> --- a/drivers/scsi/scsi_lib.c >>> +++ b/drivers/scsi/scsi_lib.c >>> @@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, >>> break; >>> case BLK_STS_RESOURCE: >>> case BLK_STS_ZONE_RESOURCE: >>> - if (atomic_read(&sdev->device_busy) || >>> - scsi_device_blocked(sdev)) >>> + if (scsi_device_blocked(sdev)) >>> ret = BLK_STS_DEV_RESOURCE; >>> break; >>> default: >> Since this patch modifies code introduced in commit 86ff7c2a80cd ("blk-mq: >> introduce BLK_STS_DEV_RESOURCE"), does this patch perhaps needs a Fixes: >> tag? > This same race exists before commit 86ff7c2a80cd, so I think the 'Fixes:' tag > is misleading. When reverted the patch "scsi: core: Only re-run queue in scsi_end_request() if device queue is busy", it also solves the issue. Does the issue is brought by the patch? If so, maybe adding fixes("Fixes: ed5dd6a67d5e ("scsi: core: Only re-run queue in scsi_end_request() if device queue is busy")") is more accuratte. > > > > Thanks, > Ming > > > . > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion 2020-12-07 3:32 ` chenxiang (M) @ 2020-12-07 7:51 ` Ming Lei 0 siblings, 0 replies; 6+ messages in thread From: Ming Lei @ 2020-12-07 7:51 UTC (permalink / raw) To: chenxiang (M) Cc: Bart Van Assche, linux-scsi, Martin K . Petersen, Hannes Reinecke, Sumit Saxena, Kashyap Desai, Ewan Milne, Long Li, John Garry On Mon, Dec 07, 2020 at 11:32:54AM +0800, chenxiang (M) wrote: > > > 在 2020/12/3 9:03, Ming Lei 写道: > > On Wed, Dec 02, 2020 at 10:10:30AM -0800, Bart Van Assche wrote: > > > On 12/2/20 2:04 AM, Ming Lei wrote: > > > > When queuing IO request to LLD, STS_RESOURCE may be returned because: > > > > > > > > - host in recovery or blocked > > > > - target queue throttling or blocked > > > > - LLD rejection > > > > > > > > Any one of the above doesn't happen frequently enough. > > > > > > > > BLK_STS_DEV_RESOURCE is returned to block layer for avoiding unnecessary > > > > re-run queue, and it is just one small optimization. However, all > > > > in-flight requests originated from this scsi device may be completed > > > > just after reading 'sdev->device_busy', so BLK_STS_DEV_RESOURCE is > > > > returned to block layer. And the current failed IO won't get chance > > > > to be queued any more, since it is invisible at that time for either > > > > scsi_run_queue_async() or blk-mq's RESTART. > > > > > > > > Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation. > > > > > > > > Cc: Hannes Reinecke <hare@suse.com> > > > > Cc: Sumit Saxena <sumit.saxena@broadcom.com> > > > > Cc: Kashyap Desai <kashyap.desai@broadcom.com> > > > > Cc: Bart Van Assche <bvanassche@acm.org> > > > > Cc: Ewan Milne <emilne@redhat.com> > > > > Cc: Long Li <longli@microsoft.com> > > > > Tested-by: "chenxiang (M)" <chenxiang66@hisilicon.com> > > > > Reported-by: John Garry <john.garry@huawei.com> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > > > --- > > > > drivers/scsi/scsi_lib.c | 3 +-- > > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > > > index 60c7a7d74852..03c6d0620bfd 100644 > > > > --- a/drivers/scsi/scsi_lib.c > > > > +++ b/drivers/scsi/scsi_lib.c > > > > @@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, > > > > break; > > > > case BLK_STS_RESOURCE: > > > > case BLK_STS_ZONE_RESOURCE: > > > > - if (atomic_read(&sdev->device_busy) || > > > > - scsi_device_blocked(sdev)) > > > > + if (scsi_device_blocked(sdev)) > > > > ret = BLK_STS_DEV_RESOURCE; > > > > break; > > > > default: > > > Since this patch modifies code introduced in commit 86ff7c2a80cd ("blk-mq: > > > introduce BLK_STS_DEV_RESOURCE"), does this patch perhaps needs a Fixes: > > > tag? > > This same race exists before commit 86ff7c2a80cd, so I think the 'Fixes:' tag > > is misleading. > > When reverted the patch "scsi: core: Only re-run queue in scsi_end_request() > if device queue is busy", it also solves the issue. > Does the issue is brought by the patch? If so, maybe adding fixes("Fixes: > ed5dd6a67d5e ("scsi: core: Only re-run queue in scsi_end_request() if device > queue is busy")") is more accuratte. The debugfs log shows that the issue isn't related with restart or out of budget because the single request stays in hctx->dispatch, and the flag of RQF_DONTPREP is set too. The scsi's restart(scsi_run_queue_async) from request completion is just for handling requests in scheduler queue. Thanks, Ming ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion 2020-12-02 10:04 [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion Ming Lei 2020-12-02 18:10 ` Bart Van Assche @ 2020-12-08 4:56 ` Martin K. Petersen 1 sibling, 0 replies; 6+ messages in thread From: Martin K. Petersen @ 2020-12-08 4:56 UTC (permalink / raw) To: Ming Lei, linux-scsi Cc: Martin K . Petersen, Kashyap Desai, Hannes Reinecke, John Garry, Long Li, Sumit Saxena, chenxiang (M), Bart Van Assche, Ewan Milne On Wed, 2 Dec 2020 18:04:19 +0800, Ming Lei wrote: > When queuing IO request to LLD, STS_RESOURCE may be returned because: > > - host in recovery or blocked > - target queue throttling or blocked > - LLD rejection > > Any one of the above doesn't happen frequently enough. > > [...] Applied to 5.10/scsi-fixes, thanks! [1/1] scsi: core: fix race between handling STS_RESOURCE and completion https://git.kernel.org/mkp/scsi/c/673235f91531 -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-12-08 4:57 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-12-02 10:04 [PATCH] scsi: core: fix race between handling STS_RESOURCE and completion Ming Lei 2020-12-02 18:10 ` Bart Van Assche 2020-12-03 1:03 ` Ming Lei 2020-12-07 3:32 ` chenxiang (M) 2020-12-07 7:51 ` Ming Lei 2020-12-08 4:56 ` Martin K. Petersen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox