* [PATCH 0/1] scsi: ufs: core: Fix task management completion timeout race
@ 2021-10-13 15:01 Adrian Hunter
2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter
0 siblings, 1 reply; 8+ messages in thread
From: Adrian Hunter @ 2021-10-13 15:01 UTC (permalink / raw)
To: Martin K . Petersen
Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
Can Guo, Asutosh Das, Bart Van Assche, linux-scsi
Hi
Please consider this for v5.15 fixes.
Adrian Hunter (1):
scsi: ufs: core: Fix task management completion timeout race
drivers/scsi/ufs/ufshcd.c | 1 -
1 file changed, 1 deletion(-)
Regards
Adrian
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race 2021-10-13 15:01 [PATCH 0/1] scsi: ufs: core: Fix task management completion timeout race Adrian Hunter @ 2021-10-13 15:01 ` Adrian Hunter 2021-10-14 4:14 ` Bart Van Assche 2021-10-14 19:18 ` Bart Van Assche 0 siblings, 2 replies; 8+ messages in thread From: Adrian Hunter @ 2021-10-13 15:01 UTC (permalink / raw) To: Martin K . Petersen Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar, Can Guo, Asutosh Das, Bart Van Assche, linux-scsi __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, which races with the completion function ufshcd_tmc_handler() which expects req->end_io_data to have a value. Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already synchronized using hba->tmf_rqs and hba->outstanding_tasks under the host_lock spinlock. It is also not necessary (nor typical) to clear req->end_io_data because the block layer does it before allocating out requests e.g. via blk_get_request(). So fix by not clearing it. Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> --- drivers/scsi/ufs/ufshcd.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 95be7ecdfe10..f34b3994d1aa 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, err = wait_for_completion_io_timeout(&wait, msecs_to_jiffies(TM_CMD_TIMEOUT)); if (!err) { - /* - * Make sure that ufshcd_compl_tm() does not trigger a - * use-after-free. - */ - req->end_io_data = NULL; ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", __func__, tm_function); -- 2.25.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race 2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter @ 2021-10-14 4:14 ` Bart Van Assche 2021-10-14 6:02 ` Adrian Hunter 2021-10-14 19:18 ` Bart Van Assche 1 sibling, 1 reply; 8+ messages in thread From: Bart Van Assche @ 2021-10-14 4:14 UTC (permalink / raw) To: Adrian Hunter, Martin K . Petersen Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar, Can Guo, Asutosh Das, linux-scsi On 10/13/21 08:01, Adrian Hunter wrote: > __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, > which races with the completion function ufshcd_tmc_handler() which > expects req->end_io_data to have a value. > > Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already > synchronized using hba->tmf_rqs and hba->outstanding_tasks under the > host_lock spinlock. > > It is also not necessary (nor typical) to clear req->end_io_data because > the block layer does it before allocating out requests e.g. via > blk_get_request(). > > So fix by not clearing it. > > Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > --- > drivers/scsi/ufs/ufshcd.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 95be7ecdfe10..f34b3994d1aa 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, > err = wait_for_completion_io_timeout(&wait, > msecs_to_jiffies(TM_CMD_TIMEOUT)); > if (!err) { > - /* > - * Make sure that ufshcd_compl_tm() does not trigger a > - * use-after-free. > - */ > - req->end_io_data = NULL; > ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); > dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", > __func__, tm_function); With this patch applied ufshcd_tmc_handler() can trigger a use-after-free of the stack memory used for the 'wait' completion. Wouldn't it be better to keep the code that clears req->end_io_data and to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()? Thanks, Bart. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race 2021-10-14 4:14 ` Bart Van Assche @ 2021-10-14 6:02 ` Adrian Hunter 2021-10-14 16:47 ` Bart Van Assche 0 siblings, 1 reply; 8+ messages in thread From: Adrian Hunter @ 2021-10-14 6:02 UTC (permalink / raw) To: Bart Van Assche, Martin K . Petersen Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar, Can Guo, Asutosh Das, linux-scsi On 14/10/2021 07:14, Bart Van Assche wrote: > On 10/13/21 08:01, Adrian Hunter wrote: >> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, >> which races with the completion function ufshcd_tmc_handler() which >> expects req->end_io_data to have a value. >> >> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already >> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the >> host_lock spinlock. >> >> It is also not necessary (nor typical) to clear req->end_io_data because >> the block layer does it before allocating out requests e.g. via >> blk_get_request(). >> >> So fix by not clearing it. >> >> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> >> --- >> drivers/scsi/ufs/ufshcd.c | 5 ----- >> 1 file changed, 5 deletions(-) >> >> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c >> index 95be7ecdfe10..f34b3994d1aa 100644 >> --- a/drivers/scsi/ufs/ufshcd.c >> +++ b/drivers/scsi/ufs/ufshcd.c >> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, >> err = wait_for_completion_io_timeout(&wait, >> msecs_to_jiffies(TM_CMD_TIMEOUT)); >> if (!err) { >> - /* >> - * Make sure that ufshcd_compl_tm() does not trigger a >> - * use-after-free. >> - */ >> - req->end_io_data = NULL; >> ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); >> dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", >> __func__, tm_function); > > With this patch applied ufshcd_tmc_handler() can trigger a use-after-free of the stack memory used for the 'wait' completion. AFAICT that would only happen because blk_mq_tagset_busy_iter() is not synchronized with respect to blk_put_request(). But ufshcd_tmc_handler() does not use blk_mq_tagset_busy_iter() anymore, so that can't happen. Wouldn't it be better to keep the code that clears req->end_io_data and to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()? If that were needed, it would imply the synchronization was broken i.e. why are we referencing a request that has already been through blk_put_request()? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race 2021-10-14 6:02 ` Adrian Hunter @ 2021-10-14 16:47 ` Bart Van Assche 2021-10-14 18:50 ` Adrian Hunter 0 siblings, 1 reply; 8+ messages in thread From: Bart Van Assche @ 2021-10-14 16:47 UTC (permalink / raw) To: Adrian Hunter, Martin K . Petersen Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar, Can Guo, Asutosh Das, linux-scsi On 10/13/21 11:02 PM, Adrian Hunter wrote: > On 14/10/2021 07:14, Bart Van Assche wrote: >> Wouldn't it be better to keep the code that clears req->end_io_data >> and to change complete(c) into if(c) complete(c) in >> ufshcd_tmc_handler()? > > If that were needed, it would imply the synchronization was broken > i.e. why are we referencing a request that has already been through > blk_put_request()? The scenario I'm worried about is as follows: * __ufshcd_issue_tm_cmd() issues a task management function. * No completion is received before TM_CMD_TIMEOUT has expired (100 ms). * ufshcd_clear_tm_cmd() fails. * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data). Can this happen? I agree that this scenario involves completion of a request that has already been through blk_put_request(). Thanks, Bart. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race 2021-10-14 16:47 ` Bart Van Assche @ 2021-10-14 18:50 ` Adrian Hunter 2021-10-15 5:41 ` Adrian Hunter 0 siblings, 1 reply; 8+ messages in thread From: Adrian Hunter @ 2021-10-14 18:50 UTC (permalink / raw) To: Bart Van Assche, Martin K . Petersen Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar, Can Guo, Asutosh Das, linux-scsi On 14/10/2021 19:47, Bart Van Assche wrote: > On 10/13/21 11:02 PM, Adrian Hunter wrote: >> On 14/10/2021 07:14, Bart Van Assche wrote: >>> Wouldn't it be better to keep the code that clears req->end_io_data >>> and to change complete(c) into if(c) complete(c) in >>> ufshcd_tmc_handler()? >> >> If that were needed, it would imply the synchronization was broken >> i.e. why are we referencing a request that has already been through >> blk_put_request()? > > The scenario I'm worried about is as follows: > * __ufshcd_issue_tm_cmd() issues a task management function. > * No completion is received before TM_CMD_TIMEOUT has expired (100 ms). > * ufshcd_clear_tm_cmd() fails. > * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data). > > Can this happen? No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd() and ufshcd_clear_tm_cmd(). > > I agree that this scenario involves completion of a request that has already been through blk_put_request(). > > Thanks, > > Bart. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race 2021-10-14 18:50 ` Adrian Hunter @ 2021-10-15 5:41 ` Adrian Hunter 0 siblings, 0 replies; 8+ messages in thread From: Adrian Hunter @ 2021-10-15 5:41 UTC (permalink / raw) To: Bart Van Assche, Martin K . Petersen Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar, Can Guo, Asutosh Das, linux-scsi On 14/10/2021 21:50, Adrian Hunter wrote: > On 14/10/2021 19:47, Bart Van Assche wrote: >> On 10/13/21 11:02 PM, Adrian Hunter wrote: >>> On 14/10/2021 07:14, Bart Van Assche wrote: >>>> Wouldn't it be better to keep the code that clears req->end_io_data >>>> and to change complete(c) into if(c) complete(c) in >>>> ufshcd_tmc_handler()? >>> >>> If that were needed, it would imply the synchronization was broken >>> i.e. why are we referencing a request that has already been through >>> blk_put_request()? >> >> The scenario I'm worried about is as follows: >> * __ufshcd_issue_tm_cmd() issues a task management function. >> * No completion is received before TM_CMD_TIMEOUT has expired (100 ms). >> * ufshcd_clear_tm_cmd() fails. >> * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data). >> >> Can this happen? > > No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and > access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd() > and ufshcd_clear_tm_cmd(). Although I just noticed a different issue. In ufshcd_tmc_handler() the task doorbell register needs to be read in conjunction with outstanding_tasks i.e. under the spinlock I will send a V2. > >> >> I agree that this scenario involves completion of a request that has already been through blk_put_request(). >> >> Thanks, >> >> Bart. > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race 2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter 2021-10-14 4:14 ` Bart Van Assche @ 2021-10-14 19:18 ` Bart Van Assche 1 sibling, 0 replies; 8+ messages in thread From: Bart Van Assche @ 2021-10-14 19:18 UTC (permalink / raw) To: Adrian Hunter, Martin K . Petersen Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar, Can Guo, Asutosh Das, linux-scsi On 10/13/21 8:01 AM, Adrian Hunter wrote: > __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out, > which races with the completion function ufshcd_tmc_handler() which > expects req->end_io_data to have a value. > > Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already > synchronized using hba->tmf_rqs and hba->outstanding_tasks under the > host_lock spinlock. > > It is also not necessary (nor typical) to clear req->end_io_data because > the block layer does it before allocating out requests e.g. via > blk_get_request(). > > So fix by not clearing it. > > Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion") > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > --- > drivers/scsi/ufs/ufshcd.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 95be7ecdfe10..f34b3994d1aa 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, > err = wait_for_completion_io_timeout(&wait, > msecs_to_jiffies(TM_CMD_TIMEOUT)); > if (!err) { > - /* > - * Make sure that ufshcd_compl_tm() does not trigger a > - * use-after-free. > - */ > - req->end_io_data = NULL; > ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR); > dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n", > __func__, tm_function); Reviewed-by: Bart Van Assche <bvanassche@acm.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-10-15 5:41 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-10-13 15:01 [PATCH 0/1] scsi: ufs: core: Fix task management completion timeout race Adrian Hunter 2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter 2021-10-14 4:14 ` Bart Van Assche 2021-10-14 6:02 ` Adrian Hunter 2021-10-14 16:47 ` Bart Van Assche 2021-10-14 18:50 ` Adrian Hunter 2021-10-15 5:41 ` Adrian Hunter 2021-10-14 19:18 ` Bart Van Assche
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox