[PATCH 0/1] scsi: ufs: core: Fix task management completion timeout race

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/1] scsi: ufs: core: Fix task management completion timeout race
@ 2021-10-13 15:01 Adrian Hunter
  2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter
  0 siblings, 1 reply; 8+ messages in thread
From: Adrian Hunter @ 2021-10-13 15:01 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Bart Van Assche, linux-scsi

Hi

Please consider this for v5.15 fixes.


Adrian Hunter (1):
      scsi: ufs: core: Fix task management completion timeout race

 drivers/scsi/ufs/ufshcd.c | 1 -
 1 file changed, 1 deletion(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race
  2021-10-13 15:01 [PATCH 0/1] scsi: ufs: core: Fix task management completion timeout race Adrian Hunter
@ 2021-10-13 15:01 ` Adrian Hunter
  2021-10-14  4:14   ` Bart Van Assche
  2021-10-14 19:18   ` Bart Van Assche
  0 siblings, 2 replies; 8+ messages in thread
From: Adrian Hunter @ 2021-10-13 15:01 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, Bart Van Assche, linux-scsi

__ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
which races with the completion function ufshcd_tmc_handler() which
expects req->end_io_data to have a value.

Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
host_lock spinlock.

It is also not necessary (nor typical) to clear req->end_io_data because
the block layer does it before allocating out requests e.g. via
blk_get_request().

So fix by not clearing it.

Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/scsi/ufs/ufshcd.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 95be7ecdfe10..f34b3994d1aa 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	err = wait_for_completion_io_timeout(&wait,
 			msecs_to_jiffies(TM_CMD_TIMEOUT));
 	if (!err) {
-		/*
-		 * Make sure that ufshcd_compl_tm() does not trigger a
-		 * use-after-free.
-		 */
-		req->end_io_data = NULL;
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race
  2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter
@ 2021-10-14  4:14   ` Bart Van Assche
  2021-10-14  6:02     ` Adrian Hunter
  2021-10-14 19:18   ` Bart Van Assche
  1 sibling, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2021-10-14  4:14 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, linux-scsi

On 10/13/21 08:01, Adrian Hunter wrote:
> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
> which races with the completion function ufshcd_tmc_handler() which
> expects req->end_io_data to have a value.
> 
> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
> host_lock spinlock.
> 
> It is also not necessary (nor typical) to clear req->end_io_data because
> the block layer does it before allocating out requests e.g. via
> blk_get_request().
> 
> So fix by not clearing it.
> 
> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>   drivers/scsi/ufs/ufshcd.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 95be7ecdfe10..f34b3994d1aa 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>   	err = wait_for_completion_io_timeout(&wait,
>   			msecs_to_jiffies(TM_CMD_TIMEOUT));
>   	if (!err) {
> -		/*
> -		 * Make sure that ufshcd_compl_tm() does not trigger a
> -		 * use-after-free.
> -		 */
> -		req->end_io_data = NULL;
>   		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>   		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>   				__func__, tm_function);

With this patch applied ufshcd_tmc_handler() can trigger a 
use-after-free of the stack memory used for the 'wait' completion. 
Wouldn't it be better to keep the code that clears req->end_io_data and 
to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race
  2021-10-14  4:14   ` Bart Van Assche
@ 2021-10-14  6:02     ` Adrian Hunter
  2021-10-14 16:47       ` Bart Van Assche
  0 siblings, 1 reply; 8+ messages in thread
From: Adrian Hunter @ 2021-10-14  6:02 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, linux-scsi

On 14/10/2021 07:14, Bart Van Assche wrote:
> On 10/13/21 08:01, Adrian Hunter wrote:
>> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
>> which races with the completion function ufshcd_tmc_handler() which
>> expects req->end_io_data to have a value.
>>
>> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
>> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
>> host_lock spinlock.
>>
>> It is also not necessary (nor typical) to clear req->end_io_data because
>> the block layer does it before allocating out requests e.g. via
>> blk_get_request().
>>
>> So fix by not clearing it.
>>
>> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>   drivers/scsi/ufs/ufshcd.c | 5 -----
>>   1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 95be7ecdfe10..f34b3994d1aa 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>>       err = wait_for_completion_io_timeout(&wait,
>>               msecs_to_jiffies(TM_CMD_TIMEOUT));
>>       if (!err) {
>> -        /*
>> -         * Make sure that ufshcd_compl_tm() does not trigger a
>> -         * use-after-free.
>> -         */
>> -        req->end_io_data = NULL;
>>           ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>>           dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>>                   __func__, tm_function);
> 
> With this patch applied ufshcd_tmc_handler() can trigger a use-after-free of the stack memory used for the 'wait' completion.

AFAICT that would only happen because blk_mq_tagset_busy_iter() is not synchronized with respect to blk_put_request().
But ufshcd_tmc_handler() does not use blk_mq_tagset_busy_iter() anymore, so that can't happen.

Wouldn't it be better to keep the code that clears req->end_io_data and to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()?

If that were needed, it would imply the synchronization was broken i.e. why are we referencing a request that has already been through blk_put_request()?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race
  2021-10-14  6:02     ` Adrian Hunter
@ 2021-10-14 16:47       ` Bart Van Assche
  2021-10-14 18:50         ` Adrian Hunter
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2021-10-14 16:47 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, linux-scsi

On 10/13/21 11:02 PM, Adrian Hunter wrote:
> On 14/10/2021 07:14, Bart Van Assche wrote:
>> Wouldn't it be better to keep the code that clears req->end_io_data
>> and to change complete(c) into if(c) complete(c) in
>> ufshcd_tmc_handler()?
> 
> If that were needed, it would imply the synchronization was broken
> i.e. why are we referencing a request that has already been through
> blk_put_request()?

The scenario I'm worried about is as follows:
* __ufshcd_issue_tm_cmd() issues a task management function.
* No completion is received before TM_CMD_TIMEOUT has expired (100 ms).
* ufshcd_clear_tm_cmd() fails.
* The TMF completes, ufshcd_tmc_handler() is called and that function 
calls complete(req->end_io_data).

Can this happen?

I agree that this scenario involves completion of a request that has 
already been through blk_put_request().

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race
  2021-10-14 16:47       ` Bart Van Assche
@ 2021-10-14 18:50         ` Adrian Hunter
  2021-10-15  5:41           ` Adrian Hunter
  0 siblings, 1 reply; 8+ messages in thread
From: Adrian Hunter @ 2021-10-14 18:50 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, linux-scsi

On 14/10/2021 19:47, Bart Van Assche wrote:
> On 10/13/21 11:02 PM, Adrian Hunter wrote:
>> On 14/10/2021 07:14, Bart Van Assche wrote:
>>> Wouldn't it be better to keep the code that clears req->end_io_data
>>> and to change complete(c) into if(c) complete(c) in
>>> ufshcd_tmc_handler()?
>>
>> If that were needed, it would imply the synchronization was broken
>> i.e. why are we referencing a request that has already been through
>> blk_put_request()?
> 
> The scenario I'm worried about is as follows:
> * __ufshcd_issue_tm_cmd() issues a task management function.
> * No completion is received before TM_CMD_TIMEOUT has expired (100 ms).
> * ufshcd_clear_tm_cmd() fails.
> * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data).
> 
> Can this happen?

No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and
access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd()
and ufshcd_clear_tm_cmd().

> 
> I agree that this scenario involves completion of a request that has already been through blk_put_request().
> 
> Thanks,
> 
> Bart.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race
  2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter
  2021-10-14  4:14   ` Bart Van Assche
@ 2021-10-14 19:18   ` Bart Van Assche
  1 sibling, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2021-10-14 19:18 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, linux-scsi

On 10/13/21 8:01 AM, Adrian Hunter wrote:
> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
> which races with the completion function ufshcd_tmc_handler() which
> expects req->end_io_data to have a value.
> 
> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
> host_lock spinlock.
> 
> It is also not necessary (nor typical) to clear req->end_io_data because
> the block layer does it before allocating out requests e.g. via
> blk_get_request().
> 
> So fix by not clearing it.
> 
> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>   drivers/scsi/ufs/ufshcd.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 95be7ecdfe10..f34b3994d1aa 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>   	err = wait_for_completion_io_timeout(&wait,
>   			msecs_to_jiffies(TM_CMD_TIMEOUT));
>   	if (!err) {
> -		/*
> -		 * Make sure that ufshcd_compl_tm() does not trigger a
> -		 * use-after-free.
> -		 */
> -		req->end_io_data = NULL;
>   		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>   		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>   				__func__, tm_function);

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scsi: ufs: core: Fix task management completion timeout race
  2021-10-14 18:50         ` Adrian Hunter
@ 2021-10-15  5:41           ` Adrian Hunter
  0 siblings, 0 replies; 8+ messages in thread
From: Adrian Hunter @ 2021-10-15  5:41 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: James E . J . Bottomley, Bean Huo, Avri Altman, Alim Akhtar,
	Can Guo, Asutosh Das, linux-scsi

On 14/10/2021 21:50, Adrian Hunter wrote:
> On 14/10/2021 19:47, Bart Van Assche wrote:
>> On 10/13/21 11:02 PM, Adrian Hunter wrote:
>>> On 14/10/2021 07:14, Bart Van Assche wrote:
>>>> Wouldn't it be better to keep the code that clears req->end_io_data
>>>> and to change complete(c) into if(c) complete(c) in
>>>> ufshcd_tmc_handler()?
>>>
>>> If that were needed, it would imply the synchronization was broken
>>> i.e. why are we referencing a request that has already been through
>>> blk_put_request()?
>>
>> The scenario I'm worried about is as follows:
>> * __ufshcd_issue_tm_cmd() issues a task management function.
>> * No completion is received before TM_CMD_TIMEOUT has expired (100 ms).
>> * ufshcd_clear_tm_cmd() fails.
>> * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data).
>>
>> Can this happen?
> 
> No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and
> access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd()
> and ufshcd_clear_tm_cmd().

Although I just noticed a different issue.

In ufshcd_tmc_handler() the task doorbell register needs to be read
in conjunction with outstanding_tasks i.e. under the spinlock

I will send a V2.

> 
>>
>> I agree that this scenario involves completion of a request that has already been through blk_put_request().
>>
>> Thanks,
>>
>> Bart.
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-10-15  5:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-10-13 15:01 [PATCH 0/1] scsi: ufs: core: Fix task management completion timeout race Adrian Hunter
2021-10-13 15:01 ` [PATCH 1/1] " Adrian Hunter
2021-10-14  4:14   ` Bart Van Assche
2021-10-14  6:02     ` Adrian Hunter
2021-10-14 16:47       ` Bart Van Assche
2021-10-14 18:50         ` Adrian Hunter
2021-10-15  5:41           ` Adrian Hunter
2021-10-14 19:18   ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox