* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-13 16:49 [PATCH] ufs: Increase the usable queue depth Bart Van Assche
@ 2021-05-14 4:04 ` Can Guo
2021-05-14 4:19 ` Bart Van Assche
2021-05-14 4:22 ` Can Guo
2021-05-15 3:13 ` Martin K. Petersen
2021-06-29 13:40 ` Can Guo
2 siblings, 2 replies; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:04 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
Hi Bart,
On 2021-05-14 00:49, Bart Van Assche wrote:
> With the current implementation of the UFS driver active_queues is 1
> instead of 0 if all UFS request queues are idle. That causes
> hctx_may_queue() to divide the queue depth by 2 when queueing a request
> and hence reduces the usable queue depth.
This is interesting. When all UFS queues are idle, in hctx_may_queue(),
active_queues reads 1 (users == 1, depth == 32), where is it divided by
2?
static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
                                  struct sbitmap_queue *bt)
{
        unsigned int depth, users;

        ....
                users = atomic_read(&hctx->tags->active_queues);
        }

        if (!users)
                return true;

        /*
         * Allow at least some tags
         */
        depth = max((bt->sb.depth + users - 1) / users, 4U);
        return __blk_mq_active_requests(hctx) < depth;
}
Thanks,
Can Guo.
>
> The shared tag set code in the block layer keeps track of the number of
> active request queues. blk_mq_tag_busy() is called before a request is
> queued onto a hwq and blk_mq_tag_idle() is called some time after the
> hwq became idle. blk_mq_tag_idle() is called from inside
> blk_mq_timeout_work(). Hence, blk_mq_tag_idle() is only called if a
> timer is associated with each request that is submitted to a request
> queue that shares a tag set with another request queue. Hence this
> patch, which adds a blk_mq_start_request() call in
> ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my test
> setup from 16 to 32.
>
> In addition to increasing the usable queue depth, also fix the
> documentation of the 'timeout' parameter in the header above
> ufshcd_exec_dev_cmd().
>
> Cc: Can Guo <cang@codeaurora.org>
> Cc: Alim Akhtar <alim.akhtar@samsung.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag
> conflicts")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/scsi/ufs/ufshcd.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index c96e36aab989..e669243354da 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2838,7 +2838,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
>   * ufshcd_exec_dev_cmd - API for sending device management requests
>   * @hba: UFS hba
>   * @cmd_type: specifies the type (NOP, Query...)
> - * @timeout: time in seconds
> + * @timeout: timeout in milliseconds
>   *
>   * NOTE: Since there is only one available tag for device management commands,
>   * it is expected you hold the hba->dev_cmd.lock mutex.
> @@ -2868,6 +2868,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>  	}
>  	tag = req->tag;
>  	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
> +	/* Set the timeout such that the SCSI error handler is not activated. */
> +	req->timeout = msecs_to_jiffies(2 * timeout);
> +	blk_mq_start_request(req);
>
>  	init_completion(&wait);
>  	lrbp = &hba->lrb[tag];
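For reference, a rough sketch (not verbatim kernel code; it assumes the v5.12-era block layer behavior described in the commit message) of the timeout path, showing why blk_mq_tag_idle() only ever runs when requests have been started with a timer:

static void blk_mq_timeout_work_sketch(struct request_queue *q)
{
        struct blk_mq_hw_ctx *hctx;
        unsigned long next = 0;
        int i;

        /* Scan the started requests for the nearest timeout deadline. */
        blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &next);

        if (next) {
                /* A started request is still pending: re-arm the timer. */
                mod_timer(&q->timeout, next);
        } else {
                /*
                 * No started request is in flight: mark each hwq idle,
                 * which decrements tags->active_queues.
                 */
                queue_for_each_hw_ctx(q, hctx, i)
                        blk_mq_tag_idle(hctx);
        }
}

A device management command that is never passed to blk_mq_start_request() has no timer, so the idle branch above is never reached for its queue and active_queues never drops back to 0.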
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:04 ` Can Guo
@ 2021-05-14 4:19 ` Bart Van Assche
2021-05-14 4:24 ` Can Guo
2021-05-14 4:22 ` Can Guo
1 sibling, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2021-05-14 4:19 UTC (permalink / raw)
To: Can Guo
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 5/13/21 9:04 PM, Can Guo wrote:
> Hi Bart,
>
> On 2021-05-14 00:49, Bart Van Assche wrote:
>> With the current implementation of the UFS driver active_queues is 1
>> instead of 0 if all UFS request queues are idle. That causes
>> hctx_may_queue() to divide the queue depth by 2 when queueing a request
>> and hence reduces the usable queue depth.
>
> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
> active_queues reads 1 (users == 1, depth == 32), where is it divided by 2?
>
> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>                                   struct sbitmap_queue *bt)
> {
>         unsigned int depth, users;
>
>         ....
>                 users = atomic_read(&hctx->tags->active_queues);
>         }
>
>         if (!users)
>                 return true;
>
>         /*
>          * Allow at least some tags
>          */
>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>         return __blk_mq_active_requests(hctx) < depth;
> }
Hi Can,

If no I/O scheduler has been configured, the active_queues counter
is increased from inside blk_get_request() by blk_mq_tag_busy() before
hctx_may_queue() is called. So if active_queues == 1 while the UFS device
is idle, the counter is increased to 2 as soon as a request is submitted
to a request queue other than hba->cmd_queue. This causes the
hctx_may_queue() calls from inside __blk_mq_alloc_request()
and __blk_mq_get_driver_tag() to limit the queue depth to 32 / 2 = 16.

If an I/O scheduler has been configured, __blk_mq_get_driver_tag()
is the first function to call blk_mq_tag_busy() while processing a
request, and its hctx_may_queue() call likewise limits the queue depth
to 32 / 2 = 16.
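For illustration only, a minimal standalone sketch of the fair-share computation quoted above (fair_share() is a hypothetical helper, not a kernel function):

#include <stdio.h>

/* Mirrors the depth computation in hctx_may_queue() for a shared tag set. */
static unsigned int fair_share(unsigned int sb_depth, unsigned int users)
{
        unsigned int depth;

        if (!users)
                return sb_depth;                /* no active queues: full depth */
        depth = (sb_depth + users - 1) / users; /* divide, rounding up */
        return depth > 4U ? depth : 4U;         /* allow at least some tags */
}

int main(void)
{
        printf("%u\n", fair_share(32, 1));      /* 32: one active queue */
        printf("%u\n", fair_share(32, 2));      /* 16: hba->cmd_queue counts too */
        return 0;
}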
Bart.
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:19 ` Bart Van Assche
@ 2021-05-14 4:24 ` Can Guo
2021-05-14 4:47 ` Can Guo
0 siblings, 1 reply; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:24 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 2021-05-14 12:19, Bart Van Assche wrote:
> On 5/13/21 9:04 PM, Can Guo wrote:
>> Hi Bart,
>>
>> On 2021-05-14 00:49, Bart Van Assche wrote:
>>> With the current implementation of the UFS driver active_queues is 1
>>> instead of 0 if all UFS request queues are idle. That causes
>>> hctx_may_queue() to divide the queue depth by 2 when queueing a
>>> request and hence reduces the usable queue depth.
>>
>> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
>> active_queues reads 1 (users == 1, depth == 32), where is it divided
>> by 2?
>>
>> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>>                                   struct sbitmap_queue *bt)
>> {
>>         unsigned int depth, users;
>>
>>         ....
>>                 users = atomic_read(&hctx->tags->active_queues);
>>         }
>>
>>         if (!users)
>>                 return true;
>>
>>         /*
>>          * Allow at least some tags
>>          */
>>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>>         return __blk_mq_active_requests(hctx) < depth;
>> }
>
> Hi Can,
>
> If no I/O scheduler has been configured, the active_queues counter
> is increased from inside blk_get_request() by blk_mq_tag_busy() before
> hctx_may_queue() is called. So if active_queues == 1 while the UFS
> device is idle, the counter is increased to 2 as soon as a request is
> submitted to a request queue other than hba->cmd_queue. This causes the
> hctx_may_queue() calls from inside __blk_mq_alloc_request()
> and __blk_mq_get_driver_tag() to limit the queue depth to 32 / 2 = 16.
>
> If an I/O scheduler has been configured, __blk_mq_get_driver_tag()
> is the first function to call blk_mq_tag_busy() while processing a
> request, and its hctx_may_queue() call likewise limits the queue depth
> to 32 / 2 = 16.
>
> Bart.
Yes, I just figured out what you are saying from the commit message and
gave my reviewed-by tag. Thanks for the explanation and the fix.
Regards,
Can Guo.
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:24 ` Can Guo
@ 2021-05-14 4:47 ` Can Guo
0 siblings, 0 replies; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:47 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 2021-05-14 12:24, Can Guo wrote:
> On 2021-05-14 12:19, Bart Van Assche wrote:
>> On 5/13/21 9:04 PM, Can Guo wrote:
>>> Hi Bart,
>>>
>>> On 2021-05-14 00:49, Bart Van Assche wrote:
>>>> With the current implementation of the UFS driver active_queues is 1
>>>> instead of 0 if all UFS request queues are idle. That causes
>>>> hctx_may_queue() to divide the queue depth by 2 when queueing a
>>>> request and hence reduces the usable queue depth.
>>>
>>> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
>>> active_queues reads 1 (users == 1, depth == 32), where is it divided
>>> by 2?
>>>
>>> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>>>                                   struct sbitmap_queue *bt)
>>> {
>>>         unsigned int depth, users;
>>>
>>>         ....
>>>                 users = atomic_read(&hctx->tags->active_queues);
>>>         }
>>>
>>>         if (!users)
>>>                 return true;
>>>
>>>         /*
>>>          * Allow at least some tags
>>>          */
>>>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>>>         return __blk_mq_active_requests(hctx) < depth;
>>> }
>>
>> Hi Can,
>>
>> If no I/O scheduler has been configured, the active_queues counter
>> is increased from inside blk_get_request() by blk_mq_tag_busy() before
>> hctx_may_queue() is called. So if active_queues == 1 while the UFS
>> device is idle, the counter is increased to 2 as soon as a request is
>> submitted to a request queue other than hba->cmd_queue. This causes the
>> hctx_may_queue() calls from inside __blk_mq_alloc_request()
>> and __blk_mq_get_driver_tag() to limit the queue depth to 32 / 2 = 16.
>>
>> If an I/O scheduler has been configured, __blk_mq_get_driver_tag()
>> is the first function to call blk_mq_tag_busy() while processing a
>> request, and its hctx_may_queue() call likewise limits the queue depth
>> to 32 / 2 = 16.
>>
>> Bart.
>
> Yes, I just figured out what you are saying from the commit message and
> gave my reviewed-by tag. Thanks for the explanation and the fix.
>
> Regards,
> Can Guo.
We definitely need to have this fix present on Android12-5.10,
because performance may be impacted without it...
Thanks,
Can Guo.
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:04 ` Can Guo
2021-05-14 4:19 ` Bart Van Assche
@ 2021-05-14 4:22 ` Can Guo
1 sibling, 0 replies; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:22 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 2021-05-14 12:04, Can Guo wrote:
> Hi Bart,
>
> On 2021-05-14 00:49, Bart Van Assche wrote:
>> With the current implementation of the UFS driver active_queues is 1
>> instead of 0 if all UFS request queues are idle. That causes
>> hctx_may_queue() to divide the queue depth by 2 when queueing a
>> request and hence reduces the usable queue depth.
>
> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
> active_queues reads 1 (users == 1, depth == 32), where is it divided
> by 2?
>
Are you saying that if we queue a new request on one of the UFS request
queues, then, since active_queues is always above 0, we can never use
the full queue depth? If so, then I agree.
Reviewed-by: Can Guo <cang@codeaurora.org>
> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>                                   struct sbitmap_queue *bt)
> {
>         unsigned int depth, users;
>
>         ....
>                 users = atomic_read(&hctx->tags->active_queues);
>         }
>
>         if (!users)
>                 return true;
>
>         /*
>          * Allow at least some tags
>          */
>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>         return __blk_mq_active_requests(hctx) < depth;
> }
>
> Thanks,
> Can Guo.
>
>>
>> The shared tag set code in the block layer keeps track of the number
>> of active request queues. blk_mq_tag_busy() is called before a request
>> is queued onto a hwq and blk_mq_tag_idle() is called some time after
>> the hwq became idle. blk_mq_tag_idle() is called from inside
>> blk_mq_timeout_work(). Hence, blk_mq_tag_idle() is only called if a
>> timer is associated with each request that is submitted to a request
>> queue that shares a tag set with another request queue. Hence this
>> patch, which adds a blk_mq_start_request() call in
>> ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my test
>> setup from 16 to 32.
>>
>> In addition to increasing the usable queue depth, also fix the
>> documentation of the 'timeout' parameter in the header above
>> ufshcd_exec_dev_cmd().
>>
>> Cc: Can Guo <cang@codeaurora.org>
>> Cc: Alim Akhtar <alim.akhtar@samsung.com>
>> Cc: Avri Altman <avri.altman@wdc.com>
>> Cc: Stanley Chu <stanley.chu@mediatek.com>
>> Cc: Bean Huo <beanhuo@micron.com>
>> Cc: Adrian Hunter <adrian.hunter@intel.com>
>> Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag
>> conflicts")
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>> drivers/scsi/ufs/ufshcd.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index c96e36aab989..e669243354da 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -2838,7 +2838,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
>>   * ufshcd_exec_dev_cmd - API for sending device management requests
>>   * @hba: UFS hba
>>   * @cmd_type: specifies the type (NOP, Query...)
>> - * @timeout: time in seconds
>> + * @timeout: timeout in milliseconds
>>   *
>>   * NOTE: Since there is only one available tag for device management commands,
>>   * it is expected you hold the hba->dev_cmd.lock mutex.
>> @@ -2868,6 +2868,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>>  	}
>>  	tag = req->tag;
>>  	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
>> +	/* Set the timeout such that the SCSI error handler is not activated. */
>> +	req->timeout = msecs_to_jiffies(2 * timeout);
>> +	blk_mq_start_request(req);
>>
>>  	init_completion(&wait);
>>  	lrbp = &hba->lrb[tag];
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-13 16:49 [PATCH] ufs: Increase the usable queue depth Bart Van Assche
2021-05-14 4:04 ` Can Guo
@ 2021-05-15 3:13 ` Martin K. Petersen
2021-06-29 13:40 ` Can Guo
2 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2021-05-15 3:13 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, Avri Altman, Alim Akhtar, Adrian Hunter,
Asutosh Das, Jaegeuk Kim, Stanley Chu, Vignesh Raghavendra,
linux-scsi, James E . J . Bottomley, Can Guo, Bean Huo
On Thu, 13 May 2021 09:49:12 -0700, Bart Van Assche wrote:
> With the current implementation of the UFS driver active_queues is 1
> instead of 0 if all UFS request queues are idle. That causes
> hctx_may_queue() to divide the queue depth by 2 when queueing a request
> and hence reduces the usable queue depth.
>
> The shared tag set code in the block layer keeps track of the number of
> active request queues. blk_mq_tag_busy() is called before a request is
> queued onto a hwq and blk_mq_tag_idle() is called some time after the hwq
> became idle. blk_mq_tag_idle() is called from inside blk_mq_timeout_work().
> Hence, blk_mq_tag_idle() is only called if a timer is associated with each
> request that is submitted to a request queue that shares a tag set with
> another request queue. Hence this patch, which adds a blk_mq_start_request()
> call in ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my
> test setup from 16 to 32.
>
> [...]
Applied to 5.13/scsi-fixes, thanks!
[1/1] ufs: Increase the usable queue depth
https://git.kernel.org/mkp/scsi/c/d0b2b70eb12e
--
Martin K. Petersen Oracle Linux Engineering
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-13 16:49 [PATCH] ufs: Increase the usable queue depth Bart Van Assche
2021-05-14 4:04 ` Can Guo
2021-05-15 3:13 ` Martin K. Petersen
@ 2021-06-29 13:40 ` Can Guo
2 siblings, 0 replies; 8+ messages in thread
From: Can Guo @ 2021-06-29 13:40 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
Hi Bart,
On 2021-05-14 00:49, Bart Van Assche wrote:
> With the current implementation of the UFS driver active_queues is 1
> instead of 0 if all UFS request queues are idle. That causes
> hctx_may_queue() to divide the queue depth by 2 when queueing a request
> and hence reduces the usable queue depth.
>
> The shared tag set code in the block layer keeps track of the number of
> active request queues. blk_mq_tag_busy() is called before a request is
> queued onto a hwq and blk_mq_tag_idle() is called some time after the
> hwq became idle. blk_mq_tag_idle() is called from inside
> blk_mq_timeout_work(). Hence, blk_mq_tag_idle() is only called if a
> timer is associated with each request that is submitted to a request
> queue that shares a tag set with another request queue. Hence this
> patch, which adds a blk_mq_start_request() call in
> ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my test
> setup from 16 to 32.
>
> In addition to increasing the usable queue depth, also fix the
> documentation of the 'timeout' parameter in the header above
> ufshcd_exec_dev_cmd().
>
> Cc: Can Guo <cang@codeaurora.org>
> Cc: Alim Akhtar <alim.akhtar@samsung.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag
> conflicts")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/scsi/ufs/ufshcd.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index c96e36aab989..e669243354da 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2838,7 +2838,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
>   * ufshcd_exec_dev_cmd - API for sending device management requests
>   * @hba: UFS hba
>   * @cmd_type: specifies the type (NOP, Query...)
> - * @timeout: time in seconds
> + * @timeout: timeout in milliseconds
>   *
>   * NOTE: Since there is only one available tag for device management commands,
>   * it is expected you hold the hba->dev_cmd.lock mutex.
> @@ -2868,6 +2868,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>  	}
>  	tag = req->tag;
>  	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
> +	/* Set the timeout such that the SCSI error handler is not activated. */
> +	req->timeout = msecs_to_jiffies(2 * timeout);
> +	blk_mq_start_request(req);
>
>  	init_completion(&wait);
>  	lrbp = &hba->lrb[tag];
We found a regression after this change was merged -
schedule
blk_mq_get_tag
__blk_mq_alloc_request
blk_get_request
ufshcd_exec_dev_cmd
ufshcd_query_flag
ufshcd_wb_ctrl
ufshcd_devfreq_scale
ufshcd_devfreq_target
devfreq_set_target
update_devfreq
devfreq_monitor
process_one_work
worker_thread
kthread
ret_from_fork
Since ufshcd_devfreq_scale() blocks SCSI requests,
when ufshcd_wb_ctrl() runs, if it cannot get a free
tag (all tags are taken by normal requests), then
ufshcd_devfreq_scale() gets stuck and the SCSI layer
stays blocked, which leads to an I/O hang. Maybe
consider unblocking SCSI requests before calling
ufshcd_wb_ctrl()?
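A rough sketch of that suggestion (an untested assumption, not a fix from this thread; it presumes ufshcd_clock_scaling_unprepare() is where SCSI requests get unblocked):

/*
 * Hypothetical reordering: toggle Write Booster only after SCSI requests
 * have been unblocked, so the ufshcd_wb_ctrl() -> ufshcd_exec_dev_cmd()
 * path can still obtain a free tag.
 */
static int ufshcd_devfreq_scale_sketch(struct ufs_hba *hba, bool scale_up)
{
        int ret;

        ret = ufshcd_clock_scaling_prepare(hba);  /* blocks SCSI requests */
        if (ret)
                return ret;

        ret = ufshcd_scale_clks(hba, scale_up);   /* gear/clock scaling */

        ufshcd_clock_scaling_unprepare(hba);      /* unblocks SCSI requests */

        if (!ret)
                ufshcd_wb_ctrl(hba, scale_up);    /* a free tag can now be found */

        return ret;
}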
Thanks,
Can Guo.