* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-13 16:49 [PATCH] ufs: Increase the usable queue depth Bart Van Assche
@ 2021-05-14 4:04 ` Can Guo
2021-05-14 4:19 ` Bart Van Assche
2021-05-14 4:22 ` Can Guo
2021-05-15 3:13 ` Martin K. Petersen
2021-06-29 13:40 ` Can Guo
2 siblings, 2 replies; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:04 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
Hi Bart,
On 2021-05-14 00:49, Bart Van Assche wrote:
> With the current implementation of the UFS driver active_queues is 1
> instead of 0 if all UFS request queues are idle. That causes
> hctx_may_queue() to divide the queue depth by 2 when queueing a request
> and hence reduces the usable queue depth.
This is interesting. When all UFS queues are idle, in hctx_may_queue(),
active_queues reads 1 (users == 1, depth == 32), where is it divided by
2?
static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
                                  struct sbitmap_queue *bt)
{
        unsigned int depth, users;

        ....
                users = atomic_read(&hctx->tags->active_queues);
        }

        if (!users)
                return true;

        /*
         * Allow at least some tags
         */
        depth = max((bt->sb.depth + users - 1) / users, 4U);
        return __blk_mq_active_requests(hctx) < depth;
}
Thanks,
Can Guo.
>
> The shared tag set code in the block layer keeps track of the number of
> active request queues. blk_mq_tag_busy() is called before a request is
> queued onto a hwq and blk_mq_tag_idle() is called some time after the
> hwq became idle. blk_mq_tag_idle() is called from inside
> blk_mq_timeout_work(). Hence, blk_mq_tag_idle() is only called if a
> timer is associated with each request that is submitted to a request
> queue that shares a tag set with another request queue. Hence this
> patch, which adds a blk_mq_start_request() call in
> ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my test
> setup from 16 to 32.
>
> In addition to increasing the usable queue depth, also fix the
> documentation of the 'timeout' parameter in the header above
> ufshcd_exec_dev_cmd().
>
> Cc: Can Guo <cang@codeaurora.org>
> Cc: Alim Akhtar <alim.akhtar@samsung.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag
> conflicts")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/scsi/ufs/ufshcd.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index c96e36aab989..e669243354da 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2838,7 +2838,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
>   * ufshcd_exec_dev_cmd - API for sending device management requests
>   * @hba: UFS hba
>   * @cmd_type: specifies the type (NOP, Query...)
> - * @timeout: time in seconds
> + * @timeout: timeout in milliseconds
>   *
>   * NOTE: Since there is only one available tag for device management commands,
>   * it is expected you hold the hba->dev_cmd.lock mutex.
> @@ -2868,6 +2868,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>  	}
>  	tag = req->tag;
>  	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
> +	/* Set the timeout such that the SCSI error handler is not activated. */
> +	req->timeout = msecs_to_jiffies(2 * timeout);
> +	blk_mq_start_request(req);
>
>  	init_completion(&wait);
>  	lrbp = &hba->lrb[tag];
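For reference, a rough sketch (not verbatim kernel code; it assumes the v5.12-era block layer behavior described in the commit message) of the timeout path, showing why blk_mq_tag_idle() only ever runs when requests have been started with a timer:

static void blk_mq_timeout_work_sketch(struct request_queue *q)
{
        struct blk_mq_hw_ctx *hctx;
        unsigned long next = 0;
        int i;

        /* Scan the started requests for the nearest timeout deadline. */
        blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &next);

        if (next) {
                /* A started request is still pending: re-arm the timer. */
                mod_timer(&q->timeout, next);
        } else {
                /*
                 * No started request is in flight: mark each hwq idle,
                 * which decrements tags->active_queues.
                 */
                queue_for_each_hw_ctx(q, hctx, i)
                        blk_mq_tag_idle(hctx);
        }
}

A device management command that is never passed to blk_mq_start_request() has no timer, so the idle branch above is never reached for its queue and active_queues never drops back to 0.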
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:04 ` Can Guo
@ 2021-05-14 4:19 ` Bart Van Assche
2021-05-14 4:24 ` Can Guo
2021-05-14 4:22 ` Can Guo
1 sibling, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2021-05-14 4:19 UTC (permalink / raw)
To: Can Guo
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 5/13/21 9:04 PM, Can Guo wrote:
> Hi Bart,
>
> On 2021-05-14 00:49, Bart Van Assche wrote:
>> With the current implementation of the UFS driver active_queues is 1
>> instead of 0 if all UFS request queues are idle. That causes
>> hctx_may_queue() to divide the queue depth by 2 when queueing a request
>> and hence reduces the usable queue depth.
>
> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
> active_queues reads 1 (users == 1, depth == 32), where is it divided by 2?
>
> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>                                   struct sbitmap_queue *bt)
> {
>         unsigned int depth, users;
>
>         ....
>                 users = atomic_read(&hctx->tags->active_queues);
>         }
>
>         if (!users)
>                 return true;
>
>         /*
>          * Allow at least some tags
>          */
>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>         return __blk_mq_active_requests(hctx) < depth;
> }
Hi Can,

If no I/O scheduler has been configured, the active_queues counter
is increased from inside blk_get_request() by blk_mq_tag_busy() before
hctx_may_queue() is called. So if active_queues == 1 while the UFS device
is idle, the counter is increased to 2 as soon as a request is submitted
to a request queue other than hba->cmd_queue. This causes the
hctx_may_queue() calls from inside __blk_mq_alloc_request()
and __blk_mq_get_driver_tag() to limit the queue depth to 32 / 2 = 16.

If an I/O scheduler has been configured, __blk_mq_get_driver_tag()
is the first function to call blk_mq_tag_busy() while processing a
request, and its hctx_may_queue() call likewise limits the queue depth
to 32 / 2 = 16.
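For illustration only, a minimal standalone sketch of the fair-share computation quoted above (fair_share() is a hypothetical helper, not a kernel function):

#include <stdio.h>

/* Mirrors the depth computation in hctx_may_queue() for a shared tag set. */
static unsigned int fair_share(unsigned int sb_depth, unsigned int users)
{
        unsigned int depth;

        if (!users)
                return sb_depth;                /* no active queues: full depth */
        depth = (sb_depth + users - 1) / users; /* divide, rounding up */
        return depth > 4U ? depth : 4U;         /* allow at least some tags */
}

int main(void)
{
        printf("%u\n", fair_share(32, 1));      /* 32: one active queue */
        printf("%u\n", fair_share(32, 2));      /* 16: hba->cmd_queue counts too */
        return 0;
}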
Bart.
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:19 ` Bart Van Assche
@ 2021-05-14 4:24 ` Can Guo
2021-05-14 4:47 ` Can Guo
0 siblings, 1 reply; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:24 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 2021-05-14 12:19, Bart Van Assche wrote:
> On 5/13/21 9:04 PM, Can Guo wrote:
>> Hi Bart,
>>
>> On 2021-05-14 00:49, Bart Van Assche wrote:
>>> With the current implementation of the UFS driver active_queues is 1
>>> instead of 0 if all UFS request queues are idle. That causes
>>> hctx_may_queue() to divide the queue depth by 2 when queueing a
>>> request and hence reduces the usable queue depth.
>>
>> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
>> active_queues reads 1 (users == 1, depth == 32), where is it divided
>> by 2?
>>
>> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>>                                   struct sbitmap_queue *bt)
>> {
>>         unsigned int depth, users;
>>
>>         ....
>>                 users = atomic_read(&hctx->tags->active_queues);
>>         }
>>
>>         if (!users)
>>                 return true;
>>
>>         /*
>>          * Allow at least some tags
>>          */
>>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>>         return __blk_mq_active_requests(hctx) < depth;
>> }
>
> Hi Can,
>
> If no I/O scheduler has been configured, the active_queues counter
> is increased from inside blk_get_request() by blk_mq_tag_busy() before
> hctx_may_queue() is called. So if active_queues == 1 while the UFS
> device is idle, the counter is increased to 2 as soon as a request is
> submitted to a request queue other than hba->cmd_queue. This causes the
> hctx_may_queue() calls from inside __blk_mq_alloc_request()
> and __blk_mq_get_driver_tag() to limit the queue depth to 32 / 2 = 16.
>
> If an I/O scheduler has been configured, __blk_mq_get_driver_tag()
> is the first function to call blk_mq_tag_busy() while processing a
> request, and its hctx_may_queue() call likewise limits the queue depth
> to 32 / 2 = 16.
>
> Bart.
Yes, I just figured out what you are saying from the commit message and
gave my reviewed-by tag. Thanks for the explanation and the fix.
Regards,
Can Guo.
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:24 ` Can Guo
@ 2021-05-14 4:47 ` Can Guo
0 siblings, 0 replies; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:47 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 2021-05-14 12:24, Can Guo wrote:
> On 2021-05-14 12:19, Bart Van Assche wrote:
>> On 5/13/21 9:04 PM, Can Guo wrote:
>>> Hi Bart,
>>>
>>> On 2021-05-14 00:49, Bart Van Assche wrote:
>>>> With the current implementation of the UFS driver active_queues is 1
>>>> instead of 0 if all UFS request queues are idle. That causes
>>>> hctx_may_queue() to divide the queue depth by 2 when queueing a
>>>> request and hence reduces the usable queue depth.
>>>
>>> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
>>> active_queues reads 1 (users == 1, depth == 32), where is it divided
>>> by 2?
>>>
>>> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>>>                                   struct sbitmap_queue *bt)
>>> {
>>>         unsigned int depth, users;
>>>
>>>         ....
>>>                 users = atomic_read(&hctx->tags->active_queues);
>>>         }
>>>
>>>         if (!users)
>>>                 return true;
>>>
>>>         /*
>>>          * Allow at least some tags
>>>          */
>>>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>>>         return __blk_mq_active_requests(hctx) < depth;
>>> }
>>
>> Hi Can,
>>
>> If no I/O scheduler has been configured, the active_queues counter
>> is increased from inside blk_get_request() by blk_mq_tag_busy() before
>> hctx_may_queue() is called. So if active_queues == 1 while the UFS
>> device is idle, the counter is increased to 2 as soon as a request is
>> submitted to a request queue other than hba->cmd_queue. This causes the
>> hctx_may_queue() calls from inside __blk_mq_alloc_request()
>> and __blk_mq_get_driver_tag() to limit the queue depth to 32 / 2 = 16.
>>
>> If an I/O scheduler has been configured, __blk_mq_get_driver_tag()
>> is the first function to call blk_mq_tag_busy() while processing a
>> request, and its hctx_may_queue() call likewise limits the queue depth
>> to 32 / 2 = 16.
>>
>> Bart.
>
> Yes, I just figured out what you are saying from the commit message and
> gave my reviewed-by tag. Thanks for the explanation and the fix.
>
> Regards,
> Can Guo.
We definitely need to have this fix present on Android12-5.10,
because performance may be impacted without it...
Thanks,
Can Guo.
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-14 4:04 ` Can Guo
2021-05-14 4:19 ` Bart Van Assche
@ 2021-05-14 4:22 ` Can Guo
1 sibling, 0 replies; 8+ messages in thread
From: Can Guo @ 2021-05-14 4:22 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
On 2021-05-14 12:04, Can Guo wrote:
> Hi Bart,
>
> On 2021-05-14 00:49, Bart Van Assche wrote:
>> With the current implementation of the UFS driver active_queues is 1
>> instead of 0 if all UFS request queues are idle. That causes
>> hctx_may_queue() to divide the queue depth by 2 when queueing a
>> request and hence reduces the usable queue depth.
>
> This is interesting. When all UFS queues are idle, in hctx_may_queue(),
> active_queues reads 1 (users == 1, depth == 32), where is it divided
> by 2?
>
Are you saying that if we queue a new request on one of the UFS request
queues, then, since active_queues is always above 0, we can never use
the full queue depth? If so, then I agree.
Reviewed-by: Can Guo <cang@codeaurora.org>
> static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>                                   struct sbitmap_queue *bt)
> {
>         unsigned int depth, users;
>
>         ....
>                 users = atomic_read(&hctx->tags->active_queues);
>         }
>
>         if (!users)
>                 return true;
>
>         /*
>          * Allow at least some tags
>          */
>         depth = max((bt->sb.depth + users - 1) / users, 4U);
>         return __blk_mq_active_requests(hctx) < depth;
> }
>
> Thanks,
> Can Guo.
>
>>
>> The shared tag set code in the block layer keeps track of the number
>> of active request queues. blk_mq_tag_busy() is called before a request
>> is queued onto a hwq and blk_mq_tag_idle() is called some time after
>> the hwq became idle. blk_mq_tag_idle() is called from inside
>> blk_mq_timeout_work(). Hence, blk_mq_tag_idle() is only called if a
>> timer is associated with each request that is submitted to a request
>> queue that shares a tag set with another request queue. Hence this
>> patch, which adds a blk_mq_start_request() call in
>> ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my test
>> setup from 16 to 32.
>>
>> In addition to increasing the usable queue depth, also fix the
>> documentation of the 'timeout' parameter in the header above
>> ufshcd_exec_dev_cmd().
>>
>> Cc: Can Guo <cang@codeaurora.org>
>> Cc: Alim Akhtar <alim.akhtar@samsung.com>
>> Cc: Avri Altman <avri.altman@wdc.com>
>> Cc: Stanley Chu <stanley.chu@mediatek.com>
>> Cc: Bean Huo <beanhuo@micron.com>
>> Cc: Adrian Hunter <adrian.hunter@intel.com>
>> Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag
>> conflicts")
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>> drivers/scsi/ufs/ufshcd.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index c96e36aab989..e669243354da 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -2838,7 +2838,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
>>   * ufshcd_exec_dev_cmd - API for sending device management requests
>>   * @hba: UFS hba
>>   * @cmd_type: specifies the type (NOP, Query...)
>> - * @timeout: time in seconds
>> + * @timeout: timeout in milliseconds
>>   *
>>   * NOTE: Since there is only one available tag for device management commands,
>>   * it is expected you hold the hba->dev_cmd.lock mutex.
>> @@ -2868,6 +2868,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>>  	}
>>  	tag = req->tag;
>>  	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
>> +	/* Set the timeout such that the SCSI error handler is not activated. */
>> +	req->timeout = msecs_to_jiffies(2 * timeout);
>> +	blk_mq_start_request(req);
>>
>>  	init_completion(&wait);
>>  	lrbp = &hba->lrb[tag];
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-13 16:49 [PATCH] ufs: Increase the usable queue depth Bart Van Assche
2021-05-14 4:04 ` Can Guo
@ 2021-05-15 3:13 ` Martin K. Petersen
2021-06-29 13:40 ` Can Guo
2 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2021-05-15 3:13 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, Avri Altman, Alim Akhtar, Adrian Hunter,
Asutosh Das, Jaegeuk Kim, Stanley Chu, Vignesh Raghavendra,
linux-scsi, James E . J . Bottomley, Can Guo, Bean Huo
On Thu, 13 May 2021 09:49:12 -0700, Bart Van Assche wrote:
> With the current implementation of the UFS driver active_queues is 1
> instead of 0 if all UFS request queues are idle. That causes
> hctx_may_queue() to divide the queue depth by 2 when queueing a request
> and hence reduces the usable queue depth.
>
> The shared tag set code in the block layer keeps track of the number of
> active request queues. blk_mq_tag_busy() is called before a request is
> queued onto a hwq and blk_mq_tag_idle() is called some time after the hwq
> became idle. blk_mq_tag_idle() is called from inside blk_mq_timeout_work().
> Hence, blk_mq_tag_idle() is only called if a timer is associated with each
> request that is submitted to a request queue that shares a tag set with
> another request queue. Hence this patch, which adds a blk_mq_start_request()
> call in ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my
> test setup from 16 to 32.
>
> [...]
Applied to 5.13/scsi-fixes, thanks!
[1/1] ufs: Increase the usable queue depth
https://git.kernel.org/mkp/scsi/c/d0b2b70eb12e
--
Martin K. Petersen Oracle Linux Engineering
* Re: [PATCH] ufs: Increase the usable queue depth
2021-05-13 16:49 [PATCH] ufs: Increase the usable queue depth Bart Van Assche
2021-05-14 4:04 ` Can Guo
2021-05-15 3:13 ` Martin K. Petersen
@ 2021-06-29 13:40 ` Can Guo
2 siblings, 0 replies; 8+ messages in thread
From: Can Guo @ 2021-06-29 13:40 UTC (permalink / raw)
To: Bart Van Assche
Cc: Martin K. Petersen, James E. J. Bottomley, Jaegeuk Kim,
Bean Huo, Avri Altman, Asutosh Das, Vignesh Raghavendra,
linux-scsi, Alim Akhtar, Stanley Chu, Adrian Hunter
Hi Bart,
On 2021-05-14 00:49, Bart Van Assche wrote:
> With the current implementation of the UFS driver active_queues is 1
> instead of 0 if all UFS request queues are idle. That causes
> hctx_may_queue() to divide the queue depth by 2 when queueing a request
> and hence reduces the usable queue depth.
>
> The shared tag set code in the block layer keeps track of the number of
> active request queues. blk_mq_tag_busy() is called before a request is
> queued onto a hwq and blk_mq_tag_idle() is called some time after the
> hwq became idle. blk_mq_tag_idle() is called from inside
> blk_mq_timeout_work(). Hence, blk_mq_tag_idle() is only called if a
> timer is associated with each request that is submitted to a request
> queue that shares a tag set with another request queue. Hence this
> patch, which adds a blk_mq_start_request() call in
> ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my test
> setup from 16 to 32.
>
> In addition to increasing the usable queue depth, also fix the
> documentation of the 'timeout' parameter in the header above
> ufshcd_exec_dev_cmd().
>
> Cc: Can Guo <cang@codeaurora.org>
> Cc: Alim Akhtar <alim.akhtar@samsung.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag
> conflicts")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/scsi/ufs/ufshcd.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index c96e36aab989..e669243354da 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2838,7 +2838,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
>   * ufshcd_exec_dev_cmd - API for sending device management requests
>   * @hba: UFS hba
>   * @cmd_type: specifies the type (NOP, Query...)
> - * @timeout: time in seconds
> + * @timeout: timeout in milliseconds
>   *
>   * NOTE: Since there is only one available tag for device management commands,
>   * it is expected you hold the hba->dev_cmd.lock mutex.
> @@ -2868,6 +2868,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>  	}
>  	tag = req->tag;
>  	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
> +	/* Set the timeout such that the SCSI error handler is not activated. */
> +	req->timeout = msecs_to_jiffies(2 * timeout);
> +	blk_mq_start_request(req);
>
>  	init_completion(&wait);
>  	lrbp = &hba->lrb[tag];
We found a regression after this change was merged -
schedule
blk_mq_get_tag
__blk_mq_alloc_request
blk_get_request
ufshcd_exec_dev_cmd
ufshcd_query_flag
ufshcd_wb_ctrl
ufshcd_devfreq_scale
ufshcd_devfreq_target
devfreq_set_target
update_devfreq
devfreq_monitor
process_one_work
worker_thread
kthread
ret_from_fork
Since ufshcd_devfreq_scale() blocks SCSI requests,
when ufshcd_wb_ctrl() runs, if it cannot get a free
tag (all tags are taken by normal requests), then
ufshcd_devfreq_scale() gets stuck and the SCSI layer
stays blocked, which leads to an I/O hang. Maybe
consider unblocking SCSI requests before calling
ufshcd_wb_ctrl()?
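A rough sketch of that suggestion (an untested assumption, not a fix from this thread; it presumes ufshcd_clock_scaling_unprepare() is where SCSI requests get unblocked):

/*
 * Hypothetical reordering: toggle Write Booster only after SCSI requests
 * have been unblocked, so the ufshcd_wb_ctrl() -> ufshcd_exec_dev_cmd()
 * path can still obtain a free tag.
 */
static int ufshcd_devfreq_scale_sketch(struct ufs_hba *hba, bool scale_up)
{
        int ret;

        ret = ufshcd_clock_scaling_prepare(hba);  /* blocks SCSI requests */
        if (ret)
                return ret;

        ret = ufshcd_scale_clks(hba, scale_up);   /* gear/clock scaling */

        ufshcd_clock_scaling_unprepare(hba);      /* unblocks SCSI requests */

        if (!ret)
                ufshcd_wb_ctrl(hba, scale_up);    /* a free tag can now be found */

        return ret;
}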
Thanks,
Can Guo.