* [PATCH v2] block: reduce kblockd_mod_delayed_work_on() CPU consumption
@ 2021-12-14 20:49 Jens Axboe
  2021-12-15  2:51 ` Ming Lei
  2021-12-15 10:25 ` John Garry
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2021-12-14 20:49 UTC (permalink / raw)
  To: linux-block@vger.kernel.org; +Cc: Dexuan Cui, Ming Lei

Dexuan reports that he's seeing spikes of very heavy CPU utilization when
running 24 disks and using the 'none' scheduler. This happens off the
sched restart path, because SCSI requires the queue to be restarted async,
and hence we're hammering on mod_delayed_work_on() to ensure that the work
item gets run appropriately.

Avoid hammering on the timer and just use queue_work_on() if no delay
has been specified.

Reported-and-tested-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..c1833f95cb97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
 				unsigned long delay)
 {
+	if (!delay)
+		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
 	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
 }
 EXPORT_SYMBOL(kblockd_mod_delayed_work_on);

-- 
Jens Axboe
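
A minimal userspace sketch of the dispatch in the patch above, for readers who want to see the zero-delay fast path in isolation. Everything prefixed mock_ is a hypothetical stand-in written for this illustration, not the kernel workqueue API: the two helpers only count calls and track a pending flag, on the assumption (taken from the commit message) that the expensive part of mod_delayed_work_on() is touching the timer on every call.

```c
#include <stdbool.h>

/* Counters for the two paths; illustration only, not kernel state. */
struct mock_stats {
	unsigned long direct_queues;	/* queue_work_on()-style calls */
	unsigned long timer_mods;	/* mod_delayed_work_on()-style calls */
};

/* Hypothetical stand-in for struct delayed_work. */
struct mock_dwork {
	bool pending;
};

struct mock_stats stats;

/* Models queue_work_on(): returns false if the work was already pending. */
bool mock_queue_work_on(struct mock_dwork *dwork)
{
	stats.direct_queues++;
	if (dwork->pending)
		return false;
	dwork->pending = true;
	return true;
}

/*
 * Models mod_delayed_work_on(): always touches the timer, and returns
 * true if the work was already pending (i.e. its timer was modified).
 */
bool mock_mod_delayed_work_on(struct mock_dwork *dwork, unsigned long delay)
{
	bool was_pending = dwork->pending;

	(void)delay;		/* the mock does not model time */
	stats.timer_mods++;
	dwork->pending = true;
	return was_pending;
}

/* Same shape as the patched helper: skip the timer path when delay == 0. */
int mock_kblockd_mod_delayed_work_on(struct mock_dwork *dwork,
				     unsigned long delay)
{
	if (!delay)
		return mock_queue_work_on(dwork);
	return mock_mod_delayed_work_on(dwork, delay);
}
```

Calling mock_kblockd_mod_delayed_work_on() repeatedly with a zero delay only ever increments direct_queues, which is the effect the patch is after. Note that in this mock the return value means different things on the two paths (newly queued vs. already pending), so a caller that inspects it would need to take that into account.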



* Re: [PATCH v2] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 20:49 [PATCH v2] block: reduce kblockd_mod_delayed_work_on() CPU consumption Jens Axboe
@ 2021-12-15  2:51 ` Ming Lei
  2021-12-15 10:25 ` John Garry
  1 sibling, 0 replies; 5+ messages in thread
From: Ming Lei @ 2021-12-15  2:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block@vger.kernel.org, Dexuan Cui

On Tue, Dec 14, 2021 at 01:49:34PM -0700, Jens Axboe wrote:
> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
> running 24 disks and using the 'none' scheduler. This happens off the
> sched restart path, because SCSI requires the queue to be restarted async,
> and hence we're hammering on mod_delayed_work_on() to ensure that the work
> item gets run appropriately.
> 
> Avoid hammering on the timer and just use queue_work_on() if no delay
> has been specified.
> 
> Reported-and-tested-by: Dexuan Cui <decui@microsoft.com>
> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>  int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>  				unsigned long delay)
>  {
> +	if (!delay)
> +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
>  	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming



* Re: [PATCH v2] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 20:49 [PATCH v2] block: reduce kblockd_mod_delayed_work_on() CPU consumption Jens Axboe
  2021-12-15  2:51 ` Ming Lei
@ 2021-12-15 10:25 ` John Garry
  2021-12-15 15:47   ` Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: John Garry @ 2021-12-15 10:25 UTC (permalink / raw)
  To: Jens Axboe, linux-block@vger.kernel.org; +Cc: Dexuan Cui, Ming Lei

On 14/12/2021 20:49, Jens Axboe wrote:
> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
> running 24 disks and using the 'none' scheduler. This happens off the
> sched restart path, because SCSI requires the queue to be restarted async,
> and hence we're hammering on mod_delayed_work_on() to ensure that the work
> item gets run appropriately.
> 
> Avoid hammering on the timer and just use queue_work_on() if no delay
> has been specified.
> 
> Reported-and-tested-by: Dexuan Cui <decui@microsoft.com>
> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>   int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>   				unsigned long delay)
>   {
> +	if (!delay)
> +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
>   	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
>   }
>   EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
> 

Hi Jens,

I have a related comment on the current code and the interface it uses, if
you don't mind, as I did wonder whether we are doing a msecs_to_jiffies(0 [not
built-in const]) call somewhere.

So we pass msecs to blk-mq.c, and we do a msecs_to_jiffies() call on it
before calling kblockd_mod_delayed_work_on(). Now most/all callsites
use a const value for the msec value, so if we did the msecs_to_jiffies()
conversion at the callsites and passed a jiffies value, it should be
compiled out by gcc. This is my current __blk_mq_delay_run_hw_queue
assembler:

0000000000001ef0 <__blk_mq_delay_run_hw_queue>:
     [snip]
     2024: a942dfb6 ldp x22, x23, [x29, #40]
     2028: 2a1503e0 mov w0, w21
     202c: 94000000 bl 0 <__msecs_to_jiffies>
kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
     2030: aa0003e2 mov x2, x0
     2034: 91010261 add x1, x19, #0x40
     2038: 2a1403e0 mov w0, w20
     203c: 94000000 bl 0 <kblockd_mod_delayed_work_on>

I'm not sure if you would want to change so many APIs, whether jiffies is
sensible to pass, or whether there would even be any performance gain.
Additionally, the function blk_mq_delay_kick_requeue_list() would not see
much gain from such a change, as its msec value is not const. Any thoughts?
Maybe testing performance would not do much harm.

Thanks,
John
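
A rough userspace sketch of the two interface shapes discussed above, under stated assumptions. MOCK_HZ, mock_msecs_to_jiffies(), delay_run_msecs() and delay_run_jiffies() are hypothetical names for this illustration; the rounding mirrors what the kernel conversion does, as far as I understand it, when HZ divides 1000 evenly. It is not the kernel's implementation.

```c
#define MOCK_HZ			250UL	/* assumed tick rate for the example */
#define MOCK_MSEC_PER_SEC	1000UL

/* Round up to whole jiffies; valid here because 1000 % MOCK_HZ == 0. */
unsigned long mock_msecs_to_jiffies(unsigned int msecs)
{
	const unsigned long ms_per_jiffy = MOCK_MSEC_PER_SEC / MOCK_HZ;

	return (msecs + ms_per_jiffy - 1) / ms_per_jiffy;
}

/*
 * Current shape: the callee receives msecs, so the value is no longer a
 * compile-time constant when it is converted and the conversion runs on
 * every call.
 */
unsigned long delay_run_msecs(unsigned long msecs)
{
	return mock_msecs_to_jiffies(msecs);
}

/*
 * Suggested shape: the callee receives jiffies. A caller that passes
 * mock_msecs_to_jiffies(3) with a literal argument gives the compiler a
 * chance to fold the conversion into a constant.
 */
unsigned long delay_run_jiffies(unsigned long jiffies)
{
	return jiffies;
}
```

Both shapes hand the same delay to the next layer; the difference is only where the conversion happens and whether the compiler can see a constant there.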


* Re: [PATCH v2] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-15 10:25 ` John Garry
@ 2021-12-15 15:47   ` Jens Axboe
  2021-12-16 12:43     ` John Garry
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2021-12-15 15:47 UTC (permalink / raw)
  To: John Garry, linux-block@vger.kernel.org; +Cc: Dexuan Cui, Ming Lei

On 12/15/21 3:25 AM, John Garry wrote:
> On 14/12/2021 20:49, Jens Axboe wrote:
>> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
>> running 24 disks and using the 'none' scheduler. This happens off the
>> sched restart path, because SCSI requires the queue to be restarted async,
>> and hence we're hammering on mod_delayed_work_on() to ensure that the work
>> item gets run appropriately.
>>
>> Avoid hammering on the timer and just use queue_work_on() if no delay
>> has been specified.
>>
>> Reported-and-tested-by: Dexuan Cui <decui@microsoft.com>
>> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 1378d084c770..c1833f95cb97 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>>   int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>>   				unsigned long delay)
>>   {
>> +	if (!delay)
>> +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
>>   	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
>>   }
>>   EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
>>
> 
> Hi Jens,
> 
> I have a related comment on the current code and the interface it uses, if
> you don't mind, as I did wonder whether we are doing a msecs_to_jiffies(0 [not
> built-in const]) call somewhere.
> 
> So we pass msecs to blk-mq.c, and we do a msecs_to_jiffies() call on it
> before calling kblockd_mod_delayed_work_on(). Now most/all callsites
> use a const value for the msec value, so if we did the msecs_to_jiffies()
> conversion at the callsites and passed a jiffies value, it should be
> compiled out by gcc. This is my current __blk_mq_delay_run_hw_queue
> assembler:
> 
> 0000000000001ef0 <__blk_mq_delay_run_hw_queue>:
>      [snip]
>      2024: a942dfb6 ldp x22, x23, [x29, #40]
>      2028: 2a1503e0 mov w0, w21
>      202c: 94000000 bl 0 <__msecs_to_jiffies>
> kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
>      2030: aa0003e2 mov x2, x0
>      2034: 91010261 add x1, x19, #0x40
>      2038: 2a1403e0 mov w0, w20
>      203c: 94000000 bl 0 <kblockd_mod_delayed_work_on>
> 
> I'm not sure if you would want to change so many APIs, whether jiffies is
> sensible to pass, or whether there would even be any performance gain.
> Additionally, the function blk_mq_delay_kick_requeue_list() would not see
> much gain from such a change, as its msec value is not const. Any thoughts?
> Maybe testing performance would not do much harm.

In general I totally agree with you, it'd be smarter to flip the
conversion so it can be done in a more efficient manner. At the same
time, the queue delay running is not at all a fast path, so shouldn't
really matter in practice.

-- 
Jens Axboe



* Re: [PATCH v2] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-15 15:47   ` Jens Axboe
@ 2021-12-16 12:43     ` John Garry
  0 siblings, 0 replies; 5+ messages in thread
From: John Garry @ 2021-12-16 12:43 UTC (permalink / raw)
  To: Jens Axboe, linux-block@vger.kernel.org; +Cc: Dexuan Cui, Ming Lei

On 15/12/2021 15:47, Jens Axboe wrote:
>> 0000000000001ef0 <__blk_mq_delay_run_hw_queue>:
>>       [snip]
>>       2024: a942dfb6 ldp x22, x23, [x29, #40]
>>       2028: 2a1503e0 mov w0, w21
>>       202c: 94000000 bl 0 <__msecs_to_jiffies>
>> kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
>>       2030: aa0003e2 mov x2, x0
>>       2034: 91010261 add x1, x19, #0x40
>>       2038: 2a1403e0 mov w0, w20
>>       203c: 94000000 bl 0 <kblockd_mod_delayed_work_on>
>>
>> I'm not sure if you would want to change so many APIs, whether jiffies is
>> sensible to pass, or whether there would even be any performance gain.
>> Additionally, the function blk_mq_delay_kick_requeue_list() would not see
>> much gain from such a change, as its msec value is not const. Any thoughts?
>> Maybe testing performance would not do much harm.
> In general I totally agree with you, it'd be smarter to flip the
> conversion so it can be done in a more efficient manner.


> At the same
> time, the queue delay running is not at all a fast path, so shouldn't
> really matter in practice.

OK, I just thought, from checking your change, that we have a
frequent msecs_to_jiffies(0 [non const]) call in
__blk_mq_delay_run_hw_queue() -> kblockd_mod_delayed_work_on().

Thanks,
John

