Re: [PATCH] block: reduce kblockd_mod_delayed_work

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
       [not found] <bc529a3e-31d5-c266-8633-91095b346b19@kernel.dk>
@ 2021-12-14 15:04 ` Christoph Hellwig
  2021-12-14 15:59   ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2021-12-14 15:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi,
	Tejun Heo, Lai Jiangshan

On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote:
> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
> running 24 disks and using the 'none' scheduler. This happens off the
> flush path, because SCSI requires the queue to be restarted async, and
> hence we're hammering on mod_delayed_work_on() to ensure that the work
> item gets run appropriately.
> 
> What we care about here is that the queue is run, and we don't need to
> repeatedly re-arm the timer associated with the delayed work item. If we
> check if the work item is pending upfront, then we don't really need to do
> anything else. This is safe as theh work pending bit is cleared before a
> work item is started.
> 
> The only potential caveat here is if we have callers with wildly different
> timeouts specified. That's generally not the case, so don't think we need
> to care for that case.

So why not do a non-delayed queue_work for that case?  Might be good
to get the scsi and workqueue maintaines involved to understand the
issue a bit better first.

> 
> Reported-by: Dexuan Cui <decui@microsoft.com>
> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..4584fe709c15 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,7 +1484,16 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>  int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>  				unsigned long delay)
>  {
> -	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> +	/*
> +	 * Avoid hammering on work addition, if the work item is already
> +	 * pending. This is safe the work pending state is cleared before
> +	 * the work item is started, so if we see it set, then we know that
> +	 * whatever was previously queued on the block side will get run by
> +	 * an existing pending work item.
> +	 */
> +	if (!work_pending(&dwork->work))
> +		return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> +	return true;
>  }
>  EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
>  
> -- 
> Jens Axboe
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 15:04 ` [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Christoph Hellwig
@ 2021-12-14 15:59   ` Jens Axboe
  2021-12-14 20:42     ` Dexuan Cui
  2021-12-15 17:40     ` Bart Van Assche
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2021-12-14 15:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi,
	Tejun Heo, Lai Jiangshan

On 12/14/21 8:04 AM, Christoph Hellwig wrote:
> On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote:
>> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
>> running 24 disks and using the 'none' scheduler. This happens off the
>> flush path, because SCSI requires the queue to be restarted async, and
>> hence we're hammering on mod_delayed_work_on() to ensure that the work
>> item gets run appropriately.
>>
>> What we care about here is that the queue is run, and we don't need to
>> repeatedly re-arm the timer associated with the delayed work item. If we
>> check if the work item is pending upfront, then we don't really need to do
>> anything else. This is safe as theh work pending bit is cleared before a
>> work item is started.
>>
>> The only potential caveat here is if we have callers with wildly different
>> timeouts specified. That's generally not the case, so don't think we need
>> to care for that case.
> 
> So why not do a non-delayed queue_work for that case?  Might be good
> to get the scsi and workqueue maintaines involved to understand the
> issue a bit better first.

We can probably get by with doing just that, and just ignore if a delayed
work timer is already running.

Dexuan, can you try this one?

diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..c1833f95cb97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
 				unsigned long delay)
 {
+	if (!delay)
+		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
 	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
 }
 EXPORT_SYMBOL(kblockd_mod_delayed_work_on);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 15:59   ` Jens Axboe
@ 2021-12-14 20:42     ` Dexuan Cui
  2021-12-15 17:40     ` Bart Van Assche
  1 sibling, 0 replies; 5+ messages in thread
From: Dexuan Cui @ 2021-12-14 20:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block@vger.kernel.org, Ming Lei, linux-scsi@vger.kernel.org,
	Tejun Heo, Lai Jiangshan

> From: Jens Axboe <axboe@kernel.dk>
> Sent: Tuesday, December 14, 2021 8:00 AM
>  ...
> We can probably get by with doing just that, and just ignore if a delayed
> work timer is already running.
> 
> Dexuan, can you try this one?
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>  int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>  				unsigned long delay)
>  {
> +	if (!delay)
> +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
>  	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
>  }
>  EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
> 
> --
> Jens Axboe

According to my test, this patch works as efficiently as the v1.  Thanks Jens!

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 15:59   ` Jens Axboe
  2021-12-14 20:42     ` Dexuan Cui
@ 2021-12-15 17:40     ` Bart Van Assche
  2021-12-16  7:22       ` Ming Lei
  1 sibling, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2021-12-15 17:40 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi,
	Tejun Heo, Lai Jiangshan

On 12/14/21 7:59 AM, Jens Axboe wrote:
> On 12/14/21 8:04 AM, Christoph Hellwig wrote:
>> So why not do a non-delayed queue_work for that case?  Might be good
>> to get the scsi and workqueue maintaines involved to understand the
>> issue a bit better first.
> 
> We can probably get by with doing just that, and just ignore if a delayed
> work timer is already running.
> 
> Dexuan, can you try this one?
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>   int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>   				unsigned long delay)
>   {
> +	if (!delay)
> +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
>   	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
>   }
>   EXPORT_SYMBOL(kblockd_mod_delayed_work_on);

As Christoph already mentioned, it would be great to receive feedback from the
workqueue maintainer about this patch since I'm not aware of other kernel code
that queues delayed_work in a similar way.
Regarding the feedback from the view of the SCSI subsystem: I'd like to see the
block layer core track whether or not a queue needs to be run such that the
scsi_run_queue_async() call can be removed from scsi_end_request(). No such call
was present in the original conversion of the SCSI core from the legacy block
layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq
based I/O path.").

Bart.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-15 17:40     ` Bart Van Assche
@ 2021-12-16  7:22       ` Ming Lei
  0 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2021-12-16  7:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, linux-block@vger.kernel.org,
	Dexuan Cui, linux-scsi, Tejun Heo, Lai Jiangshan

On Wed, Dec 15, 2021 at 09:40:38AM -0800, Bart Van Assche wrote:
> On 12/14/21 7:59 AM, Jens Axboe wrote:
> > On 12/14/21 8:04 AM, Christoph Hellwig wrote:
> > > So why not do a non-delayed queue_work for that case?  Might be good
> > > to get the scsi and workqueue maintaines involved to understand the
> > > issue a bit better first.
> > 
> > We can probably get by with doing just that, and just ignore if a delayed
> > work timer is already running.
> > 
> > Dexuan, can you try this one?
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 1378d084c770..c1833f95cb97 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
> >   int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
> >   				unsigned long delay)
> >   {
> > +	if (!delay)
> > +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
> >   	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> >   }
> >   EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
> 
> As Christoph already mentioned, it would be great to receive feedback from the
> workqueue maintainer about this patch since I'm not aware of other kernel code
> that queues delayed_work in a similar way.
> Regarding the feedback from the view of the SCSI subsystem: I'd like to see the
> block layer core track whether or not a queue needs to be run such that the
> scsi_run_queue_async() call can be removed from scsi_end_request(). No such call

scsi_run_queue_async() is just for handling restart from running out of
scsi's device queue limit, which shouldn't be hot now, and it is for
handling scsi's own queue limit.

> was present in the original conversion of the SCSI core from the legacy block
> layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq
> based I/O path.").

That isn't true, see scsi_next_command()->scsi_run_queue().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-16  7:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <bc529a3e-31d5-c266-8633-91095b346b19@kernel.dk>
2021-12-14 15:04 ` [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Christoph Hellwig
2021-12-14 15:59   ` Jens Axboe
2021-12-14 20:42     ` Dexuan Cui
2021-12-15 17:40     ` Bart Van Assche
2021-12-16  7:22       ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox