* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
[not found] <bc529a3e-31d5-c266-8633-91095b346b19@kernel.dk>
@ 2021-12-14 15:04 ` Christoph Hellwig
2021-12-14 15:59 ` Jens Axboe
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2021-12-14 15:04 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi,
Tejun Heo, Lai Jiangshan
On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote:
> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
> running 24 disks and using the 'none' scheduler. This happens off the
> flush path, because SCSI requires the queue to be restarted async, and
> hence we're hammering on mod_delayed_work_on() to ensure that the work
> item gets run appropriately.
>
> What we care about here is that the queue is run, and we don't need to
> repeatedly re-arm the timer associated with the delayed work item. If we
> check if the work item is pending upfront, then we don't really need to do
> anything else. This is safe as theh work pending bit is cleared before a
> work item is started.
>
> The only potential caveat here is if we have callers with wildly different
> timeouts specified. That's generally not the case, so don't think we need
> to care for that case.
So why not do a non-delayed queue_work for that case? Might be good
to get the scsi and workqueue maintaines involved to understand the
issue a bit better first.
>
> Reported-by: Dexuan Cui <decui@microsoft.com>
> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..4584fe709c15 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,7 +1484,16 @@ EXPORT_SYMBOL(kblockd_schedule_work);
> int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
> unsigned long delay)
> {
> - return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> + /*
> + * Avoid hammering on work addition, if the work item is already
> + * pending. This is safe the work pending state is cleared before
> + * the work item is started, so if we see it set, then we know that
> + * whatever was previously queued on the block side will get run by
> + * an existing pending work item.
> + */
> + if (!work_pending(&dwork->work))
> + return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> + return true;
> }
> EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
>
> --
> Jens Axboe
>
---end quoted text---
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
2021-12-14 15:04 ` [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Christoph Hellwig
@ 2021-12-14 15:59 ` Jens Axboe
2021-12-14 20:42 ` Dexuan Cui
2021-12-15 17:40 ` Bart Van Assche
0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2021-12-14 15:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi,
Tejun Heo, Lai Jiangshan
On 12/14/21 8:04 AM, Christoph Hellwig wrote:
> On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote:
>> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
>> running 24 disks and using the 'none' scheduler. This happens off the
>> flush path, because SCSI requires the queue to be restarted async, and
>> hence we're hammering on mod_delayed_work_on() to ensure that the work
>> item gets run appropriately.
>>
>> What we care about here is that the queue is run, and we don't need to
>> repeatedly re-arm the timer associated with the delayed work item. If we
>> check if the work item is pending upfront, then we don't really need to do
>> anything else. This is safe as theh work pending bit is cleared before a
>> work item is started.
>>
>> The only potential caveat here is if we have callers with wildly different
>> timeouts specified. That's generally not the case, so don't think we need
>> to care for that case.
>
> So why not do a non-delayed queue_work for that case? Might be good
> to get the scsi and workqueue maintaines involved to understand the
> issue a bit better first.
We can probably get by with doing just that, and just ignore if a delayed
work timer is already running.
Dexuan, can you try this one?
diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..c1833f95cb97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
unsigned long delay)
{
+ if (!delay)
+ return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
}
EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
--
Jens Axboe
^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
2021-12-14 15:59 ` Jens Axboe
@ 2021-12-14 20:42 ` Dexuan Cui
2021-12-15 17:40 ` Bart Van Assche
1 sibling, 0 replies; 5+ messages in thread
From: Dexuan Cui @ 2021-12-14 20:42 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig
Cc: linux-block@vger.kernel.org, Ming Lei, linux-scsi@vger.kernel.org,
Tejun Heo, Lai Jiangshan
> From: Jens Axboe <axboe@kernel.dk>
> Sent: Tuesday, December 14, 2021 8:00 AM
> ...
> We can probably get by with doing just that, and just ignore if a delayed
> work timer is already running.
>
> Dexuan, can you try this one?
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
> int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
> unsigned long delay)
> {
> + if (!delay)
> + return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
> return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> }
> EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
>
> --
> Jens Axboe
According to my test, this patch works as efficiently as the v1. Thanks Jens!
Thanks,
-- Dexuan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
2021-12-14 15:59 ` Jens Axboe
2021-12-14 20:42 ` Dexuan Cui
@ 2021-12-15 17:40 ` Bart Van Assche
2021-12-16 7:22 ` Ming Lei
1 sibling, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2021-12-15 17:40 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig
Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi,
Tejun Heo, Lai Jiangshan
On 12/14/21 7:59 AM, Jens Axboe wrote:
> On 12/14/21 8:04 AM, Christoph Hellwig wrote:
>> So why not do a non-delayed queue_work for that case? Might be good
>> to get the scsi and workqueue maintaines involved to understand the
>> issue a bit better first.
>
> We can probably get by with doing just that, and just ignore if a delayed
> work timer is already running.
>
> Dexuan, can you try this one?
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
> int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
> unsigned long delay)
> {
> + if (!delay)
> + return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
> return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> }
> EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
As Christoph already mentioned, it would be great to receive feedback from the
workqueue maintainer about this patch since I'm not aware of other kernel code
that queues delayed_work in a similar way.
Regarding the feedback from the view of the SCSI subsystem: I'd like to see the
block layer core track whether or not a queue needs to be run such that the
scsi_run_queue_async() call can be removed from scsi_end_request(). No such call
was present in the original conversion of the SCSI core from the legacy block
layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq
based I/O path.").
Bart.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
2021-12-15 17:40 ` Bart Van Assche
@ 2021-12-16 7:22 ` Ming Lei
0 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2021-12-16 7:22 UTC (permalink / raw)
To: Bart Van Assche
Cc: Jens Axboe, Christoph Hellwig, linux-block@vger.kernel.org,
Dexuan Cui, linux-scsi, Tejun Heo, Lai Jiangshan
On Wed, Dec 15, 2021 at 09:40:38AM -0800, Bart Van Assche wrote:
> On 12/14/21 7:59 AM, Jens Axboe wrote:
> > On 12/14/21 8:04 AM, Christoph Hellwig wrote:
> > > So why not do a non-delayed queue_work for that case? Might be good
> > > to get the scsi and workqueue maintaines involved to understand the
> > > issue a bit better first.
> >
> > We can probably get by with doing just that, and just ignore if a delayed
> > work timer is already running.
> >
> > Dexuan, can you try this one?
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 1378d084c770..c1833f95cb97 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
> > int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
> > unsigned long delay)
> > {
> > + if (!delay)
> > + return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
> > return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> > }
> > EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
>
> As Christoph already mentioned, it would be great to receive feedback from the
> workqueue maintainer about this patch since I'm not aware of other kernel code
> that queues delayed_work in a similar way.
> Regarding the feedback from the view of the SCSI subsystem: I'd like to see the
> block layer core track whether or not a queue needs to be run such that the
> scsi_run_queue_async() call can be removed from scsi_end_request(). No such call
scsi_run_queue_async() is just for handling restart from running out of
scsi's device queue limit, which shouldn't be hot now, and it is for
handling scsi's own queue limit.
> was present in the original conversion of the SCSI core from the legacy block
> layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq
> based I/O path.").
That isn't true, see scsi_next_command()->scsi_run_queue().
Thanks,
Ming
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-12-16 7:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <bc529a3e-31d5-c266-8633-91095b346b19@kernel.dk>
2021-12-14 15:04 ` [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Christoph Hellwig
2021-12-14 15:59 ` Jens Axboe
2021-12-14 20:42 ` Dexuan Cui
2021-12-15 17:40 ` Bart Van Assche
2021-12-16 7:22 ` Ming Lei
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).