* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption [not found] <bc529a3e-31d5-c266-8633-91095b346b19@kernel.dk> @ 2021-12-14 15:04 ` Christoph Hellwig 2021-12-14 15:59 ` Jens Axboe 0 siblings, 1 reply; 5+ messages in thread From: Christoph Hellwig @ 2021-12-14 15:04 UTC (permalink / raw) To: Jens Axboe Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi, Tejun Heo, Lai Jiangshan On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote: > Dexuan reports that he's seeing spikes of very heavy CPU utilization when > running 24 disks and using the 'none' scheduler. This happens off the > flush path, because SCSI requires the queue to be restarted async, and > hence we're hammering on mod_delayed_work_on() to ensure that the work > item gets run appropriately. > > What we care about here is that the queue is run, and we don't need to > repeatedly re-arm the timer associated with the delayed work item. If we > check if the work item is pending upfront, then we don't really need to do > anything else. This is safe as theh work pending bit is cleared before a > work item is started. > > The only potential caveat here is if we have callers with wildly different > timeouts specified. That's generally not the case, so don't think we need > to care for that case. So why not do a non-delayed queue_work for that case? Might be good to get the scsi and workqueue maintaines involved to understand the issue a bit better first. > > Reported-by: Dexuan Cui <decui@microsoft.com> > Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/ > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > --- > > diff --git a/block/blk-core.c b/block/blk-core.c > index 1378d084c770..4584fe709c15 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -1484,7 +1484,16 @@ EXPORT_SYMBOL(kblockd_schedule_work); > int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, > unsigned long delay) > { > - return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); > + /* > + * Avoid hammering on work addition, if the work item is already > + * pending. This is safe the work pending state is cleared before > + * the work item is started, so if we see it set, then we know that > + * whatever was previously queued on the block side will get run by > + * an existing pending work item. > + */ > + if (!work_pending(&dwork->work)) > + return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); > + return true; > } > EXPORT_SYMBOL(kblockd_mod_delayed_work_on); > > -- > Jens Axboe > ---end quoted text--- ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption 2021-12-14 15:04 ` [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Christoph Hellwig @ 2021-12-14 15:59 ` Jens Axboe 2021-12-14 20:42 ` Dexuan Cui 2021-12-15 17:40 ` Bart Van Assche 0 siblings, 2 replies; 5+ messages in thread From: Jens Axboe @ 2021-12-14 15:59 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi, Tejun Heo, Lai Jiangshan On 12/14/21 8:04 AM, Christoph Hellwig wrote: > On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote: >> Dexuan reports that he's seeing spikes of very heavy CPU utilization when >> running 24 disks and using the 'none' scheduler. This happens off the >> flush path, because SCSI requires the queue to be restarted async, and >> hence we're hammering on mod_delayed_work_on() to ensure that the work >> item gets run appropriately. >> >> What we care about here is that the queue is run, and we don't need to >> repeatedly re-arm the timer associated with the delayed work item. If we >> check if the work item is pending upfront, then we don't really need to do >> anything else. This is safe as theh work pending bit is cleared before a >> work item is started. >> >> The only potential caveat here is if we have callers with wildly different >> timeouts specified. That's generally not the case, so don't think we need >> to care for that case. > > So why not do a non-delayed queue_work for that case? Might be good > to get the scsi and workqueue maintaines involved to understand the > issue a bit better first. We can probably get by with doing just that, and just ignore if a delayed work timer is already running. Dexuan, can you try this one? diff --git a/block/blk-core.c b/block/blk-core.c index 1378d084c770..c1833f95cb97 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work); int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay) { + if (!delay) + return queue_work_on(cpu, kblockd_workqueue, &dwork->work); return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); } EXPORT_SYMBOL(kblockd_mod_delayed_work_on); -- Jens Axboe ^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption 2021-12-14 15:59 ` Jens Axboe @ 2021-12-14 20:42 ` Dexuan Cui 2021-12-15 17:40 ` Bart Van Assche 1 sibling, 0 replies; 5+ messages in thread From: Dexuan Cui @ 2021-12-14 20:42 UTC (permalink / raw) To: Jens Axboe, Christoph Hellwig Cc: linux-block@vger.kernel.org, Ming Lei, linux-scsi@vger.kernel.org, Tejun Heo, Lai Jiangshan > From: Jens Axboe <axboe@kernel.dk> > Sent: Tuesday, December 14, 2021 8:00 AM > ... > We can probably get by with doing just that, and just ignore if a delayed > work timer is already running. > > Dexuan, can you try this one? > > diff --git a/block/blk-core.c b/block/blk-core.c > index 1378d084c770..c1833f95cb97 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work); > int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, > unsigned long delay) > { > + if (!delay) > + return queue_work_on(cpu, kblockd_workqueue, &dwork->work); > return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); > } > EXPORT_SYMBOL(kblockd_mod_delayed_work_on); > > -- > Jens Axboe According to my test, this patch works as efficiently as the v1. Thanks Jens! Thanks, -- Dexuan ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption 2021-12-14 15:59 ` Jens Axboe 2021-12-14 20:42 ` Dexuan Cui @ 2021-12-15 17:40 ` Bart Van Assche 2021-12-16 7:22 ` Ming Lei 1 sibling, 1 reply; 5+ messages in thread From: Bart Van Assche @ 2021-12-15 17:40 UTC (permalink / raw) To: Jens Axboe, Christoph Hellwig Cc: linux-block@vger.kernel.org, Dexuan Cui, Ming Lei, linux-scsi, Tejun Heo, Lai Jiangshan On 12/14/21 7:59 AM, Jens Axboe wrote: > On 12/14/21 8:04 AM, Christoph Hellwig wrote: >> So why not do a non-delayed queue_work for that case? Might be good >> to get the scsi and workqueue maintaines involved to understand the >> issue a bit better first. > > We can probably get by with doing just that, and just ignore if a delayed > work timer is already running. > > Dexuan, can you try this one? > > diff --git a/block/blk-core.c b/block/blk-core.c > index 1378d084c770..c1833f95cb97 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work); > int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, > unsigned long delay) > { > + if (!delay) > + return queue_work_on(cpu, kblockd_workqueue, &dwork->work); > return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); > } > EXPORT_SYMBOL(kblockd_mod_delayed_work_on); As Christoph already mentioned, it would be great to receive feedback from the workqueue maintainer about this patch since I'm not aware of other kernel code that queues delayed_work in a similar way. Regarding the feedback from the view of the SCSI subsystem: I'd like to see the block layer core track whether or not a queue needs to be run such that the scsi_run_queue_async() call can be removed from scsi_end_request(). No such call was present in the original conversion of the SCSI core from the legacy block layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq based I/O path."). Bart. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption 2021-12-15 17:40 ` Bart Van Assche @ 2021-12-16 7:22 ` Ming Lei 0 siblings, 0 replies; 5+ messages in thread From: Ming Lei @ 2021-12-16 7:22 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, Christoph Hellwig, linux-block@vger.kernel.org, Dexuan Cui, linux-scsi, Tejun Heo, Lai Jiangshan On Wed, Dec 15, 2021 at 09:40:38AM -0800, Bart Van Assche wrote: > On 12/14/21 7:59 AM, Jens Axboe wrote: > > On 12/14/21 8:04 AM, Christoph Hellwig wrote: > > > So why not do a non-delayed queue_work for that case? Might be good > > > to get the scsi and workqueue maintaines involved to understand the > > > issue a bit better first. > > > > We can probably get by with doing just that, and just ignore if a delayed > > work timer is already running. > > > > Dexuan, can you try this one? > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > index 1378d084c770..c1833f95cb97 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work); > > int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, > > unsigned long delay) > > { > > + if (!delay) > > + return queue_work_on(cpu, kblockd_workqueue, &dwork->work); > > return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); > > } > > EXPORT_SYMBOL(kblockd_mod_delayed_work_on); > > As Christoph already mentioned, it would be great to receive feedback from the > workqueue maintainer about this patch since I'm not aware of other kernel code > that queues delayed_work in a similar way. > Regarding the feedback from the view of the SCSI subsystem: I'd like to see the > block layer core track whether or not a queue needs to be run such that the > scsi_run_queue_async() call can be removed from scsi_end_request(). No such call scsi_run_queue_async() is just for handling restart from running out of scsi's device queue limit, which shouldn't be hot now, and it is for handling scsi's own queue limit. > was present in the original conversion of the SCSI core from the legacy block > layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq > based I/O path."). That isn't true, see scsi_next_command()->scsi_run_queue(). Thanks, Ming ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-12-16 7:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <bc529a3e-31d5-c266-8633-91095b346b19@kernel.dk>
2021-12-14 15:04 ` [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Christoph Hellwig
2021-12-14 15:59 ` Jens Axboe
2021-12-14 20:42 ` Dexuan Cui
2021-12-15 17:40 ` Bart Van Assche
2021-12-16 7:22 ` Ming Lei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox