From: Tejun Heo <tj@kernel.org>
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCHSET v2 wq/for-6.5] workqueue: Implement automatic CPU intensive detection and add monitoring
Date: Tue, 9 May 2023 17:07:46 -1000
Message-ID: <20230510030752.542340-1-tj@kernel.org>
v2: * Lai pointed out that !SM_NONE cases should also be notified. 0001 and
0004 are updated accordingly.
* PeterZ suggested reporting on work items that trigger the auto CPU
intensive mechanism. 0005 adds reporting of work functions that
trigger the mechanism repeatedly with exponential backoff.
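A minimal sketch of what reporting "with exponential backoff" can look like. It assumes a per-work-function trigger counter and reports only when the count reaches a power of two, starting at the fourth trigger. The struct, the function names and the "start at 4" policy are hypothetical illustrations, not code taken from 0005.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-work-function entry (names are illustrative only). */
struct wci_ent {
	const char *func;  /* work function being tracked */
	uint64_t cnt;      /* times it triggered auto CPU_INTENSIVE */
};

static bool is_pow2(uint64_t n)
{
	return n && !(n & (n - 1));
}

/*
 * Bump the trigger count and return true if this trigger should be
 * reported. One-off triggers are ignored; after that, reports become
 * exponentially rarer: at 4, 8, 16, 32, ...
 */
static bool wci_should_report(struct wci_ent *ent)
{
	uint64_t cnt = ++ent->cnt;

	return cnt >= 4 && is_pow2(cnt);
}
```

With this policy, a function that triggers the mechanism 64 times is reported only 5 times, so a persistent offender stays visible without flooding the log.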
Hello,
To reduce the number of concurrent worker threads, workqueue holds back
starting per-cpu work items while the previous work item stays in the
RUNNING state. As such, a per-cpu work item that consumes a lot of CPU
cycles, even if it has cond_resched()'s in the right places, can stall other
per-cpu work items.
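The hold-back rule described above can be modeled in a few lines. This is a heavily simplified userspace sketch, loosely following the condition the kernel's need_more_worker() checks (pending work and no concurrency-managed worker running); struct pool_model and its helpers are hypothetical. Note that a CPU hog calling cond_resched() never leaves the RUNNING state, so nr_running never drops and pending items stay blocked.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a per-cpu worker pool. */
struct pool_model {
	int nr_pending;  /* queued work items not yet started */
	int nr_running;  /* RUNNING workers subject to concurrency management */
};

/* A pending item may start only if no managed worker is RUNNING. */
static bool may_start(const struct pool_model *p)
{
	return p->nr_pending > 0 && p->nr_running == 0;
}

/* A worker picks up a pending item and starts running it. */
static void start_one(struct pool_model *p)
{
	p->nr_pending--;
	p->nr_running++;
}

/* The worker blocks (e.g. sleeps on a mutex) and stops counting. */
static void worker_sleeps(struct pool_model *p)
{
	p->nr_running--;
}
```

In this model, only blocking (worker_sleeps()) or an exemption such as WQ_CPU_INTENSIVE lets the next pending item start.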
To support per-cpu work items that may occupy the CPU for a substantial
period of time, workqueue has WQ_CPU_INTENSIVE flag which exempts work items
issued through the marked workqueue from concurrency management - they're
started immediately and don't block other work items. While this works, it's
error-prone in that a workqueue user can easily forget to set the flag or
set it unnecessarily. Furthermore, the impact of a wrong flag setting can
be rather indirect and challenging to root-cause.
This patchset makes workqueue auto-detect CPU intensive work items based on
CPU consumption. If a work item consumes more than the threshold (5ms by
default) of CPU time, it's automatically marked as CPU intensive when it
gets scheduled out, which unblocks the pending per-cpu work items.
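A sketch of the schedule-out check, assuming the worker records its task's accumulated CPU time when a work item starts and compares it against the threshold when the task is scheduled out. struct worker_model, WM_CPU_INTENSIVE and on_sched_out() are hypothetical names; the patch's actual fields and tunable may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define WM_CPU_INTENSIVE 0x1u  /* hypothetical flag bit */

/* Hypothetical worker state; field names are illustrative only. */
struct worker_model {
	unsigned int flags;
	uint64_t current_at_ns;  /* task CPU time when the item started */
};

/*
 * Called when the worker is scheduled out while still RUNNING.
 * Returns true if the item was newly marked CPU intensive; in a real
 * pool that would drop it from the running count so that pending
 * per-cpu work items can start.
 */
static bool on_sched_out(struct worker_model *w, uint64_t cpu_time_ns,
			 uint64_t thresh_ns)
{
	if (w->flags & WM_CPU_INTENSIVE)
		return false;  /* already exempt */
	if (cpu_time_ns - w->current_at_ns < thresh_ns)
		return false;  /* below threshold, not a hog (yet) */
	w->flags |= WM_CPU_INTENSIVE;
	return true;
}
```

With the 5ms default, an item that has consumed 2ms is left alone, while one that has consumed 6ms gets marked on its next schedule-out.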
The mechanism isn't foolproof in that the detection delays can add up if
many CPU-hogging work items are queued at the same time. However, in such
situations, the bigger problem is likely the CPU being saturated with
per-cpu work items, and the solution would be to make them UNBOUND. Future
changes will make UNBOUND workqueues more attractive by improving their
locality behaviors and configurability. We might eventually remove the
explicit WQ_CPU_INTENSIVE flag.
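A back-of-the-envelope estimate of how the detection delays add up. Assume n CPU hogs are queued on one CPU at once and each must consume the full threshold of CPU time before it is marked and the next one may start. The k-th item (0-based) then cannot start earlier than k times the threshold; scheduling latency and CPU sharing with already-marked hogs only make it later. The helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Lower bound on the start delay of the k-th (0-based) of several CPU
 * hogs queued at once on one CPU, assuming each must burn thresh_us of
 * CPU time before the next one is allowed to start.
 */
static uint64_t min_start_delay_us(unsigned int k, uint64_t thresh_us)
{
	return (uint64_t)k * thresh_us;
}
```

For example, with ten hogs and the 5ms default, the last one waits at least 45ms before it starts running.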
While at it, add statistics and a monitoring script. Lack of visibility has
always been a bit of a pain point when debugging workqueue-related issues,
and with this change and more drastic ones planned for workqueue, this is a
good time to address the shortcoming.
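A sketch of how monotonically increasing per-pool-workqueue counters can be turned into per-interval numbers by a periodic monitor. It assumes the counters only ever increase and are sampled at fixed intervals. The enum names and struct stats_snapshot are hypothetical and do not claim to match the patch's actual set of counters.

```c
#include <stdint.h>

/* Hypothetical counter indices; the patch's own set may differ. */
enum stat_idx {
	STAT_STARTED,        /* work items started */
	STAT_COMPLETED,      /* work items completed */
	STAT_CPU_INTENSIVE,  /* items auto-marked CPU intensive */
	NR_STATS,
};

struct stats_snapshot {
	uint64_t v[NR_STATS];
};

/* Per-interval activity: cur - prev for each monotonic counter. */
static void stats_delta(const struct stats_snapshot *prev,
			const struct stats_snapshot *cur,
			struct stats_snapshot *out)
{
	for (int i = 0; i < NR_STATS; i++)
		out->v[i] = cur->v[i] - prev->v[i];
}
```

Keeping the in-kernel side as plain cumulative counters leaves rate computation and formatting to the userspace script.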
This patchset was born out of the discussion in the following thread:
https://lkml.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com
and contains the following six patches:
0001-workqueue-sched-Notify-workqueue-of-scheduling-of-RU.patch
0002-workqueue-Re-order-struct-worker-fields.patch
0003-workqueue-Move-worker_set-clr_flags-upwards.patch
0004-workqueue-Automatically-mark-CPU-hogging-work-items-.patch
0005-workqueue-Report-work-funcs-that-trigger-automatic-C.patch
0006-workqueue-Add-pwq-stats-and-a-monitoring-script.patch
and also available in the following git branch:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git auto-cpu-intensive-v2
diffstat follows. Thanks.
Documentation/core-api/workqueue.rst | 30 ++
kernel/sched/core.c | 48 ++--
kernel/workqueue.c | 364 +++++++++++++++++++++++++++--------
kernel/workqueue_internal.h | 14 -
lib/Kconfig.debug | 13 +
tools/workqueue/wq_monitor.py | 148 ++++++++++++++
6 files changed, 513 insertions(+), 104 deletions(-)
--
tejun