From: Tejun Heo <tj@kernel.org>
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com,
Tejun Heo <tj@kernel.org>
Subject: [PATCH 4/5] workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
Date: Tue, 18 Apr 2023 10:51:58 -1000
Message-ID: <20230418205159.724789-5-tj@kernel.org>
In-Reply-To: <20230418205159.724789-1-tj@kernel.org>

If a per-cpu work item hogs the CPU, it can prevent other work items from
starting through concurrency management. A per-cpu workqueue which intends
to host such CPU-hogging work items can choose to not participate in
concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be
error-prone and difficult to debug when missed.

This patch adds automatic CPU-usage-based detection. If a
concurrency-managed work item consumes more CPU time than a threshold
(5ms by default), it is automatically marked CPU_INTENSIVE on
schedule-out.

The mechanism isn't foolproof: the 5ms detection delays can add up if
many CPU-hogging work items are queued at the same time. However, in
such situations, the bigger problem is likely that the CPU is saturated
with per-cpu work items, and the solution would be to make them UNBOUND.

For occasional CPU hogging, the new automatic mechanism should provide
reasonable protection with minimal increase in code complexity.

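Since the threshold is exposed as a writable (0644) module parameter, it
should be tunable at runtime through sysfs; a sketch, assuming the
parameter path that module_param_named() creates for the built-in
workqueue code:

```shell
# Raise the auto-detection threshold to 10ms (path assumes the
# cpu_intensive_thresh_us parameter added by this patch)
echo 10000 > /sys/module/workqueue/parameters/cpu_intensive_thresh_us
cat /sys/module/workqueue/parameters/cpu_intensive_thresh_us
```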
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
---
kernel/workqueue.c | 77 ++++++++++++++++++++++++++-----------
kernel/workqueue_internal.h | 1 +
2 files changed, 56 insertions(+), 22 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b9e8dc54272d..d24b887ddd86 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -306,6 +306,14 @@ static struct kmem_cache *pwq_cache;
static cpumask_var_t *wq_numa_possible_cpumask;
/* possible CPUs of each node */
+/*
+ * Per-cpu work items which run for longer than the following threshold are
+ * automatically considered CPU intensive and excluded from concurrency
+ * management to prevent them from noticeably delaying other per-cpu work items.
+ */
+static unsigned long wq_cpu_intensive_thresh_us = 5000;
+module_param_named(cpu_intensive_thresh_us, wq_cpu_intensive_thresh_us, ulong, 0644);
+
static bool wq_disable_numa;
module_param_named(disable_numa, wq_disable_numa, bool, 0444);
@@ -951,9 +959,6 @@ void wq_worker_stopping(struct task_struct *task)
struct worker *worker = kthread_data(task);
struct worker_pool *pool;
- if (task_is_running(task))
- return;
-
/*
* Rescuers, which may not have all the fields set up like normal
* workers, also reach here, let's not access anything before
@@ -964,24 +969,49 @@ void wq_worker_stopping(struct task_struct *task)
pool = worker->pool;
- /* Return if preempted before wq_worker_running() was reached */
- if (worker->sleeping)
- return;
+ if (task_is_running(task)) {
+ /*
+ * Concurrency-managed @worker is still RUNNING. See if the
+ * current work item is hogging the CPU and stalling other per-cpu
+ * work items. If so, mark @worker CPU_INTENSIVE to exclude it from
+ * concurrency management. @worker->current_* are stable as they
+ * can only be modified by @task which is %current.
+ */
+ if (!worker->current_work ||
+ task->se.sum_exec_runtime - worker->current_at <
+ wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
+ return;
- worker->sleeping = 1;
- raw_spin_lock_irq(&pool->lock);
+ raw_spin_lock_irq(&pool->lock);
+ worker_set_flags(worker, WORKER_CPU_INTENSIVE);
+ } else {
+ /*
+ * Concurrency-managed @worker is sleeping. Decrement the
+ * associated pool's nr_running accordingly.
+ */
- /*
- * Recheck in case unbind_workers() preempted us. We don't
- * want to decrement nr_running after the worker is unbound
- * and nr_running has been reset.
- */
- if (worker->flags & WORKER_NOT_RUNNING) {
- raw_spin_unlock_irq(&pool->lock);
- return;
+ /* Return if preempted before wq_worker_running() was reached */
+ if (worker->sleeping)
+ return;
+
+ worker->sleeping = 1;
+ raw_spin_lock_irq(&pool->lock);
+
+ /*
+ * Recheck in case unbind_workers() preempted us. We don't want
+ * to decrement nr_running after the worker is unbound and
+ * nr_running has been reset. As a running worker never sleeps
+ * in between work items, we know that @worker->current_* are
+ * valid after the following check.
+ */
+ if (worker->flags & WORKER_NOT_RUNNING) {
+ raw_spin_unlock_irq(&pool->lock);
+ return;
+ }
+
+ pool->nr_running--;
}
- pool->nr_running--;
if (need_more_worker(pool))
wake_up_worker(pool);
raw_spin_unlock_irq(&pool->lock);
@@ -2288,7 +2318,6 @@ __acquires(&pool->lock)
{
struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
- bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
unsigned long work_data;
struct worker *collision;
#ifdef CONFIG_LOCKDEP
@@ -2325,6 +2354,7 @@ __acquires(&pool->lock)
worker->current_work = work;
worker->current_func = work->func;
worker->current_pwq = pwq;
+ worker->current_at = worker->task->se.sum_exec_runtime;
work_data = *work_data_bits(work);
worker->current_color = get_work_color(work_data);
@@ -2342,7 +2372,7 @@ __acquires(&pool->lock)
* of concurrency management and the next code block will chain
* execution of the pending work items.
*/
- if (unlikely(cpu_intensive))
+ if (unlikely(pwq->wq->flags & WQ_CPU_INTENSIVE))
worker_set_flags(worker, WORKER_CPU_INTENSIVE);
/*
@@ -2420,9 +2450,12 @@ __acquires(&pool->lock)
raw_spin_lock_irq(&pool->lock);
- /* clear cpu intensive status */
- if (unlikely(cpu_intensive))
- worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
+ /*
+ * In addition to %WQ_CPU_INTENSIVE, @worker may also have been marked
+ * CPU intensive by wq_worker_stopping() if @work consumed more than
+ * wq_cpu_intensive_thresh_us. Clear it.
+ */
+ worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
/* tag the worker for identification in schedule() */
worker->last_func = worker->current_func;
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 902459b328de..08ffed41f104 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -31,6 +31,7 @@ struct worker {
struct work_struct *current_work; /* L: work being processed */
work_func_t current_func; /* L: current_work's fn */
struct pool_workqueue *current_pwq; /* L: current_work's pwq */
+ u64 current_at; /* L: runtime when current_work started */
unsigned int current_color; /* L: current_work's color */
int sleeping; /* None */
--
2.40.0
Thread overview: 11+ messages
2023-04-18 20:51 [PATCHSET wq/for-6.5] workqueue: Implement automatic CPU intensive detection and add monitoring Tejun Heo
2023-04-18 20:51 ` [PATCH 1/5] workqueue, sched: Notify workqueue of scheduling of RUNNING tasks Tejun Heo
2023-04-18 20:51 ` [PATCH 2/5] workqueue: Re-order struct worker fields Tejun Heo
2023-04-18 20:51 ` [PATCH 3/5] workqueue: Move worker_set/clr_flags() upwards Tejun Heo
2023-04-18 20:51 ` Tejun Heo [this message]
2023-04-23 3:23 ` [PATCH 4/5] workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE Lai Jiangshan
2023-04-24 15:29 ` Tejun Heo
2023-04-25 13:12 ` Peter Zijlstra
2023-04-28 15:19 ` Tejun Heo
2023-04-18 20:51 ` [PATCH 5/5] workqueue: Add pwq->stats[] and a monitoring script Tejun Heo
[not found] ` <20230419014552.1410-1-hdanton@sina.com>
2023-04-19 15:45 ` [PATCH 4/5] workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE Tejun Heo