From: Breno Leitao <leitao@debian.org>
To: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
linux-kernel@vger.kernel.org, marco.crivellari@suse.com,
frederic@kernel.org, bigeasy@linutronix.de,
Hillf Danton <hdanton@sina.com>,
kernel-team@meta.com
Subject: Re: [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list
Date: Mon, 8 Jun 2026 10:12:38 -0700 [thread overview]
Message-ID: <aib0R4r-JSmPK0eo@gmail.com> (raw)
In-Reply-To: <aiMHoGja1Ve7bYC0@slm.duckdns.org>
On Fri, Jun 05, 2026 at 07:30:08AM -1000, Tejun Heo wrote:
> Hello,
>
> On Fri, Jun 05, 2026 at 07:40:43AM -0700, Breno Leitao wrote:
> ...
> > > > task state, so a kicked-but-not-yet-scheduled worker is still a valid
> > > > cull victim -- the cull can reap it before it consumes the just-enqueued
> > > > work, stranding the item. The window is narrow today but later patches
> > > > in this series defer the wakeup outside pool->lock, widening it.
> > >
> > > Have you actually reproduced this?
> >
> > No -- not without artificially changing the timeout in the kernel. So far this
> > is a theoretical race window rather than something I've hit in
> > practice; it was flagged as a critical issue by sashiko:
> >
> > https://sashiko.dev/#/patchset/20260526-fastwake-v1-0-e69ad86923e6%40debian.org
> >
> > The only way I could get it to actually strand an item was by shrinking
> > IDLE_WORKER_TIMEOUT 150,000x (300s -> 2ms), so that a worker counts as
> > "timed out" almost immediately.
>
> I see. This is a bug then. It shouldn't happen even with that.
>
> > Would it make sense to refresh last_active when a worker is kicked? The
> > cull walks tail->head and breaks at the first non-expired worker, so a
> > freshly-stamped kicked worker would simply be skipped while genuinely
> > old workers behind it are still reaped.
>
> I don't think timestamping is where the problem is. The intention of the
> code is that the idle thread minimum count + how workers transition their
> states can't lead to a situation where a work item is pending without an
> idle worker to execute it regardless of timing.
>
> Can you instrument code with the lowered threshold and record the sequence
> of events. If we record the sequence of work item and worker state
> transitions, it should tell us what's broken. We shouldn't need to protect
> all kicked workers to fix this.
Sure, let me show what I am using first, so, we are comfortable with the
"reproducer" and get on the same page.
This is what I am using. Basically colling a task that is waking up.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78f25afb4a9d6..20e6f8b1fb6b6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -110,7 +110,7 @@ enum wq_internal_consts {
BUSY_WORKER_HASH_ORDER = 6, /* 64 pointers */
MAX_IDLE_WORKERS_RATIO = 4, /* 1/4 of busy can be idle */
- IDLE_WORKER_TIMEOUT = 300 * HZ, /* keep idle ones for 5 mins */
+ IDLE_WORKER_TIMEOUT = HZ / 500, /* REPRO: 2ms (was 300 * HZ) */
MAYDAY_INITIAL_TIMEOUT = HZ / 100 >= 2 ? HZ / 100 : 2,
/* call for help after 10ms
@@ -2954,6 +2954,25 @@ static void set_worker_dying(struct worker *worker, struct list_head *list)
/* get an extra task struct reference for later kthread_stop_put() */
get_task_struct(worker->task);
+
+ {
+ static atomic_t total = ATOMIC_INIT(0);
+ static atomic_t kicked = ATOMIC_INIT(0);
+ unsigned int st = READ_ONCE(worker->task->__state);
+ int t = atomic_inc_return(&total);
+
+ /* TASK_IDLE = TASK_UNINTERRUPTIBLE|TASK_NOLOAD = 0x402. Anything
+ * else means the task was already woken before the cull caught it.
+ */
+ if (st != TASK_IDLE) {
+ int k = atomic_inc_return(&kicked);
+
+ pr_warn("REPRO: KICKED-then-CULLED #%d/%d (pool=%d state=0x%x worklist=%s nr_idle=%d nr_workers=%d)\n",
+ k, t, pool->id, st,
+ list_empty(&pool->worklist) ? "empty" : "NON-EMPTY",
+ pool->nr_idle, pool->nr_workers);
+ }
+ }
}
And as soon as the machine boots (even before init) I start seeing:
[ 8.340748] REPRO: KICKED-then-CULLED #1/13 (pool=294 state=0x0 worklist=empty nr_idle=2 nr_workers=2)
[ 753.586340] REPRO: KICKED-then-CULLED #2/5946 (pool=289 state=0x0 worklist=NON-EMPTY nr_idle=4 nr_workers=6)
[ 766.349179] REPRO: KICKED-then-CULLED #3/6024 (pool=288 state=0x0 worklist=NON-EMPTY nr_idle=3 nr_workers=6)
Do you think this is a good test?
Thanks
--breno
next prev parent reply other threads:[~2026-06-08 17:12 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-03 13:40 [PATCH v2 0/4] workqueue: Shrink the lock time Breno Leitao
2026-06-03 13:40 ` [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list Breno Leitao
2026-06-04 8:50 ` Tejun Heo
2026-06-05 14:40 ` Breno Leitao
2026-06-05 17:30 ` Tejun Heo
2026-06-08 17:12 ` Breno Leitao [this message]
2026-06-03 13:40 ` [PATCH v2 2/4] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
2026-06-03 13:40 ` [PATCH v2 3/4] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
2026-06-03 13:40 ` [PATCH v2 4/4] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-04 8:50 ` Tejun Heo
2026-06-04 15:29 ` Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aib0R4r-JSmPK0eo@gmail.com \
--to=leitao@debian.org \
--cc=bigeasy@linutronix.de \
--cc=frederic@kernel.org \
--cc=hdanton@sina.com \
--cc=jiangshanlai@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marco.crivellari@suse.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox