All of lore.kernel.org
 help / color / mirror / Atom feed
From: Breno Leitao <leitao@debian.org>
To: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org,  marco.crivellari@suse.com,
	frederic@kernel.org, bigeasy@linutronix.de,
	 Hillf Danton <hdanton@sina.com>,
	kernel-team@meta.com
Subject: Re: [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list
Date: Mon, 8 Jun 2026 10:12:38 -0700	[thread overview]
Message-ID: <aib0R4r-JSmPK0eo@gmail.com> (raw)
In-Reply-To: <aiMHoGja1Ve7bYC0@slm.duckdns.org>

On Fri, Jun 05, 2026 at 07:30:08AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jun 05, 2026 at 07:40:43AM -0700, Breno Leitao wrote:
> ...
> > > > task state, so a kicked-but-not-yet-scheduled worker is still a valid
> > > > cull victim -- the cull can reap it before it consumes the just-enqueued
> > > > work, stranding the item.  The window is narrow today but later patches
> > > > in this series defer the wakeup outside pool->lock, widening it.
> > > 
> > > Have you actually reproduced this?
> > 
> > No -- not without artificially changing the timeout in the kernel. So far this
> > is a theoretical race window rather than something I've hit in
> > practice; it was flagged as a critical issue by sashiko:
> > 
> >     https://sashiko.dev/#/patchset/20260526-fastwake-v1-0-e69ad86923e6%40debian.org
> > 
> > The only way I could get it to actually strand an item was by shrinking
> > IDLE_WORKER_TIMEOUT 150,000x (300s -> 2ms), so that a worker counts as
> > "timed out" almost immediately.
> 
> I see. This is a bug then. It shouldn't happen even with that.
> 
> > Would it make sense to refresh last_active when a worker is kicked? The
> > cull walks tail->head and breaks at the first non-expired worker, so a
> > freshly-stamped kicked worker would simply be skipped while genuinely
> > old workers behind it are still reaped.
> 
> I don't think timestamping is where the problem is. The intention of the
> code is that the idle thread minimum count + how workers transition their
> states can't lead to a situation where a work item is pending without an
> idle worker to execute it regardless of timing.
> 
> Can you instrument code with the lowered threshold and record the sequence
> of events. If we record the sequence of work item and worker state
> transitions, it should tell us what's broken. We shouldn't need to protect
> all kicked workers to fix this.

Sure, let me show what I am using first, so, we are comfortable with the
"reproducer" and get on the same page.

This is what I am using. Basically colling a task that is waking up.

	diff --git a/kernel/workqueue.c b/kernel/workqueue.c
	index 78f25afb4a9d6..20e6f8b1fb6b6 100644
	--- a/kernel/workqueue.c
	+++ b/kernel/workqueue.c
	@@ -110,7 +110,7 @@ enum wq_internal_consts {
		BUSY_WORKER_HASH_ORDER  = 6,            /* 64 pointers */

		MAX_IDLE_WORKERS_RATIO  = 4,            /* 1/4 of busy can be idle */
	-       IDLE_WORKER_TIMEOUT     = 300 * HZ,     /* keep idle ones for 5 mins */
	+       IDLE_WORKER_TIMEOUT     = HZ / 500,     /* REPRO: 2ms (was 300 * HZ) */

		MAYDAY_INITIAL_TIMEOUT  = HZ / 100 >= 2 ? HZ / 100 : 2,
							/* call for help after 10ms
	@@ -2954,6 +2954,25 @@ static void set_worker_dying(struct worker *worker, struct list_head *list)

		/* get an extra task struct reference for later kthread_stop_put() */
		get_task_struct(worker->task);
	+
	+       {
	+               static atomic_t total = ATOMIC_INIT(0);
	+               static atomic_t kicked = ATOMIC_INIT(0);
	+               unsigned int st = READ_ONCE(worker->task->__state);
	+               int t = atomic_inc_return(&total);
	+
	+               /* TASK_IDLE = TASK_UNINTERRUPTIBLE|TASK_NOLOAD = 0x402.  Anything
	+                * else means the task was already woken before the cull caught it.
	+                */
	+               if (st != TASK_IDLE) {
	+                       int k = atomic_inc_return(&kicked);
	+
	+                       pr_warn("REPRO: KICKED-then-CULLED #%d/%d (pool=%d state=0x%x worklist=%s nr_idle=%d nr_workers=%d)\n",
	+                               k, t, pool->id, st,
	+                               list_empty(&pool->worklist) ? "empty" : "NON-EMPTY",
	+                               pool->nr_idle, pool->nr_workers);
	+               }
	+       }
	}

And as soon as the machine boots (even before init) I start seeing:

	[    8.340748] REPRO: KICKED-then-CULLED #1/13 (pool=294 state=0x0 worklist=empty nr_idle=2 nr_workers=2)
	[  753.586340] REPRO: KICKED-then-CULLED #2/5946 (pool=289 state=0x0 worklist=NON-EMPTY nr_idle=4 nr_workers=6)
	[  766.349179] REPRO: KICKED-then-CULLED #3/6024 (pool=288 state=0x0 worklist=NON-EMPTY nr_idle=3 nr_workers=6)


Do you think this is a good test?

Thanks
--breno

  reply	other threads:[~2026-06-08 17:12 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-03 13:40 [PATCH v2 0/4] workqueue: Shrink the lock time Breno Leitao
2026-06-03 13:40 ` [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list Breno Leitao
2026-06-04  8:50   ` Tejun Heo
2026-06-05 14:40     ` Breno Leitao
2026-06-05 17:30       ` Tejun Heo
2026-06-08 17:12         ` Breno Leitao [this message]
2026-06-03 13:40 ` [PATCH v2 2/4] workqueue: split kick_pool() into kick_pool_pick() + wake_up_q() Breno Leitao
2026-06-03 13:40 ` [PATCH v2 3/4] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Breno Leitao
2026-06-03 13:40 ` [PATCH v2 4/4] workqueue: defer the worker wakeup outside pool->lock in process_one_work() Breno Leitao
2026-06-04  8:50   ` Tejun Heo
2026-06-04 15:29     ` Breno Leitao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aib0R4r-JSmPK0eo@gmail.com \
    --to=leitao@debian.org \
    --cc=bigeasy@linutronix.de \
    --cc=frederic@kernel.org \
    --cc=hdanton@sina.com \
    --cc=jiangshanlai@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marco.crivellari@suse.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.