From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2D5793D3D15 for ; Mon, 8 Jun 2026 17:12:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780938772; cv=none; b=aGEf6o1DiOuJrW9aDPIFJBmQe8yLx3cw+7vOihTxtWLFMDdVN2NA1wRdpQRfSl+9LxL6wxGnrDjIEGt+DE9/EwLgB0ItVrbhrVxcWbbKGR3R6oaG07EniV6F0CvheQ0dQkqlNN+hYyNe59R+iBJwAz2+UoSf8vWglMcN1jYo/vI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780938772; c=relaxed/simple; bh=eVrcHVoGgBm1Hqw5SDBk8EV4p1E/1P2JwP5mSUga+Io=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ieK9Ufi3mYWwmCQJ976qzY5X60AA/NBhvDV2Ul2ptlKgr9JEx95k4pcKb8ifYAO8N0T/mNmpjZzCDSbWr5PMcJj4gM7+Lg6RBSyqeZNRxmJAwAkcGuTSE9UF39lAfs0N571e+wrtdX8s1QwICfHvniBcOotNgeVucq+AaK5MUwY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=J5Bbm8F6; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="J5Bbm8F6" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=Zpm+F8TSEOQuAcA1aY2PT7AMQ+RGdFmT2ytdsEOZp0U=; b=J5Bbm8F62uW9yWtQnDnaIBTxj3 WBj7PXySvS5ZE3JcATxKpOzjZDrzp7QwGwnNW1yNk+NDtcGDHUwwsmrpC9aJZ+X88xNqqdc78RY9v FhBEVorI2qoxaMhJ0c5gIOQlmfbfJQjlO+D7gheKMVZKSIySz2Ws/c1H6CC7ByeR86VVZkuc0v2Ub gCqhJYm52B1nnD5DfhxmgPVQI9bDcXgk7n59HefBUl96ehAPdBDJRj+58tsXLdaKxOQ/Ojyzw4Wtr AAfaEBT6tfrbczpGHuVM92RJgrEjfAa17y/LhIAti2g75W+xVpijgeNY4RR8j4iEq3ZMErpffx/kK df+XpitQ==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wWdWk-007mGo-2G; Mon, 08 Jun 2026 17:12:43 +0000 Date: Mon, 8 Jun 2026 10:12:38 -0700 From: Breno Leitao To: Tejun Heo Cc: Lai Jiangshan , linux-kernel@vger.kernel.org, marco.crivellari@suse.com, frederic@kernel.org, bigeasy@linutronix.de, Hillf Danton , kernel-team@meta.com Subject: Re: [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list Message-ID: References: <20260603-fastwake-v2-0-2977512fe7fa@debian.org> <20260603-fastwake-v2-1-2977512fe7fa@debian.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Debian-User: leitao On Fri, Jun 05, 2026 at 07:30:08AM -1000, Tejun Heo wrote: > Hello, > > On Fri, Jun 05, 2026 at 07:40:43AM -0700, Breno Leitao wrote: > ... > > > > task state, so a kicked-but-not-yet-scheduled worker is still a valid > > > > cull victim -- the cull can reap it before it consumes the just-enqueued > > > > work, stranding the item. The window is narrow today but later patches > > > > in this series defer the wakeup outside pool->lock, widening it. > > > > > > Have you actually reproduced this? > > > > No -- not without artificially changing the timeout in the kernel. So far this > > is a theoretical race window rather than something I've hit in > > practice; it was flagged as a critical issue by sashiko: > > > > https://sashiko.dev/#/patchset/20260526-fastwake-v1-0-e69ad86923e6%40debian.org > > > > The only way I could get it to actually strand an item was by shrinking > > IDLE_WORKER_TIMEOUT 150,000x (300s -> 2ms), so that a worker counts as > > "timed out" almost immediately. > > I see. This is a bug then. It shouldn't happen even with that. > > > Would it make sense to refresh last_active when a worker is kicked? The > > cull walks tail->head and breaks at the first non-expired worker, so a > > freshly-stamped kicked worker would simply be skipped while genuinely > > old workers behind it are still reaped. > > I don't think timestamping is where the problem is. The intention of the > code is that the idle thread minimum count + how workers transition their > states can't lead to a situation where a work item is pending without an > idle worker to execute it regardless of timing. > > Can you instrument code with the lowered threshold and record the sequence > of events. If we record the sequence of work item and worker state > transitions, it should tell us what's broken. We shouldn't need to protect > all kicked workers to fix this. Sure, let me show what I am using first, so, we are comfortable with the "reproducer" and get on the same page. This is what I am using. Basically colling a task that is waking up. diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 78f25afb4a9d6..20e6f8b1fb6b6 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -110,7 +110,7 @@ enum wq_internal_consts { BUSY_WORKER_HASH_ORDER = 6, /* 64 pointers */ MAX_IDLE_WORKERS_RATIO = 4, /* 1/4 of busy can be idle */ - IDLE_WORKER_TIMEOUT = 300 * HZ, /* keep idle ones for 5 mins */ + IDLE_WORKER_TIMEOUT = HZ / 500, /* REPRO: 2ms (was 300 * HZ) */ MAYDAY_INITIAL_TIMEOUT = HZ / 100 >= 2 ? HZ / 100 : 2, /* call for help after 10ms @@ -2954,6 +2954,25 @@ static void set_worker_dying(struct worker *worker, struct list_head *list) /* get an extra task struct reference for later kthread_stop_put() */ get_task_struct(worker->task); + + { + static atomic_t total = ATOMIC_INIT(0); + static atomic_t kicked = ATOMIC_INIT(0); + unsigned int st = READ_ONCE(worker->task->__state); + int t = atomic_inc_return(&total); + + /* TASK_IDLE = TASK_UNINTERRUPTIBLE|TASK_NOLOAD = 0x402. Anything + * else means the task was already woken before the cull caught it. + */ + if (st != TASK_IDLE) { + int k = atomic_inc_return(&kicked); + + pr_warn("REPRO: KICKED-then-CULLED #%d/%d (pool=%d state=0x%x worklist=%s nr_idle=%d nr_workers=%d)\n", + k, t, pool->id, st, + list_empty(&pool->worklist) ? "empty" : "NON-EMPTY", + pool->nr_idle, pool->nr_workers); + } + } } And as soon as the machine boots (even before init) I start seeing: [ 8.340748] REPRO: KICKED-then-CULLED #1/13 (pool=294 state=0x0 worklist=empty nr_idle=2 nr_workers=2) [ 753.586340] REPRO: KICKED-then-CULLED #2/5946 (pool=289 state=0x0 worklist=NON-EMPTY nr_idle=4 nr_workers=6) [ 766.349179] REPRO: KICKED-then-CULLED #3/6024 (pool=288 state=0x0 worklist=NON-EMPTY nr_idle=3 nr_workers=6) Do you think this is a good test? Thanks --breno