From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f41.google.com (mail-wm1-f41.google.com [209.85.128.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E36A2F12CE for ; Thu, 28 May 2026 14:35:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.41 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779978920; cv=none; b=Y+BSM4nX5AvOn4J3woPL0LUCFaf+47wv9W98CS1ztxpvIg8evj9F+oCrR/61lFJX2Y1JDTGAyR4HaSQJlIIDuEVObFvt3LUMUfoCXkNbBkh/iU+9muFT3b23AH/8Nh4td5awDHiRCZ+l5ovc3qR9Q03dKYtVgS+tBaeH6O4ncCU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779978920; c=relaxed/simple; bh=anOEmg7AO51aVjzyp2avrN+OddYr1cDfaEegATe3ivU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=n0rjfLWs5bscUz3nMjbHyoh0Jl86Wz46PX+bO7P+ljsrT1xyYSsp2dkmlzZK3XIAvafPs2lSFlUM7MSUb+09i4EOIn9ZTKSGtHVnmzXS9GyoEAiFDpA/P3sNyPNNTTcxDwJZMuDoTDcTXMZilbKdbfjL1NjjwA6kIYPDujS8TU8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=gmail.com; arc=none smtp.client-ip=209.85.128.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wm1-f41.google.com with SMTP id 5b1f17b1804b1-4891e86fabeso36991945e9.1 for ; Thu, 28 May 2026 07:35:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779978917; x=1780583717; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mLNAvuaz/qSOqrYE0U0UNrcQuzzQC4Or2G2MBskDXjE=; b=MlDQjafA3US8m6tQCRkoz0++c1XDUSIWj7OH32HEt4e2lu7HpwTzIwvlNVFMPFYTB+ F/bYl5FGDnkJubu2PPQfnTChYdbATwlGfnS34eTUSDnb/wN6BDA9B1JEN+el0rbRCE0P LSwu753mvzaFzA76vBCHILiUBmJsx0UjqSthuWxZaEBq9lcCHEMVvMaFF4f2lJqP8hsA gV89LGa4zH+DL8BV5i5TFb4cix1dkC74uwcRCrQw/+2GGFSBOxujy2Gsuv9NKIJ85q+1 D+WzwxTnXx/gb4GthoM4DzMLIOfncCXsRoWhVGg4zQzej+l9638IxUQSYnSWZHVD+o9k UFNA== X-Forwarded-Encrypted: i=1; AFNElJ9ylWx0z/7Aea9pedGvFbnEdAJpWUTfiRq9LL+JzHCUhM6v8d8TrCop7UstJ8k03tsRwMkwqcbdpQJ+oQc=@vger.kernel.org X-Gm-Message-State: AOJu0Yz9xvf5bqgpYNTymsiZuter4kNvsxm6Wpw/iYAhNR8IA+Gs23M3 RnfhOpiIIiu/1T8RcsU/rSqCudXoYVNXaHb9uqTOGFd4TmhUGFllmVCC X-Gm-Gg: Acq92OGOob6eJk09L66iRz8b7LOquhGNY2ASeH18l1Bu8gdmHsajcrIqS4X1yc7WUmF 2bL3Pt9QlsMLTdnu5fRtHXJ64xhnmy3qB88JpcpDgAnqUjYBiv25O35+xUV4h7NYMKQKO5/gii1 sMD+tUb73UqLJXnETiquKh4hILnFKG/Zj4pZDqi+MDtnKxIvdIDhFXEMDrCvz9tkx121av4gBD0 WWrRYjicdW2j5EdGJls1V+rZwnJX3TFETgF9vwL0zdB2UtT/tzEroWzthpSEwA2vgQ6qYDdNEqh SITr/WUjSGSAqMSuo2uMjJMHbrYxmEVGkl/iMvfoXE0ZE9bRBuP95O5qm5Dx/+fEocGHIJqWk5Q vmeFfv9Ce9tX2c8Jg5RQogomjyH3rdOF3Ebka3fSb9IlpF8v0OciTlnWA7+cfEEAj3AdqnaWAgM 3SFbUmx2TxDqVttf1so1GerrRUD0b+L6fdIXngNbAi1PB/wn4= X-Received: by 2002:a05:600d:6413:20b0:488:78f2:6b0 with SMTP id 5b1f17b1804b1-490428ed552mr340436905e9.29.1779978917189; Thu, 28 May 2026 07:35:17 -0700 (PDT) Received: from gmail.com ([62.197.47.167]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-45edb54a432sm19693173f8f.3.2026.05.28.07.35.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 May 2026 07:35:16 -0700 (PDT) Date: Thu, 28 May 2026 15:35:16 +0100 From: Breno Leitao To: Sebastian Andrzej Siewior Cc: Hillf Danton , Tejun Heo , Lai Jiangshan , linux-kernel@vger.kernel.org, marco.crivellari@suse.com, frederic@kernel.org Subject: Re: [PATCH 2/2] workqueue: defer wake_up_process() outside pool->lock on hot paths Message-ID: References: <20260526-fastwake-v1-0-e69ad86923e6@debian.org> <20260526212346.1100-1-hdanton@sina.com> <20260527153500.ERVl3my3@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20260527153500.ERVl3my3@linutronix.de> On Wed, May 27, 2026 at 05:35:00PM +0000, Sebastian Andrzej Siewior wrote: > On 2026-05-27 07:51:17 [-0700], Breno Leitao wrote: > > @@ -3447,7 +3459,13 @@ static int worker_thread(void *__worker) > > return 0; > > } > > > > - worker_leave_idle(worker); > > + /* > > + * Kicked workers have already been removed from pool->idle_list > > + * by kick_pool(); only first-time wakeups (via create_worker()) > > + * still arrive with WORKER_IDLE set. > > + */ > > + if (worker->flags & WORKER_IDLE) > > + worker_leave_idle(worker); > > Couldn't create_worker() be aligned here not set the idle flag and wake > the thread a few lines later? Then we wouldn't have to conditionally > clear the idle flag here (which sort of NULL renders the flag check in > worker_leave_idle()). Agreed. If create_worker() defers worker_enter_idle() until just before wake_up_process(), we can drop this enter/leave pair entirely and avoid the awkward conditional worker_leave_idle() What about something like: commit 0e91d33f8fe8e86c9a2be40d7a0163f68300ea1f Author: Breno Leitao Date: Thu May 28 09:58:28 2026 -0400 workqueue: leave idle under pool->lock before waking the worker A woken worker is woken from two different shapes today: - kick_pool_pick() picks a worker off pool->idle_list (LIFO head), optionally adjusts wake_cpu under pool->lock and returns the task to wake. The worker is left WORKER_IDLE and on pool->idle_list at the point wake_up_process() runs. - create_worker() calls worker_enter_idle() to put a brand-new worker on pool->idle_list, then wakes it for the first time. In both cases the woken worker eventually reaches worker_thread:woke_up:, reacquires pool->lock and unconditionally calls worker_leave_idle() to flip its own WORKER_IDLE / pool->idle_list / pool->nr_idle state. That is bookkeeping the waker could just as well have done under the same pool->lock acquisition it already holds, and it muddles the invariant for any concurrent observer: a runnable worker still observably looks idle until it actually schedules in. Move the worker_leave_idle() to the waker side, while pool->lock is still held: - kick_pool_pick() calls worker_leave_idle(worker) right before returning the task to wake. Both kick_pool() and the deferred- wake callers in __queue_work() / process_one_work() inherit this. - create_worker() pairs its worker_enter_idle() with a matching worker_leave_idle() right before wake_up_process(), under the same pool->lock section. nr_idle accounting is unchanged from the outside (enter and leave happen back-to-back under the lock); the only visible difference is that the new worker is no longer on pool->idle_list at the point it is woken. - worker_thread:woke_up: drops its worker_leave_idle() call: by the invariant above, a worker reaching woke_up: is always already !WORKER_IDLE and off pool->idle_list. The resulting invariant is uniform across all wakeup paths: a woken worker is !WORKER_IDLE and off pool->idle_list. No functional change intended. This also shrinks the locked region at worker_thread:woke_up: (one fewer field update while holding pool->lock). Suggested-by: Sebastian Andrzej Siewior Signed-off-by: Breno Leitao diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 8df671066dd1..e66dd507a841 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1310,6 +1310,8 @@ static bool kick_pool(struct worker_pool *pool) } } #endif + /* Leave idle under pool->lock; worker_thread:woke_up: relies on this. */ + worker_leave_idle(worker); wake_up_process(p); return true; } @@ -2881,8 +2883,11 @@ static struct worker *create_worker(struct worker_pool *pool) * check if not woken up soon. As kick_pool() is noop if @pool is empty, * wake it up explicitly. */ - if (worker->task) + if (worker->task) { + /* match kick_pool_pick(): leave idle before waking */ + worker_leave_idle(worker); wake_up_process(worker->task); + } raw_spin_unlock_irq(&pool->lock); @@ -3447,7 +3452,7 @@ static int worker_thread(void *__worker) return 0; } - worker_leave_idle(worker); + /* wakers already called worker_leave_idle() under pool->lock */ recheck: /* no more worker necessary? */ if (!need_more_worker(pool))