From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AEB802E8897 for ; Wed, 24 Jun 2026 11:19:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782299964; cv=none; b=rEfguVvPGfRi1m2sPeZtL7Nimx1Fk5XnuhRJ6n9nGf8RfIdfPmdNSnhGKREb7S2o51Dqnf7Ewph75GNsGVIlU1IV7oe3sU/Bi43NPMf9sBHSEikzF5n1GzOtdkxynm9oRNUc1Y/h76x7L/LYPxQ8ipOTt5bmpERyAen8ziM3KxU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782299964; c=relaxed/simple; bh=e0A84KnMA0xyb5aZ1Tc203iuLgcNthqQSkeSi731pkc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=sMpsfPxRXgyit6od/ACeKYxGisCQmKOSUarRUkjkfjTDsCiVPwZG4dMO/oCnBhQP5v51HC4GZgXGictactvebtXFmoKu+QWVv/g+U384eD3wURIJrFzZyl0IkYpCBQFjsL4SPVqbftLUsTBbJ6bz0p3dovhQXyw0yYJnL602hqg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=BToFDBe1; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="BToFDBe1" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:In-Reply-To:Content-Transfer-Encoding: Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date: Reply-To:Content-ID:Content-Description; bh=+xOaLu5bc4CbXgg1P44yc0tA9Fd4yw1VFp+FQ7XQZPY=; b=BToFDBe15eSgmQJyHUPas/d8pN pj2wOUaKIP4McAqJDTAbN4ojcdPIipY/m+EvHQz38mjBklqmB0C5+/4wWqYVuU5WW/Zr+t0sYGlLp VZ1sL8w+HRuV/FvYDEBEisEsqn5aPF/wPsu46WPWfqF0sCyqDKb3Oy1Nk95US0sFTuZGPpFQEjPgH EnC4h8XwZn0PhMR7gPOdC7DvRQtnoGJLAY3icoG+7436uBLFpv5X+DbhWyWtMHO8Tb/SDpcFbS5lR ToWJMcbnHguJtLeVygpiJ88B0yZlGKJav1BfkB7pERBr8hdMivZXmv6frz1y6eImrrSxgF7eGzbQa zg6l4vdg==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wcLdK-002O7M-2i; Wed, 24 Jun 2026 11:19:07 +0000 Date: Wed, 24 Jun 2026 04:19:02 -0700 From: Breno Leitao To: Sebastian Andrzej Siewior Cc: Tejun Heo , Lai Jiangshan , linux-kernel@vger.kernel.org, marco.crivellari@suse.com, frederic@kernel.org, Hillf Danton , kernel-team@meta.com, kmagar@redhat.com, psuriset@redhat.com Subject: Re: [PATCH v3 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work() Message-ID: References: <20260616-fastwake-v3-0-79da19fcd08f@debian.org> <20260616-fastwake-v3-2-79da19fcd08f@debian.org> <20260624084756.X2i4QiPm@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20260624084756.X2i4QiPm@linutronix.de> X-Debian-User: leitao Hello Sebastian, On Wed, Jun 24, 2026 at 10:47:56AM +0200, Sebastian Andrzej Siewior wrote: > On 2026-06-16 06:33:32 [-0700], Breno Leitao wrote: > > __queue_work() is the enqueue hot path: it inserts the work item and > > calls kick_pool() while holding pool->lock. kick_pool() ends in a > > wakeup, which takes the target task's rq->lock, so rq->lock nests under > > pool->lock on every enqueue that wakes a worker on a contended unbound > > pool. > > > > Use kick_pool_pick() to select and claim the worker under pool->lock, > > queue it on an on-stack wake_q, and issue the wakeup with wake_up_q() > > right after dropping the lock via raw_spin_unlock_wake(). Worker > > selection, wake_cpu setup and claiming the worker off pool->idle_list > > still happen under the lock; only the rq->lock acquisition moves out. > > > > Signed-off-by: Breno Leitao > … > > --- a/kernel/workqueue.c > > +++ b/kernel/workqueue.c > … > > out: > > - raw_spin_unlock(&pool->lock); > > + raw_spin_unlock_wake(&pool->lock, &wakeq); > > rcu_read_unlock(); > > } > > It is not wrong but I am not sure if this is really needed here. The > pattern > preempt_disable(); > raw_spin_unlock(); > wake_up_q() > preempt_enable(); > > is used to prevent task preemption after the unlock operation. The > futex/ locking code needs to wake a task but before the unlock operation > the task priority might have been lowered as result of dropping the > lock. This means it might not be the task with the highest priority in > the system and the task with highest is not yet active and we schedule a > task in middle in the instead. > To form this easier: say we have Task A prio=1, B prio=2 and C prio=3 > with higher number higher priority. > A owns a lock and is preempted by B. C gets on the CPU preempts B. C > wants A's lock so it passes its priority to A (PI-boost) and goes idle. > A gets on the CPU and unlocks. > Now: As part of the unlock operation (before the raw_spin_unlock()) A > goes back to its initial priority and C is not yet woken up meaning B is > the task with highest priority and the held raw_spinlock_t is the only > thing preventing scheduling. So we disable preemption, unlock and then > perform the wake-up so C becomes the next candidate (as it should be) to > occupy the CPU. > Otherwise it would be B which means per definition a task with lower > priority runs before a task with higher priority. > > I don't think workqueue has this requirements here. Worst case something > else gets on the CPU and worker wakeup is delayed until the task is > scheduled again. It could be fine since the preemption could happen > before queue-work(). > It probably does not lead to a huge performance > regression but who knows. > > Your goal was to lower the contention on the pool lock during the wake > up which you achieved. The __queue_work() remains still not preemptible > until after the wake up which might be fine. You're absolutely right — the preemption dance isn't needed here. I used raw_spin_unlock_wake() because the helper seemed to match what I was doing, but this preempt_disable/enable pattern (while cheap) is unnecessary for this use case. I'll update the series to use the standard approach: raw_spin_unlock_irq(&pool->lock); wake_up_q(&wakeq); Thanks, --breno