Date: Thu, 4 Jun 2026 08:29:30 -0700
From: Breno Leitao
To: Tejun Heo
Cc: Lai Jiangshan, linux-kernel@vger.kernel.org, marco.crivellari@suse.com,
    frederic@kernel.org, bigeasy@linutronix.de, Hillf Danton,
    kernel-team@meta.com
Subject: Re: [PATCH v2 4/4] workqueue: defer the worker wakeup outside pool->lock in process_one_work()
References: <20260603-fastwake-v2-0-2977512fe7fa@debian.org>
 <20260603-fastwake-v2-4-2977512fe7fa@debian.org>

On Wed, Jun 03, 2026 at 10:50:29PM -1000, Tejun Heo wrote:
> On Wed, Jun 03, 2026 at 06:40:11AM -0700, Breno Leitao wrote:
> > process_one_work() kicks the pool to chain execution of the remaining
> > work items on WORKER_NOT_RUNNING pools (the UNBOUND and CPU_INTENSIVE
> > ones), calling kick_pool() while holding pool->lock. As in the enqueue
> > path, the wakeup pulls the target rq->lock in under pool->lock.
> >
> > Use kick_pool_pick() to select and claim the worker under pool->lock and
> > issue the wakeup with wake_up_q() after the lock is dropped via
> > raw_spin_unlock_irq_wake().
> >
> > With both hot paths converted, measured on a CONFIG_SMP x86 VM (8 vCPUs)
> > with the in-tree test_workqueue benchmark (lib/test_workqueue.c; each of
> > 8 producers queues 200000 work items one at a time on a WQ_UNBOUND
> > workqueue, waiting for each to complete), medians of five boots per
> > scope:
>
> Please test on bare metal.

Done, on two bare-metal machines, with the same test_workqueue benchmark
(8 producers x 200000 items, one in flight at a time, exactly like the
test we have in lib/test_workqueue.c today):

 - arm64: NVIDIA Grace (Neoverse V2), 72 cores
 - x86: Intel Xeon Platinum 8321HC (Cooper Lake), 52 cores

VMs and arm64 (Grace) are where this series is meant to pay off: waking an
idle CPU sitting in WFI costs an IPI, so doing it under pool->lock
lengthens the critical section. The arm64 bare-metal numbers match what
the VM showed:

  affinity_scope    baseline     patched     tput      p95
                   (items/s)   (items/s)     gain     drop
  --------------   ---------   ---------   ------   ------
  cpu              2,569,880   3,029,740   +17.9%   -13.6%
  smt              2,586,485   3,044,788   +17.7%   -14.0%
  cache_shard        572,055     797,621   +39.4%   -37.1%
  cache              538,132     724,997   +34.7%   -30.1%
  numa               528,673     658,215   +24.5%   -20.5%
  system             524,287     614,486   +17.2%   -21.1%

  (tput gain = relative change in throughput, i.e. items enqueued per
   second; bigger is better.)
  (p95 drop = relative change in p95 enqueue latency; negative is better.)

On x86 (Cooper Lake) the same test was neutral, though: within boot-to-boot
noise on the contended scopes. My impression is that waking an idle x86 CPU
is cheap, so there is little under-lock wakeup cost to move out, and the
benchmark stays bound by pool->lock acquisition either way (perf shows ~46%
in queued_spin_lock_slowpath both before and after).

So the win is real but architecture-dependent: arm64 (and virt, where a
vCPU wakeup is even more expensive) benefit, while on x86 bare metal the
change is close to a wash.