From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A081A1A08AF for ; Fri, 5 Jun 2026 14:41:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780670463; cv=none; b=QpEGmNQDHfUVUochcIsT3BPJl3q3B7QRFxLWEPUmTiJBfFGzkBB4oVhfuSryuc2sr0Y2pQRktmUEJnE3pa16mlkSimondntMGGYaQ5U0nXz4kbXeJ5/+XZGGTV/aRe+MxLQAnWLWGw1XAaiQIe7l1h2vgeNnm3uGu4RHB30YYA4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780670463; c=relaxed/simple; bh=oV5P6P/hfZGktoMQ1R2ZFzq9yDW7/tINrGm0qr6f/7E=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=mYF8kO82jUotu5DCTYvbGUtVUivBjtgHt6d+WTtjB3NhIg/j8hHgyc+VRx7pncBWD5lwQfhDpg+2dlcn8BWjLooBjbvmnShcZalR97eosu2EeDzEOtPfJ5djW4GkR5sGxxBt7uxqKof/8dTuEYJB0bkMpmSRVUbRHZwaKMRZXhA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=KCtOWcMw; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="KCtOWcMw" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=gFBOmF8TW9wWxPWbvy9/gLaudyIsWGmly2cM9okqcsA=; b=KCtOWcMw1yFDzRvSK9D/dC+Lhx 0sNAHitLEmSRq9fgd3q3lQAV4cA8gmkXvqzDbrAKNgZpCplMgw228BHeMb2H23xQfx0F0g0jtq2dX /8xhcCD85F0AAWyLQy8ziWvBp7flsgg6fuC0/gLf4qJLi1GHmvt40b2n1FJatFtvhf+iL3D5YKxex 8u/hAK7W8ViiC3I72a1l2tLw3IllXDAjAnnXNP6JWczwoeRaNnWCqrUsmrnz8cWjmB1SOWY/qVt6P rOqdqjDEePnBYx58qTxUqMH4o4utw4kHcfNEoALGBPSUbYFDx//JDQUf8H864W3oY8M23eVOPz4VT Rxk1d4Sg==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wVVj6-005MXu-2H; Fri, 05 Jun 2026 14:40:49 +0000 Date: Fri, 5 Jun 2026 07:40:43 -0700 From: Breno Leitao To: Tejun Heo Cc: Lai Jiangshan , linux-kernel@vger.kernel.org, marco.crivellari@suse.com, frederic@kernel.org, bigeasy@linutronix.de, Hillf Danton , kernel-team@meta.com Subject: Re: [PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list Message-ID: References: <20260603-fastwake-v2-0-2977512fe7fa@debian.org> <20260603-fastwake-v2-1-2977512fe7fa@debian.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Debian-User: leitao Hello Tejun, On Wed, Jun 03, 2026 at 10:50:13PM -1000, Tejun Heo wrote: > On Wed, Jun 03, 2026 at 06:40:08AM -0700, Breno Leitao wrote: > > kick_pool() picks an idle worker and wakes it, but leaves it WORKER_IDLE > > on pool->idle_list until the woken kthread schedules in and runs > > worker_leave_idle(). idle_cull_fn() only checks WORKER_IDLE, not the > > A lot of the existing comments in the file are double spaced but let's not > do that anymore. Please use single space to separate sentences in desc and > comments. I suppose you mean double space after double space? Sorry, I will avoid it. > > task state, so a kicked-but-not-yet-scheduled worker is still a valid > > cull victim -- the cull can reap it before it consumes the just-enqueued > > work, stranding the item. The window is narrow today but later patches > > in this series defer the wakeup outside pool->lock, widening it. > > Have you actually reproduced this? No -- not without artificially changing the timeout in the kernel. So far this is a theoretical race window rather than something I've hit in practice; it was flagged as a critical issue by sashiko: https://sashiko.dev/#/patchset/20260526-fastwake-v1-0-e69ad86923e6%40debian.org The only way I could get it to actually strand an item was by shrinking IDLE_WORKER_TIMEOUT 150,000x (300s -> 2ms), so that a worker counts as "timed out" almost immediately. > Idle culling never dips below 2 idle workers. Wakeup is from the head > of the list and culling from the end. Right -- idle_cull_fn() reaps from the tail (list_last_entry), i.e. the worker with the oldest last_active, and stops once nr_idle drops to 2, while kick_pool() wakes the head. So in the common case the kicked worker (head) and the cull victim (tail) are different workers. What I'm less sure about: the cull decision is purely time based -- it reaps the tail worker only if worker->last_active + IDLE_WORKER_TIMEOUT has elapsed -- but last_active is written in exactly one place, worker_enter_idle(). Neither kick_pool()/kick_pool_pick() nor worker_leave_idle() update it; it is only read by the two cull timers. So being kicked gives a worker no grace period against the cull: a worker that has already been idle ~IDLE_WORKER_TIMEOUT and is then kicked still looks expired, and if it drifts off the head (other workers going idle ahead of it) before it gets a chance to run, the cull can still reap it. Would it make sense to refresh last_active when a worker is kicked? The cull walks tail->head and breaks at the first non-expired worker, so a freshly-stamped kicked worker would simply be skipped while genuinely old workers behind it are still reaped. Thanks for the review, --breno