From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB8673C2BA3 for ; Wed, 27 May 2026 09:48:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.54 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779875296; cv=none; b=W4YG7rykW3Z6p7WKJ9IMOBS3NjS7FonRyhkryRJZWPS66ah6xF1KQvjQGz4cAZD91NlkFKHbDBV4vlS1CyIfcTFJ6aUR5yJww/NEnVdjd9kpcDIKhUO2fOoeCE205I0xZf3hGAbMb413maI/JpxFNf8XkFo1k6YBF0EQSPEp+Hc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779875296; c=relaxed/simple; bh=VGOJuP1y7tz9fChFcIFSBL632XTot4skqnS/rI73Yto=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=C1MXa/96ENzzYYmrfB4yy4fQGrZMVENCr4zLJR8gKmzOBeBAF2GNasdbX4GDXO/nbp1Qq8ujzvst9GuoroyJVDU8oc6n2fX7reQENwN4KtBHA2WqZfUrowXBwSHhYGN2yv2lv2k+7S3gLYuiz03xD0O6B5nFQFiLpJuSPqOBzkg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=gmail.com; arc=none smtp.client-ip=209.85.221.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wr1-f54.google.com with SMTP id ffacd0b85a97d-44a14580111so8508596f8f.0 for ; Wed, 27 May 2026 02:48:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779875293; x=1780480093; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=7uG/iEYIKZIujwR0ElhbUpqcs2oKT/JLAb1vOQ2gWg0=; b=NTD9VwHz4OhKs4tjA3tpEd2Xk51B1cvaS0IYb4v20RoikNC+o0CCn9o8BbteoVfQSL G9lKxgpmsBgj6z2eBUVYMNoRYEwWcZqd9bcRAIq3R47icU5EHsJOUNfkx+7+SHP9im/B qKXsXNSyXVBTZ1iedDGx2puwgkKAmutDJjsNVylFJNUUfauv8Yqc83MzuSlT3uJ9aAHp v5leQq0BjQairBBVx8oZR3vrt5qx6RtwzV0hEQQTIcUmePJWkT0mTeokhFUP198GtpEx dkLMED3kYVx8RpQbqO++6cBgSxyx8RWA5r872dch1AWVzdIHjz9eFuKnGhei2NdH5Bo9 mCkQ== X-Forwarded-Encrypted: i=1; AFNElJ+Hk4plVtt74ag/OA2x57cXXuHNIuZy7vr+StoXrajJAj3ua91AELtmDaITrsxLR1YON/aCBCF8ioIZQnE=@vger.kernel.org X-Gm-Message-State: AOJu0YzAbBPaoc4s546wMrJaMDHG2ofI172wxtJKQjjvJHa5uiZgO58W DsXpnqA1RIOHV+jTsDxFMp+GR81N1kyyqgUMsjOHO6iyF9kOzyBpqGrH X-Gm-Gg: Acq92OG/8OH06mnA0RToovo+6T/Gi2IH9rkbFHYcxn0nZ++fZJjHTxiw7fbnYs0wehz 8RhGH2yz1Mnf73mM8TxVsX0k2o+TrJU4j0hV3vIJ27dQnWCaDU9l/+hzkaxnVRHxJmriQvJxhDE EOtNKCorSYDQOJLHVoA2BD7DAIy6k2HUz09F6LYts/IRusUyp4lle86lztEs3iSApgOYtg6XN1e ElWtnf/HjOCSp7SBQwtsiC3dZy87vmyh//x+6RKxhfxijVcVenG/QCnphjnmByLGNpEni+pt60r om3XsbrwvD1MBERaBjkSZappA8zhfUoX3r5zr09kpuNh3m5YItLQLf8NMXlWrUQoEZq5IIHX+AY hmFMJBSGt1xyYhw1IiA1EBV3y9bPzdoUeWCPYr1H/Yg3Sm+8UGU8he5QRZfrK9rUZ30SJXd1QrS 35n7UAN1gljC0el3jrZW4VgUC/cdmEonl341e7 X-Received: by 2002:a05:6000:26cf:b0:45e:b215:12e9 with SMTP id ffacd0b85a97d-45eb368903emr36807000f8f.6.1779875293004; Wed, 27 May 2026 02:48:13 -0700 (PDT) Received: from gmail.com ([62.197.47.167]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-45edb5579b8sm4727017f8f.9.2026.05.27.02.48.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 May 2026 02:48:12 -0700 (PDT) Date: Wed, 27 May 2026 10:48:12 +0100 From: Breno Leitao To: Hillf Danton Cc: Tejun Heo , Lai Jiangshan , linux-kernel@vger.kernel.org, marco.crivellari@suse.com, frederic@kernel.org, bigeasy@linutronix.de Subject: Re: [PATCH 2/2] workqueue: defer wake_up_process() outside pool->lock on hot paths Message-ID: References: <20260526-fastwake-v1-0-e69ad86923e6@debian.org> <20260526212346.1100-1-hdanton@sina.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20260526212346.1100-1-hdanton@sina.com> On Wed, May 27, 2026 at 05:23:40AM +0000, Hillf Danton wrote: > On Tue, 26 May 2026 14:08:06 -0400 Breno Leitao wrote: > > @@ -2423,6 +2424,15 @@ static void __queue_work(int cpu, struct workqueue_struct *wq, > > > > out: > > raw_spin_unlock(&pool->lock); > > + /* > > + * Issue the wakeup after dropping pool->lock to shorten the > > + * locked region on this hot enqueue path. kick_pool_pick() did all > > + * of the work that required the lock (worker selection and > > + * wake_cpu setup); the wake_up_process() itself only needs to > > + * take the target rq->lock. > > + */ > > + if (wake_p) > > + wake_up_process(wake_p); > > rcu_read_unlock(); > > } > > > I suspect this works without the pool lock held, go and see workers culled > in idle_cull_fn() for example. Ack, you're right that wake_up_process() itself works without pool->lock held — idle_cull_fn() is a good precedent. But sashiko's point made me look again, and I think there's a latent bug that should be fixed before (or as part of) this patch. When kick_pool() picks a worker and wakes it, the worker is flipped to TASK_RUNNING by the scheduler, but from the workqueue's point of view it is still WORKER_IDLE and still on pool->idle_list. The transition off the idle list only happens later, when >the woken kthread actually schedules in, reacquires pool->lock, and runs worker_leave_idle(). That matters because idle_cull_fn() never looks at the task's scheduler state. It doesn't read task->__state or anything equivalent — its "is this worker idle?" check is purely "is it on pool->idle_list, and is last_active + IDLE_WORKER_TIMEOUT in the past?". So a worker that has been woken but hasn't yet run worker_leave_idle() is, as far as cull is concerned, a perfectly valid victim. The race is then: CPU A (kicker) CPU B (cull) ---------------- ---------------- raw_spin_lock(&pool->lock) pick worker W (idle_list head) wake_up_process(W->task) raw_spin_unlock(&pool->lock) raw_spin_lock(&pool->lock) too_many_workers() == true W is list_last_entry(idle_list) set_worker_dying(W, cull_list) raw_spin_unlock(&pool->lock) reap_dying_workers() kthread_stop_put(W->task) (W finally schedules in) worker_thread(): raw_spin_lock(&pool->lock) WORKER_DIE set -> return 0 W exits without ever running worker_leave_idle() or touching pool->worklist, and the work item we just enqueued is stranded until the next pool event kicks somebody else. This window is narrow in mainline today (bounded by scheduler latency between the kicker dropping pool->lock and the wakee reacquiring it, plus the 5 s IDLE_WORKER_TIMEOUT precondition), but my patch widens it considerably — process_one_work()'s deferred-wake site re-enables IRQs and is fully preemptible across the gap. I am still unclear how to fix it, but I am thinking about it. -- pw-bot: cr