From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB8673C2BA3
	for <linux-kernel@vger.kernel.org>; Wed, 27 May 2026 09:48:14 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.54
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1779875296; cv=none; b=W4YG7rykW3Z6p7WKJ9IMOBS3NjS7FonRyhkryRJZWPS66ah6xF1KQvjQGz4cAZD91NlkFKHbDBV4vlS1CyIfcTFJ6aUR5yJww/NEnVdjd9kpcDIKhUO2fOoeCE205I0xZf3hGAbMb413maI/JpxFNf8XkFo1k6YBF0EQSPEp+Hc=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1779875296; c=relaxed/simple;
	bh=VGOJuP1y7tz9fChFcIFSBL632XTot4skqnS/rI73Yto=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=C1MXa/96ENzzYYmrfB4yy4fQGrZMVENCr4zLJR8gKmzOBeBAF2GNasdbX4GDXO/nbp1Qq8ujzvst9GuoroyJVDU8oc6n2fX7reQENwN4KtBHA2WqZfUrowXBwSHhYGN2yv2lv2k+7S3gLYuiz03xD0O6B5nFQFiLpJuSPqOBzkg=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=gmail.com; arc=none smtp.client-ip=209.85.221.54
Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=none dis=none) header.from=debian.org
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-wr1-f54.google.com with SMTP id ffacd0b85a97d-44a14580111so8508596f8f.0
        for <linux-kernel@vger.kernel.org>; Wed, 27 May 2026 02:48:14 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20251104; t=1779875293; x=1780480093;
        h=in-reply-to:content-transfer-encoding:content-disposition
         :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=7uG/iEYIKZIujwR0ElhbUpqcs2oKT/JLAb1vOQ2gWg0=;
        b=NTD9VwHz4OhKs4tjA3tpEd2Xk51B1cvaS0IYb4v20RoikNC+o0CCn9o8BbteoVfQSL
         G9lKxgpmsBgj6z2eBUVYMNoRYEwWcZqd9bcRAIq3R47icU5EHsJOUNfkx+7+SHP9im/B
         qKXsXNSyXVBTZ1iedDGx2puwgkKAmutDJjsNVylFJNUUfauv8Yqc83MzuSlT3uJ9aAHp
         v5leQq0BjQairBBVx8oZR3vrt5qx6RtwzV0hEQQTIcUmePJWkT0mTeokhFUP198GtpEx
         dkLMED3kYVx8RpQbqO++6cBgSxyx8RWA5r872dch1AWVzdIHjz9eFuKnGhei2NdH5Bo9
         mCkQ==
X-Forwarded-Encrypted: i=1; AFNElJ+Hk4plVtt74ag/OA2x57cXXuHNIuZy7vr+StoXrajJAj3ua91AELtmDaITrsxLR1YON/aCBCF8ioIZQnE=@vger.kernel.org
X-Gm-Message-State: AOJu0YzAbBPaoc4s546wMrJaMDHG2ofI172wxtJKQjjvJHa5uiZgO58W
	DsXpnqA1RIOHV+jTsDxFMp+GR81N1kyyqgUMsjOHO6iyF9kOzyBpqGrH
X-Gm-Gg: Acq92OG/8OH06mnA0RToovo+6T/Gi2IH9rkbFHYcxn0nZ++fZJjHTxiw7fbnYs0wehz
	8RhGH2yz1Mnf73mM8TxVsX0k2o+TrJU4j0hV3vIJ27dQnWCaDU9l/+hzkaxnVRHxJmriQvJxhDE
	EOtNKCorSYDQOJLHVoA2BD7DAIy6k2HUz09F6LYts/IRusUyp4lle86lztEs3iSApgOYtg6XN1e
	ElWtnf/HjOCSp7SBQwtsiC3dZy87vmyh//x+6RKxhfxijVcVenG/QCnphjnmByLGNpEni+pt60r
	om3XsbrwvD1MBERaBjkSZappA8zhfUoX3r5zr09kpuNh3m5YItLQLf8NMXlWrUQoEZq5IIHX+AY
	hmFMJBSGt1xyYhw1IiA1EBV3y9bPzdoUeWCPYr1H/Yg3Sm+8UGU8he5QRZfrK9rUZ30SJXd1QrS
	35n7UAN1gljC0el3jrZW4VgUC/cdmEonl341e7
X-Received: by 2002:a05:6000:26cf:b0:45e:b215:12e9 with SMTP id ffacd0b85a97d-45eb368903emr36807000f8f.6.1779875293004;
        Wed, 27 May 2026 02:48:13 -0700 (PDT)
Received: from gmail.com ([62.197.47.167])
        by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-45edb5579b8sm4727017f8f.9.2026.05.27.02.48.12
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 27 May 2026 02:48:12 -0700 (PDT)
Date: Wed, 27 May 2026 10:48:12 +0100
From: Breno Leitao <leitao@debian.org>
To: Hillf Danton <hdanton@sina.com>
Cc: Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>, 
	linux-kernel@vger.kernel.org, marco.crivellari@suse.com, frederic@kernel.org, 
	bigeasy@linutronix.de
Subject: Re: [PATCH 2/2] workqueue: defer wake_up_process() outside
 pool->lock on hot paths
Message-ID: <aha6ENW-K-4F4AHi@gmail.com>
References: <20260526-fastwake-v1-0-e69ad86923e6@debian.org>
 <20260526212346.1100-1-hdanton@sina.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20260526212346.1100-1-hdanton@sina.com>

On Wed, May 27, 2026 at 05:23:40AM +0000, Hillf Danton wrote:
> On Tue, 26 May 2026 14:08:06 -0400 Breno Leitao wrote:
> > @@ -2423,6 +2424,15 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
> >  
> >  out:
> >  	raw_spin_unlock(&pool->lock);
> > +	/*
> > +	 * Issue the wakeup after dropping pool->lock to shorten the
> > +	 * locked region on this hot enqueue path.  kick_pool_pick() did all
> > +	 * of the work that required the lock (worker selection and
> > +	 * wake_cpu setup); the wake_up_process() itself only needs to
> > +	 * take the target rq->lock.
> > +	 */
> > +	if (wake_p)
> > +		wake_up_process(wake_p);
> >  	rcu_read_unlock();
> >  }
> >
> I suspect this works without the pool lock held, go and see workers culled
> in idle_cull_fn() for example.

Ack, you're right that wake_up_process() itself works without pool->lock
held — idle_cull_fn() is a good precedent. But sashiko's point made me
look again, and I think there's a latent bug that should be fixed before
(or as part of) this patch.

When kick_pool() picks a worker and wakes it, the worker is flipped to
TASK_RUNNING by the scheduler, but from the workqueue's point of view it
is still WORKER_IDLE and still on pool->idle_list. The transition off
the idle list only happens later, when >the woken kthread actually
schedules in, reacquires pool->lock, and runs worker_leave_idle().

That matters because idle_cull_fn() never looks at the task's scheduler
state. It doesn't read task->__state or anything equivalent — its "is
this worker idle?" check is purely "is it on pool->idle_list, and is
last_active + IDLE_WORKER_TIMEOUT in the past?". So a worker that has
been woken but hasn't yet run worker_leave_idle() is, as far as cull is
concerned, a perfectly valid victim.

   The race is then:
   CPU A (kicker)                      CPU B (cull)
   ----------------                    ----------------
   raw_spin_lock(&pool->lock)
   pick worker W (idle_list head)
   wake_up_process(W->task)
   raw_spin_unlock(&pool->lock)
                                       raw_spin_lock(&pool->lock)
                                       too_many_workers() == true
                                       W is list_last_entry(idle_list)
                                       set_worker_dying(W, cull_list)
                                       raw_spin_unlock(&pool->lock)
                                       reap_dying_workers()
                                       kthread_stop_put(W->task)
   (W finally schedules in)
   worker_thread():
     raw_spin_lock(&pool->lock)
     WORKER_DIE set -> return 0

W exits without ever running worker_leave_idle() or touching
pool->worklist, and the work item we just enqueued is stranded until the
next pool event kicks somebody else.

This window is narrow in mainline today (bounded by scheduler latency
between the kicker dropping pool->lock and the wakee reacquiring it,
plus the 5 s IDLE_WORKER_TIMEOUT precondition), but my patch widens it
considerably — process_one_work()'s deferred-wake site re-enables IRQs
and is fully preemptible across the gap.

I am still unclear how to fix it, but I am thinking about it.

--
pw-bot: cr