From mboxrd@z Thu Jan 1 00:00:00 1970
From: Breno Leitao
Date: Thu, 07 May 2026 04:04:46 -0700
Subject: [PATCH] workqueue: release PENDING in __queue_work() drain/destroy reject path
Message-Id: <20260507-workqueue_pending-v1-1-3a53e2facf4e@debian.org>
X-Mailing-List: linux-kernel@vger.kernel.org
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel@vger.kernel.org, clm@meta.com, kernel-team@meta.com, Breno Leitao

The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner to
either hand the token to a pwq (insert_work() -> set_work_pwq()), hand
it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes "PENDING &&
off-queue" it busy-loops, trusting the current owner to make progress.

The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.

The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.

Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.

I was able to reproduce this by queueing several slow works on a
max_active=1 wq, arming a delayed_work whose timer fires while
drain_workqueue() is blocked, and then calling
cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.

Signed-off-by: Breno Leitao
---
not sure you want to have a Fixes tag, but, if you do, I would point it to

Fixes: e41e704bc4f4 ("workqueue: improve destroy_workqueue() debuggability")
---
 kernel/workqueue.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2506b5cfbb133..885be263b2825 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2296,6 +2296,18 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
 		     WARN_ONCE(!is_chained_work(wq), "workqueue: cannot queue %ps on wq %s\n",
 			       work->func, wq->name))) {
+		struct work_offq_data offqd;
+
+		/*
+		 * State on entry: PENDING is set, work is off-queue (no
+		 * insert_work() has run).
+		 *
+		 * Returning without clearing PENDING would leave the work
+		 * in a weird state (PENDING=1, PWQ=0, entry empty)
+		 */
+		work_offqd_unpack(&offqd, *work_data_bits(work));
+		set_work_pool_and_clear_pending(work, offqd.pool_id,
+						work_offqd_pack_flags(&offqd));
 		return;
 	}
 	rcu_read_lock();
---
base-commit: 735d2f48cadaa9a87e7c7601667878de70c771c5
change-id: 20260507-workqueue_pending-b91beb94ef46

Best regards,
-- 
Breno Leitao
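
For readers who want to see the PENDING ownership contract in isolation, here
is a minimal userspace sketch of it. It is not kernel code and not the
reproducer described above: every model_* name and MODEL_PENDING are
hypothetical stand-ins, the canceller's unbounded retry in
try_to_grab_pending() is replaced by a bounded loop so the leaked-token case
terminates, and the timer, pwq and pool_id handling are left out entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for WORK_STRUCT_PENDING in work->data. */
#define MODEL_PENDING 0x1UL

/* Hypothetical stand-in for struct work_struct; only the data word is modelled. */
struct model_work {
	atomic_ulong data;
};

/* Models test_and_set_bit(): returns true if this caller won the token. */
static bool model_claim_pending(struct model_work *w)
{
	return !(atomic_fetch_or(&w->data, MODEL_PENDING) & MODEL_PENDING);
}

/*
 * Models the drain/destroy reject path. With release_token == false it
 * bare-returns and leaves PENDING set; with true it clears PENDING, which
 * is the effect the patch obtains via set_work_pool_and_clear_pending().
 */
static void model_queue_rejected(struct model_work *w, bool release_token)
{
	if (release_token)
		atomic_fetch_and(&w->data, ~MODEL_PENDING);
}

/*
 * Bounded stand-in for the canceller's retry loop. Returns the number of
 * attempts needed to claim PENDING, or -1 if max_tries was exhausted.
 */
static int model_grab_pending(struct model_work *w, int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		if (model_claim_pending(w))
			return i;
		/* PENDING set and off-queue: the kernel would retry here. */
	}
	return -1;
}

/* One pass: claim the token, get rejected, then try to cancel. */
static int model_scenario(bool release_token)
{
	struct model_work w;

	atomic_init(&w.data, 0);
	model_claim_pending(&w);                 /* queueing side wins PENDING */
	model_queue_rejected(&w, release_token); /* rejected while draining */
	return model_grab_pending(&w, 1000);     /* canceller tries to grab */
}
```

In this model, model_scenario(false) exhausts its 1000 attempts and returns
-1, the bounded analogue of the livelock, while model_scenario(true) claims
the token on the first attempt and returns 1.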