From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Breno Leitao <leitao@debian.org>, Tejun Heo <tj@kernel.org>,
Sasha Levin <sashal@kernel.org>,
linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.18] workqueue: Release PENDING in __queue_work() drain/destroy reject path
Date: Wed, 20 May 2026 07:19:33 -0400
Message-ID: <20260520111944.3424570-61-sashal@kernel.org>
In-Reply-To: <20260520111944.3424570-1-sashal@kernel.org>
From: Breno Leitao <leitao@debian.org>
[ Upstream commit a7488f089bdfa87c4fef1744d4dca9f4f8b46f8b ]
The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner
to either hand the token to a pwq (insert_work() -> set_work_pwq()),
hand it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes
"PENDING && off-queue" it busy-loops, trusting the current owner to
make progress.
The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.
The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.
Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.
I was able to reproduce this by queueing several slow works on
a max_active=1 wq, arming a delayed_work whose timer fires while
drain_workqueue() is blocked, then calling cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Phase Walkthrough
### Phase 1: Commit Message Forensics
Record 1.1: Subsystem `workqueue`; action verb `Release`; intent: clear
`WORK_STRUCT_PENDING` when `__queue_work()` rejects work during
drain/destroy.
Record 1.2: Commit tags: `Signed-off-by: Breno Leitao
<leitao@debian.org>` and `Signed-off-by: Tejun Heo <tj@kernel.org>`. No
committed `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`,
`Acked-by:`, `Link:`, or `Cc: stable@vger.kernel.org`.
Record 1.3: The body describes a real state-machine bug:
`__queue_work()` may return from the `__WQ_DESTROYING | __WQ_DRAINING`
reject path with `PENDING` still set while the work is off-queue.
`try_to_grab_pending()` then sees `PENDING && off-queue` and
busy-retries. The verified failure mode is `cancel_delayed_work_sync()`
livelocking at 100% CPU after a delayed timer fires during
`drain_workqueue()`.
Record 1.4: This is not a hidden cleanup; it is an explicit correctness
fix for a leaked pending token causing livelock.
### Phase 2: Diff Analysis
Record 2.1: One file changed: `kernel/workqueue.c`, 12 insertions. One
function changed: `__queue_work()`. Scope: single-file surgical core
workqueue fix.
Record 2.2: Before: reject path warned and returned directly. After:
reject path unpacks off-queue work data and calls
`set_work_pool_and_clear_pending()` before returning.
Record 2.3: Bug category: workqueue state-machine/token leak with a race
between delayed-work timer execution and draining/destroying a
workqueue. Mechanism: queued delayed work owns `PENDING`; timer callback
enters `__queue_work()`; reject path drops the work but did not release
`PENDING`.
Record 2.4: Fix quality is high. It mirrors the existing
`clear_pending_if_disabled()` pattern in the same file and does not add
an API or change normal queueing behavior. Regression risk is low; the
change is limited to an already-rejecting path.
### Phase 3: Git History
Record 3.1: `git blame` shows the PENDING/off-queue busy-loop contract
comment came from `8930caba3dbd`; the drain reject mechanism is rooted
in `9c5a2ba70251` (`v3.1`), and the destroy-side diagnostic flag came
from `33e3f0a3358b` (`v6.3`). The bare reject `return` traces back to
`e41e704bc4f4` (`v2.6.36-rc4` era).
Record 3.2: No committed `Fixes:` tag. I inspected the patch-thread note
suggesting `e41e704bc4f4`; that commit added the “warn and ignore”
dying-workqueue behavior.
Record 3.3: Recent `kernel/workqueue.c` history shows nearby independent
workqueue fixes and diagnostics; this patch is standalone and does not
depend on a multi-patch series.
Record 3.4: Breno Leitao has multiple recent workqueue commits in
`origin/master`; Tejun Heo, the workqueue maintainer, committed this
patch.
Record 3.5: Dependencies: conceptually standalone. Exact patch applies
cleanly to `v6.18.32`, `v6.19.14`, `v7.0`, and `v7.0.9`; older trees
need small backport adjustment because helper APIs and warning text
differ.
### Phase 4: Mailing List / External Research
Record 4.1: `b4 dig -c a7488f089bdfa` found the original thread:
`https://patch.msgid.link/20260507-workqueue_pending-v1-1-3a53e2facf4e@debian.org`.
Series revisions: v1 only.
Record 4.2: Original recipients included Breno Leitao, Tejun Heo, Lai
Jiangshan, `linux-kernel@vger.kernel.org`, `clm@meta.com`, and
`kernel-team@meta.com`. Tejun replied that it was applied to
`wq/for-7.1-fixes`.
Record 4.3: No external bug report or syzbot link in this commit. The
bug report evidence is the author’s concrete reproducer in the
commit/thread.
Record 4.4: No related patch series found; b4 reports a single-patch v1.
Record 4.5: Web search found no stable-specific discussion for this
exact patch. Lore WebFetch was blocked by Anubis, but b4 successfully
fetched the mbox.
### Phase 5: Code Semantic Analysis
Record 5.1: Modified function: `__queue_work()`.
Record 5.2: Callers verified in `kernel/workqueue.c`: `queue_work_on()`,
`queue_work_node()`, `delayed_work_timer_fn()`, zero-delay
`__queue_delayed_work()`, `rcu_work_rcufn()`, and requeue paths. The
relevant caller is `delayed_work_timer_fn()`.
Record 5.3: Key callees: `is_chained_work()`, `work_offqd_unpack()`,
`set_work_pool_and_clear_pending()`, and, on non-reject paths, pool
selection plus `insert_work()`.
Record 5.4: Verified call chain: `queue_delayed_work_on()` sets
`PENDING`; `__queue_delayed_work()` arms the timer;
`delayed_work_timer_fn()` calls `__queue_work()`; `drain_workqueue()`
sets `__WQ_DRAINING`; `cancel_delayed_work_sync()` reaches
`try_to_grab_pending()` through `work_grab_pending()` and can spin on
`-EAGAIN`. Direct unprivileged trigger was not verified.
Record 5.5: Similar in-tree pattern verified:
`clear_pending_if_disabled()` already unpacks off-queue data and clears
pending in an analogous reject path.
### Phase 6: Stable Tree Analysis
Record 6.1: The buggy drain reject path exists in checked tags from
`v5.4`, `v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.18`, `v6.19`, and
`v7.0`. Drain behavior dates back to `v3.1`.
Record 6.2: Backport difficulty: clean for `v6.18+` and `v7.0`; minor
manual backport for `v6.12`; more adjustment for `v6.6` and older
because `work_offqd_unpack()`/flags helpers differ or are absent.
Record 6.3: I found no exact related fix already present by subject in
checked stable branches.
### Phase 7: Subsystem Context
Record 7.1: Subsystem is core workqueue infrastructure. Criticality:
CORE, because workqueues are used throughout drivers, filesystems,
networking, storage, GPU, and core kernel code.
Record 7.2: Workqueue is actively maintained, with recent commits by
Breno Leitao and Tejun Heo, but the affected state machine is mature and
present across long-lived stable lines.
### Phase 8: Impact / Risk
Record 8.1: Affected population: potentially any kernel user, but the
trigger is specific to delayed work on workqueues that are being drained
or destroyed.
Record 8.2: Trigger: delayed-work timer fires while `drain_workqueue()`
or destroy-time draining is active, followed by cancellation/grab of
that delayed work. Commonness is workload dependent; direct unprivileged
reachability was not verified.
Record 8.3: Failure mode: livelock at 100% CPU in
`cancel_delayed_work_sync()`. Severity: HIGH, potentially CRITICAL for
teardown/suspend/remove paths that must complete.
Record 8.4: Benefit high: prevents a real hang/livelock in core
infrastructure. Risk low: 12-line change, reject path only, mirrors
existing cleanup logic.
### Phase 9: Final Synthesis
Record 9.1: Evidence for backporting: real reproduced livelock, core
subsystem, broad stable-tree presence, small fix, maintainer-applied, no
new API. Evidence against: exact patch does not cleanly apply to older
stable trees and lacks independent `Tested-by`; older trees need careful
helper-specific backports. Unresolved: no independent bug report found,
no direct unprivileged trigger verified.
Record 9.2: Stable rules: obviously correct and self-tested: yes; fixes
real bug: yes; important issue: yes, livelock/hang; small and contained:
yes, one function and 12 lines; no new features/APIs: yes; applies to
stable: clean for newer stable, minor/manual for older.
Record 9.3: No exception category applies; this is a core bug fix, not a
device ID, quirk, DT, build, or documentation change.
Record 9.4: Decision: backport. The technical merit is strong: this
fixes a verified workqueue livelock with a small, localized,
maintainer-accepted change.
## Verification
- Phase 1: Parsed `git show --format=fuller` for `a7488f089bdfa`;
confirmed tags and commit body.
- Phase 2: Inspected the diff; confirmed only `kernel/workqueue.c`
changed with 12 insertions in `__queue_work()`.
- Phase 3: Used `git blame`, `git show`, `git describe --contains`, and
path-limited `git log`; confirmed historical origins and stable
version presence.
- Phase 4: Used `b4 dig -c`, `-a`, `-w`, and mbox fetch; confirmed v1
patch and Tejun’s apply reply. WebFetch to lore was blocked.
- Phase 5: Used `rg` and `ReadFile` to trace callers/callees and the
delayed-work/cancel path.
- Phase 6: Checked specific stable tags and ran patch apply checks in
temporary worktrees; confirmed clean apply for `v6.18+` and backport
needs for older trees.
- Phase 7: Verified subsystem path and recent workqueue history.
- Phase 8: Verified failure mode from commit body and matching code
path; unprivileged trigger remains unverified.
**YES**
kernel/workqueue.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c6ea96d5b7167..2c512b4a74482 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2281,6 +2281,18 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 	if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
 		     WARN_ONCE(!is_chained_work(wq), "workqueue: cannot queue %ps on wq %s\n",
 			       work->func, wq->name))) {
+		struct work_offq_data offqd;
+
+		/*
+		 * State on entry: PENDING is set, work is off-queue (no
+		 * insert_work() has run).
+		 *
+		 * Returning without clearing PENDING would leave the work
+		 * in a weird state (PENDING=1, PWQ=0, entry empty)
+		 */
+		work_offqd_unpack(&offqd, *work_data_bits(work));
+		set_work_pool_and_clear_pending(work, offqd.pool_id,
+						work_offqd_pack_flags(&offqd));
 		return;
 	}
 	rcu_read_lock();
--
2.53.0