From: Matthew Brost <matthew.brost@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org, Tejun Heo <tj@kernel.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
linux-kernel@vger.kernel.org
Subject: [RFC PATCH 01/12] workqueue: Add interface to teach lockdep to warn on reclaim violations
Date: Sun, 15 Mar 2026 21:32:44 -0700
Message-ID: <20260316043255.226352-2-matthew.brost@intel.com>
In-Reply-To: <20260316043255.226352-1-matthew.brost@intel.com>

Drivers often use workqueues that run in reclaim paths (e.g., DRM
scheduler workqueues). It is useful to teach lockdep that memory
allocations which can recurse into reclaim (e.g., GFP_KERNEL) are not
allowed on these workqueues. Add an interface that taints a workqueue’s
lockdep state with reclaim.

Also add a helper to test whether a workqueue is reclaim annotated,
allowing drivers to enforce reclaim-safe behavior.

Example of lockdep splat upon violation below:

[ 60.953095] =============================================
[ 73.023656] Console: switching to colour dummy device 80x25
[ 73.023684] [IGT] xe_exec_reset: executing
[ 73.038237] [IGT] xe_exec_reset: starting subtest gt-reset
[ 73.044163] xe 0000:03:00.0: [drm] Tile0: GT0: trying reset from force_reset_write [xe]
[ 73.044276] xe 0000:03:00.0: [drm] Tile0: GT0: reset queued
[ 73.045963] ======================================================
[ 73.052133] WARNING: possible circular locking dependency detected
[ 73.058302] 7.0.0-rc3-xe+ #31 Tainted: G U
[ 73.063866] ------------------------------------------------------
[ 73.070036] kworker/u64:5/158 is trying to acquire lock:
[ 73.075342] ffffffff829a87a0 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_cache_noprof+0x39/0x420
[ 73.083791]
but task is already holding lock:
[ 73.089612] ffffc9000152fe60 ((work_completion)(&gt->reset.worker)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x6a0
[ 73.099852]
which lock already depends on the new lock.
[ 73.108013]
the existing dependency chain (in reverse order) is:
[ 73.115481]
-> #2 ((work_completion)(&gt->reset.worker)){+.+.}-{0:0}:
[ 73.123381] process_one_work+0x1ec/0x6a0
[ 73.127906] worker_thread+0x183/0x330
[ 73.132173] kthread+0xe2/0x120
[ 73.135833] ret_from_fork+0x289/0x2f0
[ 73.140101] ret_from_fork_asm+0x1a/0x30
[ 73.144540]
-> #1 ((wq_completion)gt-ordered-wq){+.+.}-{0:0}:
[ 73.151749] workqueue_warn_on_reclaim.part.0+0x32/0x50
[ 73.157487] alloc_workqueue_noprof+0xef/0x100
[ 73.162445] xe_gt_alloc+0x92/0x220 [xe]
[ 73.166954] xe_pci_probe+0x734/0x1660 [xe]
[ 73.171720] pci_device_probe+0x98/0x140
[ 73.176161] really_probe+0xcf/0x2c0
[ 73.180256] __driver_probe_device+0x6e/0x120
[ 73.185126] driver_probe_device+0x19/0x90
[ 73.189740] __driver_attach+0x89/0x140
[ 73.194091] bus_for_each_dev+0x79/0xd0
[ 73.198446] bus_add_driver+0xe6/0x210
[ 73.202712] driver_register+0x5b/0x110
[ 73.207064] 0xffffffffa00aa0db
[ 73.210724] do_one_initcall+0x59/0x2e0
[ 73.215077] do_init_module+0x5f/0x230
[ 73.219345] init_module_from_file+0xc7/0xe0
[ 73.224128] idempotent_init_module+0x176/0x270
[ 73.229175] __x64_sys_finit_module+0x61/0xb0
[ 73.234047] do_syscall_64+0x9b/0x540
[ 73.238228] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 73.243793]
-> #0 (fs_reclaim){+.+.}-{0:0}:
[ 73.249442] __lock_acquire+0x1496/0x2510
[ 73.253970] lock_acquire+0xbd/0x2f0
[ 73.258062] fs_reclaim_acquire+0x98/0xd0
[ 73.262586] __kmalloc_cache_noprof+0x39/0x420
[ 73.267545] gt_reset_worker+0x27/0x1f0 [xe]
[ 73.272385] process_one_work+0x213/0x6a0
[ 73.276910] worker_thread+0x183/0x330
[ 73.281178] kthread+0xe2/0x120
[ 73.284838] ret_from_fork+0x289/0x2f0
[ 73.289104] ret_from_fork_asm+0x1a/0x30
[ 73.293542]
other info that might help us debug this:
[ 73.301528] Chain exists of:
fs_reclaim --> (wq_completion)gt-ordered-wq --> (work_completion)(&gt->reset.worker)
[ 73.314795] Possible unsafe locking scenario:
[ 73.320705] CPU0 CPU1
[ 73.325232] ---- ----
[ 73.329759] lock((work_completion)(&gt->reset.worker));
[ 73.335148] lock((wq_completion)gt-ordered-wq);
[ 73.342359] lock((work_completion)(&gt->reset.worker));
[ 73.350259] lock(fs_reclaim);
[ 73.353400]
*** DEADLOCK ***
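
For readers unfamiliar with how the three entries in the chain above combine into a cycle, here is a minimal userspace sketch. It is not lockdep and not kernel code: struct lock_graph, add_dependency() and splat_chain_is_cyclic() are hypothetical names made up for illustration, and the lock class names are copied from the splat. It records "B was acquired while A was held" edges and refuses an edge that would close a loop, which is roughly the condition reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8	/* illustration only; this sketch tracks few classes */

/* dep[a][b] is true when class b was acquired while class a was held. */
struct lock_graph {
	const char *name[MAX_CLASSES];
	bool dep[MAX_CLASSES][MAX_CLASSES];
	int nr;
};

/* Look up a class by name, registering it on first use; -1 if full. */
static int lock_class(struct lock_graph *g, const char *name)
{
	for (int i = 0; i < g->nr; i++)
		if (strcmp(g->name[i], name) == 0)
			return i;
	if (g->nr == MAX_CLASSES)
		return -1;
	g->name[g->nr] = name;
	return g->nr++;
}

/* True if 'to' can be reached from 'from' along recorded dependencies. */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < g->nr; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "held -> acquired". Returns false if the edge would close a
 * cycle (or if the table is full, which this sketch treats the same).
 */
static bool add_dependency(struct lock_graph *g, const char *held,
			   const char *acquired)
{
	int h = lock_class(g, held);
	int a = lock_class(g, acquired);
	bool seen[MAX_CLASSES] = { false };

	if (h < 0 || a < 0 || reachable(g, a, h, seen))
		return false;
	g->dep[h][a] = true;
	return true;
}

/* Replay the chain from the splat: entries #1, #2, then #0. */
static bool splat_chain_is_cyclic(void)
{
	struct lock_graph g = { .nr = 0 };
	bool ok;

	/* #1: workqueue allocation takes the wq map under fs_reclaim. */
	ok = add_dependency(&g, "fs_reclaim", "(wq_completion)gt-ordered-wq");
	assert(ok);
	/* #2: process_one_work() takes the work map under the wq map. */
	ok = add_dependency(&g, "(wq_completion)gt-ordered-wq",
			    "(work_completion)(&gt->reset.worker)");
	assert(ok);
	/* #0: the work item allocates, entering fs_reclaim; loop closes. */
	return !add_dependency(&g, "(work_completion)(&gt->reset.worker)",
			       "fs_reclaim");
}
```

In this model the first two edges are accepted and the third is rejected, which corresponds to the allocation in gt_reset_worker() being the point where the warning fires.
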
v2:
- Add WQ flag to warn on reclaim violations (Tejun)
- Add a helper function to test if WQ is annotated
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
include/linux/workqueue.h | 3 +++
kernel/workqueue.c | 41 +++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a4749f56398f..5ad3b92ddd75 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -403,6 +403,7 @@ enum wq_flags {
*/
WQ_POWER_EFFICIENT = 1 << 7,
WQ_PERCPU = 1 << 8, /* bound to a specific cpu */
+ WQ_MEM_WARN_ON_RECLAIM = 1 << 9, /* teach lockdep to warn on reclaim */
__WQ_DESTROYING = 1 << 15, /* internal: workqueue is destroying */
__WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */
@@ -582,6 +583,8 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags, int max_active,
extern void destroy_workqueue(struct workqueue_struct *wq);
+extern bool workqueue_is_reclaim_annotated(struct workqueue_struct *wq);
+
struct workqueue_attrs *alloc_workqueue_attrs_noprof(void);
#define alloc_workqueue_attrs(...) alloc_hooks(alloc_workqueue_attrs_noprof(__VA_ARGS__))
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641..9c2c3a503e2c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5872,6 +5872,45 @@ static struct workqueue_struct *__alloc_workqueue(const char *fmt,
return NULL;
}
+#ifdef CONFIG_LOCKDEP
+static void workqueue_warn_on_reclaim(struct workqueue_struct *wq)
+{
+ if (wq->flags & WQ_MEM_WARN_ON_RECLAIM) {
+ fs_reclaim_acquire(GFP_KERNEL);
+ lock_map_acquire(wq->lockdep_map);
+ lock_map_release(wq->lockdep_map);
+ fs_reclaim_release(GFP_KERNEL);
+ }
+}
+#else
+static void workqueue_warn_on_reclaim(struct workqueue_struct *wq)
+{
+}
+#endif
+
+/**
+ * workqueue_is_reclaim_annotated() - Test whether a workqueue is annotated for
+ * reclaim safety
+ * @wq: workqueue to test
+ *
+ * Returns true if @wq's flags have both %WQ_MEM_WARN_ON_RECLAIM and
+ * %WQ_MEM_RECLAIM set. A workqueue marked with these flags indicates that it
+ * participates in reclaim paths, and therefore must not perform memory
+ * allocations that can recurse into reclaim (e.g., GFP_KERNEL is not allowed).
+ *
+ * Drivers can use this helper to enforce reclaim-safe behavior on workqueues
+ * that are created or provided elsewhere in the code.
+ *
+ * Return:
+ * true if the workqueue is reclaim-annotated, false otherwise.
+ */
+bool workqueue_is_reclaim_annotated(struct workqueue_struct *wq)
+{
+ return (wq->flags & WQ_MEM_WARN_ON_RECLAIM) &&
+ (wq->flags & WQ_MEM_RECLAIM);
+}
+EXPORT_SYMBOL_GPL(workqueue_is_reclaim_annotated);
+
__printf(1, 4)
struct workqueue_struct *alloc_workqueue_noprof(const char *fmt,
unsigned int flags,
@@ -5887,6 +5926,7 @@ struct workqueue_struct *alloc_workqueue_noprof(const char *fmt,
return NULL;
wq_init_lockdep(wq);
+ workqueue_warn_on_reclaim(wq);
return wq;
}
@@ -5908,6 +5948,7 @@ alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags,
return NULL;
wq->lockdep_map = lockdep_map;
+ workqueue_warn_on_reclaim(wq);
return wq;
}
--
2.34.1
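
As an illustration of how a driver might consume the new helper, below is a small userspace model of the flag test. This is only a sketch under stated assumptions: the structure is a one-field stand-in for the kernel's struct workqueue_struct, WQ_MEM_WARN_ON_RECLAIM uses the value from this patch, WQ_MEM_RECLAIM is assumed to be 1 << 3 as in include/linux/workqueue.h, and demo_queue_init() is a hypothetical driver-side function that does not exist in the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; only the bit values are meant to mirror the kernel. */
enum wq_flags {
	WQ_MEM_RECLAIM		= 1 << 3,	/* assumed existing kernel value */
	WQ_MEM_WARN_ON_RECLAIM	= 1 << 9,	/* value added by this patch */
};

struct workqueue_struct {
	unsigned int flags;
};

/* Same predicate as the helper in the patch: both flags must be set. */
static bool workqueue_is_reclaim_annotated(const struct workqueue_struct *wq)
{
	return (wq->flags & WQ_MEM_WARN_ON_RECLAIM) &&
	       (wq->flags & WQ_MEM_RECLAIM);
}

/*
 * Hypothetical driver entry point: refuse a caller-provided workqueue
 * unless it is annotated as running in the reclaim path.
 */
static int demo_queue_init(const struct workqueue_struct *submit_wq)
{
	assert(submit_wq != NULL);

	if (!workqueue_is_reclaim_annotated(submit_wq))
		return -EINVAL;
	return 0;
}
```

Note that setting only one of the two flags is not enough for the helper to return true, so a queue created with WQ_MEM_RECLAIM alone would be rejected by a check like this.
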