From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Stuart Summers <stuart.summers@intel.com>,
Matthew Brost <matthew.brost@intel.com>,
Sasha Levin <sashal@kernel.org>,
lucas.demarchi@intel.com, thomas.hellstrom@linux.intel.com,
rodrigo.vivi@intel.com, intel-xe@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.17] drm/xe: Cancel pending TLB inval workers on teardown
Date: Sat, 25 Oct 2025 11:56:18 -0400
Message-ID: <20251025160905.3857885-147-sashal@kernel.org>
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>
From: Stuart Summers <stuart.summers@intel.com>
[ Upstream commit 76186a253a4b9eb41c5a83224c14efdf30960a71 ]
Add a new _fini() routine on the GT TLB invalidation
side to handle this worker cleanup on driver teardown.
v2: Move the TLB teardown to the gt fini() routine called during
gt_init rather than in gt_alloc. This way the GT structure stays
alive while we reset the TLB state.
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826182911.392550-3-stuart.summers@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes
- Prevents use-after-free/hangs on driver teardown by cancelling
pending TLB-invalidation workers/fences before GT resources are
dismantled. The reset path already handles this during GT resets;
this commit ensures the same cleanup occurs on teardown.
- Key changes and why they matter
- drivers/gpu/drm/xe/xe_gt.c: `xe_gt_fini()` now calls
`xe_gt_tlb_invalidation_fini(gt)` first. This ensures TLB
invalidation workers/fences are cancelled while the GT is still
alive, avoiding races/UAF during teardown.
- drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c: Adds
`xe_gt_tlb_invalidation_fini(struct xe_gt *gt)` which simply calls
`xe_gt_tlb_invalidation_reset(gt)`. The reset routine:
- Computes a “pending” seqno and updates `seqno_recv` so waiters see
all prior invalidations as complete.
- Iterates `pending_fences` and signals them, waking any kworkers
waiting for TLB flush completion.
- This mirrors the existing reset behavior (cancel delayed work,
advance seqno, signal fences) used during GT resets to guarantee
no waiter is left behind.
- drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h: Adds the prototype for
the new fini, keeping the API consistent.
- Concrete evidence in the code changes
- The commit places `xe_gt_tlb_invalidation_fini(gt)` at the start of
GT teardown (xe_gt.c: in `xe_gt_fini()`), so TLB/worker cleanup runs
before `xe_hw_fence_irq_finish()` and
`xe_gt_disable_host_l2_vram()`. This ordering minimizes races with
IRQ/fence infrastructure and other GT resources during teardown.
- The new fini routine calls into the reset path, which explicitly:
- Sets `seqno_recv` to a value covering all outstanding requests.
- Signals all pending invalidation fences via
`list_for_each_entry_safe(... pending_fences ...)`, ensuring
waiters are released.
- This matches the comment in the reset path about kworkers not
tracked by explicit TLB fences and the need to wake them assuming
a full GT reset.
- Mapping to current tree (for context/impact assessment)
- In this tree, the corresponding logic lives under the “tlb_inval”
names:
- The reset path is implemented in
`drivers/gpu/drm/xe/xe_tlb_inval.c:156` (`xe_tlb_inval_reset()`),
which cancels the delayed timeout work, updates `seqno_recv`, and
signals all `pending_fences`.
- This path is already invoked during GT reset flows (e.g.,
`drivers/gpu/drm/xe/xe_gt.c:853, 1067, 1139`), proving the
approach is safe and battle-tested during runtime resets.
- A drmm-managed teardown hook exists
(`drivers/gpu/drm/xe/xe_tlb_inval.c:114`), but that operates at
DRM device teardown. If GT devm teardown runs earlier, there is a
window where TLB invalidation workers could outlive GT, risking
UAF. Moving the cleanup into `xe_gt_fini()` (devm action, see
`drivers/gpu/drm/xe/xe_gt.c:624`) closes that gap, which is
exactly what this commit does in its codebase.
- Stable backport criteria
- Important bugfix: avoids teardown-time UAF/hangs/leaks by cancelling
and signalling all pending TLB invalidation work.
- Small and contained: touches only the xe GT/TLB invalidation
teardown path; adds one call-site and a thin wrapper.
- No feature or architectural change: purely lifecycle/cleanup
ordering.
- Low regression risk: uses the same reset logic already exercised in
GT reset paths.
- Driver subsystem only (DRM xe), not core kernel.
- Conclusion
- This is a clear, low-risk correctness fix for teardown-time resource
and worker cleanup in the xe driver. It should be backported to
stable trees where the xe driver and TLB invalidation workers exist,
adapting symbol/file names as needed (e.g., calling
`xe_tlb_inval_reset(&gt->tlb_inval)` from `xe_gt_fini()` in trees
with the older naming).
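The reset-on-teardown behaviour described above can be sketched as a
small userspace model. This is only an illustration, not the driver
code: every `model_*` name, `MODEL_SEQNO_MAX`, the fixed-size pending
array and the `timeout_work_armed` flag are hypothetical stand-ins for
the driver's fence list, seqno wrap constant and delayed timeout work.

```c
#include <stdbool.h>

#define MODEL_MAX_PENDING 8
#define MODEL_SEQNO_MAX   0x100000	/* assumed wrap point, model only */

/* Hypothetical stand-in for a TLB invalidation fence. */
struct model_fence {
	int seqno;
	bool signaled;
};

/* Hypothetical stand-in for the per-GT TLB invalidation state. */
struct model_tlb_inval {
	int seqno;			/* next seqno to hand out, never 0 */
	int seqno_recv;			/* last seqno treated as completed */
	bool timeout_work_armed;	/* models the delayed timeout work */
	struct model_fence *pending[MODEL_MAX_PENDING];
	int n_pending;
};

/* Queue one invalidation: assign a seqno and track the fence. */
static int model_submit(struct model_tlb_inval *t, struct model_fence *f)
{
	if (t->n_pending == MODEL_MAX_PENDING)
		return -1;

	f->seqno = t->seqno;
	f->signaled = false;
	t->pending[t->n_pending++] = f;

	/* Advance the seqno, skipping 0 on wrap. */
	t->seqno = (t->seqno + 1) % MODEL_SEQNO_MAX;
	if (!t->seqno)
		t->seqno = 1;

	t->timeout_work_armed = true;
	return 0;
}

/* Reset: cancel the timeout, mark everything received, signal fences. */
static void model_reset(struct model_tlb_inval *t)
{
	int pending_seqno;
	int i;

	t->timeout_work_armed = false;

	/* The last seqno handed out is the one before the next seqno. */
	if (t->seqno == 1)
		pending_seqno = MODEL_SEQNO_MAX - 1;
	else
		pending_seqno = t->seqno - 1;
	t->seqno_recv = pending_seqno;

	for (i = 0; i < t->n_pending; i++)
		t->pending[i]->signaled = true;
	t->n_pending = 0;
}

/* Teardown reuses the reset path, as the new fini wrapper does. */
static void model_fini(struct model_tlb_inval *t)
{
	model_reset(t);
}
```

In this model, starting from `seqno = 1` and submitting two fences
before calling `model_fini()` leaves `seqno_recv` at 2, both fences
signaled, no pending entries and the timeout flag cleared, which is the
state a waiter needs to see before the GT goes away.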
drivers/gpu/drm/xe/xe_gt.c | 2 ++
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 12 ++++++++++++
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h | 1 +
3 files changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 17634195cdc26..6f63c658c341f 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -605,6 +605,8 @@ static void xe_gt_fini(void *arg)
struct xe_gt *gt = arg;
int i;
+ xe_gt_tlb_invalidation_fini(gt);
+
for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
xe_hw_fence_irq_finish(&gt->fence_irq[i]);
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 086c12ee3d9de..64cd6cf0ab8df 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -173,6 +173,18 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
mutex_unlock(&gt->uc.guc.ct.lock);
}
+/**
+ *
+ * xe_gt_tlb_invalidation_fini - Clean up GT TLB invalidation state
+ *
+ * Cancel pending fence workers and clean up any additional
+ * GT TLB invalidation state.
+ */
+void xe_gt_tlb_invalidation_fini(struct xe_gt *gt)
+{
+ xe_gt_tlb_invalidation_reset(gt);
+}
+
static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
{
int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
index f7f0f2eaf4b59..3e4cff3922d6f 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
@@ -16,6 +16,7 @@ struct xe_vm;
struct xe_vma;
int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
+void xe_gt_tlb_invalidation_fini(struct xe_gt *gt);
void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt);
--
2.51.0