From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>,
Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
Matthew Brost <matthew.brost@intel.com>,
Sasha Levin <sashal@kernel.org>,
lucas.demarchi@intel.com, thomas.hellstrom@linux.intel.com,
rodrigo.vivi@intel.com, intel-xe@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.17] drm/xe/guc: Always add CT disable action during second init step
Date: Sat, 25 Oct 2025 11:58:22 -0400 [thread overview]
Message-ID: <20251025160905.3857885-271-sashal@kernel.org> (raw)
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
[ Upstream commit 955f3bc4af440bb950c7a1567197aaf6aa2213ae ]
On DGFX, during init_post_hwconfig() step, we are reinitializing
CTB BO in VRAM and we have to replace cleanup action to disable CT
communication prior to release of underlying BO.
But that introduces some discrepancy between DGFX and iGFX, as for
iGFX we keep previously added disable CT action that would be called
during unwind much later.
To keep the same flow on both types of platforms, always replace old
cleanup action and register new one.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908102053.539-2-michal.wajdeczko@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What changed
- The second-stage init `xe_guc_ct_init_post_hwconfig` now always
rebinds the device-managed cleanup for disabling CT, not only on
dGFX.
- Before: early return on non-dGFX skipped cleanup rebind, leaving the
original `guc_action_disable_ct` action in its earlier position in
the devres stack.
- Reference: previous flow in `drivers/gpu/drm/xe/xe_guc_ct.c` where
a `return 0` occurred for non-dGFX after the `!IS_DGFX(xe)` check.
- After: `IS_DGFX(xe)` gates only the VRAM reinit, but the function
always removes/releases the existing disable action and re-adds it.
- Reference (current code structure):
`drivers/gpu/drm/xe/xe_guc_ct.c:294` definition; VRAM reinit in
`drivers/gpu/drm/xe/xe_guc_ct.c:303` and rebind sequence at
`drivers/gpu/drm/xe/xe_guc_ct.c:309` (remove) and
`drivers/gpu/drm/xe/xe_guc_ct.c:310` (re-add).
- Why it matters (ordering bug/consistency)
- `guc_action_disable_ct` is first registered in the initial init path
so CT is disabled before CTB BO teardown during managed cleanup.
- Reference: first registration in `xe_guc_ct_init` at
`drivers/gpu/drm/xe/xe_guc_ct.c:281`.
- In the dGFX path, `xe_managed_bo_reinit_in_vram` replaces the SMEM
BO with a VRAM BO and executes the old BO’s managed action
immediately.
- Reference: `drivers/gpu/drm/xe/xe_bo.c:2679` uses
`devm_release_action(...)` to release the old BO pin/map action
during reinit.
- Without re-registering `guc_action_disable_ct` after the new BO is
created, the devres LIFO order can invert: the new BO’s cleanup runs
before CT is disabled, risking CT activity referencing a BO that is
being torn down. This was corrected for dGFX and is now made
consistent for iGFX by always re-registering.
- The function asserts CT is not enabled at this stage, so
removing/re-adding (or releasing/re-adding) the disable action is
safe and purely affects future cleanup ordering, not current state.
- Reference: `drivers/gpu/drm/xe/xe_guc_ct.c:301` `xe_assert(xe,
!xe_guc_ct_enabled(ct));`
- Impact scope and risk
- Scope: one function in the Xe GuC CT path,
`drivers/gpu/drm/xe/xe_guc_ct.c:294`.
- Behavior change: only the devres cleanup ordering for
`guc_action_disable_ct` relative to resources registered around the
post-hwconfig phase. No API/ABI changes, no functional changes at
runtime beyond safer teardown/unwind ordering.
- Low regression risk: re-registering the same action is idempotent
with respect to runtime, and improves consistency between dGFX and
iGFX flows. If `devm_release_action` is used (as in the patch text),
`guc_action_disable_ct` executes immediately; this is safe because
CT is asserted disabled at this point and the action is a no-op
state transition to DISABLED.
- Reference: `guc_action_disable_ct` body at
`drivers/gpu/drm/xe/xe_guc_ct.c:257` sets `ct->state =
XE_GUC_CT_STATE_DISABLED`.
- Stable backport criteria
- Fixes a real (if timing-dependent) bug class: mismatched cleanup
ordering between platform variants that could allow CT communication
to outlive its buffer during teardown/error-unwind.
- Minimal, contained change in a driver subsystem; no architectural
changes.
- No new features; improves correctness and consistency.
- Touches only DRM Xe GuC control transport; not a core subsystem.
- Additional context
- This function is invoked during the post-hwconfig phase:
`drivers/gpu/drm/xe/xe_guc.c:837`. Ensuring the disable action is
re-registered here makes its cleanup ordering correct relative to
the newly created VRAM BO (on dGFX) and consistent on iGFX as
further managed resources are registered after this step.
Given the low risk, small scope, and correctness benefit (unified and
safe cleanup ordering), this is a good candidate for stable backporting.
drivers/gpu/drm/xe/xe_guc_ct.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index ff622628d823f..22eff8476ad48 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -300,12 +300,11 @@ int xe_guc_ct_init_post_hwconfig(struct xe_guc_ct *ct)
xe_assert(xe, !xe_guc_ct_enabled(ct));
- if (!IS_DGFX(xe))
- return 0;
-
- ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->bo);
- if (ret)
- return ret;
+ if (IS_DGFX(xe)) {
+ ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->bo);
+ if (ret)
+ return ret;
+ }
devm_release_action(xe->drm.dev, guc_action_disable_ct, ct);
return devm_add_action_or_reset(xe->drm.dev, guc_action_disable_ct, ct);
--
2.51.0
next prev parent reply other threads:[~2025-10-25 16:21 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20251025160905.3857885-1-sashal@kernel.org>
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe/pcode: Initialize data0 for pcode read routine Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe: improve dma-resv handling for backup object Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe: Extend wa_13012615864 to additional Xe2 and Xe3 platforms Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe/ptl: Apply Wa_16026007364 Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] drm/xe: Set GT as wedged before sending wedged uevent Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] drm/xe/i2c: Enable bus mastering Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] drm/xe/configfs: Enforce canonical device names Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17] drm/xe: Extend Wa_22021007897 to Xe3 platforms Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17] drm/xe: Cancel pending TLB inval workers on teardown Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Increase GuC crash dump buffer size Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe/wcl: Extend L3bank mask workaround Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Set upper limit of H2G retries over CTB Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe: Make page size consistent in loop Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe/guc: Add devm release action to safely tear down CT Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe/pf: Program LMTT directory pointer on all GTs within a tile Sasha Levin
2025-10-25 15:58 ` Sasha Levin [this message]
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17] drm/xe/pf: Don't resume device from restart worker Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Return an error code if the GuC load fails Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17] drm/xe: Ensure GT is in C0 during resumes Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17] drm/xe: rework PDE PAT index selection Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Add more GuC load error status codes Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-6.12] drm/xe: Fix oops in xe_gem_fault when running core_hotunplug test Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251025160905.3857885-271-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=lucas.demarchi@intel.com \
--cc=matthew.brost@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=patches@lists.linux.dev \
--cc=rodrigo.vivi@intel.com \
--cc=satyanarayana.k.v.p@intel.com \
--cc=stable@vger.kernel.org \
--cc=thomas.hellstrom@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox