Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>,
	Satyanarayana K V P <satyanarayana.k.v.p@intel.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Sasha Levin <sashal@kernel.org>,
	lucas.demarchi@intel.com, thomas.hellstrom@linux.intel.com,
	rodrigo.vivi@intel.com, intel-xe@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.17] drm/xe/guc: Always add CT disable action during second init step
Date: Sat, 25 Oct 2025 11:58:22 -0400	[thread overview]
Message-ID: <20251025160905.3857885-271-sashal@kernel.org> (raw)
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

[ Upstream commit 955f3bc4af440bb950c7a1567197aaf6aa2213ae ]

On DGFX, during init_post_hwconfig() step, we are reinitializing
CTB BO in VRAM and we have to replace cleanup action to disable CT
communication prior to release of underlying BO.

But that introduces some discrepancy between DGFX and iGFX, as for
iGFX we keep previously added disable CT action that would be called
during unwind much later.

To keep the same flow on both types of platforms, always replace old
cleanup action and register new one.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250908102053.539-2-michal.wajdeczko@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What changed
  - The second-stage init `xe_guc_ct_init_post_hwconfig` now always
    rebinds the device-managed cleanup for disabling CT, not only on
    dGFX.
  - Before: early return on non-dGFX skipped cleanup rebind, leaving the
    original `guc_action_disable_ct` action in its earlier position in
    the devres stack.
    - Reference: previous flow in `drivers/gpu/drm/xe/xe_guc_ct.c` where
      a `return 0` occurred for non-dGFX after the `!IS_DGFX(xe)` check.
  - After: `IS_DGFX(xe)` gates only the VRAM reinit, but the function
    always removes/releases the existing disable action and re-adds it.
    - Reference (current code structure):
      `drivers/gpu/drm/xe/xe_guc_ct.c:294` definition; VRAM reinit in
      `drivers/gpu/drm/xe/xe_guc_ct.c:303` and rebind sequence at
      `drivers/gpu/drm/xe/xe_guc_ct.c:309` (remove) and
      `drivers/gpu/drm/xe/xe_guc_ct.c:310` (re-add).

- Why it matters (ordering bug/consistency)
  - `guc_action_disable_ct` is first registered in the initial init path
    so CT is disabled before CTB BO teardown during managed cleanup.
    - Reference: first registration in `xe_guc_ct_init` at
      `drivers/gpu/drm/xe/xe_guc_ct.c:281`.
  - In the dGFX path, `xe_managed_bo_reinit_in_vram` replaces the SMEM
    BO with a VRAM BO and executes the old BO’s managed action
    immediately.
    - Reference: `drivers/gpu/drm/xe/xe_bo.c:2679` uses
      `devm_release_action(...)` to release the old BO pin/map action
      during reinit.
  - Without re-registering `guc_action_disable_ct` after the new BO is
    created, the devres LIFO order can invert: the new BO’s cleanup runs
    before CT is disabled, risking CT activity referencing a BO that is
    being torn down. This was corrected for dGFX and is now made
    consistent for iGFX by always re-registering.
  - The function asserts CT is not enabled at this stage, so
    removing/re-adding (or releasing/re-adding) the disable action is
    safe and purely affects future cleanup ordering, not current state.
    - Reference: `drivers/gpu/drm/xe/xe_guc_ct.c:301` `xe_assert(xe,
      !xe_guc_ct_enabled(ct));`

- Impact scope and risk
  - Scope: one function in the Xe GuC CT path,
    `drivers/gpu/drm/xe/xe_guc_ct.c:294`.
  - Behavior change: only the devres cleanup ordering for
    `guc_action_disable_ct` relative to resources registered around the
    post-hwconfig phase. No API/ABI changes, no functional changes at
    runtime beyond safer teardown/unwind ordering.
  - Low regression risk: re-registering the same action is idempotent
    with respect to runtime, and improves consistency between dGFX and
    iGFX flows. If `devm_release_action` is used (as in the patch text),
    `guc_action_disable_ct` executes immediately; this is safe because
    CT is asserted disabled at this point and the action is a no-op
    state transition to DISABLED.
    - Reference: `guc_action_disable_ct` body at
      `drivers/gpu/drm/xe/xe_guc_ct.c:257` sets `ct->state =
      XE_GUC_CT_STATE_DISABLED`.

- Stable backport criteria
  - Fixes a real (if timing-dependent) bug class: mismatched cleanup
    ordering between platform variants that could allow CT communication
    to outlive its buffer during teardown/error-unwind.
  - Minimal, contained change in a driver subsystem; no architectural
    changes.
  - No new features; improves correctness and consistency.
  - Touches only DRM Xe GuC control transport; not a core subsystem.

- Additional context
  - This function is invoked during the post-hwconfig phase:
    `drivers/gpu/drm/xe/xe_guc.c:837`. Ensuring the disable action is
    re-registered here makes its cleanup ordering correct relative to
    the newly created VRAM BO (on dGFX) and consistent on iGFX as
    further managed resources are registered after this step.

Given the low risk, small scope, and correctness benefit (unified and
safe cleanup ordering), this is a good candidate for stable backporting.

 drivers/gpu/drm/xe/xe_guc_ct.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index ff622628d823f..22eff8476ad48 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -300,12 +300,11 @@ int xe_guc_ct_init_post_hwconfig(struct xe_guc_ct *ct)
 
 	xe_assert(xe, !xe_guc_ct_enabled(ct));
 
-	if (!IS_DGFX(xe))
-		return 0;
-
-	ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->bo);
-	if (ret)
-		return ret;
+	if (IS_DGFX(xe)) {
+		ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->bo);
+		if (ret)
+			return ret;
+	}
 
 	devm_release_action(xe->drm.dev, guc_action_disable_ct, ct);
 	return devm_add_action_or_reset(xe->drm.dev, guc_action_disable_ct, ct);
-- 
2.51.0


  parent reply	other threads:[~2025-10-25 16:21 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20251025160905.3857885-1-sashal@kernel.org>
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe/pcode: Initialize data0 for pcode read routine Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe: improve dma-resv handling for backup object Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe: Extend wa_13012615864 to additional Xe2 and Xe3 platforms Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] drm/xe/ptl: Apply Wa_16026007364 Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] drm/xe: Set GT as wedged before sending wedged uevent Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] drm/xe/i2c: Enable bus mastering Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] drm/xe/configfs: Enforce canonical device names Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17] drm/xe: Extend Wa_22021007897 to Xe3 platforms Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17] drm/xe: Cancel pending TLB inval workers on teardown Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Increase GuC crash dump buffer size Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe/wcl: Extend L3bank mask workaround Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Set upper limit of H2G retries over CTB Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe: Make page size consistent in loop Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe/guc: Add devm release action to safely tear down CT Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17] drm/xe/pf: Program LMTT directory pointer on all GTs within a tile Sasha Levin
2025-10-25 15:58 ` Sasha Levin [this message]
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17] drm/xe/pf: Don't resume device from restart worker Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Return an error code if the GuC load fails Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17] drm/xe: Ensure GT is in C0 during resumes Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17] drm/xe: rework PDE PAT index selection Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-6.12] drm/xe/guc: Add more GuC load error status codes Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-6.12] drm/xe: Fix oops in xe_gem_fault when running core_hotunplug test Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251025160905.3857885-271-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=michal.wajdeczko@intel.com \
    --cc=patches@lists.linux.dev \
    --cc=rodrigo.vivi@intel.com \
    --cc=satyanarayana.k.v.p@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox