From: Sasha Levin
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Xin Wang, Rodrigo Vivi, Sasha Levin, lucas.demarchi@intel.com, thomas.hellstrom@linux.intel.com, intel-xe@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.17] drm/xe: Ensure GT is in C0 during resumes
Date: Sat, 25 Oct 2025 11:59:32 -0400
Message-ID: <20251025160905.3857885-341-sashal@kernel.org>
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>
References: <20251025160905.3857885-1-sashal@kernel.org>
X-stable: review
X-stable-base: Linux 6.17.5

From: Xin Wang

[ Upstream commit 95d0883ac8105717f59c2dcdc0d8b9150f13aa12 ]

This patch ensures the gt will be awake for the entire duration
of the resume sequences until GuCRC takes over and GT-C6 gets
re-enabled.

Before suspending GT-C6 is kept enabled, but upon resume, GuCRC
is not yet alive to properly control the exits and some cases of
instability and corruption related to GT-C6 can be observed.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
Suggested-by: Rodrigo Vivi
Signed-off-by: Xin Wang
Reviewed-by: Rodrigo Vivi
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
Link: https://lore.kernel.org/r/20250827000633.1369890-3-x.wang@intel.com
Signed-off-by: Rodrigo Vivi
Signed-off-by: Sasha Levin
---

LLM Generated explanations, may be completely bogus:

YES

- Fixes real user-visible bugs: The commit addresses resume-time
  instability and corruption linked to GT-C6 being enabled before GuC
  (power control) is up, as described in the message and tracked by the
  referenced issue.
  Keeping the GT awake through resume avoids those failures.
- Small, contained change: It only adds an early RC6 disable in the
  resume paths and swaps in the required header:
  - Replaces `#include "xe_guc.h"` with `#include "xe_gt_idle.h"` so the
    helper is visible (drivers/gpu/drm/xe/xe_pm.c:21).
  - System resume: disables GT-C6 for all GTs right at resume entry,
    before any other resume work (drivers/gpu/drm/xe/xe_pm.c:184-186).
  - Runtime resume: same early disable for all GTs
    (drivers/gpu/drm/xe/xe_pm.c:570-572).
  - No architectural changes, no interface changes, no behavior changes
    outside resume paths.
- Correct technical fix: The helper `xe_gt_idle_disable_c6()` safely
  forces GT out of RC6 before resume proceeds:
  - Implementation clears RC6 and RC state under forcewake and is a
    no-op on VFs (drivers/gpu/drm/xe/xe_gt_idle.c:389-407).
  - If forcewake is not available yet, it returns `-ETIMEDOUT`; the
    resume continues without regressing behavior (callers ignore the
    return value, which is acceptable to prevent blocking resume).
- Proper handoff to re-enable C-states: RC6 is re-enabled by GuC Power
  Conservation once firmware is up, or explicitly when GuC PC is
  skipped:
  - `xe_uc_load_hw()` starts GuC PC during GT bringup
    (drivers/gpu/drm/xe/xe_uc.c:215).
  - If GuC PC is skipped, RC6 is explicitly re-enabled via
    `xe_gt_idle_enable_c6(gt)` (drivers/gpu/drm/xe/xe_guc_pc.c:1257).
  - Thus the “keep GT awake only until GuC takes over” intent is
    fulfilled, avoiding prolonged power impact.
- Low regression risk:
  - Scope limited to early resume time; worst-case effect is slightly
    higher power during resume window.
  - No changes to suspend sequencing, only resume entry.
  - SR-IOV VFs unaffected (helper is no-op there).
  - Resume sequences already transition to GuC-controlled power states,
    so this change aligns with existing design.
- Stable backport suitability:
  - Bug fix with user impact (instability/corruption) and a minimal,
    targeted change.
  - No new features or ABI changes.
  - Touches the `drm/xe` driver only, not core subsystems.
  - If a target stable branch predates `xe_gt_idle_disable_c6()` or
    `xe_gt_idle.h`, the backport must include or adapt to the equivalent
    RC6 control helper; otherwise this applies cleanly.

Overall, this is a classic stable-worthy fix: minimal, isolated, and
prevents real-world resume failures without architectural churn.

 drivers/gpu/drm/xe/xe_pm.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 3e301e42b2f19..9fccc7a855f30 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -18,7 +18,7 @@
 #include "xe_device.h"
 #include "xe_ggtt.h"
 #include "xe_gt.h"
-#include "xe_guc.h"
+#include "xe_gt_idle.h"
 #include "xe_i2c.h"
 #include "xe_irq.h"
 #include "xe_pcode.h"
@@ -177,6 +177,9 @@ int xe_pm_resume(struct xe_device *xe)
 	drm_dbg(&xe->drm, "Resuming device\n");
 	trace_xe_pm_resume(xe, __builtin_return_address(0));
 
+	for_each_gt(gt, xe, id)
+		xe_gt_idle_disable_c6(gt);
+
 	for_each_tile(tile, xe, id)
 		xe_wa_apply_tile_workarounds(tile);
 
@@ -547,6 +550,9 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 
 	xe_rpm_lockmap_acquire(xe);
 
+	for_each_gt(gt, xe, id)
+		xe_gt_idle_disable_c6(gt);
+
 	if (xe->d3cold.allowed) {
 		err = xe_pcode_ready(xe, true);
 		if (err)
-- 
2.51.0
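
The resume ordering described in the explanation can be sketched as a
small userspace model in C. This is only an illustration under stated
assumptions, not driver code: every `model_*` name and struct below is
hypothetical, forcewake and VF handling are reduced to two booleans, and
the idea that GuC PC start re-enables C6 is taken from the explanation
rather than checked against the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real xe structures. */
struct model_gt {
	bool is_vf;        /* SR-IOV VF: C6 control is modelled as a no-op */
	bool forcewake_ok; /* whether forcewake can be taken at resume entry */
	bool c6_enabled;   /* modelled GT-C6 (RC6) state */
};

#define MODEL_MAX_GTS 2

struct model_device {
	struct model_gt gts[MODEL_MAX_GTS];
	size_t num_gts;
};

/* Models the described helper: no-op on VFs, -ETIMEDOUT without forcewake. */
static int model_gt_idle_disable_c6(struct model_gt *gt)
{
	assert(gt != NULL);

	if (gt->is_vf)
		return 0;
	if (!gt->forcewake_ok)
		return -ETIMEDOUT;

	gt->c6_enabled = false;
	return 0;
}

/* Assumption: once power control firmware is up, C6 is handed back. */
static void model_guc_pc_start(struct model_gt *gt)
{
	assert(gt != NULL);

	if (!gt->is_vf)
		gt->c6_enabled = true;
}

/* Resume entry: keep every GT awake first; failures do not block resume. */
static void model_resume_entry(struct model_device *dev)
{
	for (size_t i = 0; i < dev->num_gts; i++)
		(void)model_gt_idle_disable_c6(&dev->gts[i]);
}

/* Later bring-up: power control takes over and C6 returns. */
static void model_resume_finish(struct model_device *dev)
{
	for (size_t i = 0; i < dev->num_gts; i++)
		model_guc_pc_start(&dev->gts[i]);
}
```

Calling `model_resume_entry()` followed by `model_resume_finish()` on a
`struct model_device` mirrors the two phases: C6 is off for the window in
between on any GT where forcewake was available, and a GT where it was
not simply keeps its previous state instead of failing the resume.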