Intel-XE Archive on lore.kernel.org
* [CI 1/2] drm/xe/pm: Temporarily disable D3Cold on BMG
@ 2025-03-08  0:56 Rodrigo Vivi
From: Rodrigo Vivi @ 2025-03-08  0:56 UTC
  To: intel-xe; +Cc: Rodrigo Vivi, Karthik Poosa, Lucas De Marchi

Currently, many instability cases related to the D3Cold -> D0 transition
on BMG are under investigation. Among them are some bad cases where
the device is lost after 1 to 3 transitions from D3Cold to D0
under runtime PM, with a link retrain failure on the pcieport upstream
bridge port.

In other cases, the transition itself works, but sudden random memory
corruption appears after D3Cold, which could show up as a 0xffff missed
ack on GT forcewake or as GuC reload related failures.

In some other cases, though, D3Cold -> D0 works pretty reliably.
At this point it looks like it depends on the combination of GPU card
and host board, so there is no quirk available at this time.

This patch disables D3Cold by default on BMG by reducing the
vram_d3cold_threshold to 0. Users and developers who want to enable
it can still do so via:
$ echo 300 > /sys/bus/pci/devices/<addr>/vram_d3cold_threshold

Fixes: 3adcf970dc7e ("drm/xe/bmg: Drop force_probe requirement")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4395
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4396
Cc: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
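Note (not part of the commit message): a minimal userspace sketch of why
a threshold of 0 keeps D3Cold off. It assumes the driver allows D3Cold
only while VRAM usage is strictly below the threshold; that check is not
part of this patch, so treat it as an assumption. The model_* names are
hypothetical, and the 300 value for DEFAULT_VRAM_THRESHOLD is borrowed
from the echo example above rather than from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value in MiB, borrowed from the "echo 300" example above. */
#define DEFAULT_VRAM_THRESHOLD 300u

/* Hypothetical stand-in for xe->info.platform. */
enum model_platform { MODEL_OTHER, MODEL_BATTLEMAGE };

/* Mirrors vram_threshold_value() from the diff below. */
uint32_t model_vram_threshold(enum model_platform p)
{
	if (p == MODEL_BATTLEMAGE)
		return 0;

	return DEFAULT_VRAM_THRESHOLD;
}

/*
 * Assumption: D3Cold is allowed only while VRAM usage is strictly
 * below the threshold, so a threshold of 0 can never be satisfied.
 */
bool model_d3cold_allowed(uint32_t vram_used_mb, uint32_t threshold_mb)
{
	return vram_used_mb < threshold_mb;
}

void model_self_check(void)
{
	/* BMG default: never allowed, even with no VRAM in use. */
	assert(!model_d3cold_allowed(0, model_vram_threshold(MODEL_BATTLEMAGE)));

	/* Other platforms: allowed below the default, refused at or above it. */
	assert(model_d3cold_allowed(100, model_vram_threshold(MODEL_OTHER)));
	assert(!model_d3cold_allowed(300, model_vram_threshold(MODEL_OTHER)));

	/* BMG after the user writes 300 to vram_d3cold_threshold. */
	assert(model_d3cold_allowed(100, 300));
}
```

Under that assumption, writing a non-zero value through sysfs restores
the previous behaviour on BMG without any further driver change.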
 drivers/gpu/drm/xe/xe_pm.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 12200be7b43d..7b6b754ad6eb 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -277,6 +277,15 @@ int xe_pm_init_early(struct xe_device *xe)
 }
 ALLOW_ERROR_INJECTION(xe_pm_init_early, ERRNO); /* See xe_pci_probe() */
 
+static u32 vram_threshold_value(struct xe_device *xe)
+{
+	/* FIXME: D3Cold temporarily disabled by default on BMG */
+	if (xe->info.platform == XE_BATTLEMAGE)
+		return 0;
+
+	return DEFAULT_VRAM_THRESHOLD;
+}
+
 /**
  * xe_pm_init - Initialize Xe Power Management
  * @xe: xe device instance
@@ -287,6 +296,7 @@ ALLOW_ERROR_INJECTION(xe_pm_init_early, ERRNO); /* See xe_pci_probe() */
  */
 int xe_pm_init(struct xe_device *xe)
 {
+	u32 vram_threshold;
 	int err;
 
 	/* For now suspend/resume is only allowed with GuC */
@@ -300,7 +310,8 @@ int xe_pm_init(struct xe_device *xe)
 		if (err)
 			return err;
 
-		err = xe_pm_set_vram_threshold(xe, DEFAULT_VRAM_THRESHOLD);
+		vram_threshold = vram_threshold_value(xe);
+		err = xe_pm_set_vram_threshold(xe, vram_threshold);
 		if (err)
 			return err;
 	}
-- 
2.48.1

