* [PATCH v1] drm/xe: Improve wedged state management
@ 2026-06-17 12:03 Raag Jadav
  2026-06-17 12:16 ` ✓ CI.KUnit: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Raag Jadav @ 2026-06-17 12:03 UTC
  To: intel-xe
  Cc: matthew.brost, rodrigo.vivi, riana.tauro, michal.wajdeczko,
	matthew.d.roper, mallesh.koujalagi, Raag Jadav

Currently, wedged state serves a single use case, in which the device is
permanently declared wedged, and offers no way to manage wedged state
for runtime use cases. In preparation for use cases that require
temporary device wedging, convert wedged.flag to wedged.ref, a
driver-internal refcount for wedged state that blocks critical path
execution during the device lifetime while it is non-zero. While at it,
introduce wedged.perm, which marks permanent device wedging and operates
independently of the refcount, so that the relevant cleanup can be done
on the unwind path.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
Split from the FLR series [1].

[1] https://lore.kernel.org/intel-xe/20260603101814.916948-9-raag.jadav@intel.com/
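The refcount semantics described in the commit message can be illustrated with a minimal userspace sketch in C11. This is a model, not driver code. `struct wedged_model` and the `wedged_model_*` helpers are hypothetical stand-ins for the `wedged.ref` field and for the `xe_device_wedged_get()`, `xe_device_wedged_put()` and `xe_device_wedged()` helpers in the diff. `assert()` stands in for `xe_assert()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of wedged.ref; not the xe driver code. */
struct wedged_model {
	atomic_int ref; /* outstanding wedged holders (permanent or temporary) */
};

static void wedged_model_init(struct wedged_model *w)
{
	atomic_init(&w->ref, 0);
}

/* Models xe_device_wedged_get(): the new count must be positive. */
static void wedged_model_get(struct wedged_model *w)
{
	/* atomic_fetch_add() returns the old value, so add 1 for the new one. */
	int ref = atomic_fetch_add(&w->ref, 1) + 1;

	assert(ref > 0);
}

/* Models xe_device_wedged_put(): the count must never go negative. */
static void wedged_model_put(struct wedged_model *w)
{
	int ref = atomic_fetch_sub(&w->ref, 1) - 1;

	assert(ref >= 0);
}

/* Models xe_device_wedged(): critical paths are blocked while ref != 0. */
static bool wedged_model_is_wedged(struct wedged_model *w)
{
	return atomic_load(&w->ref) != 0;
}
```

Because the check is for a non-zero count, nested temporary holders keep the device blocked until the last one drops its reference. For example, two gets followed by one put still report the device as wedged.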
---
 drivers/gpu/drm/xe/xe_device.c       |  5 +++--
 drivers/gpu/drm/xe/xe_device.h       | 18 +++++++++++++++++-
 drivers/gpu/drm/xe/xe_device_types.h |  6 ++++--
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index ef730f2bdf32..00ade433a23b 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -916,7 +916,7 @@ static void xe_device_wedged_fini(struct drm_device *drm, void *arg)
 {
 	struct xe_device *xe = arg;
 
-	if (atomic_read(&xe->wedged.flag))
+	if (atomic_read(&xe->wedged.perm))
 		xe_pm_runtime_put(xe);
 }
 
@@ -1421,7 +1421,8 @@ void xe_device_declare_wedged(struct xe_device *xe)
 		return;
 	}
 
-	if (!atomic_xchg(&xe->wedged.flag, 1)) {
+	if (!atomic_xchg(&xe->wedged.perm, 1)) {
+		xe_device_wedged_get(xe);
 		xe->needs_flr_on_fini = true;
 		xe_pm_runtime_get_noresume(xe);
 		drm_err(&xe->drm,
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 975768a6a9c8..1aea83e3517c 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -192,9 +192,25 @@ bool xe_device_is_l2_flush_optimized(struct xe_device *xe);
 void xe_device_td_flush(struct xe_device *xe);
 void xe_device_l2_flush(struct xe_device *xe);
 
+static inline void xe_device_wedged_get(struct xe_device *xe)
+{
+	int ref;
+
+	ref = atomic_inc_return(&xe->wedged.ref);
+	xe_assert(xe, ref > 0);
+}
+
+static inline void xe_device_wedged_put(struct xe_device *xe)
+{
+	int ref;
+
+	ref = atomic_dec_return(&xe->wedged.ref);
+	xe_assert(xe, ref >= 0);
+}
+
 static inline bool xe_device_wedged(struct xe_device *xe)
 {
-	return atomic_read(&xe->wedged.flag);
+	return atomic_read(&xe->wedged.ref);
 }
 
 void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 32dd2ffbc796..f13e0fb2f18e 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -485,8 +485,10 @@ struct xe_device {
 
 	/** @wedged: Struct to control Wedged States and mode */
 	struct {
-		/** @wedged.flag: Xe device faced a critical error and is now blocked. */
-		atomic_t flag;
+		/** @wedged.perm: Permanently wedged, needs cleanup on fini */
+		atomic_t perm;
+		/** @wedged.ref: Refcount for wedged device, blocks critical path execution */
+		atomic_t ref;
 		/** @wedged.mode: Mode controlled by kernel parameter and debugfs */
 		enum xe_wedged_mode mode;
 		/** @wedged.method: Recovery method to be sent in the drm device wedged uevent */
-- 
2.43.0
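The two `xe_device.c` hunks can be modeled in a similar way. The following is a hypothetical C11 sketch, not the driver code:

- `pm_usage` stands in for the runtime PM reference taken by `xe_pm_runtime_get_noresume()` and released by `xe_pm_runtime_put()`.
- `dev_model_wedge_temporarily()` is an assumed shape for a future temporary-wedging user, such as the FLR series.

It shows why fini keys its cleanup on `perm` rather than on the refcount.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the xe_device.c hunks; not the xe driver code. */
struct dev_model {
	atomic_int perm; /* latched once on permanent wedging, never cleared */
	atomic_int ref;  /* wedged refcount checked by critical paths */
	int pm_usage;    /* stand-in for the runtime PM reference */
};

static void dev_model_init(struct dev_model *d)
{
	atomic_init(&d->perm, 0);
	atomic_init(&d->ref, 0);
	d->pm_usage = 0;
}

/* Permanent path of xe_device_declare_wedged(); idempotent via exchange. */
static void dev_model_declare_wedged(struct dev_model *d)
{
	if (!atomic_exchange(&d->perm, 1)) {
		atomic_fetch_add(&d->ref, 1); /* permanent reference, never put */
		d->pm_usage++;                /* released only in fini */
	}
}

/* Hypothetical temporary wedging, as a future runtime user might do it. */
static void dev_model_wedge_temporarily(struct dev_model *d)
{
	atomic_fetch_add(&d->ref, 1);
	/* ... recovery work while critical paths are blocked ... */
	atomic_fetch_sub(&d->ref, 1);
}

/* Models xe_device_wedged_fini(): cleanup keyed on perm, not on ref. */
static void dev_model_fini(struct dev_model *d)
{
	if (atomic_load(&d->perm))
		d->pm_usage--;
}
```

In this model:

- A temporary wedge leaves `pm_usage` balanced, and fini does nothing.
- A permanent wedge keeps `ref` at 1 for the rest of the device lifetime, and fini releases the single PM reference.

If fini tested `ref` instead, a temporary holder still in flight would release a PM reference that was never taken.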



Thread overview: 4+ messages
2026-06-17 12:03 [PATCH v1] drm/xe: Improve wedged state management Raag Jadav
2026-06-17 12:16 ` ✓ CI.KUnit: success for " Patchwork
2026-06-17 12:54 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-17 14:06 ` [PATCH v1] " Rodrigo Vivi
