public inbox for intel-xe@lists.freedesktop.org
* [PATCH v2 0/5] Introduce cold reset recovery method
@ 2026-03-18  6:40 Mallesh Koujalagi
  2026-03-18  6:40 ` [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Mallesh Koujalagi @ 2026-03-18  6:40 UTC
  To: intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, riana.tauro, karthik.poosa,
	sk.anirban, raag.jadav, Mallesh Koujalagi

This series builds on top of Introduce Xe Uncorrectable Error Handling[1]
and adds support for handling power management unit (PMU) errors
that require a complete device power cycle to recover.

Certain PMU error conditions leave the device in a persistent hardware
error state that cannot be cleared through existing recovery mechanisms
such as driver reload or PCIe reset. In these cases, functionality can
only be restored by performing a cold reset (complete power cycle).

To support this, the series introduces a new DRM wedging recovery
method, DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)). When a device is wedged
with this method, the DRM core notifies userspace via a uevent that a cold
reset is required. This allows userspace to take appropriate action to
power-cycle the device.

Example uevent received:
  SUBSYSTEM=drm
  WEDGED=cold-reset
  DEVPATH=/devices/.../drm/card0
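A userspace consumer could check the WEDGED property along the lines of
the sketch below. It is only an illustration and not part of this series:
the helper name wedged_has_method() is hypothetical, and it assumes the
property value is a comma-separated list of recovery methods, as the
existing DRM uAPI documentation describes for the wedged event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if a uevent property of the form
 * "WEDGED=<method>[,<method>...]" lists the given recovery method.
 * Hypothetical userspace helper, shown for illustration only.
 */
bool wedged_has_method(const char *prop, const char *method)
{
	static const char key[] = "WEDGED=";
	size_t mlen = strlen(method);
	const char *p;

	/* Ignore properties other than WEDGED */
	if (strncmp(prop, key, sizeof(key) - 1) != 0)
		return false;

	p = prop + sizeof(key) - 1;
	while (*p) {
		/* Length of the current comma-separated token */
		size_t len = strcspn(p, ",");

		/* Match whole tokens only, so "cold-reset-x" does not match */
		if (len == mlen && strncmp(p, method, mlen) == 0)
			return true;

		p += len;
		if (*p == ',')
			p++;
	}

	return false;
}
```

A monitor that reads uevent properties (for example through libudev) could
call wedged_has_method(prop, "cold-reset") and, on a match, hand the device
over to whatever platform mechanism performs the power cycle.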

The cold reset recovery path can be exercised through the debugfs
interface:

  echo 1 > /sys/kernel/debug/dri/N/trigger_punit_error

This triggers the PMU error handler, wedges the device using the cold
reset recovery method, and emits the corresponding uevent to userspace.
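A test harness could perform the same write from C, roughly as sketched
below. This is an assumption-laden illustration rather than code from the
series: write_trigger() is a hypothetical helper, the caller must supply
the full debugfs path (the card index N is not filled in here), and it
assumes the handler accepts a bare "1" without a trailing newline.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write "1" to an existing trigger file such as the debugfs entry above.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper, shown for illustration only.
 */
int write_trigger(const char *path)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;
	int err = 0;

	if (fd < 0)
		return -errno;

	n = write(fd, "1", 1);
	if (n != 1)
		err = n < 0 ? -errno : -EIO;

	/* Report a close failure only if the write itself succeeded */
	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

Returning a negative errno lets the harness distinguish a missing debugfs
entry (for example, debugfs not mounted) from a write rejected by the
driver.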

See the individual commit messages for detailed descriptions.

[1] https://patchwork.freedesktop.org/series/160482/
This patch series introduces a call to xe_punit_error_handler() from
within handle_soc_internal_errors() when PMU errors are detected.

v2:
- Add use case: Handling errors from power management unit,
  which requires a complete power cycle (cold reset)
  to recover. (Christian)
- Use "several" instead of a specific number to avoid future updates. (Jani)

Cc: André Almeida <andrealmeid@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Maxime Ripard <mripard@kernel.org>

Mallesh Koujalagi (4):
  drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error
  drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
  drm/xe: Add handler for power management unit errors which require
    cold-reset
  drm/xe/debugfs: Add interface to trigger power management unit error
    handler

Riana Tauro (1):
  Introduce Xe Uncorrectable Error Handling

 Documentation/gpu/drm-uapi.rst                |  73 +++-
 drivers/gpu/drm/drm_drv.c                     |   2 +
 drivers/gpu/drm/xe/Makefile                   |   4 +
 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h     |  36 ++
 drivers/gpu/drm/xe/xe_debugfs.c               |  38 ++
 drivers/gpu/drm/xe/xe_device.c                |  15 +
 drivers/gpu/drm/xe/xe_device.h                |  15 +
 drivers/gpu/drm/xe/xe_device_types.h          |  12 +
 drivers/gpu/drm/xe/xe_gt.c                    |  11 +-
 drivers/gpu/drm/xe/xe_guc_submit.c            |   9 +-
 drivers/gpu/drm/xe/xe_hw_error.c              |  27 ++
 drivers/gpu/drm/xe/xe_hw_error.h              |   1 +
 drivers/gpu/drm/xe/xe_pci.c                   |   5 +
 drivers/gpu/drm/xe/xe_pci_error.c             | 111 +++++
 drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
 drivers/gpu/drm/xe/xe_ras.c                   | 332 +++++++++++++++
 drivers/gpu/drm/xe/xe_ras.h                   |  16 +
 drivers/gpu/drm/xe/xe_ras_types.h             | 282 +++++++++++++
 drivers/gpu/drm/xe/xe_survivability_mode.c    |  12 +-
 drivers/gpu/drm/xe/xe_sysctrl.c               |  80 ++++
 drivers/gpu/drm/xe/xe_sysctrl.h               |  13 +
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c       | 390 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h       |  35 ++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  55 +++
 drivers/gpu/drm/xe/xe_sysctrl_types.h         |  33 ++
 include/drm/drm_device.h                      |   1 +
 26 files changed, 1600 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.h
 create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_types.h

-- 
2.34.1



* [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
@ 2026-03-18  6:40 ` Mallesh Koujalagi
  2026-03-18 19:35   ` kernel test robot
                     ` (2 more replies)
  2026-03-18  6:40 ` [PATCH v2 2/5] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error Mallesh Koujalagi
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 25+ messages in thread
From: Mallesh Koujalagi @ 2026-03-18  6:40 UTC
  To: intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, riana.tauro, karthik.poosa,
	sk.anirban, raag.jadav, Mallesh Koujalagi

From: Riana Tauro <riana.tauro@intel.com>

DO NOT REVIEW. COMPILATION ONLY
This patch is from https://patchwork.freedesktop.org/series/160482/
Added only for Compilation.

Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
 drivers/gpu/drm/xe/Makefile                   |   4 +
 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h     |  36 ++
 drivers/gpu/drm/xe/xe_device.c                |  15 +
 drivers/gpu/drm/xe/xe_device.h                |  15 +
 drivers/gpu/drm/xe/xe_device_types.h          |  12 +
 drivers/gpu/drm/xe/xe_gt.c                    |  11 +-
 drivers/gpu/drm/xe/xe_guc_submit.c            |   9 +-
 drivers/gpu/drm/xe/xe_pci.c                   |   5 +
 drivers/gpu/drm/xe/xe_pci_error.c             | 111 +++++
 drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
 drivers/gpu/drm/xe/xe_ras.c                   | 331 +++++++++++++++
 drivers/gpu/drm/xe/xe_ras.h                   |  16 +
 drivers/gpu/drm/xe/xe_ras_types.h             | 282 +++++++++++++
 drivers/gpu/drm/xe/xe_survivability_mode.c    |  12 +-
 drivers/gpu/drm/xe/xe_sysctrl.c               |  80 ++++
 drivers/gpu/drm/xe/xe_sysctrl.h               |  13 +
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c       | 390 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h       |  35 ++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  55 +++
 drivers/gpu/drm/xe/xe_sysctrl_types.h         |  33 ++
 20 files changed, 1458 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.h
 create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_types.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index dab979287a96..88ade7d3fc80 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -100,6 +100,7 @@ xe-y += xe_bb.o \
 	xe_page_reclaim.o \
 	xe_pat.o \
 	xe_pci.o \
+	xe_pci_error.o \
 	xe_pci_rebar.o \
 	xe_pcode.o \
 	xe_pm.o \
@@ -111,6 +112,7 @@ xe-y += xe_bb.o \
 	xe_pxp_debugfs.o \
 	xe_pxp_submit.o \
 	xe_query.o \
+	xe_ras.o \
 	xe_range_fence.o \
 	xe_reg_sr.o \
 	xe_reg_whitelist.o \
@@ -123,6 +125,8 @@ xe-y += xe_bb.o \
 	xe_step.o \
 	xe_survivability_mode.o \
 	xe_sync.o \
+	xe_sysctrl.o \
+	xe_sysctrl_mailbox.o \
 	xe_tile.o \
 	xe_tile_sysfs.o \
 	xe_tlb_inval.o \
diff --git a/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h b/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
new file mode 100644
index 000000000000..2e91febfa9a2
--- /dev/null
+++ b/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_REGS_H_
+#define _XE_SYSCTRL_REGS_H_
+
+#include "xe_regs.h"
+
+#define SYSCTRL_BASE_OFFSET			0xdb000
+#define SYSCTRL_BASE				(SOC_BASE + SYSCTRL_BASE_OFFSET)
+#define SYSCTRL_MAILBOX_INDEX			0x03
+#define SYSCTRL_BAR_LENGTH			0x1000
+
+#define SYSCTRL_MB_CTRL				XE_REG(0x10)
+#define   SYSCTRL_MB_CTRL_RUN_BUSY		REG_BIT(31)
+#define   SYSCTRL_MB_CTRL_IRQ			REG_BIT(30)
+#define   SYSCTRL_MB_CTRL_RUN_BUSY_OUT		REG_BIT(29)
+#define   SYSCTRL_MB_CTRL_PARAM3_MASK		REG_GENMASK(28, 24)
+#define   SYSCTRL_MB_CTRL_PARAM2_MASK		REG_GENMASK(23, 16)
+#define   SYSCTRL_MB_CTRL_PARAM1_MASK		REG_GENMASK(15, 8)
+#define   SYSCTRL_MB_CTRL_COMMAND_MASK		REG_GENMASK(7, 0)
+#define   SYSCTRL_MB_CTRL_MKHI_CMD		REG_FIELD_PREP(SYSCTRL_MB_CTRL_COMMAND_MASK, 5)
+
+#define SYSCTRL_MB_DATA0			XE_REG(0x14)
+#define SYSCTRL_MB_DATA1			XE_REG(0x18)
+#define SYSCTRL_MB_DATA2			XE_REG(0x1C)
+#define SYSCTRL_MB_DATA3			XE_REG(0x20)
+
+#define MKHI_FRAME_PHASE			REG_BIT(24)
+#define MKHI_FRAME_CURRENT_MASK			REG_GENMASK(21, 16)
+#define MKHI_FRAME_TOTAL_MASK			REG_GENMASK(13, 8)
+#define MKHI_FRAME_COMMAND_MASK			REG_GENMASK(7, 0)
+
+#endif /* _XE_SYSCTRL_REGS_H_ */
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index e77a3a3db73d..1c0f05b66b06 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -60,11 +60,13 @@
 #include "xe_psmi.h"
 #include "xe_pxp.h"
 #include "xe_query.h"
+#include "xe_ras.h"
 #include "xe_shrinker.h"
 #include "xe_soc_remapper.h"
 #include "xe_survivability_mode.h"
 #include "xe_sriov.h"
 #include "xe_svm.h"
+#include "xe_sysctrl.h"
 #include "xe_tile.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_ttm_sys_mgr.h"
@@ -440,6 +442,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 				   const struct pci_device_id *ent)
 {
 	struct xe_device *xe;
+	void *devres_id;
 	int err;
 
 	xe_display_driver_set_hooks(&driver);
@@ -448,10 +451,16 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (err)
 		return ERR_PTR(err);
 
+	devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
+	if (!devres_id)
+		return ERR_PTR(-ENOMEM);
+
 	xe = devm_drm_dev_alloc(&pdev->dev, &driver, struct xe_device, drm);
 	if (IS_ERR(xe))
 		return xe;
 
+	xe->devres_group_id = devres_id;
+
 	err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
 			      xe->drm.anon_inode->i_mapping,
 			      xe->drm.vma_offset_manager, 0);
@@ -985,6 +994,10 @@ int xe_device_probe(struct xe_device *xe)
 	if (err)
 		goto err_unregister_display;
 
+	err = xe_sysctrl_init(xe);
+	if (err)
+		goto err_unregister_display;
+
 	err = xe_device_sysfs_init(xe);
 	if (err)
 		goto err_unregister_display;
@@ -1004,6 +1017,8 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_vsec_init(xe);
 
+	xe_ras_init(xe);
+
 	err = xe_sriov_init_late(xe);
 	if (err)
 		goto err_unregister_display;
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index c4d267002661..ce9ec08572be 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
 	return container_of(ttm, struct xe_device, ttm);
 }
 
+static inline bool xe_device_is_in_recovery(struct xe_device *xe)
+{
+	return atomic_read(&xe->in_recovery);
+}
+
+static inline void xe_device_set_in_recovery(struct xe_device *xe)
+{
+	atomic_set(&xe->in_recovery, 1);
+}
+
+static inline void xe_device_clear_in_recovery(struct xe_device *xe)
+{
+	atomic_set(&xe->in_recovery, 0);
+}
+
 struct xe_device *xe_device_create(struct pci_dev *pdev,
 				   const struct pci_device_id *ent);
 int xe_device_probe_early(struct xe_device *xe);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 615218d775b1..f55f0d72191d 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -26,6 +26,7 @@
 #include "xe_sriov_vf_types.h"
 #include "xe_sriov_vf_ccs_types.h"
 #include "xe_step_types.h"
+#include "xe_sysctrl_types.h"
 #include "xe_survivability_mode_types.h"
 #include "xe_tile_types.h"
 #include "xe_validation.h"
@@ -196,6 +197,8 @@ struct xe_device {
 		u8 has_soc_remapper_telem:1;
 		/** @info.has_sriov: Supports SR-IOV */
 		u8 has_sriov:1;
+		/** @info.has_sysctrl: Supports System Controller */
+		u8 has_sysctrl:1;
 		/** @info.has_usm: Device has unified shared memory support */
 		u8 has_usm:1;
 		/** @info.has_64bit_timestamp: Device supports 64-bit timestamps */
@@ -464,6 +467,9 @@ struct xe_device {
 	/** @heci_gsc: graphics security controller */
 	struct xe_heci_gsc heci_gsc;
 
+	/** @sc: System Controller */
+	struct xe_sysctrl sc;
+
 	/** @nvm: discrete graphics non-volatile memory */
 	struct intel_dg_nvm_dev *nvm;
 
@@ -491,6 +497,12 @@ struct xe_device {
 		bool inconsistent_reset;
 	} wedged;
 
+	/** @in_recovery: Indicates if device is in recovery */
+	atomic_t in_recovery;
+
+	/** @devres_group_id: id for devres group */
+	void *devres_group_id;
+
 	/** @bo_device: Struct to control async free of BOs */
 	struct xe_bo_dev {
 		/** @bo_device.async_free: Free worker */
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index bae895fa066a..cf4ca3260a55 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -974,18 +974,23 @@ static void gt_reset_worker(struct work_struct *w)
 
 void xe_gt_reset_async(struct xe_gt *gt)
 {
-	xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
+	struct xe_device *xe = gt_to_xe(gt);
+
+	if (xe_device_is_in_recovery(xe))
+		return;
 
 	/* Don't do a reset while one is already in flight */
 	if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))
 		return;
 
+	xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
+
 	xe_gt_info(gt, "reset queued\n");
 
 	/* Pair with put in gt_reset_worker() if work is enqueued */
-	xe_pm_runtime_get_noresume(gt_to_xe(gt));
+	xe_pm_runtime_get_noresume(xe);
 	if (!queue_work(gt->ordered_wq, &gt->reset.worker))
-		xe_pm_runtime_put(gt_to_xe(gt));
+		xe_pm_runtime_put(xe);
 }
 
 void xe_gt_suspend_prepare(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index a145234f662b..f9f151861c88 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1537,7 +1537,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	 * If devcoredump not captured and GuC capture for the job is not ready
 	 * do manual capture first and decide later if we need to use it
 	 */
-	if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
+	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
 	    !xe_guc_capture_get_matching_and_lock(q)) {
 		/* take force wake before engine register manual capture */
 		CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
@@ -1559,8 +1559,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	set_exec_queue_banned(q);
 
 	/* Kick job / queue off hardware */
-	if (!wedged && (exec_queue_enabled(primary) ||
-			exec_queue_pending_disable(primary))) {
+	if (!xe_device_is_in_recovery(xe) && !wedged &&
+	    (exec_queue_enabled(primary) || exec_queue_pending_disable(primary))) {
 		int ret;
 
 		if (exec_queue_reset(primary))
@@ -1628,7 +1628,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 
 	trace_xe_sched_job_timedout(job);
 
-	if (!exec_queue_killed(q))
+	/* Do not access device if in recovery */
+	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q))
 		xe_devcoredump(q, job,
 			       "Timedout job - seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
 			       xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 189e2a1c29f9..2a73035b863d 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -465,6 +465,7 @@ static const struct xe_device_desc cri_desc = {
 	.has_soc_remapper_sysctrl = true,
 	.has_soc_remapper_telem = true,
 	.has_sriov = true,
+	.has_sysctrl = true,
 	.max_gt_per_tile = 2,
 	MULTI_LRC_MASK,
 	.require_force_probe = true,
@@ -764,6 +765,7 @@ static int xe_info_init_early(struct xe_device *xe,
 	xe->info.has_soc_remapper_telem = desc->has_soc_remapper_telem;
 	xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) &&
 		desc->has_sriov;
+	xe->info.has_sysctrl = desc->has_sysctrl;
 	xe->info.skip_guc_pc = desc->skip_guc_pc;
 	xe->info.skip_mtcfg = desc->skip_mtcfg;
 	xe->info.skip_pcode = desc->skip_pcode;
@@ -1315,6 +1317,8 @@ static const struct dev_pm_ops xe_pm_ops = {
 };
 #endif
 
+extern const struct pci_error_handlers xe_pci_error_handlers;
+
 static struct pci_driver xe_pci_driver = {
 	.name = DRIVER_NAME,
 	.id_table = pciidlist,
@@ -1322,6 +1326,7 @@ static struct pci_driver xe_pci_driver = {
 	.remove = xe_pci_remove,
 	.shutdown = xe_pci_shutdown,
 	.sriov_configure = xe_pci_sriov_configure,
+	.err_handler = &xe_pci_error_handlers,
 #ifdef CONFIG_PM_SLEEP
 	.driver.pm = &xe_pm_ops,
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
new file mode 100644
index 000000000000..35bc5b8cee99
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+#include <linux/pci.h>
+
+#include <drm/drm_drv.h>
+
+#include "xe_device.h"
+#include "xe_gt.h"
+#include "xe_pci.h"
+#include "xe_ras.h"
+#include "xe_uc.h"
+
+static void xe_pci_error_handling(struct pci_dev *pdev)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+	struct xe_gt *gt;
+	u8 id;
+
+	/* Wedge the device to prevent userspace access but don't send the event yet */
+	atomic_set(&xe->wedged.flag, 1);
+
+	for_each_gt(gt, xe, id)
+		xe_gt_declare_wedged(gt);
+
+	pci_disable_device(pdev);
+}
+
+static pci_ers_result_t ras_recovery_action_to_pci_result[] = {
+	[XE_RAS_RECOVERY_ACTION_RECOVERED] = PCI_ERS_RESULT_RECOVERED,
+	[XE_RAS_RECOVERY_ACTION_RESET] = PCI_ERS_RESULT_NEED_RESET,
+	[XE_RAS_RECOVERY_ACTION_DISCONNECT] = PCI_ERS_RESULT_DISCONNECT,
+};
+
+static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	dev_err(&pdev->dev, "Xe Pci error recovery: error detected state %d\n", state);
+
+	xe_device_set_in_recovery(xe);
+
+	switch (state) {
+	case pci_channel_io_normal:
+		return PCI_ERS_RESULT_CAN_RECOVER;
+	case pci_channel_io_frozen:
+		xe_pci_error_handling(pdev);
+		return PCI_ERS_RESULT_NEED_RESET;
+	case pci_channel_io_perm_failure:
+		return PCI_ERS_RESULT_DISCONNECT;
+	default:
+		dev_err(&pdev->dev, "Unknown state %d\n", state);
+		return PCI_ERS_RESULT_NEED_RESET;
+	}
+}
+
+static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+	enum xe_ras_recovery_action action;
+
+	dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
+	action = xe_ras_process_errors(xe);
+
+	return ras_recovery_action_to_pci_result[action];
+}
+
+static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
+{
+	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	dev_err(&pdev->dev, "Xe Pci error recovery: Slot reset\n");
+
+	pci_restore_state(pdev);
+
+	if (pci_enable_device(pdev)) {
+		dev_err(&pdev->dev,
+			"Cannot re-enable PCI device after reset\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	/*
+	 * Secondary Bus Reset wipes out all device memory
+	 * requiring XE KMD to perform a device removal and reprobe.
+	 */
+	pdev->driver->remove(pdev);
+	devres_release_group(&pdev->dev, xe->devres_group_id);
+
+	if (!pdev->driver->probe(pdev, ent))
+		return PCI_ERS_RESULT_RECOVERED;
+
+	return PCI_ERS_RESULT_DISCONNECT;
+}
+
+static void xe_pci_error_resume(struct pci_dev *pdev)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	dev_info(&pdev->dev, "Xe Pci error recovery: Recovered\n");
+
+	xe_device_clear_in_recovery(xe);
+}
+
+const struct pci_error_handlers xe_pci_error_handlers = {
+	.error_detected	= xe_pci_error_detected,
+	.mmio_enabled	= xe_pci_error_mmio_enabled,
+	.slot_reset	= xe_pci_error_slot_reset,
+	.resume		= xe_pci_error_resume,
+};
diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
index 8eee4fb1c57c..08386c5eca27 100644
--- a/drivers/gpu/drm/xe/xe_pci_types.h
+++ b/drivers/gpu/drm/xe/xe_pci_types.h
@@ -57,6 +57,7 @@ struct xe_device_desc {
 	u8 has_soc_remapper_sysctrl:1;
 	u8 has_soc_remapper_telem:1;
 	u8 has_sriov:1;
+	u8 has_sysctrl:1;
 	u8 needs_scratch:1;
 	u8 skip_guc_pc:1;
 	u8 skip_mtcfg:1;
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
new file mode 100644
index 000000000000..777321021391
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include "xe_device_types.h"
+#include "xe_printk.h"
+#include "xe_ras.h"
+#include "xe_ras_types.h"
+#include "xe_survivability_mode.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_mailbox_types.h"
+
+#define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
+#define GLOBAL_UNCORR_ERROR			2
+
+/* Severity classification of detected errors */
+enum xe_ras_severity {
+	XE_RAS_SEVERITY_NOT_SUPPORTED = 0,
+	XE_RAS_SEVERITY_CORRECTABLE,
+	XE_RAS_SEVERITY_UNCORRECTABLE,
+	XE_RAS_SEVERITY_INFORMATIONAL,
+	XE_RAS_SEVERITY_MAX
+};
+
+/* major IP blocks where errors can originate */
+enum xe_ras_component {
+	XE_RAS_COMPONENT_NOT_SUPPORTED = 0,
+	XE_RAS_COMPONENT_DEVICE_MEMORY,
+	XE_RAS_COMPONENT_CORE_COMPUTE,
+	XE_RAS_COMPONENT_RESERVED,
+	XE_RAS_COMPONENT_PCIE,
+	XE_RAS_COMPONENT_FABRIC,
+	XE_RAS_COMPONENT_SOC_INTERNAL,
+	XE_RAS_COMPONENT_MAX
+};
+
+static const char * const xe_ras_severities[] = {
+	[XE_RAS_SEVERITY_NOT_SUPPORTED]		= "Not Supported",
+	[XE_RAS_SEVERITY_CORRECTABLE]		= "Correctable",
+	[XE_RAS_SEVERITY_UNCORRECTABLE]		= "Uncorrectable",
+	[XE_RAS_SEVERITY_INFORMATIONAL]		= "Informational",
+};
+
+static_assert(ARRAY_SIZE(xe_ras_severities) == XE_RAS_SEVERITY_MAX);
+
+static const char * const xe_ras_components[] = {
+	[XE_RAS_COMPONENT_NOT_SUPPORTED]	= "Not Supported",
+	[XE_RAS_COMPONENT_DEVICE_MEMORY]	= "Device Memory",
+	[XE_RAS_COMPONENT_CORE_COMPUTE]		= "Core Compute",
+	[XE_RAS_COMPONENT_RESERVED]		= "Reserved",
+	[XE_RAS_COMPONENT_PCIE]			= "PCIe",
+	[XE_RAS_COMPONENT_FABRIC]		= "Fabric",
+	[XE_RAS_COMPONENT_SOC_INTERNAL]		= "SoC Internal",
+};
+
+static_assert(ARRAY_SIZE(xe_ras_components) == XE_RAS_COMPONENT_MAX);
+
+static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
+{
+	if (severity >= XE_RAS_SEVERITY_MAX)
+		return "Unknown Severity";
+
+	return xe_ras_severities[severity];
+}
+
+static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
+{
+	if (comp >= XE_RAS_COMPONENT_MAX)
+		return "Unknown Component";
+
+	return xe_ras_components[comp];
+}
+
+static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
+{
+	struct xe_ras_error_common common_info = error_class->common;
+	struct xe_ras_error_product product_info = error_class->product;
+	u8 tile = product_info.unit.tile;
+	u32 instance = product_info.unit.instance;
+	u32 cause = product_info.error_cause.cause;
+
+	xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x\n",
+	       tile, instance, severity_to_str(xe, common_info.severity),
+	       comp_to_str(xe, common_info.component), cause);
+}
+
+static enum xe_ras_recovery_action handle_compute_errors(struct xe_device *xe,
+							 struct xe_ras_error_array *arr)
+{
+	struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
+	u8 uncorr_type;
+
+	uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
+	log_ras_error(xe, &arr->error_class);
+
+	xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
+	       arr->timestamp, uncorr_type);
+
+	/* Request a RESET if error is global */
+	if (uncorr_type == GLOBAL_UNCORR_ERROR)
+		return XE_RAS_RECOVERY_ACTION_RESET;
+
+	/* Local errors are recovered using an engine reset */
+	return XE_RAS_RECOVERY_ACTION_RECOVERED;
+}
+
+static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
+							      struct xe_ras_error_array *arr)
+{
+	struct xe_ras_soc_error *error_info = (struct xe_ras_soc_error *)arr->error_details;
+	struct xe_ras_soc_error_source source = error_info->error_source;
+	struct xe_ras_error_common common_info = arr->error_class.common;
+	enum xe_ras_recovery_action action;
+
+	/* Default action */
+	action = XE_RAS_RECOVERY_ACTION_RESET;
+
+	log_ras_error(xe, &arr->error_class);
+
+	if (source.csc) {
+		struct xe_ras_csc_error *csc_error;
+
+		csc_error = (struct xe_ras_csc_error *)error_info->additional_details;
+
+		/*
+		 * CSC uncorrectable errors are classified as hardware errors and firmware errors.
+		 * CSC firmware errors are critical errors that can be recovered only by firmware
+		 * update via SPI driver. PCODE enables FDO mode and sets the bit in the capability
+		 * register. On receiving this error, the driver enables runtime survivability mode
+		 * which notifies userspace that a firmware update is required.
+		 */
+		if (csc_error->hec_uncorr_fw_err_dw0) {
+			xe_err(xe, "[RAS]: CSC %s error detected: 0x%x\n",
+			       severity_to_str(xe, common_info.severity),
+			       csc_error->hec_uncorr_fw_err_dw0);
+			xe_survivability_mode_runtime_enable(xe);
+			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
+		}
+	}
+
+	if (source.soc) {
+		struct xe_ras_ieh_error *ieh_error;
+
+		ieh_error = (struct xe_ras_ieh_error *)error_info->additional_details;
+
+		if (ieh_error->error_sources_ieh0.punit) {
+			xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
+			       severity_to_str(xe, common_info.severity),
+			       ieh_error->error_sources_ieh0.punit);
+			/* TODO: Add PUNIT error handling */
+			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
+		}
+	}
+
+	/* For other SOC internal errors, request a reset as recovery mechanism */
+	return action;
+}
+
+static void xe_ras_prepare_sysctrl_command(struct xe_sysctrl_mailbox_command *command,
+					   u32 cmd_mask, void *request, size_t request_len,
+					   void *response, size_t response_len)
+{
+	struct xe_sysctrl_mailbox_app_msg_hdr hdr = {0};
+	u32 req_hdr;
+
+	req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
+		  FIELD_PREP(APP_HDR_COMMAND_MASK, cmd_mask);
+
+	hdr.data = req_hdr;
+	command->header = hdr;
+	command->data_in = request;
+	command->data_in_len = request_len;
+	command->data_out = response;
+	command->data_out_len = response_len;
+}
+
+/**
+ * xe_ras_process_errors - Process and contain hardware errors
+ * @xe: xe device instance
+ *
+ * Get error details from system controller and return recovery
+ * method. Called only from PCI error handling.
+ *
+ * Returns: recovery action to be taken
+ */
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
+{
+	struct xe_sysctrl_mailbox_command command = {0};
+	struct xe_ras_get_error_response response;
+	enum xe_ras_recovery_action final_action;
+	size_t rlen;
+	int ret;
+
+	/* Default action */
+	final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
+
+	if (!xe->info.has_sysctrl)
+		return XE_RAS_RECOVERY_ACTION_RESET;
+
+	xe_ras_prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
+				       &response, sizeof(response));
+
+	do {
+		memset(&response, 0, sizeof(response));
+		rlen = 0;
+
+		ret = xe_sysctrl_send_command(xe, &command, &rlen);
+		if (ret || !rlen) {
+			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
+			goto err;
+		}
+
+		if (rlen != sizeof(response)) {
+			xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
+			goto err;
+		}
+
+		if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {
+			xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
+			       XE_RAS_NUM_ERROR_ARR);
+			goto err;
+		}
+
+		for (int i = 0; i < response.num_errors; i++) {
+			struct xe_ras_error_array arr = response.error_arr[i];
+			enum xe_ras_recovery_action action = XE_RAS_RECOVERY_ACTION_RECOVERED;
+			struct xe_ras_error_class error_class;
+			u8 component;
+
+			error_class = arr.error_class;
+			component = error_class.common.component;
+
+			switch (component) {
+			case XE_RAS_COMPONENT_CORE_COMPUTE:
+				action = handle_compute_errors(xe, &arr);
+				break;
+			case XE_RAS_COMPONENT_SOC_INTERNAL:
+				action = handle_soc_internal_errors(xe, &arr);
+				break;
+			default:
+				xe_err(xe, "[RAS]: Unknown error component %u\n", component);
+				break;
+			}
+
+			/*
+			 * Retain the highest severity action. Process and log all errors
+			 * and then take appropriate recovery action
+			 */
+			if (action > final_action)
+				final_action = action;
+		}
+
+	} while (response.additional_errors);
+
+	return final_action;
+
+err:
+	return XE_RAS_RECOVERY_ACTION_RESET;
+}
+
+#ifdef CONFIG_PCIEAER
+static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
+{
+	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+	struct pci_dev *vsp, *usp;
+	u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
+	u16 aer_cap;
+
+	/* Gfx Device Hierarchy: USP-->VSP-->SGunit */
+	vsp = pci_upstream_bridge(pdev);
+	if (!vsp)
+		return;
+
+	usp = pci_upstream_bridge(vsp);
+	if (!usp)
+		return;
+
+	aer_cap = usp->aer_cap;
+
+	if (!aer_cap)
+		return;
+
+	/*
+	 * Clear any stale Uncorrectable Internal Error Status event in Uncorrectable Error
+	 * Status Register.
+	 */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, &aer_uncorr_status);
+	if (aer_uncorr_status & PCI_ERR_UNC_INTN)
+		pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);
+
+	/*
+	 * All errors are steered to USP which is a PCIe AER Compliant device.
+	 * Downgrade all the errors to non-fatal to prevent PCIe bus driver
+	 * from triggering a Secondary Bus Reset (SBR). This allows error
+	 * detection, containment and recovery in the driver.
+	 *
+	 * The Uncorrectable Error Severity Register has the 'Uncorrectable
+	 * Internal Error Severity' set to fatal by default. Set this to
+	 * non-fatal and unmask the error.
+	 */
+
+	/* Initialize Uncorrectable Error Severity Register */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
+	aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
+
+	/* Initialize Uncorrectable Error Mask Register */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
+	aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
+
+	pci_save_state(usp);
+}
+#endif
+
+/**
+ * xe_ras_init - Initialize Xe RAS
+ * @xe: xe device instance
+ *
+ * Initialize Xe RAS
+ */
+void xe_ras_init(struct xe_device *xe)
+{
+	if (!xe->info.has_sysctrl)
+		return;
+
+#ifdef CONFIG_PCIEAER
+	aer_unmask_and_downgrade_internal_error(xe);
+#endif
+}
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
new file mode 100644
index 000000000000..e191ab80080c
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_RAS_H_
+#define _XE_RAS_H_
+
+#include "xe_ras_types.h"
+
+struct xe_device;
+
+void xe_ras_init(struct xe_device *xe);
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe);
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
new file mode 100644
index 000000000000..466db9f47127
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras_types.h
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_RAS_TYPES_H_
+#define _XE_RAS_TYPES_H_
+
+#include <linux/types.h>
+
+#define XE_RAS_NUM_ERROR_ARR		3
+#define XE_RAS_MAX_ERROR_DETAILS	16
+
+/**
+ * enum xe_ras_recovery_action - RAS recovery actions
+ *
+ * @XE_RAS_RECOVERY_ACTION_RECOVERED: Error recovered
+ * @XE_RAS_RECOVERY_ACTION_RESET: Requires reset
+ * @XE_RAS_RECOVERY_ACTION_DISCONNECT: Requires disconnect
+ *
+ * This enum defines the possible recovery actions that can be taken in response
+ * to RAS errors.
+ */
+enum xe_ras_recovery_action {
+	XE_RAS_RECOVERY_ACTION_RECOVERED = 0,
+	XE_RAS_RECOVERY_ACTION_RESET,
+	XE_RAS_RECOVERY_ACTION_DISCONNECT
+};
+
+/**
+ * struct xe_ras_error_common - Common RAS error class
+ *
+ * This structure contains error severity and component information
+ * across all products
+ */
+struct xe_ras_error_common {
+	/** @severity: Error Severity */
+	u8 severity;
+	/** @component: IP where the error originated */
+	u8 component;
+} __packed;
+
+/**
+ * struct xe_ras_error_unit - Error unit information
+ */
+struct xe_ras_error_unit {
+	/** @tile: Tile identifier */
+	u8 tile;
+	/** @instance: Instance identifier within a component */
+	u32 instance;
+} __packed;
+
+/**
+ * struct xe_ras_error_cause - Error cause information
+ */
+struct xe_ras_error_cause {
+	/** @cause: Cause */
+	u32 cause;
+	/** @reserved: For future use */
+	u8 reserved;
+} __packed;
+
+/**
+ * struct xe_ras_error_product - Error fields that are specific to the product
+ */
+struct xe_ras_error_product {
+	/** @unit: Unit within IP block */
+	struct xe_ras_error_unit unit;
+	/** @error_cause: Cause/checker */
+	struct xe_ras_error_cause error_cause;
+} __packed;
+
+/**
+ * struct xe_ras_error_class - Complete RAS Error Class
+ *
+ * This structure provides the complete error classification by combining
+ * the common error class with the product-specific error class.
+ */
+struct xe_ras_error_class {
+	/** @common: Common error severity and component */
+	struct xe_ras_error_common common;
+	/** @product: Product-specific unit and cause */
+	struct xe_ras_error_product product;
+} __packed;
+
+/**
+ * struct xe_ras_error_array - Details of the error types
+ */
+struct xe_ras_error_array {
+	/** @error_class: Error class */
+	struct xe_ras_error_class error_class;
+	/** @timestamp: Timestamp */
+	u64 timestamp;
+	/** @error_details: Error details specific to the class */
+	u32 error_details[XE_RAS_MAX_ERROR_DETAILS];
+} __packed;
+
+/**
+ * struct xe_ras_get_error_response - Response for XE_SYSCTRL_GET_SOC_ERROR
+ */
+struct xe_ras_get_error_response {
+	/** @num_errors: Number of errors reported in this response */
+	u8 num_errors;
+	/** @additional_errors: Indicates whether additional errors are pending */
+	u8 additional_errors;
+	/** @error_arr: Array of up to 3 errors */
+	struct xe_ras_error_array error_arr[XE_RAS_NUM_ERROR_ARR];
+} __packed;
+
+/**
+ * struct xe_ras_compute_error - Error details of Core Compute error
+ */
+struct xe_ras_compute_error {
+	/** @error_log_header: Error Source and type */
+	u32 error_log_header;
+	/** @internal_error_log: Internal Error log */
+	u32 internal_error_log;
+	/** @fabric_log: Fabric Error log */
+	u32 fabric_log;
+	/** @internal_error_addr_log0: Internal Error addr log */
+	u32 internal_error_addr_log0;
+	/** @internal_error_addr_log1: Internal Error addr log */
+	u32 internal_error_addr_log1;
+	/** @packet_log0: Packet log */
+	u32 packet_log0;
+	/** @packet_log1: Packet log */
+	u32 packet_log1;
+	/** @packet_log2: Packet log */
+	u32 packet_log2;
+	/** @packet_log3: Packet log */
+	u32 packet_log3;
+	/** @packet_log4: Packet log */
+	u32 packet_log4;
+	/** @misc_log0: Misc log */
+	u32 misc_log0;
+	/** @misc_log1: Misc log */
+	u32 misc_log1;
+	/** @spare_log0: Spare log */
+	u32 spare_log0;
+	/** @spare_log1: Spare log */
+	u32 spare_log1;
+	/** @spare_log2: Spare log */
+	u32 spare_log2;
+	/** @spare_log3: Spare log */
+	u32 spare_log3;
+} __packed;
+
+/**
+ * struct xe_ras_soc_error_source - Source of SOC error
+ */
+struct xe_ras_soc_error_source {
+	/** @csc: CSC error */
+	u32 csc:1;
+	/** @soc: SOC error */
+	u32 soc:1;
+	/** @reserved: Reserved for future use */
+	u32 reserved:30;
+} __packed;
+
+/**
+ * struct xe_ras_soc_error - SOC error details
+ */
+struct xe_ras_soc_error {
+	/** @error_source: Error Source */
+	struct xe_ras_soc_error_source error_source;
+	/** @additional_details: Additional details */
+	u32 additional_details[15];
+} __packed;
+
+/**
+ * struct xe_ras_csc_error - CSC error details
+ */
+struct xe_ras_csc_error {
+	/** @hec_uncorr_err_status: CSC error */
+	u32 hec_uncorr_err_status;
+	/** @hec_uncorr_fw_err_dw0: CSC f/w error */
+	u32 hec_uncorr_fw_err_dw0;
+} __packed;
+
+/**
+ * struct xe_ras_ieh_error - SoC IEH error details
+ */
+struct xe_ras_ieh_error {
+	/** @ieh_instance: IEH instance */
+	u32 ieh_instance:2;
+	/** @reserved: Reserved for future use */
+	u32 reserved:30;
+	union {
+		/** @global_error_status: Global error status */
+		u32 global_error_status;
+		/** @error_sources_ieh0: Error sources for IEH0 */
+		struct {
+			/** @psf0_psf1_npk: PSF0, PSF1, NPK */
+			u32 psf0_psf1_npk:1;
+			/** @punit: PUNIT */
+			u32 punit:1;
+			/** @reserved_2: Reserved */
+			u32 reserved_2:1;
+			/** @oobmsm: OOBMSM */
+			u32 oobmsm:1;
+			/** @i2c: I2C */
+			u32 i2c:1;
+			/** @pciess_gpma: PCIESS GPMA */
+			u32 pciess_gpma:1;
+			/** @lpioss_pma: LPIOSS PMA */
+			u32 lpioss_pma:1;
+			/** @fabss0_pma: FabSS0 PMA */
+			u32 fabss0_pma:1;
+			/** @fabss1_pma: FabSS1 PMA */
+			u32 fabss1_pma:1;
+			/** @reserved_9: Reserved */
+			u32 reserved_9:1;
+			/** @reserved_10: Reserved */
+			u32 reserved_10:1;
+			/** @reserved_11: Reserved */
+			u32 reserved_11:1;
+			/** @reserved_12: Reserved */
+			u32 reserved_12:1;
+			/** @reserved_13: Reserved */
+			u32 reserved_13:1;
+			/** @memss_ieh1: MEMSS -> IEH1 */
+			u32 memss_ieh1:1;
+			/** @memss_ieh2: MEMSS -> IEH2 */
+			u32 memss_ieh2:1;
+			/** @saf0_mhb0: SAF0 MHB0 */
+			u32 saf0_mhb0:1;
+			/** @saf0_mhb1: SAF0 MHB1 */
+			u32 saf0_mhb1:1;
+			/** @saf0_mhb2: SAF0 MHB2 */
+			u32 saf0_mhb2:1;
+			/** @saf0_mhb3: SAF0 MHB3 */
+			u32 saf0_mhb3:1;
+			/** @saf0_mhb4: SAF0 MHB4 */
+			u32 saf0_mhb4:1;
+			/** @saf0_mhb5: SAF0 MHB5 */
+			u32 saf0_mhb5:1;
+			/** @saf0_mhb6: SAF0 MHB6 */
+			u32 saf0_mhb6:1;
+			/** @saf0_mhb7: SAF0 MHB7 */
+			u32 saf0_mhb7:1;
+			/** @saf1_mhb0: SAF1 MHB0 */
+			u32 saf1_mhb0:1;
+			/** @saf1_mhb1: SAF1 MHB1 */
+			u32 saf1_mhb1:1;
+			/** @saf1_mhb2: SAF1 MHB2 */
+			u32 saf1_mhb2:1;
+			/** @saf1_mhb3: SAF1 MHB3 */
+			u32 saf1_mhb3:1;
+			/** @saf1_mhb4: SAF1 MHB4 */
+			u32 saf1_mhb4:1;
+			/** @saf1_mhb5: SAF1 MHB5 */
+			u32 saf1_mhb5:1;
+			/** @saf1_mhb6: SAF1 MHB6 */
+			u32 saf1_mhb6:1;
+			/** @saf1_mhb7: SAF1 MHB7 */
+			u32 saf1_mhb7:1;
+		} error_sources_ieh0;
+	};
+
+	/** @lerr_status_ieh0: Local error status of IEH0 */
+	struct {
+		/** @reserved_0: Reserved for future use */
+		u32 reserved_0:1;
+		/** @psf0: PSF0 */
+		u32 psf0:1;
+		/** @psf1: PSF1 */
+		u32 psf1:1;
+		/** @reserved_26: Reserved */
+		u32 reserved_26:26;
+		/** @npk: NPK */
+		u32 npk:1;
+		/** @reserved_30: Reserved */
+		u32 reserved_30:2;
+	} lerr_status_ieh0;
+
+	/** @gerr_mask: Global error mask */
+	u32 gerr_mask;
+	/** @additional_info: Additional information */
+	u32 additional_info[10];
+} __packed;
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
index db64cac39c94..70feb192fa2f 100644
--- a/drivers/gpu/drm/xe/xe_survivability_mode.c
+++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
@@ -98,6 +98,15 @@
  *	# cat /sys/bus/pci/devices/<device>/survivability_mode
  *	  Runtime
  *
+ * On some CSC firmware errors, PCODE sets FDO mode and the only recovery possible is through
+ * firmware flash using SPI driver. Userspace can check if FDO mode is set by checking the below
+ * sysfs entry.
+ *
+ * .. code-block:: shell
+ *
+ *	# cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
+ *	  enabled
+ *
  * When such errors occur, userspace is notified with the drm device wedged uevent and runtime
  * survivability mode. User can then initiate a firmware flash using userspace tools like fwupd
  * to restore device to normal operation.
@@ -296,7 +305,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
 	if (ret)
 		return ret;
 
-	if (check_boot_failure(xe)) {
+	/* Survivability info is not required if enabled via configfs */
+	if (!xe_configfs_get_survivability_mode(pdev)) {
 		ret = devm_device_add_group(dev, &survivability_info_group);
 		if (ret)
 			return ret;
diff --git a/drivers/gpu/drm/xe/xe_sysctrl.c b/drivers/gpu/drm/xe/xe_sysctrl.c
new file mode 100644
index 000000000000..430bccbdc3b9
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <drm/drm_managed.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+
+#include "regs/xe_sysctrl_regs.h"
+#include "xe_device.h"
+#include "xe_mmio.h"
+#include "xe_printk.h"
+#include "xe_soc_remapper.h"
+#include "xe_sysctrl.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_types.h"
+
+/**
+ * DOC: System Controller (sysctrl)
+ *
+ * The System Controller (sysctrl) is an embedded microcontroller in Intel GPUs
+ * responsible for managing various low-level platform functions. Communication
+ * between the driver and the System Controller occurs via a mailbox interface,
+ * enabling the exchange of commands and responses.
+ *
+ * This module provides initialization routines and helper functions to interact
+ * with the System Controller through the mailbox.
+ */
+
+static void xe_sysctrl_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+
+	xe->soc_remapper.set_sysctrl_region(xe, 0);
+}
+
+/**
+ * xe_sysctrl_init - Initialize System Controller subsystem
+ * @xe: xe device instance
+ *
+ * Entry point for System Controller initialization, called from xe_device_probe.
+ * This function checks platform support and initializes the system controller.
+ *
+ * Return: 0 on success, error code on failure
+ */
+int xe_sysctrl_init(struct xe_device *xe)
+{
+	struct xe_tile *tile = xe_device_get_root_tile(xe);
+	struct xe_sysctrl *sc = &xe->sc;
+	int ret;
+
+	if (!xe->info.has_sysctrl)
+		return 0;
+
+	if (!xe->soc_remapper.set_sysctrl_region)
+		return -ENODEV;
+
+	xe->soc_remapper.set_sysctrl_region(xe, SYSCTRL_MAILBOX_INDEX);
+
+	ret = devm_add_action_or_reset(xe->drm.dev, xe_sysctrl_fini, xe);
+	if (ret)
+		return ret;
+
+	sc->mmio = devm_kzalloc(xe->drm.dev, sizeof(*sc->mmio), GFP_KERNEL);
+	if (!sc->mmio)
+		return -ENOMEM;
+
+	xe_mmio_init(sc->mmio, tile, tile->mmio.regs, tile->mmio.regs_size);
+	sc->mmio->adj_offset = SYSCTRL_BASE;
+	sc->mmio->adj_limit = U32_MAX;
+
+	ret = drmm_mutex_init(&xe->drm, &sc->cmd_lock);
+	if (ret)
+		return ret;
+
+	xe_sysctrl_mailbox_init(sc);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/xe/xe_sysctrl.h b/drivers/gpu/drm/xe/xe_sysctrl.h
new file mode 100644
index 000000000000..ee7826fe4c98
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_H_
+#define _XE_SYSCTRL_H_
+
+struct xe_device;
+
+int xe_sysctrl_init(struct xe_device *xe);
+
+#endif /* _XE_SYSCTRL_H_ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
new file mode 100644
index 000000000000..15a186a6f057
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <linux/bitfield.h>
+#include <linux/cleanup.h>
+#include <linux/container_of.h>
+#include <linux/errno.h>
+#include <linux/minmax.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include "regs/xe_sysctrl_regs.h"
+#include "xe_device.h"
+#include "xe_device_types.h"
+#include "xe_mmio.h"
+#include "xe_pm.h"
+#include "xe_printk.h"
+#include "xe_sysctrl.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_mailbox_types.h"
+#include "xe_sysctrl_types.h"
+
+#define MKHI_HDR_GROUP_ID_MASK		GENMASK(7, 0)
+#define MKHI_HDR_COMMAND_MASK		GENMASK(14, 8)
+#define MKHI_HDR_IS_RESPONSE		BIT(15)
+#define MKHI_HDR_RESERVED_MASK		GENMASK(23, 16)
+#define MKHI_HDR_RESULT_MASK		GENMASK(31, 24)
+
+#define XE_SYSCTRL_MKHI_HDR_GROUP_ID(hdr) \
+	FIELD_GET(MKHI_HDR_GROUP_ID_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_COMMAND(hdr) \
+	FIELD_GET(MKHI_HDR_COMMAND_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_IS_RESPONSE(hdr) \
+	FIELD_GET(MKHI_HDR_IS_RESPONSE, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_RESULT(hdr) \
+	FIELD_GET(MKHI_HDR_RESULT_MASK, le32_to_cpu((hdr)->data))
+
+static struct xe_device *sc_to_xe(struct xe_sysctrl *sc)
+{
+	return container_of(sc, struct xe_device, sc);
+}
+
+static bool xe_sysctrl_mailbox_wait_bit_clear(struct xe_sysctrl *sc, u32 bit_mask,
+					      unsigned int timeout_ms)
+{
+	int ret;
+
+	ret = xe_mmio_wait32_not(sc->mmio, SYSCTRL_MB_CTRL, bit_mask, bit_mask,
+				 timeout_ms * 1000, NULL, false);
+
+	return ret == 0;
+}
+
+static bool xe_sysctrl_mailbox_wait_bit_set(struct xe_sysctrl *sc, u32 bit_mask,
+					    unsigned int timeout_ms)
+{
+	int ret;
+
+	ret = xe_mmio_wait32(sc->mmio, SYSCTRL_MB_CTRL, bit_mask, bit_mask,
+			     timeout_ms * 1000, NULL, false);
+
+	return ret == 0;
+}
+
+static int xe_sysctrl_mailbox_write_frame(struct xe_sysctrl *sc, const void *frame,
+					  size_t len)
+{
+	static const struct xe_reg regs[] = {
+		SYSCTRL_MB_DATA0, SYSCTRL_MB_DATA1, SYSCTRL_MB_DATA2, SYSCTRL_MB_DATA3
+	};
+	u32 val[SYSCTRL_MB_FRAME_SIZE / sizeof(u32)] = {0};
+	u32 dw = DIV_ROUND_UP(len, sizeof(u32));
+	u32 i;
+
+	memcpy(val, frame, len);
+
+	for (i = 0; i < dw; i++)
+		xe_mmio_write32(sc->mmio, regs[i], val[i]);
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_read_frame(struct xe_sysctrl *sc, void *frame,
+					 size_t len)
+{
+	static const struct xe_reg regs[] = {
+		SYSCTRL_MB_DATA0, SYSCTRL_MB_DATA1, SYSCTRL_MB_DATA2, SYSCTRL_MB_DATA3
+	};
+	u32 val[SYSCTRL_MB_FRAME_SIZE / sizeof(u32)] = {0};
+	u32 dw = DIV_ROUND_UP(len, sizeof(u32));
+	u32 i;
+
+	for (i = 0; i < dw; i++)
+		val[i] = xe_mmio_read32(sc->mmio, regs[i]);
+
+	memcpy(frame, val, len);
+
+	return 0;
+}
+
+static void xe_sysctrl_mailbox_clear_response(struct xe_sysctrl *sc)
+{
+	xe_mmio_rmw32(sc->mmio, SYSCTRL_MB_CTRL, SYSCTRL_MB_CTRL_RUN_BUSY_OUT, 0);
+}
+
+static int xe_sysctrl_mailbox_prepare_command(struct xe_device *xe,
+					      u8 group_id, u8 command,
+					      const void *data_in, size_t data_in_len,
+					      u8 **mbox_cmd, size_t *cmd_size)
+{
+	struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	size_t size;
+	u8 *buffer;
+
+	if (data_in_len > SYSCTRL_MB_MAX_MESSAGE_SIZE - sizeof(*mkhi_hdr)) {
+		xe_err(xe, "sysctrl: Input data too large: %zu bytes\n", data_in_len);
+		return -EINVAL;
+	}
+
+	size = sizeof(*mkhi_hdr) + data_in_len;
+
+	buffer = kmalloc(size, GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	mkhi_hdr = (struct xe_sysctrl_mailbox_mkhi_msg_hdr *)buffer;
+	mkhi_hdr->data = cpu_to_le32(FIELD_PREP(MKHI_HDR_GROUP_ID_MASK, group_id) |
+				     FIELD_PREP(MKHI_HDR_COMMAND_MASK, command & 0x7F) |
+				     FIELD_PREP(MKHI_HDR_IS_RESPONSE, 0) |
+				     FIELD_PREP(MKHI_HDR_RESERVED_MASK, 0) |
+				     FIELD_PREP(MKHI_HDR_RESULT_MASK, 0));
+
+	if (data_in && data_in_len)
+		memcpy(buffer + sizeof(*mkhi_hdr), data_in, data_in_len);
+
+	*mbox_cmd = buffer;
+	*cmd_size = size;
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_send_frames(struct xe_sysctrl *sc,
+					  const u8 *mbox_cmd,
+					  size_t cmd_size, unsigned int timeout_ms)
+{
+	struct xe_device *xe = sc_to_xe(sc);
+	u32 ctrl_reg, total_frames, frame;
+	size_t bytes_sent, frame_size;
+
+	total_frames = DIV_ROUND_UP(cmd_size, SYSCTRL_MB_FRAME_SIZE);
+
+	if (!xe_sysctrl_mailbox_wait_bit_clear(sc, SYSCTRL_MB_CTRL_RUN_BUSY, timeout_ms)) {
+		xe_err(xe, "sysctrl: Mailbox busy\n");
+		return -EBUSY;
+	}
+
+	sc->phase_bit ^= 1;
+	bytes_sent = 0;
+
+	for (frame = 0; frame < total_frames; frame++) {
+		frame_size = min_t(size_t, cmd_size - bytes_sent, SYSCTRL_MB_FRAME_SIZE);
+
+		if (xe_sysctrl_mailbox_write_frame(sc, mbox_cmd + bytes_sent, frame_size)) {
+			xe_err(xe, "sysctrl: Failed to write frame %u\n", frame);
+			sc->phase_bit = 0;
+			return -EIO;
+		}
+
+		ctrl_reg = SYSCTRL_MB_CTRL_RUN_BUSY |
+			   FIELD_PREP(MKHI_FRAME_CURRENT_MASK, frame) |
+			   FIELD_PREP(MKHI_FRAME_TOTAL_MASK, total_frames - 1) |
+			   SYSCTRL_MB_CTRL_MKHI_CMD |
+			   (sc->phase_bit ? MKHI_FRAME_PHASE : 0);
+
+		xe_mmio_write32(sc->mmio, SYSCTRL_MB_CTRL, ctrl_reg);
+
+		if (!xe_sysctrl_mailbox_wait_bit_clear(sc, SYSCTRL_MB_CTRL_RUN_BUSY, timeout_ms)) {
+			xe_err(xe, "sysctrl: Frame %u acknowledgment timeout\n", frame);
+			sc->phase_bit = 0;
+			return -ETIMEDOUT;
+		}
+
+		bytes_sent += frame_size;
+	}
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_process_frame(struct xe_sysctrl *sc, void *out,
+					    size_t frame_size, unsigned int timeout_ms,
+					    bool *done)
+{
+	u32 curr_frame, total_frames, ctrl_reg;
+	struct xe_device *xe = sc_to_xe(sc);
+	int ret;
+
+	if (!xe_sysctrl_mailbox_wait_bit_set(sc, SYSCTRL_MB_CTRL_RUN_BUSY_OUT, timeout_ms)) {
+		xe_err(xe, "sysctrl: Response frame timeout\n");
+		return -ETIMEDOUT;
+	}
+
+	ctrl_reg = xe_mmio_read32(sc->mmio, SYSCTRL_MB_CTRL);
+	total_frames = FIELD_GET(MKHI_FRAME_TOTAL_MASK, ctrl_reg);
+	curr_frame = FIELD_GET(MKHI_FRAME_CURRENT_MASK, ctrl_reg);
+
+	ret = xe_sysctrl_mailbox_read_frame(sc, out, frame_size);
+	if (ret)
+		return ret;
+
+	xe_sysctrl_mailbox_clear_response(sc);
+
+	if (curr_frame == total_frames)
+		*done = true;
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_receive_frames(struct xe_sysctrl *sc,
+					     const struct xe_sysctrl_mailbox_mkhi_msg_hdr *req,
+					     void *data_out, size_t data_out_len,
+					     size_t *rdata_len, unsigned int timeout_ms)
+{
+	struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	struct xe_device *xe = sc_to_xe(sc);
+	size_t frame_size, remain;
+	bool done = false;
+	u8 *out;
+	int ret = 0;
+
+	remain = sizeof(*mkhi_hdr) + data_out_len;
+	u8 *buffer __free(kfree) = kzalloc(remain, GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	out = buffer;
+	while (!done && remain) {
+		frame_size = min_t(size_t, remain, SYSCTRL_MB_FRAME_SIZE);
+
+		ret = xe_sysctrl_mailbox_process_frame(sc, out, frame_size, timeout_ms,
+						       &done);
+		if (ret)
+			return ret;
+
+		remain -= frame_size;
+		out += frame_size;
+	}
+
+	mkhi_hdr = (struct xe_sysctrl_mailbox_mkhi_msg_hdr *)buffer;
+
+	if (!XE_SYSCTRL_MKHI_HDR_IS_RESPONSE(mkhi_hdr) ||
+	    XE_SYSCTRL_MKHI_HDR_GROUP_ID(mkhi_hdr) != XE_SYSCTRL_MKHI_HDR_GROUP_ID(req) ||
+	    XE_SYSCTRL_MKHI_HDR_COMMAND(mkhi_hdr) != XE_SYSCTRL_MKHI_HDR_COMMAND(req)) {
+		xe_err(xe, "sysctrl: Response header mismatch\n");
+		return -EPROTO;
+	}
+
+	if (XE_SYSCTRL_MKHI_HDR_RESULT(mkhi_hdr) != 0) {
+		xe_err(xe, "sysctrl: Firmware error: 0x%02lx\n",
+		       XE_SYSCTRL_MKHI_HDR_RESULT(mkhi_hdr));
+		return -EIO;
+	}
+
+	memcpy(data_out, mkhi_hdr + 1, data_out_len);
+	*rdata_len = out - buffer - sizeof(*mkhi_hdr);
+
+	return ret;
+}
+
+static int xe_sysctrl_mailbox_send_command(struct xe_sysctrl *sc,
+					   const u8 *mbox_cmd, size_t cmd_size,
+					   void *data_out, size_t data_out_len,
+					   size_t *rdata_len, unsigned int timeout_ms)
+{
+	const struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	size_t received;
+	int ret;
+
+	ret = xe_sysctrl_mailbox_send_frames(sc, mbox_cmd, cmd_size, timeout_ms);
+	if (ret)
+		return ret;
+
+	if (!data_out || !rdata_len)
+		return 0;
+
+	mkhi_hdr = (const struct xe_sysctrl_mailbox_mkhi_msg_hdr *)mbox_cmd;
+
+	ret = xe_sysctrl_mailbox_receive_frames(sc, mkhi_hdr, data_out, data_out_len,
+						&received, timeout_ms);
+	if (ret)
+		return ret;
+
+	*rdata_len = received;
+
+	return 0;
+}
+
+/**
+ * xe_sysctrl_mailbox_init - Initialize System Controller mailbox interface
+ * @sc: System controller structure
+ *
+ * Initialize system controller mailbox interface for communication.
+ */
+void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc)
+{
+	u32 ctrl_reg;
+
+	ctrl_reg = xe_mmio_read32(sc->mmio, SYSCTRL_MB_CTRL);
+	sc->phase_bit = (ctrl_reg & MKHI_FRAME_PHASE) ? 1 : 0;
+}
+
+/**
+ * xe_sysctrl_send_command - Send command to System Controller via mailbox
+ * @xe: XE device instance
+ * @cmd: Pointer to xe_sysctrl_mailbox_command structure
+ * @rdata_len: Pointer to store actual response data size (can be NULL)
+ *
+ * Send a command to the System Controller using MKHI protocol. Handles
+ * command preparation, fragmentation, transmission, and response reception.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int xe_sysctrl_send_command(struct xe_device *xe,
+			    struct xe_sysctrl_mailbox_command *cmd,
+			    size_t *rdata_len)
+{
+	struct xe_sysctrl *sc;
+	u8 group_id, command_code;
+	u8 *mbox_cmd = NULL;
+	size_t cmd_size = 0;
+	int ret = 0;
+
+	if (!xe) {
+		pr_err("sysctrl: Invalid device handle\n");
+		return -EINVAL;
+	}
+
+	if (!xe->info.has_sysctrl)
+		return -ENODEV;
+
+	sc = &xe->sc;
+
+	if (!cmd) {
+		xe_err(xe, "sysctrl: Invalid command buffer\n");
+		return -EINVAL;
+	}
+
+	group_id = XE_SYSCTRL_APP_HDR_GROUP_ID(&cmd->header);
+	command_code = XE_SYSCTRL_APP_HDR_COMMAND(&cmd->header);
+
+	if (!cmd->data_in && cmd->data_in_len) {
+		xe_err(xe, "sysctrl: Invalid input parameters\n");
+		return -EINVAL;
+	}
+
+	if (!cmd->data_out && cmd->data_out_len) {
+		xe_err(xe, "sysctrl: Invalid output parameters\n");
+		return -EINVAL;
+	}
+
+	might_sleep();
+
+	ret = xe_sysctrl_mailbox_prepare_command(xe, group_id, command_code,
+						 cmd->data_in, cmd->data_in_len,
+						 &mbox_cmd, &cmd_size);
+	if (ret) {
+		xe_err(xe, "sysctrl: Failed to prepare command: %d\n", ret);
+		return ret;
+	}
+
+	guard(xe_pm_runtime)(xe);
+
+	guard(mutex)(&sc->cmd_lock);
+
+	ret = xe_sysctrl_mailbox_send_command(sc, mbox_cmd, cmd_size,
+					      cmd->data_out, cmd->data_out_len, rdata_len,
+					      SYSCTRL_MB_DEFAULT_TIMEOUT_MS);
+	if (ret)
+		xe_err(xe, "sysctrl: Mailbox command failed: %d\n", ret);
+
+	kfree(mbox_cmd);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
new file mode 100644
index 000000000000..2b64165c8e76
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __XE_SYSCTRL_MAILBOX_H__
+#define __XE_SYSCTRL_MAILBOX_H__
+
+#include <linux/bitfield.h>
+#include <linux/types.h>
+
+struct xe_sysctrl;
+struct xe_device;
+struct xe_sysctrl_mailbox_command;
+
+#define APP_HDR_GROUP_ID_MASK			GENMASK(7, 0)
+#define APP_HDR_COMMAND_MASK			GENMASK(15, 8)
+#define APP_HDR_VERSION_MASK			GENMASK(23, 16)
+#define APP_HDR_RESERVED_MASK			GENMASK(31, 24)
+
+#define XE_SYSCTRL_APP_HDR_GROUP_ID(hdr) \
+	FIELD_GET(APP_HDR_GROUP_ID_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_APP_HDR_COMMAND(hdr) \
+	FIELD_GET(APP_HDR_COMMAND_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_APP_HDR_VERSION(hdr) \
+	FIELD_GET(APP_HDR_VERSION_MASK, le32_to_cpu((hdr)->data))
+
+void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc);
+int xe_sysctrl_send_command(struct xe_device *xe,
+			    struct xe_sysctrl_mailbox_command *cmd,
+			    size_t *rdata_len);
+
+#endif /* __XE_SYSCTRL_MAILBOX_H__ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
new file mode 100644
index 000000000000..14e2d7989fcc
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __XE_SYSCTRL_MAILBOX_TYPES_H__
+#define __XE_SYSCTRL_MAILBOX_TYPES_H__
+
+#include <linux/types.h>
+
+/**
+ * enum xe_sysctrl_mailbox_command_id - RAS Command IDs for GFSP group
+ *
+ * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Get basic error information
+ */
+enum xe_sysctrl_mailbox_command_id {
+	XE_SYSCTRL_CMD_GET_SOC_ERROR = 1
+};
+
+enum xe_sysctrl_group {
+	XE_SYSCTRL_GROUP_GFSP = 1
+};
+
+struct xe_sysctrl_mailbox_mkhi_msg_hdr {
+	__le32 data;
+} __packed;
+
+struct xe_sysctrl_mailbox_app_msg_hdr {
+	__le32 data;
+} __packed;
+
+struct xe_sysctrl_mailbox_command {
+	/** @header: Application message header containing command information */
+	struct xe_sysctrl_mailbox_app_msg_hdr header;
+
+	/** @data_in: Pointer to input payload data (can be NULL if no input data) */
+	void *data_in;
+
+	/** @data_in_len: Size of input payload in bytes (0 if no input data) */
+	size_t data_in_len;
+
+	/** @data_out: Pointer to output buffer for response data (can be NULL if no response) */
+	void *data_out;
+
+	/** @data_out_len: Size of output buffer in bytes (0 if no response expected) */
+	size_t data_out_len;
+};
+
+#define SYSCTRL_MB_FRAME_SIZE			16
+#define SYSCTRL_MB_MAX_FRAMES			64
+#define SYSCTRL_MB_MAX_MESSAGE_SIZE		(SYSCTRL_MB_FRAME_SIZE * SYSCTRL_MB_MAX_FRAMES)
+
+#define SYSCTRL_MB_DEFAULT_TIMEOUT_MS		500
+
+#endif /* __XE_SYSCTRL_MAILBOX_TYPES_H__ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_types.h b/drivers/gpu/drm/xe/xe_sysctrl_types.h
new file mode 100644
index 000000000000..d4a362564925
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_types.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_TYPES_H_
+#define _XE_SYSCTRL_TYPES_H_
+
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct xe_mmio;
+
+/**
+ * struct xe_sysctrl - System Controller driver context
+ */
+struct xe_sysctrl {
+	/** @mmio: MMIO region for system control registers */
+	struct xe_mmio *mmio;
+
+	/** @cmd_lock: Mutex protecting mailbox command operations */
+	struct mutex cmd_lock;
+
+	/**
+	 * @phase_bit: MKHI message boundary phase toggle bit
+	 *
+	 * Phase bit alternates between 0 and 1 for consecutive
+	 * messages to help distinguish message boundaries.
+	 */
+	bool phase_bit;
+};
+
+#endif /* _XE_SYSCTRL_TYPES_H_ */
-- 
2.34.1

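For readers following the mailbox framing in xe_sysctrl_mailbox.c, here is a
minimal userspace sketch of the MKHI header layout and the frame count
calculation. The bit positions and the 16-byte frame size are copied from the
MKHI_HDR_* masks and SYSCTRL_MB_FRAME_SIZE in this patch; the helper names
(mkhi_pack_request, mkhi_response_matches, mkhi_result, mb_frame_count) are
hypothetical and exist only for illustration. It works on host-endian values
and leaves out the control register handshake.

```c
#include <stddef.h>
#include <stdint.h>

/* Field layout mirrored from the MKHI_HDR_* masks in the patch. */
#define MKHI_GROUP_ID_MASK	0x000000ffu	/* bits 7:0 */
#define MKHI_COMMAND_SHIFT	8
#define MKHI_COMMAND_MASK	0x00007f00u	/* bits 14:8 */
#define MKHI_IS_RESPONSE	0x00008000u	/* bit 15 */
#define MKHI_RESULT_SHIFT	24
#define MKHI_RESULT_MASK	0xff000000u	/* bits 31:24 */

#define MB_FRAME_SIZE		16u		/* SYSCTRL_MB_FRAME_SIZE */

/* Hypothetical helper: build a request header; the command is 7 bits wide. */
static uint32_t mkhi_pack_request(uint8_t group_id, uint8_t command)
{
	return (uint32_t)group_id |
	       (((uint32_t)command << MKHI_COMMAND_SHIFT) & MKHI_COMMAND_MASK);
}

/* Hypothetical helper: a response must set bit 15 and echo group and command. */
static int mkhi_response_matches(uint32_t req, uint32_t rsp)
{
	const uint32_t id_mask = MKHI_GROUP_ID_MASK | MKHI_COMMAND_MASK;

	return (rsp & MKHI_IS_RESPONSE) && ((rsp & id_mask) == (req & id_mask));
}

/* Hypothetical helper: extract the firmware result code (0 means success). */
static uint8_t mkhi_result(uint32_t hdr)
{
	return (uint8_t)((hdr & MKHI_RESULT_MASK) >> MKHI_RESULT_SHIFT);
}

/* Number of mailbox frames needed for a message of msg_size bytes. */
static size_t mb_frame_count(size_t msg_size)
{
	return (msg_size + MB_FRAME_SIZE - 1) / MB_FRAME_SIZE;
}
```

With these definitions, a request for group 1, command 1 packs to 0x00000101,
and a 17-byte message needs two frames.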


* [PATCH v2 2/5] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
  2026-03-18  6:40 ` [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
@ 2026-03-18  6:40 ` Mallesh Koujalagi
  2026-03-30  5:26   ` Tauro, Riana
  2026-03-18  6:40 ` [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Mallesh Koujalagi @ 2026-03-18  6:40 UTC (permalink / raw)
  To: intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, riana.tauro, karthik.poosa,
	sk.anirban, raag.jadav, Mallesh Koujalagi

Introduce DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)) recovery method to
handle power management unit errors requiring complete device power
cycling.

This method addresses scenarios where other recovery mechanisms
(driver reload, PCIe reset, etc.) are insufficient to restore
device functionality. When set, it indicates to userspace that
only a full cold reset can recover the device from its current error
state. The cold reset method serves as a last resort for power management
unit errors.

Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
 drivers/gpu/drm/drm_drv.c | 2 ++
 include/drm/drm_device.h  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 6b965c3d3307..2dace9070531 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -535,6 +535,8 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
 		return "bus-reset";
 	case DRM_WEDGE_RECOVERY_VENDOR:
 		return "vendor-specific";
+	case DRM_WEDGE_RECOVERY_COLD_RESET:
+		return "cold-reset";
 	default:
 		return NULL;
 	}
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index bc78fb77cc27..3e386eb42023 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -37,6 +37,7 @@ struct pci_controller;
 #define DRM_WEDGE_RECOVERY_REBIND	BIT(1)	/* unbind + bind driver */
 #define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)	/* unbind + reset bus device + bind */
 #define DRM_WEDGE_RECOVERY_VENDOR	BIT(3)	/* vendor specific recovery method */
+#define DRM_WEDGE_RECOVERY_COLD_RESET	BIT(4)	/* full device cold reset */
 
 /**
  * struct drm_wedge_task_info - information about the guilty task of a wedge dev
-- 
2.34.1

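As a rough illustration of how a userspace consumer might interpret the
WEDGED= value once "cold-reset" is a possible method, the sketch below turns
the comma-separated list back into a bitmask. The method strings and bit
positions are taken from drm_get_wedge_recovery() and drm_device.h as shown
in this patch; the WEDGE_* macro names, the table and wedge_parse_methods()
are hypothetical, and tokens not listed here (such as "unknown") are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Bit positions mirror DRM_WEDGE_RECOVERY_* as shown in the patch. */
#define WEDGE_REBIND		(1u << 1)
#define WEDGE_BUS_RESET		(1u << 2)
#define WEDGE_VENDOR		(1u << 3)
#define WEDGE_COLD_RESET	(1u << 4)

static const struct {
	const char *name;
	unsigned int bit;
} wedge_methods[] = {
	{ "rebind",		WEDGE_REBIND },
	{ "bus-reset",		WEDGE_BUS_RESET },
	{ "vendor-specific",	WEDGE_VENDOR },
	{ "cold-reset",		WEDGE_COLD_RESET },
};

/*
 * Hypothetical helper: parse the value part of WEDGED=<m1>[,..,<mN>] into a
 * bitmask. Unrecognised tokens contribute nothing to the result.
 */
static unsigned int wedge_parse_methods(const char *value)
{
	unsigned int mask = 0;

	while (*value) {
		size_t len = strcspn(value, ",");
		size_t i;

		for (i = 0; i < sizeof(wedge_methods) / sizeof(wedge_methods[0]); i++) {
			if (strlen(wedge_methods[i].name) == len &&
			    !strncmp(value, wedge_methods[i].name, len))
				mask |= wedge_methods[i].bit;
		}

		value += len;
		if (*value == ',')
			value++;
	}

	return mask;
}
```

A consumer could then test the result against WEDGE_COLD_RESET to decide
whether a power cycle is the only remaining option.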


* [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
  2026-03-18  6:40 ` [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
  2026-03-18  6:40 ` [PATCH v2 2/5] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error Mallesh Koujalagi
@ 2026-03-18  6:40 ` Mallesh Koujalagi
  2026-03-30  5:00   ` Tauro, Riana
  2026-04-02  8:16   ` Raag Jadav
  2026-03-18  6:40 ` [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset Mallesh Koujalagi
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Mallesh Koujalagi @ 2026-03-18  6:40 UTC (permalink / raw)
  To: intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, riana.tauro, karthik.poosa,
	sk.anirban, raag.jadav, Mallesh Koujalagi

Add documentation for the DRM_WEDGE_RECOVERY_COLD_RESET recovery
method introduced for handling power management unit errors. This method is
designated for severe errors that compromise core device functionality
and are unrecoverable via recovery mechanisms such as driver reload or PCIe
bus reset. The documentation clarifies when this recovery method should be
used and its implications for userspace applications.

v2:
- Use "several" instead of a specific count to avoid future updates. (Jani)

Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
 Documentation/gpu/drm-uapi.rst | 73 +++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index d98428a592f1..5b63f1c17b9b 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -418,7 +418,7 @@ needed.
 Recovery
 --------
 
-Current implementation defines four recovery methods, out of which, drivers
+Current implementation defines several recovery methods, out of which, drivers
 can use any one, multiple or none. Method(s) of choice will be sent in the
 uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
 more side-effects. See the section `Vendor Specific Recovery`_
@@ -435,6 +435,7 @@ following expectations.
     rebind          unbind + bind driver
     bus-reset       unbind + bus reset/re-enumeration + bind
     vendor-specific vendor specific recovery method
+    cold-reset      full device cold reset required
     unknown         consumer policy
     =============== ========================================
 
@@ -446,6 +447,27 @@ telemetry information (devcoredump, syslog). This is useful because the first
 hang is usually the most critical one which can result in consequential hangs or
 complete wedging.
 
+Cold Reset Recovery
+-------------------
+
+The ``WEDGED=cold-reset`` event indicates that the device has encountered
+power management unit errors that affect core functionality and cannot be
+resolved through software recovery mechanisms.
+
+This recovery method is reserved for power management unit error conditions where the
+device state cannot be restored via:
+
+- Driver unbind/rebind operations
+- PCIe bus reset and re-enumeration
+- Device Function Level Reset (FLR)
+- Warm device resets
+
+Such a power management unit error state typically persists across all
+software-based recovery attempts. Only a complete device power cycle can
+restore normal operation.
+
+Upon receiving a ``WEDGED=cold-reset`` event, userspace should initiate
+a full cold reset of the affected device to restore functionality.
 
 Vendor Specific Recovery
 ------------------------
@@ -524,6 +546,55 @@ Recovery script::
     echo -n $DEVICE > $DRIVER/unbind
     echo -n $DEVICE > $DRIVER/bind
 
+Example - cold-reset
+--------------------
+
+Udev rule::
+
+    SUBSYSTEM=="drm", ENV{WEDGED}=="cold-reset", DEVPATH=="*/drm/card[0-9]",
+    RUN+="/path/to/cold-reset.sh $env{DEVPATH}"
+
+Recovery script::
+
+    #!/bin/sh
+
+    [ -z "$1" ] && echo "Usage: $0 <device-path>" && exit 1
+
+    # Get device
+    DEVPATH=$(readlink -f /sys/$1/device 2>/dev/null || readlink -f /sys/$1)
+    DEVICE=$(basename $DEVPATH)
+
+    echo "Cold reset: $DEVICE"
+
+    # Try slot power reset first
+    SLOT=$(for addr in /sys/bus/pci/slots/*/address; do
+	    ADDR=$(cat "$addr" 2>/dev/null)
+	    [ -n "$ADDR" ] && echo "$DEVICE" | grep -q "^$ADDR" && basename "$(dirname "$addr")" && break
+    done)
+
+    if [ -n "$SLOT" ]; then
+	echo "Using slot $SLOT"
+
+	# Unbind driver
+	[ -e "/sys/bus/pci/devices/$DEVICE/driver" ] && \
+	echo "$DEVICE" > /sys/bus/pci/devices/$DEVICE/driver/unbind 2>/dev/null
+
+	# Remove device
+	echo 1 > /sys/bus/pci/devices/$DEVICE/remove
+
+	# Power cycle slot
+	echo 0 > /sys/bus/pci/slots/$SLOT/power
+	sleep 2
+	echo 1 > /sys/bus/pci/slots/$SLOT/power
+	sleep 1
+
+	# Rescan
+	echo 1 > /sys/bus/pci/rescan
+	echo "Done!"
+    else
+	echo "No slot found"
+    fi
+
 Customization
 -------------
 
-- 
2.34.1
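
The documentation above says the uevent carries the recovery methods as
``WEDGED=<method1>[,..,<methodN>]``. As a minimal sketch of how a consumer
script could check that list for a given method, the helper below matches
whole comma-separated entries only. The function name wedged_has_method is
hypothetical and is not part of this series.

```shell
#!/bin/sh

# Hypothetical helper (not from the patch): succeeds when the
# comma-separated WEDGED value in $1 contains the method named in $2.
# Wrapping both sides in commas makes the match apply to whole entries,
# so "cold-reset" does not match a longer name that merely contains it.
wedged_has_method() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: decide whether a power cycle is being requested.
if wedged_has_method "cold-reset" cold-reset; then
    echo "cold reset requested"
fi
```

A udev RUN script could call such a helper with the WEDGED value from the
event environment before deciding which recovery path to take.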


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
                   ` (2 preceding siblings ...)
  2026-03-18  6:40 ` [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
@ 2026-03-18  6:40 ` Mallesh Koujalagi
  2026-03-30  4:54   ` Tauro, Riana
  2026-04-02  8:19   ` Raag Jadav
  2026-03-18  6:40 ` [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler Mallesh Koujalagi
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Mallesh Koujalagi @ 2026-03-18  6:40 UTC (permalink / raw)
  To: intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, riana.tauro, karthik.poosa,
	sk.anirban, raag.jadav, Mallesh Koujalagi

This handler is designed to be called when power management unit errors are
detected that affect device-level state persisting across warm resets. The
cold reset recovery method signals to userspace that only a complete device
power cycle can restore normal operation.

v2:
- Add use case: Handling errors from power management unit,
  which requires a complete power cycle (cold reset)
  to recover. (Christian)

Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
 drivers/gpu/drm/xe/xe_hw_error.c | 27 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_hw_error.h |  1 +
 drivers/gpu/drm/xe/xe_ras.c      |  3 ++-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
index 2a31b430570e..ca965a2b092c 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.c
+++ b/drivers/gpu/drm/xe/xe_hw_error.c
@@ -5,6 +5,7 @@
 
 #include <linux/bitmap.h>
 #include <linux/fault-inject.h>
+#include <drm/drm_drv.h>
 
 #include "regs/xe_gsc_regs.h"
 #include "regs/xe_hw_error_regs.h"
@@ -542,6 +543,32 @@ static void process_hw_errors(struct xe_device *xe)
 	}
 }
 
+/**
+ * xe_punit_error_handler - Handler for power management unit errors
+ * @xe: device instance
+ *
+ * Handles power management unit errors that affect the device and cannot
+ * be recovered through driver reload, PCIe reset, etc.
+ *
+ * Marks the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET method
+ * and notifies userspace that a complete device power cycle is required.
+ */
+void xe_punit_error_handler(struct xe_device *xe)
+{
+	drm_err(&xe->drm, "CRITICAL: PMU error detected\n");
+	drm_err(&xe->drm, "Recovery: Device cold reset required\n");
+
+	/* Set cold reset recovery method */
+	xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
+
+	if (xe_device_wedged(xe)) {
+		drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
+	} else {
+		/* Declare device wedged - will trigger uevent with cold reset method */
+		xe_device_declare_wedged(xe);
+	}
+}
+
 /**
  * xe_hw_error_init - Initialize hw errors
  * @xe: xe device instance
diff --git a/drivers/gpu/drm/xe/xe_hw_error.h b/drivers/gpu/drm/xe/xe_hw_error.h
index d86e28c5180c..f588320eb94d 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.h
+++ b/drivers/gpu/drm/xe/xe_hw_error.h
@@ -11,5 +11,6 @@ struct xe_tile;
 struct xe_device;
 
 void xe_hw_error_irq_handler(struct xe_tile *tile, const u32 master_ctl);
+void xe_punit_error_handler(struct xe_device *xe);
 void xe_hw_error_init(struct xe_device *xe);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 777321021391..93257d0eaaa0 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -10,6 +10,7 @@
 #include "xe_survivability_mode.h"
 #include "xe_sysctrl_mailbox.h"
 #include "xe_sysctrl_mailbox_types.h"
+#include "xe_hw_error.h"
 
 #define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
 #define GLOBAL_UNCORR_ERROR			2
@@ -148,7 +149,7 @@ static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *
 			xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
 			       severity_to_str(xe, common_info.severity),
 			       ieh_error->error_sources_ieh0.punit);
-			/** TODO: Add PUNIT error handling */
+			xe_punit_error_handler(xe);
 			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
 		}
 	}
-- 
2.34.1
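
The handler above passes a method bitmask to the DRM core, which reports it
to userspace as a comma-separated string. The sketch below models that
mapping in shell for illustration only. The bit positions for the four
existing methods are an assumption about the current DRM definitions, bit 4
for cold-reset is taken from the cover letter, and the fallback to "unknown"
for an empty mask is also an assumption. The function name
wedge_methods_to_string is hypothetical.

```shell
#!/bin/sh

# Illustrative model only, not kernel code.
# Assumed bit layout: bit 0 none, bit 1 rebind, bit 2 bus-reset,
# bit 3 vendor-specific, bit 4 cold-reset (BIT(4) per the cover letter).
wedge_methods_to_string() {
    mask=$1
    out=""
    bit=0
    for name in none rebind bus-reset vendor-specific cold-reset; do
        if [ $(( mask & (1 << bit) )) -ne 0 ]; then
            # Append with a comma separator only when out is non-empty.
            out="${out:+$out,}$name"
        fi
        bit=$((bit + 1))
    done
    # Assumption: a mask with no known bits is reported as "unknown".
    echo "${out:-unknown}"
}

wedge_methods_to_string 16
```

Listing the names in ascending bit order keeps the output in the
"less to more side-effects" order that the documentation describes.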


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
                   ` (3 preceding siblings ...)
  2026-03-18  6:40 ` [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset Mallesh Koujalagi
@ 2026-03-18  6:40 ` Mallesh Koujalagi
  2026-03-30  4:55   ` Tauro, Riana
  2026-03-18  6:49 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method Patchwork
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Mallesh Koujalagi @ 2026-03-18  6:40 UTC (permalink / raw)
  To: intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, riana.tauro, karthik.poosa,
	sk.anirban, raag.jadav, Mallesh Koujalagi

Add a debugfs interface to manually trigger the power management unit error
handler for testing cold reset recovery paths. This is useful for
validating the error recovery mechanism.

The new debugfs entry 'trigger_punit_error' is located at:
  /sys/kernel/debug/dri/N/trigger_punit_error

Reading the file displays usage instructions. Writing '1' invokes
xe_punit_error_handler(), which marks the device as wedged with the
DRM_WEDGE_RECOVERY_COLD_RESET method and sends a uevent to userspace
indicating that a complete device power cycle is required for recovery.

Writing '0' or any other false value has no effect.

This interface is intended for development, testing, and validation
of power management unit error recovery code.

Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
 drivers/gpu/drm/xe/xe_debugfs.c | 38 +++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 844cfafe1ec7..390bbed9c1af 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -18,6 +18,7 @@
 #include "xe_gt_debugfs.h"
 #include "xe_gt_printk.h"
 #include "xe_guc_ads.h"
+#include "xe_hw_error.h"
 #include "xe_mmio.h"
 #include "xe_pm.h"
 #include "xe_psmi.h"
@@ -509,6 +510,40 @@ static const struct file_operations disable_late_binding_fops = {
 	.write = disable_late_binding_set,
 };
 
+static ssize_t trigger_punit_error_show(struct file *f, char __user *ubuf,
+					size_t size, loff_t *pos)
+{
+	const char *msg = "Write 1 to trigger power management unit error handler\n";
+
+	return simple_read_from_buffer(ubuf, size, pos, msg, strlen(msg));
+}
+
+static ssize_t trigger_punit_error_set(struct file *f,
+				       const char __user *ubuf,
+				       size_t size, loff_t *pos)
+{
+	struct xe_device *xe = file_inode(f)->i_private;
+	bool trigger;
+	ssize_t ret;
+
+	ret = kstrtobool_from_user(ubuf, size, &trigger);
+	if (ret)
+		return ret;
+
+	if (trigger) {
+		xe_punit_error_handler(xe);
+		drm_info(&xe->drm, "PMU error handler triggered via debugfs\n");
+	}
+
+	return size;
+}
+
+static const struct file_operations trigger_punit_error_fops = {
+	.owner = THIS_MODULE,
+	.read = trigger_punit_error_show,
+	.write = trigger_punit_error_set,
+};
+
 void xe_debugfs_register(struct xe_device *xe)
 {
 	struct ttm_device *bdev = &xe->ttm;
@@ -550,6 +585,9 @@ void xe_debugfs_register(struct xe_device *xe)
 	debugfs_create_file("disable_late_binding", 0600, root, xe,
 			    &disable_late_binding_fops);
 
+	debugfs_create_file("trigger_punit_error", 0600, root, xe,
+			    &trigger_punit_error_fops);
+
 	/*
 	 * Don't expose page reclaim configuration file if not supported by the
 	 * hardware initially.
-- 
2.34.1
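
For testing, the write described in the commit message can be wrapped in a
small helper. This is a minimal sketch, assuming debugfs is mounted at
/sys/kernel/debug and the caller is root. The function name
trigger_punit_error is hypothetical and simply mirrors the debugfs file
name. The path is passed in so N can be set to the DRM minor of the device
under test.

```shell
#!/bin/sh

# Hypothetical test helper: write "1" to the given debugfs file.
# Returns non-zero without writing if the file is not writable.
trigger_punit_error() {
    file=$1
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    printf '1\n' > "$file"
}

# Usage on a real system, with N replaced by the DRM minor:
#   trigger_punit_error /sys/kernel/debug/dri/N/trigger_punit_error
```

After a successful write, the uevent with WEDGED=cold-reset described in the
cover letter should be observable from userspace.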


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* ✗ CI.checkpatch: warning for Introduce cold reset recovery method
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
                   ` (4 preceding siblings ...)
  2026-03-18  6:40 ` [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler Mallesh Koujalagi
@ 2026-03-18  6:49 ` Patchwork
  2026-03-18  6:50 ` ✓ CI.KUnit: success " Patchwork
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2026-03-18  6:49 UTC (permalink / raw)
  To: Mallesh Koujalagi; +Cc: intel-xe

== Series Details ==

Series: Introduce cold reset recovery method
URL   : https://patchwork.freedesktop.org/series/163428/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
1f57ba1afceae32108bd24770069f764d940a0e4
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit c28828b028bc9deaa12917490edc967ba70a630b
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Date:   Wed Mar 18 12:10:22 2026 +0530

    drm/xe/debugfs: Add interface to trigger power management unit error handler
    
    Add a debugfs interface to manually trigger power management unit error
    handler for testing cold reset recovery paths. This is useful for
    validating the error recovery mechanism.
    
    The new debugfs entry 'trigger_punit_error' is located at:
      /sys/kernel/debug/dri/N/trigger_punit_error
    
    Reading the file displays usage instructions. Writing '1' invokes
    xe_punit_error_handler(), which marks the device as wedged with
    DRM_WEDGE_RECOVERY_COLD_RESET method and sends a uevent to userspace
    indicating that a complete device power cycle is required for recovery.
    
    Writing '0' or any other false value has no effect.
    
    This interface is intended for development, testing, and validation
    of power management unit error recovery code.
    
    Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
+ /mt/dim checkpatch 146a21986f74225d0343edeb925095825fa5474f drm-intel
f1fbd4586e6a Introduce Xe Uncorrectable Error Handling
-:42: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#42: 
new file mode 100644

-:1765: ERROR:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Riana Tauro <riana.tauro@intel.com>'

total: 1 errors, 1 warnings, 0 checks, 1628 lines checked
576aef7e2f0d drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error
ca74123dec02 drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
5d0d53688229 drm/xe: Add handler for power management unit errors which require cold-reset
c28828b028bc drm/xe/debugfs: Add interface to trigger power management unit error handler



^ permalink raw reply	[flat|nested] 25+ messages in thread

* ✓ CI.KUnit: success for Introduce cold reset recovery method
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
                   ` (5 preceding siblings ...)
  2026-03-18  6:49 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method Patchwork
@ 2026-03-18  6:50 ` Patchwork
  2026-03-18  7:33 ` ✓ Xe.CI.BAT: " Patchwork
  2026-03-19 20:20 ` ✓ Xe.CI.FULL: " Patchwork
  8 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2026-03-18  6:50 UTC (permalink / raw)
  To: Mallesh Koujalagi; +Cc: intel-xe

== Series Details ==

Series: Introduce cold reset recovery method
URL   : https://patchwork.freedesktop.org/series/163428/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[06:49:31] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[06:49:35] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[06:50:06] Starting KUnit Kernel (1/1)...
[06:50:06] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[06:50:06] ================== guc_buf (11 subtests) ===================
[06:50:06] [PASSED] test_smallest
[06:50:06] [PASSED] test_largest
[06:50:06] [PASSED] test_granular
[06:50:06] [PASSED] test_unique
[06:50:06] [PASSED] test_overlap
[06:50:06] [PASSED] test_reusable
[06:50:06] [PASSED] test_too_big
[06:50:06] [PASSED] test_flush
[06:50:06] [PASSED] test_lookup
[06:50:06] [PASSED] test_data
[06:50:06] [PASSED] test_class
[06:50:06] ===================== [PASSED] guc_buf =====================
[06:50:06] =================== guc_dbm (7 subtests) ===================
[06:50:06] [PASSED] test_empty
[06:50:06] [PASSED] test_default
[06:50:06] ======================== test_size  ========================
[06:50:06] [PASSED] 4
[06:50:06] [PASSED] 8
[06:50:06] [PASSED] 32
[06:50:06] [PASSED] 256
[06:50:06] ==================== [PASSED] test_size ====================
[06:50:06] ======================= test_reuse  ========================
[06:50:06] [PASSED] 4
[06:50:06] [PASSED] 8
[06:50:06] [PASSED] 32
[06:50:06] [PASSED] 256
[06:50:06] =================== [PASSED] test_reuse ====================
[06:50:06] =================== test_range_overlap  ====================
[06:50:06] [PASSED] 4
[06:50:06] [PASSED] 8
[06:50:06] [PASSED] 32
[06:50:06] [PASSED] 256
[06:50:06] =============== [PASSED] test_range_overlap ================
[06:50:06] =================== test_range_compact  ====================
[06:50:06] [PASSED] 4
[06:50:06] [PASSED] 8
[06:50:06] [PASSED] 32
[06:50:06] [PASSED] 256
[06:50:06] =============== [PASSED] test_range_compact ================
[06:50:06] ==================== test_range_spare  =====================
[06:50:06] [PASSED] 4
[06:50:06] [PASSED] 8
[06:50:06] [PASSED] 32
[06:50:06] [PASSED] 256
[06:50:06] ================ [PASSED] test_range_spare =================
[06:50:06] ===================== [PASSED] guc_dbm =====================
[06:50:06] =================== guc_idm (6 subtests) ===================
[06:50:06] [PASSED] bad_init
[06:50:06] [PASSED] no_init
[06:50:06] [PASSED] init_fini
[06:50:06] [PASSED] check_used
[06:50:06] [PASSED] check_quota
[06:50:06] [PASSED] check_all
[06:50:06] ===================== [PASSED] guc_idm =====================
[06:50:06] ================== no_relay (3 subtests) ===================
[06:50:06] [PASSED] xe_drops_guc2pf_if_not_ready
[06:50:06] [PASSED] xe_drops_guc2vf_if_not_ready
[06:50:06] [PASSED] xe_rejects_send_if_not_ready
[06:50:06] ==================== [PASSED] no_relay =====================
[06:50:06] ================== pf_relay (14 subtests) ==================
[06:50:06] [PASSED] pf_rejects_guc2pf_too_short
[06:50:06] [PASSED] pf_rejects_guc2pf_too_long
[06:50:06] [PASSED] pf_rejects_guc2pf_no_payload
[06:50:06] [PASSED] pf_fails_no_payload
[06:50:06] [PASSED] pf_fails_bad_origin
[06:50:06] [PASSED] pf_fails_bad_type
[06:50:06] [PASSED] pf_txn_reports_error
[06:50:06] [PASSED] pf_txn_sends_pf2guc
[06:50:06] [PASSED] pf_sends_pf2guc
[06:50:06] [SKIPPED] pf_loopback_nop
[06:50:06] [SKIPPED] pf_loopback_echo
[06:50:06] [SKIPPED] pf_loopback_fail
[06:50:06] [SKIPPED] pf_loopback_busy
[06:50:06] [SKIPPED] pf_loopback_retry
[06:50:06] ==================== [PASSED] pf_relay =====================
[06:50:06] ================== vf_relay (3 subtests) ===================
[06:50:06] [PASSED] vf_rejects_guc2vf_too_short
[06:50:06] [PASSED] vf_rejects_guc2vf_too_long
[06:50:06] [PASSED] vf_rejects_guc2vf_no_payload
[06:50:06] ==================== [PASSED] vf_relay =====================
[06:50:06] ================ pf_gt_config (9 subtests) =================
[06:50:06] [PASSED] fair_contexts_1vf
[06:50:06] [PASSED] fair_doorbells_1vf
[06:50:06] [PASSED] fair_ggtt_1vf
[06:50:06] ====================== fair_vram_1vf  ======================
[06:50:06] [PASSED] 3.50 GiB
[06:50:06] [PASSED] 11.5 GiB
[06:50:06] [PASSED] 15.5 GiB
[06:50:06] [PASSED] 31.5 GiB
[06:50:06] [PASSED] 63.5 GiB
[06:50:06] [PASSED] 1.91 GiB
[06:50:06] ================== [PASSED] fair_vram_1vf ==================
[06:50:06] ================ fair_vram_1vf_admin_only  =================
[06:50:06] [PASSED] 3.50 GiB
[06:50:06] [PASSED] 11.5 GiB
[06:50:06] [PASSED] 15.5 GiB
[06:50:06] [PASSED] 31.5 GiB
[06:50:06] [PASSED] 63.5 GiB
[06:50:06] [PASSED] 1.91 GiB
[06:50:06] ============ [PASSED] fair_vram_1vf_admin_only =============
[06:50:06] ====================== fair_contexts  ======================
[06:50:06] [PASSED] 1 VF
[06:50:06] [PASSED] 2 VFs
[06:50:06] [PASSED] 3 VFs
[06:50:06] [PASSED] 4 VFs
[06:50:06] [PASSED] 5 VFs
[06:50:06] [PASSED] 6 VFs
[06:50:06] [PASSED] 7 VFs
[06:50:06] [PASSED] 8 VFs
[06:50:06] [PASSED] 9 VFs
[06:50:06] [PASSED] 10 VFs
[06:50:06] [PASSED] 11 VFs
[06:50:06] [PASSED] 12 VFs
[06:50:06] [PASSED] 13 VFs
[06:50:06] [PASSED] 14 VFs
[06:50:06] [PASSED] 15 VFs
[06:50:06] [PASSED] 16 VFs
[06:50:06] [PASSED] 17 VFs
[06:50:06] [PASSED] 18 VFs
[06:50:06] [PASSED] 19 VFs
[06:50:06] [PASSED] 20 VFs
[06:50:06] [PASSED] 21 VFs
[06:50:06] [PASSED] 22 VFs
[06:50:06] [PASSED] 23 VFs
[06:50:06] [PASSED] 24 VFs
[06:50:06] [PASSED] 25 VFs
[06:50:06] [PASSED] 26 VFs
[06:50:06] [PASSED] 27 VFs
[06:50:06] [PASSED] 28 VFs
[06:50:06] [PASSED] 29 VFs
[06:50:06] [PASSED] 30 VFs
[06:50:06] [PASSED] 31 VFs
[06:50:06] [PASSED] 32 VFs
[06:50:06] [PASSED] 33 VFs
[06:50:06] [PASSED] 34 VFs
[06:50:06] [PASSED] 35 VFs
[06:50:06] [PASSED] 36 VFs
[06:50:06] [PASSED] 37 VFs
[06:50:06] [PASSED] 38 VFs
[06:50:06] [PASSED] 39 VFs
[06:50:06] [PASSED] 40 VFs
[06:50:06] [PASSED] 41 VFs
[06:50:06] [PASSED] 42 VFs
[06:50:06] [PASSED] 43 VFs
[06:50:06] [PASSED] 44 VFs
[06:50:06] [PASSED] 45 VFs
[06:50:06] [PASSED] 46 VFs
[06:50:06] [PASSED] 47 VFs
[06:50:06] [PASSED] 48 VFs
[06:50:06] [PASSED] 49 VFs
[06:50:06] [PASSED] 50 VFs
[06:50:06] [PASSED] 51 VFs
[06:50:06] [PASSED] 52 VFs
[06:50:06] [PASSED] 53 VFs
[06:50:06] [PASSED] 54 VFs
[06:50:06] [PASSED] 55 VFs
[06:50:06] [PASSED] 56 VFs
[06:50:06] [PASSED] 57 VFs
[06:50:06] [PASSED] 58 VFs
[06:50:06] [PASSED] 59 VFs
[06:50:06] [PASSED] 60 VFs
[06:50:06] [PASSED] 61 VFs
[06:50:06] [PASSED] 62 VFs
[06:50:06] [PASSED] 63 VFs
[06:50:06] ================== [PASSED] fair_contexts ==================
[06:50:06] ===================== fair_doorbells  ======================
[06:50:06] [PASSED] 1 VF
[06:50:06] [PASSED] 2 VFs
[06:50:06] [PASSED] 3 VFs
[06:50:06] [PASSED] 4 VFs
[06:50:06] [PASSED] 5 VFs
[06:50:06] [PASSED] 6 VFs
[06:50:06] [PASSED] 7 VFs
[06:50:06] [PASSED] 8 VFs
[06:50:06] [PASSED] 9 VFs
[06:50:06] [PASSED] 10 VFs
[06:50:06] [PASSED] 11 VFs
[06:50:06] [PASSED] 12 VFs
[06:50:06] [PASSED] 13 VFs
[06:50:06] [PASSED] 14 VFs
[06:50:06] [PASSED] 15 VFs
[06:50:06] [PASSED] 16 VFs
[06:50:06] [PASSED] 17 VFs
[06:50:06] [PASSED] 18 VFs
[06:50:06] [PASSED] 19 VFs
[06:50:06] [PASSED] 20 VFs
[06:50:06] [PASSED] 21 VFs
[06:50:06] [PASSED] 22 VFs
[06:50:06] [PASSED] 23 VFs
[06:50:06] [PASSED] 24 VFs
[06:50:06] [PASSED] 25 VFs
[06:50:06] [PASSED] 26 VFs
[06:50:06] [PASSED] 27 VFs
[06:50:06] [PASSED] 28 VFs
[06:50:06] [PASSED] 29 VFs
[06:50:06] [PASSED] 30 VFs
[06:50:06] [PASSED] 31 VFs
[06:50:06] [PASSED] 32 VFs
[06:50:06] [PASSED] 33 VFs
[06:50:06] [PASSED] 34 VFs
[06:50:06] [PASSED] 35 VFs
[06:50:06] [PASSED] 36 VFs
[06:50:06] [PASSED] 37 VFs
[06:50:06] [PASSED] 38 VFs
[06:50:06] [PASSED] 39 VFs
[06:50:06] [PASSED] 40 VFs
[06:50:06] [PASSED] 41 VFs
[06:50:06] [PASSED] 42 VFs
[06:50:06] [PASSED] 43 VFs
[06:50:06] [PASSED] 44 VFs
[06:50:06] [PASSED] 45 VFs
[06:50:06] [PASSED] 46 VFs
[06:50:06] [PASSED] 47 VFs
[06:50:06] [PASSED] 48 VFs
[06:50:06] [PASSED] 49 VFs
[06:50:06] [PASSED] 50 VFs
[06:50:06] [PASSED] 51 VFs
[06:50:06] [PASSED] 52 VFs
[06:50:06] [PASSED] 53 VFs
[06:50:06] [PASSED] 54 VFs
[06:50:06] [PASSED] 55 VFs
[06:50:06] [PASSED] 56 VFs
[06:50:06] [PASSED] 57 VFs
[06:50:06] [PASSED] 58 VFs
[06:50:06] [PASSED] 59 VFs
[06:50:06] [PASSED] 60 VFs
[06:50:06] [PASSED] 61 VFs
[06:50:06] [PASSED] 62 VFs
[06:50:06] [PASSED] 63 VFs
[06:50:06] ================= [PASSED] fair_doorbells ==================
[06:50:06] ======================== fair_ggtt  ========================
[06:50:06] [PASSED] 1 VF
[06:50:06] [PASSED] 2 VFs
[06:50:06] [PASSED] 3 VFs
[06:50:06] [PASSED] 4 VFs
[06:50:06] [PASSED] 5 VFs
[06:50:06] [PASSED] 6 VFs
[06:50:06] [PASSED] 7 VFs
[06:50:06] [PASSED] 8 VFs
[06:50:06] [PASSED] 9 VFs
[06:50:06] [PASSED] 10 VFs
[06:50:06] [PASSED] 11 VFs
[06:50:06] [PASSED] 12 VFs
[06:50:06] [PASSED] 13 VFs
[06:50:06] [PASSED] 14 VFs
[06:50:06] [PASSED] 15 VFs
[06:50:06] [PASSED] 16 VFs
[06:50:06] [PASSED] 17 VFs
[06:50:06] [PASSED] 18 VFs
[06:50:06] [PASSED] 19 VFs
[06:50:06] [PASSED] 20 VFs
[06:50:06] [PASSED] 21 VFs
[06:50:06] [PASSED] 22 VFs
[06:50:06] [PASSED] 23 VFs
[06:50:06] [PASSED] 24 VFs
[06:50:06] [PASSED] 25 VFs
[06:50:06] [PASSED] 26 VFs
[06:50:06] [PASSED] 27 VFs
[06:50:06] [PASSED] 28 VFs
[06:50:06] [PASSED] 29 VFs
[06:50:06] [PASSED] 30 VFs
[06:50:06] [PASSED] 31 VFs
[06:50:06] [PASSED] 32 VFs
[06:50:06] [PASSED] 33 VFs
[06:50:06] [PASSED] 34 VFs
[06:50:06] [PASSED] 35 VFs
[06:50:06] [PASSED] 36 VFs
[06:50:06] [PASSED] 37 VFs
[06:50:06] [PASSED] 38 VFs
[06:50:06] [PASSED] 39 VFs
[06:50:06] [PASSED] 40 VFs
[06:50:06] [PASSED] 41 VFs
[06:50:06] [PASSED] 42 VFs
[06:50:06] [PASSED] 43 VFs
[06:50:06] [PASSED] 44 VFs
[06:50:06] [PASSED] 45 VFs
[06:50:06] [PASSED] 46 VFs
[06:50:06] [PASSED] 47 VFs
[06:50:06] [PASSED] 48 VFs
[06:50:06] [PASSED] 49 VFs
[06:50:06] [PASSED] 50 VFs
[06:50:06] [PASSED] 51 VFs
[06:50:06] [PASSED] 52 VFs
[06:50:06] [PASSED] 53 VFs
[06:50:06] [PASSED] 54 VFs
[06:50:06] [PASSED] 55 VFs
[06:50:06] [PASSED] 56 VFs
[06:50:06] [PASSED] 57 VFs
[06:50:06] [PASSED] 58 VFs
[06:50:06] [PASSED] 59 VFs
[06:50:06] [PASSED] 60 VFs
[06:50:06] [PASSED] 61 VFs
[06:50:06] [PASSED] 62 VFs
[06:50:06] [PASSED] 63 VFs
[06:50:06] ==================== [PASSED] fair_ggtt ====================
[06:50:06] ======================== fair_vram  ========================
[06:50:06] [PASSED] 1 VF
[06:50:06] [PASSED] 2 VFs
[06:50:06] [PASSED] 3 VFs
[06:50:06] [PASSED] 4 VFs
[06:50:06] [PASSED] 5 VFs
[06:50:06] [PASSED] 6 VFs
[06:50:06] [PASSED] 7 VFs
[06:50:06] [PASSED] 8 VFs
[06:50:06] [PASSED] 9 VFs
[06:50:06] [PASSED] 10 VFs
[06:50:06] [PASSED] 11 VFs
[06:50:06] [PASSED] 12 VFs
[06:50:06] [PASSED] 13 VFs
[06:50:06] [PASSED] 14 VFs
[06:50:06] [PASSED] 15 VFs
[06:50:06] [PASSED] 16 VFs
[06:50:06] [PASSED] 17 VFs
[06:50:06] [PASSED] 18 VFs
[06:50:06] [PASSED] 19 VFs
[06:50:06] [PASSED] 20 VFs
[06:50:06] [PASSED] 21 VFs
[06:50:06] [PASSED] 22 VFs
[06:50:06] [PASSED] 23 VFs
[06:50:06] [PASSED] 24 VFs
[06:50:06] [PASSED] 25 VFs
[06:50:06] [PASSED] 26 VFs
[06:50:06] [PASSED] 27 VFs
[06:50:06] [PASSED] 28 VFs
[06:50:06] [PASSED] 29 VFs
[06:50:06] [PASSED] 30 VFs
[06:50:06] [PASSED] 31 VFs
[06:50:06] [PASSED] 32 VFs
[06:50:06] [PASSED] 33 VFs
[06:50:06] [PASSED] 34 VFs
[06:50:06] [PASSED] 35 VFs
[06:50:06] [PASSED] 36 VFs
[06:50:06] [PASSED] 37 VFs
[06:50:06] [PASSED] 38 VFs
[06:50:06] [PASSED] 39 VFs
[06:50:06] [PASSED] 40 VFs
[06:50:06] [PASSED] 41 VFs
[06:50:06] [PASSED] 42 VFs
[06:50:06] [PASSED] 43 VFs
[06:50:06] [PASSED] 44 VFs
[06:50:06] [PASSED] 45 VFs
[06:50:06] [PASSED] 46 VFs
[06:50:06] [PASSED] 47 VFs
[06:50:06] [PASSED] 48 VFs
[06:50:06] [PASSED] 49 VFs
[06:50:06] [PASSED] 50 VFs
[06:50:06] [PASSED] 51 VFs
[06:50:06] [PASSED] 52 VFs
[06:50:06] [PASSED] 53 VFs
[06:50:06] [PASSED] 54 VFs
[06:50:06] [PASSED] 55 VFs
[06:50:06] [PASSED] 56 VFs
[06:50:06] [PASSED] 57 VFs
[06:50:06] [PASSED] 58 VFs
[06:50:06] [PASSED] 59 VFs
[06:50:06] [PASSED] 60 VFs
[06:50:06] [PASSED] 61 VFs
[06:50:06] [PASSED] 62 VFs
[06:50:06] [PASSED] 63 VFs
[06:50:06] ==================== [PASSED] fair_vram ====================
[06:50:06] ================== [PASSED] pf_gt_config ===================
[06:50:06] ===================== lmtt (1 subtest) =====================
[06:50:06] ======================== test_ops  =========================
[06:50:06] [PASSED] 2-level
[06:50:06] [PASSED] multi-level
[06:50:06] ==================== [PASSED] test_ops =====================
[06:50:06] ====================== [PASSED] lmtt =======================
[06:50:06] ================= pf_service (11 subtests) =================
[06:50:06] [PASSED] pf_negotiate_any
[06:50:06] [PASSED] pf_negotiate_base_match
[06:50:06] [PASSED] pf_negotiate_base_newer
[06:50:06] [PASSED] pf_negotiate_base_next
[06:50:06] [SKIPPED] pf_negotiate_base_older
[06:50:06] [PASSED] pf_negotiate_base_prev
[06:50:06] [PASSED] pf_negotiate_latest_match
[06:50:06] [PASSED] pf_negotiate_latest_newer
[06:50:06] [PASSED] pf_negotiate_latest_next
[06:50:06] [SKIPPED] pf_negotiate_latest_older
[06:50:06] [SKIPPED] pf_negotiate_latest_prev
[06:50:06] =================== [PASSED] pf_service ====================
[06:50:06] ================= xe_guc_g2g (2 subtests) ==================
[06:50:06] ============== xe_live_guc_g2g_kunit_default  ==============
[06:50:06] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[06:50:06] ============== xe_live_guc_g2g_kunit_allmem  ===============
[06:50:06] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[06:50:06] =================== [SKIPPED] xe_guc_g2g ===================
[06:50:06] =================== xe_mocs (2 subtests) ===================
[06:50:06] ================ xe_live_mocs_kernel_kunit  ================
[06:50:06] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[06:50:06] ================ xe_live_mocs_reset_kunit  =================
[06:50:06] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[06:50:06] ==================== [SKIPPED] xe_mocs =====================
[06:50:06] ================= xe_migrate (2 subtests) ==================
[06:50:06] ================= xe_migrate_sanity_kunit  =================
[06:50:06] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[06:50:06] ================== xe_validate_ccs_kunit  ==================
[06:50:06] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[06:50:06] =================== [SKIPPED] xe_migrate ===================
[06:50:06] ================== xe_dma_buf (1 subtest) ==================
[06:50:06] ==================== xe_dma_buf_kunit  =====================
[06:50:06] ================ [SKIPPED] xe_dma_buf_kunit ================
[06:50:06] =================== [SKIPPED] xe_dma_buf ===================
[06:50:06] ================= xe_bo_shrink (1 subtest) =================
[06:50:06] =================== xe_bo_shrink_kunit  ====================
[06:50:06] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[06:50:06] ================== [SKIPPED] xe_bo_shrink ==================
[06:50:06] ==================== xe_bo (2 subtests) ====================
[06:50:06] ================== xe_ccs_migrate_kunit  ===================
[06:50:06] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[06:50:06] ==================== xe_bo_evict_kunit  ====================
[06:50:06] =============== [SKIPPED] xe_bo_evict_kunit ================
[06:50:06] ===================== [SKIPPED] xe_bo ======================
[06:50:06] ==================== args (13 subtests) ====================
[06:50:06] [PASSED] count_args_test
[06:50:06] [PASSED] call_args_example
[06:50:06] [PASSED] call_args_test
[06:50:06] [PASSED] drop_first_arg_example
[06:50:06] [PASSED] drop_first_arg_test
[06:50:06] [PASSED] first_arg_example
[06:50:06] [PASSED] first_arg_test
[06:50:06] [PASSED] last_arg_example
[06:50:06] [PASSED] last_arg_test
[06:50:06] [PASSED] pick_arg_example
[06:50:06] [PASSED] if_args_example
[06:50:06] [PASSED] if_args_test
[06:50:06] [PASSED] sep_comma_example
[06:50:06] ====================== [PASSED] args =======================
[06:50:06] =================== xe_pci (3 subtests) ====================
[06:50:06] ==================== check_graphics_ip  ====================
[06:50:06] [PASSED] 12.00 Xe_LP
[06:50:06] [PASSED] 12.10 Xe_LP+
[06:50:06] [PASSED] 12.55 Xe_HPG
[06:50:06] [PASSED] 12.60 Xe_HPC
[06:50:06] [PASSED] 12.70 Xe_LPG
[06:50:06] [PASSED] 12.71 Xe_LPG
[06:50:06] [PASSED] 12.74 Xe_LPG+
[06:50:06] [PASSED] 20.01 Xe2_HPG
[06:50:06] [PASSED] 20.02 Xe2_HPG
[06:50:06] [PASSED] 20.04 Xe2_LPG
[06:50:06] [PASSED] 30.00 Xe3_LPG
[06:50:06] [PASSED] 30.01 Xe3_LPG
[06:50:06] [PASSED] 30.03 Xe3_LPG
[06:50:06] [PASSED] 30.04 Xe3_LPG
[06:50:06] [PASSED] 30.05 Xe3_LPG
[06:50:06] [PASSED] 35.10 Xe3p_LPG
[06:50:06] [PASSED] 35.11 Xe3p_XPC
[06:50:06] ================ [PASSED] check_graphics_ip ================
[06:50:06] ===================== check_media_ip  ======================
[06:50:06] [PASSED] 12.00 Xe_M
[06:50:06] [PASSED] 12.55 Xe_HPM
[06:50:06] [PASSED] 13.00 Xe_LPM+
[06:50:06] [PASSED] 13.01 Xe2_HPM
[06:50:06] [PASSED] 20.00 Xe2_LPM
[06:50:06] [PASSED] 30.00 Xe3_LPM
[06:50:06] [PASSED] 30.02 Xe3_LPM
[06:50:06] [PASSED] 35.00 Xe3p_LPM
[06:50:06] [PASSED] 35.03 Xe3p_HPM
[06:50:06] ================= [PASSED] check_media_ip ==================
[06:50:06] =================== check_platform_desc  ===================
[06:50:06] [PASSED] 0x9A60 (TIGERLAKE)
[06:50:06] [PASSED] 0x9A68 (TIGERLAKE)
[06:50:06] [PASSED] 0x9A70 (TIGERLAKE)
[06:50:06] [PASSED] 0x9A40 (TIGERLAKE)
[06:50:06] [PASSED] 0x9A49 (TIGERLAKE)
[06:50:06] [PASSED] 0x9A59 (TIGERLAKE)
[06:50:06] [PASSED] 0x9A78 (TIGERLAKE)
[06:50:06] [PASSED] 0x9AC0 (TIGERLAKE)
[06:50:06] [PASSED] 0x9AC9 (TIGERLAKE)
[06:50:06] [PASSED] 0x9AD9 (TIGERLAKE)
[06:50:06] [PASSED] 0x9AF8 (TIGERLAKE)
[06:50:06] [PASSED] 0x4C80 (ROCKETLAKE)
[06:50:06] [PASSED] 0x4C8A (ROCKETLAKE)
[06:50:06] [PASSED] 0x4C8B (ROCKETLAKE)
[06:50:06] [PASSED] 0x4C8C (ROCKETLAKE)
[06:50:06] [PASSED] 0x4C90 (ROCKETLAKE)
[06:50:06] [PASSED] 0x4C9A (ROCKETLAKE)
[06:50:06] [PASSED] 0x4680 (ALDERLAKE_S)
[06:50:06] [PASSED] 0x4682 (ALDERLAKE_S)
[06:50:06] [PASSED] 0x4688 (ALDERLAKE_S)
[06:50:06] [PASSED] 0x468A (ALDERLAKE_S)
[06:50:06] [PASSED] 0x468B (ALDERLAKE_S)
[06:50:06] [PASSED] 0x4690 (ALDERLAKE_S)
[06:50:06] [PASSED] 0x4692 (ALDERLAKE_S)
[06:50:06] [PASSED] 0x4693 (ALDERLAKE_S)
[06:50:06] [PASSED] 0x46A0 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46A1 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46A2 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46A3 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46A6 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46A8 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46AA (ALDERLAKE_P)
[06:50:06] [PASSED] 0x462A (ALDERLAKE_P)
[06:50:06] [PASSED] 0x4626 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x4628 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46B0 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46B1 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46B2 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46B3 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46C0 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46C1 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46C2 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46C3 (ALDERLAKE_P)
[06:50:06] [PASSED] 0x46D0 (ALDERLAKE_N)
[06:50:06] [PASSED] 0x46D1 (ALDERLAKE_N)
[06:50:06] [PASSED] 0x46D2 (ALDERLAKE_N)
[06:50:06] [PASSED] 0x46D3 (ALDERLAKE_N)
[06:50:06] [PASSED] 0x46D4 (ALDERLAKE_N)
[06:50:06] [PASSED] 0xA721 (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7A1 (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7A9 (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7AC (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7AD (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA720 (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7A0 (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7A8 (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7AA (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA7AB (ALDERLAKE_P)
[06:50:06] [PASSED] 0xA780 (ALDERLAKE_S)
[06:50:06] [PASSED] 0xA781 (ALDERLAKE_S)
[06:50:06] [PASSED] 0xA782 (ALDERLAKE_S)
[06:50:06] [PASSED] 0xA783 (ALDERLAKE_S)
[06:50:06] [PASSED] 0xA788 (ALDERLAKE_S)
[06:50:06] [PASSED] 0xA789 (ALDERLAKE_S)
[06:50:06] [PASSED] 0xA78A (ALDERLAKE_S)
[06:50:06] [PASSED] 0xA78B (ALDERLAKE_S)
[06:50:06] [PASSED] 0x4905 (DG1)
[06:50:06] [PASSED] 0x4906 (DG1)
[06:50:06] [PASSED] 0x4907 (DG1)
[06:50:06] [PASSED] 0x4908 (DG1)
[06:50:06] [PASSED] 0x4909 (DG1)
[06:50:06] [PASSED] 0x56C0 (DG2)
[06:50:06] [PASSED] 0x56C2 (DG2)
[06:50:06] [PASSED] 0x56C1 (DG2)
[06:50:06] [PASSED] 0x7D51 (METEORLAKE)
[06:50:06] [PASSED] 0x7DD1 (METEORLAKE)
[06:50:06] [PASSED] 0x7D41 (METEORLAKE)
[06:50:06] [PASSED] 0x7D67 (METEORLAKE)
[06:50:06] [PASSED] 0xB640 (METEORLAKE)
[06:50:06] [PASSED] 0x56A0 (DG2)
[06:50:06] [PASSED] 0x56A1 (DG2)
[06:50:06] [PASSED] 0x56A2 (DG2)
[06:50:06] [PASSED] 0x56BE (DG2)
[06:50:06] [PASSED] 0x56BF (DG2)
[06:50:06] [PASSED] 0x5690 (DG2)
[06:50:06] [PASSED] 0x5691 (DG2)
[06:50:06] [PASSED] 0x5692 (DG2)
[06:50:06] [PASSED] 0x56A5 (DG2)
[06:50:06] [PASSED] 0x56A6 (DG2)
[06:50:06] [PASSED] 0x56B0 (DG2)
[06:50:06] [PASSED] 0x56B1 (DG2)
[06:50:06] [PASSED] 0x56BA (DG2)
[06:50:06] [PASSED] 0x56BB (DG2)
[06:50:06] [PASSED] 0x56BC (DG2)
[06:50:06] [PASSED] 0x56BD (DG2)
[06:50:06] [PASSED] 0x5693 (DG2)
[06:50:06] [PASSED] 0x5694 (DG2)
[06:50:06] [PASSED] 0x5695 (DG2)
[06:50:06] [PASSED] 0x56A3 (DG2)
[06:50:06] [PASSED] 0x56A4 (DG2)
[06:50:06] [PASSED] 0x56B2 (DG2)
[06:50:06] [PASSED] 0x56B3 (DG2)
[06:50:06] [PASSED] 0x5696 (DG2)
[06:50:06] [PASSED] 0x5697 (DG2)
[06:50:06] [PASSED] 0xB69 (PVC)
[06:50:06] [PASSED] 0xB6E (PVC)
[06:50:06] [PASSED] 0xBD4 (PVC)
[06:50:06] [PASSED] 0xBD5 (PVC)
[06:50:06] [PASSED] 0xBD6 (PVC)
[06:50:06] [PASSED] 0xBD7 (PVC)
[06:50:06] [PASSED] 0xBD8 (PVC)
[06:50:06] [PASSED] 0xBD9 (PVC)
[06:50:06] [PASSED] 0xBDA (PVC)
[06:50:06] [PASSED] 0xBDB (PVC)
[06:50:06] [PASSED] 0xBE0 (PVC)
[06:50:06] [PASSED] 0xBE1 (PVC)
[06:50:06] [PASSED] 0xBE5 (PVC)
[06:50:06] [PASSED] 0x7D40 (METEORLAKE)
[06:50:06] [PASSED] 0x7D45 (METEORLAKE)
[06:50:06] [PASSED] 0x7D55 (METEORLAKE)
[06:50:06] [PASSED] 0x7D60 (METEORLAKE)
[06:50:06] [PASSED] 0x7DD5 (METEORLAKE)
[06:50:06] [PASSED] 0x6420 (LUNARLAKE)
[06:50:06] [PASSED] 0x64A0 (LUNARLAKE)
[06:50:06] [PASSED] 0x64B0 (LUNARLAKE)
[06:50:06] [PASSED] 0xE202 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE209 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE20B (BATTLEMAGE)
[06:50:06] [PASSED] 0xE20C (BATTLEMAGE)
[06:50:06] [PASSED] 0xE20D (BATTLEMAGE)
[06:50:06] [PASSED] 0xE210 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE211 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE212 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE216 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE220 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE221 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE222 (BATTLEMAGE)
[06:50:06] [PASSED] 0xE223 (BATTLEMAGE)
[06:50:06] [PASSED] 0xB080 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB081 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB082 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB083 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB084 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB085 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB086 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB087 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB08F (PANTHERLAKE)
[06:50:06] [PASSED] 0xB090 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB0A0 (PANTHERLAKE)
[06:50:06] [PASSED] 0xB0B0 (PANTHERLAKE)
[06:50:06] [PASSED] 0xFD80 (PANTHERLAKE)
[06:50:06] [PASSED] 0xFD81 (PANTHERLAKE)
[06:50:06] [PASSED] 0xD740 (NOVALAKE_S)
[06:50:06] [PASSED] 0xD741 (NOVALAKE_S)
[06:50:06] [PASSED] 0xD742 (NOVALAKE_S)
[06:50:06] [PASSED] 0xD743 (NOVALAKE_S)
[06:50:06] [PASSED] 0xD744 (NOVALAKE_S)
[06:50:06] [PASSED] 0xD745 (NOVALAKE_S)
[06:50:06] [PASSED] 0x674C (CRESCENTISLAND)
[06:50:06] [PASSED] 0xD750 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD751 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD752 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD753 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD754 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD755 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD756 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD757 (NOVALAKE_P)
[06:50:06] [PASSED] 0xD75F (NOVALAKE_P)
[06:50:06] =============== [PASSED] check_platform_desc ===============
[06:50:06] ===================== [PASSED] xe_pci ======================
[06:50:06] =================== xe_rtp (2 subtests) ====================
[06:50:06] =============== xe_rtp_process_to_sr_tests  ================
[06:50:06] [PASSED] coalesce-same-reg
[06:50:06] [PASSED] no-match-no-add
[06:50:06] [PASSED] match-or
[06:50:06] [PASSED] match-or-xfail
[06:50:06] [PASSED] no-match-no-add-multiple-rules
[06:50:06] [PASSED] two-regs-two-entries
[06:50:06] [PASSED] clr-one-set-other
[06:50:06] [PASSED] set-field
[06:50:06] [PASSED] conflict-duplicate
[06:50:06] [PASSED] conflict-not-disjoint
[06:50:06] [PASSED] conflict-reg-type
[06:50:06] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[06:50:06] ================== xe_rtp_process_tests  ===================
[06:50:06] [PASSED] active1
[06:50:06] [PASSED] active2
[06:50:06] [PASSED] active-inactive
[06:50:06] [PASSED] inactive-active
[06:50:06] [PASSED] inactive-1st_or_active-inactive
[06:50:06] [PASSED] inactive-2nd_or_active-inactive
[06:50:06] [PASSED] inactive-last_or_active-inactive
[06:50:06] [PASSED] inactive-no_or_active-inactive
[06:50:06] ============== [PASSED] xe_rtp_process_tests ===============
[06:50:06] ===================== [PASSED] xe_rtp ======================
[06:50:06] ==================== xe_wa (1 subtest) =====================
[06:50:06] ======================== xe_wa_gt  =========================
[06:50:06] [PASSED] TIGERLAKE B0
[06:50:06] [PASSED] DG1 A0
[06:50:06] [PASSED] DG1 B0
[06:50:06] [PASSED] ALDERLAKE_S A0
[06:50:06] [PASSED] ALDERLAKE_S B0
[06:50:06] [PASSED] ALDERLAKE_S C0
[06:50:06] [PASSED] ALDERLAKE_S D0
[06:50:06] [PASSED] ALDERLAKE_P A0
[06:50:06] [PASSED] ALDERLAKE_P B0
[06:50:06] [PASSED] ALDERLAKE_P C0
[06:50:06] [PASSED] ALDERLAKE_S RPLS D0
[06:50:06] [PASSED] ALDERLAKE_P RPLU E0
[06:50:06] [PASSED] DG2 G10 C0
[06:50:06] [PASSED] DG2 G11 B1
[06:50:06] [PASSED] DG2 G12 A1
[06:50:06] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[06:50:06] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[06:50:06] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[06:50:06] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[06:50:06] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[06:50:06] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[06:50:06] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[06:50:06] ==================== [PASSED] xe_wa_gt =====================
[06:50:06] ====================== [PASSED] xe_wa ======================
[06:50:06] ============================================================
[06:50:06] Testing complete. Ran 597 tests: passed: 579, skipped: 18
[06:50:06] Elapsed time: 35.466s total, 4.289s configuring, 30.559s building, 0.606s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[06:50:07] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[06:50:08] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[06:50:33] Starting KUnit Kernel (1/1)...
[06:50:33] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[06:50:33] ============ drm_test_pick_cmdline (2 subtests) ============
[06:50:33] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[06:50:33] =============== drm_test_pick_cmdline_named  ===============
[06:50:33] [PASSED] NTSC
[06:50:33] [PASSED] NTSC-J
[06:50:33] [PASSED] PAL
[06:50:33] [PASSED] PAL-M
[06:50:33] =========== [PASSED] drm_test_pick_cmdline_named ===========
[06:50:33] ============== [PASSED] drm_test_pick_cmdline ==============
[06:50:33] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[06:50:33] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[06:50:33] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[06:50:33] =========== drm_validate_clone_mode (2 subtests) ===========
[06:50:33] ============== drm_test_check_in_clone_mode  ===============
[06:50:33] [PASSED] in_clone_mode
[06:50:33] [PASSED] not_in_clone_mode
[06:50:33] ========== [PASSED] drm_test_check_in_clone_mode ===========
[06:50:33] =============== drm_test_check_valid_clones  ===============
[06:50:33] [PASSED] not_in_clone_mode
[06:50:33] [PASSED] valid_clone
[06:50:33] [PASSED] invalid_clone
[06:50:33] =========== [PASSED] drm_test_check_valid_clones ===========
[06:50:33] ============= [PASSED] drm_validate_clone_mode =============
[06:50:33] ============= drm_validate_modeset (1 subtest) =============
[06:50:33] [PASSED] drm_test_check_connector_changed_modeset
[06:50:33] ============== [PASSED] drm_validate_modeset ===============
[06:50:33] ====== drm_test_bridge_get_current_state (2 subtests) ======
[06:50:33] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[06:50:33] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[06:50:33] ======== [PASSED] drm_test_bridge_get_current_state ========
[06:50:33] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[06:50:33] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[06:50:33] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[06:50:33] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[06:50:33] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[06:50:33] ============== drm_bridge_alloc (2 subtests) ===============
[06:50:33] [PASSED] drm_test_drm_bridge_alloc_basic
[06:50:33] [PASSED] drm_test_drm_bridge_alloc_get_put
[06:50:33] ================ [PASSED] drm_bridge_alloc =================
[06:50:33] ============= drm_cmdline_parser (40 subtests) =============
[06:50:33] [PASSED] drm_test_cmdline_force_d_only
[06:50:33] [PASSED] drm_test_cmdline_force_D_only_dvi
[06:50:33] [PASSED] drm_test_cmdline_force_D_only_hdmi
[06:50:33] [PASSED] drm_test_cmdline_force_D_only_not_digital
[06:50:33] [PASSED] drm_test_cmdline_force_e_only
[06:50:33] [PASSED] drm_test_cmdline_res
[06:50:33] [PASSED] drm_test_cmdline_res_vesa
[06:50:33] [PASSED] drm_test_cmdline_res_vesa_rblank
[06:50:33] [PASSED] drm_test_cmdline_res_rblank
[06:50:33] [PASSED] drm_test_cmdline_res_bpp
[06:50:33] [PASSED] drm_test_cmdline_res_refresh
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[06:50:33] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[06:50:33] [PASSED] drm_test_cmdline_res_margins_force_on
[06:50:33] [PASSED] drm_test_cmdline_res_vesa_margins
[06:50:33] [PASSED] drm_test_cmdline_name
[06:50:33] [PASSED] drm_test_cmdline_name_bpp
[06:50:33] [PASSED] drm_test_cmdline_name_option
[06:50:33] [PASSED] drm_test_cmdline_name_bpp_option
[06:50:33] [PASSED] drm_test_cmdline_rotate_0
[06:50:33] [PASSED] drm_test_cmdline_rotate_90
[06:50:33] [PASSED] drm_test_cmdline_rotate_180
[06:50:33] [PASSED] drm_test_cmdline_rotate_270
[06:50:33] [PASSED] drm_test_cmdline_hmirror
[06:50:33] [PASSED] drm_test_cmdline_vmirror
[06:50:33] [PASSED] drm_test_cmdline_margin_options
[06:50:33] [PASSED] drm_test_cmdline_multiple_options
[06:50:33] [PASSED] drm_test_cmdline_bpp_extra_and_option
[06:50:33] [PASSED] drm_test_cmdline_extra_and_option
[06:50:33] [PASSED] drm_test_cmdline_freestanding_options
[06:50:33] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[06:50:33] [PASSED] drm_test_cmdline_panel_orientation
[06:50:33] ================ drm_test_cmdline_invalid  =================
[06:50:33] [PASSED] margin_only
[06:50:33] [PASSED] interlace_only
[06:50:33] [PASSED] res_missing_x
[06:50:33] [PASSED] res_missing_y
[06:50:33] [PASSED] res_bad_y
[06:50:33] [PASSED] res_missing_y_bpp
[06:50:33] [PASSED] res_bad_bpp
[06:50:33] [PASSED] res_bad_refresh
[06:50:33] [PASSED] res_bpp_refresh_force_on_off
[06:50:33] [PASSED] res_invalid_mode
[06:50:33] [PASSED] res_bpp_wrong_place_mode
[06:50:33] [PASSED] name_bpp_refresh
[06:50:33] [PASSED] name_refresh
[06:50:33] [PASSED] name_refresh_wrong_mode
[06:50:33] [PASSED] name_refresh_invalid_mode
[06:50:33] [PASSED] rotate_multiple
[06:50:33] [PASSED] rotate_invalid_val
[06:50:33] [PASSED] rotate_truncated
[06:50:33] [PASSED] invalid_option
[06:50:33] [PASSED] invalid_tv_option
[06:50:33] [PASSED] truncated_tv_option
[06:50:33] ============ [PASSED] drm_test_cmdline_invalid =============
[06:50:33] =============== drm_test_cmdline_tv_options  ===============
[06:50:33] [PASSED] NTSC
[06:50:33] [PASSED] NTSC_443
[06:50:33] [PASSED] NTSC_J
[06:50:33] [PASSED] PAL
[06:50:33] [PASSED] PAL_M
[06:50:33] [PASSED] PAL_N
[06:50:33] [PASSED] SECAM
[06:50:33] [PASSED] MONO_525
[06:50:33] [PASSED] MONO_625
[06:50:33] =========== [PASSED] drm_test_cmdline_tv_options ===========
[06:50:33] =============== [PASSED] drm_cmdline_parser ================
[06:50:33] ========== drmm_connector_hdmi_init (20 subtests) ==========
[06:50:33] [PASSED] drm_test_connector_hdmi_init_valid
[06:50:33] [PASSED] drm_test_connector_hdmi_init_bpc_8
[06:50:33] [PASSED] drm_test_connector_hdmi_init_bpc_10
[06:50:33] [PASSED] drm_test_connector_hdmi_init_bpc_12
[06:50:33] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[06:50:33] [PASSED] drm_test_connector_hdmi_init_bpc_null
[06:50:33] [PASSED] drm_test_connector_hdmi_init_formats_empty
[06:50:33] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[06:50:33] === drm_test_connector_hdmi_init_formats_yuv420_allowed  ===
[06:50:33] [PASSED] supported_formats=0x9 yuv420_allowed=1
[06:50:33] [PASSED] supported_formats=0x9 yuv420_allowed=0
[06:50:33] [PASSED] supported_formats=0x3 yuv420_allowed=1
[06:50:33] [PASSED] supported_formats=0x3 yuv420_allowed=0
[06:50:33] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[06:50:33] [PASSED] drm_test_connector_hdmi_init_null_ddc
[06:50:33] [PASSED] drm_test_connector_hdmi_init_null_product
[06:50:33] [PASSED] drm_test_connector_hdmi_init_null_vendor
[06:50:33] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[06:50:33] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[06:50:33] [PASSED] drm_test_connector_hdmi_init_product_valid
[06:50:33] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[06:50:33] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[06:50:33] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[06:50:33] ========= drm_test_connector_hdmi_init_type_valid  =========
[06:50:33] [PASSED] HDMI-A
[06:50:33] [PASSED] HDMI-B
[06:50:33] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[06:50:33] ======== drm_test_connector_hdmi_init_type_invalid  ========
[06:50:33] [PASSED] Unknown
[06:50:33] [PASSED] VGA
[06:50:33] [PASSED] DVI-I
[06:50:33] [PASSED] DVI-D
[06:50:33] [PASSED] DVI-A
[06:50:33] [PASSED] Composite
[06:50:33] [PASSED] SVIDEO
[06:50:33] [PASSED] LVDS
[06:50:33] [PASSED] Component
[06:50:33] [PASSED] DIN
[06:50:33] [PASSED] DP
[06:50:33] [PASSED] TV
[06:50:33] [PASSED] eDP
[06:50:33] [PASSED] Virtual
[06:50:33] [PASSED] DSI
[06:50:33] [PASSED] DPI
[06:50:33] [PASSED] Writeback
[06:50:33] [PASSED] SPI
[06:50:33] [PASSED] USB
[06:50:33] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[06:50:33] ============ [PASSED] drmm_connector_hdmi_init =============
[06:50:33] ============= drmm_connector_init (3 subtests) =============
[06:50:33] [PASSED] drm_test_drmm_connector_init
[06:50:33] [PASSED] drm_test_drmm_connector_init_null_ddc
[06:50:33] ========= drm_test_drmm_connector_init_type_valid  =========
[06:50:33] [PASSED] Unknown
[06:50:33] [PASSED] VGA
[06:50:33] [PASSED] DVI-I
[06:50:33] [PASSED] DVI-D
[06:50:33] [PASSED] DVI-A
[06:50:33] [PASSED] Composite
[06:50:33] [PASSED] SVIDEO
[06:50:33] [PASSED] LVDS
[06:50:33] [PASSED] Component
[06:50:33] [PASSED] DIN
[06:50:33] [PASSED] DP
[06:50:33] [PASSED] HDMI-A
[06:50:33] [PASSED] HDMI-B
[06:50:33] [PASSED] TV
[06:50:33] [PASSED] eDP
[06:50:33] [PASSED] Virtual
[06:50:33] [PASSED] DSI
[06:50:33] [PASSED] DPI
[06:50:33] [PASSED] Writeback
[06:50:33] [PASSED] SPI
[06:50:33] [PASSED] USB
[06:50:33] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[06:50:33] =============== [PASSED] drmm_connector_init ===============
[06:50:33] ========= drm_connector_dynamic_init (6 subtests) ==========
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_init
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_init_properties
[06:50:33] ===== drm_test_drm_connector_dynamic_init_type_valid  ======
[06:50:33] [PASSED] Unknown
[06:50:33] [PASSED] VGA
[06:50:33] [PASSED] DVI-I
[06:50:33] [PASSED] DVI-D
[06:50:33] [PASSED] DVI-A
[06:50:33] [PASSED] Composite
[06:50:33] [PASSED] SVIDEO
[06:50:33] [PASSED] LVDS
[06:50:33] [PASSED] Component
[06:50:33] [PASSED] DIN
[06:50:33] [PASSED] DP
[06:50:33] [PASSED] HDMI-A
[06:50:33] [PASSED] HDMI-B
[06:50:33] [PASSED] TV
[06:50:33] [PASSED] eDP
[06:50:33] [PASSED] Virtual
[06:50:33] [PASSED] DSI
[06:50:33] [PASSED] DPI
[06:50:33] [PASSED] Writeback
[06:50:33] [PASSED] SPI
[06:50:33] [PASSED] USB
[06:50:33] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[06:50:33] ======== drm_test_drm_connector_dynamic_init_name  =========
[06:50:33] [PASSED] Unknown
[06:50:33] [PASSED] VGA
[06:50:33] [PASSED] DVI-I
[06:50:33] [PASSED] DVI-D
[06:50:33] [PASSED] DVI-A
[06:50:33] [PASSED] Composite
[06:50:33] [PASSED] SVIDEO
[06:50:33] [PASSED] LVDS
[06:50:33] [PASSED] Component
[06:50:33] [PASSED] DIN
[06:50:33] [PASSED] DP
[06:50:33] [PASSED] HDMI-A
[06:50:33] [PASSED] HDMI-B
[06:50:33] [PASSED] TV
[06:50:33] [PASSED] eDP
[06:50:33] [PASSED] Virtual
[06:50:33] [PASSED] DSI
[06:50:33] [PASSED] DPI
[06:50:33] [PASSED] Writeback
[06:50:33] [PASSED] SPI
[06:50:33] [PASSED] USB
[06:50:33] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[06:50:33] =========== [PASSED] drm_connector_dynamic_init ============
[06:50:33] ==== drm_connector_dynamic_register_early (4 subtests) =====
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[06:50:33] ====== [PASSED] drm_connector_dynamic_register_early =======
[06:50:33] ======= drm_connector_dynamic_register (7 subtests) ========
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[06:50:33] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[06:50:33] ========= [PASSED] drm_connector_dynamic_register ==========
[06:50:33] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[06:50:33] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[06:50:33] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[06:50:33] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[06:50:33] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[06:50:33] ========== drm_test_get_tv_mode_from_name_valid  ===========
[06:50:33] [PASSED] NTSC
[06:50:33] [PASSED] NTSC-443
[06:50:33] [PASSED] NTSC-J
[06:50:33] [PASSED] PAL
[06:50:33] [PASSED] PAL-M
[06:50:33] [PASSED] PAL-N
[06:50:33] [PASSED] SECAM
[06:50:33] [PASSED] Mono
[06:50:33] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[06:50:33] [PASSED] drm_test_get_tv_mode_from_name_truncated
[06:50:33] ============ [PASSED] drm_get_tv_mode_from_name ============
[06:50:33] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[06:50:33] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[06:50:33] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[06:50:33] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[06:50:33] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[06:50:33] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[06:50:33] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[06:50:33] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[06:50:33] [PASSED] VIC 96
[06:50:33] [PASSED] VIC 97
[06:50:33] [PASSED] VIC 101
[06:50:33] [PASSED] VIC 102
[06:50:33] [PASSED] VIC 106
[06:50:33] [PASSED] VIC 107
[06:50:33] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[06:50:33] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[06:50:33] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[06:50:33] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[06:50:33] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[06:50:33] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[06:50:33] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[06:50:33] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[06:50:33] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[06:50:33] [PASSED] Automatic
[06:50:33] [PASSED] Full
[06:50:33] [PASSED] Limited 16:235
[06:50:33] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[06:50:33] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[06:50:33] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[06:50:33] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[06:50:33] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[06:50:33] [PASSED] RGB
[06:50:33] [PASSED] YUV 4:2:0
[06:50:33] [PASSED] YUV 4:2:2
[06:50:33] [PASSED] YUV 4:4:4
[06:50:33] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[06:50:33] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[06:50:33] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[06:50:33] ============= drm_damage_helper (21 subtests) ==============
[06:50:33] [PASSED] drm_test_damage_iter_no_damage
[06:50:33] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[06:50:33] [PASSED] drm_test_damage_iter_no_damage_src_moved
[06:50:33] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[06:50:33] [PASSED] drm_test_damage_iter_no_damage_not_visible
[06:50:33] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[06:50:33] [PASSED] drm_test_damage_iter_no_damage_no_fb
[06:50:33] [PASSED] drm_test_damage_iter_simple_damage
[06:50:33] [PASSED] drm_test_damage_iter_single_damage
[06:50:33] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[06:50:33] [PASSED] drm_test_damage_iter_single_damage_outside_src
[06:50:33] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[06:50:33] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[06:50:33] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[06:50:33] [PASSED] drm_test_damage_iter_single_damage_src_moved
[06:50:33] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[06:50:33] [PASSED] drm_test_damage_iter_damage
[06:50:33] [PASSED] drm_test_damage_iter_damage_one_intersect
[06:50:33] [PASSED] drm_test_damage_iter_damage_one_outside
[06:50:33] [PASSED] drm_test_damage_iter_damage_src_moved
[06:50:33] [PASSED] drm_test_damage_iter_damage_not_visible
[06:50:33] ================ [PASSED] drm_damage_helper ================
[06:50:33] ============== drm_dp_mst_helper (3 subtests) ==============
[06:50:33] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[06:50:33] [PASSED] Clock 154000 BPP 30 DSC disabled
[06:50:33] [PASSED] Clock 234000 BPP 30 DSC disabled
[06:50:33] [PASSED] Clock 297000 BPP 24 DSC disabled
[06:50:33] [PASSED] Clock 332880 BPP 24 DSC enabled
[06:50:33] [PASSED] Clock 324540 BPP 24 DSC enabled
[06:50:33] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[06:50:33] ============== drm_test_dp_mst_calc_pbn_div  ===============
[06:50:33] [PASSED] Link rate 2000000 lane count 4
[06:50:33] [PASSED] Link rate 2000000 lane count 2
[06:50:33] [PASSED] Link rate 2000000 lane count 1
[06:50:33] [PASSED] Link rate 1350000 lane count 4
[06:50:33] [PASSED] Link rate 1350000 lane count 2
[06:50:33] [PASSED] Link rate 1350000 lane count 1
[06:50:33] [PASSED] Link rate 1000000 lane count 4
[06:50:33] [PASSED] Link rate 1000000 lane count 2
[06:50:33] [PASSED] Link rate 1000000 lane count 1
[06:50:33] [PASSED] Link rate 810000 lane count 4
[06:50:33] [PASSED] Link rate 810000 lane count 2
[06:50:33] [PASSED] Link rate 810000 lane count 1
[06:50:33] [PASSED] Link rate 540000 lane count 4
[06:50:33] [PASSED] Link rate 540000 lane count 2
[06:50:33] [PASSED] Link rate 540000 lane count 1
[06:50:33] [PASSED] Link rate 270000 lane count 4
[06:50:33] [PASSED] Link rate 270000 lane count 2
[06:50:33] [PASSED] Link rate 270000 lane count 1
[06:50:33] [PASSED] Link rate 162000 lane count 4
[06:50:33] [PASSED] Link rate 162000 lane count 2
[06:50:33] [PASSED] Link rate 162000 lane count 1
[06:50:33] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[06:50:33] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[06:50:33] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[06:50:33] [PASSED] DP_POWER_UP_PHY with port number
[06:50:33] [PASSED] DP_POWER_DOWN_PHY with port number
[06:50:33] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[06:50:33] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[06:50:33] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[06:50:33] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[06:50:33] [PASSED] DP_QUERY_PAYLOAD with port number
[06:50:33] [PASSED] DP_QUERY_PAYLOAD with VCPI
[06:50:33] [PASSED] DP_REMOTE_DPCD_READ with port number
[06:50:33] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[06:50:33] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[06:50:33] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[06:50:33] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[06:50:33] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[06:50:33] [PASSED] DP_REMOTE_I2C_READ with port number
[06:50:33] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[06:50:33] [PASSED] DP_REMOTE_I2C_READ with transactions array
[06:50:33] [PASSED] DP_REMOTE_I2C_WRITE with port number
[06:50:33] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[06:50:33] [PASSED] DP_REMOTE_I2C_WRITE with data array
[06:50:33] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[06:50:33] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[06:50:33] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[06:50:33] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[06:50:33] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[06:50:33] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[06:50:33] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[06:50:33] ================ [PASSED] drm_dp_mst_helper ================
[06:50:33] ================== drm_exec (7 subtests) ===================
[06:50:33] [PASSED] sanitycheck
[06:50:33] [PASSED] test_lock
[06:50:33] [PASSED] test_lock_unlock
[06:50:33] [PASSED] test_duplicates
[06:50:33] [PASSED] test_prepare
[06:50:33] [PASSED] test_prepare_array
[06:50:33] [PASSED] test_multiple_loops
[06:50:33] ==================== [PASSED] drm_exec =====================
[06:50:33] =========== drm_format_helper_test (17 subtests) ===========
[06:50:33] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[06:50:33] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[06:50:33] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[06:50:33] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[06:50:33] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[06:50:33] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[06:50:33] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[06:50:33] ============= drm_test_fb_xrgb8888_to_bgr888  ==============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[06:50:33] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[06:50:33] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[06:50:33] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[06:50:33] ============== drm_test_fb_xrgb8888_to_mono  ===============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[06:50:33] ==================== drm_test_fb_swab  =====================
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ================ [PASSED] drm_test_fb_swab =================
[06:50:33] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[06:50:33] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[06:50:33] [PASSED] single_pixel_source_buffer
[06:50:33] [PASSED] single_pixel_clip_rectangle
[06:50:33] [PASSED] well_known_colors
[06:50:33] [PASSED] destination_pitch
[06:50:33] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[06:50:33] ================= drm_test_fb_clip_offset  =================
[06:50:33] [PASSED] pass through
[06:50:33] [PASSED] horizontal offset
[06:50:33] [PASSED] vertical offset
[06:50:33] [PASSED] horizontal and vertical offset
[06:50:33] [PASSED] horizontal offset (custom pitch)
[06:50:33] [PASSED] vertical offset (custom pitch)
[06:50:33] [PASSED] horizontal and vertical offset (custom pitch)
[06:50:33] ============= [PASSED] drm_test_fb_clip_offset =============
[06:50:33] =================== drm_test_fb_memcpy  ====================
[06:50:33] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[06:50:33] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[06:50:33] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[06:50:33] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[06:50:33] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[06:50:33] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[06:50:33] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[06:50:33] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[06:50:33] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[06:50:33] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[06:50:33] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[06:50:33] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[06:50:33] =============== [PASSED] drm_test_fb_memcpy ================
[06:50:33] ============= [PASSED] drm_format_helper_test ==============
[06:50:33] ================= drm_format (18 subtests) =================
[06:50:33] [PASSED] drm_test_format_block_width_invalid
[06:50:33] [PASSED] drm_test_format_block_width_one_plane
[06:50:33] [PASSED] drm_test_format_block_width_two_plane
[06:50:33] [PASSED] drm_test_format_block_width_three_plane
[06:50:33] [PASSED] drm_test_format_block_width_tiled
[06:50:33] [PASSED] drm_test_format_block_height_invalid
[06:50:33] [PASSED] drm_test_format_block_height_one_plane
[06:50:33] [PASSED] drm_test_format_block_height_two_plane
[06:50:33] [PASSED] drm_test_format_block_height_three_plane
[06:50:33] [PASSED] drm_test_format_block_height_tiled
[06:50:33] [PASSED] drm_test_format_min_pitch_invalid
[06:50:33] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[06:50:33] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[06:50:33] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[06:50:33] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[06:50:33] [PASSED] drm_test_format_min_pitch_two_plane
[06:50:33] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[06:50:33] [PASSED] drm_test_format_min_pitch_tiled
[06:50:33] =================== [PASSED] drm_format ====================
[06:50:33] ============== drm_framebuffer (10 subtests) ===============
[06:50:33] ========== drm_test_framebuffer_check_src_coords  ==========
[06:50:33] [PASSED] Success: source fits into fb
[06:50:33] [PASSED] Fail: overflowing fb with x-axis coordinate
[06:50:33] [PASSED] Fail: overflowing fb with y-axis coordinate
[06:50:33] [PASSED] Fail: overflowing fb with source width
[06:50:33] [PASSED] Fail: overflowing fb with source height
[06:50:33] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[06:50:33] [PASSED] drm_test_framebuffer_cleanup
[06:50:33] =============== drm_test_framebuffer_create  ===============
[06:50:33] [PASSED] ABGR8888 normal sizes
[06:50:33] [PASSED] ABGR8888 max sizes
[06:50:33] [PASSED] ABGR8888 pitch greater than min required
[06:50:33] [PASSED] ABGR8888 pitch less than min required
[06:50:33] [PASSED] ABGR8888 Invalid width
[06:50:33] [PASSED] ABGR8888 Invalid buffer handle
[06:50:33] [PASSED] No pixel format
[06:50:33] [PASSED] ABGR8888 Width 0
[06:50:33] [PASSED] ABGR8888 Height 0
[06:50:33] [PASSED] ABGR8888 Out of bound height * pitch combination
[06:50:33] [PASSED] ABGR8888 Large buffer offset
[06:50:33] [PASSED] ABGR8888 Buffer offset for inexistent plane
[06:50:33] [PASSED] ABGR8888 Invalid flag
[06:50:33] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[06:50:33] [PASSED] ABGR8888 Valid buffer modifier
[06:50:33] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[06:50:33] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[06:50:33] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[06:50:33] [PASSED] NV12 Normal sizes
[06:50:33] [PASSED] NV12 Max sizes
[06:50:33] [PASSED] NV12 Invalid pitch
[06:50:33] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[06:50:33] [PASSED] NV12 different  modifier per-plane
[06:50:33] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[06:50:33] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[06:50:33] [PASSED] NV12 Modifier for inexistent plane
[06:50:33] [PASSED] NV12 Handle for inexistent plane
[06:50:33] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[06:50:33] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[06:50:33] [PASSED] YVU420 Normal sizes
[06:50:33] [PASSED] YVU420 Max sizes
[06:50:33] [PASSED] YVU420 Invalid pitch
[06:50:33] [PASSED] YVU420 Different pitches
[06:50:33] [PASSED] YVU420 Different buffer offsets/pitches
[06:50:33] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[06:50:33] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[06:50:33] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[06:50:33] [PASSED] YVU420 Valid modifier
[06:50:33] [PASSED] YVU420 Different modifiers per plane
[06:50:33] [PASSED] YVU420 Modifier for inexistent plane
[06:50:33] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[06:50:33] [PASSED] X0L2 Normal sizes
[06:50:33] [PASSED] X0L2 Max sizes
[06:50:33] [PASSED] X0L2 Invalid pitch
[06:50:33] [PASSED] X0L2 Pitch greater than minimum required
[06:50:33] [PASSED] X0L2 Handle for inexistent plane
[06:50:33] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[06:50:33] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[06:50:33] [PASSED] X0L2 Valid modifier
[06:50:33] [PASSED] X0L2 Modifier for inexistent plane
[06:50:33] =========== [PASSED] drm_test_framebuffer_create ===========
[06:50:33] [PASSED] drm_test_framebuffer_free
[06:50:33] [PASSED] drm_test_framebuffer_init
[06:50:33] [PASSED] drm_test_framebuffer_init_bad_format
[06:50:33] [PASSED] drm_test_framebuffer_init_dev_mismatch
[06:50:33] [PASSED] drm_test_framebuffer_lookup
[06:50:33] [PASSED] drm_test_framebuffer_lookup_inexistent
[06:50:33] [PASSED] drm_test_framebuffer_modifiers_not_supported
[06:50:33] ================= [PASSED] drm_framebuffer =================
[06:50:33] ================ drm_gem_shmem (8 subtests) ================
[06:50:33] [PASSED] drm_gem_shmem_test_obj_create
[06:50:33] [PASSED] drm_gem_shmem_test_obj_create_private
[06:50:33] [PASSED] drm_gem_shmem_test_pin_pages
[06:50:33] [PASSED] drm_gem_shmem_test_vmap
[06:50:33] [PASSED] drm_gem_shmem_test_get_sg_table
[06:50:33] [PASSED] drm_gem_shmem_test_get_pages_sgt
[06:50:33] [PASSED] drm_gem_shmem_test_madvise
[06:50:33] [PASSED] drm_gem_shmem_test_purge
[06:50:33] ================== [PASSED] drm_gem_shmem ==================
[06:50:33] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[06:50:33] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420  =======
[06:50:33] [PASSED] Automatic
[06:50:33] [PASSED] Full
[06:50:33] [PASSED] Limited 16:235
[06:50:33] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[06:50:33] [PASSED] drm_test_check_disable_connector
[06:50:33] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[06:50:33] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[06:50:33] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[06:50:33] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[06:50:33] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[06:50:33] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[06:50:33] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[06:50:33] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[06:50:33] [PASSED] drm_test_check_output_bpc_dvi
[06:50:33] [PASSED] drm_test_check_output_bpc_format_vic_1
[06:50:33] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[06:50:33] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[06:50:33] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[06:50:33] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[06:50:33] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[06:50:33] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[06:50:33] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[06:50:33] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[06:50:33] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[06:50:33] [PASSED] drm_test_check_broadcast_rgb_value
[06:50:33] [PASSED] drm_test_check_bpc_8_value
[06:50:33] [PASSED] drm_test_check_bpc_10_value
[06:50:33] [PASSED] drm_test_check_bpc_12_value
[06:50:33] [PASSED] drm_test_check_format_value
[06:50:33] [PASSED] drm_test_check_tmds_char_value
[06:50:33] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[06:50:33] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[06:50:33] [PASSED] drm_test_check_mode_valid
[06:50:33] [PASSED] drm_test_check_mode_valid_reject
[06:50:33] [PASSED] drm_test_check_mode_valid_reject_rate
[06:50:33] [PASSED] drm_test_check_mode_valid_reject_max_clock
[06:50:33] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[06:50:33] = drm_atomic_helper_connector_hdmi_infoframes (5 subtests) =
[06:50:33] [PASSED] drm_test_check_infoframes
[06:50:33] [PASSED] drm_test_check_reject_avi_infoframe
[06:50:33] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_8
[06:50:33] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_10
[06:50:33] [PASSED] drm_test_check_reject_audio_infoframe
[06:50:33] === [PASSED] drm_atomic_helper_connector_hdmi_infoframes ===
[06:50:33] ================= drm_managed (2 subtests) =================
[06:50:33] [PASSED] drm_test_managed_release_action
[06:50:33] [PASSED] drm_test_managed_run_action
[06:50:33] =================== [PASSED] drm_managed ===================
[06:50:33] =================== drm_mm (6 subtests) ====================
[06:50:33] [PASSED] drm_test_mm_init
[06:50:33] [PASSED] drm_test_mm_debug
[06:50:33] [PASSED] drm_test_mm_align32
[06:50:33] [PASSED] drm_test_mm_align64
[06:50:33] [PASSED] drm_test_mm_lowest
[06:50:33] [PASSED] drm_test_mm_highest
[06:50:33] ===================== [PASSED] drm_mm ======================
[06:50:33] ============= drm_modes_analog_tv (5 subtests) =============
[06:50:33] [PASSED] drm_test_modes_analog_tv_mono_576i
[06:50:33] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[06:50:33] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[06:50:33] [PASSED] drm_test_modes_analog_tv_pal_576i
[06:50:33] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[06:50:33] =============== [PASSED] drm_modes_analog_tv ===============
[06:50:33] ============== drm_plane_helper (2 subtests) ===============
[06:50:33] =============== drm_test_check_plane_state  ================
[06:50:33] [PASSED] clipping_simple
[06:50:33] [PASSED] clipping_rotate_reflect
[06:50:33] [PASSED] positioning_simple
[06:50:33] [PASSED] upscaling
[06:50:33] [PASSED] downscaling
[06:50:33] [PASSED] rounding1
[06:50:33] [PASSED] rounding2
[06:50:33] [PASSED] rounding3
[06:50:33] [PASSED] rounding4
[06:50:33] =========== [PASSED] drm_test_check_plane_state ============
[06:50:33] =========== drm_test_check_invalid_plane_state  ============
[06:50:33] [PASSED] positioning_invalid
[06:50:33] [PASSED] upscaling_invalid
[06:50:33] [PASSED] downscaling_invalid
[06:50:33] ======= [PASSED] drm_test_check_invalid_plane_state ========
[06:50:33] ================ [PASSED] drm_plane_helper =================
[06:50:33] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[06:50:33] ====== drm_test_connector_helper_tv_get_modes_check  =======
[06:50:33] [PASSED] None
[06:50:33] [PASSED] PAL
[06:50:33] [PASSED] NTSC
[06:50:33] [PASSED] Both, NTSC Default
[06:50:33] [PASSED] Both, PAL Default
[06:50:33] [PASSED] Both, NTSC Default, with PAL on command-line
[06:50:33] [PASSED] Both, PAL Default, with NTSC on command-line
[06:50:33] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[06:50:33] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[06:50:33] ================== drm_rect (9 subtests) ===================
[06:50:33] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[06:50:33] [PASSED] drm_test_rect_clip_scaled_not_clipped
[06:50:33] [PASSED] drm_test_rect_clip_scaled_clipped
[06:50:33] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[06:50:33] ================= drm_test_rect_intersect  =================
[06:50:33] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[06:50:33] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[06:50:33] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[06:50:33] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[06:50:33] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[06:50:33] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[06:50:33] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[06:50:33] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[06:50:33] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[06:50:33] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[06:50:33] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[06:50:33] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[06:50:33] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[06:50:33] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[06:50:33] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[06:50:33] ============= [PASSED] drm_test_rect_intersect =============
[06:50:33] ================ drm_test_rect_calc_hscale  ================
[06:50:33] [PASSED] normal use
[06:50:33] [PASSED] out of max range
[06:50:33] [PASSED] out of min range
[06:50:33] [PASSED] zero dst
[06:50:33] [PASSED] negative src
[06:50:33] [PASSED] negative dst
[06:50:33] ============ [PASSED] drm_test_rect_calc_hscale ============
[06:50:33] ================ drm_test_rect_calc_vscale  ================
[06:50:33] [PASSED] normal use
[06:50:33] [PASSED] out of max range
[06:50:33] [PASSED] out of min range
[06:50:33] [PASSED] zero dst
[06:50:33] [PASSED] negative src
[06:50:33] [PASSED] negative dst
[06:50:33] ============ [PASSED] drm_test_rect_calc_vscale ============
[06:50:33] ================== drm_test_rect_rotate  ===================
[06:50:33] [PASSED] reflect-x
[06:50:33] [PASSED] reflect-y
[06:50:33] [PASSED] rotate-0
[06:50:33] [PASSED] rotate-90
[06:50:33] [PASSED] rotate-180
[06:50:33] [PASSED] rotate-270
[06:50:33] ============== [PASSED] drm_test_rect_rotate ===============
[06:50:33] ================ drm_test_rect_rotate_inv  =================
[06:50:33] [PASSED] reflect-x
[06:50:33] [PASSED] reflect-y
[06:50:33] [PASSED] rotate-0
[06:50:33] [PASSED] rotate-90
[06:50:33] [PASSED] rotate-180
[06:50:33] [PASSED] rotate-270
[06:50:33] ============ [PASSED] drm_test_rect_rotate_inv =============
[06:50:33] ==================== [PASSED] drm_rect =====================
[06:50:33] ============ drm_sysfb_modeset_test (1 subtest) ============
[06:50:33] ============ drm_test_sysfb_build_fourcc_list  =============
[06:50:33] [PASSED] no native formats
[06:50:33] [PASSED] XRGB8888 as native format
[06:50:33] [PASSED] remove duplicates
[06:50:33] [PASSED] convert alpha formats
[06:50:33] [PASSED] random formats
[06:50:33] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[06:50:33] ============= [PASSED] drm_sysfb_modeset_test ==============
[06:50:33] ================== drm_fixp (2 subtests) ===================
[06:50:33] [PASSED] drm_test_int2fixp
[06:50:33] [PASSED] drm_test_sm2fixp
[06:50:33] ==================== [PASSED] drm_fixp =====================
[06:50:33] ============================================================
[06:50:33] Testing complete. Ran 621 tests: passed: 621
[06:50:33] Elapsed time: 26.203s total, 1.731s configuring, 24.292s building, 0.179s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[06:50:33] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[06:50:35] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[06:50:44] Starting KUnit Kernel (1/1)...
[06:50:44] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[06:50:44] ================= ttm_device (5 subtests) ==================
[06:50:44] [PASSED] ttm_device_init_basic
[06:50:44] [PASSED] ttm_device_init_multiple
[06:50:44] [PASSED] ttm_device_fini_basic
[06:50:44] [PASSED] ttm_device_init_no_vma_man
[06:50:44] ================== ttm_device_init_pools  ==================
[06:50:44] [PASSED] No DMA allocations, no DMA32 required
[06:50:44] [PASSED] DMA allocations, DMA32 required
[06:50:44] [PASSED] No DMA allocations, DMA32 required
[06:50:44] [PASSED] DMA allocations, no DMA32 required
[06:50:44] ============== [PASSED] ttm_device_init_pools ==============
[06:50:44] =================== [PASSED] ttm_device ====================
[06:50:44] ================== ttm_pool (8 subtests) ===================
[06:50:44] ================== ttm_pool_alloc_basic  ===================
[06:50:44] [PASSED] One page
[06:50:44] [PASSED] More than one page
[06:50:44] [PASSED] Above the allocation limit
[06:50:44] [PASSED] One page, with coherent DMA mappings enabled
[06:50:44] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[06:50:44] ============== [PASSED] ttm_pool_alloc_basic ===============
[06:50:44] ============== ttm_pool_alloc_basic_dma_addr  ==============
[06:50:44] [PASSED] One page
[06:50:44] [PASSED] More than one page
[06:50:44] [PASSED] Above the allocation limit
[06:50:44] [PASSED] One page, with coherent DMA mappings enabled
[06:50:44] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[06:50:44] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[06:50:44] [PASSED] ttm_pool_alloc_order_caching_match
[06:50:44] [PASSED] ttm_pool_alloc_caching_mismatch
[06:50:44] [PASSED] ttm_pool_alloc_order_mismatch
[06:50:44] [PASSED] ttm_pool_free_dma_alloc
[06:50:44] [PASSED] ttm_pool_free_no_dma_alloc
[06:50:44] [PASSED] ttm_pool_fini_basic
[06:50:44] ==================== [PASSED] ttm_pool =====================
[06:50:44] ================ ttm_resource (8 subtests) =================
[06:50:44] ================= ttm_resource_init_basic  =================
[06:50:44] [PASSED] Init resource in TTM_PL_SYSTEM
[06:50:44] [PASSED] Init resource in TTM_PL_VRAM
[06:50:44] [PASSED] Init resource in a private placement
[06:50:44] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[06:50:44] ============= [PASSED] ttm_resource_init_basic =============
[06:50:44] [PASSED] ttm_resource_init_pinned
[06:50:44] [PASSED] ttm_resource_fini_basic
[06:50:44] [PASSED] ttm_resource_manager_init_basic
[06:50:44] [PASSED] ttm_resource_manager_usage_basic
[06:50:44] [PASSED] ttm_resource_manager_set_used_basic
[06:50:44] [PASSED] ttm_sys_man_alloc_basic
[06:50:44] [PASSED] ttm_sys_man_free_basic
[06:50:44] ================== [PASSED] ttm_resource ===================
[06:50:44] =================== ttm_tt (15 subtests) ===================
[06:50:44] ==================== ttm_tt_init_basic  ====================
[06:50:44] [PASSED] Page-aligned size
[06:50:44] [PASSED] Extra pages requested
[06:50:44] ================ [PASSED] ttm_tt_init_basic ================
[06:50:44] [PASSED] ttm_tt_init_misaligned
[06:50:44] [PASSED] ttm_tt_fini_basic
[06:50:44] [PASSED] ttm_tt_fini_sg
[06:50:44] [PASSED] ttm_tt_fini_shmem
[06:50:44] [PASSED] ttm_tt_create_basic
[06:50:44] [PASSED] ttm_tt_create_invalid_bo_type
[06:50:44] [PASSED] ttm_tt_create_ttm_exists
[06:50:44] [PASSED] ttm_tt_create_failed
[06:50:44] [PASSED] ttm_tt_destroy_basic
[06:50:44] [PASSED] ttm_tt_populate_null_ttm
[06:50:44] [PASSED] ttm_tt_populate_populated_ttm
[06:50:44] [PASSED] ttm_tt_unpopulate_basic
[06:50:44] [PASSED] ttm_tt_unpopulate_empty_ttm
[06:50:44] [PASSED] ttm_tt_swapin_basic
[06:50:44] ===================== [PASSED] ttm_tt ======================
[06:50:44] =================== ttm_bo (14 subtests) ===================
[06:50:44] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[06:50:44] [PASSED] Cannot be interrupted and sleeps
[06:50:44] [PASSED] Cannot be interrupted, locks straight away
[06:50:44] [PASSED] Can be interrupted, sleeps
[06:50:44] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[06:50:44] [PASSED] ttm_bo_reserve_locked_no_sleep
[06:50:44] [PASSED] ttm_bo_reserve_no_wait_ticket
[06:50:44] [PASSED] ttm_bo_reserve_double_resv
[06:50:44] [PASSED] ttm_bo_reserve_interrupted
[06:50:44] [PASSED] ttm_bo_reserve_deadlock
[06:50:44] [PASSED] ttm_bo_unreserve_basic
[06:50:44] [PASSED] ttm_bo_unreserve_pinned
[06:50:44] [PASSED] ttm_bo_unreserve_bulk
[06:50:44] [PASSED] ttm_bo_fini_basic
[06:50:44] [PASSED] ttm_bo_fini_shared_resv
[06:50:44] [PASSED] ttm_bo_pin_basic
[06:50:44] [PASSED] ttm_bo_pin_unpin_resource
[06:50:44] [PASSED] ttm_bo_multiple_pin_one_unpin
[06:50:44] ===================== [PASSED] ttm_bo ======================
[06:50:44] ============== ttm_bo_validate (22 subtests) ===============
[06:50:44] ============== ttm_bo_init_reserved_sys_man  ===============
[06:50:44] [PASSED] Buffer object for userspace
[06:50:44] [PASSED] Kernel buffer object
[06:50:44] [PASSED] Shared buffer object
[06:50:44] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[06:50:44] ============== ttm_bo_init_reserved_mock_man  ==============
[06:50:44] [PASSED] Buffer object for userspace
[06:50:44] [PASSED] Kernel buffer object
[06:50:44] [PASSED] Shared buffer object
[06:50:44] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[06:50:44] [PASSED] ttm_bo_init_reserved_resv
[06:50:44] ================== ttm_bo_validate_basic  ==================
[06:50:44] [PASSED] Buffer object for userspace
[06:50:44] [PASSED] Kernel buffer object
[06:50:44] [PASSED] Shared buffer object
[06:50:44] ============== [PASSED] ttm_bo_validate_basic ==============
[06:50:44] [PASSED] ttm_bo_validate_invalid_placement
[06:50:44] ============= ttm_bo_validate_same_placement  ==============
[06:50:44] [PASSED] System manager
[06:50:44] [PASSED] VRAM manager
[06:50:44] ========= [PASSED] ttm_bo_validate_same_placement ==========
[06:50:44] [PASSED] ttm_bo_validate_failed_alloc
[06:50:44] [PASSED] ttm_bo_validate_pinned
[06:50:44] [PASSED] ttm_bo_validate_busy_placement
[06:50:44] ================ ttm_bo_validate_multihop  =================
[06:50:44] [PASSED] Buffer object for userspace
[06:50:44] [PASSED] Kernel buffer object
[06:50:44] [PASSED] Shared buffer object
[06:50:44] ============ [PASSED] ttm_bo_validate_multihop =============
[06:50:44] ========== ttm_bo_validate_no_placement_signaled  ==========
[06:50:44] [PASSED] Buffer object in system domain, no page vector
[06:50:44] [PASSED] Buffer object in system domain with an existing page vector
[06:50:44] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[06:50:44] ======== ttm_bo_validate_no_placement_not_signaled  ========
[06:50:44] [PASSED] Buffer object for userspace
[06:50:44] [PASSED] Kernel buffer object
[06:50:44] [PASSED] Shared buffer object
[06:50:44] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[06:50:44] [PASSED] ttm_bo_validate_move_fence_signaled
[06:50:44] ========= ttm_bo_validate_move_fence_not_signaled  =========
[06:50:44] [PASSED] Waits for GPU
[06:50:44] [PASSED] Tries to lock straight away
[06:50:44] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[06:50:44] [PASSED] ttm_bo_validate_swapout
[06:50:44] [PASSED] ttm_bo_validate_happy_evict
[06:50:44] [PASSED] ttm_bo_validate_all_pinned_evict
[06:50:44] [PASSED] ttm_bo_validate_allowed_only_evict
[06:50:44] [PASSED] ttm_bo_validate_deleted_evict
[06:50:44] [PASSED] ttm_bo_validate_busy_domain_evict
[06:50:44] [PASSED] ttm_bo_validate_evict_gutting
[06:50:44] [PASSED] ttm_bo_validate_recrusive_evict
[06:50:44] ================= [PASSED] ttm_bo_validate =================
[06:50:44] ============================================================
[06:50:44] Testing complete. Ran 102 tests: passed: 102
[06:50:44] Elapsed time: 11.516s total, 1.756s configuring, 9.543s building, 0.179s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 25+ messages in thread

* ✓ Xe.CI.BAT: success for Introduce cold reset recovery method
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
                   ` (6 preceding siblings ...)
  2026-03-18  6:50 ` ✓ CI.KUnit: success " Patchwork
@ 2026-03-18  7:33 ` Patchwork
  2026-03-19 20:20 ` ✓ Xe.CI.FULL: " Patchwork
  8 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2026-03-18  7:33 UTC (permalink / raw)
  To: Mallesh Koujalagi; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 2407 bytes --]

== Series Details ==

Series: Introduce cold reset recovery method
URL   : https://patchwork.freedesktop.org/series/163428/
State : success

== Summary ==

CI Bug Log - changes from xe-4742-146a21986f74225d0343edeb925095825fa5474f_BAT -> xe-pw-163428v1_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (14 -> 14)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in xe-pw-163428v1_BAT that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_flip@basic-flip-vs-wf_vblank@c-edp1:
    - bat-adlp-7:         [PASS][1] -> [DMESG-WARN][2] ([Intel XE#7483])
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank@c-edp1.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank@c-edp1.html

  * igt@xe_waitfence@engine:
    - bat-dg2-oem2:       [PASS][3] -> [FAIL][4] ([Intel XE#6519])
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/bat-dg2-oem2/igt@xe_waitfence@engine.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/bat-dg2-oem2/igt@xe_waitfence@engine.html

  
#### Possible fixes ####

  * igt@kms_flip@basic-flip-vs-wf_vblank@d-edp1:
    - bat-adlp-7:         [DMESG-WARN][5] ([Intel XE#7483]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank@d-edp1.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/bat-adlp-7/igt@kms_flip@basic-flip-vs-wf_vblank@d-edp1.html

  
  [Intel XE#6519]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6519
  [Intel XE#7483]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7483


Build changes
-------------

  * Linux: xe-4742-146a21986f74225d0343edeb925095825fa5474f -> xe-pw-163428v1

  IGT_8807: 7f44d96d705f1583d689f1f8c2275b685b4ca11d @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-4742-146a21986f74225d0343edeb925095825fa5474f: 146a21986f74225d0343edeb925095825fa5474f
  xe-pw-163428v1: 163428v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/index.html


* Re: [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling
  2026-03-18  6:40 ` [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
@ 2026-03-18 19:35   ` kernel test robot
  2026-03-19 14:42   ` kernel test robot
  2026-03-19 20:02   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2026-03-18 19:35 UTC (permalink / raw)
  To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
  Cc: llvm, oe-kbuild-all, andrealmeid, christian.koenig, airlied,
	simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
	Mallesh Koujalagi

Hi Mallesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[also build test WARNING on drm-misc/drm-misc-next drm/drm-next next-20260318]
[cannot apply to linus/master v7.0-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mallesh-Koujalagi/Introduce-Xe-Uncorrectable-Error-Handling/20260318-153303
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20260318064016.374656-8-mallesh.koujalagi%40intel.com
patch subject: [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20260319/202603190311.jqPlioXj-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260319/202603190311.jqPlioXj-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603190311.jqPlioXj-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/xe/xe_ras.c:241:4: warning: variable 'action' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
     241 |                         default:
         |                         ^~~~~~~
   drivers/gpu/drm/xe/xe_ras.c:250:8: note: uninitialized use occurs here
     250 |                         if (action > final_action)
         |                             ^~~~~~
   drivers/gpu/drm/xe/xe_ras.c:227:4: note: variable 'action' is declared here
     227 |                         enum xe_ras_recovery_action action;
         |                         ^
   1 warning generated.


vim +/action +241 drivers/gpu/drm/xe/xe_ras.c

   177	
   178	/**
   179	 * xe_ras_process_errors - Process and contain hardware errors
   180	 * @xe: xe device instance
   181	 *
   182	 * Get error details from system controller and return recovery
   183	 * method. Called only from PCI error handling.
   184	 *
   185	 * Returns: recovery action to be taken
   186	 */
   187	enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
   188	{
   189		struct xe_sysctrl_mailbox_command command = {0};
   190		struct xe_ras_get_error_response response;
   191		enum xe_ras_recovery_action final_action;
   192		size_t rlen;
   193		int ret;
   194	
   195		/* Default action */
   196		final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
   197	
   198		if (!xe->info.has_sysctrl)
   199			return XE_RAS_RECOVERY_ACTION_RESET;
   200	
   201		xe_ras_prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
   202					       &response, sizeof(response));
   203	
   204		do {
   205			memset(&response, 0, sizeof(response));
   206			rlen = 0;
   207	
   208			ret = xe_sysctrl_send_command(xe, &command, &rlen);
   209			if (ret || !rlen) {
   210				xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
   211				goto err;
   212			}
   213	
   214			if (rlen != sizeof(response)) {
   215				xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
   216				goto err;
   217			}
   218	
   219			if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {
   220				xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
   221				       XE_RAS_NUM_ERROR_ARR);
   222				goto err;
   223			}
   224	
   225			for (int i = 0; i < response.num_errors; i++) {
   226				struct xe_ras_error_array arr = response.error_arr[i];
   227				enum xe_ras_recovery_action action;
   228				struct xe_ras_error_class error_class;
   229				u8 component;
   230	
   231				error_class = arr.error_class;
   232				component = error_class.common.component;
   233	
   234				switch (component) {
   235				case XE_RAS_COMPONENT_CORE_COMPUTE:
   236					action = handle_compute_errors(xe, &arr);
   237					break;
   238				case XE_RAS_COMPONENT_SOC_INTERNAL:
   239					action = handle_soc_internal_errors(xe, &arr);
   240					break;
 > 241				default:
   242					xe_err(xe, "[RAS]: Unknown error component %u\n", component);
   243					break;
   244				}
   245	
   246				/*
   247				 * Retain the highest severity action. Process and log all errors
   248				 * and then take appropriate recovery action
   249				 */
   250				if (action > final_action)
   251					final_action = action;
   252			}
   253	
   254		} while (response.additional_errors);
   255	
   256		return final_action;
   257	
   258	err:
   259		return XE_RAS_RECOVERY_ACTION_RESET;
   260	}
   261	
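
One way to address the -Wsometimes-uninitialized diagnostic above is to assign `action` on every path through the switch, including `default`. The following is a minimal userspace sketch, not the driver code: the enum names, their ordering (a larger value means a more severe action), and the choice of falling back to a reset for an unknown component are all assumptions made for illustration. The appropriate fallback for xe_ras.c is the author's call.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the driver's enums. The ordering is an
 * assumption: a larger value means a more severe recovery action. */
enum recovery_action {
	ACTION_RECOVERED = 0,
	ACTION_RESET,
	ACTION_DISCONNECT,
};

enum component {
	COMPONENT_CORE_COMPUTE = 1,
	COMPONENT_SOC_INTERNAL = 2,
};

/* Map one error component to a recovery action. Every switch path,
 * including default, assigns 'action' before it is read. */
static enum recovery_action classify(unsigned int component)
{
	enum recovery_action action;

	switch (component) {
	case COMPONENT_CORE_COMPUTE:
		action = ACTION_RECOVERED;
		break;
	case COMPONENT_SOC_INTERNAL:
		action = ACTION_DISCONNECT;
		break;
	default:
		fprintf(stderr, "unknown error component %u\n", component);
		/* Assumption: treat an unknown component as needing a reset. */
		action = ACTION_RESET;
		break;
	}

	return action;
}

/* Retain the highest-severity action across all reported components. */
static enum recovery_action highest_severity(const unsigned int *components, int n)
{
	enum recovery_action final_action = ACTION_RECOVERED;

	for (int i = 0; i < n; i++) {
		enum recovery_action action = classify(components[i]);

		if (action > final_action)
			final_action = action;
	}

	return final_action;
}
```

With this shape, an unknown component can no longer feed an indeterminate value into the severity comparison. Initializing `action` at its declaration would be an alternative with the same effect.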

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling
  2026-03-18  6:40 ` [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
  2026-03-18 19:35   ` kernel test robot
@ 2026-03-19 14:42   ` kernel test robot
  2026-03-19 20:02   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2026-03-19 14:42 UTC (permalink / raw)
  To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
  Cc: oe-kbuild-all, andrealmeid, christian.koenig, airlied,
	simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
	Mallesh Koujalagi

Hi Mallesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[also build test WARNING on drm-misc/drm-misc-next drm/drm-next next-20260318]
[cannot apply to linus/master v7.0-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mallesh-Koujalagi/Introduce-Xe-Uncorrectable-Error-Handling/20260318-153303
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20260318064016.374656-8-mallesh.koujalagi%40intel.com
patch subject: [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling
config: csky-randconfig-r122-20260319 (https://download.01.org/0day-ci/archive/20260319/202603192221.3sEGGig7-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 12.5.0
sparse: v0.6.5-rc1
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260319/202603192221.3sEGGig7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603192221.3sEGGig7-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/xe/xe_ras.c:170:18: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le32 [usertype] data @@     got unsigned int [assigned] [usertype] req_hdr @@
   drivers/gpu/drm/xe/xe_ras.c:170:18: sparse:     expected restricted __le32 [usertype] data
   drivers/gpu/drm/xe/xe_ras.c:170:18: sparse:     got unsigned int [assigned] [usertype] req_hdr

vim +170 drivers/gpu/drm/xe/xe_ras.c

   159	
   160	static void xe_ras_prepare_sysctrl_command(struct xe_sysctrl_mailbox_command *command,
   161						   u32 cmd_mask, void *request, size_t request_len,
   162						   void *response, size_t response_len)
   163	{
   164		struct xe_sysctrl_mailbox_app_msg_hdr hdr = {0};
   165		u32 req_hdr;
   166	
   167		req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
   168			  FIELD_PREP(APP_HDR_COMMAND_MASK, cmd_mask);
   169	
 > 170		hdr.data = req_hdr;
   171		command->header = hdr;
   172		command->data_in = request;
   173		command->data_in_len = request_len;
   174		command->data_out = response;
   175		command->data_out_len = response_len;
   176	}
   177	
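
The sparse warning above means a CPU-endian u32 is being stored into a field annotated as little-endian. In the kernel, the usual remedy is an explicit conversion with cpu_to_le32() at the point of assignment. The sketch below illustrates the same idea in portable userspace C. `le32_t`, `to_le32()`, `from_le32()`, `make_hdr()` and the bit layout of the header are hypothetical stand-ins, not the xe sysctrl definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a little-endian annotated type. Wrapping the
 * bytes in a struct makes the compiler reject a plain integer assignment,
 * much as sparse rejects it for __le32. */
typedef struct {
	uint8_t b[4];
} le32_t;

/* Convert a CPU-endian value to little-endian byte order, independent of
 * the host's endianness. */
static le32_t to_le32(uint32_t v)
{
	le32_t out = { {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	} };

	return out;
}

/* Convert back from little-endian byte order to a CPU-endian value. */
static uint32_t from_le32(le32_t v)
{
	return (uint32_t)v.b[0] |
	       ((uint32_t)v.b[1] << 8) |
	       ((uint32_t)v.b[2] << 16) |
	       ((uint32_t)v.b[3] << 24);
}

struct msg_hdr {
	le32_t data;
};

/* Assumed layout for illustration only: group id in bits 7:0 and
 * command in bits 15:8. */
static struct msg_hdr make_hdr(uint32_t group_id, uint32_t command)
{
	struct msg_hdr hdr;
	uint32_t req_hdr = (group_id & 0xff) | ((command & 0xff) << 8);

	/* The explicit conversion is the point of the example. */
	hdr.data = to_le32(req_hdr);
	return hdr;
}
```

Whether the driver should convert at the assignment or change the type of the field depends on how the mailbox header is consumed, which this sketch does not attempt to model.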

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling
  2026-03-18  6:40 ` [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
  2026-03-18 19:35   ` kernel test robot
  2026-03-19 14:42   ` kernel test robot
@ 2026-03-19 20:02   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2026-03-19 20:02 UTC (permalink / raw)
  To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
  Cc: llvm, oe-kbuild-all, andrealmeid, christian.koenig, airlied,
	simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
	Mallesh Koujalagi

Hi Mallesh,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-xe/drm-xe-next]
[also build test ERROR on drm-misc/drm-misc-next drm/drm-next next-20260319]
[cannot apply to linus/master v7.0-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mallesh-Koujalagi/Introduce-Xe-Uncorrectable-Error-Handling/20260318-153303
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20260318064016.374656-8-mallesh.koujalagi%40intel.com
patch subject: [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling
config: i386-randconfig-017-20260319 (https://download.01.org/0day-ci/archive/20260320/202603200358.BacRkqob-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260320/202603200358.BacRkqob-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603200358.BacRkqob-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/xe/xe_ras.c:241:4: error: variable 'action' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
     241 |                         default:
         |                         ^~~~~~~
   drivers/gpu/drm/xe/xe_ras.c:250:8: note: uninitialized use occurs here
     250 |                         if (action > final_action)
         |                             ^~~~~~
   drivers/gpu/drm/xe/xe_ras.c:227:4: note: variable 'action' is declared here
     227 |                         enum xe_ras_recovery_action action;
         |                         ^
   1 error generated.


vim +/action +241 drivers/gpu/drm/xe/xe_ras.c

   177	
   178	/**
   179	 * xe_ras_process_errors - Process and contain hardware errors
   180	 * @xe: xe device instance
   181	 *
   182	 * Get error details from system controller and return recovery
   183	 * method. Called only from PCI error handling.
   184	 *
   185	 * Returns: recovery action to be taken
   186	 */
   187	enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
   188	{
   189		struct xe_sysctrl_mailbox_command command = {0};
   190		struct xe_ras_get_error_response response;
   191		enum xe_ras_recovery_action final_action;
   192		size_t rlen;
   193		int ret;
   194	
   195		/* Default action */
   196		final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
   197	
   198		if (!xe->info.has_sysctrl)
   199			return XE_RAS_RECOVERY_ACTION_RESET;
   200	
   201		xe_ras_prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
   202					       &response, sizeof(response));
   203	
   204		do {
   205			memset(&response, 0, sizeof(response));
   206			rlen = 0;
   207	
   208			ret = xe_sysctrl_send_command(xe, &command, &rlen);
   209			if (ret || !rlen) {
   210				xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
   211				goto err;
   212			}
   213	
   214			if (rlen != sizeof(response)) {
   215				xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
   216				goto err;
   217			}
   218	
   219			if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {
   220				xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
   221				       XE_RAS_NUM_ERROR_ARR);
   222				goto err;
   223			}
   224	
   225			for (int i = 0; i < response.num_errors; i++) {
   226				struct xe_ras_error_array arr = response.error_arr[i];
   227				enum xe_ras_recovery_action action;
   228				struct xe_ras_error_class error_class;
   229				u8 component;
   230	
   231				error_class = arr.error_class;
   232				component = error_class.common.component;
   233	
   234				switch (component) {
   235				case XE_RAS_COMPONENT_CORE_COMPUTE:
   236					action = handle_compute_errors(xe, &arr);
   237					break;
   238				case XE_RAS_COMPONENT_SOC_INTERNAL:
   239					action = handle_soc_internal_errors(xe, &arr);
   240					break;
 > 241				default:
   242					xe_err(xe, "[RAS]: Unknown error component %u\n", component);
   243					break;
   244				}
   245	
   246				/*
   247				 * Retain the highest severity action. Process and log all errors
   248				 * and then take appropriate recovery action
   249				 */
   250				if (action > final_action)
   251					final_action = action;
   252			}
   253	
   254		} while (response.additional_errors);
   255	
   256		return final_action;
   257	
   258	err:
   259		return XE_RAS_RECOVERY_ACTION_RESET;
   260	}
   261	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* ✓ Xe.CI.FULL: success for Introduce cold reset recovery method
  2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
                   ` (7 preceding siblings ...)
  2026-03-18  7:33 ` ✓ Xe.CI.BAT: " Patchwork
@ 2026-03-19 20:20 ` Patchwork
  8 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2026-03-19 20:20 UTC (permalink / raw)
  To: Mallesh Koujalagi; +Cc: intel-xe

== Series Details ==

Series: Introduce cold reset recovery method
URL   : https://patchwork.freedesktop.org/series/163428/
State : success

== Summary ==

CI Bug Log - changes from xe-4742-146a21986f74225d0343edeb925095825fa5474f_FULL -> xe-pw-163428v1_FULL
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (2 -> 2)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in xe-pw-163428v1_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_big_fb@4-tiled-8bpp-rotate-90:
    - shard-bmg:          NOTRUN -> [SKIP][1] ([Intel XE#2327])
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@kms_big_fb@4-tiled-8bpp-rotate-90.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip:
    - shard-bmg:          NOTRUN -> [SKIP][2] ([Intel XE#1124]) +5 other tests skip
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip.html

  * igt@kms_bw@connected-linear-tiling-2-displays-2160x1440p:
    - shard-bmg:          NOTRUN -> [SKIP][3] ([Intel XE#7621])
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_bw@connected-linear-tiling-2-displays-2160x1440p.html

  * igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p:
    - shard-bmg:          [PASS][4] -> [SKIP][5] ([Intel XE#2314] / [Intel XE#2894] / [Intel XE#7373])
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-10/igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p.html
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-3/igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p.html

  * igt@kms_bw@connected-linear-tiling-4-displays-2560x1440p:
    - shard-bmg:          NOTRUN -> [SKIP][6] ([Intel XE#2314] / [Intel XE#2894] / [Intel XE#7373])
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_bw@connected-linear-tiling-4-displays-2560x1440p.html

  * igt@kms_ccs@bad-pixel-format-4-tiled-dg2-rc-ccs:
    - shard-bmg:          NOTRUN -> [SKIP][7] ([Intel XE#2887]) +6 other tests skip
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@kms_ccs@bad-pixel-format-4-tiled-dg2-rc-ccs.html

  * igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-c-dp-2:
    - shard-bmg:          NOTRUN -> [SKIP][8] ([Intel XE#2652]) +11 other tests skip
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-c-dp-2.html

  * igt@kms_chamelium_edid@dp-edid-read:
    - shard-bmg:          NOTRUN -> [SKIP][9] ([Intel XE#2252]) +4 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_chamelium_edid@dp-edid-read.html

  * igt@kms_content_protection@atomic-hdcp14:
    - shard-bmg:          NOTRUN -> [FAIL][10] ([Intel XE#1178] / [Intel XE#3304] / [Intel XE#7374]) +3 other tests fail
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@kms_content_protection@atomic-hdcp14.html

  * igt@kms_content_protection@dp-mst-type-1:
    - shard-bmg:          NOTRUN -> [SKIP][11] ([Intel XE#2390] / [Intel XE#6974])
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_content_protection@dp-mst-type-1.html

  * igt@kms_cursor_crc@cursor-random-32x32:
    - shard-bmg:          NOTRUN -> [SKIP][12] ([Intel XE#2320]) +3 other tests skip
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@kms_cursor_crc@cursor-random-32x32.html

  * igt@kms_cursor_crc@cursor-sliding-512x512:
    - shard-bmg:          NOTRUN -> [SKIP][13] ([Intel XE#2321] / [Intel XE#7355])
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_cursor_crc@cursor-sliding-512x512.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions:
    - shard-bmg:          [PASS][14] -> [SKIP][15] ([Intel XE#2291]) +1 other test skip
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-8/igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions.html
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions.html

  * igt@kms_dp_aux_dev:
    - shard-bmg:          [PASS][16] -> [SKIP][17] ([Intel XE#3009])
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-10/igt@kms_dp_aux_dev.html
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-3/igt@kms_dp_aux_dev.html

  * igt@kms_dsc@dsc-with-formats:
    - shard-bmg:          NOTRUN -> [SKIP][18] ([Intel XE#2244])
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_dsc@dsc-with-formats.html

  * igt@kms_flip@2x-flip-vs-panning:
    - shard-bmg:          [PASS][19] -> [SKIP][20] ([Intel XE#2316]) +2 other tests skip
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-8/igt@kms_flip@2x-flip-vs-panning.html
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_flip@2x-flip-vs-panning.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling:
    - shard-bmg:          NOTRUN -> [SKIP][21] ([Intel XE#7178] / [Intel XE#7351])
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-cur-indfb-move:
    - shard-bmg:          NOTRUN -> [SKIP][22] ([Intel XE#2312])
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-cur-indfb-move.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render:
    - shard-bmg:          NOTRUN -> [SKIP][23] ([Intel XE#2311]) +14 other tests skip
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-blt:
    - shard-bmg:          NOTRUN -> [SKIP][24] ([Intel XE#4141]) +4 other tests skip
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-tiling-y:
    - shard-bmg:          NOTRUN -> [SKIP][25] ([Intel XE#2352] / [Intel XE#7399]) +1 other test skip
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcdrrs-tiling-y.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-blt:
    - shard-bmg:          NOTRUN -> [SKIP][26] ([Intel XE#2313]) +12 other tests skip
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-blt.html

  * igt@kms_joiner@basic-big-joiner:
    - shard-bmg:          NOTRUN -> [SKIP][27] ([Intel XE#6901])
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_joiner@basic-big-joiner.html

  * igt@kms_joiner@invalid-modeset-force-ultra-joiner:
    - shard-bmg:          NOTRUN -> [SKIP][28] ([Intel XE#6911] / [Intel XE#7466])
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_joiner@invalid-modeset-force-ultra-joiner.html

  * igt@kms_multipipe_modeset@basic-max-pipe-crc-check:
    - shard-bmg:          NOTRUN -> [SKIP][29] ([Intel XE#7591])
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_multipipe_modeset@basic-max-pipe-crc-check.html

  * igt@kms_plane@pixel-format-4-tiled-lnl-ccs-modifier-source-clamping:
    - shard-bmg:          NOTRUN -> [SKIP][30] ([Intel XE#7283]) +1 other test skip
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@kms_plane@pixel-format-4-tiled-lnl-ccs-modifier-source-clamping.html

  * igt@kms_plane_multiple@2x-tiling-x:
    - shard-bmg:          [PASS][31] -> [ABORT][32] ([Intel XE#5175])
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-8/igt@kms_plane_multiple@2x-tiling-x.html
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_plane_multiple@2x-tiling-x.html

  * igt@kms_plane_multiple@2x-tiling-x@pipe-d-dp-2-pipe-c-hdmi-a-3:
    - shard-bmg:          [PASS][33] -> [ABORT][34] ([Intel XE#5545] / [Intel XE#6652])
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-8/igt@kms_plane_multiple@2x-tiling-x@pipe-d-dp-2-pipe-c-hdmi-a-3.html
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_plane_multiple@2x-tiling-x@pipe-d-dp-2-pipe-c-hdmi-a-3.html

  * igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-75@pipe-a:
    - shard-bmg:          NOTRUN -> [SKIP][35] ([Intel XE#2763] / [Intel XE#6886]) +4 other tests skip
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-75@pipe-a.html

  * igt@kms_psr2_sf@pr-primary-plane-update-sf-dmg-area:
    - shard-bmg:          NOTRUN -> [SKIP][36] ([Intel XE#1489])
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_psr2_sf@pr-primary-plane-update-sf-dmg-area.html

  * igt@kms_psr2_su@page_flip-nv12:
    - shard-bmg:          NOTRUN -> [SKIP][37] ([Intel XE#2387] / [Intel XE#7429])
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@kms_psr2_su@page_flip-nv12.html

  * igt@kms_psr@pr-cursor-plane-onoff:
    - shard-bmg:          NOTRUN -> [SKIP][38] ([Intel XE#2234] / [Intel XE#2850]) +3 other tests skip
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_psr@pr-cursor-plane-onoff.html

  * igt@kms_psr_stress_test@invalidate-primary-flip-overlay:
    - shard-bmg:          NOTRUN -> [SKIP][39] ([Intel XE#1406] / [Intel XE#2414])
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html

  * igt@kms_sharpness_filter@filter-basic:
    - shard-bmg:          NOTRUN -> [SKIP][40] ([Intel XE#6503])
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_sharpness_filter@filter-basic.html

  * igt@xe_eudebug@multigpu-basic-client:
    - shard-bmg:          NOTRUN -> [SKIP][41] ([Intel XE#4837]) +1 other test skip
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@xe_eudebug@multigpu-basic-client.html

  * igt@xe_eudebug_online@interrupt-all:
    - shard-bmg:          NOTRUN -> [SKIP][42] ([Intel XE#4837] / [Intel XE#6665])
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@xe_eudebug_online@interrupt-all.html

  * igt@xe_evict@evict-threads-small-multi-queue:
    - shard-bmg:          NOTRUN -> [SKIP][43] ([Intel XE#7140])
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@xe_evict@evict-threads-small-multi-queue.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-defer-bind:
    - shard-bmg:          NOTRUN -> [SKIP][44] ([Intel XE#2322] / [Intel XE#7372]) +2 other tests skip
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-defer-bind.html

  * igt@xe_exec_fault_mode@once-multi-queue-rebind-imm:
    - shard-bmg:          NOTRUN -> [SKIP][45] ([Intel XE#7136]) +4 other tests skip
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@xe_exec_fault_mode@once-multi-queue-rebind-imm.html

  * igt@xe_exec_multi_queue@one-queue-preempt-mode-fault-dyn-priority-smem:
    - shard-bmg:          NOTRUN -> [SKIP][46] ([Intel XE#6874]) +8 other tests skip
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@xe_exec_multi_queue@one-queue-preempt-mode-fault-dyn-priority-smem.html

  * igt@xe_exec_threads@threads-multi-queue-mixed-userptr-invalidate:
    - shard-bmg:          NOTRUN -> [SKIP][47] ([Intel XE#7138]) +3 other tests skip
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@xe_exec_threads@threads-multi-queue-mixed-userptr-invalidate.html

  * igt@xe_multigpu_svm@mgpu-coherency-fail-basic:
    - shard-bmg:          NOTRUN -> [SKIP][48] ([Intel XE#6964])
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@xe_multigpu_svm@mgpu-coherency-fail-basic.html

  * igt@xe_pat@pat-index-xehpc:
    - shard-bmg:          NOTRUN -> [SKIP][49] ([Intel XE#1420])
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-4/igt@xe_pat@pat-index-xehpc.html

  * igt@xe_pm@d3cold-mmap-system:
    - shard-bmg:          NOTRUN -> [SKIP][50] ([Intel XE#2284] / [Intel XE#7370])
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@xe_pm@d3cold-mmap-system.html

  
#### Possible fixes ####

  * igt@kms_cursor_legacy@cursor-vs-flip-varying-size:
    - shard-bmg:          [DMESG-WARN][51] ([Intel XE#5354]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-3/igt@kms_cursor_legacy@cursor-vs-flip-varying-size.html
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_cursor_legacy@cursor-vs-flip-varying-size.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size:
    - shard-bmg:          [SKIP][53] ([Intel XE#2291]) -> [PASS][54] +3 other tests pass
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-1/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size:
    - shard-bmg:          [SKIP][55] ([Intel XE#2291] / [Intel XE#7343]) -> [PASS][56] +1 other test pass
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-7/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html

  * igt@kms_dp_link_training@non-uhbr-sst:
    - shard-bmg:          [SKIP][57] ([Intel XE#4354]) -> [PASS][58]
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-3/igt@kms_dp_link_training@non-uhbr-sst.html
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_dp_link_training@non-uhbr-sst.html

  * igt@kms_feature_discovery@display-2x:
    - shard-bmg:          [SKIP][59] ([Intel XE#2373] / [Intel XE#7344]) -> [PASS][60]
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_feature_discovery@display-2x.html
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-1/igt@kms_feature_discovery@display-2x.html

  * igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset:
    - shard-bmg:          [SKIP][61] ([Intel XE#2316]) -> [PASS][62] +6 other tests pass
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset.html
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-7/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset.html

  * igt@kms_flip@plain-flip-fb-recreate@b-dp2:
    - shard-bmg:          [ABORT][63] ([Intel XE#5545] / [Intel XE#6652]) -> [PASS][64] +1 other test pass
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-2/igt@kms_flip@plain-flip-fb-recreate@b-dp2.html
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-6/igt@kms_flip@plain-flip-fb-recreate@b-dp2.html

  * igt@kms_joiner@invalid-modeset-force-big-joiner:
    - shard-bmg:          [SKIP][65] ([Intel XE#7086]) -> [PASS][66] +1 other test pass
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_joiner@invalid-modeset-force-big-joiner.html
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-7/igt@kms_joiner@invalid-modeset-force-big-joiner.html

  * igt@kms_setmode@invalid-clone-single-crtc-stealing:
    - shard-bmg:          [SKIP][67] ([Intel XE#1435]) -> [PASS][68] +1 other test pass
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_setmode@invalid-clone-single-crtc-stealing.html
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-1/igt@kms_setmode@invalid-clone-single-crtc-stealing.html

  * igt@kms_vrr@cmrr@pipe-a-edp-1:
    - shard-lnl:          [FAIL][69] ([Intel XE#4459]) -> [PASS][70] +1 other test pass
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-lnl-6/igt@kms_vrr@cmrr@pipe-a-edp-1.html
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-lnl-5/igt@kms_vrr@cmrr@pipe-a-edp-1.html

  * igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1:
    - shard-lnl:          [FAIL][71] ([Intel XE#2142]) -> [PASS][72] +1 other test pass
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-lnl-7/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-lnl-4/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html

  
#### Warnings ####

  * igt@kms_content_protection@legacy:
    - shard-bmg:          [SKIP][73] ([Intel XE#2341]) -> [FAIL][74] ([Intel XE#1178] / [Intel XE#3304] / [Intel XE#7374])
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-3/igt@kms_content_protection@legacy.html
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_content_protection@legacy.html

  * igt@kms_content_protection@legacy-hdcp14:
    - shard-bmg:          [FAIL][75] ([Intel XE#1178] / [Intel XE#3304] / [Intel XE#7374]) -> [SKIP][76] ([Intel XE#7194])
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-10/igt@kms_content_protection@legacy-hdcp14.html
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_content_protection@legacy-hdcp14.html

  * igt@kms_content_protection@lic-type-0:
    - shard-bmg:          [FAIL][77] ([Intel XE#1178] / [Intel XE#3304] / [Intel XE#7374]) -> [SKIP][78] ([Intel XE#2341])
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-10/igt@kms_content_protection@lic-type-0.html
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-3/igt@kms_content_protection@lic-type-0.html

  * igt@kms_content_protection@suspend-resume:
    - shard-bmg:          [SKIP][79] ([Intel XE#6705]) -> [FAIL][80] ([Intel XE#1178] / [Intel XE#3304] / [Intel XE#7374])
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-3/igt@kms_content_protection@suspend-resume.html
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_content_protection@suspend-resume.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][81] ([Intel XE#2311]) -> [SKIP][82] ([Intel XE#2312]) +12 other tests skip
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-10/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-mmap-wc.html
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-indfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][83] ([Intel XE#2312]) -> [SKIP][84] ([Intel XE#2311]) +17 other tests skip
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-3/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-indfb-draw-mmap-wc.html
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-2/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc:
    - shard-bmg:          [SKIP][85] ([Intel XE#4141]) -> [SKIP][86] ([Intel XE#2312]) +4 other tests skip
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-8/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc.html
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff:
    - shard-bmg:          [SKIP][87] ([Intel XE#2312]) -> [SKIP][88] ([Intel XE#4141]) +5 other tests skip
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff.html
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-1/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-plflip-blt:
    - shard-bmg:          [SKIP][89] ([Intel XE#2312]) -> [SKIP][90] ([Intel XE#2313]) +16 other tests skip
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-5/igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-plflip-blt.html
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-7/igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-plflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-spr-indfb-draw-blt:
    - shard-bmg:          [SKIP][91] ([Intel XE#2313]) -> [SKIP][92] ([Intel XE#2312]) +14 other tests skip
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-10/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-spr-indfb-draw-blt.html
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-spr-indfb-draw-blt.html

  * igt@kms_tiled_display@basic-test-pattern-with-chamelium:
    - shard-bmg:          [SKIP][93] ([Intel XE#2509] / [Intel XE#7437]) -> [SKIP][94] ([Intel XE#2426] / [Intel XE#5848])
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4742-146a21986f74225d0343edeb925095825fa5474f/shard-bmg-10/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html

  
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1420]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1420
  [Intel XE#1435]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1435
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#2142]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2142
  [Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
  [Intel XE#2244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2244
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
  [Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2314]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2314
  [Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
  [Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
  [Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
  [Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
  [Intel XE#2327]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2327
  [Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
  [Intel XE#2352]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2352
  [Intel XE#2373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2373
  [Intel XE#2387]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2387
  [Intel XE#2390]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2390
  [Intel XE#2414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2414
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
  [Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
  [Intel XE#2763]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2763
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
  [Intel XE#2894]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2894
  [Intel XE#3009]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3009
  [Intel XE#3304]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3304
  [Intel XE#4141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141
  [Intel XE#4354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4354
  [Intel XE#4459]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4459
  [Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
  [Intel XE#5175]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5175
  [Intel XE#5354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5354
  [Intel XE#5545]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5545
  [Intel XE#5848]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5848
  [Intel XE#6503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6503
  [Intel XE#6652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6652
  [Intel XE#6665]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6665
  [Intel XE#6705]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6705
  [Intel XE#6874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6874
  [Intel XE#6886]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6886
  [Intel XE#6901]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6901
  [Intel XE#6911]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6911
  [Intel XE#6964]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6964
  [Intel XE#6974]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6974
  [Intel XE#7086]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7086
  [Intel XE#7136]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7136
  [Intel XE#7138]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7138
  [Intel XE#7140]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7140
  [Intel XE#7178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7178
  [Intel XE#7194]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7194
  [Intel XE#7283]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7283
  [Intel XE#7343]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7343
  [Intel XE#7344]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7344
  [Intel XE#7351]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7351
  [Intel XE#7355]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7355
  [Intel XE#7370]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7370
  [Intel XE#7372]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7372
  [Intel XE#7373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7373
  [Intel XE#7374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7374
  [Intel XE#7399]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7399
  [Intel XE#7429]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7429
  [Intel XE#7437]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7437
  [Intel XE#7466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7466
  [Intel XE#7591]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7591
  [Intel XE#7621]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7621


Build changes
-------------

  * Linux: xe-4742-146a21986f74225d0343edeb925095825fa5474f -> xe-pw-163428v1

  IGT_8807: 7f44d96d705f1583d689f1f8c2275b685b4ca11d @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-4742-146a21986f74225d0343edeb925095825fa5474f: 146a21986f74225d0343edeb925095825fa5474f
  xe-pw-163428v1: 163428v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v1/index.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset
  2026-03-18  6:40 ` [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset Mallesh Koujalagi
@ 2026-03-30  4:54   ` Tauro, Riana
  2026-03-30 13:50     ` Mallesh, Koujalagi
  2026-04-02  8:19   ` Raag Jadav
  1 sibling, 1 reply; 25+ messages in thread
From: Tauro, Riana @ 2026-03-30  4:54 UTC (permalink / raw)
  To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, karthik.poosa, sk.anirban,
	raag.jadav


On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
> This handler is designed to be called when power management unit errors are
> detected that affect device-level state persisting across warm resets.
Commit message should be clear.
>   The
> cold reset recovery method signals to userspace that only a complete device
> power cycle can restore normal operation.
>
> v2:
> - Add use case: Handling errors from power management unit,
>    which requires a complete power cycle (cold reset)
>    to recover. (Christian)
This is not a change to the patch itself. Only add this in the cover letter.
>
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_hw_error.c | 27 +++++++++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_hw_error.h |  1 +
>   drivers/gpu/drm/xe/xe_ras.c      |  3 ++-
>   3 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> index 2a31b430570e..ca965a2b092c 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.c
> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
> @@ -5,6 +5,7 @@
>   
>   #include <linux/bitmap.h>
>   #include <linux/fault-inject.h>
> +#include <drm/drm_drv.h>
>   
>   #include "regs/xe_gsc_regs.h"
>   #include "regs/xe_hw_error_regs.h"
> @@ -542,6 +543,32 @@ static void process_hw_errors(struct xe_device *xe)
>   	}
>   }
>   
> +/**
> + * xe_punit_error_handler - Handler for power management unit errors
> + * @xe: device instance
> + *
> + * Handles power management unit errors that affect the device and cannot
> + * be recovered through driver reload, PCIe reset, etc.
> + *
> + * Marks the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET method
> + * and notifies userspace that a complete device power cycle is required.
> + */
> +void xe_punit_error_handler(struct xe_device *xe)

You can move this to xe_ras

> +{
> +	drm_err(&xe->drm, "CRITICAL: PMU error detected\n");

PMU also means Performance Monitoring Unit. Please use Punit instead.
Keep this consistent across the commit message, kernel-doc and code.

> +	drm_err(&xe->drm, "Recovery: Device cold reset required\n");
power-cycle vs. cold-reset: pick one and use consistent wording.

Thanks
Riana

> +
> +	/* Set cold reset recovery method */
> +	xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
> +
> +	if (xe_device_wedged(xe)) {
> +		drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
> +	} else {
> +		/* Declare device wedged - will trigger uevent with cold reset method */
> +		xe_device_declare_wedged(xe);
> +	}
> +}
> +
>   /**
>    * xe_hw_error_init - Initialize hw errors
>    * @xe: xe device instance
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.h b/drivers/gpu/drm/xe/xe_hw_error.h
> index d86e28c5180c..f588320eb94d 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.h
> +++ b/drivers/gpu/drm/xe/xe_hw_error.h
> @@ -11,5 +11,6 @@ struct xe_tile;
>   struct xe_device;
>   
>   void xe_hw_error_irq_handler(struct xe_tile *tile, const u32 master_ctl);
> +void xe_punit_error_handler(struct xe_device *xe);
>   void xe_hw_error_init(struct xe_device *xe);
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 777321021391..93257d0eaaa0 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -10,6 +10,7 @@
>   #include "xe_survivability_mode.h"
>   #include "xe_sysctrl_mailbox.h"
>   #include "xe_sysctrl_mailbox_types.h"
> +#include "xe_hw_error.h"
>   
>   #define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
>   #define GLOBAL_UNCORR_ERROR			2
> @@ -148,7 +149,7 @@ static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *
>   			xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
>   			       severity_to_str(xe, common_info.severity),
>   			       ieh_error->error_sources_ieh0.punit);
> -			/** TODO: Add PUNIT error handling */
> +			xe_punit_error_handler(xe);
>   			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
>   		}
>   	}

* Re: [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler
  2026-03-18  6:40 ` [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler Mallesh Koujalagi
@ 2026-03-30  4:55   ` Tauro, Riana
  2026-03-30 13:40     ` Mallesh, Koujalagi
  0 siblings, 1 reply; 25+ messages in thread
From: Tauro, Riana @ 2026-03-30  4:55 UTC (permalink / raw)
  To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, karthik.poosa, sk.anirban,
	raag.jadav


On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
> Add a debugfs interface to manually trigger power management unit error
> handler for testing cold reset recovery paths. This is useful for
> validating the error recovery mechanism.
>
> The new debugfs entry 'trigger_punit_error' is located at:
>    /sys/kernel/debug/dri/N/trigger_punit_error
>
> Reading the file displays usage instructions. Writing '1' invokes
> xe_punit_error_handler(), which marks the device as wedged with
> DRM_WEDGE_RECOVERY_COLD_RESET method and sends a uevent to userspace
> indicating that a complete device power cycle is required for recovery.
>
> Writing '0' or any other false value has no effect.
>
> This interface is intended for development, testing, and validation
> of power management unit error recovery code.

Would fault injection be more appropriate here?

Thanks
Riana

> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_debugfs.c | 38 +++++++++++++++++++++++++++++++++
>   1 file changed, 38 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 844cfafe1ec7..390bbed9c1af 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -18,6 +18,7 @@
>   #include "xe_gt_debugfs.h"
>   #include "xe_gt_printk.h"
>   #include "xe_guc_ads.h"
> +#include "xe_hw_error.h"
>   #include "xe_mmio.h"
>   #include "xe_pm.h"
>   #include "xe_psmi.h"
> @@ -509,6 +510,40 @@ static const struct file_operations disable_late_binding_fops = {
>   	.write = disable_late_binding_set,
>   };
>   
> +static ssize_t trigger_punit_error_show(struct file *f, char __user *ubuf,
> +					size_t size, loff_t *pos)
> +{
> +	const char *msg = "Write 1 to trigger power management unit error handler\n";
> +
> +	return simple_read_from_buffer(ubuf, size, pos, msg, strlen(msg));
> +}
> +
> +static ssize_t trigger_punit_error_set(struct file *f,
> +				       const char __user *ubuf,
> +				       size_t size, loff_t *pos)
> +{
> +	struct xe_device *xe = file_inode(f)->i_private;
> +	bool trigger;
> +	ssize_t ret;
> +
> +	ret = kstrtobool_from_user(ubuf, size, &trigger);
> +	if (ret)
> +		return ret;
> +
> +	if (trigger) {
> +		xe_punit_error_handler(xe);
> +		drm_info(&xe->drm, "PMU error handler triggered via debugfs\n");
> +	}
> +
> +	return size;
> +}
> +
> +static const struct file_operations trigger_punit_error_fops = {
> +	.owner = THIS_MODULE,
> +	.read = trigger_punit_error_show,
> +	.write = trigger_punit_error_set,
> +};
> +
>   void xe_debugfs_register(struct xe_device *xe)
>   {
>   	struct ttm_device *bdev = &xe->ttm;
> @@ -550,6 +585,9 @@ void xe_debugfs_register(struct xe_device *xe)
>   	debugfs_create_file("disable_late_binding", 0600, root, xe,
>   			    &disable_late_binding_fops);
>   
> +	debugfs_create_file("trigger_punit_error", 0600, root, xe,
> +			    &trigger_punit_error_fops);
> +
>   	/*
>   	 * Don't expose page reclaim configuration file if not supported by the
>   	 * hardware initially.
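
The commit message above says that writing '1' triggers the handler while
'0' or any other false value has no effect. As a rough illustration of that
parsing behaviour, here is a simplified user-space model of the kind of
boolean parsing kstrtobool_from_user() performs. The name parse_bool_model()
is hypothetical, and the accepted spellings are only a subset of what the
kernel helper takes, so treat this as a sketch rather than the kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified user-space model of kernel-style boolean parsing.
 * Only the leading character is inspected (two characters for "on"/"off"),
 * so a trailing newline from `echo 1 > file` is harmless.
 * Returns 0 and sets *res on success, -EINVAL on unrecognised input.
 */
int parse_bool_model(const char *s, bool *res)
{
	if (!s || !res)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}

	return -EINVAL;
}
```

Under this model, "1", "y" and "on" would trigger the handler, "0", "n" and
"off" would be accepted but do nothing, and anything else would make the
write fail with an error instead of being silently ignored.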

* Re: [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
  2026-03-18  6:40 ` [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
@ 2026-03-30  5:00   ` Tauro, Riana
  2026-03-30 14:02     ` Mallesh, Koujalagi
  2026-04-02  8:16   ` Raag Jadav
  1 sibling, 1 reply; 25+ messages in thread
From: Tauro, Riana @ 2026-03-30  5:00 UTC (permalink / raw)
  To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, karthik.poosa, sk.anirban,
	raag.jadav


On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
> Add documentation for the DRM_WEDGE_RECOVERY_COLD_RESET recovery
> method introduced for handling power management unit errors. This method is
> designated for severe errors that compromise core device functionality
> and are unrecoverable via recovery mechanisms such as driver reload or PCIe
> bus reset. The documentation clarifies when this recovery method should be
> used and its implications for userspace applications.
>
> v2:
> - Add several instead of number to avoid update. (Jani)
>
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
>   Documentation/gpu/drm-uapi.rst | 73 +++++++++++++++++++++++++++++++++-
>   1 file changed, 72 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index d98428a592f1..5b63f1c17b9b 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -418,7 +418,7 @@ needed.
>   Recovery
>   --------
>   
> -Current implementation defines four recovery methods, out of which, drivers
> +Current implementation defines several recovery methods, out of which, drivers
>   can use any one, multiple or none. Method(s) of choice will be sent in the
>   uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
>   more side-effects. See the section `Vendor Specific Recovery`_
> @@ -435,6 +435,7 @@ following expectations.
>       rebind          unbind + bind driver
>       bus-reset       unbind + bus reset/re-enumeration + bind
>       vendor-specific vendor specific recovery method
> +    cold-reset      full device cold reset required
>       unknown         consumer policy
>       =============== ========================================
>   
> @@ -446,6 +447,27 @@ telemetry information (devcoredump, syslog). This is useful because the first
>   hang is usually the most critical one which can result in consequential hangs or
>   complete wedging.
>   
> +Cold Reset Recovery
> +-------------------
> +
> +The ``WEDGED=cold-reset`` event indicates that the device has encountered
> +power management unit errors that affect core functionality that cannot be

Power management unit errors may be an xe-only use case. Keep the
documentation vendor-agnostic.
Other vendors may want to use this method for a different use case.

> +resolved through recovery mechanisms.
> +
> +This recovery method is reserved for power management unit error conditions where the
Same as above

Thanks
Riana


> +device state cannot be restored via:
> +
> +- Driver unbind/rebind operations
> +- PCIe bus reset and re-enumeration
> +- Device Function Level Reset (FLR)
> +- Warm device resets
> +
> +Such power management unit error state typically persists across all software-based
> +recovery attempts. Only a complete device power cycle can restore
> +normal operation.
> +
> +Upon receiving a ``WEDGED=cold-reset`` event, userspace should initiate
> +a full cold reset of the affected device to restore functionality.
>   
>   Vendor Specific Recovery
>   ------------------------
> @@ -524,6 +546,55 @@ Recovery script::
>       echo -n $DEVICE > $DRIVER/unbind
>       echo -n $DEVICE > $DRIVER/bind
>   
> +Example - cold-reset
> +--------------------
> +
> +Udev rule::
> +
> +    SUBSYSTEM=="drm", ENV{WEDGED}=="cold-reset", DEVPATH=="*/drm/card[0-9]",
> +    RUN+="/path/to/cold-reset.sh $env{DEVPATH}"
> +
> +Recovery script::
> +
> +    #!/bin/sh
> +
> +    [ -z "$1" ] && echo "Usage: $0 <device-path>" && exit 1
> +
> +    # Get device
> +    DEVPATH=$(readlink -f /sys/$1/device 2>/dev/null || readlink -f /sys/$1)
> +    DEVICE=$(basename $DEVPATH)
> +
> +    echo "Cold reset: $DEVICE"
> +
> +    # Try slot power reset first
> +    SLOT=$(find /sys/bus/pci/slots/ -type l 2>/dev/null | while read slot; do
> +	    ADDR=$(cat "$slot" 2>/dev/null)
> +	    [ -n "$ADDR" ] && echo "$DEVICE" | grep -q "^$ADDR" && basename $(dirname "$slot") && break
> +    done)
> +
> +    if [ -n "$SLOT" ]; then
> +	echo "Using slot $SLOT"
> +
> +	# Unbind driver
> +	[ -e "/sys/bus/pci/devices/$DEVICE/driver" ] && \
> +	echo "$DEVICE" > /sys/bus/pci/devices/$DEVICE/driver/unbind 2>/dev/null
> +
> +	# Remove device
> +	echo 1 > /sys/bus/pci/devices/$DEVICE/remove
> +
> +	# Power cycle slot
> +	echo 0 > /sys/bus/pci/slots/$SLOT/power
> +	sleep 2
> +	echo 1 > /sys/bus/pci/slots/$SLOT/power
> +	sleep 1
> +
> +	# Rescan
> +	echo 1 > /sys/bus/pci/rescan
> +	echo "Done!"
> +    else
> +	echo "No slot found"
> +    fi
> +
>   Customization
>   -------------
>   
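
Since the uevent carries ``WEDGED=<method1>[,..,<methodN>]``, a userspace
consumer has to check whether cold-reset is one of several listed methods
rather than compare the whole value. A minimal C sketch of that check
follows. The helper name wedged_has_method() is hypothetical and not part of
any existing library; it only assumes the comma-separated format described in
the documentation quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `method` appears as a whole comma-separated token in
 * `value`, where `value` is the text after "WEDGED=" in the uevent
 * environment (for example "rebind,cold-reset").
 */
bool wedged_has_method(const char *value, const char *method)
{
	size_t mlen = strlen(method);
	const char *p = value;

	while (*p) {
		/* Length of the current token, up to the next comma or end. */
		size_t tlen = strcspn(p, ",");

		if (tlen == mlen && strncmp(p, method, mlen) == 0)
			return true;

		p += tlen;
		if (*p == ',')
			p++;
	}

	return false;
}
```

Matching whole tokens avoids treating "bus-reset" as a hit when looking for
a shorter string such as "reset".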

* Re: [PATCH v2 2/5] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error
  2026-03-18  6:40 ` [PATCH v2 2/5] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error Mallesh Koujalagi
@ 2026-03-30  5:26   ` Tauro, Riana
  0 siblings, 0 replies; 25+ messages in thread
From: Tauro, Riana @ 2026-03-30  5:26 UTC (permalink / raw)
  To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, karthik.poosa, sk.anirban,
	raag.jadav


On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
> Introduce DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)) recovery method to
> handle power management unit errors requiring complete device power
> cycling.

Same as the other comments: do not refer to power management unit errors
here.

Keep the documentation in the DRM layer vendor-agnostic so that anyone can
use this method for any scenario that requires a cold reset.


Thanks
Riana

>
> This method addresses scenarios where recovery mechanisms
> (driver reload, PCIe reset, etc.) are insufficient to restore
> device functionality. When set, it indicates to userspace that
> only a full cold reset can recover the device from its current error
> state. The cold reset method serves as a last resort for power management
> unit errors.
>
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
>   drivers/gpu/drm/drm_drv.c | 2 ++
>   include/drm/drm_device.h  | 1 +
>   2 files changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 6b965c3d3307..2dace9070531 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -535,6 +535,8 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
>   		return "bus-reset";
>   	case DRM_WEDGE_RECOVERY_VENDOR:
>   		return "vendor-specific";
> +	case DRM_WEDGE_RECOVERY_COLD_RESET:
> +		return "cold-reset";
>   	default:
>   		return NULL;
>   	}
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index bc78fb77cc27..3e386eb42023 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -37,6 +37,7 @@ struct pci_controller;
>   #define DRM_WEDGE_RECOVERY_REBIND	BIT(1)	/* unbind + bind driver */
>   #define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)	/* unbind + reset bus device + bind */
>   #define DRM_WEDGE_RECOVERY_VENDOR	BIT(3)	/* vendor specific recovery method */
> +#define DRM_WEDGE_RECOVERY_COLD_RESET	BIT(4)	/* full device cold reset */
>   
>   /**
>    * struct drm_wedge_task_info - information about the guilty task of a wedge dev

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler
  2026-03-30  4:55   ` Tauro, Riana
@ 2026-03-30 13:40     ` Mallesh, Koujalagi
  2026-04-02  8:31       ` Raag Jadav
  0 siblings, 1 reply; 25+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-30 13:40 UTC (permalink / raw)
  To: Tauro, Riana
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, karthik.poosa, sk.anirban,
	raag.jadav, intel-xe, dri-devel, rodrigo.vivi


On 30-03-2026 10:25 am, Tauro, Riana wrote:
>
> On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
>> Add a debugfs interface to manually trigger power management unit error
>> handler for testing cold reset recovery paths. This is useful for
>> validating the error recovery mechanism.
>>
>> The new debugfs entry 'trigger_punit_error' is located at:
>>    /sys/kernel/debug/dri/N/trigger_punit_error
>>
>> Reading the file displays usage instructions. Writing '1' invokes
>> xe_punit_error_handler(), which marks the device as wedged with
>> DRM_WEDGE_RECOVERY_COLD_RESET method and sends a uevent to userspace
>> indicating that a complete device power cycle is required for recovery.
>>
>> Writing '0' or any other false value has no effect.
>>
>> This interface is intended for development, testing, and validation
>> of power management unit error recovery code.
>
> Would fault injection be more appropriate here?

Here we need a deterministic way to invoke the Punit error handler to
test the cold-reset recovery flow end-to-end. With the debugfs interface,
we directly trigger the wedge/reset status via a debugfs write rather than
using fault injection.


Thanks,

-/Mallesh

>
> Thanks
> Riana
>
>> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_debugfs.c | 38 +++++++++++++++++++++++++++++++++
>>   1 file changed, 38 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c 
>> b/drivers/gpu/drm/xe/xe_debugfs.c
>> index 844cfafe1ec7..390bbed9c1af 100644
>> --- a/drivers/gpu/drm/xe/xe_debugfs.c
>> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
>> @@ -18,6 +18,7 @@
>>   #include "xe_gt_debugfs.h"
>>   #include "xe_gt_printk.h"
>>   #include "xe_guc_ads.h"
>> +#include "xe_hw_error.h"
>>   #include "xe_mmio.h"
>>   #include "xe_pm.h"
>>   #include "xe_psmi.h"
>> @@ -509,6 +510,40 @@ static const struct file_operations 
>> disable_late_binding_fops = {
>>       .write = disable_late_binding_set,
>>   };
>>   +static ssize_t trigger_punit_error_show(struct file *f, char 
>> __user *ubuf,
>> +                    size_t size, loff_t *pos)
>> +{
>> +    const char *msg = "Write 1 to trigger power management unit 
>> error handler\n";
>> +
>> +    return simple_read_from_buffer(ubuf, size, pos, msg, strlen(msg));
>> +}
>> +
>> +static ssize_t trigger_punit_error_set(struct file *f,
>> +                       const char __user *ubuf,
>> +                       size_t size, loff_t *pos)
>> +{
>> +    struct xe_device *xe = file_inode(f)->i_private;
>> +    bool trigger;
>> +    ssize_t ret;
>> +
>> +    ret = kstrtobool_from_user(ubuf, size, &trigger);
>> +    if (ret)
>> +        return ret;
>> +
>> +    if (trigger) {
>> +        xe_punit_error_handler(xe);
>> +        drm_info(&xe->drm, "PMU error handler triggered via 
>> debugfs\n");
>> +    }
>> +
>> +    return size;
>> +}
>> +
>> +static const struct file_operations trigger_punit_error_fops = {
>> +    .owner = THIS_MODULE,
>> +    .read = trigger_punit_error_show,
>> +    .write = trigger_punit_error_set,
>> +};
>> +
>>   void xe_debugfs_register(struct xe_device *xe)
>>   {
>>       struct ttm_device *bdev = &xe->ttm;
>> @@ -550,6 +585,9 @@ void xe_debugfs_register(struct xe_device *xe)
>>       debugfs_create_file("disable_late_binding", 0600, root, xe,
>>                   &disable_late_binding_fops);
>>   +    debugfs_create_file("trigger_punit_error", 0600, root, xe,
>> +                &trigger_punit_error_fops);
>> +
>>       /*
>>        * Don't expose page reclaim configuration file if not 
>> supported by the
>>        * hardware initially.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset
  2026-03-30  4:54   ` Tauro, Riana
@ 2026-03-30 13:50     ` Mallesh, Koujalagi
  0 siblings, 0 replies; 25+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-30 13:50 UTC (permalink / raw)
  To: Tauro, Riana, intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, karthik.poosa, sk.anirban,
	raag.jadav

Hi Riana,

On 30-03-2026 10:24 am, Tauro, Riana wrote:
>
> On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
>> This handler is designed to be called when power management unit 
>> errors are
>> detected that affect device-level state persisting across warm resets.
> Commit message should be clear.
>>   The
>> cold reset recovery method signals to userspace that only a complete 
>> device
>> power cycle can restore normal operation.
>>
>> v2:
>> - Add use case: Handling errors from power management unit,
>>    which requires a complete power cycle (cold reset)
>>    to recover. (Christian)
> This is not an addition to the patch. Only add this in cover letter.
>>
>> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_hw_error.c | 27 +++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_hw_error.h |  1 +
>>   drivers/gpu/drm/xe/xe_ras.c      |  3 ++-
>>   3 files changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c 
>> b/drivers/gpu/drm/xe/xe_hw_error.c
>> index 2a31b430570e..ca965a2b092c 100644
>> --- a/drivers/gpu/drm/xe/xe_hw_error.c
>> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
>> @@ -5,6 +5,7 @@
>>     #include <linux/bitmap.h>
>>   #include <linux/fault-inject.h>
>> +#include <drm/drm_drv.h>
>>     #include "regs/xe_gsc_regs.h"
>>   #include "regs/xe_hw_error_regs.h"
>> @@ -542,6 +543,32 @@ static void process_hw_errors(struct xe_device *xe)
>>       }
>>   }
>>   +/**
>> + * xe_punit_error_handler - Handler for power management unit errors
>> + * @xe: device instance
>> + *
>> + * Handles power management unit errors that affect the device and 
>> cannot
>> + * be recovered through driver reload, PCIe reset, etc.
>> + *
>> + * Marks the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET method
>> + * and notifies userspace that a complete device power cycle is 
>> required.
>> + */
>> +void xe_punit_error_handler(struct xe_device *xe)
>
> You can move this to xe_ras
Sure, will move to xe_ras.c in v3.
>
>> +{
>> +    drm_err(&xe->drm, "CRITICAL: PMU error detected\n");
>
> PMU? also means Performance monitoring unit. Please use Punit instead
> Keep this consistent in both commit message, kernel-doc and code
>
Good catch, will replace the "PMU" references with "Punit" across the
commit message, kernel-doc, etc.
>> +    drm_err(&xe->drm, "Recovery: Device cold reset required\n");
> power-cycle/cold-reset. Use consistent wording

Sure, will use "cold reset" consistently in v3.

Thanks,

-/Mallesh

> Thanks
> Riana
>
>> +
>> +    /* Set cold reset recovery method */
>> +    xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
>> +
>> +    if (xe_device_wedged(xe)) {
>> +        drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
>> +    } else {
>> +        /* Declare device wedged - will trigger uevent with cold 
>> reset method */
>> +        xe_device_declare_wedged(xe);
>> +    }
>> +}
>> +
>>   /**
>>    * xe_hw_error_init - Initialize hw errors
>>    * @xe: xe device instance
>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.h 
>> b/drivers/gpu/drm/xe/xe_hw_error.h
>> index d86e28c5180c..f588320eb94d 100644
>> --- a/drivers/gpu/drm/xe/xe_hw_error.h
>> +++ b/drivers/gpu/drm/xe/xe_hw_error.h
>> @@ -11,5 +11,6 @@ struct xe_tile;
>>   struct xe_device;
>>     void xe_hw_error_irq_handler(struct xe_tile *tile, const u32 
>> master_ctl);
>> +void xe_punit_error_handler(struct xe_device *xe);
>>   void xe_hw_error_init(struct xe_device *xe);
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index 777321021391..93257d0eaaa0 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -10,6 +10,7 @@
>>   #include "xe_survivability_mode.h"
>>   #include "xe_sysctrl_mailbox.h"
>>   #include "xe_sysctrl_mailbox_types.h"
>> +#include "xe_hw_error.h"
>>     #define COMPUTE_ERROR_SEVERITY_MASK        GENMASK(26, 25)
>>   #define GLOBAL_UNCORR_ERROR            2
>> @@ -148,7 +149,7 @@ static enum xe_ras_recovery_action 
>> handle_soc_internal_errors(struct xe_device *
>>               xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
>>                      severity_to_str(xe, common_info.severity),
>>                      ieh_error->error_sources_ieh0.punit);
>> -            /** TODO: Add PUNIT error handling */
>> +            xe_punit_error_handler(xe);
>>               action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
>>           }
>>       }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
  2026-03-30  5:00   ` Tauro, Riana
@ 2026-03-30 14:02     ` Mallesh, Koujalagi
  0 siblings, 0 replies; 25+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-30 14:02 UTC (permalink / raw)
  To: Tauro, Riana, intel-xe, dri-devel, rodrigo.vivi
  Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
	anshuman.gupta, badal.nilawar, karthik.poosa, sk.anirban,
	raag.jadav

Hi Riana,

On 30-03-2026 10:30 am, Tauro, Riana wrote:
>
> On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
>> Add documentation for the DRM_WEDGE_RECOVERY_COLD_RESET recovery
>> method introduced for handling power management unit errors. This 
>> method is
>> designated for severe errors that compromise core device functionality
>> and are unrecoverable via recovery mechanisms such as driver reload 
>> or PCIe
>> bus reset. The documentation clarifies when this recovery method 
>> should be
>> used and its implications for userspace applications.
>>
>> v2:
>> - Add several instead of number to avoid update. (Jani)
>>
>> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
>> ---
>>   Documentation/gpu/drm-uapi.rst | 73 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 72 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/gpu/drm-uapi.rst 
>> b/Documentation/gpu/drm-uapi.rst
>> index d98428a592f1..5b63f1c17b9b 100644
>> --- a/Documentation/gpu/drm-uapi.rst
>> +++ b/Documentation/gpu/drm-uapi.rst
>> @@ -418,7 +418,7 @@ needed.
>>   Recovery
>>   --------
>>   -Current implementation defines four recovery methods, out of 
>> which, drivers
>> +Current implementation defines several recovery methods, out of 
>> which, drivers
>>   can use any one, multiple or none. Method(s) of choice will be sent 
>> in the
>>   uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order 
>> of less to
>>   more side-effects. See the section `Vendor Specific Recovery`_
>> @@ -435,6 +435,7 @@ following expectations.
>>       rebind          unbind + bind driver
>>       bus-reset       unbind + bus reset/re-enumeration + bind
>>       vendor-specific vendor specific recovery method
>> +    cold-reset      full device cold reset required
>>       unknown         consumer policy
>>       =============== ========================================
>>   @@ -446,6 +447,27 @@ telemetry information (devcoredump, syslog). 
>> This is useful because the first
>>   hang is usually the most critical one which can result in 
>> consequential hangs or
>>   complete wedging.
>>   +Cold Reset Recovery
>> +-------------------
>> +
>> +The ``WEDGED=cold-reset`` event indicates that the device has 
>> encountered
>> +power management unit errors that affect core functionality that 
>> cannot be
>
> Power management errors may be only xe usecase.  Keep the 
> documentation vendor-agnostic.
> Some vendors may want to use it for a different usecase
>
>> +resolved through recovery mechanisms.
>> +
>> +This recovery method is reserved for power management unit error 
>> conditions where the
> Same as above
>
Agree, will rework!

Thanks,

-/Mallesh

> Thanks
> Riana
>
>
>> +device state cannot be restored via:
>> +
>> +- Driver unbind/rebind operations
>> +- PCIe bus reset and re-enumeration
>> +- Device Function Level Reset (FLR)
>> +- Warm device resets
>> +
>> +Such power management unit error state typically persists across all 
>> software-based
>> +recovery attempts. Only a complete device power cycle can restore
>> +normal operation.
>> +
>> +Upon receiving a ``WEDGED=cold-reset`` event, userspace should initiate
>> +a full cold reset of the affected device to restore functionality.
>>     Vendor Specific Recovery
>>   ------------------------
>> @@ -524,6 +546,55 @@ Recovery script::
>>       echo -n $DEVICE > $DRIVER/unbind
>>       echo -n $DEVICE > $DRIVER/bind
>>   +Example - cold-reset
>> +--------------------
>> +
>> +Udev rule::
>> +
>> +    SUBSYSTEM=="drm", ENV{WEDGED}=="cold-reset", 
>> DEVPATH=="*/drm/card[0-9]",
>> +    RUN+="/path/to/cold-reset.sh $env{DEVPATH}"
>> +
>> +Recovery script::
>> +
>> +    #!/bin/sh
>> +
>> +    [ -z "$1" ] && echo "Usage: $0 <device-path>" && exit 1
>> +
>> +    # Get device
>> +    DEVPATH=$(readlink -f /sys/$1/device 2>/dev/null || readlink -f 
>> /sys/$1)
>> +    DEVICE=$(basename $DEVPATH)
>> +
>> +    echo "Cold reset: $DEVICE"
>> +
>> +    # Try slot power reset first
>> +    SLOT=$(find /sys/bus/pci/slots/ -type l 2>/dev/null | while read 
>> slot; do
>> +        ADDR=$(cat "$slot" 2>/dev/null)
>> +        [ -n "$ADDR" ] && echo "$DEVICE" | grep -q "^$ADDR" && 
>> basename $(dirname "$slot") && break
>> +    done)
>> +
>> +    if [ -n "$SLOT" ]; then
>> +    echo "Using slot $SLOT"
>> +
>> +    # Unbind driver
>> +    [ -e "/sys/bus/pci/devices/$DEVICE/driver" ] && \
>> +    echo "$DEVICE" > /sys/bus/pci/devices/$DEVICE/driver/unbind 
>> 2>/dev/null
>> +
>> +    # Remove device
>> +    echo 1 > /sys/bus/pci/devices/$DEVICE/remove
>> +
>> +    # Power cycle slot
>> +    echo 0 > /sys/bus/pci/slots/$SLOT/power
>> +    sleep 2
>> +    echo 1 > /sys/bus/pci/slots/$SLOT/power
>> +    sleep 1
>> +
>> +    # Rescan
>> +    echo 1 > /sys/bus/pci/rescan
>> +    echo "Done!"
>> +    else
>> +    echo "No slot found"
>> +    fi
>> +
>>   Customization
>>   -------------

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
  2026-03-18  6:40 ` [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
  2026-03-30  5:00   ` Tauro, Riana
@ 2026-04-02  8:16   ` Raag Jadav
  2026-04-06 12:26     ` Mallesh, Koujalagi
  1 sibling, 1 reply; 25+ messages in thread
From: Raag Jadav @ 2026-04-02  8:16 UTC (permalink / raw)
  To: Mallesh Koujalagi
  Cc: intel-xe, dri-devel, rodrigo.vivi, andrealmeid, christian.koenig,
	airlied, simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	riana.tauro, karthik.poosa, sk.anirban

On Wed, Mar 18, 2026 at 12:10:20PM +0530, Mallesh Koujalagi wrote:
> Add documentation for the DRM_WEDGE_RECOVERY_COLD_RESET recovery
> method introduced for handling power management unit errors. This method is
> designated for severe errors that compromise core device functionality
> and are unrecoverable via recovery mechanisms such as driver reload or PCIe
> bus reset. The documentation clarifies when this recovery method should be
> used and its implications for userspace applications.

Aesthetic nit: We usually try to utilize the full 75 character space where
possible (in all patches).

> v2:
> - Add several instead of number to avoid update. (Jani)
> 
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
>  Documentation/gpu/drm-uapi.rst | 73 +++++++++++++++++++++++++++++++++-
>  1 file changed, 72 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index d98428a592f1..5b63f1c17b9b 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -418,7 +418,7 @@ needed.
>  Recovery
>  --------
>  
> -Current implementation defines four recovery methods, out of which, drivers
> +Current implementation defines several recovery methods, out of which, drivers
>  can use any one, multiple or none. Method(s) of choice will be sent in the
>  uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
>  more side-effects. See the section `Vendor Specific Recovery`_
> @@ -435,6 +435,7 @@ following expectations.
>      rebind          unbind + bind driver
>      bus-reset       unbind + bus reset/re-enumeration + bind
>      vendor-specific vendor specific recovery method
> +    cold-reset      full device cold reset required

Does this work without unbind + bind?

>      unknown         consumer policy
>      =============== ========================================
>  
> @@ -446,6 +447,27 @@ telemetry information (devcoredump, syslog). This is useful because the first
>  hang is usually the most critical one which can result in consequential hangs or
>  complete wedging.
>  
> +Cold Reset Recovery
> +-------------------
> +
> +The ``WEDGED=cold-reset`` event indicates that the device has encountered
> +power management unit errors that affect core functionality that cannot be
> +resolved through recovery mechanisms.

Here cold-reset itself is introduced as a recovery method, so this could
use better phrasing. Also, we try to be consistent with terminologies
already in place so that the document is easy to follow.

> +This recovery method is reserved for power management unit error conditions where the
> +device state cannot be restored via:
> +
> +- Driver unbind/rebind operations
> +- PCIe bus reset and re-enumeration
> +- Device Function Level Reset (FLR)
> +- Warm device resets

These are already documented so not sure if it's worth repeating them.

> +Such power management unit error state typically persists across all software-based
> +recovery attempts. Only a complete device power cycle can restore
> +normal operation.
> +
> +Upon receiving a ``WEDGED=cold-reset`` event, userspace should initiate
> +a full cold reset of the affected device to restore functionality.

This is already covered in consumer expectations so can be dropped.

AI assistance is useful at times but we'd want to make sure that things
remain objective and translatable ;)

Raag

>  Vendor Specific Recovery
>  ------------------------
> @@ -524,6 +546,55 @@ Recovery script::
>      echo -n $DEVICE > $DRIVER/unbind
>      echo -n $DEVICE > $DRIVER/bind
>  
> +Example - cold-reset
> +--------------------
> +
> +Udev rule::
> +
> +    SUBSYSTEM=="drm", ENV{WEDGED}=="cold-reset", DEVPATH=="*/drm/card[0-9]",
> +    RUN+="/path/to/cold-reset.sh $env{DEVPATH}"
> +
> +Recovery script::
> +
> +    #!/bin/sh
> +
> +    [ -z "$1" ] && echo "Usage: $0 <device-path>" && exit 1
> +
> +    # Get device
> +    DEVPATH=$(readlink -f /sys/$1/device 2>/dev/null || readlink -f /sys/$1)
> +    DEVICE=$(basename $DEVPATH)
> +
> +    echo "Cold reset: $DEVICE"
> +
> +    # Try slot power reset first
> +    SLOT=$(find /sys/bus/pci/slots/ -type l 2>/dev/null | while read slot; do
> +	    ADDR=$(cat "$slot" 2>/dev/null)
> +	    [ -n "$ADDR" ] && echo "$DEVICE" | grep -q "^$ADDR" && basename $(dirname "$slot") && break
> +    done)
> +
> +    if [ -n "$SLOT" ]; then
> +	echo "Using slot $SLOT"
> +
> +	# Unbind driver
> +	[ -e "/sys/bus/pci/devices/$DEVICE/driver" ] && \
> +	echo "$DEVICE" > /sys/bus/pci/devices/$DEVICE/driver/unbind 2>/dev/null
> +
> +	# Remove device
> +	echo 1 > /sys/bus/pci/devices/$DEVICE/remove
> +
> +	# Power cycle slot
> +	echo 0 > /sys/bus/pci/slots/$SLOT/power
> +	sleep 2
> +	echo 1 > /sys/bus/pci/slots/$SLOT/power
> +	sleep 1
> +
> +	# Rescan
> +	echo 1 > /sys/bus/pci/rescan
> +	echo "Done!"
> +    else
> +	echo "No slot found"
> +    fi
> +
>  Customization
>  -------------
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset
  2026-03-18  6:40 ` [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset Mallesh Koujalagi
  2026-03-30  4:54   ` Tauro, Riana
@ 2026-04-02  8:19   ` Raag Jadav
  1 sibling, 0 replies; 25+ messages in thread
From: Raag Jadav @ 2026-04-02  8:19 UTC (permalink / raw)
  To: Mallesh Koujalagi
  Cc: intel-xe, dri-devel, rodrigo.vivi, andrealmeid, christian.koenig,
	airlied, simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	riana.tauro, karthik.poosa, sk.anirban

On Wed, Mar 18, 2026 at 12:10:21PM +0530, Mallesh Koujalagi wrote:
> This handler is designed to be called when power management unit errors are
> detected that affect device-level state persisting across warm resets. The
> cold reset recovery method signals to userspace that only a complete device
> power cycle can restore normal operation.
> 
> v2:
> - Add use case: Handling errors from power management unit,
>   which requires a complete power cycle (cold reset)
>   to recover. (Christian)
> 
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_hw_error.c | 27 +++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_hw_error.h |  1 +
>  drivers/gpu/drm/xe/xe_ras.c      |  3 ++-
>  3 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> index 2a31b430570e..ca965a2b092c 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.c
> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
> @@ -5,6 +5,7 @@
>  
>  #include <linux/bitmap.h>
>  #include <linux/fault-inject.h>

Blank line to distinguish between linux and drm includes.

> +#include <drm/drm_drv.h>
>  
>  #include "regs/xe_gsc_regs.h"
>  #include "regs/xe_hw_error_regs.h"
> @@ -542,6 +543,32 @@ static void process_hw_errors(struct xe_device *xe)
>  	}
>  }
>  
> +/**
> + * xe_punit_error_handler - Handler for power management unit errors
> + * @xe: device instance
> + *
> + * Handles power management unit errors that affect the device and cannot
> + * be recovered through driver reload, PCIe reset, etc.
> + *
> + * Marks the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET method
> + * and notifies userspace that a complete device power cycle is required.
> + */
> +void xe_punit_error_handler(struct xe_device *xe)
> +{
> +	drm_err(&xe->drm, "CRITICAL: PMU error detected\n");
> +	drm_err(&xe->drm, "Recovery: Device cold reset required\n");

The caller is already printing these so we'd want to avoid dmesg spam.

> +	/* Set cold reset recovery method */

Comments are more useful for something that's not obvious from the code,
so we try to make the code as self documenting as possible.

> +	xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
> +
> +	if (xe_device_wedged(xe)) {

We should not be at this point if the device is already wedged, so I
believe this needs to be handled somewhere upstream.

> +		drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
> +	} else {
> +		/* Declare device wedged - will trigger uevent with cold reset method */

Same as above.

Raag

> +		xe_device_declare_wedged(xe);
> +	}
> +}
> +
>  /**
>   * xe_hw_error_init - Initialize hw errors
>   * @xe: xe device instance
> diff --git a/drivers/gpu/drm/xe/xe_hw_error.h b/drivers/gpu/drm/xe/xe_hw_error.h
> index d86e28c5180c..f588320eb94d 100644
> --- a/drivers/gpu/drm/xe/xe_hw_error.h
> +++ b/drivers/gpu/drm/xe/xe_hw_error.h
> @@ -11,5 +11,6 @@ struct xe_tile;
>  struct xe_device;
>  
>  void xe_hw_error_irq_handler(struct xe_tile *tile, const u32 master_ctl);
> +void xe_punit_error_handler(struct xe_device *xe);
>  void xe_hw_error_init(struct xe_device *xe);
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 777321021391..93257d0eaaa0 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -10,6 +10,7 @@
>  #include "xe_survivability_mode.h"
>  #include "xe_sysctrl_mailbox.h"
>  #include "xe_sysctrl_mailbox_types.h"
> +#include "xe_hw_error.h"
>  
>  #define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
>  #define GLOBAL_UNCORR_ERROR			2
> @@ -148,7 +149,7 @@ static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *
>  			xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
>  			       severity_to_str(xe, common_info.severity),
>  			       ieh_error->error_sources_ieh0.punit);
> -			/** TODO: Add PUNIT error handling */
> +			xe_punit_error_handler(xe);
>  			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
>  		}
>  	}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler
  2026-03-30 13:40     ` Mallesh, Koujalagi
@ 2026-04-02  8:31       ` Raag Jadav
  2026-04-06 12:49         ` Mallesh, Koujalagi
  0 siblings, 1 reply; 25+ messages in thread
From: Raag Jadav @ 2026-04-02  8:31 UTC (permalink / raw)
  To: Mallesh, Koujalagi
  Cc: Tauro, Riana, andrealmeid, christian.koenig, airlied,
	simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	karthik.poosa, sk.anirban, intel-xe, dri-devel, rodrigo.vivi

On Mon, Mar 30, 2026 at 07:10:33PM +0530, Mallesh, Koujalagi wrote:
> On 30-03-2026 10:25 am, Tauro, Riana wrote:
> > On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
> > > Add a debugfs interface to manually trigger power management unit error
> > > handler for testing cold reset recovery paths. This is useful for
> > > validating the error recovery mechanism.
> > > 
> > > The new debugfs entry 'trigger_punit_error' is located at:
> > >    /sys/kernel/debug/dri/N/trigger_punit_error
> > > 
> > > Reading the file displays usage instructions. Writing '1' invokes
> > > xe_punit_error_handler(), which marks the device as wedged with
> > > DRM_WEDGE_RECOVERY_COLD_RESET method and sends a uevent to userspace
> > > indicating that a complete device power cycle is required for recovery.
> > > 
> > > Writing '0' or any other false value has no effect.
> > > 
> > > This interface is intended for development, testing, and validation
> > > of power management unit error recovery code.
> > 
> > Would fault injection be more appropriate here?
> 
> Here we need a deterministic way to invoke the punit error handler to test
> the cold-reset
> 
> recovery flow end-to-end. Using debugfs interface, we directly triggers
> wedge/reset status via a debugfs write
> 
> rather than using fault injection.

I think the question from Riana was, since fault injection can provide
wider coverage of all the different kinds of error flows, would it make
more sense to reuse it for punit as well?

Raag

> > > Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_debugfs.c | 38 +++++++++++++++++++++++++++++++++
> > >   1 file changed, 38 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_debugfs.c
> > > b/drivers/gpu/drm/xe/xe_debugfs.c
> > > index 844cfafe1ec7..390bbed9c1af 100644
> > > --- a/drivers/gpu/drm/xe/xe_debugfs.c
> > > +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> > > @@ -18,6 +18,7 @@
> > >   #include "xe_gt_debugfs.h"
> > >   #include "xe_gt_printk.h"
> > >   #include "xe_guc_ads.h"
> > > +#include "xe_hw_error.h"
> > >   #include "xe_mmio.h"
> > >   #include "xe_pm.h"
> > >   #include "xe_psmi.h"
> > > @@ -509,6 +510,40 @@ static const struct file_operations
> > > disable_late_binding_fops = {
> > >       .write = disable_late_binding_set,
> > >   };
> > >   +static ssize_t trigger_punit_error_show(struct file *f, char
> > > __user *ubuf,
> > > +                    size_t size, loff_t *pos)
> > > +{
> > > +    const char *msg = "Write 1 to trigger power management unit
> > > error handler\n";
> > > +
> > > +    return simple_read_from_buffer(ubuf, size, pos, msg, strlen(msg));
> > > +}
> > > +
> > > +static ssize_t trigger_punit_error_set(struct file *f,
> > > +                       const char __user *ubuf,
> > > +                       size_t size, loff_t *pos)
> > > +{
> > > +    struct xe_device *xe = file_inode(f)->i_private;
> > > +    bool trigger;
> > > +    ssize_t ret;
> > > +
> > > +    ret = kstrtobool_from_user(ubuf, size, &trigger);
> > > +    if (ret)
> > > +        return ret;
> > > +
> > > +    if (trigger) {
> > > +        xe_punit_error_handler(xe);
> > > +        drm_info(&xe->drm, "PMU error handler triggered via
> > > debugfs\n");
> > > +    }
> > > +
> > > +    return size;
> > > +}
> > > +
> > > +static const struct file_operations trigger_punit_error_fops = {
> > > +    .owner = THIS_MODULE,
> > > +    .read = trigger_punit_error_show,
> > > +    .write = trigger_punit_error_set,
> > > +};
> > > +
> > >   void xe_debugfs_register(struct xe_device *xe)
> > >   {
> > >       struct ttm_device *bdev = &xe->ttm;
> > > @@ -550,6 +585,9 @@ void xe_debugfs_register(struct xe_device *xe)
> > >       debugfs_create_file("disable_late_binding", 0600, root, xe,
> > >                   &disable_late_binding_fops);
> > >   +    debugfs_create_file("trigger_punit_error", 0600, root, xe,
> > > +                &trigger_punit_error_fops);
> > > +
> > >       /*
> > >        * Don't expose page reclaim configuration file if not
> > > supported by the
> > >        * hardware initially.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
  2026-04-02  8:16   ` Raag Jadav
@ 2026-04-06 12:26     ` Mallesh, Koujalagi
  0 siblings, 0 replies; 25+ messages in thread
From: Mallesh, Koujalagi @ 2026-04-06 12:26 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, dri-devel, rodrigo.vivi, andrealmeid, christian.koenig,
	airlied, simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	riana.tauro, karthik.poosa, sk.anirban


On 02-04-2026 01:46 pm, Raag Jadav wrote:
> On Wed, Mar 18, 2026 at 12:10:20PM +0530, Mallesh Koujalagi wrote:
>> Add documentation for the DRM_WEDGE_RECOVERY_COLD_RESET recovery
>> method introduced for handling power management unit errors. This method is
>> designated for severe errors that compromise core device functionality
>> and are unrecoverable via recovery mechanisms such as driver reload or PCIe
>> bus reset. The documentation clarifies when this recovery method should be
>> used and its implications for userspace applications.
> Aesthetic nit: We usually try to utilize the full 75 character space where
> possible (in all patches).
>
>> v2:
>> - Add several instead of number to avoid update. (Jani)
>>
>> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
>> ---
>>   Documentation/gpu/drm-uapi.rst | 73 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 72 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
>> index d98428a592f1..5b63f1c17b9b 100644
>> --- a/Documentation/gpu/drm-uapi.rst
>> +++ b/Documentation/gpu/drm-uapi.rst
>> @@ -418,7 +418,7 @@ needed.
>>   Recovery
>>   --------
>>   
>> -Current implementation defines four recovery methods, out of which, drivers
>> +Current implementation defines several recovery methods, out of which, drivers
>>   can use any one, multiple or none. Method(s) of choice will be sent in the
>>   uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
>>   more side-effects. See the section `Vendor Specific Recovery`_
>> @@ -435,6 +435,7 @@ following expectations.
>>       rebind          unbind + bind driver
>>       bus-reset       unbind + bus reset/re-enumeration + bind
>>       vendor-specific vendor specific recovery method
>> +    cold-reset      full device cold reset required
> Does this work without unbind + bind?

No. With the cold-reset method, we do unbind + power cycle + rescan/bind.

Thanks,

-/Mallesh

>
>>       unknown         consumer policy
>>       =============== ========================================
>>   
>> @@ -446,6 +447,27 @@ telemetry information (devcoredump, syslog). This is useful because the first
>>   hang is usually the most critical one which can result in consequential hangs or
>>   complete wedging.
>>   
>> +Cold Reset Recovery
>> +-------------------
>> +
>> +The ``WEDGED=cold-reset`` event indicates that the device has encountered
>> +power management unit errors that affect core functionality that cannot be
>> +resolved through recovery mechanisms.
> Here cold-reset itself is introduced as a recovery method, so this could
> use better phrasing. Also, we try to be consistent with terminologies
> already in place so that the document is easy to follow.
>
>> +This recovery method is reserved for power management unit error conditions where the
>> +device state cannot be restored via:
>> +
>> +- Driver unbind/rebind operations
>> +- PCIe bus reset and re-enumeration
>> +- Device Function Level Reset (FLR)
>> +- Warm device resets
> These are already documented so not sure if it's worth repeating them.
>
>> +Such power management unit error state typically persists across all software-based
>> +recovery attempts. Only a complete device power cycle can restore
>> +normal operation.
>> +
>> +Upon receiving a ``WEDGED=cold-reset`` event, userspace should initiate
>> +a full cold reset of the affected device to restore functionality.
> This is already covered in consumer expectations so can be dropped.
>
> AI assistance is useful at times but we'd want to make sure that things
> remain objective and translatable ;)
>
> Raag
>
>>   Vendor Specific Recovery
>>   ------------------------
>> @@ -524,6 +546,55 @@ Recovery script::
>>       echo -n $DEVICE > $DRIVER/unbind
>>       echo -n $DEVICE > $DRIVER/bind
>>   
>> +Example - cold-reset
>> +--------------------
>> +
>> +Udev rule::
>> +
>> +    SUBSYSTEM=="drm", ENV{WEDGED}=="cold-reset", DEVPATH=="*/drm/card[0-9]",
>> +    RUN+="/path/to/cold-reset.sh $env{DEVPATH}"
>> +
>> +Recovery script::
>> +
>> +    #!/bin/sh
>> +
>> +    [ -z "$1" ] && echo "Usage: $0 <device-path>" && exit 1
>> +
>> +    # Get device
>> +    DEVPATH=$(readlink -f /sys/$1/device 2>/dev/null || readlink -f /sys/$1)
>> +    DEVICE=$(basename $DEVPATH)
>> +
>> +    echo "Cold reset: $DEVICE"
>> +
>> +    # Try slot power reset first
>> +    SLOT=$(find /sys/bus/pci/slots/ -type l 2>/dev/null | while read slot; do
>> +	    ADDR=$(cat "$slot" 2>/dev/null)
>> +	    [ -n "$ADDR" ] && echo "$DEVICE" | grep -q "^$ADDR" && basename $(dirname "$slot") && break
>> +    done)
>> +
>> +    if [ -n "$SLOT" ]; then
>> +	echo "Using slot $SLOT"
>> +
>> +	# Unbind driver
>> +	[ -e "/sys/bus/pci/devices/$DEVICE/driver" ] && \
>> +	echo "$DEVICE" > /sys/bus/pci/devices/$DEVICE/driver/unbind 2>/dev/null
>> +
>> +	# Remove device
>> +	echo 1 > /sys/bus/pci/devices/$DEVICE/remove
>> +
>> +	# Power cycle slot
>> +	echo 0 > /sys/bus/pci/slots/$SLOT/power
>> +	sleep 2
>> +	echo 1 > /sys/bus/pci/slots/$SLOT/power
>> +	sleep 1
>> +
>> +	# Rescan
>> +	echo 1 > /sys/bus/pci/rescan
>> +	echo "Done!"
>> +    else
>> +	echo "No slot found"
>> +    fi
>> +
>>   Customization
>>   -------------
>>   
>> -- 
>> 2.34.1
>>


* Re: [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler
  2026-04-02  8:31       ` Raag Jadav
@ 2026-04-06 12:49         ` Mallesh, Koujalagi
  0 siblings, 0 replies; 25+ messages in thread
From: Mallesh, Koujalagi @ 2026-04-06 12:49 UTC (permalink / raw)
  To: Raag Jadav
  Cc: Tauro, Riana, andrealmeid, christian.koenig, airlied,
	simona.vetter, mripard, anshuman.gupta, badal.nilawar,
	karthik.poosa, sk.anirban, intel-xe, dri-devel, rodrigo.vivi


On 02-04-2026 02:01 pm, Raag Jadav wrote:
> On Mon, Mar 30, 2026 at 07:10:33PM +0530, Mallesh, Koujalagi wrote:
>> On 30-03-2026 10:25 am, Tauro, Riana wrote:
>>> On 3/18/2026 12:10 PM, Mallesh Koujalagi wrote:
>>>> Add a debugfs interface to manually trigger power management unit error
>>>> handler for testing cold reset recovery paths. This is useful for
>>>> validating the error recovery mechanism.
>>>>
>>>> The new debugfs entry 'trigger_punit_error' is located at:
>>>>     /sys/kernel/debug/dri/N/trigger_punit_error
>>>>
>>>> Reading the file displays usage instructions. Writing '1' invokes
>>>> xe_punit_error_handler(), which marks the device as wedged with
>>>> DRM_WEDGE_RECOVERY_COLD_RESET method and sends a uevent to userspace
>>>> indicating that a complete device power cycle is required for recovery.
>>>>
>>>> Writing '0' or any other false value has no effect.
>>>>
>>>> This interface is intended for development, testing, and validation
>>>> of power management unit error recovery code.
>>> Would fault injection be more appropriate here?
>> Here we need a deterministic way to invoke the punit error handler to test
>> the cold-reset recovery flow end-to-end. Using the debugfs interface, we
>> directly trigger the wedge/reset status via a debugfs write rather than
>> using fault injection.
> I think the question from Riana was, since fault injection can provide
> wider coverage of all the different kinds of error flows, would it make
> more sense to reuse it for punit as well?

Thanks Riana/Raag!! will follow up.

Thanks

-/Mallesh

>
> Raag
>
>>>> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_debugfs.c | 38 +++++++++++++++++++++++++++++++++
>>>>    1 file changed, 38 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
>>>> index 844cfafe1ec7..390bbed9c1af 100644
>>>> --- a/drivers/gpu/drm/xe/xe_debugfs.c
>>>> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
>>>> @@ -18,6 +18,7 @@
>>>>    #include "xe_gt_debugfs.h"
>>>>    #include "xe_gt_printk.h"
>>>>    #include "xe_guc_ads.h"
>>>> +#include "xe_hw_error.h"
>>>>    #include "xe_mmio.h"
>>>>    #include "xe_pm.h"
>>>>    #include "xe_psmi.h"
>>>> @@ -509,6 +510,40 @@ static const struct file_operations disable_late_binding_fops = {
>>>>        .write = disable_late_binding_set,
>>>>    };
>>>>
>>>> +static ssize_t trigger_punit_error_show(struct file *f, char __user *ubuf,
>>>> +                    size_t size, loff_t *pos)
>>>> +{
>>>> +    const char *msg = "Write 1 to trigger power management unit error handler\n";
>>>> +
>>>> +    return simple_read_from_buffer(ubuf, size, pos, msg, strlen(msg));
>>>> +}
>>>> +
>>>> +static ssize_t trigger_punit_error_set(struct file *f,
>>>> +                       const char __user *ubuf,
>>>> +                       size_t size, loff_t *pos)
>>>> +{
>>>> +    struct xe_device *xe = file_inode(f)->i_private;
>>>> +    bool trigger;
>>>> +    ssize_t ret;
>>>> +
>>>> +    ret = kstrtobool_from_user(ubuf, size, &trigger);
>>>> +    if (ret)
>>>> +        return ret;
>>>> +
>>>> +    if (trigger) {
>>>> +        xe_punit_error_handler(xe);
>>>> +        drm_info(&xe->drm, "PMU error handler triggered via debugfs\n");
>>>> +    }
>>>> +
>>>> +    return size;
>>>> +}
>>>> +
>>>> +static const struct file_operations trigger_punit_error_fops = {
>>>> +    .owner = THIS_MODULE,
>>>> +    .read = trigger_punit_error_show,
>>>> +    .write = trigger_punit_error_set,
>>>> +};
>>>> +
>>>>    void xe_debugfs_register(struct xe_device *xe)
>>>>    {
>>>>        struct ttm_device *bdev = &xe->ttm;
>>>> @@ -550,6 +585,9 @@ void xe_debugfs_register(struct xe_device *xe)
>>>>        debugfs_create_file("disable_late_binding", 0600, root, xe,
>>>>                    &disable_late_binding_fops);
>>>>
>>>> +    debugfs_create_file("trigger_punit_error", 0600, root, xe,
>>>> +                &trigger_punit_error_fops);
>>>> +
>>>>        /*
>>>>         * Don't expose page reclaim configuration file if not supported by the
>>>>         * hardware initially.



Thread overview: 25+ messages
2026-03-18  6:40 [PATCH v2 0/5] Introduce cold reset recovery method Mallesh Koujalagi
2026-03-18  6:40 ` [PATCH v2 1/5] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-03-18 19:35   ` kernel test robot
2026-03-19 14:42   ` kernel test robot
2026-03-19 20:02   ` kernel test robot
2026-03-18  6:40 ` [PATCH v2 2/5] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET for power management unit error Mallesh Koujalagi
2026-03-30  5:26   ` Tauro, Riana
2026-03-18  6:40 ` [PATCH v2 3/5] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
2026-03-30  5:00   ` Tauro, Riana
2026-03-30 14:02     ` Mallesh, Koujalagi
2026-04-02  8:16   ` Raag Jadav
2026-04-06 12:26     ` Mallesh, Koujalagi
2026-03-18  6:40 ` [PATCH v2 4/5] drm/xe: Add handler for power management unit errors which require cold-reset Mallesh Koujalagi
2026-03-30  4:54   ` Tauro, Riana
2026-03-30 13:50     ` Mallesh, Koujalagi
2026-04-02  8:19   ` Raag Jadav
2026-03-18  6:40 ` [PATCH v2 5/5] drm/xe/debugfs: Add interface to trigger power management unit error handler Mallesh Koujalagi
2026-03-30  4:55   ` Tauro, Riana
2026-03-30 13:40     ` Mallesh, Koujalagi
2026-04-02  8:31       ` Raag Jadav
2026-04-06 12:49         ` Mallesh, Koujalagi
2026-03-18  6:49 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method Patchwork
2026-03-18  6:50 ` ✓ CI.KUnit: success " Patchwork
2026-03-18  7:33 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-19 20:20 ` ✓ Xe.CI.FULL: " Patchwork
