* [PATCH v9 0/6] Introduce cold reset recovery method
From: Mallesh Koujalagi @ 2026-06-22 13:14 UTC
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
This series builds on top of "Introduce Xe Uncorrectable Error Handling" [1]
and adds support for handling errors that require a complete
device power cycle (cold reset) to recover.
Certain error conditions leave the device in a persistent hardware
error state that cannot be cleared through existing recovery mechanisms
such as driver reload or PCIe reset. In these cases, functionality can
only be restored by performing a cold reset.
To support this, the series introduces a new DRM wedging recovery
method, DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)). When a device is wedged
with this method, the DRM core notifies userspace via a uevent that a cold
reset is required. This allows userspace to take appropriate action to
power-cycle the device.
Example uevent received:
SUBSYSTEM=drm
WEDGED=cold-reset
DEVPATH=/devices/.../drm/card0
Detailed description in commit message.
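As a rough illustration of the consumer side, the sketch below shows how a userspace listener might detect the cold-reset request in a uevent payload. It is not part of this series. It assumes the payload is a buffer of NUL-separated KEY=VALUE strings (as delivered on a kobject uevent netlink socket) and that WEDGED= may carry a comma-separated list of methods. The helper names wedged_has_method() and uevent_needs_cold_reset() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the comma-separated WEDGED= value contains exactly
 * the given method name (no prefix or substring matches).
 */
static bool wedged_has_method(const char *value, const char *method)
{
	size_t mlen = strlen(method);

	while (*value) {
		size_t len = strcspn(value, ",");

		if (len == mlen && strncmp(value, method, len) == 0)
			return true;
		value += len;
		if (*value == ',')
			value++;
	}
	return false;
}

/*
 * Walk a uevent payload of NUL-separated strings and report whether
 * the WEDGED key asks for a cold reset. An unterminated trailing
 * string is ignored rather than read past the end of the buffer.
 */
static bool uevent_needs_cold_reset(const char *buf, size_t len)
{
	static const char key[] = "WEDGED=";
	const size_t klen = sizeof(key) - 1;
	size_t off = 0;

	while (off < len) {
		const char *kv = buf + off;
		const char *end = memchr(kv, '\0', len - off);

		if (!end)
			break;
		if (strncmp(kv, key, klen) == 0)
			return wedged_has_method(kv + klen, "cold-reset");
		off += (size_t)(end - kv) + 1;
	}
	return false;
}
```

A listener would call uevent_needs_cold_reset() on each received message and, on a match, hand DEVPATH to whatever platform-specific mechanism performs the power cycle. That mechanism is outside the scope of this sketch.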
[1] https://patchwork.freedesktop.org/series/160482/
This patch series introduces a call to punit_error_handler() from
within handle_soc_internal_errors() when PUNIT errors are detected.
v2:
- Add use case: Handling errors from power management unit,
which requires a complete power cycle to
recover. (Christian)
- Add several instead of number to avoid update. (Jani)
v3:
- Update any scenario that requires cold-reset. (Riana)
- Update document with generic scenario. (Riana)
- Consistent with terminology. (Raag)
- Remove already covered information.
- Use PUNIT instead of PMU. (Riana)
- Use consistent wording.
- Remove log. (Raag)
v4:
- Rename cold reset to power cycle. (Raag)
- Update doc. (Raag/Riana)
- Change commit message. (Raag)
- Make function static. (Raag)
v5:
- Make it consistent with consumer expectations. (Raag)
- Update commit message.
- Remove unbind.
- Simplify cold-reset script.
- Remove kdoc for static function.
- Remove xe_ prefix for static function.
v6:
- Drop "last resort" wording. (Riana)
- Look up the hotplug slot in DEVPATH instead of scanning
every PCI slot on the system. (Raag)
- Drop arbitrary sleep values from the example script.
- Expand commit message to explain why SUR_DN is masked. (Raag/Riana)
- Check Slot Implemented bit before reading Slot Capabilities, per
PCIe spec. (Riana)
- Add debug log.
v7:
- Update recovery script. (Raag)
- Handle surprise link down event properly. (Aravind/Riana)
- Update commit message. (Riana)
- Correct log message.
v8:
- Add rescan instead of reset. (Raag)
- Use find_usp_dev() in punit_error_handler() function.
v9:
- Remove unwanted header. (Sashiko)
- Removed #ifdef CONFIG_PCIEAER. (Riana)
- Used pci_find_ext_capability() instead of usp->aer_cap.
- Clear the PCI_ERR_UNC_SURPDN status bit (W1C) after
reset complete. (Lukas Wunner)
- Use pci_clear_and_set_config_dword() helper.
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Mallesh Koujalagi (5):
drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
drm/xe: Handle PUNIT errors by requesting cold-reset recovery
drm/xe: Suppress Surprise Link Down on device
drm/xe/ras: Use fault-inject to trigger punit error handler
Riana Tauro (1):
Introduce Xe Uncorrectable Error Handling
Documentation/gpu/drm-uapi.rst | 85 ++-
drivers/gpu/drm/drm_drv.c | 2 +
drivers/gpu/drm/xe/Makefile | 1 +
drivers/gpu/drm/xe/xe_debugfs.c | 3 +
drivers/gpu/drm/xe/xe_device.c | 5 +-
drivers/gpu/drm/xe/xe_device.h | 27 +-
drivers/gpu/drm/xe/xe_device_types.h | 12 +-
drivers/gpu/drm/xe/xe_gt.c | 14 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 9 +-
drivers/gpu/drm/xe/xe_pci.c | 9 +
drivers/gpu/drm/xe/xe_pci_error.c | 135 +++++
drivers/gpu/drm/xe/xe_pci_error.h | 13 +
drivers/gpu/drm/xe/xe_ras.c | 543 +++++++++++++++++-
drivers/gpu/drm/xe/xe_ras.h | 3 +-
drivers/gpu/drm/xe/xe_ras_types.h | 231 +++++++-
drivers/gpu/drm/xe/xe_survivability_mode.c | 13 +-
drivers/gpu/drm/xe/xe_sysctrl_event.c | 2 +-
drivers/gpu/drm/xe/xe_sysctrl_event_types.h | 3 -
drivers/gpu/drm/xe/xe_sysctrl_mailbox.h | 1 -
drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 11 +
include/drm/drm_device.h | 1 +
21 files changed, 1085 insertions(+), 38 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.h
--
2.34.1
* [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling
From: Mallesh Koujalagi @ 2026-06-22 13:15 UTC
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
From: Riana Tauro <riana.tauro@intel.com>
DO NOT REVIEW. COMPILATION ONLY
This patch is from https://patchwork.freedesktop.org/series/160482/
Added only for Compilation.
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
drivers/gpu/drm/xe/Makefile | 1 +
drivers/gpu/drm/xe/xe_device.c | 5 +-
drivers/gpu/drm/xe/xe_device.h | 27 +-
drivers/gpu/drm/xe/xe_device_types.h | 12 +-
drivers/gpu/drm/xe/xe_gt.c | 14 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 9 +-
drivers/gpu/drm/xe/xe_pci.c | 9 +
drivers/gpu/drm/xe/xe_pci_error.c | 135 +++++
drivers/gpu/drm/xe/xe_pci_error.h | 13 +
drivers/gpu/drm/xe/xe_ras.c | 481 +++++++++++++++++-
drivers/gpu/drm/xe/xe_ras.h | 3 +-
drivers/gpu/drm/xe/xe_ras_types.h | 231 ++++++++-
drivers/gpu/drm/xe/xe_survivability_mode.c | 13 +-
drivers/gpu/drm/xe/xe_sysctrl_event.c | 2 +-
drivers/gpu/drm/xe/xe_sysctrl_event_types.h | 3 -
drivers/gpu/drm/xe/xe_sysctrl_mailbox.h | 1 -
drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 11 +
17 files changed, 933 insertions(+), 37 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.h
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 8e7b146880f4..3c001b2a4aec 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -101,6 +101,7 @@ xe-y += xe_bb.o \
xe_page_reclaim.o \
xe_pat.o \
xe_pci.o \
+ xe_pci_error.o \
xe_pci_rebar.o \
xe_pcode.o \
xe_pm.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index b6e49309a99f..46d27b6f3eea 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -918,7 +918,7 @@ static void xe_device_wedged_fini(struct drm_device *drm, void *arg)
{
struct xe_device *xe = arg;
- if (atomic_read(&xe->wedged.flag))
+ if (atomic_read(&xe->wedged.fini))
xe_pm_runtime_put(xe);
}
@@ -1425,7 +1425,8 @@ void xe_device_declare_wedged(struct xe_device *xe)
return;
}
- if (!atomic_xchg(&xe->wedged.flag, 1)) {
+ if (!atomic_xchg(&xe->wedged.fini, 1)) {
+ xe_device_wedged_get(xe);
xe->needs_flr_on_fini = true;
xe_pm_runtime_get_noresume(xe);
drm_err(&xe->drm,
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 975768a6a9c8..e177c05a7a95 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -181,6 +181,21 @@ static inline bool xe_device_has_mert(const struct xe_device *xe)
return xe->info.has_mert;
}
+static inline bool xe_device_is_in_reset(struct xe_device *xe)
+{
+ return atomic_read(&xe->in_reset);
+}
+
+static inline void xe_device_set_in_reset(struct xe_device *xe)
+{
+ atomic_set(&xe->in_reset, 1);
+}
+
+static inline void xe_device_clear_in_reset(struct xe_device *xe)
+{
+ atomic_set(&xe->in_reset, 0);
+}
+
u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
@@ -192,9 +207,19 @@ bool xe_device_is_l2_flush_optimized(struct xe_device *xe);
void xe_device_td_flush(struct xe_device *xe);
void xe_device_l2_flush(struct xe_device *xe);
+static inline void xe_device_wedged_get(struct xe_device *xe)
+{
+ atomic_inc(&xe->wedged.ref);
+}
+
+static inline void xe_device_wedged_put(struct xe_device *xe)
+{
+ atomic_dec(&xe->wedged.ref);
+}
+
static inline bool xe_device_wedged(struct xe_device *xe)
{
- return atomic_read(&xe->wedged.flag);
+ return atomic_read(&xe->wedged.ref);
}
void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 4e2f115f14e2..ddf7fd57da98 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -485,10 +485,15 @@ struct xe_device {
/** @needs_flr_on_fini: requests function-reset on fini */
bool needs_flr_on_fini;
+ /** @in_reset: Indicates if device is in reset */
+ atomic_t in_reset;
+
/** @wedged: Struct to control Wedged States and mode */
struct {
- /** @wedged.flag: Xe device faced a critical error and is now blocked. */
- atomic_t flag;
+ /** @wedged.fini: Needs cleanup on fini */
+ atomic_t fini;
+ /** @wedged.ref: Refcount for wedged device, blocks critical path execution */
+ atomic_t ref;
/** @wedged.mode: Mode controlled by kernel parameter and debugfs */
enum xe_wedged_mode mode;
/** @wedged.method: Recovery method to be sent in the drm device wedged uevent */
@@ -497,6 +502,9 @@ struct xe_device {
bool inconsistent_reset;
} wedged;
+ /** @devres_group_id: id for devres group */
+ void *devres_group_id;
+
/** @bo_device: Struct to control async free of BOs */
struct xe_bo_dev {
/** @bo_device.async_free: Free worker */
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 783eb6d631b5..d904527a8898 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -917,6 +917,9 @@ static void gt_reset_worker(struct work_struct *w)
if (xe_device_wedged(gt_to_xe(gt)))
goto err_pm_put;
+ if (xe_device_is_in_reset(gt_to_xe(gt)))
+ goto err_pm_put;
+
/* We only support GT resets with GuC submission */
if (!xe_device_uc_enabled(gt_to_xe(gt)))
goto err_pm_put;
@@ -977,18 +980,21 @@ static void gt_reset_worker(struct work_struct *w)
void xe_gt_reset_async(struct xe_gt *gt)
{
- xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
+ struct xe_device *xe = gt_to_xe(gt);
+
+ if (xe_device_is_in_reset(xe))
+ return;
/* Don't do a reset while one is already in flight */
if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))
return;
- xe_gt_info(gt, "reset queued\n");
+ xe_gt_info(gt, "reset queued from %ps\n", __builtin_return_address(0));
/* Pair with put in gt_reset_worker() if work is enqueued */
- xe_pm_runtime_get_noresume(gt_to_xe(gt));
+ xe_pm_runtime_get_noresume(xe);
if (!queue_work(gt->ordered_wq, &gt->reset.worker))
- xe_pm_runtime_put(gt_to_xe(gt));
+ xe_pm_runtime_put(xe);
}
void xe_gt_suspend_prepare(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 9458bf477fa6..12416bfa3255 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1532,7 +1532,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
* If devcoredump not captured and GuC capture for the job is not ready
* do manual capture first and decide later if we need to use it
*/
- if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
+ if (!xe_device_is_in_reset(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
!xe_guc_capture_get_matching_and_lock(q)) {
/* take force wake before engine register manual capture */
CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
@@ -1554,8 +1554,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
set_exec_queue_banned(q);
/* Kick job / queue off hardware */
- if (!wedged && (exec_queue_enabled(primary) ||
- exec_queue_pending_disable(primary))) {
+ if (!xe_device_is_in_reset(xe) && !wedged &&
+ (exec_queue_enabled(primary) || exec_queue_pending_disable(primary))) {
int ret;
if (exec_queue_reset(primary))
@@ -1623,7 +1623,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
trace_xe_sched_job_timedout(job);
- if (!exec_queue_killed(q))
+ /* Do not access device if in reset */
+ if (!xe_device_is_in_reset(xe) && !exec_queue_killed(q))
xe_devcoredump(q, job,
"Timedout job - seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 9c249454cc95..31097725b108 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -26,6 +26,7 @@
#include "xe_guc.h"
#include "xe_mmio.h"
#include "xe_module.h"
+#include "xe_pci_error.h"
#include "xe_pci_rebar.h"
#include "xe_pci_sriov.h"
#include "xe_pci_types.h"
@@ -1079,6 +1080,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
const struct xe_device_desc *desc = (const void *)ent->driver_data;
const struct xe_subplatform_desc *subplatform_desc;
struct xe_device *xe;
+ void *devres_id;
int err;
subplatform_desc = find_subplatform(desc, pdev->device);
@@ -1106,6 +1108,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (xe_display_driver_probe_defer(pdev))
return -EPROBE_DEFER;
+ devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
+ if (!devres_id)
+ return -ENOMEM;
+
err = pcim_enable_device(pdev);
if (err)
return err;
@@ -1114,6 +1120,8 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (IS_ERR(xe))
return PTR_ERR(xe);
+ xe->devres_group_id = devres_id;
+
pci_set_drvdata(pdev, &xe->drm);
xe_pm_assert_unbounded_bridge(xe);
@@ -1352,6 +1360,7 @@ static struct pci_driver xe_pci_driver = {
.remove = xe_pci_remove,
.shutdown = xe_pci_shutdown,
.sriov_configure = xe_pci_sriov_configure,
+ .err_handler = &xe_pci_error_handlers,
#ifdef CONFIG_PM_SLEEP
.driver.pm = &xe_pm_ops,
#endif
diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
new file mode 100644
index 000000000000..b08601f470d6
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <linux/pci.h>
+
+#include "xe_device.h"
+#include "xe_gt.h"
+#include "xe_pci.h"
+#include "xe_printk.h"
+#include "xe_ras.h"
+#include "xe_survivability_mode.h"
+
+static void prepare_device_for_reset(struct pci_dev *pdev)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+ struct xe_gt *gt;
+ u8 id;
+
+ xe_device_set_in_reset(xe);
+
+ /* Wedge the device to prevent userspace access during reset */
+ xe_device_wedged_get(xe);
+
+ for_each_gt(gt, xe, id)
+ xe_gt_declare_wedged(gt);
+
+ pci_disable_device(pdev);
+}
+
+static pci_ers_result_t ras_action_to_pci_result(struct pci_dev *pdev, u32 action)
+{
+ switch (action) {
+ case XE_RAS_RECOVERY_ACTION_RECOVERED:
+ return PCI_ERS_RESULT_RECOVERED;
+ case XE_RAS_RECOVERY_ACTION_RESET:
+ prepare_device_for_reset(pdev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ case XE_RAS_RECOVERY_ACTION_DISCONNECT:
+ return PCI_ERS_RESULT_DISCONNECT;
+ default:
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+}
+
+static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+
+ xe_err(xe, "PCI error: detected state = %u\n", state);
+
+ if (state == pci_channel_io_perm_failure)
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ /* If the device is already wedged or in survivability mode, do not attempt recovery */
+ if (xe_survivability_mode_is_boot_enabled(xe) || xe_device_wedged(xe))
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ switch (state) {
+ case pci_channel_io_normal:
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ prepare_device_for_reset(pdev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ default:
+ xe_err(xe, "PCI error: unknown state %d\n", state);
+ return PCI_ERS_RESULT_NEED_RESET;
+ }
+}
+
+static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+ enum xe_ras_recovery_action action;
+
+ xe_err(xe, "PCI error: MMIO enabled\n");
+
+ action = xe_ras_process_errors(xe);
+
+ return ras_action_to_pci_result(pdev, action);
+}
+
+static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
+{
+ const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+
+ xe_err(xe, "PCI error: slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (pci_enable_device(pdev)) {
+ xe_err(xe, "Cannot re-enable PCI device after reset\n");
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ /*
+ * Secondary Bus Reset causes all VRAM state to be lost along with
+ * hardware state. As an initial step, re-probe the device to
+ * re-initialize the driver and hardware.
+ * TODO: optimize by re-initializing only the hardware state and re-creating
+ * kernel BOs.
+ */
+ xe_device_clear_in_reset(xe);
+ pdev->driver->remove(pdev);
+ devres_release_group(&pdev->dev, xe->devres_group_id);
+
+ if (pdev->driver->probe(pdev, ent))
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ xe = pdev_to_xe_device(pdev);
+
+ /* Wedge the device to prevent I/O operations till the resume callback */
+ xe_device_wedged_get(xe);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void xe_pci_error_resume(struct pci_dev *pdev)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+
+ xe_err(xe, "PCI error: resume\n");
+
+ /* Resume I/O operations */
+ xe_device_wedged_put(xe);
+}
+
+const struct pci_error_handlers xe_pci_error_handlers = {
+ .error_detected = xe_pci_error_detected,
+ .mmio_enabled = xe_pci_error_mmio_enabled,
+ .slot_reset = xe_pci_error_slot_reset,
+ .resume = xe_pci_error_resume,
+};
diff --git a/drivers/gpu/drm/xe/xe_pci_error.h b/drivers/gpu/drm/xe/xe_pci_error.h
new file mode 100644
index 000000000000..725ad0214e62
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pci_error.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_PCI_ERROR_H_
+#define _XE_PCI_ERROR_H_
+
+struct pci_error_handlers;
+
+extern const struct pci_error_handlers xe_pci_error_handlers;
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 44f4e1a3455b..d6db0e98586f 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -3,17 +3,23 @@
* Copyright © 2026 Intel Corporation
*/
+#include "xe_assert.h"
+#include "xe_bo.h"
#include "xe_device.h"
#include "xe_drm_ras.h"
#include "xe_pm.h"
#include "xe_printk.h"
#include "xe_ras.h"
#include "xe_ras_types.h"
+#include "xe_survivability_mode.h"
#include "xe_sysctrl.h"
#include "xe_sysctrl_event_types.h"
#include "xe_sysctrl_mailbox.h"
#include "xe_sysctrl_mailbox_types.h"
+#define CORE_COMPUTE_UNCORR_TYPE GENMASK(26, 25)
+#define GLOBAL_UNCORR_ERROR 2
+
/* Severity of detected errors */
enum xe_ras_severity {
XE_RAS_SEV_NOT_SUPPORTED = 0,
@@ -35,17 +41,6 @@ enum xe_ras_component {
XE_RAS_COMP_MAX
};
-/* RAS response status codes */
-enum xe_ras_response_status {
- XE_RAS_STATUS_SUCCESS = 0,
- XE_RAS_STATUS_INVALID_PARAM,
- XE_RAS_STATUS_OP_NOT_SUPPORTED,
- XE_RAS_STATUS_TIMEOUT,
- XE_RAS_STATUS_HARDWARE_FAILURE,
- XE_RAS_STATUS_INSUFFICIENT_RESOURCES,
- XE_RAS_STATUS_MAX
-};
-
static const char *const xe_ras_severities[] = {
[XE_RAS_SEV_NOT_SUPPORTED] = "Not Supported",
[XE_RAS_SEV_CORRECTABLE] = "Correctable Error",
@@ -131,6 +126,305 @@ static inline const char *comp_to_str(u8 component)
return xe_ras_components[component];
}
+static int send_page_offline(struct xe_device *xe, enum xe_ras_page_action action, u64 page_address)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_page_offline_request request = {0};
+ struct xe_ras_page_offline_response response = {0};
+ size_t rlen;
+ int ret;
+
+ if (!xe->info.has_sysctrl)
+ return 0;
+
+ if (action >= XE_RAS_PAGE_ACTION_MAX) {
+ xe_err(xe, "[RAS]: Invalid page offline action %d\n", action);
+ return -EINVAL;
+ }
+
+ request.page_address = page_address;
+ request.action = action;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_PAGE_OFFLINE,
+ &request, sizeof(request), &response, sizeof(response));
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to send page offline command %d\n", ret);
+ return ret;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected page offline response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ return -EINVAL;
+ }
+
+ ret = ras_status_to_errno(response.status);
+ if (ret) {
+ xe_err(xe, "sysctrl: page offline command failed with status %d\n",
+ response.status);
+ }
+
+ return ret;
+}
+
+static int handle_page_offline(struct xe_device *xe, u64 page_address, bool send_offline_cmd)
+{
+ enum xe_ras_page_action action;
+ int ret = 0;
+
+ if (!IS_ALIGNED(page_address, XE_PAGE_SIZE)) {
+ xe_err(xe, "sysctrl: Unaligned page address: 0x%llx\n", page_address);
+ return -EINVAL;
+ }
+
+ /*
+ * TODO: Call function to handle address fault
+ * ret = xe_ttm_vram_handle_addr_fault(xe, page_address);
+ */
+
+ /*
+ * Handle return code from address fault handling function:
+ * 0: Address is valid and can be offlined
+ * -EIO: Address belongs to a critical BO that cannot be offlined
+ * -EOPNOTSUPP: Address is valid and can be offlined but user policy is not to offline
+ *
+ * For any other non-zero error code, skip offlining.
+ */
+
+ switch (ret) {
+ case 0:
+ action = XE_RAS_PAGE_ACTION_OFFLINE;
+ break;
+ /* User policy set to decline page offlining */
+ case -EOPNOTSUPP:
+ action = XE_RAS_PAGE_ACTION_DECLINE;
+ break;
+ case -EIO:
+ xe_err(xe, "[RAS]: Page address belongs to critical BO: 0x%llx\n",
+ page_address);
+ return ret;
+ default:
+ xe_err(xe, "[RAS]: Failed to handle address fault 0x%llx: %d\n",
+ page_address, ret);
+ return 0;
+ }
+
+ if (send_offline_cmd) {
+ ret = send_page_offline(xe, action, page_address);
+ if (ret)
+ xe_err(xe, "sysctrl: Failed to offline page for address 0x%llx: %d\n",
+ page_address, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static enum xe_ras_recovery_action handle_core_compute_errors(struct xe_ras_error_array *arr)
+{
+ struct xe_ras_compute_error *error_info = (void *)arr->details;
+ u8 uncorr_type;
+
+ uncorr_type = FIELD_GET(CORE_COMPUTE_UNCORR_TYPE, error_info->log_header);
+
+ /* Request a reset if error is global */
+ if (uncorr_type == GLOBAL_UNCORR_ERROR)
+ return XE_RAS_RECOVERY_ACTION_RESET;
+
+ /*
+ * No action needed for other errors.
+ * Local errors are recovered using an engine reset by GuC.
+ */
+ return XE_RAS_RECOVERY_ACTION_RECOVERED;
+}
+
+static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
+{
+ struct pci_dev *vsp;
+
+ /*
+ * Device Hierarchy:
+ *
+ * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
+ */
+ vsp = pci_upstream_bridge(pdev);
+ if (!vsp)
+ return NULL;
+
+ return pci_upstream_bridge(vsp);
+}
+
+static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
+ struct xe_ras_error_array *arr)
+{
+ struct xe_ras_soc_error *info = (void *)arr->details;
+ struct xe_ras_soc_error_source *source = &info->source;
+ struct xe_ras_error_class *counter = &arr->counter;
+
+ if (source->csc) {
+ struct xe_ras_csc_error *csc_error = (void *)info->details;
+
+ /*
+ * CSC uncorrectable errors are classified as hardware errors and firmware errors.
+ * CSC firmware errors are critical errors that can be recovered only by firmware
+ * update via SPI driver. On a CSC firmware error, PCODE enables FDO mode and sets
+ * the bit in the capability register. On receiving this error, the driver enables
+ * runtime survivability mode which notifies userspace that a firmware update
+ * is required.
+ */
+ if (csc_error->hec_fw_error) {
+ xe_err(xe, "[RAS]: CSC %s detected: 0x%x\n",
+ sev_to_str(counter->common.severity),
+ csc_error->hec_fw_error);
+ xe_survivability_mode_runtime_enable(xe);
+ return XE_RAS_RECOVERY_ACTION_DISCONNECT;
+ }
+ } else if (source->ieh) {
+ struct xe_ras_ieh_error *ieh_error = (void *)info->details;
+
+ if (ieh_error->global_error_status & XE_RAS_SOC_IEH_PUNIT) {
+ xe_err(xe, "[RAS]: PUNIT %s detected: 0x%x\n",
+ sev_to_str(counter->common.severity),
+ ieh_error->global_error_status);
+ /* TODO: Add PUNIT error handling */
+ return XE_RAS_RECOVERY_ACTION_DISCONNECT;
+ }
+ }
+
+ /* For other SOC internal errors, request a reset as recovery mechanism */
+ return XE_RAS_RECOVERY_ACTION_RESET;
+}
+
+static enum xe_ras_recovery_action handle_device_memory_errors(struct xe_device *xe,
+ struct xe_ras_error_array *arr)
+{
+ struct xe_ras_memory_error *info = (void *)arr->details;
+ int ret;
+
+ /*
+ * For memory errors, the recovery action depends on the error category
+ *
+ * Double bit ECC: Page offline handling
+ * Poison and data parity errors: Log only
+ * For any other memory errors, request a reset as recovery mechanism
+ */
+ switch (info->category) {
+ case XE_RAS_MEMORY_ECC:
+ xe_err(xe, "[RAS]: Double-bit ECC error detected at sw address 0x%llx\n",
+ info->sw_address);
+ ret = handle_page_offline(xe, info->sw_address, true);
+ if (ret)
+ return XE_RAS_RECOVERY_ACTION_RESET;
+ break;
+ case XE_RAS_MEMORY_POISON:
+ xe_info(xe, "[RAS]: Poison error detected\n");
+ break;
+ case XE_RAS_MEMORY_DATA_PARITY:
+ xe_info(xe, "[RAS]: Data parity error detected\n");
+ break;
+ default:
+ return XE_RAS_RECOVERY_ACTION_RESET;
+ }
+
+ return XE_RAS_RECOVERY_ACTION_RECOVERED;
+}
+
+static void get_queued_pages(struct xe_device *xe)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_page_offline_queue response = {0};
+ u32 count = 0;
+ size_t rlen;
+ int ret, i;
+
+ /* Supported only on platforms with system controller */
+ if (!xe->info.has_sysctrl)
+ return;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP,
+ XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE, NULL, 0, &response,
+ sizeof(response));
+
+ do {
+ memset(&response, 0, sizeof(response));
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to get page offline queue %d\n", ret);
+ return;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected page offline queue response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ return;
+ }
+
+ for (i = 0; i < response.pages_returned && i < XE_RAS_NUM_PAGES; i++)
+ handle_page_offline(xe, response.page_addresses[i], true);
+
+ count += response.pages_returned;
+ if (!response.pages_returned)
+ break;
+
+ if (count > response.total_pages) {
+ xe_err(xe, "sysctrl: Pages returned from queue exceed total pages %u, returned %u\n",
+ response.total_pages, count);
+ return;
+ }
+ } while (response.additional_data);
+}
+
+static void get_offlined_list(struct xe_device *xe)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_offline_list_response response = {0};
+ struct xe_ras_offline_list_request request = {0};
+ u32 count = 0;
+ size_t rlen;
+ int ret, i;
+
+ /* Supported only on platforms with system controller */
+ if (!xe->info.has_sysctrl)
+ return;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_GET_OFFLINE_LIST,
+ &request, sizeof(request), &response, sizeof(response));
+
+ do {
+ memset(&response, 0, sizeof(response));
+
+ request.index = count;
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to get page offline list %d\n", ret);
+ return;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected page offline list response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ return;
+ }
+
+ for (i = 0; i < response.pages_returned && i < XE_RAS_NUM_PAGES; i++)
+ handle_page_offline(xe, response.page_addresses[i], false);
+
+ count += response.pages_returned;
+ if (!response.pages_returned)
+ break;
+
+ if (count > response.total_pages) {
+ xe_err(xe, "sysctrl: Pages returned from list exceed total pages %u, returned %u\n",
+ response.total_pages, count);
+ return;
+ }
+ } while (response.additional_data);
+}
+
void xe_ras_counter_threshold_crossed(struct xe_device *xe,
struct xe_sysctrl_event_response *response)
{
@@ -270,16 +564,173 @@ int xe_ras_clear_counter(struct xe_device *xe, u8 severity, u8 component)
return 0;
}
+static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
+{
+ struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+ u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
+ struct pci_dev *usp;
+ u16 aer_cap;
+
+ usp = find_usp_dev(pdev);
+ if (!usp)
+ return;
+
+ aer_cap = pci_find_ext_capability(usp, PCI_EXT_CAP_ID_ERR);
+ if (!aer_cap) {
+ dev_info(&usp->dev, "AER capability unavailable\n");
+ return;
+ }
+
+ /*
+ * Clear any stale Uncorrectable Internal Error Status event in Uncorrectable Error
+ * Status Register.
+ */
+ pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, &aer_uncorr_status);
+ if (aer_uncorr_status & PCI_ERR_UNC_INTN)
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);
+
+ /*
+ * All errors are steered to USP which is a PCIe AER Compliant device.
+ * Downgrade all the errors to non-fatal to prevent PCIe bus driver
+ * from triggering a Secondary Bus Reset (SBR). This allows error
+ * detection, containment and recovery in the driver.
+ *
+ * The Uncorrectable Error Severity Register has the 'Uncorrectable
+ * Internal Error Severity' set to fatal by default. Set this to
+ * non-fatal and unmask the error.
+ */
+
+ /* Initialize Uncorrectable Error Severity Register */
+ pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
+ aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
+
+ /* Initialize Uncorrectable Error Mask Register */
+ pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
+ aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
+
+ pci_save_state(usp);
+ dev_dbg(&usp->dev, "Uncorrectable Internal Errors downgraded and unmasked\n");
+}
+
/**
* xe_ras_init - Initialize Xe RAS
* @xe: xe device instance
*
- * Register drm_ras nodes
+ * Initialize Xe RAS
*/
void xe_ras_init(struct xe_device *xe)
{
- if (!xe->info.has_drm_ras)
- return;
+ if (xe->info.has_sysctrl) {
+ if (IS_ENABLED(CONFIG_PCIEAER))
+ aer_unmask_and_downgrade_internal_error(xe);
+
+ get_queued_pages(xe);
+ get_offlined_list(xe);
+
+ /*
+ * During probe, process and log any errors detected by firmware
+ * while the driver was not loaded. Critical errors such as Punit
+ * and CSC are reported through Pcode init failure, causing the
+ * driver to enter survivability mode.
+ */
+ xe_ras_process_errors(xe);
+ }
+
+ if (xe->info.has_drm_ras)
+ xe_drm_ras_init(xe);
+}
+
+/**
+ * xe_ras_process_errors() - Process and contain hardware errors
+ * @xe: xe device instance
+ *
+ * Get error details from system controller and return recovery
+ * method. Called only from PCI error handling.
+ *
+ * Returns: recovery action to be taken
+ */
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_get_soc_error response;
+ enum xe_ras_recovery_action final_action;
+ u32 remaining = XE_SYSCTRL_FLOOD_LIMIT;
+ size_t rlen;
+ int ret;
- xe_drm_ras_init(xe);
+ if (!xe->info.has_sysctrl)
+ return XE_RAS_RECOVERY_ACTION_RESET;
+
+ /* Default action */
+ final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_GET_SOC_ERROR,
+ NULL, 0, &response, sizeof(response));
+
+ do {
+ memset(&response, 0, sizeof(response));
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to get soc error %d\n", ret);
+ goto err;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected get soc error response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ goto err;
+ }
+
+ /* Report if number of errors exceeds the maximum errors supported */
+ if (response.num_errors > XE_RAS_NUM_ERROR_ARR)
+ xe_err(xe, "sysctrl: number of errors received %d out of bound (%d)\n",
+ response.num_errors, XE_RAS_NUM_ERROR_ARR);
+
+ for (int i = 0; i < response.num_errors && i < XE_RAS_NUM_ERROR_ARR; i++) {
+ struct xe_ras_error_array *arr = &response.arr[i];
+ enum xe_ras_recovery_action action;
+ u8 component, severity;
+
+ component = arr->counter.common.component;
+ severity = arr->counter.common.severity;
+
+ xe_err(xe, "[RAS]: %s %s detected\n", comp_to_str(component),
+ sev_to_str(severity));
+
+ switch (component) {
+ case XE_RAS_COMP_CORE_COMPUTE:
+ action = handle_core_compute_errors(arr);
+ break;
+ case XE_RAS_COMP_SOC_INTERNAL:
+ action = handle_soc_internal_errors(xe, arr);
+ break;
+ case XE_RAS_COMP_DEVICE_MEMORY:
+ action = handle_device_memory_errors(xe, arr);
+ break;
+ default:
+ /* For any other component, reset */
+ action = XE_RAS_RECOVERY_ACTION_RESET;
+ break;
+ }
+
+ /* Process and log all errors and then trigger highest recovery action */
+ if (action > final_action)
+ final_action = action;
+ }
+
+ /* Treat flooding as a system controller error */
+ if (!--remaining) {
+ xe_err(xe, "[RAS]: sysctrl: get soc error response flooding\n");
+ return XE_RAS_RECOVERY_ACTION_RESET;
+ }
+
+ } while (response.additional_errors);
+
+ return final_action;
+
+err:
+ return XE_RAS_RECOVERY_ACTION_RESET;
}
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
index ba0b0224df23..b85657b1dd97 100644
--- a/drivers/gpu/drm/xe/xe_ras.h
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -6,7 +6,7 @@
#ifndef _XE_RAS_H_
#define _XE_RAS_H_
-#include <linux/types.h>
+#include "xe_ras_types.h"
struct xe_device;
struct xe_sysctrl_event_response;
@@ -16,5 +16,6 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
int xe_ras_get_counter(struct xe_device *xe, u8 severity, u8 component, u32 *value);
int xe_ras_clear_counter(struct xe_device *xe, u8 severity, u8 component);
void xe_ras_init(struct xe_device *xe);
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe);
#endif
diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
index 6688e11f57a8..a041a8375730 100644
--- a/drivers/gpu/drm/xe/xe_ras_types.h
+++ b/drivers/gpu/drm/xe/xe_ras_types.h
@@ -8,7 +8,67 @@
#include <linux/types.h>
+#define XE_RAS_NUM_ERROR_ARR 3
#define XE_RAS_NUM_COUNTERS 16
+#define XE_RAS_SOC_IEH_PUNIT BIT(1)
+/* Device memory error categories */
+#define XE_RAS_MEMORY_ECC BIT(1)
+#define XE_RAS_MEMORY_POISON BIT(2)
+#define XE_RAS_MEMORY_DATA_PARITY BIT(5)
+
+#define XE_RAS_NUM_PAGES 25
+
+/**
+ * enum xe_ras_recovery_action - RAS recovery actions
+ *
+ * @XE_RAS_RECOVERY_ACTION_RECOVERED: Error recovered
+ * @XE_RAS_RECOVERY_ACTION_RESET: Requires reset
+ * @XE_RAS_RECOVERY_ACTION_DISCONNECT: Requires disconnect
+ * @XE_RAS_RECOVERY_ACTION_MAX: Max action value
+ *
+ * This enum defines the possible recovery actions that can be taken in response
+ * to RAS errors.
+ */
+enum xe_ras_recovery_action {
+ XE_RAS_RECOVERY_ACTION_RECOVERED = 0,
+ XE_RAS_RECOVERY_ACTION_RESET,
+ XE_RAS_RECOVERY_ACTION_DISCONNECT,
+ XE_RAS_RECOVERY_ACTION_MAX
+};
+
+/**
+ * enum xe_ras_page_action - Page offline actions for page offline request
+ *
+ * @XE_RAS_PAGE_ACTION_OFFLINE: Instruct firmware to remove page from queue
+ * @XE_RAS_PAGE_ACTION_DECLINE: Instruct firmware to mark page as not offline
+ * @XE_RAS_PAGE_ACTION_MAX: Max value for validation
+ */
+enum xe_ras_page_action {
+ XE_RAS_PAGE_ACTION_OFFLINE,
+ XE_RAS_PAGE_ACTION_DECLINE,
+ XE_RAS_PAGE_ACTION_MAX
+};
+
+/**
+ * enum xe_ras_response_status - RAS response status codes
+ *
+ * @XE_RAS_STATUS_SUCCESS: Operation successful
+ * @XE_RAS_STATUS_INVALID_PARAM: Invalid parameter
+ * @XE_RAS_STATUS_OP_NOT_SUPPORTED: Operation not supported
+ * @XE_RAS_STATUS_TIMEOUT: Operation timed out
+ * @XE_RAS_STATUS_HARDWARE_FAILURE: Hardware failure
+ * @XE_RAS_STATUS_INSUFFICIENT_RESOURCES: Insufficient resources
+ * @XE_RAS_STATUS_UNKNOWN_ERROR: Unknown error
+ */
+enum xe_ras_response_status {
+ XE_RAS_STATUS_SUCCESS = 0,
+ XE_RAS_STATUS_INVALID_PARAM,
+ XE_RAS_STATUS_OP_NOT_SUPPORTED,
+ XE_RAS_STATUS_TIMEOUT,
+ XE_RAS_STATUS_HARDWARE_FAILURE,
+ XE_RAS_STATUS_INSUFFICIENT_RESOURCES,
+ XE_RAS_STATUS_UNKNOWN_ERROR
+};
/**
* struct xe_ras_error_common - Error fields that are common across all products
@@ -121,4 +181,173 @@ struct xe_ras_clear_counter_response {
/** @reserved1: Reserved for future use */
u32 reserved1[3];
} __packed;
-#endif
+
+/**
+ * struct xe_ras_error_array - Details of the error types
+ */
+struct xe_ras_error_array {
+ /** @counter_value: Counter value of the returned error */
+ u32 counter_value;
+ /** @counter: Error counter */
+ struct xe_ras_error_class counter;
+ /** @timestamp: Timestamp */
+ u64 timestamp;
+ /** @details: Error details specific to the counter */
+ u32 details[XE_RAS_NUM_COUNTERS];
+} __packed;
+
+/**
+ * struct xe_ras_get_soc_error - Response from get soc error command
+ */
+struct xe_ras_get_soc_error {
+ /** @num_errors: Number of errors reported in this response */
+ u8 num_errors;
+ /** @additional_errors: Indicates if additional errors are pending */
+ u8 additional_errors;
+ /** @arr: Array of up to 3 errors */
+ struct xe_ras_error_array arr[XE_RAS_NUM_ERROR_ARR];
+} __packed;
+
+/**
+ * struct xe_ras_compute_error - Error details of Core Compute error
+ */
+struct xe_ras_compute_error {
+ /** @log_header: Error Source and type */
+ u32 log_header;
+ /** @reserved: Reserved */
+ u32 reserved[15];
+} __packed;
+
+/**
+ * struct xe_ras_soc_error_source - Source of SoC error
+ */
+struct xe_ras_soc_error_source {
+ /** @csc: CSC */
+ u32 csc:1;
+ /** @ieh: IEH (Integrated Error Handler) */
+ u32 ieh:1;
+ /** @reserved: Reserved for future use */
+ u32 reserved:30;
+} __packed;
+
+/**
+ * struct xe_ras_soc_error - Error details of SoC internal error
+ */
+struct xe_ras_soc_error {
+ /** @source: Error source */
+ struct xe_ras_soc_error_source source;
+ /** @details: Error details specific to the error source */
+ u32 details[15];
+} __packed;
+
+/**
+ * struct xe_ras_csc_error - CSC error details
+ */
+struct xe_ras_csc_error {
+ /** @reserved: Reserved */
+ u32 reserved;
+ /** @hec_fw_error: CSC firmware error */
+ u32 hec_fw_error;
+} __packed;
+
+/**
+ * struct xe_ras_ieh_error - SoC IEH (Integrated Error Handler) error details
+ */
+struct xe_ras_ieh_error {
+ /** @ieh_instance: IEH instance */
+ u32 ieh_instance:2;
+ /** @reserved: Reserved for future use */
+ u32 reserved:30;
+ /** @global_error_status: Global error status */
+ u32 global_error_status;
+ /** @local_error_status: Local error status */
+ u32 local_error_status;
+ /** @gerr_mask: Global error mask */
+ u32 gerr_mask;
+ /** @info: Additional information */
+ u32 info[10];
+} __packed;
+
+/**
+ * struct xe_ras_memory_error - Device memory error details
+ */
+struct xe_ras_memory_error {
+ /** @category: Device memory error category */
+ u8 category;
+ /** @reserved: Reserved for future use */
+ u8 reserved[7];
+ /** @hardware_address: Hardware physical address details */
+ u64 hardware_address;
+ /** @sw_address: Software address where error occurred */
+ u64 sw_address;
+ /** @reserved1: Reserved */
+ u32 reserved1[10];
+} __packed;
+
+/**
+ * struct xe_ras_offline_list_request - Request for get offline list command
+ */
+struct xe_ras_offline_list_request {
+ /** @index: Zero-based index into the offline page list */
+ u32 index;
+} __packed;
+
+/**
+ * struct xe_ras_offline_list_response - Response from get offline list command
+ */
+struct xe_ras_offline_list_response {
+ /** @max_entries: Total number of pages that can be stored in flash */
+ u32 max_entries;
+ /** @total_pages: Total number of permanently offlined pages */
+ u32 total_pages;
+ /** @pages_returned: Number of pages returned in this response */
+ u32 pages_returned;
+ /** @page_addresses: Array of permanently offlined page addresses (4KB aligned) */
+ u64 page_addresses[XE_RAS_NUM_PAGES];
+ /** @additional_data: Indicates if more data is available */
+ u8 additional_data;
+ /** @reserved: Reserved for future use */
+ u8 reserved[3];
+} __packed;
+
+/**
+ * struct xe_ras_page_offline_queue - Response from get offline queue command
+ */
+struct xe_ras_page_offline_queue {
+ /** @total_pages: Total number of queued pages */
+ u32 total_pages;
+ /** @pages_returned: Number of pages returned in this response */
+ u32 pages_returned;
+ /** @page_addresses: Array of page addresses (4KB aligned) */
+ u64 page_addresses[XE_RAS_NUM_PAGES];
+ /** @additional_data: Indicates if more data is available */
+ u8 additional_data;
+ /** @reserved: Reserved for future use */
+ u8 reserved[3];
+} __packed;
+
+/**
+ * struct xe_ras_page_offline_request - Request for page offline command
+ *
+ * This structure provides the request format to offline/decline a page
+ */
+struct xe_ras_page_offline_request {
+ /** @page_address: Page address (4KB aligned) */
+ u64 page_address;
+ /** @action: Action to be performed, see &enum xe_ras_page_action */
+ u32 action;
+ /** @reserved: Reserved for future use */
+ u32 reserved;
+} __packed;
+
+/**
+ * struct xe_ras_page_offline_response - Response from page offline command
+ */
+struct xe_ras_page_offline_response {
+ /** @status: Status of the page offline request, see &enum xe_ras_response_status */
+ u32 status;
+ /** @reserved: Reserved for future use */
+ u32 reserved;
+} __packed;
+
+#endif /* _XE_RAS_TYPES_H_ */
diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
index 427afd144f3a..4c506027fa94 100644
--- a/drivers/gpu/drm/xe/xe_survivability_mode.c
+++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
@@ -54,7 +54,6 @@
* # cat /sys/bus/pci/devices/<device>/survivability_mode
* Boot
*
- *
* Any additional debug information if present will be visible under the directory
* ``survivability_info``::
*
@@ -98,6 +97,15 @@
* # cat /sys/bus/pci/devices/<device>/survivability_mode
* Runtime
*
+ * On some CSC firmware errors, PCODE sets FDO mode and the only recovery possible is through
+ * firmware flash using SPI driver. Userspace can check if FDO mode is set by checking the below
+ * sysfs entry.
+ *
+ * .. code-block:: shell
+ *
+ * # cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
+ * enabled
+ *
* When such errors occur, userspace is notified with the drm device wedged uevent and runtime
* survivability mode. User can then initiate a firmware flash using userspace tools like fwupd
* to restore device to normal operation.
@@ -296,7 +304,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
if (ret)
return ret;
- if (check_boot_failure(xe)) {
+ /* Survivability info is not required if enabled via configfs */
+ if (!xe_configfs_get_survivability_mode(pdev)) {
ret = devm_device_add_group(dev, &survivability_info_group);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_event.c b/drivers/gpu/drm/xe/xe_sysctrl_event.c
index b4d17329af6c..da395148ee9d 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_event.c
+++ b/drivers/gpu/drm/xe/xe_sysctrl_event.c
@@ -16,7 +16,7 @@ static void get_pending_event(struct xe_sysctrl *sc, struct xe_sysctrl_mailbox_c
{
struct xe_sysctrl_event_response *response = command->data_out;
struct xe_device *xe = sc_to_xe(sc);
- u32 count = XE_SYSCTRL_EVENT_FLOOD;
+ u32 count = XE_SYSCTRL_FLOOD_LIMIT;
size_t len;
int ret;
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_event_types.h b/drivers/gpu/drm/xe/xe_sysctrl_event_types.h
index c16c66b9fa7f..348768ca454a 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_event_types.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_event_types.h
@@ -10,9 +10,6 @@
#define XE_SYSCTRL_EVENT_DATA_LEN 59
-/* Modify as needed */
-#define XE_SYSCTRL_EVENT_FLOOD 16
-
/**
* enum xe_sysctrl_event - Events reported by System Controller
*
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
index fb434cc165b2..d34e9a9dbb0e 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
@@ -30,5 +30,4 @@ void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc);
int xe_sysctrl_send_command(struct xe_sysctrl *sc,
struct xe_sysctrl_mailbox_command *cmd,
size_t *rdata_len);
-
#endif
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
index 6e3753554510..5afd1ed16db2 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
@@ -22,14 +22,22 @@ enum xe_sysctrl_group {
/**
* enum xe_sysctrl_gfsp_cmd - Commands supported by GFSP group
*
+ * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Retrieve basic error information
* @XE_SYSCTRL_CMD_GET_COUNTER: Get error counter value
* @XE_SYSCTRL_CMD_CLEAR_COUNTER: Clear error counter value
* @XE_SYSCTRL_CMD_GET_PENDING_EVENT: Retrieve pending event
+ * @XE_SYSCTRL_CMD_PAGE_OFFLINE: Instruct firmware to offline/decline a page
+ * @XE_SYSCTRL_CMD_GET_OFFLINE_LIST: Retrieve list of all offlined pages from flash
+ * @XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE: Retrieve list of offlined queued pages from firmware
*/
enum xe_sysctrl_gfsp_cmd {
+ XE_SYSCTRL_CMD_GET_SOC_ERROR = 0x01,
XE_SYSCTRL_CMD_GET_COUNTER = 0x03,
XE_SYSCTRL_CMD_CLEAR_COUNTER = 0x04,
XE_SYSCTRL_CMD_GET_PENDING_EVENT = 0x07,
+ XE_SYSCTRL_CMD_PAGE_OFFLINE = 0x08,
+ XE_SYSCTRL_CMD_GET_OFFLINE_LIST = 0x09,
+ XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE = 0x0A,
};
/**
@@ -52,6 +60,9 @@ struct xe_sysctrl_mailbox_command {
size_t data_out_len;
};
+/* Modify as needed */
+#define XE_SYSCTRL_FLOOD_LIMIT 16
+
#define XE_SYSCTRL_MB_FRAME_SIZE 16
#define XE_SYSCTRL_MB_MAX_FRAMES 64
#define XE_SYSCTRL_MB_MAX_MESSAGE_SIZE \
--
2.34.1
* [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
@ 2026-06-22 13:15 ` Mallesh Koujalagi
2026-06-23 21:04 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 3/6] drm/doc: Document " Mallesh Koujalagi
` (7 subsequent siblings)
9 siblings, 1 reply; 16+ messages in thread
From: Mallesh Koujalagi @ 2026-06-22 13:15 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
Introduce DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)) recovery method to handle
scenarios requiring device power cycle.
This method addresses cases where other recovery mechanisms
(driver reload, PCIe reset, etc.) are insufficient to restore device
functionality. When set, it indicates to userspace that only device power
cycle can recover the device from its current error state.
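
As an illustration of how a consumer might see this method in the uevent,
the sketch below builds a ``WEDGED=<method1>[,..,<methodN>]`` string from a
recovery bitmask. It is a standalone userspace approximation, not the DRM
core implementation: the WEDGE_* macros and both helper names are
hypothetical, and only the bit values and method strings are taken from the
patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace mirror of the recovery bits; values follow the patch. */
#define WEDGE_REBIND     (1u << 1)
#define WEDGE_BUS_RESET  (1u << 2)
#define WEDGE_VENDOR     (1u << 3)
#define WEDGE_COLD_RESET (1u << 4)

/* Hypothetical helper: name of a single recovery bit, or NULL if unknown. */
static const char *wedge_method_name(unsigned int bit)
{
    switch (bit) {
    case WEDGE_REBIND:     return "rebind";
    case WEDGE_BUS_RESET:  return "bus-reset";
    case WEDGE_VENDOR:     return "vendor-specific";
    case WEDGE_COLD_RESET: return "cold-reset";
    default:               return NULL;
    }
}

/*
 * Hypothetical helper: format "WEDGED=<m1>[,<m2>...]" from a bitmask, lowest
 * bit first (less to more side-effects). Falls back to "unknown" when no
 * known method is set. Returns 0 on success, -1 if the buffer is too small.
 */
static int wedge_format(unsigned int mask, char *buf, size_t len)
{
    const size_t prefix_len = strlen("WEDGED=");
    size_t used;
    unsigned int bit;
    int n;

    assert(buf != NULL && len > 0);

    n = snprintf(buf, len, "WEDGED=");
    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    for (bit = 1u; bit <= WEDGE_COLD_RESET; bit <<= 1) {
        const char *name = (mask & bit) ? wedge_method_name(bit) : NULL;

        if (!name)
            continue;
        /* Separate entries with a comma once the first one is written. */
        n = snprintf(buf + used, len - used, "%s%s",
                     used > prefix_len ? "," : "", name);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }

    if (used == prefix_len) {
        n = snprintf(buf + used, len - used, "unknown");
        if (n < 0 || (size_t)n >= len - used)
            return -1;
    }
    return 0;
}
```

Calling wedge_format(WEDGE_COLD_RESET, buf, sizeof(buf)) in this sketch
yields the same ``WEDGED=cold-reset`` string shown in the cover letter.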
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v3:
- Update any scenario that requires cold-reset. (Riana)
v4:
- Rename cold reset to power cycle. (Raag)
v5:
- Make it consistent with consumer expectations. (Raag)
v6:
- Drop "last resort" wording. (Riana)
---
drivers/gpu/drm/drm_drv.c | 2 ++
include/drm/drm_device.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 1ff0bf7cba6a..94abe7a811a5 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -537,6 +537,8 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
return "bus-reset";
case DRM_WEDGE_RECOVERY_VENDOR:
return "vendor-specific";
+ case DRM_WEDGE_RECOVERY_COLD_RESET:
+ return "cold-reset";
default:
return NULL;
}
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 768a8dae83c5..75f030d027ee 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -37,6 +37,7 @@ struct pci_controller;
#define DRM_WEDGE_RECOVERY_REBIND BIT(1) /* unbind + bind driver */
#define DRM_WEDGE_RECOVERY_BUS_RESET BIT(2) /* unbind + reset bus device + bind */
#define DRM_WEDGE_RECOVERY_VENDOR BIT(3) /* vendor specific recovery method */
+#define DRM_WEDGE_RECOVERY_COLD_RESET BIT(4) /* remove device + slot power cycle + rescan */
/**
* struct drm_wedge_task_info - information about the guilty task of a wedge dev
--
2.34.1
* [PATCH v9 3/6] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
@ 2026-06-22 13:15 ` Mallesh Koujalagi
2026-06-23 21:11 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
` (6 subsequent siblings)
9 siblings, 1 reply; 16+ messages in thread
From: Mallesh Koujalagi @ 2026-06-22 13:15 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
When ``WEDGED=cold-reset`` is sent, it indicates that the device has
encountered an error condition that cannot be resolved through other
recovery methods such as driver rebind or bus reset, and requires a
complete device power cycle to restore functionality.
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v2:
- Add several instead of number to avoid update. (Jani)
v3:
- Update document with generic scenario. (Riana)
- Consistent with terminology. (Raag)
- Remove already covered information.
v4:
- Update doc. (Raag/Riana)
- Change commit message.
v5:
- Update commit message. (Raag)
- Remove unbind.
- Simplify cold-reset script.
v6:
- Look up the hotplug slot in DEVPATH instead of scanning
every PCI slot on the system. (Raag)
- Drop arbitrary sleep values from the example script.
v7:
- Update recovery script. (Raag)
v8:
- Add rescan instead of reset. (Raag)
---
Documentation/gpu/drm-uapi.rst | 85 +++++++++++++++++++++++++++++++++-
1 file changed, 84 insertions(+), 1 deletion(-)
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 93df92c4ac8c..43cdac65d04f 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -424,7 +424,7 @@ needed.
Recovery
--------
-Current implementation defines four recovery methods, out of which, drivers
+Current implementation defines several recovery methods, out of which, drivers
can use any one, multiple or none. Method(s) of choice will be sent in the
uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
more side-effects. See the section `Vendor Specific Recovery`_
@@ -441,6 +441,7 @@ following expectations.
rebind unbind + bind driver
bus-reset unbind + bus reset/re-enumeration + bind
vendor-specific vendor specific recovery method
+ cold-reset remove device + slot power cycle + rescan
unknown consumer policy
=============== ========================================
@@ -453,6 +454,17 @@ debug purpose in order to root cause the hang. This is useful because the first
hang is usually the most critical one which can result in consequential hangs
or complete wedging.
+Cold Reset Recovery
+-------------------
+
+When ``WEDGED=cold-reset`` is sent, it indicates that the device has
+encountered an error condition that cannot be resolved through other
+recovery methods such as driver rebind or bus reset, and requires a complete
+device power cycle to restore functionality.
+
+This method applies to devices that are plugged directly into a PCIe slot
+that supports removing power.
+
Vendor Specific Recovery
------------------------
@@ -530,6 +542,77 @@ Recovery script::
echo -n $DEVICE > $DRIVER/unbind
echo -n $DEVICE > $DRIVER/bind
+Example - cold-reset
+--------------------
+
+Udev rule::
+
+ SUBSYSTEM=="drm", ENV{WEDGED}=="cold-reset", DEVPATH=="*/drm/card[0-9]",
+ RUN+="/path/to/cold-reset.sh $env{DEVPATH}"
+
+Recovery script::
+
+ #!/bin/sh
+ die() { echo "ERROR: $*" >&2; exit 1; }
+
+ [ -n "$1" ] || die "Usage: $0 <device-path>"
+
+ PCI_DEVS=/sys/bus/pci/devices
+ PCI_SLOTS=/sys/bus/pci/slots
+
+ syspath=$(readlink -f "/sys/$1/device" 2>/dev/null || readlink -f "/sys/$1" 2>/dev/null)
+ [ -n "$syspath" ] || die "cannot resolve sysfs path for: $1"
+
+ dev=$(basename "$syspath")
+ [ -e "$PCI_DEVS/$dev" ] || die "not a PCI device: $dev"
+ echo "device : $dev"
+
+ slot=""
+ walk=$(dirname "$(readlink -f "$PCI_DEVS/$dev")")
+
+ while true; do
+ ancestor=$(basename "$walk")
+ case "$ancestor" in pci*) break ;; esac # reached the virtual bus root
+
+ ancestor_nofn=${ancestor%.*} # strip function: 0000:03:01.0 -> 0000:03:01
+
+ for f in "$PCI_SLOTS"/*/address; do
+ [ -f "$f" ] || continue
+ addr=$(cat "$f")
+ case "$ancestor_nofn" in
+ *"$addr") slot=$(basename "$(dirname "$f")"); break ;;
+ esac
+ done
+
+ if [ -n "$slot" ] && [ -e "$PCI_SLOTS/$slot/power" ]; then
+ echo "slot : $slot (port $ancestor)"
+ break
+ fi
+ slot=""
+ walk=$(dirname "$walk")
+ done
+
+ [ -n "$slot" ] || die "no hotplug slot with power control found in PCIe topology"
+
+ # Cold reset: remove the device, cut slot power, restore power, rescan.
+ echo "Removing $dev..."
+ [ -e "$PCI_DEVS/$dev" ] && echo 1 > "$PCI_DEVS/$dev/remove"
+
+ echo "Powering off slot $slot..."
+ echo 0 > "$PCI_SLOTS/$slot/power"
+
+ echo "Powering on slot $slot..."
+ echo 1 > "$PCI_SLOTS/$slot/power"
+
+ echo "Rescanning PCI bus..."
+ echo 1 > /sys/bus/pci/rescan
+
+ if [ -e "$PCI_DEVS/$dev" ]; then
+ echo "Done: $dev is back online."
+ else
+ echo "WARNING: $dev did not re-appear after rescan."
+ fi
+
Customization
-------------
--
2.34.1
* [PATCH v9 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (2 preceding siblings ...)
2026-06-22 13:15 ` [PATCH v9 3/6] drm/doc: Document " Mallesh Koujalagi
@ 2026-06-22 13:15 ` Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
` (5 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Mallesh Koujalagi @ 2026-06-22 13:15 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
When PUNIT (power management unit) errors are detected that persist across
warm resets, mark the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET
and notify userspace that a complete device power cycle is required to
restore normal operation.
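
For context, the PUNIT path returns XE_RAS_RECOVERY_ACTION_DISCONNECT, and
xe_ras_process_errors() from patch 1 keeps the highest action seen across
all reported errors. A minimal standalone sketch of that selection follows;
the enum and function names here are hypothetical stand-ins, and only the
ordering (recovered < reset < disconnect) is taken from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum xe_ras_recovery_action; same ordering. */
enum recovery_action {
    ACTION_RECOVERED = 0,
    ACTION_RESET,
    ACTION_DISCONNECT,
};

/*
 * Return the most severe action among the per-error results. With no
 * errors reported, the default stays ACTION_RECOVERED.
 */
static enum recovery_action highest_action(const enum recovery_action *actions,
                                           size_t n)
{
    enum recovery_action final_action = ACTION_RECOVERED;
    size_t i;

    assert(actions != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (actions[i] > final_action)
            final_action = actions[i];
    }
    return final_action;
}
```

Under this ordering a single PUNIT error is enough to make the overall
result a disconnect, regardless of what the other errors in the same
response require.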
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v3:
- Use PUNIT instead of PMU. (Riana)
- Use consistent wording.
- Remove log. (Raag)
v4:
- Make function static. (Raag)
v5:
- Remove kdoc for static function. (Raag)
- Remove xe_ prefix for static function.
v9:
- Remove unwanted header. (Sashiko)
---
drivers/gpu/drm/xe/xe_ras.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index d6db0e98586f..1eb2bbaccd9b 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -256,6 +256,12 @@ static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
return pci_upstream_bridge(vsp);
}
+static void punit_error_handler(struct xe_device *xe)
+{
+ xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
+ xe_device_declare_wedged(xe);
+}
+
static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
struct xe_ras_error_array *arr)
{
@@ -288,7 +294,7 @@ static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *
xe_err(xe, "[RAS]: PUNIT %s detected: 0x%x\n",
sev_to_str(counter->common.severity),
ieh_error->global_error_status);
- /* TODO: Add PUNIT error handling */
+ punit_error_handler(xe);
return XE_RAS_RECOVERY_ACTION_DISCONNECT;
}
}
--
2.34.1
* [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (3 preceding siblings ...)
2026-06-22 13:15 ` [PATCH v9 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
@ 2026-06-22 13:15 ` Mallesh Koujalagi
2026-06-23 22:47 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler Mallesh Koujalagi
` (4 subsequent siblings)
9 siblings, 1 reply; 16+ messages in thread
From: Mallesh Koujalagi @ 2026-06-22 13:15 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
PUNIT errors can only be recovered using a power-cycle. Xe KMD
sends a uevent to notify userspace to trigger a power cycle.
On some platforms, the link drop caused by powering the device off and
back on is reported by hardware as a Surprise Link Down (SLD), which
AER then escalates to an Uncorrectable Fatal Error. That error fires
before the device finishes coming back up and defeats the very
recovery being attempted.
To keep the expected, recovery-induced link drop from being raised as
a fatal AER event, mask the Surprise Link Down bit
(PCI_ERR_UNC_SURPDN) in the upstream port's AER Uncorrectable Error
Mask register before punit_error_handler() requests the cold reset.
After the reset finishes, clear the Surprise Down Error Status bit
in the Uncorrectable Error Status register.
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
v6:
- Expand commit message to explain why SUR_DN is masked. (Raag/Riana)
- Check Slot Implemented bit before reading Slot Capabilities, per
PCIe spec. (Riana)
- Add debug log.
v7:
- Handle surprise link down event properly. (Aravind/Riana)
- Update commit message. (Riana)
- Correct log message.
v8:
- Use find_usp_dev() in punit_error_handler() function.
v9:
- Removed #ifdef CONFIG_PCIEAER. (Riana)
- Used pci_find_ext_capability() instead of usp->aer_cap.
- Clear the PCI_ERR_UNC_SURPDN status bit (W1C) after
reset complete. (Lukas Wunner)
- Use pci_clear_and_set_config_dword() helper.
---
drivers/gpu/drm/xe/xe_ras.c | 60 +++++++++++++++++++++++++++++++------
1 file changed, 51 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 1eb2bbaccd9b..5a2fee5a1308 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -256,8 +256,37 @@ static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
return pci_upstream_bridge(vsp);
}
+static void pcie_suppress_surprise_link_down(struct pci_dev *usp)
+{
+ u16 aer_cap;
+
+ /*
+ * Cold reset power-cycles the slot, dropping the PCIe link.
+ * This triggers a spurious Surprise Link Down AER event on the
+ * Upstream Switch Port (USP). Mask this error to avoid false
+ * error reporting during recovery.
+ */
+ aer_cap = pci_find_ext_capability(usp, PCI_EXT_CAP_ID_ERR);
+ if (!aer_cap) {
+ dev_dbg(&usp->dev,
+ "AER capability not present\n");
+ return;
+ }
+
+ pci_clear_and_set_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, 0, PCI_ERR_UNC_SURPDN);
+ dev_dbg(&usp->dev, "Surprise Link Down masked for cold reset\n");
+}
+
static void punit_error_handler(struct xe_device *xe)
{
+ struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+ struct pci_dev *usp;
+
+ usp = find_usp_dev(pdev);
+
+ if (usp)
+ pcie_suppress_surprise_link_down(usp);
+
xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
xe_device_declare_wedged(xe);
}
@@ -573,7 +602,7 @@ int xe_ras_clear_counter(struct xe_device *xe, u8 severity, u8 component)
static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
{
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
- u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
+ u32 aer_uncorr_status;
struct pci_dev *usp;
u16 aer_cap;
@@ -595,6 +624,17 @@ static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
if (aer_uncorr_status & PCI_ERR_UNC_INTN)
pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);
+ /*
+ * Even though we masked the Surprise Link Down bit before the reset, the
+ * USP hardware still records the event in its status register. Since the
+ * USP itself is never power-cycled, that status bit survives the reset and
+ * stays set. If we unmask the error below without clearing the status first,
+ * it will immediately fire as a new AER error and undo the recovery we just
+ * completed. Write 1 to clear it (W1C) before unmasking.
+ */
+ if (aer_uncorr_status & PCI_ERR_UNC_SURPDN)
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_SURPDN);
+
/*
* All errors are steered to USP which is a PCIe AER Compliant device.
* Downgrade all the errors to non-fatal to prevent PCIe bus driver
@@ -604,20 +644,22 @@ static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
* The Uncorrectable Error Severity Register has the 'Uncorrectable
* Internal Error Severity' set to fatal by default. Set this to
* non-fatal and unmask the error.
+ *
+ * Also restore the Surprise Link Down mask that was set in
+ * pcie_suppress_surprise_link_down() before the cold reset. The USP is
+ * never power-cycled, so the mask bit persists and must be cleared here
+ * to ensure link-down events are reported normally going forward.
*/
/* Initialize Uncorrectable Error Severity Register */
- pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
- aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
- pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
+ pci_clear_and_set_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, PCI_ERR_UNC_INTN, 0);
- /* Initialize Uncorrectable Error Mask Register */
- pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
- aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
- pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
+ /* Unmask Uncorrectable Internal Error and restore Surprise Link Down to unmasked */
+ pci_clear_and_set_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK,
+ PCI_ERR_UNC_INTN | PCI_ERR_UNC_SURPDN, 0);
pci_save_state(usp);
- dev_dbg(&usp->dev, "Uncorrectable Internal Errors downgraded and unmasked\n");
+ dev_dbg(&usp->dev, "AER: INTN downgraded to non-fatal, INTN and SLD unmasked\n");
}
/**
--
2.34.1
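The two register idioms used in the patch above (a write-1-to-clear status write, followed by a clear-and-set on the severity and mask registers) can be modelled in plain user-space C. This is only a sketch: w1c_write() and clear_and_set() are hypothetical helpers that operate on ordinary integers, not on PCI config space, and the two bit values are quoted from include/uapi/linux/pci_regs.h as I understand them, so please verify them against your tree.

```c
#include <assert.h>
#include <stdint.h>

/* AER uncorrectable error bits; values assumed from pci_regs.h. */
#define PCI_ERR_UNC_SURPDN 0x00000020u /* Surprise Down */
#define PCI_ERR_UNC_INTN   0x00400000u /* Uncorrectable Internal Error */

/*
 * Hypothetical model of a write to a W1C status register: every bit set
 * in 'val' clears the matching status bit, zero bits are left untouched.
 */
static uint32_t w1c_write(uint32_t status, uint32_t val)
{
	return status & ~val;
}

/*
 * Hypothetical model of a read-modify-write on a plain RW register:
 * drop the 'clear' bits first, then OR in the 'set' bits.
 */
static uint32_t clear_and_set(uint32_t reg, uint32_t clear, uint32_t set)
{
	return (reg & ~clear) | set;
}
```

With this model, writing PCI_ERR_UNC_SURPDN to a status value that has both bits latched leaves only PCI_ERR_UNC_INTN set, and clearing both bits in a mask value unmasks both errors in one read-modify-write.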
* [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (4 preceding siblings ...)
2026-06-22 13:15 ` [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
@ 2026-06-22 13:15 ` Mallesh Koujalagi
2026-06-23 22:47 ` sashiko-bot
2026-06-22 17:20 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev9) Patchwork
` (3 subsequent siblings)
9 siblings, 1 reply; 16+ messages in thread
From: Mallesh Koujalagi @ 2026-06-22 13:15 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
Use the fault-injection framework to trigger punit_error_handler()
for testing.
Usage:
echo 100 > .../inject_punit_error/probability
echo 1 > .../inject_punit_error/times
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
drivers/gpu/drm/xe/xe_debugfs.c | 3 +++
drivers/gpu/drm/xe/xe_ras.c | 14 ++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 22b471303984..3a90deb5596c 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -40,6 +40,7 @@
DECLARE_FAULT_ATTR(gt_reset_failure);
DECLARE_FAULT_ATTR(inject_csc_hw_error);
+DECLARE_FAULT_ATTR(inject_punit_error);
static void read_residency_counter(struct xe_device *xe, struct xe_mmio *mmio,
u32 offset, const char *name, struct drm_printer *p)
@@ -612,6 +613,8 @@ void xe_debugfs_register(struct xe_device *xe)
fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
+ fault_create_debugfs_attr("inject_punit_error", root, &inject_punit_error);
+
if (IS_SRIOV_PF(xe))
xe_sriov_pf_debugfs_register(xe, root);
else if (IS_SRIOV_VF(xe))
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 5a2fee5a1308..bb8502472f73 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -3,6 +3,8 @@
* Copyright © 2026 Intel Corporation
*/
+#include <linux/fault-inject.h>
+
#include "xe_assert.h"
#include "xe_bo.h"
#include "xe_device.h"
@@ -690,6 +692,13 @@ void xe_ras_init(struct xe_device *xe)
xe_drm_ras_init(xe);
}
+extern struct fault_attr inject_punit_error;
+
+static bool fault_inject_punit_error(void)
+{
+ return IS_ENABLED(CONFIG_DEBUG_FS) && should_fail(&inject_punit_error, 1);
+}
+
/**
* xe_ras_process_errors() - Process and contain hardware errors
* @xe: xe device instance
@@ -708,6 +717,11 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
size_t rlen;
int ret;
+ if (fault_inject_punit_error()) {
+ punit_error_handler(xe);
+ return XE_RAS_RECOVERY_ACTION_DISCONNECT;
+ }
+
if (!xe->info.has_sysctrl)
return XE_RAS_RECOVERY_ACTION_RESET;
--
2.34.1
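To illustrate why the usage in the commit message sets both probability and times, here is a simplified user-space model of the gating that a fault attribute applies. It is a sketch under stated assumptions: struct fake_fault_attr and fake_should_fail() are hypothetical stand-ins, not the kernel's struct fault_attr or should_fail(), and the real implementation checks more conditions (interval, task filter, stack trace filters) that are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the kernel's fault attribute. */
struct fake_fault_attr {
	int probability; /* percent chance that a call injects a fault */
	int times;       /* injections left; -1 means unlimited */
};

/* Returns true when this call should inject a fault. */
static bool fake_should_fail(struct fake_fault_attr *attr)
{
	/* Budget exhausted: never inject again until it is refilled. */
	if (attr->times == 0)
		return false;

	/* Roll 0..99; inject only when the roll is below the probability. */
	if (attr->probability <= rand() % 100)
		return false;

	/* Consume one injection unless the budget is unlimited. */
	if (attr->times > 0)
		attr->times--;

	return true;
}
```

In this model, probability = 100 with times = 1 injects on the first call and never afterwards, which matches the intent of a one-shot trigger for the punit error path. A probability of 0 never injects, whatever the value of times.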
* ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev9)
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (5 preceding siblings ...)
2026-06-22 13:15 ` [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler Mallesh Koujalagi
@ 2026-06-22 17:20 ` Patchwork
2026-06-22 17:22 ` ✓ CI.KUnit: success " Patchwork
` (2 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2026-06-22 17:20 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
== Series Details ==
Series: Introduce cold reset recovery method (rev9)
URL : https://patchwork.freedesktop.org/series/163428/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
061140b9bc586ae7f40abc1249c97e1cc72d1b9d
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 403a6fca2ddf190e48f02cc2d4d55e5c003140a7
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Date: Mon Jun 22 18:45:05 2026 +0530
drm/xe/ras: Use fault-inject to trigger punit error handler
Use fault-inject framework to trigger punit_error_handler()
for testing.
Usage:
echo 100 > .../inject_punit_error/probability
echo 1 > .../inject_punit_error/times
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
+ /mt/dim checkpatch 6583dd200482a34bb17e5dc54551f91f53b79798 drm-intel
446f28f6ecf4 Introduce Xe Uncorrectable Error Handling
-:250: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#250:
new file mode 100644
total: 0 errors, 1 warnings, 0 checks, 1200 lines checked
cbba9802fc7f drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
b624e8063aee drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
5f4cb93d94ce drm/xe: Handle PUNIT errors by requesting cold-reset recovery
15486edae6dc drm/xe: Suppress Surprise Link Down on device
403a6fca2ddf drm/xe/ras: Use fault-inject to trigger punit error handler
* ✓ CI.KUnit: success for Introduce cold reset recovery method (rev9)
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (6 preceding siblings ...)
2026-06-22 17:20 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev9) Patchwork
@ 2026-06-22 17:22 ` Patchwork
2026-06-22 18:09 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-22 22:03 ` ✗ Xe.CI.FULL: failure " Patchwork
9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2026-06-22 17:22 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
== Series Details ==
Series: Introduce cold reset recovery method (rev9)
URL : https://patchwork.freedesktop.org/series/163428/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[17:20:46] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[17:20:51] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[17:21:22] Starting KUnit Kernel (1/1)...
[17:21:22] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[17:21:22] ================== guc_buf (11 subtests) ===================
[17:21:22] [PASSED] test_smallest
[17:21:22] [PASSED] test_largest
[17:21:22] [PASSED] test_granular
[17:21:22] [PASSED] test_unique
[17:21:22] [PASSED] test_overlap
[17:21:22] [PASSED] test_reusable
[17:21:22] [PASSED] test_too_big
[17:21:22] [PASSED] test_flush
[17:21:22] [PASSED] test_lookup
[17:21:22] [PASSED] test_data
[17:21:22] [PASSED] test_class
[17:21:22] ===================== [PASSED] guc_buf =====================
[17:21:22] =================== guc_dbm (7 subtests) ===================
[17:21:22] [PASSED] test_empty
[17:21:22] [PASSED] test_default
[17:21:22] ======================== test_size ========================
[17:21:22] [PASSED] 4
[17:21:22] [PASSED] 8
[17:21:22] [PASSED] 32
[17:21:22] [PASSED] 256
[17:21:22] ==================== [PASSED] test_size ====================
[17:21:22] ======================= test_reuse ========================
[17:21:22] [PASSED] 4
[17:21:22] [PASSED] 8
[17:21:22] [PASSED] 32
[17:21:22] [PASSED] 256
[17:21:22] =================== [PASSED] test_reuse ====================
[17:21:22] =================== test_range_overlap ====================
[17:21:22] [PASSED] 4
[17:21:22] [PASSED] 8
[17:21:22] [PASSED] 32
[17:21:22] [PASSED] 256
[17:21:22] =============== [PASSED] test_range_overlap ================
[17:21:22] =================== test_range_compact ====================
[17:21:22] [PASSED] 4
[17:21:22] [PASSED] 8
[17:21:22] [PASSED] 32
[17:21:22] [PASSED] 256
[17:21:22] =============== [PASSED] test_range_compact ================
[17:21:22] ==================== test_range_spare =====================
[17:21:22] [PASSED] 4
[17:21:22] [PASSED] 8
[17:21:22] [PASSED] 32
[17:21:22] [PASSED] 256
[17:21:22] ================ [PASSED] test_range_spare =================
[17:21:22] ===================== [PASSED] guc_dbm =====================
[17:21:22] =================== guc_idm (6 subtests) ===================
[17:21:22] [PASSED] bad_init
[17:21:22] [PASSED] no_init
[17:21:22] [PASSED] init_fini
[17:21:22] [PASSED] check_used
[17:21:22] [PASSED] check_quota
[17:21:22] [PASSED] check_all
[17:21:22] ===================== [PASSED] guc_idm =====================
[17:21:22] ================== no_relay (3 subtests) ===================
[17:21:22] [PASSED] xe_drops_guc2pf_if_not_ready
[17:21:22] [PASSED] xe_drops_guc2vf_if_not_ready
[17:21:22] [PASSED] xe_rejects_send_if_not_ready
[17:21:22] ==================== [PASSED] no_relay =====================
[17:21:22] ================== pf_relay (14 subtests) ==================
[17:21:22] [PASSED] pf_rejects_guc2pf_too_short
[17:21:22] [PASSED] pf_rejects_guc2pf_too_long
[17:21:22] [PASSED] pf_rejects_guc2pf_no_payload
[17:21:22] [PASSED] pf_fails_no_payload
[17:21:22] [PASSED] pf_fails_bad_origin
[17:21:22] [PASSED] pf_fails_bad_type
[17:21:22] [PASSED] pf_txn_reports_error
[17:21:22] [PASSED] pf_txn_sends_pf2guc
[17:21:22] [PASSED] pf_sends_pf2guc
[17:21:22] [SKIPPED] pf_loopback_nop
[17:21:22] [SKIPPED] pf_loopback_echo
[17:21:22] [SKIPPED] pf_loopback_fail
[17:21:22] [SKIPPED] pf_loopback_busy
[17:21:22] [SKIPPED] pf_loopback_retry
[17:21:22] ==================== [PASSED] pf_relay =====================
[17:21:22] ================== vf_relay (3 subtests) ===================
[17:21:22] [PASSED] vf_rejects_guc2vf_too_short
[17:21:22] [PASSED] vf_rejects_guc2vf_too_long
[17:21:22] [PASSED] vf_rejects_guc2vf_no_payload
[17:21:22] ==================== [PASSED] vf_relay =====================
[17:21:22] ================ pf_gt_config (9 subtests) =================
[17:21:22] [PASSED] fair_contexts_1vf
[17:21:22] [PASSED] fair_doorbells_1vf
[17:21:22] [PASSED] fair_ggtt_1vf
[17:21:22] ====================== fair_vram_1vf ======================
[17:21:22] [PASSED] 3.50 GiB
[17:21:22] [PASSED] 11.5 GiB
[17:21:22] [PASSED] 15.5 GiB
[17:21:22] [PASSED] 31.5 GiB
[17:21:22] [PASSED] 63.5 GiB
[17:21:22] [PASSED] 1.91 GiB
[17:21:22] ================== [PASSED] fair_vram_1vf ==================
[17:21:22] ================ fair_vram_1vf_admin_only =================
[17:21:22] [PASSED] 3.50 GiB
[17:21:22] [PASSED] 11.5 GiB
[17:21:22] [PASSED] 15.5 GiB
[17:21:22] [PASSED] 31.5 GiB
[17:21:22] [PASSED] 63.5 GiB
[17:21:22] [PASSED] 1.91 GiB
[17:21:22] ============ [PASSED] fair_vram_1vf_admin_only =============
[17:21:22] ====================== fair_contexts ======================
[17:21:22] [PASSED] 1 VF
[17:21:22] [PASSED] 2 VFs
[17:21:22] [PASSED] 3 VFs
[17:21:22] [PASSED] 4 VFs
[17:21:22] [PASSED] 5 VFs
[17:21:22] [PASSED] 6 VFs
[17:21:22] [PASSED] 7 VFs
[17:21:22] [PASSED] 8 VFs
[17:21:22] [PASSED] 9 VFs
[17:21:22] [PASSED] 10 VFs
[17:21:22] [PASSED] 11 VFs
[17:21:22] [PASSED] 12 VFs
[17:21:22] [PASSED] 13 VFs
[17:21:22] [PASSED] 14 VFs
[17:21:22] [PASSED] 15 VFs
[17:21:22] [PASSED] 16 VFs
[17:21:22] [PASSED] 17 VFs
[17:21:22] [PASSED] 18 VFs
[17:21:22] [PASSED] 19 VFs
[17:21:22] [PASSED] 20 VFs
[17:21:22] [PASSED] 21 VFs
[17:21:22] [PASSED] 22 VFs
[17:21:22] [PASSED] 23 VFs
[17:21:22] [PASSED] 24 VFs
[17:21:22] [PASSED] 25 VFs
[17:21:22] [PASSED] 26 VFs
[17:21:22] [PASSED] 27 VFs
[17:21:22] [PASSED] 28 VFs
[17:21:22] [PASSED] 29 VFs
[17:21:22] [PASSED] 30 VFs
[17:21:22] [PASSED] 31 VFs
[17:21:22] [PASSED] 32 VFs
[17:21:22] [PASSED] 33 VFs
[17:21:22] [PASSED] 34 VFs
[17:21:22] [PASSED] 35 VFs
[17:21:22] [PASSED] 36 VFs
[17:21:22] [PASSED] 37 VFs
[17:21:22] [PASSED] 38 VFs
[17:21:22] [PASSED] 39 VFs
[17:21:22] [PASSED] 40 VFs
[17:21:22] [PASSED] 41 VFs
[17:21:22] [PASSED] 42 VFs
[17:21:22] [PASSED] 43 VFs
[17:21:22] [PASSED] 44 VFs
[17:21:22] [PASSED] 45 VFs
[17:21:22] [PASSED] 46 VFs
[17:21:22] [PASSED] 47 VFs
[17:21:22] [PASSED] 48 VFs
[17:21:22] [PASSED] 49 VFs
[17:21:22] [PASSED] 50 VFs
[17:21:22] [PASSED] 51 VFs
[17:21:22] [PASSED] 52 VFs
[17:21:22] [PASSED] 53 VFs
[17:21:22] [PASSED] 54 VFs
[17:21:22] [PASSED] 55 VFs
[17:21:22] [PASSED] 56 VFs
[17:21:22] [PASSED] 57 VFs
[17:21:22] [PASSED] 58 VFs
[17:21:22] [PASSED] 59 VFs
[17:21:22] [PASSED] 60 VFs
[17:21:22] [PASSED] 61 VFs
[17:21:22] [PASSED] 62 VFs
[17:21:22] [PASSED] 63 VFs
[17:21:22] ================== [PASSED] fair_contexts ==================
[17:21:22] ===================== fair_doorbells ======================
[17:21:22] [PASSED] 1 VF
[17:21:22] [PASSED] 2 VFs
[17:21:22] [PASSED] 3 VFs
[17:21:22] [PASSED] 4 VFs
[17:21:22] [PASSED] 5 VFs
[17:21:22] [PASSED] 6 VFs
[17:21:22] [PASSED] 7 VFs
[17:21:22] [PASSED] 8 VFs
[17:21:22] [PASSED] 9 VFs
[17:21:22] [PASSED] 10 VFs
[17:21:22] [PASSED] 11 VFs
[17:21:22] [PASSED] 12 VFs
[17:21:22] [PASSED] 13 VFs
[17:21:22] [PASSED] 14 VFs
[17:21:22] [PASSED] 15 VFs
[17:21:22] [PASSED] 16 VFs
[17:21:22] [PASSED] 17 VFs
[17:21:22] [PASSED] 18 VFs
[17:21:22] [PASSED] 19 VFs
[17:21:22] [PASSED] 20 VFs
[17:21:22] [PASSED] 21 VFs
[17:21:22] [PASSED] 22 VFs
[17:21:22] [PASSED] 23 VFs
[17:21:22] [PASSED] 24 VFs
[17:21:22] [PASSED] 25 VFs
[17:21:22] [PASSED] 26 VFs
[17:21:22] [PASSED] 27 VFs
[17:21:22] [PASSED] 28 VFs
[17:21:22] [PASSED] 29 VFs
[17:21:22] [PASSED] 30 VFs
[17:21:22] [PASSED] 31 VFs
[17:21:22] [PASSED] 32 VFs
[17:21:22] [PASSED] 33 VFs
[17:21:22] [PASSED] 34 VFs
[17:21:22] [PASSED] 35 VFs
[17:21:22] [PASSED] 36 VFs
[17:21:22] [PASSED] 37 VFs
[17:21:22] [PASSED] 38 VFs
[17:21:22] [PASSED] 39 VFs
[17:21:22] [PASSED] 40 VFs
[17:21:22] [PASSED] 41 VFs
[17:21:22] [PASSED] 42 VFs
[17:21:22] [PASSED] 43 VFs
[17:21:22] [PASSED] 44 VFs
[17:21:22] [PASSED] 45 VFs
[17:21:22] [PASSED] 46 VFs
[17:21:22] [PASSED] 47 VFs
[17:21:22] [PASSED] 48 VFs
[17:21:22] [PASSED] 49 VFs
[17:21:22] [PASSED] 50 VFs
[17:21:22] [PASSED] 51 VFs
[17:21:22] [PASSED] 52 VFs
[17:21:22] [PASSED] 53 VFs
[17:21:22] [PASSED] 54 VFs
[17:21:22] [PASSED] 55 VFs
[17:21:22] [PASSED] 56 VFs
[17:21:22] [PASSED] 57 VFs
[17:21:22] [PASSED] 58 VFs
[17:21:22] [PASSED] 59 VFs
[17:21:22] [PASSED] 60 VFs
[17:21:22] [PASSED] 61 VFs
[17:21:22] [PASSED] 62 VFs
[17:21:22] [PASSED] 63 VFs
[17:21:22] ================= [PASSED] fair_doorbells ==================
[17:21:22] ======================== fair_ggtt ========================
[17:21:22] [PASSED] 1 VF
[17:21:22] [PASSED] 2 VFs
[17:21:22] [PASSED] 3 VFs
[17:21:22] [PASSED] 4 VFs
[17:21:22] [PASSED] 5 VFs
[17:21:22] [PASSED] 6 VFs
[17:21:22] [PASSED] 7 VFs
[17:21:22] [PASSED] 8 VFs
[17:21:22] [PASSED] 9 VFs
[17:21:22] [PASSED] 10 VFs
[17:21:22] [PASSED] 11 VFs
[17:21:22] [PASSED] 12 VFs
[17:21:22] [PASSED] 13 VFs
[17:21:22] [PASSED] 14 VFs
[17:21:22] [PASSED] 15 VFs
[17:21:22] [PASSED] 16 VFs
[17:21:22] [PASSED] 17 VFs
[17:21:22] [PASSED] 18 VFs
[17:21:22] [PASSED] 19 VFs
[17:21:22] [PASSED] 20 VFs
[17:21:22] [PASSED] 21 VFs
[17:21:22] [PASSED] 22 VFs
[17:21:22] [PASSED] 23 VFs
[17:21:22] [PASSED] 24 VFs
[17:21:22] [PASSED] 25 VFs
[17:21:22] [PASSED] 26 VFs
[17:21:22] [PASSED] 27 VFs
[17:21:22] [PASSED] 28 VFs
[17:21:22] [PASSED] 29 VFs
[17:21:22] [PASSED] 30 VFs
[17:21:22] [PASSED] 31 VFs
[17:21:22] [PASSED] 32 VFs
[17:21:22] [PASSED] 33 VFs
[17:21:22] [PASSED] 34 VFs
[17:21:22] [PASSED] 35 VFs
[17:21:22] [PASSED] 36 VFs
[17:21:22] [PASSED] 37 VFs
[17:21:22] [PASSED] 38 VFs
[17:21:22] [PASSED] 39 VFs
[17:21:22] [PASSED] 40 VFs
[17:21:22] [PASSED] 41 VFs
[17:21:22] [PASSED] 42 VFs
[17:21:22] [PASSED] 43 VFs
[17:21:22] [PASSED] 44 VFs
[17:21:22] [PASSED] 45 VFs
[17:21:22] [PASSED] 46 VFs
[17:21:22] [PASSED] 47 VFs
[17:21:22] [PASSED] 48 VFs
[17:21:22] [PASSED] 49 VFs
[17:21:22] [PASSED] 50 VFs
[17:21:22] [PASSED] 51 VFs
[17:21:22] [PASSED] 52 VFs
[17:21:22] [PASSED] 53 VFs
[17:21:22] [PASSED] 54 VFs
[17:21:22] [PASSED] 55 VFs
[17:21:22] [PASSED] 56 VFs
[17:21:22] [PASSED] 57 VFs
[17:21:22] [PASSED] 58 VFs
[17:21:22] [PASSED] 59 VFs
[17:21:22] [PASSED] 60 VFs
[17:21:22] [PASSED] 61 VFs
[17:21:22] [PASSED] 62 VFs
[17:21:22] [PASSED] 63 VFs
[17:21:22] ==================== [PASSED] fair_ggtt ====================
[17:21:22] ======================== fair_vram ========================
[17:21:22] [PASSED] 1 VF
[17:21:22] [PASSED] 2 VFs
[17:21:22] [PASSED] 3 VFs
[17:21:22] [PASSED] 4 VFs
[17:21:22] [PASSED] 5 VFs
[17:21:22] [PASSED] 6 VFs
[17:21:22] [PASSED] 7 VFs
[17:21:22] [PASSED] 8 VFs
[17:21:22] [PASSED] 9 VFs
[17:21:22] [PASSED] 10 VFs
[17:21:22] [PASSED] 11 VFs
[17:21:22] [PASSED] 12 VFs
[17:21:22] [PASSED] 13 VFs
[17:21:22] [PASSED] 14 VFs
[17:21:22] [PASSED] 15 VFs
[17:21:22] [PASSED] 16 VFs
[17:21:22] [PASSED] 17 VFs
[17:21:22] [PASSED] 18 VFs
[17:21:22] [PASSED] 19 VFs
[17:21:22] [PASSED] 20 VFs
[17:21:22] [PASSED] 21 VFs
[17:21:22] [PASSED] 22 VFs
[17:21:22] [PASSED] 23 VFs
[17:21:22] [PASSED] 24 VFs
[17:21:22] [PASSED] 25 VFs
[17:21:22] [PASSED] 26 VFs
[17:21:22] [PASSED] 27 VFs
[17:21:22] [PASSED] 28 VFs
[17:21:22] [PASSED] 29 VFs
[17:21:22] [PASSED] 30 VFs
[17:21:22] [PASSED] 31 VFs
[17:21:22] [PASSED] 32 VFs
[17:21:22] [PASSED] 33 VFs
[17:21:22] [PASSED] 34 VFs
[17:21:22] [PASSED] 35 VFs
[17:21:22] [PASSED] 36 VFs
[17:21:22] [PASSED] 37 VFs
[17:21:22] [PASSED] 38 VFs
[17:21:22] [PASSED] 39 VFs
[17:21:22] [PASSED] 40 VFs
[17:21:22] [PASSED] 41 VFs
[17:21:22] [PASSED] 42 VFs
[17:21:22] [PASSED] 43 VFs
[17:21:22] [PASSED] 44 VFs
[17:21:22] [PASSED] 45 VFs
[17:21:22] [PASSED] 46 VFs
[17:21:22] [PASSED] 47 VFs
[17:21:22] [PASSED] 48 VFs
[17:21:22] [PASSED] 49 VFs
[17:21:22] [PASSED] 50 VFs
[17:21:22] [PASSED] 51 VFs
[17:21:22] [PASSED] 52 VFs
[17:21:22] [PASSED] 53 VFs
[17:21:22] [PASSED] 54 VFs
[17:21:22] [PASSED] 55 VFs
[17:21:22] [PASSED] 56 VFs
[17:21:22] [PASSED] 57 VFs
[17:21:22] [PASSED] 58 VFs
[17:21:22] [PASSED] 59 VFs
[17:21:22] [PASSED] 60 VFs
[17:21:22] [PASSED] 61 VFs
[17:21:22] [PASSED] 62 VFs
[17:21:22] [PASSED] 63 VFs
[17:21:22] ==================== [PASSED] fair_vram ====================
[17:21:22] ================== [PASSED] pf_gt_config ===================
[17:21:22] ===================== lmtt (1 subtest) =====================
[17:21:22] ======================== test_ops =========================
[17:21:22] [PASSED] 2-level
[17:21:22] [PASSED] multi-level
[17:21:22] ==================== [PASSED] test_ops =====================
[17:21:22] ====================== [PASSED] lmtt =======================
[17:21:22] ================= pf_service (11 subtests) =================
[17:21:22] [PASSED] pf_negotiate_any
[17:21:22] [PASSED] pf_negotiate_base_match
[17:21:22] [PASSED] pf_negotiate_base_newer
[17:21:22] [PASSED] pf_negotiate_base_next
[17:21:22] [SKIPPED] pf_negotiate_base_older
[17:21:22] [PASSED] pf_negotiate_base_prev
[17:21:22] [PASSED] pf_negotiate_latest_match
[17:21:22] [PASSED] pf_negotiate_latest_newer
[17:21:22] [PASSED] pf_negotiate_latest_next
[17:21:22] [SKIPPED] pf_negotiate_latest_older
[17:21:22] [SKIPPED] pf_negotiate_latest_prev
[17:21:22] =================== [PASSED] pf_service ====================
[17:21:22] ================= xe_guc_g2g (2 subtests) ==================
[17:21:22] ============== xe_live_guc_g2g_kunit_default ==============
[17:21:22] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[17:21:22] ============== xe_live_guc_g2g_kunit_allmem ===============
[17:21:22] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[17:21:22] =================== [SKIPPED] xe_guc_g2g ===================
[17:21:22] =================== xe_mocs (2 subtests) ===================
[17:21:22] ================ xe_live_mocs_kernel_kunit ================
[17:21:22] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[17:21:22] ================ xe_live_mocs_reset_kunit =================
[17:21:22] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[17:21:22] ==================== [SKIPPED] xe_mocs =====================
[17:21:22] ================= xe_migrate (2 subtests) ==================
[17:21:22] ================= xe_migrate_sanity_kunit =================
[17:21:22] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[17:21:22] ================== xe_validate_ccs_kunit ==================
[17:21:22] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[17:21:22] =================== [SKIPPED] xe_migrate ===================
[17:21:22] ================== xe_dma_buf (1 subtest) ==================
[17:21:22] ==================== xe_dma_buf_kunit =====================
[17:21:22] ================ [SKIPPED] xe_dma_buf_kunit ================
[17:21:22] =================== [SKIPPED] xe_dma_buf ===================
[17:21:22] ================= xe_bo_shrink (1 subtest) =================
[17:21:22] =================== xe_bo_shrink_kunit ====================
[17:21:22] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[17:21:22] ================== [SKIPPED] xe_bo_shrink ==================
[17:21:22] ==================== xe_bo (2 subtests) ====================
[17:21:22] ================== xe_ccs_migrate_kunit ===================
[17:21:22] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[17:21:22] ==================== xe_bo_evict_kunit ====================
[17:21:22] =============== [SKIPPED] xe_bo_evict_kunit ================
[17:21:22] ===================== [SKIPPED] xe_bo ======================
[17:21:22] ==================== args (13 subtests) ====================
[17:21:22] [PASSED] count_args_test
[17:21:22] [PASSED] call_args_example
[17:21:22] [PASSED] call_args_test
[17:21:22] [PASSED] drop_first_arg_example
[17:21:22] [PASSED] drop_first_arg_test
[17:21:22] [PASSED] first_arg_example
[17:21:22] [PASSED] first_arg_test
[17:21:22] [PASSED] last_arg_example
[17:21:22] [PASSED] last_arg_test
[17:21:22] [PASSED] pick_arg_example
[17:21:22] [PASSED] if_args_example
[17:21:22] [PASSED] if_args_test
[17:21:22] [PASSED] sep_comma_example
[17:21:22] ====================== [PASSED] args =======================
[17:21:22] =================== xe_pci (3 subtests) ====================
[17:21:22] ==================== check_graphics_ip ====================
[17:21:22] [PASSED] 12.00 Xe_LP
[17:21:22] [PASSED] 12.10 Xe_LP+
[17:21:22] [PASSED] 12.55 Xe_HPG
[17:21:22] [PASSED] 12.60 Xe_HPC
[17:21:22] [PASSED] 12.70 Xe_LPG
[17:21:22] [PASSED] 12.71 Xe_LPG
[17:21:22] [PASSED] 12.74 Xe_LPG+
[17:21:22] [PASSED] 20.01 Xe2_HPG
[17:21:22] [PASSED] 20.02 Xe2_HPG
[17:21:22] [PASSED] 20.04 Xe2_LPG
[17:21:22] [PASSED] 30.00 Xe3_LPG
[17:21:22] [PASSED] 30.01 Xe3_LPG
[17:21:22] [PASSED] 30.03 Xe3_LPG
[17:21:22] [PASSED] 30.04 Xe3_LPG
[17:21:22] [PASSED] 30.05 Xe3_LPG
[17:21:22] [PASSED] 35.10 Xe3p_LPG
[17:21:22] [PASSED] 35.11 Xe3p_XPC
[17:21:22] ================ [PASSED] check_graphics_ip ================
[17:21:22] ===================== check_media_ip ======================
[17:21:22] [PASSED] 12.00 Xe_M
[17:21:22] [PASSED] 12.55 Xe_HPM
[17:21:22] [PASSED] 13.00 Xe_LPM+
[17:21:22] [PASSED] 13.01 Xe2_HPM
[17:21:22] [PASSED] 20.00 Xe2_LPM
[17:21:22] [PASSED] 30.00 Xe3_LPM
[17:21:22] [PASSED] 30.02 Xe3_LPM
[17:21:22] [PASSED] 35.00 Xe3p_LPM
[17:21:22] [PASSED] 35.03 Xe3p_HPM
[17:21:22] ================= [PASSED] check_media_ip ==================
[17:21:22] =================== check_platform_desc ===================
[17:21:22] [PASSED] 0x9A60 (TIGERLAKE)
[17:21:22] [PASSED] 0x9A68 (TIGERLAKE)
[17:21:22] [PASSED] 0x9A70 (TIGERLAKE)
[17:21:22] [PASSED] 0x9A40 (TIGERLAKE)
[17:21:22] [PASSED] 0x9A49 (TIGERLAKE)
[17:21:22] [PASSED] 0x9A59 (TIGERLAKE)
[17:21:22] [PASSED] 0x9A78 (TIGERLAKE)
[17:21:22] [PASSED] 0x9AC0 (TIGERLAKE)
[17:21:22] [PASSED] 0x9AC9 (TIGERLAKE)
[17:21:22] [PASSED] 0x9AD9 (TIGERLAKE)
[17:21:22] [PASSED] 0x9AF8 (TIGERLAKE)
[17:21:22] [PASSED] 0x4C80 (ROCKETLAKE)
[17:21:22] [PASSED] 0x4C8A (ROCKETLAKE)
[17:21:22] [PASSED] 0x4C8B (ROCKETLAKE)
[17:21:22] [PASSED] 0x4C8C (ROCKETLAKE)
[17:21:22] [PASSED] 0x4C90 (ROCKETLAKE)
[17:21:22] [PASSED] 0x4C9A (ROCKETLAKE)
[17:21:22] [PASSED] 0x4680 (ALDERLAKE_S)
[17:21:22] [PASSED] 0x4682 (ALDERLAKE_S)
[17:21:22] [PASSED] 0x4688 (ALDERLAKE_S)
[17:21:22] [PASSED] 0x468A (ALDERLAKE_S)
[17:21:22] [PASSED] 0x468B (ALDERLAKE_S)
[17:21:22] [PASSED] 0x4690 (ALDERLAKE_S)
[17:21:22] [PASSED] 0x4692 (ALDERLAKE_S)
[17:21:22] [PASSED] 0x4693 (ALDERLAKE_S)
[17:21:22] [PASSED] 0x46A0 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46A1 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46A2 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46A3 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46A6 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46A8 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46AA (ALDERLAKE_P)
[17:21:22] [PASSED] 0x462A (ALDERLAKE_P)
[17:21:22] [PASSED] 0x4626 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x4628 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46B0 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46B1 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46B2 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46B3 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46C0 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46C1 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46C2 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46C3 (ALDERLAKE_P)
[17:21:22] [PASSED] 0x46D0 (ALDERLAKE_N)
[17:21:22] [PASSED] 0x46D1 (ALDERLAKE_N)
[17:21:22] [PASSED] 0x46D2 (ALDERLAKE_N)
[17:21:22] [PASSED] 0x46D3 (ALDERLAKE_N)
[17:21:22] [PASSED] 0x46D4 (ALDERLAKE_N)
[17:21:22] [PASSED] 0xA721 (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7A1 (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7A9 (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7AC (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7AD (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA720 (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7A0 (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7A8 (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7AA (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA7AB (ALDERLAKE_P)
[17:21:22] [PASSED] 0xA780 (ALDERLAKE_S)
[17:21:22] [PASSED] 0xA781 (ALDERLAKE_S)
[17:21:22] [PASSED] 0xA782 (ALDERLAKE_S)
[17:21:22] [PASSED] 0xA783 (ALDERLAKE_S)
[17:21:22] [PASSED] 0xA788 (ALDERLAKE_S)
[17:21:22] [PASSED] 0xA789 (ALDERLAKE_S)
[17:21:22] [PASSED] 0xA78A (ALDERLAKE_S)
[17:21:22] [PASSED] 0xA78B (ALDERLAKE_S)
[17:21:22] [PASSED] 0x4905 (DG1)
[17:21:22] [PASSED] 0x4906 (DG1)
[17:21:22] [PASSED] 0x4907 (DG1)
[17:21:22] [PASSED] 0x4908 (DG1)
[17:21:22] [PASSED] 0x4909 (DG1)
[17:21:22] [PASSED] 0x56C0 (DG2)
[17:21:22] [PASSED] 0x56C2 (DG2)
[17:21:22] [PASSED] 0x56C1 (DG2)
[17:21:22] [PASSED] 0x7D51 (METEORLAKE)
[17:21:22] [PASSED] 0x7DD1 (METEORLAKE)
[17:21:22] [PASSED] 0x7D41 (METEORLAKE)
[17:21:22] [PASSED] 0x7D67 (METEORLAKE)
[17:21:22] [PASSED] 0xB640 (METEORLAKE)
[17:21:22] [PASSED] 0x56A0 (DG2)
[17:21:22] [PASSED] 0x56A1 (DG2)
[17:21:22] [PASSED] 0x56A2 (DG2)
[17:21:22] [PASSED] 0x56BE (DG2)
[17:21:22] [PASSED] 0x56BF (DG2)
[17:21:22] [PASSED] 0x5690 (DG2)
[17:21:22] [PASSED] 0x5691 (DG2)
[17:21:22] [PASSED] 0x5692 (DG2)
[17:21:22] [PASSED] 0x56A5 (DG2)
[17:21:22] [PASSED] 0x56A6 (DG2)
[17:21:22] [PASSED] 0x56B0 (DG2)
[17:21:22] [PASSED] 0x56B1 (DG2)
[17:21:22] [PASSED] 0x56BA (DG2)
[17:21:22] [PASSED] 0x56BB (DG2)
[17:21:22] [PASSED] 0x56BC (DG2)
[17:21:22] [PASSED] 0x56BD (DG2)
[17:21:22] [PASSED] 0x5693 (DG2)
[17:21:22] [PASSED] 0x5694 (DG2)
[17:21:22] [PASSED] 0x5695 (DG2)
[17:21:22] [PASSED] 0x56A3 (DG2)
[17:21:22] [PASSED] 0x56A4 (DG2)
[17:21:22] [PASSED] 0x56B2 (DG2)
[17:21:22] [PASSED] 0x56B3 (DG2)
[17:21:22] [PASSED] 0x5696 (DG2)
[17:21:22] [PASSED] 0x5697 (DG2)
[17:21:22] [PASSED] 0xB69 (PVC)
[17:21:22] [PASSED] 0xB6E (PVC)
[17:21:22] [PASSED] 0xBD4 (PVC)
[17:21:22] [PASSED] 0xBD5 (PVC)
[17:21:22] [PASSED] 0xBD6 (PVC)
[17:21:22] [PASSED] 0xBD7 (PVC)
[17:21:22] [PASSED] 0xBD8 (PVC)
[17:21:22] [PASSED] 0xBD9 (PVC)
[17:21:22] [PASSED] 0xBDA (PVC)
[17:21:22] [PASSED] 0xBDB (PVC)
[17:21:22] [PASSED] 0xBE0 (PVC)
[17:21:22] [PASSED] 0xBE1 (PVC)
[17:21:22] [PASSED] 0xBE5 (PVC)
[17:21:22] [PASSED] 0x7D40 (METEORLAKE)
[17:21:22] [PASSED] 0x7D45 (METEORLAKE)
[17:21:22] [PASSED] 0x7D55 (METEORLAKE)
[17:21:22] [PASSED] 0x7D60 (METEORLAKE)
[17:21:22] [PASSED] 0x7DD5 (METEORLAKE)
[17:21:22] [PASSED] 0x6420 (LUNARLAKE)
[17:21:22] [PASSED] 0x64A0 (LUNARLAKE)
[17:21:22] [PASSED] 0x64B0 (LUNARLAKE)
[17:21:22] [PASSED] 0xE202 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE209 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE20B (BATTLEMAGE)
[17:21:22] [PASSED] 0xE20C (BATTLEMAGE)
[17:21:22] [PASSED] 0xE20D (BATTLEMAGE)
[17:21:22] [PASSED] 0xE210 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE211 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE212 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE216 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE220 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE221 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE222 (BATTLEMAGE)
[17:21:22] [PASSED] 0xE223 (BATTLEMAGE)
[17:21:22] [PASSED] 0xB080 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB081 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB082 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB083 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB084 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB085 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB086 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB087 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB08F (PANTHERLAKE)
[17:21:22] [PASSED] 0xB090 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB0A0 (PANTHERLAKE)
[17:21:22] [PASSED] 0xB0B0 (PANTHERLAKE)
[17:21:22] [PASSED] 0xFD80 (PANTHERLAKE)
[17:21:22] [PASSED] 0xFD81 (PANTHERLAKE)
[17:21:22] [PASSED] 0xD740 (NOVALAKE_S)
[17:21:22] [PASSED] 0xD741 (NOVALAKE_S)
[17:21:22] [PASSED] 0xD742 (NOVALAKE_S)
[17:21:22] [PASSED] 0xD743 (NOVALAKE_S)
[17:21:22] [PASSED] 0xD745 (NOVALAKE_S)
[17:21:22] [PASSED] 0xD74A (NOVALAKE_S)
[17:21:22] [PASSED] 0xD74B (NOVALAKE_S)
[17:21:22] [PASSED] 0x674C (CRESCENTISLAND)
[17:21:22] [PASSED] 0x674D (CRESCENTISLAND)
[17:21:22] [PASSED] 0x674E (CRESCENTISLAND)
[17:21:22] [PASSED] 0x674F (CRESCENTISLAND)
[17:21:22] [PASSED] 0x6750 (CRESCENTISLAND)
[17:21:22] [PASSED] 0xD750 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD751 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD752 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD753 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD754 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD755 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD756 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD757 (NOVALAKE_P)
[17:21:22] [PASSED] 0xD75F (NOVALAKE_P)
[17:21:22] =============== [PASSED] check_platform_desc ===============
[17:21:22] ===================== [PASSED] xe_pci ======================
[17:21:22] ============= xe_rtp_tables_test (4 subtests) ==============
[17:21:22] ================== xe_rtp_table_gt_test ===================
[17:21:22] [PASSED] gt_was/14011060649
[17:21:22] [PASSED] gt_was/14011059788
[17:21:22] [PASSED] gt_was/14015795083
[17:21:22] [PASSED] gt_was/16021867713
[17:21:22] [PASSED] gt_was/14019449301
[17:21:22] [PASSED] gt_was/16028005424
[17:21:22] [PASSED] gt_was/14026578760
[17:21:22] [PASSED] gt_was/1409420604
[17:21:22] [PASSED] gt_was/1408615072
[17:21:22] [PASSED] gt_was/22010523718
[17:21:22] [PASSED] gt_was/14011006942
[17:21:22] [PASSED] gt_was/14014830051
[17:21:22] [PASSED] gt_was/18018781329
[17:21:22] [PASSED] gt_was/1509235366
[17:21:22] [PASSED] gt_was/18018781329
[17:21:22] [PASSED] gt_was/16016694945
[17:21:22] [PASSED] gt_was/14018575942
[17:21:22] [PASSED] gt_was/22016670082
[17:21:22] [PASSED] gt_was/22016670082
[17:21:22] [PASSED] gt_was/14017421178
[17:21:22] [PASSED] gt_was/16025250150
[17:21:22] [PASSED] gt_was/14021871409
[17:21:22] [PASSED] gt_was/16021865536
[17:21:22] [PASSED] gt_was/14021486841
[17:21:22] [PASSED] gt_was/14025160223
[17:21:22] [PASSED] gt_was/14026144927, 16029437861, 14026127056
[17:21:22] [PASSED] gt_was/14025635424
[17:21:22] [PASSED] gt_was/16028005424
[17:21:22] ============== [PASSED] xe_rtp_table_gt_test ===============
[17:21:22] ================== xe_rtp_table_gt_test ===================
[17:21:22] [PASSED] gt_tunings/Tuning: Blend Fill Caching Optimization Disable
[17:21:22] [PASSED] gt_tunings/Tuning: 32B Access Enable
[17:21:22] [PASSED] gt_tunings/Tuning: L3 cache
[17:21:22] [PASSED] gt_tunings/Tuning: L3 cache - media
[17:21:22] [PASSED] gt_tunings/Tuning: Compression Overfetch
[17:21:22] [PASSED] gt_tunings/Tuning: Compression Overfetch - media
[17:21:22] [PASSED] gt_tunings/Tuning: Enable compressible partial write overfetch in L3
[17:21:22] [PASSED] gt_tunings/Tuning: Enable compressible partial write overfetch in L3 - media
[17:21:22] [PASSED] gt_tunings/Tuning: L2 Overfetch Compressible Only
[17:21:22] [PASSED] gt_tunings/Tuning: L2 Overfetch Compressible Only - media
[17:21:22] [PASSED] gt_tunings/Tuning: Stateless compression control
[17:21:22] [PASSED] gt_tunings/Tuning: Stateless compression control - media
[17:21:22] [PASSED] gt_tunings/Tuning: L3 RW flush all Cache
[17:21:22] [PASSED] gt_tunings/Tuning: L3 RW flush all cache - media
[17:21:22] [PASSED] gt_tunings/Tuning: Set STLB Bank Hash Mode to 4KB
[17:21:22] ============== [PASSED] xe_rtp_table_gt_test ===============
[17:21:22] ================== xe_rtp_table_oob_test ==================
[17:21:22] [PASSED] oob_was/1607983814
[17:21:22] [PASSED] oob_was/16010904313
[17:21:22] [PASSED] oob_was/18022495364
[17:21:22] [PASSED] oob_was/22012773006
[17:21:22] [PASSED] oob_was/14014475959
[17:21:22] [PASSED] oob_was/22011391025
[17:21:22] [PASSED] oob_was/22012727170
[17:21:22] [PASSED] oob_was/22012727685
[17:21:22] [PASSED] oob_was/22016596838
[17:21:22] [PASSED] oob_was/18020744125
[17:21:22] [PASSED] oob_was/1409600907
[17:21:22] [PASSED] oob_was/22014953428
[17:21:22] [PASSED] oob_was/16017236439
[17:21:22] [PASSED] oob_was/14019821291
[17:21:22] [PASSED] oob_was/14015076503
[17:21:22] [PASSED] oob_was/14018913170
[17:21:22] [PASSED] oob_was/14018094691
[17:21:22] [PASSED] oob_was/18024947630
[17:21:22] [PASSED] oob_was/16022287689
[17:21:22] [PASSED] oob_was/13011645652
[17:21:22] [PASSED] oob_was/14022293748
[17:21:22] [PASSED] oob_was/22019794406
[17:21:22] [PASSED] oob_was/22019338487
[17:21:22] [PASSED] oob_was/16023588340
[17:21:22] [PASSED] oob_was/14019789679
[17:21:22] [PASSED] oob_was/14022866841
[17:21:22] [PASSED] oob_was/16021333562
[17:21:22] [PASSED] oob_was/14016712196
[17:21:22] [PASSED] oob_was/14015568240
[17:21:22] [PASSED] oob_was/18013179988
[17:21:22] [PASSED] oob_was/1508761755
[17:21:22] [PASSED] oob_was/16023105232
[17:21:22] [PASSED] oob_was/16026508708
[17:21:22] [PASSED] oob_was/14020001231
[17:21:22] [PASSED] oob_was/16023683509
[17:21:22] [PASSED] oob_was/14025515070
[17:21:22] [PASSED] oob_was/15015404425_disable
[17:21:22] [PASSED] oob_was/16026007364
[17:21:22] [PASSED] oob_was/14020316580
[17:21:22] [PASSED] oob_was/14025883347
[17:21:22] [PASSED] oob_was/16029380221
[17:21:22] ============== [PASSED] xe_rtp_table_oob_test ==============
[17:21:22] ================ xe_rtp_table_dev_oob_test ================
[17:21:22] [PASSED] device_oob_was/22010954014
[17:21:22] [PASSED] device_oob_was/15015404425
[17:21:22] [PASSED] device_oob_was/22019338487_display
[17:21:22] [PASSED] device_oob_was/14022085890
[17:21:22] [PASSED] device_oob_was/14026539277
[17:21:22] [PASSED] device_oob_was/14026633728
[17:21:22] [PASSED] device_oob_was/14026746987
[17:21:22] [PASSED] device_oob_was/14026779378
[17:21:22] ============ [PASSED] xe_rtp_table_dev_oob_test ============
[17:21:22] =============== [PASSED] xe_rtp_tables_test ================
[17:21:22] =================== xe_rtp (3 subtests) ====================
[17:21:22] =================== xe_rtp_rules_tests ====================
[17:21:22] [PASSED] no
[17:21:22] [PASSED] yes
[17:21:22] [PASSED] no-and-no
[17:21:22] [PASSED] no-and-yes
[17:21:22] [PASSED] yes-and-no
[17:21:22] [PASSED] yes-and-yes
[17:21:22] [PASSED] no-or-no
[17:21:22] [PASSED] no-or-yes
[17:21:22] [PASSED] yes-or-no
[17:21:22] [PASSED] yes-or-yes
[17:21:22] [PASSED] no-yes-or-yes-no
[17:21:22] [PASSED] no-yes-or-yes-yes
[17:21:22] [PASSED] yes-yes-or-no-yes
[17:21:22] [PASSED] yes-yes-or-yes-yes
[17:21:22] [PASSED] no-no-or-yes-or-no
[17:21:22] [PASSED] or
[17:21:22] [PASSED] or-yes
[17:21:22] [PASSED] or-no
[17:21:22] [PASSED] yes-or
[17:21:22] [PASSED] no-or
[17:21:22] [PASSED] no-or-or-yes
[17:21:22] [PASSED] yes-or-or-no
[17:21:22] [PASSED] no-or-or-no
[17:21:22] [PASSED] missing-context-engine-class
[17:21:22] [PASSED] missing-context-engine-class-or-yes
[17:21:22] [PASSED] missing-context-engine-class-or-or-yes
[17:21:22] =============== [PASSED] xe_rtp_rules_tests ================
[17:21:22] =============== xe_rtp_process_to_sr_tests ================
[17:21:22] [PASSED] coalesce-same-reg
[17:21:22] [PASSED] coalesce-same-reg-literal-and-func
[17:21:22] [PASSED] no-match-no-add
[17:21:22] [PASSED] two-regs-two-entries
[17:21:22] [PASSED] clr-one-set-other
[17:21:22] [PASSED] set-field
[17:21:22] [PASSED] conflict-duplicate
[17:21:22] [PASSED] conflict-not-disjoint
[17:21:22] [PASSED] conflict-not-disjoint-literal-and-func
[17:21:22] [PASSED] conflict-reg-type
[17:21:22] [PASSED] bad-mcr-reg-forced-to-regular
[17:21:22] [PASSED] bad-regular-reg-forced-to-mcr
[17:21:22] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[17:21:22] ================== xe_rtp_process_tests ===================
[17:21:22] [PASSED] active1
[17:21:22] [PASSED] active2
[17:21:22] [PASSED] active-inactive
[17:21:22] [PASSED] inactive-active
[17:21:22] [PASSED] inactive-active-inactive
[17:21:22] [PASSED] inactive-inactive-inactive
[17:21:22] ============== [PASSED] xe_rtp_process_tests ===============
[17:21:22] ===================== [PASSED] xe_rtp ======================
[17:21:22] ==================== xe_wa (1 subtest) =====================
[17:21:22] ======================== xe_wa_gt =========================
[17:21:22] [PASSED] TIGERLAKE B0
[17:21:22] [PASSED] DG1 A0
[17:21:22] [PASSED] DG1 B0
[17:21:22] [PASSED] ALDERLAKE_S A0
[17:21:22] [PASSED] ALDERLAKE_S B0
[17:21:22] [PASSED] ALDERLAKE_S C0
[17:21:22] [PASSED] ALDERLAKE_S D0
[17:21:22] [PASSED] ALDERLAKE_P A0
[17:21:22] [PASSED] ALDERLAKE_P B0
[17:21:22] [PASSED] ALDERLAKE_P C0
[17:21:22] [PASSED] ALDERLAKE_S RPLS D0
[17:21:22] [PASSED] ALDERLAKE_P RPLU E0
[17:21:22] [PASSED] DG2 G10 C0
[17:21:22] [PASSED] DG2 G11 B1
[17:21:22] [PASSED] DG2 G12 A1
[17:21:22] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[17:21:22] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[17:21:22] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[17:21:22] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[17:21:22] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[17:21:22] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[17:21:22] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[17:21:22] ==================== [PASSED] xe_wa_gt =====================
[17:21:22] ====================== [PASSED] xe_wa ======================
[17:21:22] ============================================================
[17:21:22] Testing complete. Ran 719 tests: passed: 701, skipped: 18
[17:21:22] Elapsed time: 36.128s total, 4.269s configuring, 31.193s building, 0.649s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[17:21:22] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[17:21:24] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[17:21:49] Starting KUnit Kernel (1/1)...
[17:21:49] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[17:21:49] ============ drm_test_pick_cmdline (2 subtests) ============
[17:21:49] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[17:21:49] =============== drm_test_pick_cmdline_named ===============
[17:21:49] [PASSED] NTSC
[17:21:49] [PASSED] NTSC-J
[17:21:49] [PASSED] PAL
[17:21:49] [PASSED] PAL-M
[17:21:49] =========== [PASSED] drm_test_pick_cmdline_named ===========
[17:21:49] ============== [PASSED] drm_test_pick_cmdline ==============
[17:21:49] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[17:21:49] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[17:21:49] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[17:21:49] =========== drm_validate_clone_mode (2 subtests) ===========
[17:21:49] ============== drm_test_check_in_clone_mode ===============
[17:21:49] [PASSED] in_clone_mode
[17:21:49] [PASSED] not_in_clone_mode
[17:21:49] ========== [PASSED] drm_test_check_in_clone_mode ===========
[17:21:49] =============== drm_test_check_valid_clones ===============
[17:21:49] [PASSED] not_in_clone_mode
[17:21:49] [PASSED] valid_clone
[17:21:49] [PASSED] invalid_clone
[17:21:49] =========== [PASSED] drm_test_check_valid_clones ===========
[17:21:49] ============= [PASSED] drm_validate_clone_mode =============
[17:21:49] ============= drm_validate_modeset (1 subtest) =============
[17:21:49] [PASSED] drm_test_check_connector_changed_modeset
[17:21:49] ============== [PASSED] drm_validate_modeset ===============
[17:21:49] ====== drm_test_bridge_get_current_state (2 subtests) ======
[17:21:49] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[17:21:49] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[17:21:49] ======== [PASSED] drm_test_bridge_get_current_state ========
[17:21:49] ====== drm_test_bridge_helper_reset_crtc (4 subtests) ======
[17:21:49] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[17:21:49] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[17:21:49] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[17:21:49] [PASSED] drm_test_drm_bridge_helper_hdmi_output_bus_fmts
[17:21:49] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[17:21:49] ============== drm_bridge_alloc (2 subtests) ===============
[17:21:49] [PASSED] drm_test_drm_bridge_alloc_basic
[17:21:49] [PASSED] drm_test_drm_bridge_alloc_get_put
[17:21:49] ================ [PASSED] drm_bridge_alloc =================
[17:21:49] ============= drm_bridge_bus_fmt (5 subtests) ==============
[17:21:49] [PASSED] drm_test_bridge_rgb_yuv_rgb
[17:21:49] [PASSED] drm_test_bridge_must_convert_to_yuv444
[17:21:49] [PASSED] drm_test_bridge_hdmi_auto_rgb
[17:21:49] [PASSED] drm_test_bridge_auto_first
[17:21:49] [PASSED] drm_test_bridge_rgb_yuv_no_path
[17:21:49] =============== [PASSED] drm_bridge_bus_fmt ================
[17:21:49] ============= drm_cmdline_parser (40 subtests) =============
[17:21:49] [PASSED] drm_test_cmdline_force_d_only
[17:21:49] [PASSED] drm_test_cmdline_force_D_only_dvi
[17:21:49] [PASSED] drm_test_cmdline_force_D_only_hdmi
[17:21:49] [PASSED] drm_test_cmdline_force_D_only_not_digital
[17:21:49] [PASSED] drm_test_cmdline_force_e_only
[17:21:49] [PASSED] drm_test_cmdline_res
[17:21:49] [PASSED] drm_test_cmdline_res_vesa
[17:21:49] [PASSED] drm_test_cmdline_res_vesa_rblank
[17:21:49] [PASSED] drm_test_cmdline_res_rblank
[17:21:49] [PASSED] drm_test_cmdline_res_bpp
[17:21:49] [PASSED] drm_test_cmdline_res_refresh
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[17:21:49] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[17:21:49] [PASSED] drm_test_cmdline_res_margins_force_on
[17:21:49] [PASSED] drm_test_cmdline_res_vesa_margins
[17:21:49] [PASSED] drm_test_cmdline_name
[17:21:49] [PASSED] drm_test_cmdline_name_bpp
[17:21:49] [PASSED] drm_test_cmdline_name_option
[17:21:49] [PASSED] drm_test_cmdline_name_bpp_option
[17:21:49] [PASSED] drm_test_cmdline_rotate_0
[17:21:49] [PASSED] drm_test_cmdline_rotate_90
[17:21:49] [PASSED] drm_test_cmdline_rotate_180
[17:21:49] [PASSED] drm_test_cmdline_rotate_270
[17:21:49] [PASSED] drm_test_cmdline_hmirror
[17:21:49] [PASSED] drm_test_cmdline_vmirror
[17:21:49] [PASSED] drm_test_cmdline_margin_options
[17:21:49] [PASSED] drm_test_cmdline_multiple_options
[17:21:49] [PASSED] drm_test_cmdline_bpp_extra_and_option
[17:21:49] [PASSED] drm_test_cmdline_extra_and_option
[17:21:49] [PASSED] drm_test_cmdline_freestanding_options
[17:21:49] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[17:21:49] [PASSED] drm_test_cmdline_panel_orientation
[17:21:49] ================ drm_test_cmdline_invalid =================
[17:21:49] [PASSED] margin_only
[17:21:49] [PASSED] interlace_only
[17:21:49] [PASSED] res_missing_x
[17:21:49] [PASSED] res_missing_y
[17:21:49] [PASSED] res_bad_y
[17:21:49] [PASSED] res_missing_y_bpp
[17:21:49] [PASSED] res_bad_bpp
[17:21:49] [PASSED] res_bad_refresh
[17:21:49] [PASSED] res_bpp_refresh_force_on_off
[17:21:49] [PASSED] res_invalid_mode
[17:21:49] [PASSED] res_bpp_wrong_place_mode
[17:21:49] [PASSED] name_bpp_refresh
[17:21:49] [PASSED] name_refresh
[17:21:49] [PASSED] name_refresh_wrong_mode
[17:21:49] [PASSED] name_refresh_invalid_mode
[17:21:49] [PASSED] rotate_multiple
[17:21:49] [PASSED] rotate_invalid_val
[17:21:49] [PASSED] rotate_truncated
[17:21:49] [PASSED] invalid_option
[17:21:49] [PASSED] invalid_tv_option
[17:21:49] [PASSED] truncated_tv_option
[17:21:49] ============ [PASSED] drm_test_cmdline_invalid =============
[17:21:49] =============== drm_test_cmdline_tv_options ===============
[17:21:49] [PASSED] NTSC
[17:21:49] [PASSED] NTSC_443
[17:21:49] [PASSED] NTSC_J
[17:21:49] [PASSED] PAL
[17:21:49] [PASSED] PAL_M
[17:21:49] [PASSED] PAL_N
[17:21:49] [PASSED] SECAM
[17:21:49] [PASSED] MONO_525
[17:21:49] [PASSED] MONO_625
[17:21:49] =========== [PASSED] drm_test_cmdline_tv_options ===========
[17:21:49] =============== [PASSED] drm_cmdline_parser ================
[17:21:49] ========== drmm_connector_hdmi_init (20 subtests) ==========
[17:21:49] [PASSED] drm_test_connector_hdmi_init_valid
[17:21:49] [PASSED] drm_test_connector_hdmi_init_bpc_8
[17:21:49] [PASSED] drm_test_connector_hdmi_init_bpc_10
[17:21:49] [PASSED] drm_test_connector_hdmi_init_bpc_12
[17:21:49] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[17:21:49] [PASSED] drm_test_connector_hdmi_init_bpc_null
[17:21:49] [PASSED] drm_test_connector_hdmi_init_formats_empty
[17:21:49] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[17:21:49] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[17:21:49] [PASSED] supported_formats=0x9 yuv420_allowed=1
[17:21:49] [PASSED] supported_formats=0x9 yuv420_allowed=0
[17:21:49] [PASSED] supported_formats=0x5 yuv420_allowed=1
[17:21:49] [PASSED] supported_formats=0x5 yuv420_allowed=0
[17:21:49] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[17:21:49] [PASSED] drm_test_connector_hdmi_init_null_ddc
[17:21:49] [PASSED] drm_test_connector_hdmi_init_null_product
[17:21:49] [PASSED] drm_test_connector_hdmi_init_null_vendor
[17:21:49] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[17:21:49] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[17:21:49] [PASSED] drm_test_connector_hdmi_init_product_valid
[17:21:49] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[17:21:49] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[17:21:49] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[17:21:49] ========= drm_test_connector_hdmi_init_type_valid =========
[17:21:49] [PASSED] HDMI-A
[17:21:49] [PASSED] HDMI-B
[17:21:49] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[17:21:49] ======== drm_test_connector_hdmi_init_type_invalid ========
[17:21:49] [PASSED] Unknown
[17:21:49] [PASSED] VGA
[17:21:49] [PASSED] DVI-I
[17:21:49] [PASSED] DVI-D
[17:21:49] [PASSED] DVI-A
[17:21:49] [PASSED] Composite
[17:21:49] [PASSED] SVIDEO
[17:21:49] [PASSED] LVDS
[17:21:49] [PASSED] Component
[17:21:49] [PASSED] DIN
[17:21:49] [PASSED] DP
[17:21:49] [PASSED] TV
[17:21:49] [PASSED] eDP
[17:21:49] [PASSED] Virtual
[17:21:49] [PASSED] DSI
[17:21:49] [PASSED] DPI
[17:21:49] [PASSED] Writeback
[17:21:49] [PASSED] SPI
[17:21:49] [PASSED] USB
[17:21:49] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[17:21:49] ============ [PASSED] drmm_connector_hdmi_init =============
[17:21:49] ============= drmm_connector_init (3 subtests) =============
[17:21:49] [PASSED] drm_test_drmm_connector_init
[17:21:49] [PASSED] drm_test_drmm_connector_init_null_ddc
[17:21:49] ========= drm_test_drmm_connector_init_type_valid =========
[17:21:49] [PASSED] Unknown
[17:21:49] [PASSED] VGA
[17:21:49] [PASSED] DVI-I
[17:21:49] [PASSED] DVI-D
[17:21:49] [PASSED] DVI-A
[17:21:49] [PASSED] Composite
[17:21:49] [PASSED] SVIDEO
[17:21:49] [PASSED] LVDS
[17:21:49] [PASSED] Component
[17:21:49] [PASSED] DIN
[17:21:49] [PASSED] DP
[17:21:49] [PASSED] HDMI-A
[17:21:49] [PASSED] HDMI-B
[17:21:49] [PASSED] TV
[17:21:49] [PASSED] eDP
[17:21:49] [PASSED] Virtual
[17:21:49] [PASSED] DSI
[17:21:49] [PASSED] DPI
[17:21:49] [PASSED] Writeback
[17:21:49] [PASSED] SPI
[17:21:49] [PASSED] USB
[17:21:49] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[17:21:49] =============== [PASSED] drmm_connector_init ===============
[17:21:49] ========= drm_connector_dynamic_init (6 subtests) ==========
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_init
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_init_properties
[17:21:49] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[17:21:49] [PASSED] Unknown
[17:21:49] [PASSED] VGA
[17:21:49] [PASSED] DVI-I
[17:21:49] [PASSED] DVI-D
[17:21:49] [PASSED] DVI-A
[17:21:49] [PASSED] Composite
[17:21:49] [PASSED] SVIDEO
[17:21:49] [PASSED] LVDS
[17:21:49] [PASSED] Component
[17:21:49] [PASSED] DIN
[17:21:49] [PASSED] DP
[17:21:49] [PASSED] HDMI-A
[17:21:49] [PASSED] HDMI-B
[17:21:49] [PASSED] TV
[17:21:49] [PASSED] eDP
[17:21:49] [PASSED] Virtual
[17:21:49] [PASSED] DSI
[17:21:49] [PASSED] DPI
[17:21:49] [PASSED] Writeback
[17:21:49] [PASSED] SPI
[17:21:49] [PASSED] USB
[17:21:49] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[17:21:49] ======== drm_test_drm_connector_dynamic_init_name =========
[17:21:49] [PASSED] Unknown
[17:21:49] [PASSED] VGA
[17:21:49] [PASSED] DVI-I
[17:21:49] [PASSED] DVI-D
[17:21:49] [PASSED] DVI-A
[17:21:49] [PASSED] Composite
[17:21:49] [PASSED] SVIDEO
[17:21:49] [PASSED] LVDS
[17:21:49] [PASSED] Component
[17:21:49] [PASSED] DIN
[17:21:49] [PASSED] DP
[17:21:49] [PASSED] HDMI-A
[17:21:49] [PASSED] HDMI-B
[17:21:49] [PASSED] TV
[17:21:49] [PASSED] eDP
[17:21:49] [PASSED] Virtual
[17:21:49] [PASSED] DSI
[17:21:49] [PASSED] DPI
[17:21:49] [PASSED] Writeback
[17:21:49] [PASSED] SPI
[17:21:49] [PASSED] USB
[17:21:49] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[17:21:49] =========== [PASSED] drm_connector_dynamic_init ============
[17:21:49] ==== drm_connector_dynamic_register_early (4 subtests) =====
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[17:21:49] ====== [PASSED] drm_connector_dynamic_register_early =======
[17:21:49] ======= drm_connector_dynamic_register (7 subtests) ========
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[17:21:49] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[17:21:49] ========= [PASSED] drm_connector_dynamic_register ==========
[17:21:49] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[17:21:49] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[17:21:49] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[17:21:49] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[17:21:49] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[17:21:49] ========== drm_test_get_tv_mode_from_name_valid ===========
[17:21:49] [PASSED] NTSC
[17:21:49] [PASSED] NTSC-443
[17:21:49] [PASSED] NTSC-J
[17:21:49] [PASSED] PAL
[17:21:49] [PASSED] PAL-M
[17:21:49] [PASSED] PAL-N
[17:21:49] [PASSED] SECAM
[17:21:49] [PASSED] Mono
[17:21:49] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[17:21:49] [PASSED] drm_test_get_tv_mode_from_name_truncated
[17:21:49] ============ [PASSED] drm_get_tv_mode_from_name ============
[17:21:49] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[17:21:49] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[17:21:49] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[17:21:49] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[17:21:49] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[17:21:49] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[17:21:49] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[17:21:49] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[17:21:49] [PASSED] VIC 96
[17:21:49] [PASSED] VIC 97
[17:21:49] [PASSED] VIC 101
[17:21:49] [PASSED] VIC 102
[17:21:49] [PASSED] VIC 106
[17:21:49] [PASSED] VIC 107
[17:21:49] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[17:21:49] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[17:21:49] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[17:21:49] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[17:21:49] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[17:21:49] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[17:21:49] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[17:21:49] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[17:21:49] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[17:21:49] [PASSED] Automatic
[17:21:49] [PASSED] Full
[17:21:49] [PASSED] Limited 16:235
[17:21:49] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[17:21:49] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[17:21:49] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[17:21:49] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[17:21:49] === drm_test_drm_hdmi_connector_get_output_format_name ====
[17:21:49] [PASSED] RGB
[17:21:49] [PASSED] YUV 4:2:0
[17:21:49] [PASSED] YUV 4:2:2
[17:21:49] [PASSED] YUV 4:4:4
[17:21:49] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[17:21:49] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[17:21:49] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[17:21:49] ============= drm_damage_helper (21 subtests) ==============
[17:21:49] [PASSED] drm_test_damage_iter_no_damage
[17:21:49] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[17:21:49] [PASSED] drm_test_damage_iter_no_damage_src_moved
[17:21:49] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[17:21:49] [PASSED] drm_test_damage_iter_no_damage_not_visible
[17:21:49] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[17:21:49] [PASSED] drm_test_damage_iter_no_damage_no_fb
[17:21:49] [PASSED] drm_test_damage_iter_simple_damage
[17:21:49] [PASSED] drm_test_damage_iter_single_damage
[17:21:49] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[17:21:49] [PASSED] drm_test_damage_iter_single_damage_outside_src
[17:21:49] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[17:21:49] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[17:21:49] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[17:21:49] [PASSED] drm_test_damage_iter_single_damage_src_moved
[17:21:49] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[17:21:49] [PASSED] drm_test_damage_iter_damage
[17:21:49] [PASSED] drm_test_damage_iter_damage_one_intersect
[17:21:49] [PASSED] drm_test_damage_iter_damage_one_outside
[17:21:49] [PASSED] drm_test_damage_iter_damage_src_moved
[17:21:49] [PASSED] drm_test_damage_iter_damage_not_visible
[17:21:49] ================ [PASSED] drm_damage_helper ================
[17:21:49] ============== drm_dp_mst_helper (3 subtests) ==============
[17:21:49] ============== drm_test_dp_mst_calc_pbn_mode ==============
[17:21:49] [PASSED] Clock 154000 BPP 30 DSC disabled
[17:21:49] [PASSED] Clock 234000 BPP 30 DSC disabled
[17:21:49] [PASSED] Clock 297000 BPP 24 DSC disabled
[17:21:49] [PASSED] Clock 332880 BPP 24 DSC enabled
[17:21:49] [PASSED] Clock 324540 BPP 24 DSC enabled
[17:21:49] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[17:21:49] ============== drm_test_dp_mst_calc_pbn_div ===============
[17:21:49] [PASSED] Link rate 2000000 lane count 4
[17:21:49] [PASSED] Link rate 2000000 lane count 2
[17:21:49] [PASSED] Link rate 2000000 lane count 1
[17:21:49] [PASSED] Link rate 1350000 lane count 4
[17:21:49] [PASSED] Link rate 1350000 lane count 2
[17:21:49] [PASSED] Link rate 1350000 lane count 1
[17:21:49] [PASSED] Link rate 1000000 lane count 4
[17:21:49] [PASSED] Link rate 1000000 lane count 2
[17:21:49] [PASSED] Link rate 1000000 lane count 1
[17:21:49] [PASSED] Link rate 810000 lane count 4
[17:21:49] [PASSED] Link rate 810000 lane count 2
[17:21:49] [PASSED] Link rate 810000 lane count 1
[17:21:49] [PASSED] Link rate 540000 lane count 4
[17:21:49] [PASSED] Link rate 540000 lane count 2
[17:21:49] [PASSED] Link rate 540000 lane count 1
[17:21:49] [PASSED] Link rate 270000 lane count 4
[17:21:49] [PASSED] Link rate 270000 lane count 2
[17:21:49] [PASSED] Link rate 270000 lane count 1
[17:21:49] [PASSED] Link rate 162000 lane count 4
[17:21:49] [PASSED] Link rate 162000 lane count 2
[17:21:49] [PASSED] Link rate 162000 lane count 1
[17:21:49] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[17:21:49] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[17:21:49] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[17:21:49] [PASSED] DP_POWER_UP_PHY with port number
[17:21:49] [PASSED] DP_POWER_DOWN_PHY with port number
[17:21:49] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[17:21:49] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[17:21:49] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[17:21:49] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[17:21:49] [PASSED] DP_QUERY_PAYLOAD with port number
[17:21:49] [PASSED] DP_QUERY_PAYLOAD with VCPI
[17:21:49] [PASSED] DP_REMOTE_DPCD_READ with port number
[17:21:49] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[17:21:49] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[17:21:49] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[17:21:49] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[17:21:49] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[17:21:49] [PASSED] DP_REMOTE_I2C_READ with port number
[17:21:49] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[17:21:49] [PASSED] DP_REMOTE_I2C_READ with transactions array
[17:21:49] [PASSED] DP_REMOTE_I2C_WRITE with port number
[17:21:49] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[17:21:49] [PASSED] DP_REMOTE_I2C_WRITE with data array
[17:21:49] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[17:21:49] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[17:21:49] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[17:21:49] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[17:21:49] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[17:21:49] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[17:21:49] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[17:21:49] ================ [PASSED] drm_dp_mst_helper ================
[17:21:49] ================== drm_exec (7 subtests) ===================
[17:21:49] [PASSED] sanitycheck
[17:21:49] [PASSED] test_lock
[17:21:49] [PASSED] test_lock_unlock
[17:21:49] [PASSED] test_duplicates
[17:21:49] [PASSED] test_prepare
[17:21:49] [PASSED] test_prepare_array
[17:21:49] [PASSED] test_multiple_loops
[17:21:49] ==================== [PASSED] drm_exec =====================
[17:21:49] =========== drm_format_helper_test (17 subtests) ===========
[17:21:49] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[17:21:49] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[17:21:49] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[17:21:49] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[17:21:49] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[17:21:49] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[17:21:49] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[17:21:49] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[17:21:49] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[17:21:49] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[17:21:49] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[17:21:49] ============== drm_test_fb_xrgb8888_to_mono ===============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[17:21:49] ==================== drm_test_fb_swab =====================
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ================ [PASSED] drm_test_fb_swab =================
[17:21:49] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[17:21:49] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[17:21:49] [PASSED] single_pixel_source_buffer
[17:21:49] [PASSED] single_pixel_clip_rectangle
[17:21:49] [PASSED] well_known_colors
[17:21:49] [PASSED] destination_pitch
[17:21:49] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[17:21:49] ================= drm_test_fb_clip_offset =================
[17:21:49] [PASSED] pass through
[17:21:49] [PASSED] horizontal offset
[17:21:49] [PASSED] vertical offset
[17:21:49] [PASSED] horizontal and vertical offset
[17:21:49] [PASSED] horizontal offset (custom pitch)
[17:21:49] [PASSED] vertical offset (custom pitch)
[17:21:49] [PASSED] horizontal and vertical offset (custom pitch)
[17:21:49] ============= [PASSED] drm_test_fb_clip_offset =============
[17:21:49] =================== drm_test_fb_memcpy ====================
[17:21:49] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[17:21:49] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[17:21:49] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[17:21:49] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[17:21:49] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[17:21:49] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[17:21:49] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[17:21:49] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[17:21:49] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[17:21:49] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[17:21:49] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[17:21:49] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[17:21:49] =============== [PASSED] drm_test_fb_memcpy ================
[17:21:49] ============= [PASSED] drm_format_helper_test ==============
[17:21:49] ================= drm_format (18 subtests) =================
[17:21:49] [PASSED] drm_test_format_block_width_invalid
[17:21:49] [PASSED] drm_test_format_block_width_one_plane
[17:21:49] [PASSED] drm_test_format_block_width_two_plane
[17:21:49] [PASSED] drm_test_format_block_width_three_plane
[17:21:49] [PASSED] drm_test_format_block_width_tiled
[17:21:49] [PASSED] drm_test_format_block_height_invalid
[17:21:49] [PASSED] drm_test_format_block_height_one_plane
[17:21:49] [PASSED] drm_test_format_block_height_two_plane
[17:21:49] [PASSED] drm_test_format_block_height_three_plane
[17:21:49] [PASSED] drm_test_format_block_height_tiled
[17:21:49] [PASSED] drm_test_format_min_pitch_invalid
[17:21:49] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[17:21:49] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[17:21:49] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[17:21:49] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[17:21:49] [PASSED] drm_test_format_min_pitch_two_plane
[17:21:49] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[17:21:49] [PASSED] drm_test_format_min_pitch_tiled
[17:21:49] =================== [PASSED] drm_format ====================
[17:21:49] ============== drm_framebuffer (10 subtests) ===============
[17:21:49] ========== drm_test_framebuffer_check_src_coords ==========
[17:21:49] [PASSED] Success: source fits into fb
[17:21:49] [PASSED] Fail: overflowing fb with x-axis coordinate
[17:21:49] [PASSED] Fail: overflowing fb with y-axis coordinate
[17:21:49] [PASSED] Fail: overflowing fb with source width
[17:21:49] [PASSED] Fail: overflowing fb with source height
[17:21:49] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[17:21:49] [PASSED] drm_test_framebuffer_cleanup
[17:21:49] =============== drm_test_framebuffer_create ===============
[17:21:49] [PASSED] ABGR8888 normal sizes
[17:21:49] [PASSED] ABGR8888 max sizes
[17:21:49] [PASSED] ABGR8888 pitch greater than min required
[17:21:49] [PASSED] ABGR8888 pitch less than min required
[17:21:49] [PASSED] ABGR8888 Invalid width
[17:21:49] [PASSED] ABGR8888 Invalid buffer handle
[17:21:49] [PASSED] No pixel format
[17:21:49] [PASSED] ABGR8888 Width 0
[17:21:49] [PASSED] ABGR8888 Height 0
[17:21:49] [PASSED] ABGR8888 Out of bound height * pitch combination
[17:21:49] [PASSED] ABGR8888 Large buffer offset
[17:21:49] [PASSED] ABGR8888 Buffer offset for inexistent plane
[17:21:49] [PASSED] ABGR8888 Invalid flag
[17:21:49] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[17:21:49] [PASSED] ABGR8888 Valid buffer modifier
[17:21:49] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[17:21:49] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[17:21:49] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[17:21:49] [PASSED] NV12 Normal sizes
[17:21:49] [PASSED] NV12 Max sizes
[17:21:49] [PASSED] NV12 Invalid pitch
[17:21:49] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[17:21:49] [PASSED] NV12 different modifier per-plane
[17:21:49] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[17:21:49] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[17:21:49] [PASSED] NV12 Modifier for inexistent plane
[17:21:49] [PASSED] NV12 Handle for inexistent plane
[17:21:49] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[17:21:49] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[17:21:49] [PASSED] YVU420 Normal sizes
[17:21:49] [PASSED] YVU420 Max sizes
[17:21:49] [PASSED] YVU420 Invalid pitch
[17:21:49] [PASSED] YVU420 Different pitches
[17:21:49] [PASSED] YVU420 Different buffer offsets/pitches
[17:21:49] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[17:21:49] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[17:21:49] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[17:21:49] [PASSED] YVU420 Valid modifier
[17:21:49] [PASSED] YVU420 Different modifiers per plane
[17:21:49] [PASSED] YVU420 Modifier for inexistent plane
[17:21:49] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[17:21:49] [PASSED] X0L2 Normal sizes
[17:21:49] [PASSED] X0L2 Max sizes
[17:21:49] [PASSED] X0L2 Invalid pitch
[17:21:49] [PASSED] X0L2 Pitch greater than minimum required
[17:21:49] [PASSED] X0L2 Handle for inexistent plane
[17:21:49] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[17:21:49] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[17:21:49] [PASSED] X0L2 Valid modifier
[17:21:49] [PASSED] X0L2 Modifier for inexistent plane
[17:21:49] =========== [PASSED] drm_test_framebuffer_create ===========
[17:21:49] [PASSED] drm_test_framebuffer_free
[17:21:49] [PASSED] drm_test_framebuffer_init
[17:21:49] [PASSED] drm_test_framebuffer_init_bad_format
[17:21:49] [PASSED] drm_test_framebuffer_init_dev_mismatch
[17:21:49] [PASSED] drm_test_framebuffer_lookup
[17:21:49] [PASSED] drm_test_framebuffer_lookup_inexistent
[17:21:49] [PASSED] drm_test_framebuffer_modifiers_not_supported
[17:21:49] ================= [PASSED] drm_framebuffer =================
[17:21:49] ================ drm_gem_shmem (8 subtests) ================
[17:21:49] [PASSED] drm_gem_shmem_test_obj_create
[17:21:49] [PASSED] drm_gem_shmem_test_obj_create_private
[17:21:49] [PASSED] drm_gem_shmem_test_pin_pages
[17:21:49] [PASSED] drm_gem_shmem_test_vmap
[17:21:49] [PASSED] drm_gem_shmem_test_get_sg_table
[17:21:49] [PASSED] drm_gem_shmem_test_get_pages_sgt
[17:21:49] [PASSED] drm_gem_shmem_test_madvise
[17:21:49] [PASSED] drm_gem_shmem_test_purge
[17:21:49] ================== [PASSED] drm_gem_shmem ==================
[17:21:49] === drm_atomic_helper_connector_hdmi_check (29 subtests) ===
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[17:21:49] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[17:21:49] [PASSED] Automatic
[17:21:49] [PASSED] Full
[17:21:49] [PASSED] Limited 16:235
[17:21:49] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[17:21:49] [PASSED] drm_test_check_disable_connector
[17:21:49] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[17:21:49] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[17:21:49] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[17:21:49] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[17:21:49] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[17:21:49] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[17:21:49] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[17:21:49] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[17:21:49] [PASSED] drm_test_check_output_bpc_dvi
[17:21:49] [PASSED] drm_test_check_output_bpc_format_vic_1
[17:21:49] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[17:21:49] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[17:21:49] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[17:21:49] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[17:21:49] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[17:21:49] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[17:21:49] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[17:21:49] ============ drm_test_check_hdmi_color_format =============
[17:21:49] [PASSED] AUTO -> RGB
[17:21:49] [PASSED] YCBCR422 -> YUV422
[17:21:49] [PASSED] YCBCR420 -> YUV420
[17:21:49] [PASSED] YCBCR444 -> YUV444
[17:21:49] [PASSED] RGB -> RGB
[17:21:49] ======== [PASSED] drm_test_check_hdmi_color_format =========
[17:21:49] ======== drm_test_check_hdmi_color_format_420_only ========
[17:21:49] [PASSED] RGB should fail
[17:21:49] [PASSED] YUV444 should fail
[17:21:49] [PASSED] YUV422 should fail
[17:21:49] [PASSED] YUV420 should work
[17:21:49] ==== [PASSED] drm_test_check_hdmi_color_format_420_only ====
[17:21:49] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[17:21:49] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[17:21:49] [PASSED] drm_test_check_broadcast_rgb_value
[17:21:49] [PASSED] drm_test_check_bpc_8_value
[17:21:49] [PASSED] drm_test_check_bpc_10_value
[17:21:49] [PASSED] drm_test_check_bpc_12_value
[17:21:49] [PASSED] drm_test_check_format_value
[17:21:49] [PASSED] drm_test_check_tmds_char_value
[17:21:49] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[17:21:49] = drm_atomic_helper_connector_hdmi_mode_valid (7 subtests) =
[17:21:49] [PASSED] drm_test_check_mode_valid
[17:21:49] [PASSED] drm_test_check_mode_valid_reject
[17:21:49] [PASSED] drm_test_check_mode_valid_reject_rate
[17:21:49] [PASSED] drm_test_check_mode_valid_reject_max_clock
[17:21:49] [PASSED] drm_test_check_mode_valid_yuv420_only_max_clock
[17:21:49] [PASSED] drm_test_check_mode_valid_reject_yuv420_only_connector
[17:21:49] [PASSED] drm_test_check_mode_valid_accept_yuv420_also_connector_rgb
[17:21:49] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[17:21:49] = drm_atomic_helper_connector_hdmi_infoframes (5 subtests) =
[17:21:49] [PASSED] drm_test_check_infoframes
[17:21:49] [PASSED] drm_test_check_reject_avi_infoframe
[17:21:49] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_8
[17:21:49] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_10
[17:21:49] [PASSED] drm_test_check_reject_audio_infoframe
[17:21:49] === [PASSED] drm_atomic_helper_connector_hdmi_infoframes ===
[17:21:49] ================= drm_managed (2 subtests) =================
[17:21:49] [PASSED] drm_test_managed_release_action
[17:21:49] [PASSED] drm_test_managed_run_action
[17:21:49] =================== [PASSED] drm_managed ===================
[17:21:49] =================== drm_mm (6 subtests) ====================
[17:21:49] [PASSED] drm_test_mm_init
[17:21:49] [PASSED] drm_test_mm_debug
[17:21:49] [PASSED] drm_test_mm_align32
[17:21:49] [PASSED] drm_test_mm_align64
[17:21:49] [PASSED] drm_test_mm_lowest
[17:21:49] [PASSED] drm_test_mm_highest
[17:21:49] ===================== [PASSED] drm_mm ======================
[17:21:49] ============= drm_modes_analog_tv (5 subtests) =============
[17:21:49] [PASSED] drm_test_modes_analog_tv_mono_576i
[17:21:49] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[17:21:49] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[17:21:49] [PASSED] drm_test_modes_analog_tv_pal_576i
[17:21:49] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[17:21:49] =============== [PASSED] drm_modes_analog_tv ===============
[17:21:49] ============== drm_plane_helper (2 subtests) ===============
[17:21:49] =============== drm_test_check_plane_state ================
[17:21:49] [PASSED] clipping_simple
[17:21:49] [PASSED] clipping_rotate_reflect
[17:21:49] [PASSED] positioning_simple
[17:21:49] [PASSED] upscaling
[17:21:49] [PASSED] downscaling
[17:21:49] [PASSED] rounding1
[17:21:49] [PASSED] rounding2
[17:21:49] [PASSED] rounding3
[17:21:49] [PASSED] rounding4
[17:21:49] =========== [PASSED] drm_test_check_plane_state ============
[17:21:49] =========== drm_test_check_invalid_plane_state ============
[17:21:49] [PASSED] positioning_invalid
[17:21:49] [PASSED] upscaling_invalid
[17:21:49] [PASSED] downscaling_invalid
[17:21:49] ======= [PASSED] drm_test_check_invalid_plane_state ========
[17:21:49] ================ [PASSED] drm_plane_helper =================
[17:21:49] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[17:21:49] ====== drm_test_connector_helper_tv_get_modes_check =======
[17:21:49] [PASSED] None
[17:21:49] [PASSED] PAL
[17:21:49] [PASSED] NTSC
[17:21:49] [PASSED] Both, NTSC Default
[17:21:49] [PASSED] Both, PAL Default
[17:21:49] [PASSED] Both, NTSC Default, with PAL on command-line
[17:21:49] [PASSED] Both, PAL Default, with NTSC on command-line
[17:21:49] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[17:21:49] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[17:21:49] ================== drm_rect (9 subtests) ===================
[17:21:49] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[17:21:49] [PASSED] drm_test_rect_clip_scaled_not_clipped
[17:21:49] [PASSED] drm_test_rect_clip_scaled_clipped
[17:21:49] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[17:21:49] ================= drm_test_rect_intersect =================
[17:21:49] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[17:21:49] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[17:21:49] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[17:21:49] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[17:21:49] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[17:21:49] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[17:21:49] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[17:21:49] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[17:21:49] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[17:21:49] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[17:21:49] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[17:21:49] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[17:21:49] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[17:21:49] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[17:21:49] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[17:21:49] ============= [PASSED] drm_test_rect_intersect =============
[17:21:49] ================ drm_test_rect_calc_hscale ================
[17:21:49] [PASSED] normal use
[17:21:49] [PASSED] out of max range
[17:21:49] [PASSED] out of min range
[17:21:49] [PASSED] zero dst
[17:21:49] [PASSED] negative src
[17:21:49] [PASSED] negative dst
[17:21:49] ============ [PASSED] drm_test_rect_calc_hscale ============
[17:21:49] ================ drm_test_rect_calc_vscale ================
[17:21:49] [PASSED] normal use
[17:21:49] [PASSED] out of max range
[17:21:49] [PASSED] out of min range
[17:21:49] [PASSED] zero dst
[17:21:49] [PASSED] negative src
[17:21:49] [PASSED] negative dst
[17:21:49] ============ [PASSED] drm_test_rect_calc_vscale ============
[17:21:49] ================== drm_test_rect_rotate ===================
[17:21:49] [PASSED] reflect-x
[17:21:49] [PASSED] reflect-y
[17:21:49] [PASSED] rotate-0
[17:21:49] [PASSED] rotate-90
[17:21:49] [PASSED] rotate-180
[17:21:49] [PASSED] rotate-270
[17:21:49] ============== [PASSED] drm_test_rect_rotate ===============
[17:21:49] ================ drm_test_rect_rotate_inv =================
[17:21:49] [PASSED] reflect-x
[17:21:49] [PASSED] reflect-y
[17:21:49] [PASSED] rotate-0
[17:21:49] [PASSED] rotate-90
[17:21:49] [PASSED] rotate-180
[17:21:49] [PASSED] rotate-270
[17:21:49] ============ [PASSED] drm_test_rect_rotate_inv =============
[17:21:49] ==================== [PASSED] drm_rect =====================
[17:21:49] ============ drm_sysfb_modeset_test (1 subtest) ============
[17:21:49] ============ drm_test_sysfb_build_fourcc_list =============
[17:21:49] [PASSED] no native formats
[17:21:49] [PASSED] XRGB8888 as native format
[17:21:49] [PASSED] remove duplicates
[17:21:49] [PASSED] convert alpha formats
[17:21:49] [PASSED] random formats
[17:21:49] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[17:21:49] ============= [PASSED] drm_sysfb_modeset_test ==============
[17:21:49] ================== drm_fixp (2 subtests) ===================
[17:21:49] [PASSED] drm_test_int2fixp
[17:21:49] [PASSED] drm_test_sm2fixp
[17:21:49] ==================== [PASSED] drm_fixp =====================
[17:21:49] ============================================================
[17:21:49] Testing complete. Ran 639 tests: passed: 639
[17:21:49] Elapsed time: 26.203s total, 1.771s configuring, 24.266s building, 0.147s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[17:21:49] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[17:21:51] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[17:22:00] Starting KUnit Kernel (1/1)...
[17:22:00] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[17:22:00] ================= ttm_device (5 subtests) ==================
[17:22:00] [PASSED] ttm_device_init_basic
[17:22:00] [PASSED] ttm_device_init_multiple
[17:22:00] [PASSED] ttm_device_fini_basic
[17:22:00] [PASSED] ttm_device_init_no_vma_man
[17:22:00] ================== ttm_device_init_pools ==================
[17:22:00] [PASSED] No DMA allocations, no DMA32 required
[17:22:00] [PASSED] DMA allocations, DMA32 required
[17:22:00] [PASSED] No DMA allocations, DMA32 required
[17:22:00] [PASSED] DMA allocations, no DMA32 required
[17:22:00] ============== [PASSED] ttm_device_init_pools ==============
[17:22:00] =================== [PASSED] ttm_device ====================
[17:22:00] ================== ttm_pool (8 subtests) ===================
[17:22:00] ================== ttm_pool_alloc_basic ===================
[17:22:00] [PASSED] One page
[17:22:00] [PASSED] More than one page
[17:22:00] [PASSED] Above the allocation limit
[17:22:00] [PASSED] One page, with coherent DMA mappings enabled
[17:22:00] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[17:22:00] ============== [PASSED] ttm_pool_alloc_basic ===============
[17:22:00] ============== ttm_pool_alloc_basic_dma_addr ==============
[17:22:00] [PASSED] One page
[17:22:00] [PASSED] More than one page
[17:22:00] [PASSED] Above the allocation limit
[17:22:00] [PASSED] One page, with coherent DMA mappings enabled
[17:22:00] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[17:22:00] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[17:22:00] [PASSED] ttm_pool_alloc_order_caching_match
[17:22:00] [PASSED] ttm_pool_alloc_caching_mismatch
[17:22:00] [PASSED] ttm_pool_alloc_order_mismatch
[17:22:00] [PASSED] ttm_pool_free_dma_alloc
[17:22:00] [PASSED] ttm_pool_free_no_dma_alloc
[17:22:00] [PASSED] ttm_pool_fini_basic
[17:22:00] ==================== [PASSED] ttm_pool =====================
[17:22:00] ================ ttm_resource (8 subtests) =================
[17:22:00] ================= ttm_resource_init_basic =================
[17:22:00] [PASSED] Init resource in TTM_PL_SYSTEM
[17:22:00] [PASSED] Init resource in TTM_PL_VRAM
[17:22:00] [PASSED] Init resource in a private placement
[17:22:00] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[17:22:00] ============= [PASSED] ttm_resource_init_basic =============
[17:22:00] [PASSED] ttm_resource_init_pinned
[17:22:00] [PASSED] ttm_resource_fini_basic
[17:22:00] [PASSED] ttm_resource_manager_init_basic
[17:22:00] [PASSED] ttm_resource_manager_usage_basic
[17:22:00] [PASSED] ttm_resource_manager_set_used_basic
[17:22:00] [PASSED] ttm_sys_man_alloc_basic
[17:22:00] [PASSED] ttm_sys_man_free_basic
[17:22:00] ================== [PASSED] ttm_resource ===================
[17:22:00] =================== ttm_tt (15 subtests) ===================
[17:22:00] ==================== ttm_tt_init_basic ====================
[17:22:00] [PASSED] Page-aligned size
[17:22:00] [PASSED] Extra pages requested
[17:22:00] ================ [PASSED] ttm_tt_init_basic ================
[17:22:00] [PASSED] ttm_tt_init_misaligned
[17:22:00] [PASSED] ttm_tt_fini_basic
[17:22:00] [PASSED] ttm_tt_fini_sg
[17:22:00] [PASSED] ttm_tt_fini_shmem
[17:22:00] [PASSED] ttm_tt_create_basic
[17:22:00] [PASSED] ttm_tt_create_invalid_bo_type
[17:22:00] [PASSED] ttm_tt_create_ttm_exists
[17:22:00] [PASSED] ttm_tt_create_failed
[17:22:00] [PASSED] ttm_tt_destroy_basic
[17:22:00] [PASSED] ttm_tt_populate_null_ttm
[17:22:00] [PASSED] ttm_tt_populate_populated_ttm
[17:22:00] [PASSED] ttm_tt_unpopulate_basic
[17:22:00] [PASSED] ttm_tt_unpopulate_empty_ttm
[17:22:00] [PASSED] ttm_tt_swapin_basic
[17:22:00] ===================== [PASSED] ttm_tt ======================
[17:22:00] =================== ttm_bo (14 subtests) ===================
[17:22:00] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[17:22:00] [PASSED] Cannot be interrupted and sleeps
[17:22:00] [PASSED] Cannot be interrupted, locks straight away
[17:22:00] [PASSED] Can be interrupted, sleeps
[17:22:00] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[17:22:00] [PASSED] ttm_bo_reserve_locked_no_sleep
[17:22:00] [PASSED] ttm_bo_reserve_no_wait_ticket
[17:22:00] [PASSED] ttm_bo_reserve_double_resv
[17:22:00] [PASSED] ttm_bo_reserve_interrupted
[17:22:00] [PASSED] ttm_bo_reserve_deadlock
[17:22:00] [PASSED] ttm_bo_unreserve_basic
[17:22:00] [PASSED] ttm_bo_unreserve_pinned
[17:22:00] [PASSED] ttm_bo_unreserve_bulk
[17:22:00] [PASSED] ttm_bo_fini_basic
[17:22:00] [PASSED] ttm_bo_fini_shared_resv
[17:22:00] [PASSED] ttm_bo_pin_basic
[17:22:00] [PASSED] ttm_bo_pin_unpin_resource
[17:22:00] [PASSED] ttm_bo_multiple_pin_one_unpin
[17:22:00] ===================== [PASSED] ttm_bo ======================
[17:22:00] ============== ttm_bo_validate (22 subtests) ===============
[17:22:00] ============== ttm_bo_init_reserved_sys_man ===============
[17:22:00] [PASSED] Buffer object for userspace
[17:22:00] [PASSED] Kernel buffer object
[17:22:00] [PASSED] Shared buffer object
[17:22:00] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[17:22:00] ============== ttm_bo_init_reserved_mock_man ==============
[17:22:00] [PASSED] Buffer object for userspace
[17:22:00] [PASSED] Kernel buffer object
[17:22:00] [PASSED] Shared buffer object
[17:22:00] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[17:22:00] [PASSED] ttm_bo_init_reserved_resv
[17:22:00] ================== ttm_bo_validate_basic ==================
[17:22:00] [PASSED] Buffer object for userspace
[17:22:00] [PASSED] Kernel buffer object
[17:22:00] [PASSED] Shared buffer object
[17:22:00] ============== [PASSED] ttm_bo_validate_basic ==============
[17:22:00] [PASSED] ttm_bo_validate_invalid_placement
[17:22:00] ============= ttm_bo_validate_same_placement ==============
[17:22:00] [PASSED] System manager
[17:22:00] [PASSED] VRAM manager
[17:22:00] ========= [PASSED] ttm_bo_validate_same_placement ==========
[17:22:00] [PASSED] ttm_bo_validate_failed_alloc
[17:22:00] [PASSED] ttm_bo_validate_pinned
[17:22:00] [PASSED] ttm_bo_validate_busy_placement
[17:22:00] ================ ttm_bo_validate_multihop =================
[17:22:00] [PASSED] Buffer object for userspace
[17:22:00] [PASSED] Kernel buffer object
[17:22:00] [PASSED] Shared buffer object
[17:22:00] ============ [PASSED] ttm_bo_validate_multihop =============
[17:22:00] ========== ttm_bo_validate_no_placement_signaled ==========
[17:22:00] [PASSED] Buffer object in system domain, no page vector
[17:22:00] [PASSED] Buffer object in system domain with an existing page vector
[17:22:00] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[17:22:00] ======== ttm_bo_validate_no_placement_not_signaled ========
[17:22:00] [PASSED] Buffer object for userspace
[17:22:00] [PASSED] Kernel buffer object
[17:22:00] [PASSED] Shared buffer object
[17:22:00] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[17:22:00] [PASSED] ttm_bo_validate_move_fence_signaled
[17:22:00] ========= ttm_bo_validate_move_fence_not_signaled =========
[17:22:00] [PASSED] Waits for GPU
[17:22:00] [PASSED] Tries to lock straight away
[17:22:00] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[17:22:00] [PASSED] ttm_bo_validate_swapout
[17:22:00] [PASSED] ttm_bo_validate_happy_evict
[17:22:00] [PASSED] ttm_bo_validate_all_pinned_evict
[17:22:00] [PASSED] ttm_bo_validate_allowed_only_evict
[17:22:00] [PASSED] ttm_bo_validate_deleted_evict
[17:22:00] [PASSED] ttm_bo_validate_busy_domain_evict
[17:22:00] [PASSED] ttm_bo_validate_evict_gutting
[17:22:00] [PASSED] ttm_bo_validate_recrusive_evict
[17:22:00] ================= [PASSED] ttm_bo_validate =================
[17:22:00] ============================================================
[17:22:00] Testing complete. Ran 102 tests: passed: 102
[17:22:00] Elapsed time: 11.355s total, 1.727s configuring, 9.413s building, 0.180s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
* ✓ Xe.CI.BAT: success for Introduce cold reset recovery method (rev9)
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (7 preceding siblings ...)
2026-06-22 17:22 ` ✓ CI.KUnit: success " Patchwork
@ 2026-06-22 18:09 ` Patchwork
2026-06-22 22:03 ` ✗ Xe.CI.FULL: failure " Patchwork
9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2026-06-22 18:09 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]
== Series Details ==
Series: Introduce cold reset recovery method (rev9)
URL : https://patchwork.freedesktop.org/series/163428/
State : success
== Summary ==
CI Bug Log - changes from xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798_BAT -> xe-pw-163428v9_BAT
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (11 -> 11)
------------------------------
No changes in participating hosts
Changes
-------
No changes found
Build changes
-------------
* IGT: IGT_8976 -> IGT_8977
* Linux: xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798 -> xe-pw-163428v9
IGT_8976: 8976
IGT_8977: c8bdb0bc2140e337e1d79969430c3d7a5c088c9e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798: 6583dd200482a34bb17e5dc54551f91f53b79798
xe-pw-163428v9: 163428v9
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/index.html
* ✗ Xe.CI.FULL: failure for Introduce cold reset recovery method (rev9)
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (8 preceding siblings ...)
2026-06-22 18:09 ` ✓ Xe.CI.BAT: " Patchwork
@ 2026-06-22 22:03 ` Patchwork
9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2026-06-22 22:03 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 22720 bytes --]
== Series Details ==
Series: Introduce cold reset recovery method (rev9)
URL : https://patchwork.freedesktop.org/series/163428/
State : failure
== Summary ==
CI Bug Log - changes from xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798_FULL -> xe-pw-163428v9_FULL
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with xe-pw-163428v9_FULL absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in xe-pw-163428v9_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
to document this new failure mode, which will reduce false positives in CI.
Participating hosts (2 -> 2)
------------------------------
No changes in participating hosts
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in xe-pw-163428v9_FULL:
### IGT changes ###
#### Possible regressions ####
* igt@xe_oa@polling@oag-0:
- shard-lnl: [PASS][1] -> [FAIL][2] +1 other test fail
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-lnl-7/igt@xe_oa@polling@oag-0.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-1/igt@xe_oa@polling@oag-0.html
Known issues
------------
Here are the changes found in xe-pw-163428v9_FULL that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip:
- shard-lnl: NOTRUN -> [SKIP][3] ([Intel XE#1124])
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-7/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip.html
- shard-bmg: NOTRUN -> [SKIP][4] ([Intel XE#1124]) +3 other tests skip
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-8/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip.html
* igt@kms_bw@linear-tiling-1-displays-target-3840x2160p:
- shard-bmg: NOTRUN -> [SKIP][5] ([Intel XE#367]) +1 other test skip
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-2/igt@kms_bw@linear-tiling-1-displays-target-3840x2160p.html
* igt@kms_ccs@bad-pixel-format-4-tiled-dg2-mc-ccs:
- shard-bmg: NOTRUN -> [SKIP][6] ([Intel XE#2887]) +6 other tests skip
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-1/igt@kms_ccs@bad-pixel-format-4-tiled-dg2-mc-ccs.html
- shard-lnl: NOTRUN -> [SKIP][7] ([Intel XE#2887])
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-5/igt@kms_ccs@bad-pixel-format-4-tiled-dg2-mc-ccs.html
* igt@kms_chamelium_color@ctm-blue-to-red:
- shard-bmg: NOTRUN -> [SKIP][8] ([Intel XE#2325] / [Intel XE#7358])
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@kms_chamelium_color@ctm-blue-to-red.html
* igt@kms_chamelium_frames@dp-crc-multiple:
- shard-bmg: NOTRUN -> [SKIP][9] ([Intel XE#2252]) +2 other tests skip
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-7/igt@kms_chamelium_frames@dp-crc-multiple.html
* igt@kms_cursor_crc@cursor-offscreen-256x85:
- shard-bmg: NOTRUN -> [SKIP][10] ([Intel XE#2320])
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-4/igt@kms_cursor_crc@cursor-offscreen-256x85.html
* igt@kms_cursor_crc@cursor-onscreen-512x170:
- shard-bmg: NOTRUN -> [SKIP][11] ([Intel XE#2321] / [Intel XE#7355])
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-10/igt@kms_cursor_crc@cursor-onscreen-512x170.html
* igt@kms_cursor_legacy@flip-vs-cursor-legacy:
- shard-bmg: [PASS][12] -> [FAIL][13] ([Intel XE#7571])
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-1/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-10/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
* igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions:
- shard-bmg: NOTRUN -> [SKIP][14] ([Intel XE#2286] / [Intel XE#6035])
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-2/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions.html
* igt@kms_dsc@dsc-basic-ultrajoiner:
- shard-bmg: NOTRUN -> [SKIP][15] ([Intel XE#8265]) +1 other test skip
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@kms_dsc@dsc-basic-ultrajoiner.html
* igt@kms_flip@flip-vs-expired-vblank-interruptible@d-hdmi-a3:
- shard-bmg: [PASS][16] -> [FAIL][17] ([Intel XE#3149] / [Intel XE#3321]) +1 other test fail
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-7/igt@kms_flip@flip-vs-expired-vblank-interruptible@d-hdmi-a3.html
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-10/igt@kms_flip@flip-vs-expired-vblank-interruptible@d-hdmi-a3.html
* igt@kms_frontbuffer_tracking@drrs-rgb101010-draw-blt:
- shard-bmg: NOTRUN -> [SKIP][18] ([Intel XE#2311]) +17 other tests skip
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-8/igt@kms_frontbuffer_tracking@drrs-rgb101010-draw-blt.html
* igt@kms_frontbuffer_tracking@drrshdr-rgb565-draw-mmap-wc:
- shard-lnl: NOTRUN -> [SKIP][19] ([Intel XE#6312])
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-2/igt@kms_frontbuffer_tracking@drrshdr-rgb565-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt:
- shard-bmg: NOTRUN -> [SKIP][20] ([Intel XE#4141]) +4 other tests skip
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-8/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt.html
* igt@kms_frontbuffer_tracking@fbcpsr-argb161616f-draw-render:
- shard-bmg: NOTRUN -> [SKIP][21] ([Intel XE#7061] / [Intel XE#7356])
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-3/igt@kms_frontbuffer_tracking@fbcpsr-argb161616f-draw-render.html
* igt@kms_frontbuffer_tracking@fbcpsrhdr-1p-primscrn-shrfb-plflip-blt:
- shard-bmg: NOTRUN -> [SKIP][22] ([Intel XE#2313]) +11 other tests skip
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcpsrhdr-1p-primscrn-shrfb-plflip-blt.html
* igt@kms_frontbuffer_tracking@fbcpsrhdr-argb161616f-draw-blt:
- shard-bmg: NOTRUN -> [SKIP][23] ([Intel XE#7061]) +1 other test skip
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcpsrhdr-argb161616f-draw-blt.html
* igt@kms_hdr@static-swap@pipe-a-hdmi-a-3-xrgb2101010:
- shard-bmg: [PASS][24] -> [SKIP][25] ([Intel XE#7915]) +3 other tests skip
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-10/igt@kms_hdr@static-swap@pipe-a-hdmi-a-3-xrgb2101010.html
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-6/igt@kms_hdr@static-swap@pipe-a-hdmi-a-3-xrgb2101010.html
* igt@kms_joiner@basic-force-ultra-joiner:
- shard-bmg: NOTRUN -> [SKIP][26] ([Intel XE#6911] / [Intel XE#7466])
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@kms_joiner@basic-force-ultra-joiner.html
* igt@kms_panel_fitting@atomic-fastset:
- shard-bmg: NOTRUN -> [SKIP][27] ([Intel XE#2486])
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-4/igt@kms_panel_fitting@atomic-fastset.html
* igt@kms_plane@pixel-format-yf-tiled-ccs-modifier-source-clamping:
- shard-bmg: NOTRUN -> [SKIP][28] ([Intel XE#7283]) +2 other tests skip
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@kms_plane@pixel-format-yf-tiled-ccs-modifier-source-clamping.html
* igt@kms_pm_rpm@dpms-mode-unset-lpsp:
- shard-bmg: NOTRUN -> [SKIP][29] ([Intel XE#1439] / [Intel XE#7402] / [Intel XE#836])
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@kms_pm_rpm@dpms-mode-unset-lpsp.html
* igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-sf:
- shard-bmg: NOTRUN -> [SKIP][30] ([Intel XE#1489]) +1 other test skip
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-5/igt@kms_psr2_sf@psr2-overlay-plane-move-continuous-sf.html
* igt@kms_psr@psr-basic:
- shard-bmg: NOTRUN -> [SKIP][31] ([Intel XE#2234] / [Intel XE#2850]) +2 other tests skip
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-3/igt@kms_psr@psr-basic.html
* igt@kms_rotation_crc@primary-y-tiled-reflect-x-0:
- shard-bmg: NOTRUN -> [SKIP][32] ([Intel XE#2330] / [Intel XE#5813])
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-8/igt@kms_rotation_crc@primary-y-tiled-reflect-x-0.html
* igt@kms_setmode@basic@pipe-b-edp-1:
- shard-lnl: [PASS][33] -> [FAIL][34] ([Intel XE#6361]) +1 other test fail
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-lnl-4/igt@kms_setmode@basic@pipe-b-edp-1.html
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-5/igt@kms_setmode@basic@pipe-b-edp-1.html
* igt@kms_sharpness_filter@filter-basic:
- shard-bmg: NOTRUN -> [SKIP][35] ([Intel XE#6503])
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-3/igt@kms_sharpness_filter@filter-basic.html
* igt@kms_vrr@seamless-rr-switch-vrr:
- shard-bmg: NOTRUN -> [SKIP][36] ([Intel XE#1499])
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@kms_vrr@seamless-rr-switch-vrr.html
* igt@xe_eudebug@basic-exec-queues:
- shard-bmg: NOTRUN -> [SKIP][37] ([Intel XE#7636]) +3 other tests skip
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-10/igt@xe_eudebug@basic-exec-queues.html
- shard-lnl: NOTRUN -> [SKIP][38] ([Intel XE#7636])
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-8/igt@xe_eudebug@basic-exec-queues.html
* igt@xe_evict@evict-mixed-many-threads-small:
- shard-bmg: [PASS][39] -> [INCOMPLETE][40] ([Intel XE#6321] / [Intel XE#8355])
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-3/igt@xe_evict@evict-mixed-many-threads-small.html
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-1/igt@xe_evict@evict-mixed-many-threads-small.html
* igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-defer-bind:
- shard-bmg: NOTRUN -> [SKIP][41] ([Intel XE#2322] / [Intel XE#7372]) +1 other test skip
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-3/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-defer-bind.html
* igt@xe_exec_fault_mode@many-multi-queue-rebind:
- shard-bmg: NOTRUN -> [SKIP][42] ([Intel XE#8374]) +1 other test skip
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@xe_exec_fault_mode@many-multi-queue-rebind.html
* igt@xe_exec_multi_queue@many-queues-dyn-priority:
- shard-lnl: NOTRUN -> [SKIP][43] ([Intel XE#8364]) +1 other test skip
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-1/igt@xe_exec_multi_queue@many-queues-dyn-priority.html
* igt@xe_exec_multi_queue@many-queues-priority-smem:
- shard-bmg: NOTRUN -> [SKIP][44] ([Intel XE#8364]) +9 other tests skip
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-6/igt@xe_exec_multi_queue@many-queues-priority-smem.html
* igt@xe_exec_threads@threads-multi-queue-cm-shared-vm-userptr-rebind:
- shard-bmg: NOTRUN -> [SKIP][45] ([Intel XE#8378]) +1 other test skip
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@xe_exec_threads@threads-multi-queue-cm-shared-vm-userptr-rebind.html
* igt@xe_fault_injection@exec-queue-create-fail-xe_pxp_exec_queue_add:
- shard-bmg: NOTRUN -> [SKIP][46] ([Intel XE#6281] / [Intel XE#7426])
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@xe_fault_injection@exec-queue-create-fail-xe_pxp_exec_queue_add.html
* igt@xe_query@multigpu-query-config:
- shard-bmg: NOTRUN -> [SKIP][47] ([Intel XE#944])
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-5/igt@xe_query@multigpu-query-config.html
* igt@xe_sriov_flr@flr-vf1-clear:
- shard-bmg: [PASS][48] -> [FAIL][49] ([Intel XE#6569])
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-10/igt@xe_sriov_flr@flr-vf1-clear.html
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-5/igt@xe_sriov_flr@flr-vf1-clear.html
#### Possible fixes ####
* igt@kms_flip@flip-vs-expired-vblank@a-edp1:
- shard-lnl: [FAIL][50] ([Intel XE#301]) -> [PASS][51] +2 other tests pass
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-lnl-7/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-1/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html
* igt@kms_flip@flip-vs-expired-vblank@c-edp1:
- shard-lnl: [FAIL][52] ([Intel XE#301] / [Intel XE#3149]) -> [PASS][53]
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-lnl-7/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-1/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
* igt@kms_hdr@static-toggle@pipe-a-hdmi-a-3-xrgb16161616f:
- shard-bmg: [SKIP][54] ([Intel XE#7915]) -> [PASS][55] +1 other test pass
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-2/igt@kms_hdr@static-toggle@pipe-a-hdmi-a-3-xrgb16161616f.html
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-4/igt@kms_hdr@static-toggle@pipe-a-hdmi-a-3-xrgb16161616f.html
* igt@xe_evict@evict-beng-mixed-many-threads-small:
- shard-bmg: [INCOMPLETE][56] ([Intel XE#6321] / [Intel XE#8355]) -> [PASS][57]
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-8/igt@xe_evict@evict-beng-mixed-many-threads-small.html
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-5/igt@xe_evict@evict-beng-mixed-many-threads-small.html
* igt@xe_sriov_vram@vf-access-provisioned:
- shard-bmg: [FAIL][58] ([Intel XE#7992]) -> [PASS][59] +1 other test pass
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-8/igt@xe_sriov_vram@vf-access-provisioned.html
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-6/igt@xe_sriov_vram@vf-access-provisioned.html
* igt@xe_vm@large-userptr-binds-16777216:
- shard-lnl: [ABORT][60] ([Intel XE#8007]) -> [PASS][61]
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-lnl-5/igt@xe_vm@large-userptr-binds-16777216.html
[61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-4/igt@xe_vm@large-userptr-binds-16777216.html
#### Warnings ####
* igt@kms_big_fb@x-tiled-8bpp-rotate-270:
- shard-lnl: [SKIP][62] ([Intel XE#1407]) -> [ABORT][63] ([Intel XE#4760])
[62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-lnl-2/igt@kms_big_fb@x-tiled-8bpp-rotate-270.html
[63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-3/igt@kms_big_fb@x-tiled-8bpp-rotate-270.html
* igt@kms_flip@flip-vs-expired-vblank:
- shard-lnl: [FAIL][64] ([Intel XE#301] / [Intel XE#3149]) -> [FAIL][65] ([Intel XE#301])
[64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-lnl-7/igt@kms_flip@flip-vs-expired-vblank.html
[65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-lnl-1/igt@kms_flip@flip-vs-expired-vblank.html
* igt@kms_tiled_display@basic-test-pattern:
- shard-bmg: [FAIL][66] ([Intel XE#1729] / [Intel XE#7424]) -> [SKIP][67] ([Intel XE#2426] / [Intel XE#5848])
[66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern.html
[67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-9/igt@kms_tiled_display@basic-test-pattern.html
* igt@kms_tiled_display@basic-test-pattern-with-chamelium:
- shard-bmg: [SKIP][68] ([Intel XE#2426] / [Intel XE#5848]) -> [SKIP][69] ([Intel XE#2509] / [Intel XE#7437])
[68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798/shard-bmg-9/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
[69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/shard-bmg-10/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
[Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
[Intel XE#1407]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1407
[Intel XE#1439]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1439
[Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
[Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
[Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
[Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
[Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
[Intel XE#2286]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2286
[Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
[Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
[Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
[Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
[Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
[Intel XE#2325]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2325
[Intel XE#2330]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2330
[Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
[Intel XE#2486]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2486
[Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
[Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
[Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
[Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
[Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
[Intel XE#3321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3321
[Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
[Intel XE#4141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141
[Intel XE#4760]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4760
[Intel XE#5813]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5813
[Intel XE#5848]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5848
[Intel XE#6035]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6035
[Intel XE#6281]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6281
[Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
[Intel XE#6321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6321
[Intel XE#6361]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6361
[Intel XE#6503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6503
[Intel XE#6569]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6569
[Intel XE#6911]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6911
[Intel XE#7061]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7061
[Intel XE#7283]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7283
[Intel XE#7355]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7355
[Intel XE#7356]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7356
[Intel XE#7358]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7358
[Intel XE#7372]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7372
[Intel XE#7402]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7402
[Intel XE#7424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7424
[Intel XE#7426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7426
[Intel XE#7437]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7437
[Intel XE#7466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7466
[Intel XE#7571]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7571
[Intel XE#7636]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7636
[Intel XE#7915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7915
[Intel XE#7992]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7992
[Intel XE#8007]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/8007
[Intel XE#8265]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/8265
[Intel XE#8355]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/8355
[Intel XE#836]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/836
[Intel XE#8364]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/8364
[Intel XE#8374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/8374
[Intel XE#8378]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/8378
[Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944
Build changes
-------------
* IGT: IGT_8976 -> IGT_8977
* Linux: xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798 -> xe-pw-163428v9
IGT_8976: 8976
IGT_8977: c8bdb0bc2140e337e1d79969430c3d7a5c088c9e @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
xe-5286-6583dd200482a34bb17e5dc54551f91f53b79798: 6583dd200482a34bb17e5dc54551f91f53b79798
xe-pw-163428v9: 163428v9
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v9/index.html
* Re: [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling
2026-06-22 13:15 ` [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
@ 2026-06-23 20:54 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-23 20:54 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:
- [High] Manual invocation of driver remove and probe callbacks in the PCI error slot_reset handler bypasses the driver core.
- [High] Unclosed devres group captures unintended resources and its forceful release causes Use-After-Free.
- [High] Endpoint driver directly modifies AER registers of the Upstream Switch Port via unsynchronized RMW.
- [High] Unbalanced `xe_device_wedged_put()` during non-fatal PCI error recovery causes refcount underflow and permanently wedges the device.
- [High] Clearing `xe->in_reset` prematurely and manually destroying devres causes TDR and async workers to dereference NULL MMIO pointers.
- [High] Unsynchronized checks of `xe_device_is_in_reset()` cause Time-of-Check to Time-of-Use (TOCTOU) races with PCI reset.
- [High] Calling `pci_match_id()` manually in `xe_pci_error_slot_reset()` returns NULL for devices bound via dynamic IDs, causing a NULL pointer dereference.
- [High] `prepare_device_for_reset()` disables the PCI device without cancelling or flushing active GT reset workers, risking MMIO timeouts.
- [Medium] Flawed accumulation logic and missing flood limits in `get_queued_pages` and `get_offlined_list` cause premature aborts or infinite loops.
- [Medium] Double removal of sysfs group `survivability_info_group` via both `devm_device_add_group` and `xe_survivability_mode_fini`.
--
commit 5c76039736fbbd64a08449cce81a6b27917ce366
Author: Riana Tauro <riana.tauro@intel.com>
Introduce Xe Uncorrectable Error Handling
This commit introduces uncorrectable error handling for the Xe driver, adding
PCI error handlers to recover from AER and system controller errors. It
includes device wedging logic and hooks into the PCI error recovery flow.
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index c9d4fb6c4ff6a..352e7eb008d90 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -1078,6 +1079,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	const struct xe_device_desc *desc = (const void *)ent->driver_data;
>  	const struct xe_subplatform_desc *subplatform_desc;
>  	struct xe_device *xe;
> +	void *devres_id;
>  	int err;
>
>  	subplatform_desc = find_subplatform(desc, pdev->device);
> @@ -1105,6 +1107,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	if (xe_display_driver_probe_defer(pdev))
>  		return -EPROBE_DEFER;
>
> +	devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
> +	if (!devres_id)
> +		return -ENOMEM;
> +
[Severity: High]
Does this code capture all future devm allocations unintentionally?
A devres group is opened here via devres_open_group() but never closed.
Consequently, all subsequent devres allocations by the driver and subsystem
will be silently added to this group. When devres_release_group() is manually
called later in xe_pci_error_slot_reset(), it forcefully destroys all
devm-managed resources, including the drm_device memory. If userspace holds
an open FD to the DRM device, could subsequent IOCTLs or FD closures access
the freed memory and cause a use-after-free?
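The grouping concern can be illustrated with a small userspace model. This is
a sketch only: the toy_* names and the flat array are invented for
illustration and are not the kernel's devres implementation. It assumes the
usual pairing of devres_open_group() with devres_close_group(), where
resources registered after the close no longer join the group.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RES 16

/* Toy stand-in for a device's devres list; not the kernel data structure. */
struct toy_devres {
	int owner[TOY_MAX_RES];	/* group id owning each resource, 0 = ungrouped */
	bool live[TOY_MAX_RES];
	size_t count;
	int open_group;		/* id of the currently open group, 0 = none */
};

static void toy_open_group(struct toy_devres *d, int id)
{
	d->open_group = id;
}

static void toy_close_group(struct toy_devres *d)
{
	d->open_group = 0;
}

/* Register one resource; it joins whichever group is open right now. */
static int toy_add_resource(struct toy_devres *d)
{
	if (d->count >= TOY_MAX_RES)
		return -1;
	d->owner[d->count] = d->open_group;
	d->live[d->count] = true;
	return (int)d->count++;
}

/* Release every live resource owned by the group; return how many. */
static size_t toy_release_group(struct toy_devres *d, int id)
{
	size_t freed = 0;

	for (size_t i = 0; i < d->count; i++) {
		if (d->live[i] && d->owner[i] == id) {
			d->live[i] = false;
			freed++;
		}
	}
	return freed;
}

/*
 * Two probe-time resources, then one later allocation that should outlive
 * a reset. Returns how many resources releasing the probe group destroys.
 */
static size_t toy_probe_then_release(bool close_group_after_probe)
{
	struct toy_devres d = {0};

	toy_open_group(&d, 1);
	toy_add_resource(&d);
	toy_add_resource(&d);
	if (close_group_after_probe)
		toy_close_group(&d);
	toy_add_resource(&d);	/* late allocation */
	return toy_release_group(&d, 1);
}
```

In this model, leaving the group open makes the release take the late
allocation along with the two probe-time ones, while closing the group after
probe limits the release to those two.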
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 0000000000000..b08601f470d63
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,135 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +
> +#include <linux/pci.h>
> +
> +#include "xe_device.h"
> +#include "xe_gt.h"
> +#include "xe_pci.h"
> +#include "xe_printk.h"
> +#include "xe_ras.h"
> +#include "xe_survivability_mode.h"
> +
> +static void prepare_device_for_reset(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +	struct xe_gt *gt;
> +	u8 id;
> +
> +	xe_device_set_in_reset(xe);
> +
> +	/* Wedge the device to prevent userspace access during reset */
> +	xe_device_wedged_get(xe);
> +
> +	for_each_gt(gt, xe, id)
> +		xe_gt_declare_wedged(gt);
> +
> +	pci_disable_device(pdev);
> +}
[Severity: High]
Are active GT reset workers properly synchronized before disabling the device?
This calls pci_disable_device(pdev) while gt_reset_worker or other
hardware-accessing work items may be actively executing. Although
xe_device_set_in_reset() is called, this only prevents new workers from
starting.
Note that pci_disable_device() itself only clears bus mastering; it is the
reset that follows which makes MMIO reads return 0xFFFFFFFF. If a worker is
still polling MMIO registers at that point, can this cause polling loops to
time out, delay the reset process, or trigger unexpected hardware warnings?
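Reads from a PCI device that has stopped responding commonly come back as
all-ones, so a polling loop can treat that value as "device gone" instead of
spinning until its timeout. The following is a userspace sketch of that idea;
the toy_* helpers and the callback-based register read are assumptions made
for illustration and are not Xe driver code. A register that can legitimately
read as all-ones would need a different check.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register accessor; stands in for an MMIO read. */
typedef uint32_t (*toy_read_fn)(void *ctx);

/*
 * Poll until any bit in mask is set. Bail out early with -ENODEV when the
 * read returns all-ones, and with -ETIMEDOUT when the tries are used up.
 */
static int toy_poll_bit(toy_read_fn rd, void *ctx, uint32_t mask, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		uint32_t v = rd(ctx);

		if (v == UINT32_MAX)
			return -ENODEV;
		if (v & mask)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Simulated device that no longer responds. */
static uint32_t toy_read_gone(void *ctx)
{
	(void)ctx;
	return UINT32_MAX;
}

/* Simulated device that reports bit 0 after *ctx reads of zero. */
static uint32_t toy_read_ready_after(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- > 0 ? 0 : 0x1;
}
```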
[ ... ]
> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	xe_err(xe, "PCI error: detected state = %u\n", state);
> +
> +	if (state == pci_channel_io_perm_failure)
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	/* If the device is already wedged or in survivability mode, do not attempt recovery */
> +	if (xe_survivability_mode_is_boot_enabled(xe) || xe_device_wedged(xe))
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	switch (state) {
> +	case pci_channel_io_normal:
> +		return PCI_ERS_RESULT_CAN_RECOVER;
[Severity: High]
Will this cause a refcount underflow for non-fatal errors?
When receiving pci_channel_io_normal, this returns PCI_ERS_RESULT_CAN_RECOVER
without calling prepare_device_for_reset(), which means xe_device_wedged_get()
is never called.
If the subsequent mmio_enabled handler successfully recovers the error, the
PCI core directly invokes xe_pci_error_resume(), which unconditionally calls
xe_device_wedged_put(xe).
Could this underflow the xe->wedged.ref atomic counter to -1, permanently
wedging the device and incorrectly blocking all subsequent IOCTLs and GuC
submissions?
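The imbalance can be reproduced in a few lines of userspace C. This is a toy
model, not the driver: struct toy_dev, its reset_prepared flag, and the
toy_* functions are hypothetical, and the counter only stands in for what the
review calls xe->wedged.ref. One possible remedy, assumed here purely for
illustration, is to remember whether the prepare path actually took the
reference.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_dev {
	atomic_int wedged_ref;	/* models the wedged reference counter */
	bool reset_prepared;	/* hypothetical: true only if prepare took a ref */
};

/* Models the prepare path: takes one wedged reference. */
static void toy_prepare_for_reset(struct toy_dev *d)
{
	atomic_fetch_add(&d->wedged_ref, 1);
	d->reset_prepared = true;
}

/* Unbalanced resume: always drops a reference. Returns the new count. */
static int toy_resume_unconditional(struct toy_dev *d)
{
	return atomic_fetch_sub(&d->wedged_ref, 1) - 1;
}

/* Balanced resume: drops the reference only if prepare took one. */
static int toy_resume_balanced(struct toy_dev *d)
{
	if (d->reset_prepared) {
		d->reset_prepared = false;
		atomic_fetch_sub(&d->wedged_ref, 1);
	}
	return atomic_load(&d->wedged_ref);
}
```

When resume runs without a preceding prepare, the unconditional variant
leaves the counter at -1, while the balanced variant leaves it at 0.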
[ ... ]
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
[Severity: High]
Could this return NULL for devices bound via dynamic IDs?
This retrieves the device ID table entry by manually calling pci_match_id(),
which exclusively searches the driver's statically compiled id_table. If the
device was bound using the dynamic ID mechanism via sysfs new_id,
pci_match_id() returns NULL.
> +
> +	xe_err(xe, "PCI error: slot reset\n");
> +
> +	pci_restore_state(pdev);
> +
> +	if (pci_enable_device(pdev)) {
> +		xe_err(xe, "Cannot re-enable PCI device after reset\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	/*
> +	 * Secondary Bus Reset causes all VRAM state to be lost along with
> +	 * hardware state. As an initial step, re-probe the device to
> +	 * re-initialize the driver and hardware.
> +	 * TODO: optimize by re-initializing only the hardware state and re-creating
> +	 * kernel BOs.
> +	 */
> +	xe_device_clear_in_reset(xe);
> +	pdev->driver->remove(pdev);
> +	devres_release_group(&pdev->dev, xe->devres_group_id);
[Severity: High]
Does clearing xe->in_reset prematurely cause a race condition with async
workers?
xe_device_clear_in_reset(xe) is called immediately before tearing down the
driver and calling devres_release_group(). The in_reset flag is meant to
prevent asynchronous workers (like the TDR timer) from accessing hardware
during reset.
By clearing it too early, TDR workers will pass the !xe_device_is_in_reset(xe)
check and execute concurrently with driver teardown. Since
devres_release_group() synchronously unmaps MMIO, could a racing TDR worker
attempt to read hardware registers using unmapped pointers (for example,
during xe_devcoredump()), leading to a panic?
> +
> +	if (pdev->driver->probe(pdev, ent))
> +		return PCI_ERS_RESULT_DISCONNECT;
[Severity: High]
Is it safe to pass a potentially NULL 'ent' pointer to the probe function?
If pci_match_id() returned NULL earlier, passing it here will cause
xe_pci_probe() to immediately dereference ent->driver_data, resulting in a
deterministic NULL pointer dereference.
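A lookup that only walks a static table can miss, so the caller has to handle
a NULL result before using the entry. The sketch below shows that guard in a
userspace model; struct toy_id, toy_table, and the toy_* functions are
invented for illustration, and the device IDs and driver_data value are made
up.

```c
#include <errno.h>
#include <stddef.h>

struct toy_id {
	unsigned int vendor;
	unsigned int device;
	int driver_data;
};

/* Static table, terminated by an all-zero entry. */
static const struct toy_id toy_table[] = {
	{ 0x8086, 0x0001, 42 },
	{ 0 }
};

/* Walk the static table only; IDs added at runtime are not visible here. */
static const struct toy_id *toy_match_id(const struct toy_id *table,
					 unsigned int vendor,
					 unsigned int device)
{
	for (; table->vendor; table++)
		if (table->vendor == vendor && table->device == device)
			return table;
	return NULL;
}

/* Refuse to re-probe without a table entry instead of dereferencing NULL. */
static int toy_reprobe(const struct toy_id *ent, int *data_out)
{
	if (!ent)
		return -ENODEV;
	*data_out = ent->driver_data;
	return 0;
}
```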
[Severity: High]
Does manually invoking the remove and probe callbacks bypass the Linux
driver core?
Directly calling pdev->driver->remove(pdev) and pdev->driver->probe(pdev, ent)
bypasses the unbind/bind state machine of the Linux driver core. This
detaches the device from proper lifecycle management.
If the manual probe fails, devres is not automatically cleaned up.
Furthermore, the driver core still considers the device bound. Upon eventual
module unload or unbind, the driver core will execute remove() a second time
on the already-freed xe structure, potentially resulting in a use-after-free
or double-free regression.
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 9458bf477fa6e..12416bfa32557 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1532,7 +1532,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  	 * If devcoredump not captured and GuC capture for the job is not ready
>  	 * do manual capture first and decide later if we need to use it
>  	 */
> -	if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
> +	if (!xe_device_is_in_reset(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
>  	    !xe_guc_capture_get_matching_and_lock(q)) {
>  		/* take force wake before engine register manual capture */
>  		CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
[Severity: High]
Is there a time-of-check to time-of-use (TOCTOU) race condition here?
Asynchronous workers rely on the atomic check xe_device_is_in_reset(xe) to
avoid interacting with hardware during PCI error recovery, but this flag is
checked without holding any synchronization lock.
Immediately after the worker verifies the device is not in reset, the PCI
error handler could concurrently invoke prepare_device_for_reset(), setting
the flag and disabling the PCI device. The worker would then blindly proceed
to access MMIO or manipulate firmware state on a disabled device. Could this
cause MMIO timeouts or undefined hardware behavior?
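One common way to close this kind of window is to make the check and the
hardware access a single bracketed section that the reset path can wait on.
The sketch below models that with C11 atomics in userspace. struct reset_gate
and its helpers are hypothetical and do not exist in the Xe driver, and a real
implementation would sleep until quiescence rather than test for it once.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate; both fields start at zero. */
struct reset_gate {
	atomic_bool in_reset;
	atomic_int users;	/* workers currently touching hardware */
};

/* Worker side: announce intent first, then check the flag. */
static bool hw_access_enter(struct reset_gate *g)
{
	atomic_fetch_add(&g->users, 1);
	if (atomic_load(&g->in_reset)) {
		atomic_fetch_sub(&g->users, 1);
		return false;
	}
	return true;
}

static void hw_access_exit(struct reset_gate *g)
{
	atomic_fetch_sub(&g->users, 1);
}

/* Reset side: raise the flag first, then look for in-flight workers. */
static void reset_request(struct reset_gate *g)
{
	atomic_store(&g->in_reset, true);
}

static bool reset_quiesced(struct reset_gate *g)
{
	return atomic_load(&g->users) == 0;
}
```

Because the worker increments before it reads the flag and the reset path
sets the flag before it reads the counter, at least one side observes the
other: either the worker backs off, or the reset path sees a non-zero user
count and knows it must wait before disabling the device.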
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 44f4e1a3455b7..d6db0e98586fe 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -131,6 +126,305 @@ static inline const char *comp_to_str(u8 component)
>  	return xe_ras_components[component];
> }
>
> +static int send_page_offline(struct xe_device *xe, enum xe_ras_page_action action, u64 page_address)
[ ... ]
> +static void get_queued_pages(struct xe_device *xe)
> +{
> +	struct xe_sysctrl_mailbox_command command = {0};
> +	struct xe_ras_page_offline_queue response = {0};
> +	u32 count = 0;
> +	size_t rlen;
> +	int ret, i;
> +
> +	/* Supported only on platforms with system controller */
> +	if (!xe->info.has_sysctrl)
> +		return;
> +
> +	xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP,
> +				  XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE, NULL, 0, &response,
> +				  sizeof(response));
> +
> +	do {
> +		memset(&response, 0, sizeof(response));
> +
> +		ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
> +		if (ret) {
> +			xe_err(xe, "sysctrl: failed to get page offline queue %d\n", ret);
> +			return;
> +		}
> +
> +		if (rlen != sizeof(response)) {
> +			xe_err(xe, "sysctrl: unexpected page offline queue response length %zu (expected %zu)\n",
> +			       rlen, sizeof(response));
> +			return;
> +		}
> +
> +		for (i = 0; i < response.pages_returned && i < XE_RAS_NUM_PAGES; i++)
> +			handle_page_offline(xe, response.page_addresses[i], true);
> +
> +		count += response.pages_returned;
> +		if (!response.pages_returned)
> +			break;
> +
> +		if (count > response.total_pages) {
> +			xe_err(xe, "sysctrl: Pages returned from queue exceed total pages %u, returned %u\n",
> +			       response.total_pages, count);
> +			return;
> +		}
> +	} while (response.additional_data);
[Severity: Medium]
Could the accumulation logic here lead to a premature abort or infinite loop?
The loop monotonically accumulates response.pages_returned into count, and
compares it against response.total_pages. If the firmware concurrently updates
total_pages to a smaller value as pages are dequeued, this check could falsely
trigger and prematurely abort the recovery.
Additionally, there is no maximum iteration limit (like
XE_SYSCTRL_FLOOD_LIMIT). If a buggy firmware implementation keeps setting
additional_data = 1 indefinitely, could this force the kernel into an infinite
loop or cause the u32 count to overflow?
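A bounded variant of the drain loop might look like the following userspace
sketch. It is illustrative only: TOY_MAX_ROUNDS is an arbitrary cap chosen for
the example, the toy_* names and the callback that simulates firmware are
assumptions, and the response struct keeps just the two fields the loop
decision depends on.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ROUNDS 8	/* hypothetical cap on firmware round trips */

struct toy_response {
	uint32_t pages_returned;
	uint32_t additional_data;	/* non-zero: firmware has more to send */
};

/* Stands in for one mailbox round trip; returns 0 on success. */
typedef int (*toy_fetch_fn)(void *ctx, struct toy_response *out);

/* Drain the queue, but give up after TOY_MAX_ROUNDS round trips. */
static int toy_drain_queue(toy_fetch_fn fetch, void *ctx, uint32_t *total)
{
	uint32_t count = 0;

	for (int round = 0; round < TOY_MAX_ROUNDS; round++) {
		struct toy_response r = {0};
		int ret = fetch(ctx, &r);

		if (ret)
			return ret;
		count += r.pages_returned;
		if (!r.pages_returned || !r.additional_data) {
			*total = count;
			return 0;
		}
	}
	return -ETIMEDOUT;
}

/* Well-behaved firmware: *ctx batches of two pages each. */
static int toy_fetch_finite(void *ctx, struct toy_response *out)
{
	int *batches_left = ctx;

	out->pages_returned = 2;
	out->additional_data = --(*batches_left) > 0;
	return 0;
}

/* Misbehaving firmware: always claims there is more data. */
static int toy_fetch_stuck(void *ctx, struct toy_response *out)
{
	(void)ctx;
	out->pages_returned = 1;
	out->additional_data = 1;
	return 0;
}
```

With the cap in place, the misbehaving case ends with an error after a fixed
number of rounds, and the running count can no longer grow without bound.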
[ ... ]
> +static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
> +{
> + struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> + u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
> + struct pci_dev *usp;
> + u16 aer_cap;
> +
> + usp = find_usp_dev(pdev);
> + if (!usp)
> + return;
> +
> + aer_cap = pci_find_ext_capability(usp, PCI_EXT_CAP_ID_ERR);
> + if (!aer_cap) {
> + dev_info(&usp->dev, "AER capability unavailable\n");
> + return;
> + }
> +
> + /*
> + * Clear any stale Uncorrectable Internal Error Status event in Uncorrectable Error
> + * Status Register.
> + */
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, &aer_uncorr_status);
> + if (aer_uncorr_status & PCI_ERR_UNC_INTN)
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);
> +
> + /*
> + * All errors are steered to USP which is a PCIe AER Compliant device.
> + * Downgrade all the errors to non-fatal to prevent PCIe bus driver
> + * from triggering a Secondary Bus Reset (SBR). This allows error
> + * detection, containment and recovery in the driver.
> + *
> + * The Uncorrectable Error Severity Register has the 'Uncorrectable
> + * Internal Error Severity' set to fatal by default. Set this to
> + * non-fatal and unmask the error.
> + */
> +
> + /* Initialize Uncorrectable Error Severity Register */
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
> + aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
> +
> + /* Initialize Uncorrectable Error Mask Register */
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> + aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
[Severity: High]
Is it safe for the endpoint driver to directly modify AER registers of the
Upstream Switch Port using an unsynchronized read-modify-write sequence?
Retrieving the parent Upstream Switch Port (USP) device and directly modifying
its AER Configuration space bypasses the pcieport driver which manages it.
Because the modification uses an unlocked read-modify-write sequence,
concurrent probing of multiple endpoint functions or concurrent AER event
handling by the PCI core could result in lost updates and permanently
corrupted AER configurations on the PCIe switch.
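To make the lost-update concern concrete, here is a deterministic user-space model of two read-modify-write sequences on one shared register. It is only a sketch: `BIT_A`, `BIT_B` and the helper names are hypothetical, and a plain variable stands in for the USP's AER register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_A 0x1u /* hypothetical bit changed by the first writer */
#define BIT_B 0x2u /* hypothetical bit changed by the second writer */

/* Step 1 of a read-modify-write: snapshot the register. */
static uint32_t rmw_read(const uint32_t *reg)
{
	assert(reg != NULL);
	return *reg;
}

/* Steps 2 and 3: modify the snapshot and write it back. */
static void rmw_write(uint32_t *reg, uint32_t snapshot, uint32_t set)
{
	assert(reg != NULL);
	*reg = snapshot | set;
}

/* Two unlocked updates where both writers read before either writes. */
static uint32_t interleaved_update(void)
{
	uint32_t reg = 0;
	uint32_t snap_a = rmw_read(&reg);
	uint32_t snap_b = rmw_read(&reg);

	rmw_write(&reg, snap_a, BIT_A);
	rmw_write(&reg, snap_b, BIT_B); /* stale snapshot drops BIT_A */
	return reg;
}

/* The same updates, each completed before the next one starts. */
static uint32_t serialized_update(void)
{
	uint32_t reg = 0;

	rmw_write(&reg, rmw_read(&reg), BIT_A);
	rmw_write(&reg, rmw_read(&reg), BIT_B);
	return reg;
}
```

In the interleaved case only `BIT_B` survives, which is the kind of lost update a common lock (or leaving the register to the driver that owns the port) would rule out.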
> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
> index 427afd144f3ac..4c506027fa94c 100644
> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
> @@ -296,7 +304,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
> if (ret)
> return ret;
>
> - if (check_boot_failure(xe)) {
> + /* Survivability info is not required if enabled via configfs */
> + if (!xe_configfs_get_survivability_mode(pdev)) {
> ret = devm_device_add_group(dev, &survivability_info_group);
> if (ret)
> return ret;
[Severity: Medium]
Will this result in a double-removal of the sysfs group?
The survivability_info_group is registered using devm_device_add_group(),
which sets up an automatic devres cleanup action. Additionally,
xe_survivability_mode_fini() is registered via devm_add_action_or_reset(),
which calls device_remove_group() for the exact same group.
During device teardown, devres executes the automatically added
devm_device_remove_group first. It then executes xe_survivability_mode_fini,
which removes the group again. Furthermore, if the configfs check prevents the
group from being added initially, xe_survivability_mode_fini will still
unconditionally attempt to remove it.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622131458.144873-8-mallesh.koujalagi@intel.com?part=1
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
2026-06-22 13:15 ` [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
@ 2026-06-23 21:04 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-23 21:04 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] Directly invoking driver `remove()` and `probe()` callbacks from the PCI error handler bypasses the Linux device model, leading to driver core state corruption and subsequent crashes.
- [High] WEDGE_STR_LEN buffer size is too small to hold all recovery methods, leading to silently truncated uevent strings that userspace cannot parse.
--
commit e0ad0ce6dec9b17be989295e8a1391f07c6154a5
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
Introduce DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)) recovery method to handle
scenarios requiring device power cycle. This method addresses cases where other
recovery mechanisms are insufficient to restore device functionality.
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 675675480da49..19c25366635e9 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -535,6 +535,8 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
> return "bus-reset";
> case DRM_WEDGE_RECOVERY_VENDOR:
> return "vendor-specific";
> + case DRM_WEDGE_RECOVERY_COLD_RESET:
> + return "cold-reset";
[Severity: High]
This is a pre-existing issue, but does adding the cold-reset method make the
WEDGE_STR_LEN buffer too small to hold all recovery methods?
Could this lead to silently truncated uevent strings that userspace cannot
parse?
In drm_dev_wedged_event(), the event_string buffer is hardcoded to 32 bytes:
drivers/gpu/drm/drm_drv.c:drm_dev_wedged_event() {
char event_string[WEDGE_STR_LEN];
...
len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
for_each_set_bit(opt, &method, BITS_PER_TYPE(method)) {
...
len += scnprintf(event_string + len, sizeof(event_string) - len, "%s,", recovery);
}
if (recovery)
/* Get rid of trailing comma */
event_string[len - 1] = '\0';
...
}
When formatting multiple methods (e.g. "WEDGED=rebind,bus-reset,cold-reset",
which needs 34 characters plus the terminating NUL), the string exceeds the
32-byte buffer. Once the buffer fills, scnprintf() stores only what fits and
returns that shorter count, which is 0 for any method formatted afterwards.
The trailing-comma fixup at event_string[len - 1] then overwrites the last
valid character of the already truncated string, resulting in a corrupted
uevent that breaks userspace parsing.
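The truncation can be reproduced outside the kernel. The following is a minimal user-space sketch, assuming `scnprintf_model()` is an adequate approximation of the kernel's scnprintf() (it returns the number of characters actually stored); `scnprintf_model()` and `build_event()` are hypothetical names, and the loop only mirrors the shape of the quoted code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define WEDGE_STR_LEN 32

/*
 * User-space approximation of scnprintf(): returns the number of
 * characters actually stored, excluding the trailing NUL.
 */
static int scnprintf_model(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	return (size_t)n < size ? n : (int)(size - 1);
}

/* Build "WEDGED=a,b,c" with the same pattern as the quoted loop. */
static void build_event(char *out, size_t size,
			const char *const *methods, size_t count)
{
	int len;
	size_t i;

	assert(out != NULL && size > 0);

	len = scnprintf_model(out, size, "%s", "WEDGED=");
	for (i = 0; i < count; i++)
		len += scnprintf_model(out + len, size - len, "%s,", methods[i]);

	if (count)
		out[len - 1] = '\0'; /* meant to drop the trailing comma */
}
```

Calling `build_event()` with "rebind", "bus-reset" and "cold-reset" on a `WEDGE_STR_LEN` buffer leaves "WEDGED=rebind,bus-reset,cold-r", while a 64-byte buffer yields the full string.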
> default:
> return NULL;
> }
[ ... ]
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 0000000000000..b08601f470d63
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,135 @@
[ ... ]
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> + const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> + struct xe_device *xe = pdev_to_xe_device(pdev);
[ ... ]
> + xe_device_clear_in_reset(xe);
> + pdev->driver->remove(pdev);
> + devres_release_group(&pdev->dev, xe->devres_group_id);
> +
> + if (pdev->driver->probe(pdev, ent))
[Severity: High]
This isn't a bug introduced by this patch, but does directly invoking the driver
remove() and probe() callbacks bypass the Linux device model?
The driver core still considers the device 'bound' throughout. If the manual
probe() fails, xe_pci_error_slot_reset() returns PCI_ERS_RESULT_DISCONNECT,
causing the PCI core to formally unbind the device later.
That unbind invokes xe_pci_remove() on a partially initialized or already
manually-removed state, which could lead to resource double-frees and NULL
pointer dereferences.
> + return PCI_ERS_RESULT_DISCONNECT;
> +
> + xe = pdev_to_xe_device(pdev);
> +
> + /* Wedge the device to prevent I/O operations till the resume callback */
> + xe_device_wedged_get(xe);
> +
> + return PCI_ERS_RESULT_RECOVERED;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622131458.144873-8-mallesh.koujalagi@intel.com?part=2
* Re: [PATCH v9 3/6] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
2026-06-22 13:15 ` [PATCH v9 3/6] drm/doc: Document " Mallesh Koujalagi
@ 2026-06-23 21:11 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-23 21:11 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The example recovery script fails to find the power control slot for directly connected PCI devices due to starting its sysfs traversal one level too high.
- [Low] The new table row for `cold-reset` exceeds the column width defined by the reStructuredText table underlines, which will cause a 'Malformed table' warning and break documentation builds.
--
commit affb0b839cc810c897bbc60a895efd262a86bba0
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
This patch documents the WEDGED=cold-reset recovery method, which is
used when a device requires a complete power cycle to recover from
an error condition. It also adds an example userspace recovery script.
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index 93df92c4ac8cf..43cdac65d04f0 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
[ ... ]
> @@ -441,6 +441,7 @@ following expectations.
> rebind unbind + bind driver
> bus-reset unbind + bus reset/re-enumeration + bind
> vendor-specific vendor specific recovery method
> + cold-reset remove device + slot power cycle + rescan
> unknown consumer policy
> =============== ========================================
[Severity: Low]
Will this exceed the column width defined by the reStructuredText table
underlines?
The Sphinx docutils parser treats table underlines as strict column width
boundaries. The second column's underline is 40 characters long, but the
new text 'remove device + slot power cycle + rescan' is 41 characters.
This usually causes a malformed table warning which can break kernel
documentation builds that treat warnings as errors.
[ ... ]
> @@ -530,6 +542,77 @@ Recovery script::
[ ... ]
> + PCI_DEVS=/sys/bus/pci/devices
> + PCI_SLOTS=/sys/bus/pci/slots
> +
> + syspath=$(readlink -f "/sys/$1/device" 2>/dev/null || readlink -f "/sys/$1" 2>/dev/null)
> + [ -n "$syspath" ] || die "cannot resolve sysfs path for: $1"
> +
> + dev=$(basename "$syspath")
> + [ -e "$PCI_DEVS/$dev" ] || die "not a PCI device: $dev"
> + echo "device : $dev"
> +
> + slot=""
> + walk=$(dirname "$(readlink -f "$PCI_DEVS/$dev")")
> +
> + while true; do
> + ancestor=$(basename "$walk")
[Severity: Medium]
Does using dirname here skip the device's own sysfs path, causing the script
to compare the parent bridge's address to the slot's child address?
When the script executes, it initializes walk using dirname, which skips the
target device. It compares the parent bridge's address against the slot
address attribute.
However, hotplug slot address attributes contain the address of the child
(the device plugged into the slot), not the parent bridge. This could cause
the script to fail to find the power control slot for directly connected PCI
devices and abort with a 'no hotplug slot' error.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622131458.144873-8-mallesh.koujalagi@intel.com?part=3
* Re: [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device
2026-06-22 13:15 ` [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
@ 2026-06-23 22:47 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-23 22:47 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
New issues:
- [High] VF driver modifies physical Upstream Switch Port (USP) AER configuration, violating SR-IOV isolation.
- [High] Unconditional unmasking of Surprise Link Down overrides platform/firmware AER configurations.
- [Medium] Logic asymmetry: Surprise Link Down is masked unconditionally but unmasked conditionally.
- [Medium] Unbalanced lifecycle operation: AER mask is modified before an asynchronous userspace recovery request.
- [Medium] Race condition / TOCTOU when modifying shared USP AER registers.
Pre-existing issues:
- [High] Hardware topology assumption in find_usp_dev() may modify system bridges instead of GPU internal switches.
--
commit ddd656b7786f19f5a390609ee31b584f88c5c3be
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Subject: drm/xe: Suppress Surprise Link Down on device
PUNIT errors can only be recovered using a power-cycle. Xe KMD sends a
uevent to notify userspace to trigger a power cycle. On some platforms, the
link drop caused by powering the device off and back on is reported by
hardware as a Surprise Link Down (SLD), which AER then escalates as an
Uncorrectable Fatal Error. That error fires before the device finishes coming
back up and defeats the very recovery we are attempting.
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 1eb2bbaccd9bd..5a2fee5a13083 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -256,8 +256,37 @@ static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
> return pci_upstream_bridge(vsp);
> }
[Severity: High]
This is a pre-existing issue, but does find_usp_dev() make an unchecked
assumption about the hardware topology? If the GPU is not connected via an
internal PCIe switch, navigating two levels up could return a generic
motherboard root port or external switch port. Modifying the AER registers
of this shared system bridge might affect error handling for other connected
devices.
>
> +static void pcie_suppress_surprise_link_down(struct pci_dev *usp)
> +{
> + u16 aer_cap;
> +
> + /*
> + * Cold reset power-cycles the slot, dropping the PCIe link.
> + * This triggers a spurious Surprise Link Down AER event on the
> + * Upstream Switch Port (USP). Mask this error to avoid false
> + * error reporting during recovery.
> + */
> + aer_cap = pci_find_ext_capability(usp, PCI_EXT_CAP_ID_ERR);
> + if (!aer_cap) {
> + dev_dbg(&usp->dev,
> + "AER capability not present\n");
> + return;
> + }
> +
> + pci_clear_and_set_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, 0, PCI_ERR_UNC_SURPDN);
[Severity: Medium]
Could this lead to a race condition or TOCTOU when modifying shared USP AER
registers? pci_clear_and_set_config_dword() performs a read, modifies the
value, and writes it back without holding a synchronization lock. If multiple
endpoints on the switch experience errors concurrently, their asynchronous
error handlers might race here.
> + dev_dbg(&usp->dev, "Surprise Link Down masked for cold reset\n");
> +}
> +
> static void punit_error_handler(struct xe_device *xe)
> {
> + struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> + struct pci_dev *usp;
> +
> + usp = find_usp_dev(pdev);
> +
> + if (usp)
> + pcie_suppress_surprise_link_down(usp);
[Severity: High]
Does modifying a shared physical resource (the USP) from a Virtual Function
(VF) break SR-IOV isolation? The VF driver attempts to mask the Surprise Link
Down error on the physical USP without an IS_SRIOV_VF() check. Following
subsystem guidelines, VF drivers must not access PF-only resources and must
guard such paths.
[Severity: Medium]
Also, could this cause a logic asymmetry where the Surprise Link Down is
masked unconditionally, but unmasked conditionally? If the kernel is
compiled without CONFIG_PCIEAER, the driver will mask SLD during an error
but never unmask it upon recovery, potentially disabling SLD reporting
permanently.
> +
> xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
> xe_device_declare_wedged(xe);
[Severity: Medium]
If userspace ignores the uevent, crashes, or the driver unbinds before the
reset completes, will the AER configuration remain permanently modified
without cleanup? This modifies the AER mask before an asynchronous userspace
recovery request, leaving no teardown mechanism to revert the mask on
failure.
> }
[ ... ]
> @@ -604,20 +644,22 @@ static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
> * The Uncorrectable Error Severity Register has the 'Uncorrectable
> * Internal Error Severity' set to fatal by default. Set this to
> * non-fatal and unmask the error.
> + *
> + * Also restore the Surprise Link Down mask that was set in
> + * pcie_suppress_surprise_link_down() before the cold reset. The USP is
> + * never power-cycled, so the mask bit persists and must be cleared here
> + * to ensure link-down events are reported normally going forward.
> */
>
> /* Initialize Uncorrectable Error Severity Register */
> - pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
> - aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
> - pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
> + pci_clear_and_set_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, PCI_ERR_UNC_INTN, 0);
>
> - /* Initialize Uncorrectable Error Mask Register */
> - pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> - aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
> - pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
> + /* Unmask Uncorrectable Internal Error and restore Surprise Link Down to unmasked */
> + pci_clear_and_set_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK,
> + PCI_ERR_UNC_INTN | PCI_ERR_UNC_SURPDN, 0);
[Severity: High]
Does this unconditional unmasking of Surprise Link Down override the platform
or firmware AER configurations? Forcibly unmasking PCI_ERR_UNC_SURPDN without
restoring its original state can trigger spurious fatal AER events if the
platform firmware, BIOS, or hotplug driver intentionally masked the bit by
default to handle hardware errata.
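A possible shape for a fix is to remember whether the suppress step actually changed the bit and to clear it on restore only in that case. The sketch below models this in user space under stated assumptions: `UNC_SURPDN` is an illustrative bit value, `struct sld_state`, `sld_suppress()` and `sld_restore()` are hypothetical, and a plain variable stands in for the mask register. Where such state would live across the power cycle is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit standing in for the Surprise Link Down mask bit. */
#define UNC_SURPDN 0x00000020u

/* Hypothetical bookkeeping kept alongside the register. */
struct sld_state {
	bool masked_by_us; /* true only if sld_suppress() changed the bit */
};

/* Mask SLD, recording whether the bit was clear beforehand. */
static void sld_suppress(uint32_t *mask_reg, struct sld_state *st)
{
	assert(mask_reg != NULL && st != NULL);

	/* Keep the first observation if suppress is called twice. */
	if (!st->masked_by_us)
		st->masked_by_us = !(*mask_reg & UNC_SURPDN);

	*mask_reg |= UNC_SURPDN;
}

/* Unmask SLD only if this code masked it in the first place. */
static void sld_restore(uint32_t *mask_reg, struct sld_state *st)
{
	assert(mask_reg != NULL && st != NULL);

	if (st->masked_by_us)
		*mask_reg &= ~UNC_SURPDN;

	st->masked_by_us = false;
}
```

If firmware had already masked the bit, a suppress/restore pair leaves it masked; if the bit was clear, the pair returns it to clear.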
>
> pci_save_state(usp);
> - dev_dbg(&usp->dev, "Uncorrectable Internal Errors downgraded and unmasked\n");
> + dev_dbg(&usp->dev, "AER: INTN downgraded to non-fatal, INTN and SLD unmasked\n");
> }
>
> /**
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622131458.144873-8-mallesh.koujalagi@intel.com?part=5
* Re: [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler
2026-06-22 13:15 ` [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler Mallesh Koujalagi
@ 2026-06-23 22:47 ` sashiko-bot
0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-06-23 22:47 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Global struct fault_attr causes dentry reference leak and cross-device state contamination
- [Medium] Potential NULL pointer dereference during early probe if wedged event is sent before DRM device registration
--
commit f49d10330bc9238666f6e80cf2c7887368a6612b
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
drm/xe/ras: Use fault-inject to trigger punit error handler
Use fault-inject framework to trigger punit_error_handler()
for testing.
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index 22b471303984a..3a90deb5596ce 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -40,6 +40,7 @@
>
> DECLARE_FAULT_ATTR(gt_reset_failure);
> DECLARE_FAULT_ATTR(inject_csc_hw_error);
> +DECLARE_FAULT_ATTR(inject_punit_error);
>
> static void read_residency_counter(struct xe_device *xe, struct xe_mmio *mmio,
> u32 offset, const char *name, struct drm_printer *p)
[ ... ]
> @@ -612,6 +613,8 @@ void xe_debugfs_register(struct xe_device *xe)
>
> fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
>
> + fault_create_debugfs_attr("inject_punit_error", root, &inject_punit_error);
[Severity: Medium]
Since inject_punit_error is declared as a global attribute, will calling
fault_create_debugfs_attr() for each device overwrite attr->dname?
Looking at fault_create_debugfs_attr(), it unconditionally assigns
attr->dname = dget(dir). If the system has multiple XE GPUs, each
subsequent device's debugfs registration would overwrite attr->dname without
calling dput() on the previous reference, leaking the dentry.
Would this also expose a single global fault injection state across all
devices, meaning a write to one device's debugfs file unintentionally alters
the behavior for all other XE devices?
> +
> if (IS_SRIOV_PF(xe))
> xe_sriov_pf_debugfs_register(xe, root);
> else if (IS_SRIOV_VF(xe))
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 5a2fee5a13083..bb8502472f738 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
[ ... ]
> @@ -708,6 +717,11 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
> size_t rlen;
> int ret;
>
> + if (fault_inject_punit_error()) {
> + punit_error_handler(xe);
> + return XE_RAS_RECOVERY_ACTION_DISCONNECT;
> + }
[Severity: Medium]
Could this trigger a NULL pointer dereference during early probe?
xe_ras_init() is called during driver probe and executes
xe_ras_process_errors(). If the injected fault triggers here, it executes
punit_error_handler() which in turn calls xe_device_declare_wedged().
If xe_device_declare_wedged() invokes drm_dev_wedged_event() to notify
userspace before the DRM device interface is fully registered (before
drm_dev_register() initializes the minor's kdev), it might evaluate
&dev->primary->kdev->kobj and dereference NULL.
Because the inject_punit_error fault attribute is global, if a user sets
the probability to 100% on the first probed device, could it inadvertently
trigger during the probe of a subsequently attached device and crash the
system?
> +
> if (!xe->info.has_sysctrl)
> return XE_RAS_RECOVERY_ACTION_RESET;
>
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260622131458.144873-8-mallesh.koujalagi@intel.com?part=6
Thread overview: 16+ messages
2026-06-22 13:14 [PATCH v9 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-06-23 20:54 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
2026-06-23 21:04 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 3/6] drm/doc: Document " Mallesh Koujalagi
2026-06-23 21:11 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
2026-06-22 13:15 ` [PATCH v9 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
2026-06-23 22:47 ` sashiko-bot
2026-06-22 13:15 ` [PATCH v9 6/6] drm/xe/ras: Use fault-inject to trigger punit error handler Mallesh Koujalagi
2026-06-23 22:47 ` sashiko-bot
2026-06-22 17:20 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev9) Patchwork
2026-06-22 17:22 ` ✓ CI.KUnit: success " Patchwork
2026-06-22 18:09 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-22 22:03 ` ✗ Xe.CI.FULL: failure " Patchwork