* [PATCH v8 0/6] Introduce cold reset recovery method
From: Mallesh Koujalagi @ 2026-06-12 8:07 UTC
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
This series builds on top of Introduce Xe Uncorrectable Error Handling[1]
and adds support for handling errors that require a complete
device power cycle (cold reset) to recover.
Certain error conditions leave the device in a persistent hardware
error state that cannot be cleared through existing recovery mechanisms
such as driver reload or PCIe reset. In these cases, functionality can
only be restored by performing a cold reset.
To support this, the series introduces a new DRM wedging recovery
method, DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)). When a device is wedged
with this method, the DRM core notifies userspace via a uevent that a cold
reset is required. This allows userspace to take appropriate action to
power-cycle the device.
Example uevent received:
SUBSYSTEM=drm
WEDGED=cold-reset
DEVPATH=/devices/.../drm/card0
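
A userspace consumer could recognise this uevent with a check along the
lines of the sketch below. It is illustrative only and not part of the
series: wedged_has_method() is a hypothetical helper name, and the sketch
assumes the WEDGED property carries a comma-separated list of recovery
method names, of which "cold-reset" is the one shown in the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the series): return true if a single uevent
 * property line of the form "WEDGED=<m1>[,<m2>...]" lists @method.
 * Assumes method names are separated by commas.
 */
static bool wedged_has_method(const char *prop, const char *method)
{
	static const char key[] = "WEDGED=";
	size_t mlen = strlen(method);
	const char *p;

	/* Ignore properties other than WEDGED, e.g. SUBSYSTEM or DEVPATH */
	if (strncmp(prop, key, sizeof(key) - 1) != 0)
		return false;

	p = prop + sizeof(key) - 1;
	while (*p) {
		/* Length of the current token, up to the next comma or the end */
		size_t len = strcspn(p, ",");

		if (len == mlen && strncmp(p, method, mlen) == 0)
			return true;

		p += len;
		if (*p == ',')
			p++;
	}

	return false;
}
```

A monitor would call wedged_has_method(line, "cold-reset") for each property
line of the event and, on a match, hand the DEVPATH to whatever mechanism
power-cycles the slot.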
Detailed description in commit message.
[1] https://patchwork.freedesktop.org/series/160482/
This patch series introduces a call to punit_error_handler() from
within handle_soc_internal_errors() when PUNIT errors are detected.
v2:
- Add use case: Handling errors from power management unit,
which requires a complete power cycle to
recover. (Christian)
- Add several instead of number to avoid update. (Jani)
v3:
- Update any scenario that requires cold-reset. (Riana)
- Update document with generic scenario. (Riana)
- Consistent with terminology. (Raag)
- Remove already covered information.
- Use PUNIT instead of PMU. (Riana)
- Use consistent wording.
- Remove log. (Raag)
v4:
- Rename cold reset to power cycle. (Raag)
- Update doc. (Raag/Riana)
- Change commit message. (Raag)
- Make function static. (Raag)
v5:
- Make it consistent with consumer expectations. (Raag)
- Update commit message.
- Remove unbind.
- Simplify cold-reset script.
- Remove kdoc for static function.
- Remove xe_ prefix for static function.
v6:
- Drop "last resort" wording. (Riana)
- Look up the hotplug slot in DEVPATH instead of scanning
every PCI slot on the system. (Raag)
- Drop arbitrary sleep values from the example script.
- Expand commit message to explain why SUR_DN is masked. (Raag/Riana)
- Check Slot Implemented bit before reading Slot Capabilities, per
PCIe spec. (Riana)
- Add debug log.
v7:
- Update recovery script. (Raag)
- Handle surprise link down event properly. (Aravind/Riana)
- Update commit message. (Riana)
- Correct log message.
v8:
- Add rescan instead of reset. (Raag)
- Use find_usp_dev() in punit_error_handler() function.
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Mallesh Koujalagi (5):
drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
drm/xe: Handle PUNIT errors by requesting cold-reset recovery
drm/xe: Suppress Surprise Link Down on device
drm/xe/ras: Add debugfs entry to inject punit error
Riana Tauro (1):
Introduce Xe Uncorrectable Error Handling
Documentation/gpu/drm-uapi.rst | 85 ++-
drivers/gpu/drm/drm_drv.c | 2 +
drivers/gpu/drm/xe/Makefile | 1 +
drivers/gpu/drm/xe/xe_debugfs.c | 3 +
drivers/gpu/drm/xe/xe_device.c | 24 +-
drivers/gpu/drm/xe/xe_device.h | 27 +-
drivers/gpu/drm/xe/xe_device_types.h | 12 +-
drivers/gpu/drm/xe/xe_gt.c | 14 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 9 +-
drivers/gpu/drm/xe/xe_pci.c | 9 +
drivers/gpu/drm/xe/xe_pci_error.c | 135 +++++
drivers/gpu/drm/xe/xe_pci_error.h | 13 +
drivers/gpu/drm/xe/xe_ras.c | 570 ++++++++++++++++++
drivers/gpu/drm/xe/xe_ras.h | 11 +
drivers/gpu/drm/xe/xe_ras_types.h | 227 +++++++
drivers/gpu/drm/xe/xe_survivability_mode.c | 13 +-
drivers/gpu/drm/xe/xe_sysctrl_event.c | 2 +-
drivers/gpu/drm/xe/xe_sysctrl_event_types.h | 3 -
drivers/gpu/drm/xe/xe_sysctrl_mailbox.c | 28 +
drivers/gpu/drm/xe/xe_sysctrl_mailbox.h | 4 +-
drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 11 +
include/drm/drm_device.h | 1 +
22 files changed, 1175 insertions(+), 29 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.h
--
2.34.1
* [PATCH v8 1/6] Introduce Xe Uncorrectable Error Handling
From: Mallesh Koujalagi @ 2026-06-12 8:07 UTC
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
From: Riana Tauro <riana.tauro@intel.com>
DO NOT REVIEW. COMPILATION ONLY
This patch is from https://patchwork.freedesktop.org/series/160482/
Added only for Compilation.
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
drivers/gpu/drm/xe/Makefile | 1 +
drivers/gpu/drm/xe/xe_device.c | 24 +-
drivers/gpu/drm/xe/xe_device.h | 27 +-
drivers/gpu/drm/xe/xe_device_types.h | 12 +-
drivers/gpu/drm/xe/xe_gt.c | 14 +-
drivers/gpu/drm/xe/xe_guc_submit.c | 9 +-
drivers/gpu/drm/xe/xe_pci.c | 9 +
drivers/gpu/drm/xe/xe_pci_error.c | 135 +++++
drivers/gpu/drm/xe/xe_pci_error.h | 13 +
drivers/gpu/drm/xe/xe_ras.c | 494 ++++++++++++++++++
drivers/gpu/drm/xe/xe_ras.h | 4 +
drivers/gpu/drm/xe/xe_ras_types.h | 227 ++++++++
drivers/gpu/drm/xe/xe_survivability_mode.c | 13 +-
drivers/gpu/drm/xe/xe_sysctrl_event.c | 2 +-
drivers/gpu/drm/xe/xe_sysctrl_event_types.h | 3 -
drivers/gpu/drm/xe/xe_sysctrl_mailbox.c | 28 +
drivers/gpu/drm/xe/xe_sysctrl_mailbox.h | 4 +-
drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 11 +
18 files changed, 1002 insertions(+), 28 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
create mode 100644 drivers/gpu/drm/xe/xe_pci_error.h
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 8e7b146880f4..3c001b2a4aec 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -101,6 +101,7 @@ xe-y += xe_bb.o \
xe_page_reclaim.o \
xe_pat.o \
xe_pci.o \
+ xe_pci_error.o \
xe_pci_rebar.o \
xe_pcode.o \
xe_pm.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 51e3a2dd7b22..7ee2148f1321 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -61,6 +61,7 @@
#include "xe_psmi.h"
#include "xe_pxp.h"
#include "xe_query.h"
+#include "xe_ras.h"
#include "xe_shrinker.h"
#include "xe_soc_remapper.h"
#include "xe_survivability_mode.h"
@@ -915,7 +916,7 @@ static void xe_device_wedged_fini(struct drm_device *drm, void *arg)
{
struct xe_device *xe = arg;
- if (atomic_read(&xe->wedged.flag))
+ if (atomic_read(&xe->wedged.fini))
xe_pm_runtime_put(xe);
}
@@ -988,6 +989,16 @@ int xe_device_probe(struct xe_device *xe)
if (err)
return err;
+ err = xe_soc_remapper_init(xe);
+ if (err)
+ return err;
+
+ err = xe_sysctrl_init(xe);
+ if (err)
+ return err;
+
+ xe_ras_init(xe);
+
/*
* Now that GT is initialized (TTM in particular),
* we can try to init display, and inherit the initial fb.
@@ -1028,10 +1039,6 @@ int xe_device_probe(struct xe_device *xe)
xe_nvm_init(xe);
- err = xe_soc_remapper_init(xe);
- if (err)
- return err;
-
err = xe_heci_gsc_init(xe);
if (err)
return err;
@@ -1070,10 +1077,6 @@ int xe_device_probe(struct xe_device *xe)
if (err)
goto err_unregister_display;
- err = xe_sysctrl_init(xe);
- if (err)
- goto err_unregister_display;
-
err = xe_device_sysfs_init(xe);
if (err)
goto err_unregister_display;
@@ -1411,7 +1414,8 @@ void xe_device_declare_wedged(struct xe_device *xe)
return;
}
- if (!atomic_xchg(&xe->wedged.flag, 1)) {
+ if (!atomic_xchg(&xe->wedged.fini, 1)) {
+ xe_device_wedged_get(xe);
xe->needs_flr_on_fini = true;
xe_pm_runtime_get_noresume(xe);
drm_err(&xe->drm,
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 975768a6a9c8..e177c05a7a95 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -181,6 +181,21 @@ static inline bool xe_device_has_mert(const struct xe_device *xe)
return xe->info.has_mert;
}
+static inline bool xe_device_is_in_reset(struct xe_device *xe)
+{
+ return atomic_read(&xe->in_reset);
+}
+
+static inline void xe_device_set_in_reset(struct xe_device *xe)
+{
+ atomic_set(&xe->in_reset, 1);
+}
+
+static inline void xe_device_clear_in_reset(struct xe_device *xe)
+{
+ atomic_set(&xe->in_reset, 0);
+}
+
u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
@@ -192,9 +207,19 @@ bool xe_device_is_l2_flush_optimized(struct xe_device *xe);
void xe_device_td_flush(struct xe_device *xe);
void xe_device_l2_flush(struct xe_device *xe);
+static inline void xe_device_wedged_get(struct xe_device *xe)
+{
+ atomic_inc(&xe->wedged.ref);
+}
+
+static inline void xe_device_wedged_put(struct xe_device *xe)
+{
+ atomic_dec(&xe->wedged.ref);
+}
+
static inline bool xe_device_wedged(struct xe_device *xe)
{
- return atomic_read(&xe->wedged.flag);
+ return atomic_read(&xe->wedged.ref);
}
void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 32dd2ffbc796..bf43a3277d1e 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -483,10 +483,15 @@ struct xe_device {
/** @needs_flr_on_fini: requests function-reset on fini */
bool needs_flr_on_fini;
+ /** @in_reset: Indicates if device is in reset */
+ atomic_t in_reset;
+
/** @wedged: Struct to control Wedged States and mode */
struct {
- /** @wedged.flag: Xe device faced a critical error and is now blocked. */
- atomic_t flag;
+ /** @wedged.fini: Needs cleanup on fini */
+ atomic_t fini;
+ /** @wedged.ref: Refcount for wedged device, blocks critical path execution */
+ atomic_t ref;
/** @wedged.mode: Mode controlled by kernel parameter and debugfs */
enum xe_wedged_mode mode;
/** @wedged.method: Recovery method to be sent in the drm device wedged uevent */
@@ -495,6 +500,9 @@ struct xe_device {
bool inconsistent_reset;
} wedged;
+ /** @devres_group_id: id for devres group */
+ void *devres_group_id;
+
/** @bo_device: Struct to control async free of BOs */
struct xe_bo_dev {
/** @bo_device.async_free: Free worker */
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 783eb6d631b5..d904527a8898 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -917,6 +917,9 @@ static void gt_reset_worker(struct work_struct *w)
if (xe_device_wedged(gt_to_xe(gt)))
goto err_pm_put;
+ if (xe_device_is_in_reset(gt_to_xe(gt)))
+ goto err_pm_put;
+
/* We only support GT resets with GuC submission */
if (!xe_device_uc_enabled(gt_to_xe(gt)))
goto err_pm_put;
@@ -977,18 +980,21 @@ static void gt_reset_worker(struct work_struct *w)
void xe_gt_reset_async(struct xe_gt *gt)
{
- xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
+ struct xe_device *xe = gt_to_xe(gt);
+
+ if (xe_device_is_in_reset(xe))
+ return;
/* Don't do a reset while one is already in flight */
if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))
return;
- xe_gt_info(gt, "reset queued\n");
+ xe_gt_info(gt, "reset queued from %ps\n", __builtin_return_address(0));
/* Pair with put in gt_reset_worker() if work is enqueued */
- xe_pm_runtime_get_noresume(gt_to_xe(gt));
+ xe_pm_runtime_get_noresume(xe);
if (!queue_work(gt->ordered_wq, &gt->reset.worker))
- xe_pm_runtime_put(gt_to_xe(gt));
+ xe_pm_runtime_put(xe);
}
void xe_gt_suspend_prepare(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index e82018445b7c..c64d6b69eb92 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1532,7 +1532,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
* If devcoredump not captured and GuC capture for the job is not ready
* do manual capture first and decide later if we need to use it
*/
- if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
+ if (!xe_device_is_in_reset(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
!xe_guc_capture_get_matching_and_lock(q)) {
/* take force wake before engine register manual capture */
CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
@@ -1554,8 +1554,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
set_exec_queue_banned(q);
/* Kick job / queue off hardware */
- if (!wedged && (exec_queue_enabled(primary) ||
- exec_queue_pending_disable(primary))) {
+ if (!xe_device_is_in_reset(xe) && !wedged &&
+ (exec_queue_enabled(primary) || exec_queue_pending_disable(primary))) {
int ret;
if (exec_queue_reset(primary))
@@ -1623,7 +1623,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
trace_xe_sched_job_timedout(job);
- if (!exec_queue_killed(q))
+ /* Do not access device if in reset */
+ if (!xe_device_is_in_reset(xe) && !exec_queue_killed(q))
xe_devcoredump(q, job,
"Timedout job - seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 78fc2e4dcfc6..ab74a5852dbd 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -26,6 +26,7 @@
#include "xe_guc.h"
#include "xe_mmio.h"
#include "xe_module.h"
+#include "xe_pci_error.h"
#include "xe_pci_rebar.h"
#include "xe_pci_sriov.h"
#include "xe_pci_types.h"
@@ -1076,6 +1077,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
const struct xe_device_desc *desc = (const void *)ent->driver_data;
const struct xe_subplatform_desc *subplatform_desc;
struct xe_device *xe;
+ void *devres_id;
int err;
subplatform_desc = find_subplatform(desc, pdev->device);
@@ -1103,6 +1105,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (xe_display_driver_probe_defer(pdev))
return -EPROBE_DEFER;
+ devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
+ if (!devres_id)
+ return -ENOMEM;
+
err = pcim_enable_device(pdev);
if (err)
return err;
@@ -1111,6 +1117,8 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (IS_ERR(xe))
return PTR_ERR(xe);
+ xe->devres_group_id = devres_id;
+
pci_set_drvdata(pdev, &xe->drm);
xe_pm_assert_unbounded_bridge(xe);
@@ -1349,6 +1357,7 @@ static struct pci_driver xe_pci_driver = {
.remove = xe_pci_remove,
.shutdown = xe_pci_shutdown,
.sriov_configure = xe_pci_sriov_configure,
+ .err_handler = &xe_pci_error_handlers,
#ifdef CONFIG_PM_SLEEP
.driver.pm = &xe_pm_ops,
#endif
diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
new file mode 100644
index 000000000000..b08601f470d6
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <linux/pci.h>
+
+#include "xe_device.h"
+#include "xe_gt.h"
+#include "xe_pci.h"
+#include "xe_printk.h"
+#include "xe_ras.h"
+#include "xe_survivability_mode.h"
+
+static void prepare_device_for_reset(struct pci_dev *pdev)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+ struct xe_gt *gt;
+ u8 id;
+
+ xe_device_set_in_reset(xe);
+
+ /* Wedge the device to prevent userspace access during reset */
+ xe_device_wedged_get(xe);
+
+ for_each_gt(gt, xe, id)
+ xe_gt_declare_wedged(gt);
+
+ pci_disable_device(pdev);
+}
+
+static pci_ers_result_t ras_action_to_pci_result(struct pci_dev *pdev, u32 action)
+{
+ switch (action) {
+ case XE_RAS_RECOVERY_ACTION_RECOVERED:
+ return PCI_ERS_RESULT_RECOVERED;
+ case XE_RAS_RECOVERY_ACTION_RESET:
+ prepare_device_for_reset(pdev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ case XE_RAS_RECOVERY_ACTION_DISCONNECT:
+ return PCI_ERS_RESULT_DISCONNECT;
+ default:
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+}
+
+static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+
+ xe_err(xe, "PCI error: detected state = %u\n", state);
+
+ if (state == pci_channel_io_perm_failure)
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ /* If the device is already wedged or in survivability mode, do not attempt recovery */
+ if (xe_survivability_mode_is_boot_enabled(xe) || xe_device_wedged(xe))
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ switch (state) {
+ case pci_channel_io_normal:
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ case pci_channel_io_frozen:
+ prepare_device_for_reset(pdev);
+ return PCI_ERS_RESULT_NEED_RESET;
+ default:
+ xe_err(xe, "PCI error: unknown state %d\n", state);
+ return PCI_ERS_RESULT_NEED_RESET;
+ }
+}
+
+static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+ enum xe_ras_recovery_action action;
+
+ xe_err(xe, "PCI error: MMIO enabled\n");
+
+ action = xe_ras_process_errors(xe);
+
+ return ras_action_to_pci_result(pdev, action);
+}
+
+static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
+{
+ const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+
+ xe_err(xe, "PCI error: slot reset\n");
+
+ pci_restore_state(pdev);
+
+ if (pci_enable_device(pdev)) {
+ xe_err(xe, "Cannot re-enable PCI device after reset\n");
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ /*
+ * Secondary Bus Reset causes all VRAM state to be lost along with
+ * hardware state. As an initial step, re-probe the device to
+ * re-initialize the driver and hardware.
+ * TODO: optimize by re-initializing only the hardware state and re-creating
+ * kernel BOs.
+ */
+ xe_device_clear_in_reset(xe);
+ pdev->driver->remove(pdev);
+ devres_release_group(&pdev->dev, xe->devres_group_id);
+
+ if (pdev->driver->probe(pdev, ent))
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ xe = pdev_to_xe_device(pdev);
+
+ /* Wedge the device to prevent I/O operations till the resume callback */
+ xe_device_wedged_get(xe);
+
+ return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void xe_pci_error_resume(struct pci_dev *pdev)
+{
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+
+ xe_err(xe, "PCI error: resume\n");
+
+ /* Resume I/O operations */
+ xe_device_wedged_put(xe);
+}
+
+const struct pci_error_handlers xe_pci_error_handlers = {
+ .error_detected = xe_pci_error_detected,
+ .mmio_enabled = xe_pci_error_mmio_enabled,
+ .slot_reset = xe_pci_error_slot_reset,
+ .resume = xe_pci_error_resume,
+};
diff --git a/drivers/gpu/drm/xe/xe_pci_error.h b/drivers/gpu/drm/xe/xe_pci_error.h
new file mode 100644
index 000000000000..725ad0214e62
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pci_error.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_PCI_ERROR_H_
+#define _XE_PCI_ERROR_H_
+
+struct pci_error_handlers;
+
+extern const struct pci_error_handlers xe_pci_error_handlers;
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 4cb16b419b0c..6a5140bc47e5 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -3,12 +3,19 @@
* Copyright © 2026 Intel Corporation
*/
+#include "xe_bo.h"
#include "xe_device.h"
#include "xe_printk.h"
#include "xe_ras.h"
#include "xe_ras_types.h"
+#include "xe_survivability_mode.h"
#include "xe_sysctrl.h"
#include "xe_sysctrl_event_types.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_mailbox_types.h"
+
+#define CORE_COMPUTE_UNCORR_TYPE GENMASK(26, 25)
+#define GLOBAL_UNCORR_ERROR 2
/* Severity of detected errors */
enum xe_ras_severity {
@@ -66,6 +73,309 @@ static inline const char *comp_to_str(u8 component)
return xe_ras_components[component];
}
+static int ras_status_to_errno(u32 status)
+{
+ switch (status) {
+ case XE_RAS_STATUS_SUCCESS:
+ return 0;
+ case XE_RAS_STATUS_INVALID_PARAM:
+ return -EINVAL;
+ case XE_RAS_STATUS_OP_NOT_SUPPORTED:
+ return -EOPNOTSUPP;
+ case XE_RAS_STATUS_TIMEOUT:
+ return -ETIMEDOUT;
+ case XE_RAS_STATUS_HARDWARE_FAILURE:
+ return -EIO;
+ case XE_RAS_STATUS_INSUFFICIENT_RESOURCES:
+ return -ENOSPC;
+ default:
+ return -EPROTO;
+ }
+}
+
+static int send_page_offline(struct xe_device *xe, enum xe_ras_page_action action, u64 page_address)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_page_offline_request request = {0};
+ struct xe_ras_page_offline_response response = {0};
+ size_t rlen;
+ int ret;
+
+ if (!xe->info.has_sysctrl)
+ return 0;
+
+ if (action >= XE_RAS_PAGE_ACTION_MAX) {
+ xe_err(xe, "[RAS]: Invalid page offline action %d\n", action);
+ return -EINVAL;
+ }
+
+ request.page_address = page_address;
+ request.action = action;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_PAGE_OFFLINE,
+ &request, sizeof(request), &response, sizeof(response));
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to send page offline command %d\n", ret);
+ return ret;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected page offline response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ return -EINVAL;
+ }
+
+ ret = ras_status_to_errno(response.status);
+ if (ret) {
+ xe_err(xe, "sysctrl: page offline command failed with status %d\n",
+ response.status);
+ }
+
+ return ret;
+}
+
+static int handle_page_offline(struct xe_device *xe, u64 page_address, bool send_offline_cmd)
+{
+ enum xe_ras_page_action action;
+ int ret = 0;
+
+ if (!IS_ALIGNED(page_address, XE_PAGE_SIZE)) {
+ xe_err(xe, "sysctrl: Unaligned page address: 0x%llx\n", page_address);
+ return -EINVAL;
+ }
+
+ /*
+ * TODO: Call function to handle address fault
+ * ret = xe_ttm_vram_handle_addr_fault(xe, page_address);
+ */
+
+ /*
+ * Handle return code from address fault handling function:
+ * 0: Address is valid and can be offlined
+ * -EIO: Address belongs to a critical BO that cannot be offlined
+ * -EOPNOTSUPP: Address is valid and can be offlined but user policy is not to offline
+ *
+ * For any other non-zero error code, skip offlining.
+ */
+
+ switch (ret) {
+ case 0:
+ action = XE_RAS_PAGE_ACTION_OFFLINE;
+ break;
+ /* User policy set to decline page offlining */
+ case -EOPNOTSUPP:
+ action = XE_RAS_PAGE_ACTION_DECLINE;
+ break;
+ case -EIO:
+ xe_err(xe, "[RAS]: Page address belongs to critical BO: 0x%llx\n",
+ page_address);
+ return ret;
+ default:
+ xe_err(xe, "[RAS]: Failed to handle address fault 0x%llx: %d\n",
+ page_address, ret);
+ return 0;
+ }
+
+ if (send_offline_cmd) {
+ ret = send_page_offline(xe, action, page_address);
+ if (ret)
+ xe_err(xe, "sysctrl: Failed to offline page for address 0x%llx: %d\n",
+ page_address, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static enum xe_ras_recovery_action handle_core_compute_errors(struct xe_ras_error_array *arr)
+{
+ struct xe_ras_compute_error *error_info = (void *)arr->details;
+ u8 uncorr_type;
+
+ uncorr_type = FIELD_GET(CORE_COMPUTE_UNCORR_TYPE, error_info->log_header);
+
+ /* Request a reset if error is global */
+ if (uncorr_type == GLOBAL_UNCORR_ERROR)
+ return XE_RAS_RECOVERY_ACTION_RESET;
+
+ /*
+ * No action needed for other errors.
+ * Local errors are recovered using an engine reset by GuC.
+ */
+ return XE_RAS_RECOVERY_ACTION_RECOVERED;
+}
+
+static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
+ struct xe_ras_error_array *arr)
+{
+ struct xe_ras_soc_error *info = (void *)arr->details;
+ struct xe_ras_soc_error_source *source = &info->source;
+ struct xe_ras_error_class *counter = &arr->counter;
+
+ if (source->csc) {
+ struct xe_ras_csc_error *csc_error = (void *)info->details;
+
+ /*
+ * CSC uncorrectable errors are classified as hardware errors and firmware errors.
+ * CSC firmware errors are critical errors that can be recovered only by firmware
+ * update via SPI driver. On a CSC firmware error, PCODE enables FDO mode and sets
+ * the bit in the capability register. On receiving this error, the driver enables
+ * runtime survivability mode which notifies userspace that a firmware update
+ * is required.
+ */
+ if (csc_error->hec_fw_error) {
+ xe_err(xe, "[RAS]: CSC %s detected: 0x%x\n",
+ sev_to_str(counter->common.severity),
+ csc_error->hec_fw_error);
+ xe_survivability_mode_runtime_enable(xe);
+ return XE_RAS_RECOVERY_ACTION_DISCONNECT;
+ }
+ } else if (source->ieh) {
+ struct xe_ras_ieh_error *ieh_error = (void *)info->details;
+
+ if (ieh_error->global_error_status & XE_RAS_SOC_IEH_PUNIT) {
+ xe_err(xe, "[RAS]: PUNIT %s detected: 0x%x\n",
+ sev_to_str(counter->common.severity),
+ ieh_error->global_error_status);
+ /* TODO: Add PUNIT error handling */
+ return XE_RAS_RECOVERY_ACTION_DISCONNECT;
+ }
+ }
+
+ /* For other SOC internal errors, request a reset as recovery mechanism */
+ return XE_RAS_RECOVERY_ACTION_RESET;
+}
+
+static enum xe_ras_recovery_action handle_device_memory_errors(struct xe_device *xe,
+ struct xe_ras_error_array *arr)
+{
+ struct xe_ras_memory_error *info = (void *)arr->details;
+ int ret;
+
+ /*
+ * For memory errors, the recovery action depends on the error category
+ *
+ * Double bit ECC: Page offline handling
+ * Poison and data parity errors: Log only
+ * For any other memory errors, request a reset as recovery mechanism
+ */
+ switch (info->category) {
+ case XE_RAS_MEMORY_ECC:
+ xe_err(xe, "[RAS]: Double-bit ECC error detected at sw address 0x%llx\n",
+ info->sw_address);
+ ret = handle_page_offline(xe, info->sw_address, true);
+ if (ret)
+ return XE_RAS_RECOVERY_ACTION_RESET;
+ break;
+ case XE_RAS_MEMORY_POISON:
+ xe_info(xe, "[RAS]: Poison error detected\n");
+ break;
+ case XE_RAS_MEMORY_DATA_PARITY:
+ xe_info(xe, "[RAS]: Data parity error detected\n");
+ break;
+ default:
+ return XE_RAS_RECOVERY_ACTION_RESET;
+ }
+
+ return XE_RAS_RECOVERY_ACTION_RECOVERED;
+}
+
+static void get_queued_pages(struct xe_device *xe)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_page_offline_queue response = {0};
+ u32 count = 0;
+ size_t rlen;
+ int ret, i;
+
+ /* Supported only on platforms with system controller */
+ if (!xe->info.has_sysctrl)
+ return;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP,
+ XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE, NULL, 0, &response,
+ sizeof(response));
+
+ do {
+ memset(&response, 0, sizeof(response));
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to get page offline queue %d\n", ret);
+ return;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected page offline queue response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ return;
+ }
+
+ for (i = 0; i < response.pages_returned && i < XE_RAS_NUM_PAGES; i++)
+ handle_page_offline(xe, response.page_addresses[i], true);
+
+ count += response.pages_returned;
+ if (!response.pages_returned)
+ break;
+
+ if (count > response.total_pages) {
+ xe_err(xe, "sysctrl: Pages returned from queue exceed total pages %u, returned %u\n",
+ response.total_pages, count);
+ return;
+ }
+ } while (response.additional_data);
+}
+
+static void get_offlined_list(struct xe_device *xe)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_offline_list_response response = {0};
+ struct xe_ras_offline_list_request request = {0};
+ u32 count = 0;
+ size_t rlen;
+ int ret, i;
+
+ /* Supported only on platforms with system controller */
+ if (!xe->info.has_sysctrl)
+ return;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_GET_OFFLINE_LIST,
+ &request, sizeof(request), &response, sizeof(response));
+
+ do {
+ memset(&response, 0, sizeof(response));
+
+ request.index = count;
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to get page offline list %d\n", ret);
+ return;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected page offline list response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ return;
+ }
+
+ for (i = 0; i < response.pages_returned && i < XE_RAS_NUM_PAGES; i++)
+ handle_page_offline(xe, response.page_addresses[i], false);
+
+ count += response.pages_returned;
+ if (!response.pages_returned)
+ break;
+
+ if (count > response.total_pages) {
+ xe_err(xe, "sysctrl: Pages returned from list exceed total pages %u, returned %u\n",
+ response.total_pages, count);
+ return;
+ }
+ } while (response.additional_data);
+}
+
void xe_ras_counter_threshold_crossed(struct xe_device *xe,
struct xe_sysctrl_event_response *response)
{
@@ -91,3 +401,187 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
comp_to_str(component), sev_to_str(severity));
}
}
+
+/**
+ * xe_ras_process_errors() - Process and contain hardware errors
+ * @xe: xe device instance
+ *
+ * Get error details from system controller and return recovery
+ * method. Called only from PCI error handling.
+ *
+ * Returns: recovery action to be taken
+ */
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
+{
+ struct xe_sysctrl_mailbox_command command = {0};
+ struct xe_ras_get_soc_error response;
+ enum xe_ras_recovery_action final_action;
+ u32 remaining = XE_SYSCTRL_FLOOD_LIMIT;
+ size_t rlen;
+ int ret;
+
+ if (!xe->info.has_sysctrl)
+ return XE_RAS_RECOVERY_ACTION_RESET;
+
+ /* Default action */
+ final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
+
+ xe_sysctrl_create_command(&command, XE_SYSCTRL_GROUP_GFSP, XE_SYSCTRL_CMD_GET_SOC_ERROR,
+ NULL, 0, &response, sizeof(response));
+
+ do {
+ memset(&response, 0, sizeof(response));
+
+ ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
+ if (ret) {
+ xe_err(xe, "sysctrl: failed to get soc error %d\n", ret);
+ goto err;
+ }
+
+ if (rlen != sizeof(response)) {
+ xe_err(xe, "sysctrl: unexpected get soc error response length %zu (expected %zu)\n",
+ rlen, sizeof(response));
+ goto err;
+ }
+
+ /* Report if number of errors exceeds the maximum errors supported */
+ if (response.num_errors > XE_RAS_NUM_ERROR_ARR)
+ xe_err(xe, "sysctrl: number of errors received %d out of bound (%d)\n",
+ response.num_errors, XE_RAS_NUM_ERROR_ARR);
+
+ for (int i = 0; i < response.num_errors && i < XE_RAS_NUM_ERROR_ARR; i++) {
+ struct xe_ras_error_array *arr = &response.arr[i];
+ enum xe_ras_recovery_action action;
+ u8 component, severity;
+
+ component = arr->counter.common.component;
+ severity = arr->counter.common.severity;
+
+ xe_err(xe, "[RAS]: %s %s detected\n", comp_to_str(component),
+ sev_to_str(severity));
+
+ switch (component) {
+ case XE_RAS_COMP_CORE_COMPUTE:
+ action = handle_core_compute_errors(arr);
+ break;
+ case XE_RAS_COMP_SOC_INTERNAL:
+ action = handle_soc_internal_errors(xe, arr);
+ break;
+ case XE_RAS_COMP_DEVICE_MEMORY:
+ action = handle_device_memory_errors(xe, arr);
+ break;
+ default:
+ /* For any other component, reset */
+ action = XE_RAS_RECOVERY_ACTION_RESET;
+ break;
+ }
+
+ /* Process and log all errors and then trigger highest recovery action */
+ if (action > final_action)
+ final_action = action;
+ }
+
+ /* Treat flooding as a system controller error */
+ if (!--remaining) {
+ xe_err(xe, "[RAS]: sysctrl: get soc error response flooding\n");
+ return XE_RAS_RECOVERY_ACTION_RESET;
+ }
+
+ } while (response.additional_errors);
+
+ return final_action;
+
+err:
+ return XE_RAS_RECOVERY_ACTION_RESET;
+}
+
+static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
+{
+ struct pci_dev *vsp;
+
+ /*
+ * Device Hierarchy:
+ *
+ * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
+ */
+ vsp = pci_upstream_bridge(pdev);
+ if (!vsp)
+ return NULL;
+
+ return pci_upstream_bridge(vsp);
+}
+
+static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
+{
+ struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+ u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
+ struct pci_dev *usp;
+ u16 aer_cap;
+
+ usp = find_usp_dev(pdev);
+ if (!usp)
+ return;
+
+ aer_cap = pci_find_ext_capability(usp, PCI_EXT_CAP_ID_ERR);
+ if (!aer_cap) {
+ dev_info(&usp->dev, "AER capability unavailable\n");
+ return;
+ }
+
+ /*
+ * Clear any stale Uncorrectable Internal Error Status event in Uncorrectable Error
+ * Status Register.
+ */
+ pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, &aer_uncorr_status);
+ if (aer_uncorr_status & PCI_ERR_UNC_INTN)
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);
+
+ /*
+ * All errors are steered to USP which is a PCIe AER Compliant device.
+ * Downgrade all the errors to non-fatal to prevent PCIe bus driver
+ * from triggering a Secondary Bus Reset (SBR). This allows error
+ * detection, containment and recovery in the driver.
+ *
+ * The Uncorrectable Error Severity Register has the 'Uncorrectable
+ * Internal Error Severity' set to fatal by default. Set this to
+ * non-fatal and unmask the error.
+ */
+
+ /* Initialize Uncorrectable Error Severity Register */
+ pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
+ aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
+
+ /* Initialize Uncorrectable Error Mask Register */
+ pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
+ aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
+
+ pci_save_state(usp);
+ dev_dbg(&usp->dev, "Uncorrectable Internal Errors downgraded and unmasked\n");
+}
+
+/**
+ * xe_ras_init - Initialize Xe RAS
+ * @xe: xe device instance
+ *
+ * Initialize Xe RAS
+ */
+void xe_ras_init(struct xe_device *xe)
+{
+ if (!xe->info.has_sysctrl)
+ return;
+
+ if (IS_ENABLED(CONFIG_PCIEAER))
+ aer_unmask_and_downgrade_internal_error(xe);
+
+ get_queued_pages(xe);
+ get_offlined_list(xe);
+
+ /*
+ * During probe, process and log any errors detected by firmware while the driver was not
+ * loaded. Critical errors such as Punit and CSC are reported through Pcode init failure,
+ * causing the driver to enter survivability mode.
+ */
+ xe_ras_process_errors(xe);
+}
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
index ea90593b62dc..8d106c708ff1 100644
--- a/drivers/gpu/drm/xe/xe_ras.h
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -6,10 +6,14 @@
#ifndef _XE_RAS_H_
#define _XE_RAS_H_
+#include "xe_ras_types.h"
+
struct xe_device;
struct xe_sysctrl_event_response;
void xe_ras_counter_threshold_crossed(struct xe_device *xe,
struct xe_sysctrl_event_response *response);
+void xe_ras_init(struct xe_device *xe);
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe);
#endif
diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
index 4e63c67f806a..3312c9b42fe7 100644
--- a/drivers/gpu/drm/xe/xe_ras_types.h
+++ b/drivers/gpu/drm/xe/xe_ras_types.h
@@ -8,7 +8,67 @@
#include <linux/types.h>
+#define XE_RAS_NUM_ERROR_ARR 3
#define XE_RAS_NUM_COUNTERS 16
+#define XE_RAS_SOC_IEH_PUNIT BIT(1)
+/* Device memory error categories */
+#define XE_RAS_MEMORY_ECC BIT(1)
+#define XE_RAS_MEMORY_POISON BIT(2)
+#define XE_RAS_MEMORY_DATA_PARITY BIT(5)
+
+#define XE_RAS_NUM_PAGES 25
+
+/**
+ * enum xe_ras_recovery_action - RAS recovery actions
+ *
+ * @XE_RAS_RECOVERY_ACTION_RECOVERED: Error recovered
+ * @XE_RAS_RECOVERY_ACTION_RESET: Requires reset
+ * @XE_RAS_RECOVERY_ACTION_DISCONNECT: Requires disconnect
+ * @XE_RAS_RECOVERY_ACTION_MAX: Max action value
+ *
+ * This enum defines the possible recovery actions that can be taken in response
+ * to RAS errors.
+ */
+enum xe_ras_recovery_action {
+ XE_RAS_RECOVERY_ACTION_RECOVERED = 0,
+ XE_RAS_RECOVERY_ACTION_RESET,
+ XE_RAS_RECOVERY_ACTION_DISCONNECT,
+ XE_RAS_RECOVERY_ACTION_MAX
+};
+
+/**
+ * enum xe_ras_page_action - Page offline actions for page offline request
+ *
+ * @XE_RAS_PAGE_ACTION_OFFLINE: Instruct firmware to remove page from queue
+ * @XE_RAS_PAGE_ACTION_DECLINE: Instruct firmware to mark page as not offline
+ * @XE_RAS_PAGE_ACTION_MAX: Max value for validation
+ */
+enum xe_ras_page_action {
+ XE_RAS_PAGE_ACTION_OFFLINE,
+ XE_RAS_PAGE_ACTION_DECLINE,
+ XE_RAS_PAGE_ACTION_MAX
+};
+
+/**
+ * enum xe_ras_response_status - RAS response status codes
+ *
+ * @XE_RAS_STATUS_SUCCESS: Operation successful
+ * @XE_RAS_STATUS_INVALID_PARAM: Invalid parameter
+ * @XE_RAS_STATUS_OP_NOT_SUPPORTED: Operation not supported
+ * @XE_RAS_STATUS_TIMEOUT: Operation timed out
+ * @XE_RAS_STATUS_HARDWARE_FAILURE: Hardware failure
+ * @XE_RAS_STATUS_INSUFFICIENT_RESOURCES: Insufficient resources
+ * @XE_RAS_STATUS_UNKNOWN_ERROR: Unknown error
+ */
+enum xe_ras_response_status {
+ XE_RAS_STATUS_SUCCESS = 0,
+ XE_RAS_STATUS_INVALID_PARAM,
+ XE_RAS_STATUS_OP_NOT_SUPPORTED,
+ XE_RAS_STATUS_TIMEOUT,
+ XE_RAS_STATUS_HARDWARE_FAILURE,
+ XE_RAS_STATUS_INSUFFICIENT_RESOURCES,
+ XE_RAS_STATUS_UNKNOWN_ERROR
+};
/**
* struct xe_ras_error_common - Error fields that are common across all products
@@ -70,4 +130,171 @@ struct xe_ras_threshold_crossed {
struct xe_ras_error_class counters[XE_RAS_NUM_COUNTERS];
} __packed;
+/**
+ * struct xe_ras_error_array - Details of the error types
+ */
+struct xe_ras_error_array {
+ /** @counter_value: Counter value of the returned error */
+ u32 counter_value;
+ /** @counter: Error counter */
+ struct xe_ras_error_class counter;
+ /** @timestamp: Timestamp */
+ u64 timestamp;
+ /** @details: Error details specific to the counter */
+ u32 details[XE_RAS_NUM_COUNTERS];
+} __packed;
+
+/**
+ * struct xe_ras_get_soc_error - Response from get soc error command
+ */
+struct xe_ras_get_soc_error {
+ /** @num_errors: Number of errors reported in this response */
+ u8 num_errors;
+ /** @additional_errors: Indicates if additional errors are pending */
+ u8 additional_errors;
+ /** @arr: Array of up to 3 errors */
+ struct xe_ras_error_array arr[XE_RAS_NUM_ERROR_ARR];
+} __packed;
+
+/**
+ * struct xe_ras_compute_error - Error details of Core Compute error
+ */
+struct xe_ras_compute_error {
+ /** @log_header: Error Source and type */
+ u32 log_header;
+ /** @reserved: Reserved */
+ u32 reserved[15];
+} __packed;
+
+/**
+ * struct xe_ras_soc_error_source - Source of SoC error
+ */
+struct xe_ras_soc_error_source {
+ /** @csc: CSC */
+ u32 csc:1;
+ /** @ieh: IEH (Integrated Error Handler) */
+ u32 ieh:1;
+ /** @reserved: Reserved for future use */
+ u32 reserved:30;
+} __packed;
+
+/**
+ * struct xe_ras_soc_error - Error details of SoC internal error
+ */
+struct xe_ras_soc_error {
+ /** @source: Error source */
+ struct xe_ras_soc_error_source source;
+ /** @details: Error details specific to the error source */
+ u32 details[15];
+} __packed;
+
+/**
+ * struct xe_ras_csc_error - CSC error details
+ */
+struct xe_ras_csc_error {
+ /** @reserved: Reserved */
+ u32 reserved;
+ /** @hec_fw_error: CSC firmware error */
+ u32 hec_fw_error;
+} __packed;
+
+/**
+ * struct xe_ras_ieh_error - SoC IEH (Integrated Error Handler) error details
+ */
+struct xe_ras_ieh_error {
+ /** @ieh_instance: IEH instance */
+ u32 ieh_instance:2;
+ /** @reserved: Reserved for future use */
+ u32 reserved:30;
+ /** @global_error_status: Global error status */
+ u32 global_error_status;
+ /** @local_error_status: Local error status */
+ u32 local_error_status;
+ /** @gerr_mask: Global error mask */
+ u32 gerr_mask;
+ /** @info: Additional information */
+ u32 info[10];
+} __packed;
+
+/**
+ * struct xe_ras_memory_error - Device memory error details
+ */
+struct xe_ras_memory_error {
+ /** @category: Device memory error category */
+ u8 category;
+ /** @reserved: Reserved for future use */
+ u8 reserved[7];
+ /** @hardware_address: Hardware physical address details */
+ u64 hardware_address;
+ /** @sw_address: Software address where error occurred */
+ u64 sw_address;
+ /** @reserved1: Reserved */
+ u32 reserved1[10];
+} __packed;
+
+/**
+ * struct xe_ras_offline_list_request - Request for get offline list command
+ */
+struct xe_ras_offline_list_request {
+ /** @index: Zero-based index into the offline page list */
+ u32 index;
+} __packed;
+
+/**
+ * struct xe_ras_offline_list_response - Response from get offline list command
+ */
+struct xe_ras_offline_list_response {
+ /** @max_entries: Total number of pages that can be stored in flash */
+ u32 max_entries;
+ /** @total_pages: Total number of permanently offlined pages */
+ u32 total_pages;
+ /** @pages_returned: Number of pages returned in this response */
+ u32 pages_returned;
+ /** @page_addresses: Array of permanently offlined page addresses (4KB aligned) */
+ u64 page_addresses[XE_RAS_NUM_PAGES];
+ /** @additional_data: Indicates if more data is available */
+ u8 additional_data;
+ /** @reserved: Reserved for future use */
+ u8 reserved[3];
+} __packed;
+
+/**
+ * struct xe_ras_page_offline_queue - Response from get offline queue command
+ */
+struct xe_ras_page_offline_queue {
+ /** @total_pages: Total number of queued pages */
+ u32 total_pages;
+ /** @pages_returned: Number of pages returned in this response */
+ u32 pages_returned;
+ /** @page_addresses: Array of page addresses (4KB aligned) */
+ u64 page_addresses[XE_RAS_NUM_PAGES];
+ /** @additional_data: Indicates if more data is available */
+ u8 additional_data;
+ /** @reserved: Reserved for future use */
+ u8 reserved[3];
+} __packed;
+
+/**
+ * struct xe_ras_page_offline_request - Request for page offline command
+ *
+ * This structure provides the request format to offline/decline a page
+ */
+struct xe_ras_page_offline_request {
+ /** @page_address: Page address (4KB aligned) */
+ u64 page_address;
+ /** @action: Action to be performed, see &enum xe_ras_page_action */
+ u32 action;
+ /** @reserved: Reserved for future use */
+ u32 reserved;
+} __packed;
+
+/**
+ * struct xe_ras_page_offline_response - Response from page offline command
+ */
+struct xe_ras_page_offline_response {
+ /** @status: Status of the page offline request, see &enum xe_ras_response_status */
+ u32 status;
+ /** @reserved: Reserved for future use */
+ u32 reserved;
+} __packed;
#endif
diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
index 427afd144f3a..4c506027fa94 100644
--- a/drivers/gpu/drm/xe/xe_survivability_mode.c
+++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
@@ -54,7 +54,6 @@
* # cat /sys/bus/pci/devices/<device>/survivability_mode
* Boot
*
- *
* Any additional debug information if present will be visible under the directory
* ``survivability_info``::
*
@@ -98,6 +97,15 @@
* # cat /sys/bus/pci/devices/<device>/survivability_mode
* Runtime
*
+ * On some CSC firmware errors, PCODE sets FDO mode and the only recovery possible is through
+ * firmware flash using SPI driver. Userspace can check if FDO mode is set by checking the below
+ * sysfs entry.
+ *
+ * .. code-block:: shell
+ *
+ * # cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
+ * enabled
+ *
* When such errors occur, userspace is notified with the drm device wedged uevent and runtime
* survivability mode. User can then initiate a firmware flash using userspace tools like fwupd
* to restore device to normal operation.
@@ -296,7 +304,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
if (ret)
return ret;
- if (check_boot_failure(xe)) {
+ /* Survivability info is not required if enabled via configfs */
+ if (!xe_configfs_get_survivability_mode(pdev)) {
ret = devm_device_add_group(dev, &survivability_info_group);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_event.c b/drivers/gpu/drm/xe/xe_sysctrl_event.c
index b4d17329af6c..da395148ee9d 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_event.c
+++ b/drivers/gpu/drm/xe/xe_sysctrl_event.c
@@ -16,7 +16,7 @@ static void get_pending_event(struct xe_sysctrl *sc, struct xe_sysctrl_mailbox_c
{
struct xe_sysctrl_event_response *response = command->data_out;
struct xe_device *xe = sc_to_xe(sc);
- u32 count = XE_SYSCTRL_EVENT_FLOOD;
+ u32 count = XE_SYSCTRL_FLOOD_LIMIT;
size_t len;
int ret;
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_event_types.h b/drivers/gpu/drm/xe/xe_sysctrl_event_types.h
index c16c66b9fa7f..348768ca454a 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_event_types.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_event_types.h
@@ -10,9 +10,6 @@
#define XE_SYSCTRL_EVENT_DATA_LEN 59
-/* Modify as needed */
-#define XE_SYSCTRL_EVENT_FLOOD 16
-
/**
* enum xe_sysctrl_event - Events reported by System Controller
*
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
index 3caa9f15875f..f49d8dabcf73 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
@@ -307,6 +307,34 @@ void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc)
sc->phase_bit = (ctrl_reg & SYSCTRL_FRAME_PHASE) ? 1 : 0;
}
+/**
+ * xe_sysctrl_create_command() - Create System controller command structure
+ * @command: Sysctrl command structure
+ * @group_id: Command group ID
+ * @cmd_id: Command ID
+ * @request: Pointer to request buffer (can be NULL)
+ * @request_len: Size of request buffer
+ * @response: Pointer to response buffer
+ * @response_len: Size of response buffer
+ *
+ * Helper function to create sysctrl command to be sent via xe_sysctrl_send_command()
+ */
+void xe_sysctrl_create_command(struct xe_sysctrl_mailbox_command *command, u8 group_id, u8 cmd_id,
+ void *request, size_t request_len, void *response,
+ size_t response_len)
+{
+ struct xe_sysctrl_app_msg_hdr header = {0};
+
+ header.data = FIELD_PREP(APP_HDR_GROUP_ID_MASK, group_id) |
+ FIELD_PREP(APP_HDR_COMMAND_MASK, cmd_id);
+
+ command->header = header;
+ command->data_in = request;
+ command->data_in_len = request_len;
+ command->data_out = response;
+ command->data_out_len = response_len;
+}
+
/**
* xe_sysctrl_send_command() - Send mailbox command to System Controller
* @sc: System Controller instance
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
index f67e9234de48..0ba841b0be1b 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
@@ -27,5 +27,7 @@ void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc);
int xe_sysctrl_send_command(struct xe_sysctrl *sc,
struct xe_sysctrl_mailbox_command *cmd,
size_t *rdata_len);
-
+void xe_sysctrl_create_command(struct xe_sysctrl_mailbox_command *command, u8 group_id, u8 cmd_id,
+ void *request, size_t request_len, void *response,
+ size_t response_len);
#endif
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
index 84d7c647e743..f6cbb349c416 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
@@ -22,10 +22,18 @@ enum xe_sysctrl_group {
/**
* enum xe_sysctrl_gfsp_cmd - Commands supported by GFSP group
*
+ * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Retrieve basic error information
* @XE_SYSCTRL_CMD_GET_PENDING_EVENT: Retrieve pending event
+ * @XE_SYSCTRL_CMD_PAGE_OFFLINE: Instruct firmware to offline/decline a page
+ * @XE_SYSCTRL_CMD_GET_OFFLINE_LIST: Retrieve list of all offlined pages from flash
+ * @XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE: Retrieve list of pages queued for offlining from firmware
*/
enum xe_sysctrl_gfsp_cmd {
+ XE_SYSCTRL_CMD_GET_SOC_ERROR = 0x01,
XE_SYSCTRL_CMD_GET_PENDING_EVENT = 0x07,
+ XE_SYSCTRL_CMD_PAGE_OFFLINE = 0x08,
+ XE_SYSCTRL_CMD_GET_OFFLINE_LIST = 0x09,
+ XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE = 0x0A,
};
/**
@@ -48,6 +56,9 @@ struct xe_sysctrl_mailbox_command {
size_t data_out_len;
};
+/* Modify as needed */
+#define XE_SYSCTRL_FLOOD_LIMIT 16
+
#define XE_SYSCTRL_MB_FRAME_SIZE 16
#define XE_SYSCTRL_MB_MAX_FRAMES 64
#define XE_SYSCTRL_MB_MAX_MESSAGE_SIZE \
--
2.34.1
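
The loop in xe_ras_process_errors() above folds every reported error into
one recovery action: the numerically highest action wins, and reaching the
flood limit forces a reset. Below is a minimal userspace sketch of that
fold, assuming the firmware responses are available as a plain array. The
names fold_actions, struct response and the ACTION_* values are
hypothetical stand-ins for illustration, not driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Same ordering as enum xe_ras_recovery_action: a higher value is more severe. */
enum recovery_action { ACTION_RECOVERED = 0, ACTION_RESET, ACTION_DISCONNECT };

#define FLOOD_LIMIT 16	/* same value as XE_SYSCTRL_FLOOD_LIMIT */
#define MAX_ERRORS  3	/* same value as XE_RAS_NUM_ERROR_ARR */

/* Hypothetical stand-in for one "get soc error" response. */
struct response {
	unsigned char num_errors;
	unsigned char additional_errors;
	enum recovery_action actions[MAX_ERRORS];	/* per-error decision */
};

/* Fold the per-error actions of a chain of responses into one final action. */
enum recovery_action fold_actions(const struct response *resp, size_t count)
{
	enum recovery_action final_action = ACTION_RECOVERED;
	unsigned int remaining = FLOOD_LIMIT;
	size_t r = 0;

	assert(resp != NULL && count > 0);

	do {
		const struct response *cur = &resp[r];

		/* Clamp to the array size, as the driver loop does. */
		for (size_t i = 0; i < cur->num_errors && i < MAX_ERRORS; i++)
			if (cur->actions[i] > final_action)
				final_action = cur->actions[i];

		/* Flooding overrides whatever was collected so far. */
		if (!--remaining)
			return ACTION_RESET;
	} while (resp[r++].additional_errors && r < count);

	return final_action;
}
```

Because the enum is ordered by severity, a plain integer comparison is enough
to pick the strongest action. The flood path mirrors the quoted code: it
returns a reset even when a disconnect had already been recorded.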
* [PATCH v8 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
@ 2026-06-12 8:07 ` Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 3/6] drm/doc: Document " Mallesh Koujalagi
` (7 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Mallesh Koujalagi @ 2026-06-12 8:07 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
Introduce DRM_WEDGE_RECOVERY_COLD_RESET (BIT(4)) recovery method to handle
scenarios requiring a device power cycle.
This method addresses cases where other recovery mechanisms
(driver reload, PCIe reset, etc.) are insufficient to restore device
functionality. When set, it indicates to userspace that only a device power
cycle can recover the device from its current error state.
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v3:
- Update any scenario that requires cold-reset. (Riana)
v4:
- Rename cold reset to power cycle. (Raag)
v5:
- Make it consistent with consumer expectations. (Raag)
v6:
- Drop "last resort" wording. (Riana)
---
drivers/gpu/drm/drm_drv.c | 2 ++
include/drm/drm_device.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 1ff0bf7cba6a..94abe7a811a5 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -537,6 +537,8 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
return "bus-reset";
case DRM_WEDGE_RECOVERY_VENDOR:
return "vendor-specific";
+ case DRM_WEDGE_RECOVERY_COLD_RESET:
+ return "cold-reset";
default:
return NULL;
}
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 768a8dae83c5..75f030d027ee 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -37,6 +37,7 @@ struct pci_controller;
#define DRM_WEDGE_RECOVERY_REBIND BIT(1) /* unbind + bind driver */
#define DRM_WEDGE_RECOVERY_BUS_RESET BIT(2) /* unbind + reset bus device + bind */
#define DRM_WEDGE_RECOVERY_VENDOR BIT(3) /* vendor specific recovery method */
+#define DRM_WEDGE_RECOVERY_COLD_RESET BIT(4) /* remove device + slot power cycle + rescan */
/**
* struct drm_wedge_task_info - information about the guilty task of a wedge dev
--
2.34.1
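
To illustrate how a recovery bitmask ends up as the WEDGED= string seen by
userspace, here is a small userspace sketch. It assumes the bit values
quoted in the hunk above (BIT(1) to BIT(4)), the comma-separated format
described in drm-uapi.rst, and the "unknown" fallback from the
documentation table. format_wedged() and method_name() are hypothetical
helpers; this is not the DRM core implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BIT(n) (1UL << (n))

/* Bit positions as quoted from include/drm/drm_device.h in this patch. */
#define WEDGE_REBIND		BIT(1)
#define WEDGE_BUS_RESET		BIT(2)
#define WEDGE_VENDOR		BIT(3)
#define WEDGE_COLD_RESET	BIT(4)

/* Map a bit position to its uevent keyword. */
static const char *method_name(unsigned int bit)
{
	switch (bit) {
	case 1: return "rebind";
	case 2: return "bus-reset";
	case 3: return "vendor-specific";
	case 4: return "cold-reset";
	default: return NULL;
	}
}

/* Build "WEDGED=<m1>[,<m2>...]" in ascending bit order. */
void format_wedged(unsigned long methods, char *buf, size_t size)
{
	size_t len = 0;
	int found = 0;

	/* 64 bytes hold the longest possible string built below. */
	assert(buf != NULL && size >= 64);

	len += (size_t)snprintf(buf + len, size - len, "WEDGED=");
	for (unsigned int bit = 1; bit <= 4; bit++) {
		if (!(methods & BIT(bit)))
			continue;
		len += (size_t)snprintf(buf + len, size - len, "%s%s",
					found ? "," : "", method_name(bit));
		found = 1;
	}

	/* No known method requested: leave the policy to the consumer. */
	if (!found)
		snprintf(buf + len, size - len, "unknown");
}
```

Walking the bits in ascending order yields the methods from fewer to more
side effects, so cold-reset always appears last when it is combined with
other methods.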
* [PATCH v8 3/6] drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
@ 2026-06-12 8:07 ` Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
` (6 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Mallesh Koujalagi @ 2026-06-12 8:07 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
When ``WEDGED=cold-reset`` is sent, it indicates that the device has
encountered an error condition that cannot be resolved through other
recovery methods such as driver rebind or bus reset, and requires a
complete device power cycle to restore functionality.
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v2:
- Add several instead of number to avoid update. (Jani)
v3:
- Update document with generic scenario. (Riana)
- Consistent with terminology. (Raag)
- Remove already covered information.
v4:
- Update doc. (Raag/Riana)
- Change commit message.
v5:
- Update commit message. (Raag)
- Remove unbind.
- Simplify cold-reset script.
v6:
- Look up the hotplug slot in DEVPATH instead of scanning
every PCI slot on the system. (Raag)
- Drop arbitrary sleep values from the example script.
v7:
- Update recovery script. (Raag)
v8:
- Add rescan instead of reset. (Raag)
---
Documentation/gpu/drm-uapi.rst | 85 +++++++++++++++++++++++++++++++++-
1 file changed, 84 insertions(+), 1 deletion(-)
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 93df92c4ac8c..43cdac65d04f 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -424,7 +424,7 @@ needed.
Recovery
--------
-Current implementation defines four recovery methods, out of which, drivers
+Current implementation defines several recovery methods, out of which, drivers
can use any one, multiple or none. Method(s) of choice will be sent in the
uevent environment as ``WEDGED=<method1>[,..,<methodN>]`` in order of less to
more side-effects. See the section `Vendor Specific Recovery`_
@@ -441,6 +441,7 @@ following expectations.
rebind unbind + bind driver
bus-reset unbind + bus reset/re-enumeration + bind
vendor-specific vendor specific recovery method
+ cold-reset remove device + slot power cycle + rescan
unknown consumer policy
=============== ========================================
@@ -453,6 +454,17 @@ debug purpose in order to root cause the hang. This is useful because the first
hang is usually the most critical one which can result in consequential hangs
or complete wedging.
+Cold Reset Recovery
+-------------------
+
+When ``WEDGED=cold-reset`` is sent, it indicates that the device has
+encountered an error condition that cannot be resolved through other
+recovery methods such as driver rebind or bus reset, and requires a complete
+device power cycle to restore functionality.
+
+This method is used by devices that are plugged directly into a PCIe slot
+that supports removing power.
+
Vendor Specific Recovery
------------------------
@@ -530,6 +542,77 @@ Recovery script::
echo -n $DEVICE > $DRIVER/unbind
echo -n $DEVICE > $DRIVER/bind
+Example - cold-reset
+--------------------
+
+Udev rule::
+
+ SUBSYSTEM=="drm", ENV{WEDGED}=="cold-reset", DEVPATH=="*/drm/card[0-9]",
+ RUN+="/path/to/cold-reset.sh $env{DEVPATH}"
+
+Recovery script::
+
+ #!/bin/sh
+ die() { echo "ERROR: $*" >&2; exit 1; }
+
+ [ -n "$1" ] || die "Usage: $0 <device-path>"
+
+ PCI_DEVS=/sys/bus/pci/devices
+ PCI_SLOTS=/sys/bus/pci/slots
+
+ syspath=$(readlink -f "/sys/$1/device" 2>/dev/null || readlink -f "/sys/$1" 2>/dev/null)
+ [ -n "$syspath" ] || die "cannot resolve sysfs path for: $1"
+
+ dev=$(basename "$syspath")
+ [ -e "$PCI_DEVS/$dev" ] || die "not a PCI device: $dev"
+ echo "device : $dev"
+
+ slot=""
+ walk=$(dirname "$(readlink -f "$PCI_DEVS/$dev")")
+
+ while true; do
+ ancestor=$(basename "$walk")
+ case "$ancestor" in pci*) break ;; esac # reached the virtual bus root
+
+ ancestor_nofn=${ancestor%.*} # strip function: 0000:03:01.0 -> 0000:03:01
+
+ for f in "$PCI_SLOTS"/*/address; do
+ [ -f "$f" ] || continue
+ addr=$(cat "$f")
+ case "$ancestor_nofn" in
+ *"$addr") slot=$(basename "$(dirname "$f")"); break ;;
+ esac
+ done
+
+ if [ -n "$slot" ] && [ -e "$PCI_SLOTS/$slot/power" ]; then
+ echo "slot : $slot (port $ancestor)"
+ break
+ fi
+ slot=""
+ walk=$(dirname "$walk")
+ done
+
+ [ -n "$slot" ] || die "no hotplug slot with power control found in PCIe topology"
+
+ # Cold reset: remove the device, cut slot power, restore power, rescan.
+ echo "Removing $dev..."
+ [ -e "$PCI_DEVS/$dev" ] && echo 1 > "$PCI_DEVS/$dev/remove"
+
+ echo "Powering off slot $slot..."
+ echo 0 > "$PCI_SLOTS/$slot/power"
+
+ echo "Powering on slot $slot..."
+ echo 1 > "$PCI_SLOTS/$slot/power"
+
+ echo "Rescanning PCI bus..."
+ echo 1 > /sys/bus/pci/rescan
+
+ if [ -e "$PCI_DEVS/$dev" ]; then
+ echo "Done: $dev is back online."
+ else
+ echo "WARNING: $dev did not re-appear after rescan."
+ fi
+
Customization
-------------
--
2.34.1
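
The slot lookup in the recovery script hinges on one string comparison:
strip the function number from an ancestor's PCI address and check whether
the result ends with the address published under /sys/bus/pci/slots. The
same check is sketched below in C to make the matching rule explicit.
slot_matches() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true when the PCI address @bdf (for example "0000:03:01.0"), with
 * its ".function" part removed, ends with @slot_addr (for example
 * "0000:03:01"). Unlike the shell pattern, an empty slot address is
 * rejected here instead of matching everything.
 */
bool slot_matches(const char *bdf, const char *slot_addr)
{
	const char *dot;
	size_t dev_len, addr_len;

	assert(bdf != NULL && slot_addr != NULL);

	/* Equivalent of ${ancestor%.*}: cut at the last dot, if any. */
	dot = strrchr(bdf, '.');
	dev_len = dot ? (size_t)(dot - bdf) : strlen(bdf);
	addr_len = strlen(slot_addr);

	if (addr_len == 0 || addr_len > dev_len)
		return false;

	/* Suffix comparison, like the *"$addr" case pattern. */
	return memcmp(bdf + dev_len - addr_len, slot_addr, addr_len) == 0;
}
```

A suffix match rather than strict equality tolerates slot addresses that
omit the domain, at the cost of being slightly more permissive.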
* [PATCH v8 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (2 preceding siblings ...)
2026-06-12 8:07 ` [PATCH v8 3/6] drm/doc: Document " Mallesh Koujalagi
@ 2026-06-12 8:07 ` Mallesh Koujalagi
2026-06-12 8:27 ` sashiko-bot
2026-06-12 8:07 ` [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
` (5 subsequent siblings)
9 siblings, 1 reply; 17+ messages in thread
From: Mallesh Koujalagi @ 2026-06-12 8:07 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
When PUNIT (power management unit) errors are detected that persist across
warm resets, mark the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET
and notify userspace that a complete device power cycle is required to
restore normal operation.
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
---
v3:
- Use PUNIT instead of PMU. (Riana)
- Use consistent wording.
- Remove log. (Raag)
v4:
- Make function static. (Raag)
v5:
- Remove kdoc for static function. (Raag)
- Remove xe_ prefix for static function.
---
drivers/gpu/drm/xe/xe_ras.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 6a5140bc47e5..93a56a0269f1 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -4,6 +4,8 @@
*/
#include "xe_bo.h"
+#include "xe_assert.h"
+#include "xe_device_types.h"
#include "xe_device.h"
#include "xe_printk.h"
#include "xe_ras.h"
@@ -207,6 +209,12 @@ static enum xe_ras_recovery_action handle_core_compute_errors(struct xe_ras_erro
return XE_RAS_RECOVERY_ACTION_RECOVERED;
}
+static void punit_error_handler(struct xe_device *xe)
+{
+ xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
+ xe_device_declare_wedged(xe);
+}
+
static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
struct xe_ras_error_array *arr)
{
@@ -239,7 +247,7 @@ static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *
xe_err(xe, "[RAS]: PUNIT %s detected: 0x%x\n",
sev_to_str(counter->common.severity),
ieh_error->global_error_status);
- /* TODO: Add PUNIT error handling */
+ punit_error_handler(xe);
return XE_RAS_RECOVERY_ACTION_DISCONNECT;
}
}
--
2.34.1
* [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (3 preceding siblings ...)
2026-06-12 8:07 ` [PATCH v8 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
@ 2026-06-12 8:07 ` Mallesh Koujalagi
2026-06-12 8:21 ` sashiko-bot
` (2 more replies)
2026-06-12 8:07 ` [PATCH v8 6/6] drm/xe/ras: Add debugfs entry to inject punit error Mallesh Koujalagi
` (4 subsequent siblings)
9 siblings, 3 replies; 17+ messages in thread
From: Mallesh Koujalagi @ 2026-06-12 8:07 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
PUNIT errors can only be recovered using a power-cycle. Xe KMD
sends a uevent to notify userspace to trigger a power cycle.
On some platforms, the link drop caused by powering the device off and
back on is reported by hardware as a Surprise Link Down (SLD), which
AER then escalates to an Uncorrectable Fatal Error. That error fires
before the device finishes coming back up and defeats the
very recovery we are attempting.
To keep the expected, recovery-induced link drop from being raised as
a fatal AER event, mask the Surprise Link Down bit
(PCI_ERR_UNC_SURPDN) in the upstream port's AER Uncorrectable Error
Mask register before punit_error_handler() requests the cold reset.
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
v6:
- Expand commit message to explain why SUR_DN is masked. (Raag/Riana)
- Check Slot Implemented bit before reading Slot Capabilities, per
PCIe spec. (Riana)
- Add debug log.
v7:
- Handle surprise link down event properly. (Aravind/Riana)
- Update commit message. (Riana)
- Correct log message.
v8:
- Use find_usp_dev() in punit_error_handler() function.
---
drivers/gpu/drm/xe/xe_ras.c | 65 ++++++++++++++++++++++++++++---------
1 file changed, 49 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 93a56a0269f1..15c2fa0d323a 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -209,8 +209,57 @@ static enum xe_ras_recovery_action handle_core_compute_errors(struct xe_ras_erro
return XE_RAS_RECOVERY_ACTION_RECOVERED;
}
+static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
+{
+ struct pci_dev *vsp;
+
+ /*
+ * Device Hierarchy:
+ *
+ * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
+ */
+ vsp = pci_upstream_bridge(pdev);
+ if (!vsp)
+ return NULL;
+
+ return pci_upstream_bridge(vsp);
+}
+
+#ifdef CONFIG_PCIEAER
+static void pcie_suppress_surprise_link_down(struct pci_dev *usp)
+{
+ u32 aer_uncorr_mask;
+ u16 aer_cap;
+
+ aer_cap = usp->aer_cap;
+ if (!aer_cap) {
+ dev_dbg(&usp->dev,
+ "AER capability not present\n");
+ return;
+ }
+
+ pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
+ aer_uncorr_mask |= PCI_ERR_UNC_SURPDN;
+ pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
+ dev_dbg(&usp->dev, "Surprise Link Down masked for cold reset\n");
+}
+#endif /* CONFIG_PCIEAER */
+
static void punit_error_handler(struct xe_device *xe)
{
+#ifdef CONFIG_PCIEAER
+ struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+ struct pci_dev *usp;
+
+ /*
+ * Cold reset power-cycles the slot, dropping the PCIe link. The
+ * slot triggers a spurious Surprise Link Down AER event on the USP.
+ */
+ usp = find_usp_dev(pdev);
+
+ if (usp)
+ pcie_suppress_surprise_link_down(usp);
+#endif
xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
xe_device_declare_wedged(xe);
}
@@ -503,22 +552,6 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
return XE_RAS_RECOVERY_ACTION_RESET;
}
-static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
-{
- struct pci_dev *vsp;
-
- /*
- * Device Hierarchy:
- *
- * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
- */
- vsp = pci_upstream_bridge(pdev);
- if (!vsp)
- return NULL;
-
- return pci_upstream_bridge(vsp);
-}
-
static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
{
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
--
2.34.1
* [PATCH v8 6/6] drm/xe/ras: Add debugfs entry to inject punit error
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (4 preceding siblings ...)
2026-06-12 8:07 ` [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
@ 2026-06-12 8:07 ` Mallesh Koujalagi
2026-06-12 8:23 ` sashiko-bot
2026-06-12 8:16 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev8) Patchwork
` (3 subsequent siblings)
9 siblings, 1 reply; 17+ messages in thread
From: Mallesh Koujalagi @ 2026-06-12 8:07 UTC (permalink / raw)
To: intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
riana.tauro, karthik.poosa, sk.anirban, raag.jadav,
Mallesh Koujalagi
Add a write-only debugfs file 'inject_punit_error' under the DRI device
root that allows triggering punit_error_handler() manually for testing.
Writing 1 to the file invokes the handler, which masks the spurious
Surprise Link Down AER event on the Upstream Switch Port (USP) and
declares the device wedged with the cold-reset recovery method. Any
other value is rejected with -EINVAL.
Usage:
echo 1 > /sys/kernel/debug/dri/<N>/inject_punit_error
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
---
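Testing note (a sketch only, not part of the patch): the helper below
wraps the usage line from the commit message with a writability check.
The function name inject_punit_error and its single path argument are
illustrative assumptions; the caller supplies the full path to the
debugfs node, including the DRI minor <N>. The node is created with
mode 0200 and only when CONFIG_DRM_XE_DEBUG is enabled, so the check is
expected to fail for non-root users or non-debug builds.

```shell
#!/bin/sh
# Hypothetical helper: trigger the punit error handler through debugfs.
# $1: full path to the inject_punit_error node, for example
#     /sys/kernel/debug/dri/<N>/inject_punit_error (caller fills in <N>).
inject_punit_error() {
	node=$1

	# The node is write-only (0200) and exists only on debug builds.
	if [ ! -w "$node" ]; then
		echo "inject_punit_error: $node is not writable" >&2
		return 1
	fi

	# Only the value 1 is accepted by the write handler in this patch.
	printf '1\n' > "$node"
}
```

On a real device a successful write is expected to wedge the GPU until
it is power-cycled, so this is only suitable for test machines.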
drivers/gpu/drm/xe/xe_debugfs.c | 3 +++
drivers/gpu/drm/xe/xe_ras.c | 35 +++++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_ras.h | 7 +++++++
3 files changed, 45 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 22b471303984..bf22985a9fcf 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -28,6 +28,7 @@
#include "xe_sriov_pf_debugfs.h"
#include "xe_sriov_vf.h"
#include "xe_step.h"
+#include "xe_ras.h"
#include "xe_tile_debugfs.h"
#include "xe_vsec.h"
#include "xe_wa.h"
@@ -610,6 +611,8 @@ void xe_debugfs_register(struct xe_device *xe)
xe_psmi_debugfs_register(xe);
+ xe_ras_debugfs_register(xe, root);
+
fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
if (IS_SRIOV_PF(xe))
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 15c2fa0d323a..281ef4dce218 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -3,6 +3,8 @@
* Copyright © 2026 Intel Corporation
*/
+#include <linux/debugfs.h>
+
#include "xe_bo.h"
#include "xe_assert.h"
#include "xe_device_types.h"
@@ -552,6 +554,39 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
return XE_RAS_RECOVERY_ACTION_RESET;
}
+#ifdef CONFIG_DRM_XE_DEBUG
+static ssize_t inject_punit_error_write(struct file *f, const char __user *ubuf,
+ size_t size, loff_t *pos)
+{
+ struct xe_device *xe = f->private_data;
+ u32 val;
+ int ret;
+
+ ret = kstrtouint_from_user(ubuf, size, 0, &val);
+ if (ret)
+ return ret;
+
+ if (val != 1)
+ return -EINVAL;
+
+ punit_error_handler(xe);
+
+ return size;
+}
+
+static const struct file_operations inject_punit_error_fops = {
+ .owner = THIS_MODULE,
+ .open = simple_open,
+ .write = inject_punit_error_write,
+};
+
+void xe_ras_debugfs_register(struct xe_device *xe, struct dentry *root)
+{
+ debugfs_create_file("inject_punit_error", 0200, root, xe,
+ &inject_punit_error_fops);
+}
+#endif /* CONFIG_DRM_XE_DEBUG */
+
static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
{
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
index 8d106c708ff1..c9f84ef238c4 100644
--- a/drivers/gpu/drm/xe/xe_ras.h
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -16,4 +16,11 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
void xe_ras_init(struct xe_device *xe);
enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe);
+#ifdef CONFIG_DRM_XE_DEBUG
+struct dentry;
+void xe_ras_debugfs_register(struct xe_device *xe, struct dentry *root);
+#else
+static inline void xe_ras_debugfs_register(struct xe_device *xe, struct dentry *root) {}
+#endif
+
#endif
--
2.34.1
* ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev8)
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (5 preceding siblings ...)
2026-06-12 8:07 ` [PATCH v8 6/6] drm/xe/ras: Add debugfs entry to inject punit error Mallesh Koujalagi
@ 2026-06-12 8:16 ` Patchwork
2026-06-12 8:18 ` ✓ CI.KUnit: success " Patchwork
` (2 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2026-06-12 8:16 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
== Series Details ==
Series: Introduce cold reset recovery method (rev8)
URL : https://patchwork.freedesktop.org/series/163428/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
061140b9bc586ae7f40abc1249c97e1cc72d1b9d
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit dd455b46800eb95a9023673fb3ed152a6286ce38
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Date: Fri Jun 12 13:37:29 2026 +0530
drm/xe/ras: Add debugfs entry to inject punit error
Add a write-only debugfs file 'inject_punit_error' under the DRI device
root that allows triggering punit_error_handler() manually for testing.
Writing 1 to the file invokes the handler, which suppresses spurious
Surprise Link Down AER events on slots and declares the
device wedged for a cold reset recovery.
Usage:
echo 1 > /sys/kernel/debug/dri/<N>/inject_punit_error
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
+ /mt/dim checkpatch 9ce63e3d9903982c053b125686abcb7a84357ed4 drm-intel
c9a46513efa5 Introduce Xe Uncorrectable Error Handling
-:297: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#297:
new file mode 100644
total: 0 errors, 1 warnings, 0 checks, 1266 lines checked
0cad634c8bcd drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method
c8f81118a89d drm/doc: Document DRM_WEDGE_RECOVERY_COLD_RESET recovery method
927f5643b21c drm/xe: Handle PUNIT errors by requesting cold-reset recovery
e855ecfbd078 drm/xe: Suppress Surprise Link Down on device
dd455b46800e drm/xe/ras: Add debugfs entry to inject punit error
* ✓ CI.KUnit: success for Introduce cold reset recovery method (rev8)
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (6 preceding siblings ...)
2026-06-12 8:16 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev8) Patchwork
@ 2026-06-12 8:18 ` Patchwork
2026-06-12 9:03 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-13 1:18 ` ✓ Xe.CI.FULL: " Patchwork
9 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2026-06-12 8:18 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
== Series Details ==
Series: Introduce cold reset recovery method (rev8)
URL : https://patchwork.freedesktop.org/series/163428/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[08:16:58] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[08:17:02] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[08:17:34] Starting KUnit Kernel (1/1)...
[08:17:34] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[08:17:34] ================== guc_buf (11 subtests) ===================
[08:17:34] [PASSED] test_smallest
[08:17:34] [PASSED] test_largest
[08:17:34] [PASSED] test_granular
[08:17:34] [PASSED] test_unique
[08:17:34] [PASSED] test_overlap
[08:17:34] [PASSED] test_reusable
[08:17:34] [PASSED] test_too_big
[08:17:34] [PASSED] test_flush
[08:17:34] [PASSED] test_lookup
[08:17:34] [PASSED] test_data
[08:17:34] [PASSED] test_class
[08:17:34] ===================== [PASSED] guc_buf =====================
[08:17:34] =================== guc_dbm (7 subtests) ===================
[08:17:34] [PASSED] test_empty
[08:17:34] [PASSED] test_default
[08:17:34] ======================== test_size ========================
[08:17:34] [PASSED] 4
[08:17:34] [PASSED] 8
[08:17:34] [PASSED] 32
[08:17:34] [PASSED] 256
[08:17:34] ==================== [PASSED] test_size ====================
[08:17:34] ======================= test_reuse ========================
[08:17:34] [PASSED] 4
[08:17:34] [PASSED] 8
[08:17:34] [PASSED] 32
[08:17:34] [PASSED] 256
[08:17:34] =================== [PASSED] test_reuse ====================
[08:17:34] =================== test_range_overlap ====================
[08:17:34] [PASSED] 4
[08:17:34] [PASSED] 8
[08:17:34] [PASSED] 32
[08:17:34] [PASSED] 256
[08:17:34] =============== [PASSED] test_range_overlap ================
[08:17:34] =================== test_range_compact ====================
[08:17:34] [PASSED] 4
[08:17:34] [PASSED] 8
[08:17:34] [PASSED] 32
[08:17:34] [PASSED] 256
[08:17:34] =============== [PASSED] test_range_compact ================
[08:17:34] ==================== test_range_spare =====================
[08:17:34] [PASSED] 4
[08:17:34] [PASSED] 8
[08:17:34] [PASSED] 32
[08:17:34] [PASSED] 256
[08:17:34] ================ [PASSED] test_range_spare =================
[08:17:34] ===================== [PASSED] guc_dbm =====================
[08:17:34] =================== guc_idm (6 subtests) ===================
[08:17:34] [PASSED] bad_init
[08:17:34] [PASSED] no_init
[08:17:34] [PASSED] init_fini
[08:17:34] [PASSED] check_used
[08:17:34] [PASSED] check_quota
[08:17:34] [PASSED] check_all
[08:17:34] ===================== [PASSED] guc_idm =====================
[08:17:34] ================== no_relay (3 subtests) ===================
[08:17:34] [PASSED] xe_drops_guc2pf_if_not_ready
[08:17:34] [PASSED] xe_drops_guc2vf_if_not_ready
[08:17:34] [PASSED] xe_rejects_send_if_not_ready
[08:17:34] ==================== [PASSED] no_relay =====================
[08:17:34] ================== pf_relay (14 subtests) ==================
[08:17:34] [PASSED] pf_rejects_guc2pf_too_short
[08:17:34] [PASSED] pf_rejects_guc2pf_too_long
[08:17:34] [PASSED] pf_rejects_guc2pf_no_payload
[08:17:34] [PASSED] pf_fails_no_payload
[08:17:34] [PASSED] pf_fails_bad_origin
[08:17:34] [PASSED] pf_fails_bad_type
[08:17:34] [PASSED] pf_txn_reports_error
[08:17:34] [PASSED] pf_txn_sends_pf2guc
[08:17:34] [PASSED] pf_sends_pf2guc
[08:17:34] [SKIPPED] pf_loopback_nop
[08:17:34] [SKIPPED] pf_loopback_echo
[08:17:34] [SKIPPED] pf_loopback_fail
[08:17:34] [SKIPPED] pf_loopback_busy
[08:17:34] [SKIPPED] pf_loopback_retry
[08:17:34] ==================== [PASSED] pf_relay =====================
[08:17:34] ================== vf_relay (3 subtests) ===================
[08:17:34] [PASSED] vf_rejects_guc2vf_too_short
[08:17:34] [PASSED] vf_rejects_guc2vf_too_long
[08:17:34] [PASSED] vf_rejects_guc2vf_no_payload
[08:17:34] ==================== [PASSED] vf_relay =====================
[08:17:34] ================ pf_gt_config (9 subtests) =================
[08:17:34] [PASSED] fair_contexts_1vf
[08:17:34] [PASSED] fair_doorbells_1vf
[08:17:34] [PASSED] fair_ggtt_1vf
[08:17:34] ====================== fair_vram_1vf ======================
[08:17:34] [PASSED] 3.50 GiB
[08:17:34] [PASSED] 11.5 GiB
[08:17:34] [PASSED] 15.5 GiB
[08:17:34] [PASSED] 31.5 GiB
[08:17:34] [PASSED] 63.5 GiB
[08:17:34] [PASSED] 1.91 GiB
[08:17:34] ================== [PASSED] fair_vram_1vf ==================
[08:17:34] ================ fair_vram_1vf_admin_only =================
[08:17:34] [PASSED] 3.50 GiB
[08:17:34] [PASSED] 11.5 GiB
[08:17:34] [PASSED] 15.5 GiB
[08:17:34] [PASSED] 31.5 GiB
[08:17:34] [PASSED] 63.5 GiB
[08:17:34] [PASSED] 1.91 GiB
[08:17:34] ============ [PASSED] fair_vram_1vf_admin_only =============
[08:17:34] ====================== fair_contexts ======================
[08:17:34] [PASSED] 1 VF
[08:17:34] [PASSED] 2 VFs
[08:17:34] [PASSED] 3 VFs
[08:17:34] [PASSED] 4 VFs
[08:17:34] [PASSED] 5 VFs
[08:17:34] [PASSED] 6 VFs
[08:17:34] [PASSED] 7 VFs
[08:17:34] [PASSED] 8 VFs
[08:17:34] [PASSED] 9 VFs
[08:17:34] [PASSED] 10 VFs
[08:17:34] [PASSED] 11 VFs
[08:17:34] [PASSED] 12 VFs
[08:17:34] [PASSED] 13 VFs
[08:17:34] [PASSED] 14 VFs
[08:17:34] [PASSED] 15 VFs
[08:17:34] [PASSED] 16 VFs
[08:17:34] [PASSED] 17 VFs
[08:17:34] [PASSED] 18 VFs
[08:17:34] [PASSED] 19 VFs
[08:17:34] [PASSED] 20 VFs
[08:17:34] [PASSED] 21 VFs
[08:17:34] [PASSED] 22 VFs
[08:17:34] [PASSED] 23 VFs
[08:17:34] [PASSED] 24 VFs
[08:17:34] [PASSED] 25 VFs
[08:17:34] [PASSED] 26 VFs
[08:17:34] [PASSED] 27 VFs
[08:17:34] [PASSED] 28 VFs
[08:17:34] [PASSED] 29 VFs
[08:17:34] [PASSED] 30 VFs
[08:17:34] [PASSED] 31 VFs
[08:17:34] [PASSED] 32 VFs
[08:17:34] [PASSED] 33 VFs
[08:17:34] [PASSED] 34 VFs
[08:17:34] [PASSED] 35 VFs
[08:17:34] [PASSED] 36 VFs
[08:17:34] [PASSED] 37 VFs
[08:17:34] [PASSED] 38 VFs
[08:17:34] [PASSED] 39 VFs
[08:17:34] [PASSED] 40 VFs
[08:17:34] [PASSED] 41 VFs
[08:17:34] [PASSED] 42 VFs
[08:17:34] [PASSED] 43 VFs
[08:17:34] [PASSED] 44 VFs
[08:17:34] [PASSED] 45 VFs
[08:17:34] [PASSED] 46 VFs
[08:17:34] [PASSED] 47 VFs
[08:17:34] [PASSED] 48 VFs
[08:17:34] [PASSED] 49 VFs
[08:17:34] [PASSED] 50 VFs
[08:17:34] [PASSED] 51 VFs
[08:17:34] [PASSED] 52 VFs
[08:17:34] [PASSED] 53 VFs
[08:17:34] [PASSED] 54 VFs
[08:17:34] [PASSED] 55 VFs
[08:17:34] [PASSED] 56 VFs
[08:17:34] [PASSED] 57 VFs
[08:17:34] [PASSED] 58 VFs
[08:17:34] [PASSED] 59 VFs
[08:17:34] [PASSED] 60 VFs
[08:17:34] [PASSED] 61 VFs
[08:17:34] [PASSED] 62 VFs
[08:17:34] [PASSED] 63 VFs
[08:17:34] ================== [PASSED] fair_contexts ==================
[08:17:34] ===================== fair_doorbells ======================
[08:17:34] [PASSED] 1 VF
[08:17:34] [PASSED] 2 VFs
[08:17:34] [PASSED] 3 VFs
[08:17:34] [PASSED] 4 VFs
[08:17:34] [PASSED] 5 VFs
[08:17:34] [PASSED] 6 VFs
[08:17:34] [PASSED] 7 VFs
[08:17:34] [PASSED] 8 VFs
[08:17:34] [PASSED] 9 VFs
[08:17:34] [PASSED] 10 VFs
[08:17:34] [PASSED] 11 VFs
[08:17:34] [PASSED] 12 VFs
[08:17:34] [PASSED] 13 VFs
[08:17:34] [PASSED] 14 VFs
[08:17:34] [PASSED] 15 VFs
[08:17:34] [PASSED] 16 VFs
[08:17:34] [PASSED] 17 VFs
[08:17:34] [PASSED] 18 VFs
[08:17:34] [PASSED] 19 VFs
[08:17:34] [PASSED] 20 VFs
[08:17:34] [PASSED] 21 VFs
[08:17:34] [PASSED] 22 VFs
[08:17:34] [PASSED] 23 VFs
[08:17:34] [PASSED] 24 VFs
[08:17:34] [PASSED] 25 VFs
[08:17:34] [PASSED] 26 VFs
[08:17:34] [PASSED] 27 VFs
[08:17:34] [PASSED] 28 VFs
[08:17:34] [PASSED] 29 VFs
[08:17:34] [PASSED] 30 VFs
[08:17:34] [PASSED] 31 VFs
[08:17:34] [PASSED] 32 VFs
[08:17:34] [PASSED] 33 VFs
[08:17:34] [PASSED] 34 VFs
[08:17:34] [PASSED] 35 VFs
[08:17:34] [PASSED] 36 VFs
[08:17:34] [PASSED] 37 VFs
[08:17:34] [PASSED] 38 VFs
[08:17:34] [PASSED] 39 VFs
[08:17:34] [PASSED] 40 VFs
[08:17:34] [PASSED] 41 VFs
[08:17:34] [PASSED] 42 VFs
[08:17:34] [PASSED] 43 VFs
[08:17:34] [PASSED] 44 VFs
[08:17:34] [PASSED] 45 VFs
[08:17:34] [PASSED] 46 VFs
[08:17:34] [PASSED] 47 VFs
[08:17:34] [PASSED] 48 VFs
[08:17:34] [PASSED] 49 VFs
[08:17:34] [PASSED] 50 VFs
[08:17:34] [PASSED] 51 VFs
[08:17:34] [PASSED] 52 VFs
[08:17:34] [PASSED] 53 VFs
[08:17:34] [PASSED] 54 VFs
[08:17:34] [PASSED] 55 VFs
[08:17:34] [PASSED] 56 VFs
[08:17:34] [PASSED] 57 VFs
[08:17:34] [PASSED] 58 VFs
[08:17:34] [PASSED] 59 VFs
[08:17:34] [PASSED] 60 VFs
[08:17:34] [PASSED] 61 VFs
[08:17:34] [PASSED] 62 VFs
[08:17:34] [PASSED] 63 VFs
[08:17:34] ================= [PASSED] fair_doorbells ==================
[08:17:34] ======================== fair_ggtt ========================
[08:17:34] [PASSED] 1 VF
[08:17:34] [PASSED] 2 VFs
[08:17:34] [PASSED] 3 VFs
[08:17:34] [PASSED] 4 VFs
[08:17:34] [PASSED] 5 VFs
[08:17:34] [PASSED] 6 VFs
[08:17:34] [PASSED] 7 VFs
[08:17:34] [PASSED] 8 VFs
[08:17:34] [PASSED] 9 VFs
[08:17:34] [PASSED] 10 VFs
[08:17:34] [PASSED] 11 VFs
[08:17:34] [PASSED] 12 VFs
[08:17:34] [PASSED] 13 VFs
[08:17:34] [PASSED] 14 VFs
[08:17:34] [PASSED] 15 VFs
[08:17:34] [PASSED] 16 VFs
[08:17:34] [PASSED] 17 VFs
[08:17:34] [PASSED] 18 VFs
[08:17:34] [PASSED] 19 VFs
[08:17:34] [PASSED] 20 VFs
[08:17:34] [PASSED] 21 VFs
[08:17:34] [PASSED] 22 VFs
[08:17:34] [PASSED] 23 VFs
[08:17:34] [PASSED] 24 VFs
[08:17:34] [PASSED] 25 VFs
[08:17:34] [PASSED] 26 VFs
[08:17:34] [PASSED] 27 VFs
[08:17:34] [PASSED] 28 VFs
[08:17:34] [PASSED] 29 VFs
[08:17:34] [PASSED] 30 VFs
[08:17:34] [PASSED] 31 VFs
[08:17:34] [PASSED] 32 VFs
[08:17:34] [PASSED] 33 VFs
[08:17:34] [PASSED] 34 VFs
[08:17:34] [PASSED] 35 VFs
[08:17:34] [PASSED] 36 VFs
[08:17:34] [PASSED] 37 VFs
[08:17:34] [PASSED] 38 VFs
[08:17:34] [PASSED] 39 VFs
[08:17:34] [PASSED] 40 VFs
[08:17:34] [PASSED] 41 VFs
[08:17:34] [PASSED] 42 VFs
[08:17:34] [PASSED] 43 VFs
[08:17:34] [PASSED] 44 VFs
[08:17:34] [PASSED] 45 VFs
[08:17:34] [PASSED] 46 VFs
[08:17:34] [PASSED] 47 VFs
[08:17:34] [PASSED] 48 VFs
[08:17:34] [PASSED] 49 VFs
[08:17:34] [PASSED] 50 VFs
[08:17:34] [PASSED] 51 VFs
[08:17:34] [PASSED] 52 VFs
[08:17:34] [PASSED] 53 VFs
[08:17:34] [PASSED] 54 VFs
[08:17:34] [PASSED] 55 VFs
[08:17:34] [PASSED] 56 VFs
[08:17:34] [PASSED] 57 VFs
[08:17:34] [PASSED] 58 VFs
[08:17:34] [PASSED] 59 VFs
[08:17:34] [PASSED] 60 VFs
[08:17:34] [PASSED] 61 VFs
[08:17:34] [PASSED] 62 VFs
[08:17:34] [PASSED] 63 VFs
[08:17:34] ==================== [PASSED] fair_ggtt ====================
[08:17:34] ======================== fair_vram ========================
[08:17:34] [PASSED] 1 VF
[08:17:34] [PASSED] 2 VFs
[08:17:34] [PASSED] 3 VFs
[08:17:34] [PASSED] 4 VFs
[08:17:34] [PASSED] 5 VFs
[08:17:34] [PASSED] 6 VFs
[08:17:34] [PASSED] 7 VFs
[08:17:34] [PASSED] 8 VFs
[08:17:34] [PASSED] 9 VFs
[08:17:34] [PASSED] 10 VFs
[08:17:34] [PASSED] 11 VFs
[08:17:34] [PASSED] 12 VFs
[08:17:34] [PASSED] 13 VFs
[08:17:34] [PASSED] 14 VFs
[08:17:34] [PASSED] 15 VFs
[08:17:34] [PASSED] 16 VFs
[08:17:34] [PASSED] 17 VFs
[08:17:34] [PASSED] 18 VFs
[08:17:34] [PASSED] 19 VFs
[08:17:34] [PASSED] 20 VFs
[08:17:34] [PASSED] 21 VFs
[08:17:34] [PASSED] 22 VFs
[08:17:34] [PASSED] 23 VFs
[08:17:34] [PASSED] 24 VFs
[08:17:34] [PASSED] 25 VFs
[08:17:34] [PASSED] 26 VFs
[08:17:34] [PASSED] 27 VFs
[08:17:34] [PASSED] 28 VFs
[08:17:34] [PASSED] 29 VFs
[08:17:34] [PASSED] 30 VFs
[08:17:34] [PASSED] 31 VFs
[08:17:34] [PASSED] 32 VFs
[08:17:34] [PASSED] 33 VFs
[08:17:34] [PASSED] 34 VFs
[08:17:34] [PASSED] 35 VFs
[08:17:34] [PASSED] 36 VFs
[08:17:34] [PASSED] 37 VFs
[08:17:34] [PASSED] 38 VFs
[08:17:34] [PASSED] 39 VFs
[08:17:34] [PASSED] 40 VFs
[08:17:34] [PASSED] 41 VFs
[08:17:34] [PASSED] 42 VFs
[08:17:34] [PASSED] 43 VFs
[08:17:34] [PASSED] 44 VFs
[08:17:34] [PASSED] 45 VFs
[08:17:34] [PASSED] 46 VFs
[08:17:34] [PASSED] 47 VFs
[08:17:34] [PASSED] 48 VFs
[08:17:34] [PASSED] 49 VFs
[08:17:34] [PASSED] 50 VFs
[08:17:34] [PASSED] 51 VFs
[08:17:34] [PASSED] 52 VFs
[08:17:34] [PASSED] 53 VFs
[08:17:34] [PASSED] 54 VFs
[08:17:34] [PASSED] 55 VFs
[08:17:34] [PASSED] 56 VFs
[08:17:34] [PASSED] 57 VFs
[08:17:34] [PASSED] 58 VFs
[08:17:34] [PASSED] 59 VFs
[08:17:34] [PASSED] 60 VFs
[08:17:34] [PASSED] 61 VFs
[08:17:34] [PASSED] 62 VFs
[08:17:34] [PASSED] 63 VFs
[08:17:34] ==================== [PASSED] fair_vram ====================
[08:17:34] ================== [PASSED] pf_gt_config ===================
[08:17:34] ===================== lmtt (1 subtest) =====================
[08:17:34] ======================== test_ops =========================
[08:17:34] [PASSED] 2-level
[08:17:34] [PASSED] multi-level
[08:17:34] ==================== [PASSED] test_ops =====================
[08:17:34] ====================== [PASSED] lmtt =======================
[08:17:34] ================= pf_service (11 subtests) =================
[08:17:34] [PASSED] pf_negotiate_any
[08:17:34] [PASSED] pf_negotiate_base_match
[08:17:34] [PASSED] pf_negotiate_base_newer
[08:17:34] [PASSED] pf_negotiate_base_next
[08:17:34] [SKIPPED] pf_negotiate_base_older
[08:17:34] [PASSED] pf_negotiate_base_prev
[08:17:34] [PASSED] pf_negotiate_latest_match
[08:17:34] [PASSED] pf_negotiate_latest_newer
[08:17:34] [PASSED] pf_negotiate_latest_next
[08:17:34] [SKIPPED] pf_negotiate_latest_older
[08:17:34] [SKIPPED] pf_negotiate_latest_prev
[08:17:34] =================== [PASSED] pf_service ====================
[08:17:34] ================= xe_guc_g2g (2 subtests) ==================
[08:17:34] ============== xe_live_guc_g2g_kunit_default ==============
[08:17:34] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[08:17:34] ============== xe_live_guc_g2g_kunit_allmem ===============
[08:17:34] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[08:17:34] =================== [SKIPPED] xe_guc_g2g ===================
[08:17:34] =================== xe_mocs (2 subtests) ===================
[08:17:34] ================ xe_live_mocs_kernel_kunit ================
[08:17:34] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[08:17:34] ================ xe_live_mocs_reset_kunit =================
[08:17:34] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[08:17:34] ==================== [SKIPPED] xe_mocs =====================
[08:17:34] ================= xe_migrate (2 subtests) ==================
[08:17:34] ================= xe_migrate_sanity_kunit =================
[08:17:34] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[08:17:34] ================== xe_validate_ccs_kunit ==================
[08:17:34] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[08:17:34] =================== [SKIPPED] xe_migrate ===================
[08:17:34] ================== xe_dma_buf (1 subtest) ==================
[08:17:34] ==================== xe_dma_buf_kunit =====================
[08:17:34] ================ [SKIPPED] xe_dma_buf_kunit ================
[08:17:34] =================== [SKIPPED] xe_dma_buf ===================
[08:17:34] ================= xe_bo_shrink (1 subtest) =================
[08:17:34] =================== xe_bo_shrink_kunit ====================
[08:17:34] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[08:17:34] ================== [SKIPPED] xe_bo_shrink ==================
[08:17:34] ==================== xe_bo (2 subtests) ====================
[08:17:34] ================== xe_ccs_migrate_kunit ===================
[08:17:34] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[08:17:34] ==================== xe_bo_evict_kunit ====================
[08:17:34] =============== [SKIPPED] xe_bo_evict_kunit ================
[08:17:34] ===================== [SKIPPED] xe_bo ======================
[08:17:34] ==================== args (13 subtests) ====================
[08:17:34] [PASSED] count_args_test
[08:17:34] [PASSED] call_args_example
[08:17:34] [PASSED] call_args_test
[08:17:34] [PASSED] drop_first_arg_example
[08:17:34] [PASSED] drop_first_arg_test
[08:17:34] [PASSED] first_arg_example
[08:17:34] [PASSED] first_arg_test
[08:17:34] [PASSED] last_arg_example
[08:17:34] [PASSED] last_arg_test
[08:17:34] [PASSED] pick_arg_example
[08:17:34] [PASSED] if_args_example
[08:17:34] [PASSED] if_args_test
[08:17:34] [PASSED] sep_comma_example
[08:17:34] ====================== [PASSED] args =======================
[08:17:34] =================== xe_pci (3 subtests) ====================
[08:17:34] ==================== check_graphics_ip ====================
[08:17:34] [PASSED] 12.00 Xe_LP
[08:17:34] [PASSED] 12.10 Xe_LP+
[08:17:34] [PASSED] 12.55 Xe_HPG
[08:17:34] [PASSED] 12.60 Xe_HPC
[08:17:34] [PASSED] 12.70 Xe_LPG
[08:17:34] [PASSED] 12.71 Xe_LPG
[08:17:34] [PASSED] 12.74 Xe_LPG+
[08:17:34] [PASSED] 20.01 Xe2_HPG
[08:17:34] [PASSED] 20.02 Xe2_HPG
[08:17:34] [PASSED] 20.04 Xe2_LPG
[08:17:34] [PASSED] 30.00 Xe3_LPG
[08:17:34] [PASSED] 30.01 Xe3_LPG
[08:17:34] [PASSED] 30.03 Xe3_LPG
[08:17:34] [PASSED] 30.04 Xe3_LPG
[08:17:34] [PASSED] 30.05 Xe3_LPG
[08:17:34] [PASSED] 35.10 Xe3p_LPG
[08:17:34] [PASSED] 35.11 Xe3p_XPC
[08:17:34] ================ [PASSED] check_graphics_ip ================
[08:17:34] ===================== check_media_ip ======================
[08:17:34] [PASSED] 12.00 Xe_M
[08:17:34] [PASSED] 12.55 Xe_HPM
[08:17:34] [PASSED] 13.00 Xe_LPM+
[08:17:34] [PASSED] 13.01 Xe2_HPM
[08:17:34] [PASSED] 20.00 Xe2_LPM
[08:17:34] [PASSED] 30.00 Xe3_LPM
[08:17:34] [PASSED] 30.02 Xe3_LPM
[08:17:34] [PASSED] 35.00 Xe3p_LPM
[08:17:34] [PASSED] 35.03 Xe3p_HPM
[08:17:34] ================= [PASSED] check_media_ip ==================
[08:17:34] =================== check_platform_desc ===================
[08:17:34] [PASSED] 0x9A60 (TIGERLAKE)
[08:17:34] [PASSED] 0x9A68 (TIGERLAKE)
[08:17:34] [PASSED] 0x9A70 (TIGERLAKE)
[08:17:34] [PASSED] 0x9A40 (TIGERLAKE)
[08:17:34] [PASSED] 0x9A49 (TIGERLAKE)
[08:17:34] [PASSED] 0x9A59 (TIGERLAKE)
[08:17:34] [PASSED] 0x9A78 (TIGERLAKE)
[08:17:34] [PASSED] 0x9AC0 (TIGERLAKE)
[08:17:34] [PASSED] 0x9AC9 (TIGERLAKE)
[08:17:34] [PASSED] 0x9AD9 (TIGERLAKE)
[08:17:34] [PASSED] 0x9AF8 (TIGERLAKE)
[08:17:34] [PASSED] 0x4C80 (ROCKETLAKE)
[08:17:34] [PASSED] 0x4C8A (ROCKETLAKE)
[08:17:34] [PASSED] 0x4C8B (ROCKETLAKE)
[08:17:34] [PASSED] 0x4C8C (ROCKETLAKE)
[08:17:34] [PASSED] 0x4C90 (ROCKETLAKE)
[08:17:34] [PASSED] 0x4C9A (ROCKETLAKE)
[08:17:34] [PASSED] 0x4680 (ALDERLAKE_S)
[08:17:34] [PASSED] 0x4682 (ALDERLAKE_S)
[08:17:34] [PASSED] 0x4688 (ALDERLAKE_S)
[08:17:34] [PASSED] 0x468A (ALDERLAKE_S)
[08:17:34] [PASSED] 0x468B (ALDERLAKE_S)
[08:17:34] [PASSED] 0x4690 (ALDERLAKE_S)
[08:17:34] [PASSED] 0x4692 (ALDERLAKE_S)
[08:17:34] [PASSED] 0x4693 (ALDERLAKE_S)
[08:17:34] [PASSED] 0x46A0 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46A1 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46A2 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46A3 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46A6 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46A8 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46AA (ALDERLAKE_P)
[08:17:34] [PASSED] 0x462A (ALDERLAKE_P)
[08:17:34] [PASSED] 0x4626 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x4628 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46B0 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46B1 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46B2 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46B3 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46C0 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46C1 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46C2 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46C3 (ALDERLAKE_P)
[08:17:34] [PASSED] 0x46D0 (ALDERLAKE_N)
[08:17:34] [PASSED] 0x46D1 (ALDERLAKE_N)
[08:17:34] [PASSED] 0x46D2 (ALDERLAKE_N)
[08:17:34] [PASSED] 0x46D3 (ALDERLAKE_N)
[08:17:34] [PASSED] 0x46D4 (ALDERLAKE_N)
[08:17:34] [PASSED] 0xA721 (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7A1 (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7A9 (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7AC (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7AD (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA720 (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7A0 (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7A8 (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7AA (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA7AB (ALDERLAKE_P)
[08:17:34] [PASSED] 0xA780 (ALDERLAKE_S)
[08:17:34] [PASSED] 0xA781 (ALDERLAKE_S)
[08:17:34] [PASSED] 0xA782 (ALDERLAKE_S)
[08:17:34] [PASSED] 0xA783 (ALDERLAKE_S)
[08:17:34] [PASSED] 0xA788 (ALDERLAKE_S)
[08:17:34] [PASSED] 0xA789 (ALDERLAKE_S)
[08:17:34] [PASSED] 0xA78A (ALDERLAKE_S)
[08:17:34] [PASSED] 0xA78B (ALDERLAKE_S)
[08:17:34] [PASSED] 0x4905 (DG1)
[08:17:34] [PASSED] 0x4906 (DG1)
[08:17:34] [PASSED] 0x4907 (DG1)
[08:17:34] [PASSED] 0x4908 (DG1)
[08:17:34] [PASSED] 0x4909 (DG1)
[08:17:34] [PASSED] 0x56C0 (DG2)
[08:17:34] [PASSED] 0x56C2 (DG2)
[08:17:34] [PASSED] 0x56C1 (DG2)
[08:17:34] [PASSED] 0x7D51 (METEORLAKE)
[08:17:34] [PASSED] 0x7DD1 (METEORLAKE)
[08:17:34] [PASSED] 0x7D41 (METEORLAKE)
[08:17:34] [PASSED] 0x7D67 (METEORLAKE)
[08:17:34] [PASSED] 0xB640 (METEORLAKE)
[08:17:34] [PASSED] 0x56A0 (DG2)
[08:17:34] [PASSED] 0x56A1 (DG2)
[08:17:34] [PASSED] 0x56A2 (DG2)
[08:17:34] [PASSED] 0x56BE (DG2)
[08:17:34] [PASSED] 0x56BF (DG2)
[08:17:34] [PASSED] 0x5690 (DG2)
[08:17:34] [PASSED] 0x5691 (DG2)
[08:17:34] [PASSED] 0x5692 (DG2)
[08:17:34] [PASSED] 0x56A5 (DG2)
[08:17:34] [PASSED] 0x56A6 (DG2)
[08:17:34] [PASSED] 0x56B0 (DG2)
[08:17:34] [PASSED] 0x56B1 (DG2)
[08:17:34] [PASSED] 0x56BA (DG2)
[08:17:34] [PASSED] 0x56BB (DG2)
[08:17:34] [PASSED] 0x56BC (DG2)
[08:17:34] [PASSED] 0x56BD (DG2)
[08:17:34] [PASSED] 0x5693 (DG2)
[08:17:34] [PASSED] 0x5694 (DG2)
[08:17:34] [PASSED] 0x5695 (DG2)
[08:17:34] [PASSED] 0x56A3 (DG2)
[08:17:34] [PASSED] 0x56A4 (DG2)
[08:17:34] [PASSED] 0x56B2 (DG2)
[08:17:34] [PASSED] 0x56B3 (DG2)
[08:17:34] [PASSED] 0x5696 (DG2)
[08:17:34] [PASSED] 0x5697 (DG2)
[08:17:34] [PASSED] 0xB69 (PVC)
[08:17:34] [PASSED] 0xB6E (PVC)
[08:17:34] [PASSED] 0xBD4 (PVC)
[08:17:34] [PASSED] 0xBD5 (PVC)
[08:17:34] [PASSED] 0xBD6 (PVC)
[08:17:34] [PASSED] 0xBD7 (PVC)
[08:17:34] [PASSED] 0xBD8 (PVC)
[08:17:34] [PASSED] 0xBD9 (PVC)
[08:17:34] [PASSED] 0xBDA (PVC)
[08:17:34] [PASSED] 0xBDB (PVC)
[08:17:34] [PASSED] 0xBE0 (PVC)
[08:17:34] [PASSED] 0xBE1 (PVC)
[08:17:34] [PASSED] 0xBE5 (PVC)
[08:17:34] [PASSED] 0x7D40 (METEORLAKE)
[08:17:34] [PASSED] 0x7D45 (METEORLAKE)
[08:17:34] [PASSED] 0x7D55 (METEORLAKE)
[08:17:34] [PASSED] 0x7D60 (METEORLAKE)
[08:17:34] [PASSED] 0x7DD5 (METEORLAKE)
[08:17:34] [PASSED] 0x6420 (LUNARLAKE)
[08:17:34] [PASSED] 0x64A0 (LUNARLAKE)
[08:17:34] [PASSED] 0x64B0 (LUNARLAKE)
[08:17:34] [PASSED] 0xE202 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE209 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE20B (BATTLEMAGE)
[08:17:34] [PASSED] 0xE20C (BATTLEMAGE)
[08:17:34] [PASSED] 0xE20D (BATTLEMAGE)
[08:17:34] [PASSED] 0xE210 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE211 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE212 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE216 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE220 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE221 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE222 (BATTLEMAGE)
[08:17:34] [PASSED] 0xE223 (BATTLEMAGE)
[08:17:34] [PASSED] 0xB080 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB081 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB082 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB083 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB084 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB085 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB086 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB087 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB08F (PANTHERLAKE)
[08:17:34] [PASSED] 0xB090 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB0A0 (PANTHERLAKE)
[08:17:34] [PASSED] 0xB0B0 (PANTHERLAKE)
[08:17:34] [PASSED] 0xFD80 (PANTHERLAKE)
[08:17:34] [PASSED] 0xFD81 (PANTHERLAKE)
[08:17:34] [PASSED] 0xD740 (NOVALAKE_S)
[08:17:34] [PASSED] 0xD741 (NOVALAKE_S)
[08:17:34] [PASSED] 0xD742 (NOVALAKE_S)
[08:17:34] [PASSED] 0xD743 (NOVALAKE_S)
[08:17:34] [PASSED] 0xD745 (NOVALAKE_S)
[08:17:34] [PASSED] 0xD74A (NOVALAKE_S)
[08:17:34] [PASSED] 0xD74B (NOVALAKE_S)
[08:17:34] [PASSED] 0x674C (CRESCENTISLAND)
[08:17:34] [PASSED] 0x674D (CRESCENTISLAND)
[08:17:34] [PASSED] 0x674E (CRESCENTISLAND)
[08:17:34] [PASSED] 0x674F (CRESCENTISLAND)
[08:17:34] [PASSED] 0x6750 (CRESCENTISLAND)
[08:17:34] [PASSED] 0xD750 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD751 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD752 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD753 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD754 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD755 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD756 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD757 (NOVALAKE_P)
[08:17:34] [PASSED] 0xD75F (NOVALAKE_P)
[08:17:34] =============== [PASSED] check_platform_desc ===============
[08:17:34] ===================== [PASSED] xe_pci ======================
[08:17:34] ============= xe_rtp_tables_test (4 subtests) ==============
[08:17:34] ================== xe_rtp_table_gt_test ===================
[08:17:34] [PASSED] gt_was/14011060649
[08:17:34] [PASSED] gt_was/14011059788
[08:17:34] [PASSED] gt_was/14015795083
[08:17:34] [PASSED] gt_was/16021867713
[08:17:34] [PASSED] gt_was/14019449301
[08:17:34] [PASSED] gt_was/16028005424
[08:17:34] [PASSED] gt_was/14026578760
[08:17:34] [PASSED] gt_was/1409420604
[08:17:34] [PASSED] gt_was/1408615072
[08:17:34] [PASSED] gt_was/22010523718
[08:17:34] [PASSED] gt_was/14011006942
[08:17:34] [PASSED] gt_was/14014830051
[08:17:34] [PASSED] gt_was/18018781329
[08:17:34] [PASSED] gt_was/1509235366
[08:17:34] [PASSED] gt_was/18018781329
[08:17:34] [PASSED] gt_was/16016694945
[08:17:34] [PASSED] gt_was/14018575942
[08:17:34] [PASSED] gt_was/22016670082
[08:17:34] [PASSED] gt_was/22016670082
[08:17:34] [PASSED] gt_was/14017421178
[08:17:34] [PASSED] gt_was/16025250150
[08:17:34] [PASSED] gt_was/14021871409
[08:17:34] [PASSED] gt_was/16021865536
[08:17:34] [PASSED] gt_was/14021486841
[08:17:34] [PASSED] gt_was/14025160223
[08:17:34] [PASSED] gt_was/14026144927, 16029437861
[08:17:34] [PASSED] gt_was/14025635424
[08:17:34] [PASSED] gt_was/16028005424
[08:17:34] ============== [PASSED] xe_rtp_table_gt_test ===============
[08:17:34] ================== xe_rtp_table_gt_test ===================
[08:17:34] [PASSED] gt_tunings/Tuning: Blend Fill Caching Optimization Disable
[08:17:34] [PASSED] gt_tunings/Tuning: 32B Access Enable
[08:17:34] [PASSED] gt_tunings/Tuning: L3 cache
[08:17:34] [PASSED] gt_tunings/Tuning: L3 cache - media
[08:17:34] [PASSED] gt_tunings/Tuning: Compression Overfetch
[08:17:34] [PASSED] gt_tunings/Tuning: Compression Overfetch - media
[08:17:34] [PASSED] gt_tunings/Tuning: Enable compressible partial write overfetch in L3
[08:17:34] [PASSED] gt_tunings/Tuning: Enable compressible partial write overfetch in L3 - media
[08:17:34] [PASSED] gt_tunings/Tuning: L2 Overfetch Compressible Only
[08:17:34] [PASSED] gt_tunings/Tuning: L2 Overfetch Compressible Only - media
[08:17:34] [PASSED] gt_tunings/Tuning: Stateless compression control
[08:17:34] [PASSED] gt_tunings/Tuning: Stateless compression control - media
[08:17:34] [PASSED] gt_tunings/Tuning: L3 RW flush all Cache
[08:17:34] [PASSED] gt_tunings/Tuning: L3 RW flush all cache - media
[08:17:34] [PASSED] gt_tunings/Tuning: Set STLB Bank Hash Mode to 4KB
[08:17:34] ============== [PASSED] xe_rtp_table_gt_test ===============
[08:17:34] ================== xe_rtp_table_oob_test ==================
[08:17:34] [PASSED] oob_was/1607983814
[08:17:34] [PASSED] oob_was/16010904313
[08:17:34] [PASSED] oob_was/18022495364
[08:17:34] [PASSED] oob_was/22012773006
[08:17:34] [PASSED] oob_was/14014475959
[08:17:34] [PASSED] oob_was/22011391025
[08:17:34] [PASSED] oob_was/22012727170
[08:17:34] [PASSED] oob_was/22012727685
[08:17:34] [PASSED] oob_was/22016596838
[08:17:34] [PASSED] oob_was/18020744125
[08:17:34] [PASSED] oob_was/1409600907
[08:17:34] [PASSED] oob_was/22014953428
[08:17:34] [PASSED] oob_was/16017236439
[08:17:34] [PASSED] oob_was/14019821291
[08:17:34] [PASSED] oob_was/14015076503
[08:17:34] [PASSED] oob_was/14018913170
[08:17:34] [PASSED] oob_was/14018094691
[08:17:34] [PASSED] oob_was/18024947630
[08:17:34] [PASSED] oob_was/16022287689
[08:17:34] [PASSED] oob_was/13011645652
[08:17:34] [PASSED] oob_was/14022293748
[08:17:34] [PASSED] oob_was/22019794406
[08:17:34] [PASSED] oob_was/22019338487
[08:17:34] [PASSED] oob_was/16023588340
[08:17:34] [PASSED] oob_was/14019789679
[08:17:34] [PASSED] oob_was/14022866841
[08:17:34] [PASSED] oob_was/16021333562
[08:17:34] [PASSED] oob_was/14016712196
[08:17:34] [PASSED] oob_was/14015568240
[08:17:34] [PASSED] oob_was/18013179988
[08:17:34] [PASSED] oob_was/1508761755
[08:17:34] [PASSED] oob_was/16023105232
[08:17:34] [PASSED] oob_was/16026508708
[08:17:34] [PASSED] oob_was/14020001231
[08:17:34] [PASSED] oob_was/16023683509
[08:17:34] [PASSED] oob_was/14025515070
[08:17:34] [PASSED] oob_was/15015404425_disable
[08:17:34] [PASSED] oob_was/16026007364
[08:17:34] [PASSED] oob_was/14020316580
[08:17:34] [PASSED] oob_was/14025883347
[08:17:34] ============== [PASSED] xe_rtp_table_oob_test ==============
[08:17:34] ================ xe_rtp_table_dev_oob_test ================
[08:17:34] [PASSED] device_oob_was/22010954014
[08:17:34] [PASSED] device_oob_was/15015404425
[08:17:34] [PASSED] device_oob_was/22019338487_display
[08:17:34] [PASSED] device_oob_was/14022085890
[08:17:34] [PASSED] device_oob_was/14026539277
[08:17:34] [PASSED] device_oob_was/14026633728
[08:17:34] [PASSED] device_oob_was/14026746987
[08:17:34] [PASSED] device_oob_was/14026779378
[08:17:34] ============ [PASSED] xe_rtp_table_dev_oob_test ============
[08:17:34] =============== [PASSED] xe_rtp_tables_test ================
[08:17:34] =================== xe_rtp (3 subtests) ====================
[08:17:34] =================== xe_rtp_rules_tests ====================
[08:17:34] [PASSED] no
[08:17:34] [PASSED] yes
[08:17:34] [PASSED] no-and-no
[08:17:34] [PASSED] no-and-yes
[08:17:34] [PASSED] yes-and-no
[08:17:34] [PASSED] yes-and-yes
[08:17:34] [PASSED] no-or-no
[08:17:34] [PASSED] no-or-yes
[08:17:34] [PASSED] yes-or-no
[08:17:34] [PASSED] yes-or-yes
[08:17:34] [PASSED] no-yes-or-yes-no
[08:17:34] [PASSED] no-yes-or-yes-yes
[08:17:34] [PASSED] yes-yes-or-no-yes
[08:17:34] [PASSED] yes-yes-or-yes-yes
[08:17:34] [PASSED] no-no-or-yes-or-no
[08:17:34] [PASSED] or
[08:17:34] [PASSED] or-yes
[08:17:34] [PASSED] or-no
[08:17:34] [PASSED] yes-or
[08:17:34] [PASSED] no-or
[08:17:34] [PASSED] no-or-or-yes
[08:17:34] [PASSED] yes-or-or-no
[08:17:34] [PASSED] no-or-or-no
[08:17:34] [PASSED] missing-context-engine-class
[08:17:34] [PASSED] missing-context-engine-class-or-yes
[08:17:34] [PASSED] missing-context-engine-class-or-or-yes
[08:17:34] =============== [PASSED] xe_rtp_rules_tests ================
[08:17:34] =============== xe_rtp_process_to_sr_tests ================
[08:17:34] [PASSED] coalesce-same-reg
[08:17:34] [PASSED] no-match-no-add
[08:17:34] [PASSED] two-regs-two-entries
[08:17:34] [PASSED] clr-one-set-other
[08:17:34] [PASSED] set-field
[08:17:34] [PASSED] conflict-duplicate
[08:17:34] [PASSED] conflict-not-disjoint
[08:17:34] [PASSED] conflict-reg-type
[08:17:34] [PASSED] bad-mcr-reg-forced-to-regular
[08:17:34] [PASSED] bad-regular-reg-forced-to-mcr
[08:17:34] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[08:17:34] ================== xe_rtp_process_tests ===================
[08:17:34] [PASSED] active1
[08:17:34] [PASSED] active2
[08:17:34] [PASSED] active-inactive
[08:17:34] [PASSED] inactive-active
[08:17:34] [PASSED] inactive-active-inactive
[08:17:34] [PASSED] inactive-inactive-inactive
[08:17:34] ============== [PASSED] xe_rtp_process_tests ===============
[08:17:34] ===================== [PASSED] xe_rtp ======================
[08:17:34] ==================== xe_wa (1 subtest) =====================
[08:17:34] ======================== xe_wa_gt =========================
[08:17:34] [PASSED] TIGERLAKE B0
[08:17:34] [PASSED] DG1 A0
[08:17:34] [PASSED] DG1 B0
[08:17:34] [PASSED] ALDERLAKE_S A0
[08:17:34] [PASSED] ALDERLAKE_S B0
[08:17:34] [PASSED] ALDERLAKE_S C0
[08:17:34] [PASSED] ALDERLAKE_S D0
[08:17:34] [PASSED] ALDERLAKE_P A0
[08:17:34] [PASSED] ALDERLAKE_P B0
[08:17:34] [PASSED] ALDERLAKE_P C0
[08:17:34] [PASSED] ALDERLAKE_S RPLS D0
[08:17:34] [PASSED] ALDERLAKE_P RPLU E0
[08:17:34] [PASSED] DG2 G10 C0
[08:17:34] [PASSED] DG2 G11 B1
[08:17:34] [PASSED] DG2 G12 A1
[08:17:34] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[08:17:34] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[08:17:34] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[08:17:34] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[08:17:34] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[08:17:34] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[08:17:34] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[08:17:34] ==================== [PASSED] xe_wa_gt =====================
[08:17:34] ====================== [PASSED] xe_wa ======================
[08:17:34] ============================================================
[08:17:34] Testing complete. Ran 716 tests: passed: 698, skipped: 18
[08:17:34] Elapsed time: 36.185s total, 4.316s configuring, 31.203s building, 0.638s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[08:17:34] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[08:17:36] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
In file included from ../drivers/gpu/drm/tests/drm_bridge_test.c:21:
../drivers/gpu/drm/tests/drm_kunit_edid.h:958:28: warning: ‘test_edid_hdmi_4k_rgb_yuv420_dc_max_340mhz’ defined but not used [-Wunused-const-variable=]
958 | static const unsigned char test_edid_hdmi_4k_rgb_yuv420_dc_max_340mhz[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/tests/drm_kunit_edid.h:726:28: warning: ‘test_edid_hdmi_1080p_rgb_yuv_dc_max_340mhz’ defined but not used [-Wunused-const-variable=]
726 | static const unsigned char test_edid_hdmi_1080p_rgb_yuv_dc_max_340mhz[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/tests/drm_kunit_edid.h:612:28: warning: ‘test_edid_hdmi_1080p_rgb_yuv_dc_max_200mhz’ defined but not used [-Wunused-const-variable=]
612 | static const unsigned char test_edid_hdmi_1080p_rgb_yuv_dc_max_200mhz[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/tests/drm_kunit_edid.h:498:28: warning: ‘test_edid_hdmi_1080p_rgb_max_340mhz’ defined but not used [-Wunused-const-variable=]
498 | static const unsigned char test_edid_hdmi_1080p_rgb_max_340mhz[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/tests/drm_kunit_edid.h:390:28: warning: ‘test_edid_hdmi_1080p_rgb_max_200mhz_hdr’ defined but not used [-Wunused-const-variable=]
390 | static const unsigned char test_edid_hdmi_1080p_rgb_max_200mhz_hdr[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/tests/drm_kunit_edid.h:271:28: warning: ‘test_edid_hdmi_1080p_rgb_max_200mhz’ defined but not used [-Wunused-const-variable=]
271 | static const unsigned char test_edid_hdmi_1080p_rgb_max_200mhz[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/tests/drm_kunit_edid.h:163:28: warning: ‘test_edid_hdmi_1080p_rgb_max_100mhz’ defined but not used [-Wunused-const-variable=]
163 | static const unsigned char test_edid_hdmi_1080p_rgb_max_100mhz[] = {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/tests/drm_kunit_edid.h:57:28: warning: ‘test_edid_dvi_1080p’ defined but not used [-Wunused-const-variable=]
57 | static const unsigned char test_edid_dvi_1080p[] = {
| ^~~~~~~~~~~~~~~~~~~
[08:18:00] Starting KUnit Kernel (1/1)...
[08:18:00] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[08:18:00] ============ drm_test_pick_cmdline (2 subtests) ============
[08:18:00] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[08:18:00] =============== drm_test_pick_cmdline_named ===============
[08:18:00] [PASSED] NTSC
[08:18:00] [PASSED] NTSC-J
[08:18:00] [PASSED] PAL
[08:18:00] [PASSED] PAL-M
[08:18:00] =========== [PASSED] drm_test_pick_cmdline_named ===========
[08:18:00] ============== [PASSED] drm_test_pick_cmdline ==============
[08:18:00] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[08:18:00] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[08:18:00] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[08:18:00] =========== drm_validate_clone_mode (2 subtests) ===========
[08:18:00] ============== drm_test_check_in_clone_mode ===============
[08:18:00] [PASSED] in_clone_mode
[08:18:00] [PASSED] not_in_clone_mode
[08:18:00] ========== [PASSED] drm_test_check_in_clone_mode ===========
[08:18:00] =============== drm_test_check_valid_clones ===============
[08:18:00] [PASSED] not_in_clone_mode
[08:18:00] [PASSED] valid_clone
[08:18:00] [PASSED] invalid_clone
[08:18:00] =========== [PASSED] drm_test_check_valid_clones ===========
[08:18:00] ============= [PASSED] drm_validate_clone_mode =============
[08:18:00] ============= drm_validate_modeset (1 subtest) =============
[08:18:00] [PASSED] drm_test_check_connector_changed_modeset
[08:18:00] ============== [PASSED] drm_validate_modeset ===============
[08:18:00] ====== drm_test_bridge_get_current_state (2 subtests) ======
[08:18:00] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[08:18:00] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[08:18:00] ======== [PASSED] drm_test_bridge_get_current_state ========
[08:18:00] ====== drm_test_bridge_helper_reset_crtc (4 subtests) ======
[08:18:00] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[08:18:00] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[08:18:00] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[08:18:00] [PASSED] drm_test_drm_bridge_helper_hdmi_output_bus_fmts
[08:18:00] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[08:18:00] ============== drm_bridge_alloc (2 subtests) ===============
[08:18:00] [PASSED] drm_test_drm_bridge_alloc_basic
[08:18:00] [PASSED] drm_test_drm_bridge_alloc_get_put
[08:18:00] ================ [PASSED] drm_bridge_alloc =================
[08:18:00] ============= drm_bridge_bus_fmt (5 subtests) ==============
[08:18:00] [PASSED] drm_test_bridge_rgb_yuv_rgb
[08:18:00] [PASSED] drm_test_bridge_must_convert_to_yuv444
[08:18:00] [PASSED] drm_test_bridge_hdmi_auto_rgb
[08:18:00] [PASSED] drm_test_bridge_auto_first
[08:18:00] [PASSED] drm_test_bridge_rgb_yuv_no_path
[08:18:00] =============== [PASSED] drm_bridge_bus_fmt ================
[08:18:00] ============= drm_cmdline_parser (40 subtests) =============
[08:18:00] [PASSED] drm_test_cmdline_force_d_only
[08:18:00] [PASSED] drm_test_cmdline_force_D_only_dvi
[08:18:00] [PASSED] drm_test_cmdline_force_D_only_hdmi
[08:18:00] [PASSED] drm_test_cmdline_force_D_only_not_digital
[08:18:00] [PASSED] drm_test_cmdline_force_e_only
[08:18:00] [PASSED] drm_test_cmdline_res
[08:18:00] [PASSED] drm_test_cmdline_res_vesa
[08:18:00] [PASSED] drm_test_cmdline_res_vesa_rblank
[08:18:00] [PASSED] drm_test_cmdline_res_rblank
[08:18:00] [PASSED] drm_test_cmdline_res_bpp
[08:18:00] [PASSED] drm_test_cmdline_res_refresh
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[08:18:00] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[08:18:00] [PASSED] drm_test_cmdline_res_margins_force_on
[08:18:00] [PASSED] drm_test_cmdline_res_vesa_margins
[08:18:00] [PASSED] drm_test_cmdline_name
[08:18:00] [PASSED] drm_test_cmdline_name_bpp
[08:18:00] [PASSED] drm_test_cmdline_name_option
[08:18:00] [PASSED] drm_test_cmdline_name_bpp_option
[08:18:00] [PASSED] drm_test_cmdline_rotate_0
[08:18:00] [PASSED] drm_test_cmdline_rotate_90
[08:18:00] [PASSED] drm_test_cmdline_rotate_180
[08:18:00] [PASSED] drm_test_cmdline_rotate_270
[08:18:00] [PASSED] drm_test_cmdline_hmirror
[08:18:00] [PASSED] drm_test_cmdline_vmirror
[08:18:00] [PASSED] drm_test_cmdline_margin_options
[08:18:00] [PASSED] drm_test_cmdline_multiple_options
[08:18:00] [PASSED] drm_test_cmdline_bpp_extra_and_option
[08:18:00] [PASSED] drm_test_cmdline_extra_and_option
[08:18:00] [PASSED] drm_test_cmdline_freestanding_options
[08:18:00] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[08:18:00] [PASSED] drm_test_cmdline_panel_orientation
[08:18:00] ================ drm_test_cmdline_invalid =================
[08:18:00] [PASSED] margin_only
[08:18:00] [PASSED] interlace_only
[08:18:00] [PASSED] res_missing_x
[08:18:00] [PASSED] res_missing_y
[08:18:00] [PASSED] res_bad_y
[08:18:00] [PASSED] res_missing_y_bpp
[08:18:00] [PASSED] res_bad_bpp
[08:18:00] [PASSED] res_bad_refresh
[08:18:00] [PASSED] res_bpp_refresh_force_on_off
[08:18:00] [PASSED] res_invalid_mode
[08:18:00] [PASSED] res_bpp_wrong_place_mode
[08:18:00] [PASSED] name_bpp_refresh
[08:18:00] [PASSED] name_refresh
[08:18:00] [PASSED] name_refresh_wrong_mode
[08:18:00] [PASSED] name_refresh_invalid_mode
[08:18:00] [PASSED] rotate_multiple
[08:18:00] [PASSED] rotate_invalid_val
[08:18:00] [PASSED] rotate_truncated
[08:18:00] [PASSED] invalid_option
[08:18:00] [PASSED] invalid_tv_option
[08:18:00] [PASSED] truncated_tv_option
[08:18:00] ============ [PASSED] drm_test_cmdline_invalid =============
[08:18:00] =============== drm_test_cmdline_tv_options ===============
[08:18:00] [PASSED] NTSC
[08:18:00] [PASSED] NTSC_443
[08:18:00] [PASSED] NTSC_J
[08:18:00] [PASSED] PAL
[08:18:00] [PASSED] PAL_M
[08:18:00] [PASSED] PAL_N
[08:18:00] [PASSED] SECAM
[08:18:00] [PASSED] MONO_525
[08:18:00] [PASSED] MONO_625
[08:18:00] =========== [PASSED] drm_test_cmdline_tv_options ===========
[08:18:00] =============== [PASSED] drm_cmdline_parser ================
[08:18:00] ========== drmm_connector_hdmi_init (20 subtests) ==========
[08:18:00] [PASSED] drm_test_connector_hdmi_init_valid
[08:18:00] [PASSED] drm_test_connector_hdmi_init_bpc_8
[08:18:00] [PASSED] drm_test_connector_hdmi_init_bpc_10
[08:18:00] [PASSED] drm_test_connector_hdmi_init_bpc_12
[08:18:00] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[08:18:00] [PASSED] drm_test_connector_hdmi_init_bpc_null
[08:18:00] [PASSED] drm_test_connector_hdmi_init_formats_empty
[08:18:00] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[08:18:00] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[08:18:00] [PASSED] supported_formats=0x9 yuv420_allowed=1
[08:18:00] [PASSED] supported_formats=0x9 yuv420_allowed=0
[08:18:00] [PASSED] supported_formats=0x5 yuv420_allowed=1
[08:18:00] [PASSED] supported_formats=0x5 yuv420_allowed=0
[08:18:00] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[08:18:00] [PASSED] drm_test_connector_hdmi_init_null_ddc
[08:18:00] [PASSED] drm_test_connector_hdmi_init_null_product
[08:18:00] [PASSED] drm_test_connector_hdmi_init_null_vendor
[08:18:00] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[08:18:00] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[08:18:00] [PASSED] drm_test_connector_hdmi_init_product_valid
[08:18:00] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[08:18:00] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[08:18:00] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[08:18:00] ========= drm_test_connector_hdmi_init_type_valid =========
[08:18:00] [PASSED] HDMI-A
[08:18:00] [PASSED] HDMI-B
[08:18:00] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[08:18:00] ======== drm_test_connector_hdmi_init_type_invalid ========
[08:18:00] [PASSED] Unknown
[08:18:00] [PASSED] VGA
[08:18:00] [PASSED] DVI-I
[08:18:00] [PASSED] DVI-D
[08:18:00] [PASSED] DVI-A
[08:18:00] [PASSED] Composite
[08:18:00] [PASSED] SVIDEO
[08:18:00] [PASSED] LVDS
[08:18:00] [PASSED] Component
[08:18:00] [PASSED] DIN
[08:18:00] [PASSED] DP
[08:18:00] [PASSED] TV
[08:18:00] [PASSED] eDP
[08:18:00] [PASSED] Virtual
[08:18:00] [PASSED] DSI
[08:18:00] [PASSED] DPI
[08:18:00] [PASSED] Writeback
[08:18:00] [PASSED] SPI
[08:18:00] [PASSED] USB
[08:18:00] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[08:18:00] ============ [PASSED] drmm_connector_hdmi_init =============
[08:18:00] ============= drmm_connector_init (3 subtests) =============
[08:18:00] [PASSED] drm_test_drmm_connector_init
[08:18:00] [PASSED] drm_test_drmm_connector_init_null_ddc
[08:18:00] ========= drm_test_drmm_connector_init_type_valid =========
[08:18:00] [PASSED] Unknown
[08:18:00] [PASSED] VGA
[08:18:00] [PASSED] DVI-I
[08:18:00] [PASSED] DVI-D
[08:18:00] [PASSED] DVI-A
[08:18:00] [PASSED] Composite
[08:18:00] [PASSED] SVIDEO
[08:18:00] [PASSED] LVDS
[08:18:00] [PASSED] Component
[08:18:00] [PASSED] DIN
[08:18:00] [PASSED] DP
[08:18:00] [PASSED] HDMI-A
[08:18:00] [PASSED] HDMI-B
[08:18:00] [PASSED] TV
[08:18:00] [PASSED] eDP
[08:18:00] [PASSED] Virtual
[08:18:00] [PASSED] DSI
[08:18:00] [PASSED] DPI
[08:18:00] [PASSED] Writeback
[08:18:00] [PASSED] SPI
[08:18:00] [PASSED] USB
[08:18:00] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[08:18:00] =============== [PASSED] drmm_connector_init ===============
[08:18:00] ========= drm_connector_dynamic_init (6 subtests) ==========
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_init
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_init_properties
[08:18:00] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[08:18:00] [PASSED] Unknown
[08:18:00] [PASSED] VGA
[08:18:00] [PASSED] DVI-I
[08:18:00] [PASSED] DVI-D
[08:18:00] [PASSED] DVI-A
[08:18:00] [PASSED] Composite
[08:18:00] [PASSED] SVIDEO
[08:18:00] [PASSED] LVDS
[08:18:00] [PASSED] Component
[08:18:00] [PASSED] DIN
[08:18:00] [PASSED] DP
[08:18:00] [PASSED] HDMI-A
[08:18:00] [PASSED] HDMI-B
[08:18:00] [PASSED] TV
[08:18:00] [PASSED] eDP
[08:18:00] [PASSED] Virtual
[08:18:00] [PASSED] DSI
[08:18:00] [PASSED] DPI
[08:18:00] [PASSED] Writeback
[08:18:00] [PASSED] SPI
[08:18:00] [PASSED] USB
[08:18:00] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[08:18:00] ======== drm_test_drm_connector_dynamic_init_name =========
[08:18:00] [PASSED] Unknown
[08:18:00] [PASSED] VGA
[08:18:00] [PASSED] DVI-I
[08:18:00] [PASSED] DVI-D
[08:18:00] [PASSED] DVI-A
[08:18:00] [PASSED] Composite
[08:18:00] [PASSED] SVIDEO
[08:18:00] [PASSED] LVDS
[08:18:00] [PASSED] Component
[08:18:00] [PASSED] DIN
[08:18:00] [PASSED] DP
[08:18:00] [PASSED] HDMI-A
[08:18:00] [PASSED] HDMI-B
[08:18:00] [PASSED] TV
[08:18:00] [PASSED] eDP
[08:18:00] [PASSED] Virtual
[08:18:00] [PASSED] DSI
[08:18:00] [PASSED] DPI
[08:18:00] [PASSED] Writeback
[08:18:00] [PASSED] SPI
[08:18:00] [PASSED] USB
[08:18:00] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[08:18:00] =========== [PASSED] drm_connector_dynamic_init ============
[08:18:00] ==== drm_connector_dynamic_register_early (4 subtests) =====
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[08:18:00] ====== [PASSED] drm_connector_dynamic_register_early =======
[08:18:00] ======= drm_connector_dynamic_register (7 subtests) ========
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[08:18:00] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[08:18:00] ========= [PASSED] drm_connector_dynamic_register ==========
[08:18:00] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[08:18:00] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[08:18:00] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[08:18:00] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[08:18:00] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[08:18:00] ========== drm_test_get_tv_mode_from_name_valid ===========
[08:18:00] [PASSED] NTSC
[08:18:00] [PASSED] NTSC-443
[08:18:00] [PASSED] NTSC-J
[08:18:00] [PASSED] PAL
[08:18:00] [PASSED] PAL-M
[08:18:00] [PASSED] PAL-N
[08:18:00] [PASSED] SECAM
[08:18:00] [PASSED] Mono
[08:18:00] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[08:18:00] [PASSED] drm_test_get_tv_mode_from_name_truncated
[08:18:00] ============ [PASSED] drm_get_tv_mode_from_name ============
[08:18:00] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[08:18:00] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[08:18:00] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[08:18:00] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[08:18:00] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[08:18:00] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[08:18:00] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[08:18:00] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[08:18:00] [PASSED] VIC 96
[08:18:00] [PASSED] VIC 97
[08:18:00] [PASSED] VIC 101
[08:18:00] [PASSED] VIC 102
[08:18:00] [PASSED] VIC 106
[08:18:00] [PASSED] VIC 107
[08:18:00] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[08:18:00] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[08:18:00] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[08:18:00] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[08:18:00] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[08:18:00] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[08:18:00] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[08:18:00] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[08:18:00] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[08:18:00] [PASSED] Automatic
[08:18:00] [PASSED] Full
[08:18:00] [PASSED] Limited 16:235
[08:18:00] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[08:18:00] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[08:18:00] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[08:18:00] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[08:18:00] === drm_test_drm_hdmi_connector_get_output_format_name ====
[08:18:00] [PASSED] RGB
[08:18:00] [PASSED] YUV 4:2:0
[08:18:00] [PASSED] YUV 4:2:2
[08:18:00] [PASSED] YUV 4:4:4
[08:18:00] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[08:18:00] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[08:18:00] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[08:18:00] ============= drm_damage_helper (21 subtests) ==============
[08:18:00] [PASSED] drm_test_damage_iter_no_damage
[08:18:00] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[08:18:00] [PASSED] drm_test_damage_iter_no_damage_src_moved
[08:18:00] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[08:18:00] [PASSED] drm_test_damage_iter_no_damage_not_visible
[08:18:00] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[08:18:00] [PASSED] drm_test_damage_iter_no_damage_no_fb
[08:18:00] [PASSED] drm_test_damage_iter_simple_damage
[08:18:00] [PASSED] drm_test_damage_iter_single_damage
[08:18:00] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[08:18:00] [PASSED] drm_test_damage_iter_single_damage_outside_src
[08:18:00] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[08:18:00] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[08:18:00] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[08:18:00] [PASSED] drm_test_damage_iter_single_damage_src_moved
[08:18:00] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[08:18:00] [PASSED] drm_test_damage_iter_damage
[08:18:00] [PASSED] drm_test_damage_iter_damage_one_intersect
[08:18:00] [PASSED] drm_test_damage_iter_damage_one_outside
[08:18:00] [PASSED] drm_test_damage_iter_damage_src_moved
[08:18:00] [PASSED] drm_test_damage_iter_damage_not_visible
[08:18:00] ================ [PASSED] drm_damage_helper ================
[08:18:00] ============== drm_dp_mst_helper (3 subtests) ==============
[08:18:00] ============== drm_test_dp_mst_calc_pbn_mode ==============
[08:18:00] [PASSED] Clock 154000 BPP 30 DSC disabled
[08:18:00] [PASSED] Clock 234000 BPP 30 DSC disabled
[08:18:00] [PASSED] Clock 297000 BPP 24 DSC disabled
[08:18:00] [PASSED] Clock 332880 BPP 24 DSC enabled
[08:18:00] [PASSED] Clock 324540 BPP 24 DSC enabled
[08:18:00] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[08:18:00] ============== drm_test_dp_mst_calc_pbn_div ===============
[08:18:00] [PASSED] Link rate 2000000 lane count 4
[08:18:00] [PASSED] Link rate 2000000 lane count 2
[08:18:00] [PASSED] Link rate 2000000 lane count 1
[08:18:00] [PASSED] Link rate 1350000 lane count 4
[08:18:00] [PASSED] Link rate 1350000 lane count 2
[08:18:00] [PASSED] Link rate 1350000 lane count 1
[08:18:00] [PASSED] Link rate 1000000 lane count 4
[08:18:00] [PASSED] Link rate 1000000 lane count 2
[08:18:00] [PASSED] Link rate 1000000 lane count 1
[08:18:00] [PASSED] Link rate 810000 lane count 4
[08:18:00] [PASSED] Link rate 810000 lane count 2
[08:18:00] [PASSED] Link rate 810000 lane count 1
[08:18:00] [PASSED] Link rate 540000 lane count 4
[08:18:00] [PASSED] Link rate 540000 lane count 2
[08:18:00] [PASSED] Link rate 540000 lane count 1
[08:18:00] [PASSED] Link rate 270000 lane count 4
[08:18:00] [PASSED] Link rate 270000 lane count 2
[08:18:00] [PASSED] Link rate 270000 lane count 1
[08:18:00] [PASSED] Link rate 162000 lane count 4
[08:18:00] [PASSED] Link rate 162000 lane count 2
[08:18:00] [PASSED] Link rate 162000 lane count 1
[08:18:00] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[08:18:00] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[08:18:00] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[08:18:00] [PASSED] DP_POWER_UP_PHY with port number
[08:18:00] [PASSED] DP_POWER_DOWN_PHY with port number
[08:18:00] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[08:18:00] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[08:18:00] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[08:18:00] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[08:18:00] [PASSED] DP_QUERY_PAYLOAD with port number
[08:18:00] [PASSED] DP_QUERY_PAYLOAD with VCPI
[08:18:00] [PASSED] DP_REMOTE_DPCD_READ with port number
[08:18:00] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[08:18:00] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[08:18:00] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[08:18:00] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[08:18:00] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[08:18:00] [PASSED] DP_REMOTE_I2C_READ with port number
[08:18:00] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[08:18:00] [PASSED] DP_REMOTE_I2C_READ with transactions array
[08:18:00] [PASSED] DP_REMOTE_I2C_WRITE with port number
[08:18:00] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[08:18:00] [PASSED] DP_REMOTE_I2C_WRITE with data array
[08:18:00] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[08:18:00] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[08:18:00] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[08:18:00] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[08:18:00] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[08:18:00] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[08:18:00] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[08:18:00] ================ [PASSED] drm_dp_mst_helper ================
[08:18:00] ================== drm_exec (7 subtests) ===================
[08:18:00] [PASSED] sanitycheck
[08:18:00] [PASSED] test_lock
[08:18:00] [PASSED] test_lock_unlock
[08:18:00] [PASSED] test_duplicates
[08:18:00] [PASSED] test_prepare
[08:18:00] [PASSED] test_prepare_array
[08:18:00] [PASSED] test_multiple_loops
[08:18:00] ==================== [PASSED] drm_exec =====================
[08:18:00] =========== drm_format_helper_test (17 subtests) ===========
[08:18:00] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[08:18:00] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[08:18:00] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[08:18:00] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[08:18:00] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[08:18:00] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[08:18:00] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[08:18:00] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[08:18:00] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[08:18:00] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[08:18:00] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[08:18:00] ============== drm_test_fb_xrgb8888_to_mono ===============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[08:18:00] ==================== drm_test_fb_swab =====================
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ================ [PASSED] drm_test_fb_swab =================
[08:18:00] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[08:18:00] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[08:18:00] [PASSED] single_pixel_source_buffer
[08:18:00] [PASSED] single_pixel_clip_rectangle
[08:18:00] [PASSED] well_known_colors
[08:18:00] [PASSED] destination_pitch
[08:18:00] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[08:18:00] ================= drm_test_fb_clip_offset =================
[08:18:00] [PASSED] pass through
[08:18:00] [PASSED] horizontal offset
[08:18:00] [PASSED] vertical offset
[08:18:00] [PASSED] horizontal and vertical offset
[08:18:00] [PASSED] horizontal offset (custom pitch)
[08:18:00] [PASSED] vertical offset (custom pitch)
[08:18:00] [PASSED] horizontal and vertical offset (custom pitch)
[08:18:00] ============= [PASSED] drm_test_fb_clip_offset =============
[08:18:00] =================== drm_test_fb_memcpy ====================
[08:18:00] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[08:18:00] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[08:18:00] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[08:18:00] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[08:18:00] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[08:18:00] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[08:18:00] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[08:18:00] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[08:18:00] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[08:18:00] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[08:18:00] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[08:18:00] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[08:18:00] =============== [PASSED] drm_test_fb_memcpy ================
[08:18:00] ============= [PASSED] drm_format_helper_test ==============
[08:18:00] ================= drm_format (18 subtests) =================
[08:18:00] [PASSED] drm_test_format_block_width_invalid
[08:18:00] [PASSED] drm_test_format_block_width_one_plane
[08:18:00] [PASSED] drm_test_format_block_width_two_plane
[08:18:00] [PASSED] drm_test_format_block_width_three_plane
[08:18:00] [PASSED] drm_test_format_block_width_tiled
[08:18:00] [PASSED] drm_test_format_block_height_invalid
[08:18:00] [PASSED] drm_test_format_block_height_one_plane
[08:18:00] [PASSED] drm_test_format_block_height_two_plane
[08:18:00] [PASSED] drm_test_format_block_height_three_plane
[08:18:00] [PASSED] drm_test_format_block_height_tiled
[08:18:00] [PASSED] drm_test_format_min_pitch_invalid
[08:18:00] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[08:18:00] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[08:18:00] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[08:18:00] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[08:18:00] [PASSED] drm_test_format_min_pitch_two_plane
[08:18:00] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[08:18:00] [PASSED] drm_test_format_min_pitch_tiled
[08:18:00] =================== [PASSED] drm_format ====================
[08:18:00] ============== drm_framebuffer (10 subtests) ===============
[08:18:00] ========== drm_test_framebuffer_check_src_coords ==========
[08:18:00] [PASSED] Success: source fits into fb
[08:18:00] [PASSED] Fail: overflowing fb with x-axis coordinate
[08:18:00] [PASSED] Fail: overflowing fb with y-axis coordinate
[08:18:00] [PASSED] Fail: overflowing fb with source width
[08:18:00] [PASSED] Fail: overflowing fb with source height
[08:18:00] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[08:18:00] [PASSED] drm_test_framebuffer_cleanup
[08:18:00] =============== drm_test_framebuffer_create ===============
[08:18:00] [PASSED] ABGR8888 normal sizes
[08:18:00] [PASSED] ABGR8888 max sizes
[08:18:00] [PASSED] ABGR8888 pitch greater than min required
[08:18:00] [PASSED] ABGR8888 pitch less than min required
[08:18:00] [PASSED] ABGR8888 Invalid width
[08:18:00] [PASSED] ABGR8888 Invalid buffer handle
[08:18:00] [PASSED] No pixel format
[08:18:00] [PASSED] ABGR8888 Width 0
[08:18:00] [PASSED] ABGR8888 Height 0
[08:18:00] [PASSED] ABGR8888 Out of bound height * pitch combination
[08:18:00] [PASSED] ABGR8888 Large buffer offset
[08:18:00] [PASSED] ABGR8888 Buffer offset for inexistent plane
[08:18:00] [PASSED] ABGR8888 Invalid flag
[08:18:00] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[08:18:00] [PASSED] ABGR8888 Valid buffer modifier
[08:18:00] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[08:18:00] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[08:18:00] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[08:18:00] [PASSED] NV12 Normal sizes
[08:18:00] [PASSED] NV12 Max sizes
[08:18:00] [PASSED] NV12 Invalid pitch
[08:18:00] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[08:18:00] [PASSED] NV12 different modifier per-plane
[08:18:00] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[08:18:00] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[08:18:00] [PASSED] NV12 Modifier for inexistent plane
[08:18:00] [PASSED] NV12 Handle for inexistent plane
[08:18:00] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[08:18:00] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[08:18:00] [PASSED] YVU420 Normal sizes
[08:18:00] [PASSED] YVU420 Max sizes
[08:18:00] [PASSED] YVU420 Invalid pitch
[08:18:00] [PASSED] YVU420 Different pitches
[08:18:00] [PASSED] YVU420 Different buffer offsets/pitches
[08:18:00] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[08:18:00] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[08:18:00] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[08:18:00] [PASSED] YVU420 Valid modifier
[08:18:00] [PASSED] YVU420 Different modifiers per plane
[08:18:00] [PASSED] YVU420 Modifier for inexistent plane
[08:18:00] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[08:18:00] [PASSED] X0L2 Normal sizes
[08:18:00] [PASSED] X0L2 Max sizes
[08:18:00] [PASSED] X0L2 Invalid pitch
[08:18:00] [PASSED] X0L2 Pitch greater than minimum required
[08:18:00] [PASSED] X0L2 Handle for inexistent plane
[08:18:00] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[08:18:00] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[08:18:00] [PASSED] X0L2 Valid modifier
[08:18:00] [PASSED] X0L2 Modifier for inexistent plane
[08:18:00] =========== [PASSED] drm_test_framebuffer_create ===========
[08:18:00] [PASSED] drm_test_framebuffer_free
[08:18:00] [PASSED] drm_test_framebuffer_init
[08:18:00] [PASSED] drm_test_framebuffer_init_bad_format
[08:18:00] [PASSED] drm_test_framebuffer_init_dev_mismatch
[08:18:00] [PASSED] drm_test_framebuffer_lookup
[08:18:00] [PASSED] drm_test_framebuffer_lookup_inexistent
[08:18:00] [PASSED] drm_test_framebuffer_modifiers_not_supported
[08:18:00] ================= [PASSED] drm_framebuffer =================
[08:18:00] ================ drm_gem_shmem (8 subtests) ================
[08:18:00] [PASSED] drm_gem_shmem_test_obj_create
[08:18:00] [PASSED] drm_gem_shmem_test_obj_create_private
[08:18:00] [PASSED] drm_gem_shmem_test_pin_pages
[08:18:00] [PASSED] drm_gem_shmem_test_vmap
[08:18:00] [PASSED] drm_gem_shmem_test_get_sg_table
[08:18:00] [PASSED] drm_gem_shmem_test_get_pages_sgt
[08:18:00] [PASSED] drm_gem_shmem_test_madvise
[08:18:00] [PASSED] drm_gem_shmem_test_purge
[08:18:00] ================== [PASSED] drm_gem_shmem ==================
[08:18:00] === drm_atomic_helper_connector_hdmi_check (29 subtests) ===
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[08:18:00] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[08:18:00] [PASSED] Automatic
[08:18:00] [PASSED] Full
[08:18:00] [PASSED] Limited 16:235
[08:18:00] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[08:18:00] [PASSED] drm_test_check_disable_connector
[08:18:00] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[08:18:00] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[08:18:00] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[08:18:00] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[08:18:00] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[08:18:00] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[08:18:00] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[08:18:00] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[08:18:00] [PASSED] drm_test_check_output_bpc_dvi
[08:18:00] [PASSED] drm_test_check_output_bpc_format_vic_1
[08:18:00] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[08:18:00] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[08:18:00] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[08:18:00] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[08:18:00] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[08:18:00] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[08:18:00] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[08:18:00] ============ drm_test_check_hdmi_color_format =============
[08:18:00] [PASSED] AUTO -> RGB
[08:18:00] [PASSED] YCBCR422 -> YUV422
[08:18:00] [PASSED] YCBCR420 -> YUV420
[08:18:00] [PASSED] YCBCR444 -> YUV444
[08:18:00] [PASSED] RGB -> RGB
[08:18:00] ======== [PASSED] drm_test_check_hdmi_color_format =========
[08:18:00] ======== drm_test_check_hdmi_color_format_420_only ========
[08:18:00] [PASSED] RGB should fail
[08:18:00] [PASSED] YUV444 should fail
[08:18:00] [PASSED] YUV422 should fail
[08:18:00] [PASSED] YUV420 should work
[08:18:00] ==== [PASSED] drm_test_check_hdmi_color_format_420_only ====
[08:18:00] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[08:18:00] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[08:18:00] [PASSED] drm_test_check_broadcast_rgb_value
[08:18:00] [PASSED] drm_test_check_bpc_8_value
[08:18:00] [PASSED] drm_test_check_bpc_10_value
[08:18:00] [PASSED] drm_test_check_bpc_12_value
[08:18:00] [PASSED] drm_test_check_format_value
[08:18:00] [PASSED] drm_test_check_tmds_char_value
[08:18:00] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[08:18:00] = drm_atomic_helper_connector_hdmi_mode_valid (7 subtests) =
[08:18:00] [PASSED] drm_test_check_mode_valid
[08:18:00] [PASSED] drm_test_check_mode_valid_reject
[08:18:00] [PASSED] drm_test_check_mode_valid_reject_rate
[08:18:00] [PASSED] drm_test_check_mode_valid_reject_max_clock
[08:18:00] [PASSED] drm_test_check_mode_valid_yuv420_only_max_clock
[08:18:00] [PASSED] drm_test_check_mode_valid_reject_yuv420_only_connector
[08:18:00] [PASSED] drm_test_check_mode_valid_accept_yuv420_also_connector_rgb
[08:18:00] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[08:18:00] = drm_atomic_helper_connector_hdmi_infoframes (5 subtests) =
[08:18:00] [PASSED] drm_test_check_infoframes
[08:18:00] [PASSED] drm_test_check_reject_avi_infoframe
[08:18:00] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_8
[08:18:00] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_10
[08:18:00] [PASSED] drm_test_check_reject_audio_infoframe
[08:18:00] === [PASSED] drm_atomic_helper_connector_hdmi_infoframes ===
[08:18:00] ================= drm_managed (2 subtests) =================
[08:18:00] [PASSED] drm_test_managed_release_action
[08:18:00] [PASSED] drm_test_managed_run_action
[08:18:00] =================== [PASSED] drm_managed ===================
[08:18:00] =================== drm_mm (6 subtests) ====================
[08:18:00] [PASSED] drm_test_mm_init
[08:18:00] [PASSED] drm_test_mm_debug
[08:18:00] [PASSED] drm_test_mm_align32
[08:18:00] [PASSED] drm_test_mm_align64
[08:18:00] [PASSED] drm_test_mm_lowest
[08:18:00] [PASSED] drm_test_mm_highest
[08:18:00] ===================== [PASSED] drm_mm ======================
[08:18:00] ============= drm_modes_analog_tv (5 subtests) =============
[08:18:00] [PASSED] drm_test_modes_analog_tv_mono_576i
[08:18:00] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[08:18:00] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[08:18:00] [PASSED] drm_test_modes_analog_tv_pal_576i
[08:18:00] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[08:18:00] =============== [PASSED] drm_modes_analog_tv ===============
[08:18:00] ============== drm_plane_helper (2 subtests) ===============
[08:18:00] =============== drm_test_check_plane_state ================
[08:18:00] [PASSED] clipping_simple
[08:18:00] [PASSED] clipping_rotate_reflect
[08:18:00] [PASSED] positioning_simple
[08:18:00] [PASSED] upscaling
[08:18:00] [PASSED] downscaling
[08:18:00] [PASSED] rounding1
[08:18:00] [PASSED] rounding2
[08:18:00] [PASSED] rounding3
[08:18:00] [PASSED] rounding4
[08:18:00] =========== [PASSED] drm_test_check_plane_state ============
[08:18:00] =========== drm_test_check_invalid_plane_state ============
[08:18:00] [PASSED] positioning_invalid
[08:18:00] [PASSED] upscaling_invalid
[08:18:00] [PASSED] downscaling_invalid
[08:18:00] ======= [PASSED] drm_test_check_invalid_plane_state ========
[08:18:00] ================ [PASSED] drm_plane_helper =================
[08:18:00] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[08:18:00] ====== drm_test_connector_helper_tv_get_modes_check =======
[08:18:00] [PASSED] None
[08:18:00] [PASSED] PAL
[08:18:00] [PASSED] NTSC
[08:18:00] [PASSED] Both, NTSC Default
[08:18:00] [PASSED] Both, PAL Default
[08:18:00] [PASSED] Both, NTSC Default, with PAL on command-line
[08:18:00] [PASSED] Both, PAL Default, with NTSC on command-line
[08:18:00] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[08:18:00] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[08:18:00] ================== drm_rect (9 subtests) ===================
[08:18:00] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[08:18:00] [PASSED] drm_test_rect_clip_scaled_not_clipped
[08:18:00] [PASSED] drm_test_rect_clip_scaled_clipped
[08:18:00] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[08:18:00] ================= drm_test_rect_intersect =================
[08:18:00] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[08:18:00] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[08:18:00] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[08:18:00] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[08:18:00] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[08:18:00] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[08:18:00] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[08:18:00] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[08:18:00] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[08:18:00] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[08:18:00] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[08:18:00] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[08:18:00] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[08:18:00] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[08:18:00] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[08:18:00] ============= [PASSED] drm_test_rect_intersect =============
[08:18:00] ================ drm_test_rect_calc_hscale ================
[08:18:00] [PASSED] normal use
[08:18:00] [PASSED] out of max range
[08:18:00] [PASSED] out of min range
[08:18:00] [PASSED] zero dst
[08:18:00] [PASSED] negative src
[08:18:00] [PASSED] negative dst
[08:18:00] ============ [PASSED] drm_test_rect_calc_hscale ============
[08:18:00] ================ drm_test_rect_calc_vscale ================
[08:18:00] [PASSED] normal use
[08:18:00] [PASSED] out of max range
[08:18:00] [PASSED] out of min range
[08:18:00] [PASSED] zero dst
[08:18:00] [PASSED] negative src
[08:18:00] [PASSED] negative dst
[08:18:00] ============ [PASSED] drm_test_rect_calc_vscale ============
[08:18:00] ================== drm_test_rect_rotate ===================
[08:18:00] [PASSED] reflect-x
[08:18:00] [PASSED] reflect-y
[08:18:00] [PASSED] rotate-0
[08:18:00] [PASSED] rotate-90
[08:18:00] [PASSED] rotate-180
[08:18:00] [PASSED] rotate-270
[08:18:00] ============== [PASSED] drm_test_rect_rotate ===============
[08:18:00] ================ drm_test_rect_rotate_inv =================
[08:18:00] [PASSED] reflect-x
[08:18:00] [PASSED] reflect-y
[08:18:00] [PASSED] rotate-0
[08:18:00] [PASSED] rotate-90
[08:18:00] [PASSED] rotate-180
[08:18:00] [PASSED] rotate-270
[08:18:00] ============ [PASSED] drm_test_rect_rotate_inv =============
[08:18:00] ==================== [PASSED] drm_rect =====================
[08:18:00] ============ drm_sysfb_modeset_test (1 subtest) ============
[08:18:00] ============ drm_test_sysfb_build_fourcc_list =============
[08:18:00] [PASSED] no native formats
[08:18:00] [PASSED] XRGB8888 as native format
[08:18:00] [PASSED] remove duplicates
[08:18:00] [PASSED] convert alpha formats
[08:18:00] [PASSED] random formats
[08:18:00] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[08:18:00] ============= [PASSED] drm_sysfb_modeset_test ==============
[08:18:00] ================== drm_fixp (2 subtests) ===================
[08:18:00] [PASSED] drm_test_int2fixp
[08:18:00] [PASSED] drm_test_sm2fixp
[08:18:00] ==================== [PASSED] drm_fixp =====================
[08:18:00] ============================================================
[08:18:00] Testing complete. Ran 639 tests: passed: 639
[08:18:00] Elapsed time: 25.784s total, 1.716s configuring, 23.898s building, 0.154s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[08:18:00] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[08:18:02] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[08:18:12] Starting KUnit Kernel (1/1)...
[08:18:12] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[08:18:12] ================= ttm_device (5 subtests) ==================
[08:18:12] [PASSED] ttm_device_init_basic
[08:18:12] [PASSED] ttm_device_init_multiple
[08:18:12] [PASSED] ttm_device_fini_basic
[08:18:12] [PASSED] ttm_device_init_no_vma_man
[08:18:12] ================== ttm_device_init_pools ==================
[08:18:12] [PASSED] No DMA allocations, no DMA32 required
[08:18:12] [PASSED] DMA allocations, DMA32 required
[08:18:12] [PASSED] No DMA allocations, DMA32 required
[08:18:12] [PASSED] DMA allocations, no DMA32 required
[08:18:12] ============== [PASSED] ttm_device_init_pools ==============
[08:18:12] =================== [PASSED] ttm_device ====================
[08:18:12] ================== ttm_pool (8 subtests) ===================
[08:18:12] ================== ttm_pool_alloc_basic ===================
[08:18:12] [PASSED] One page
[08:18:12] [PASSED] More than one page
[08:18:12] [PASSED] Above the allocation limit
[08:18:12] [PASSED] One page, with coherent DMA mappings enabled
[08:18:12] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[08:18:12] ============== [PASSED] ttm_pool_alloc_basic ===============
[08:18:12] ============== ttm_pool_alloc_basic_dma_addr ==============
[08:18:12] [PASSED] One page
[08:18:12] [PASSED] More than one page
[08:18:12] [PASSED] Above the allocation limit
[08:18:12] [PASSED] One page, with coherent DMA mappings enabled
[08:18:12] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[08:18:12] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[08:18:12] [PASSED] ttm_pool_alloc_order_caching_match
[08:18:12] [PASSED] ttm_pool_alloc_caching_mismatch
[08:18:12] [PASSED] ttm_pool_alloc_order_mismatch
[08:18:12] [PASSED] ttm_pool_free_dma_alloc
[08:18:12] [PASSED] ttm_pool_free_no_dma_alloc
[08:18:12] [PASSED] ttm_pool_fini_basic
[08:18:12] ==================== [PASSED] ttm_pool =====================
[08:18:12] ================ ttm_resource (8 subtests) =================
[08:18:12] ================= ttm_resource_init_basic =================
[08:18:12] [PASSED] Init resource in TTM_PL_SYSTEM
[08:18:12] [PASSED] Init resource in TTM_PL_VRAM
[08:18:12] [PASSED] Init resource in a private placement
[08:18:12] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[08:18:12] ============= [PASSED] ttm_resource_init_basic =============
[08:18:12] [PASSED] ttm_resource_init_pinned
[08:18:12] [PASSED] ttm_resource_fini_basic
[08:18:12] [PASSED] ttm_resource_manager_init_basic
[08:18:12] [PASSED] ttm_resource_manager_usage_basic
[08:18:12] [PASSED] ttm_resource_manager_set_used_basic
[08:18:12] [PASSED] ttm_sys_man_alloc_basic
[08:18:12] [PASSED] ttm_sys_man_free_basic
[08:18:12] ================== [PASSED] ttm_resource ===================
[08:18:12] =================== ttm_tt (15 subtests) ===================
[08:18:12] ==================== ttm_tt_init_basic ====================
[08:18:12] [PASSED] Page-aligned size
[08:18:12] [PASSED] Extra pages requested
[08:18:12] ================ [PASSED] ttm_tt_init_basic ================
[08:18:12] [PASSED] ttm_tt_init_misaligned
[08:18:12] [PASSED] ttm_tt_fini_basic
[08:18:12] [PASSED] ttm_tt_fini_sg
[08:18:12] [PASSED] ttm_tt_fini_shmem
[08:18:12] [PASSED] ttm_tt_create_basic
[08:18:12] [PASSED] ttm_tt_create_invalid_bo_type
[08:18:12] [PASSED] ttm_tt_create_ttm_exists
[08:18:12] [PASSED] ttm_tt_create_failed
[08:18:12] [PASSED] ttm_tt_destroy_basic
[08:18:12] [PASSED] ttm_tt_populate_null_ttm
[08:18:12] [PASSED] ttm_tt_populate_populated_ttm
[08:18:12] [PASSED] ttm_tt_unpopulate_basic
[08:18:12] [PASSED] ttm_tt_unpopulate_empty_ttm
[08:18:12] [PASSED] ttm_tt_swapin_basic
[08:18:12] ===================== [PASSED] ttm_tt ======================
[08:18:12] =================== ttm_bo (14 subtests) ===================
[08:18:12] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[08:18:12] [PASSED] Cannot be interrupted and sleeps
[08:18:12] [PASSED] Cannot be interrupted, locks straight away
[08:18:12] [PASSED] Can be interrupted, sleeps
[08:18:12] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[08:18:12] [PASSED] ttm_bo_reserve_locked_no_sleep
[08:18:12] [PASSED] ttm_bo_reserve_no_wait_ticket
[08:18:12] [PASSED] ttm_bo_reserve_double_resv
[08:18:12] [PASSED] ttm_bo_reserve_interrupted
[08:18:12] [PASSED] ttm_bo_reserve_deadlock
[08:18:12] [PASSED] ttm_bo_unreserve_basic
[08:18:12] [PASSED] ttm_bo_unreserve_pinned
[08:18:12] [PASSED] ttm_bo_unreserve_bulk
[08:18:12] [PASSED] ttm_bo_fini_basic
[08:18:12] [PASSED] ttm_bo_fini_shared_resv
[08:18:12] [PASSED] ttm_bo_pin_basic
[08:18:12] [PASSED] ttm_bo_pin_unpin_resource
[08:18:12] [PASSED] ttm_bo_multiple_pin_one_unpin
[08:18:12] ===================== [PASSED] ttm_bo ======================
[08:18:12] ============== ttm_bo_validate (22 subtests) ===============
[08:18:12] ============== ttm_bo_init_reserved_sys_man ===============
[08:18:12] [PASSED] Buffer object for userspace
[08:18:12] [PASSED] Kernel buffer object
[08:18:12] [PASSED] Shared buffer object
[08:18:12] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[08:18:12] ============== ttm_bo_init_reserved_mock_man ==============
[08:18:12] [PASSED] Buffer object for userspace
[08:18:12] [PASSED] Kernel buffer object
[08:18:12] [PASSED] Shared buffer object
[08:18:12] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[08:18:12] [PASSED] ttm_bo_init_reserved_resv
[08:18:12] ================== ttm_bo_validate_basic ==================
[08:18:12] [PASSED] Buffer object for userspace
[08:18:12] [PASSED] Kernel buffer object
[08:18:12] [PASSED] Shared buffer object
[08:18:12] ============== [PASSED] ttm_bo_validate_basic ==============
[08:18:12] [PASSED] ttm_bo_validate_invalid_placement
[08:18:12] ============= ttm_bo_validate_same_placement ==============
[08:18:12] [PASSED] System manager
[08:18:12] [PASSED] VRAM manager
[08:18:12] ========= [PASSED] ttm_bo_validate_same_placement ==========
[08:18:12] [PASSED] ttm_bo_validate_failed_alloc
[08:18:12] [PASSED] ttm_bo_validate_pinned
[08:18:12] [PASSED] ttm_bo_validate_busy_placement
[08:18:12] ================ ttm_bo_validate_multihop =================
[08:18:12] [PASSED] Buffer object for userspace
[08:18:12] [PASSED] Kernel buffer object
[08:18:12] [PASSED] Shared buffer object
[08:18:12] ============ [PASSED] ttm_bo_validate_multihop =============
[08:18:12] ========== ttm_bo_validate_no_placement_signaled ==========
[08:18:12] [PASSED] Buffer object in system domain, no page vector
[08:18:12] [PASSED] Buffer object in system domain with an existing page vector
[08:18:12] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[08:18:12] ======== ttm_bo_validate_no_placement_not_signaled ========
[08:18:12] [PASSED] Buffer object for userspace
[08:18:12] [PASSED] Kernel buffer object
[08:18:12] [PASSED] Shared buffer object
[08:18:12] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[08:18:12] [PASSED] ttm_bo_validate_move_fence_signaled
[08:18:12] ========= ttm_bo_validate_move_fence_not_signaled =========
[08:18:12] [PASSED] Waits for GPU
[08:18:12] [PASSED] Tries to lock straight away
[08:18:12] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[08:18:12] [PASSED] ttm_bo_validate_swapout
[08:18:12] [PASSED] ttm_bo_validate_happy_evict
[08:18:12] [PASSED] ttm_bo_validate_all_pinned_evict
[08:18:12] [PASSED] ttm_bo_validate_allowed_only_evict
[08:18:12] [PASSED] ttm_bo_validate_deleted_evict
[08:18:12] [PASSED] ttm_bo_validate_busy_domain_evict
[08:18:12] [PASSED] ttm_bo_validate_evict_gutting
[08:18:12] [PASSED] ttm_bo_validate_recrusive_evict
[08:18:12] ================= [PASSED] ttm_bo_validate =================
[08:18:12] ============================================================
[08:18:12] Testing complete. Ran 102 tests: passed: 102
[08:18:12] Elapsed time: 11.503s total, 1.718s configuring, 9.570s building, 0.176s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
* Re: [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device
2026-06-12 8:07 ` [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
@ 2026-06-12 8:21 ` sashiko-bot
2026-06-15 8:06 ` Tauro, Riana
2026-06-18 13:24 ` Raag Jadav
2 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-12 8:21 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Masking Surprise Link Down on the Upstream Switch Port violates PCIe AER specifications.
- [High] Directly mutating AER registers without verifying OS native AER ownership.
- [High] Permanent masking of PCI_ERR_UNC_SURPDN with no restoration mechanism.
- [High] Blind PCIe hierarchy traversal modifies unrelated upstream bridges.
- [Medium] Data race due to unprotected read-modify-write on a shared AER mask register.
--
commit 7a2dc1d13a9370ceb4dd62a9d15aa26d2212058a
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
drm/xe: Suppress Surprise Link Down on device
This commit attempts to prevent fatal AER events during a recovery-induced
cold reset. It masks the Surprise Link Down (SLD) bit in the upstream
port's AER Uncorrectable Error Mask register before triggering the reset.
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 93a56a0269f1e..15c2fa0d323a6 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -209,8 +209,57 @@ static enum xe_ras_recovery_action handle_core_compute_errors(struct xe_ras_erro
> return XE_RAS_RECOVERY_ACTION_RECOVERED;
> }
>
> +static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
> +{
> + struct pci_dev *vsp;
> +
> + /*
> + * Device Hierarchy:
> + *
> + * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
> + */
> + vsp = pci_upstream_bridge(pdev);
> + if (!vsp)
> + return NULL;
> +
> + return pci_upstream_bridge(vsp);
[Severity: High]
Does this blind traversal safely handle diverse PCIe topologies? If the GPU
is connected to a standard motherboard switch, passed through a hypervisor, or
connected directly to a Root Port, might this return an unrelated bridge
device owned by the host system? Should this verify the Vendor ID or Device
ID of the USP before proceeding?
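One way to picture the kind of check being asked about is a small userspace model of the walk. This is only a sketch: struct dev_model, enum port_type, VENDOR_GPU and find_usp_checked() are hypothetical names invented for illustration and are not part of the patch. In the driver itself, the equivalent tests would presumably be built on pci_pcie_type() and the device's vendor ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for PCIe port types. */
enum port_type { PORT_ENDPOINT, PORT_ROOT, PORT_UPSTREAM, PORT_DOWNSTREAM };

/* Hypothetical model of a PCI device: parent link, port type, vendor ID. */
struct dev_model {
	struct dev_model *parent;
	enum port_type type;
	uint16_t vendor;
};

#define VENDOR_GPU 0x8086 /* Intel PCI vendor ID */

/*
 * Walk GPU -> VSP -> USP, but only accept the result when both bridges
 * have the expected port type and the same vendor as the endpoint.
 * Returns NULL for any topology that does not match that shape.
 */
static struct dev_model *find_usp_checked(struct dev_model *gpu)
{
	struct dev_model *vsp, *usp;

	vsp = gpu ? gpu->parent : NULL;
	if (!vsp || vsp->type != PORT_DOWNSTREAM || vsp->vendor != gpu->vendor)
		return NULL;

	usp = vsp->parent;
	if (!usp || usp->type != PORT_UPSTREAM || usp->vendor != gpu->vendor)
		return NULL;

	return usp;
}
```

With a check of this shape, a GPU attached directly to a Root Port, or sitting behind a switch from another vendor, yields NULL instead of an unrelated bridge.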
> +}
> +
> +#ifdef CONFIG_PCIEAER
> +static void pcie_suppress_surprise_link_down(struct pci_dev *usp)
> +{
> + u32 aer_uncorr_mask;
> + u16 aer_cap;
> +
> + aer_cap = usp->aer_cap;
> + if (!aer_cap) {
> + dev_dbg(&usp->dev,
> + "AER capability not present\n");
> + return;
> + }
> +
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
[Severity: Medium]
Since the Upstream Switch Port is a shared bridge managed by the pcieport
driver, could this lockless read-modify-write sequence race with the AER
driver handling other hotplug events, or with other endpoints on the same
switch experiencing concurrent errors?
> + aer_uncorr_mask |= PCI_ERR_UNC_SURPDN;
[Severity: High]
Is there a mechanism to unmask this bit after the cold reset completes? If
the Surprise Link Down bit is set here but never cleared during driver
probe, resume, or a post-reset callback, will it permanently disable AER
reporting for genuine link failures on this port?
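The save-and-restore pattern this question points at can be sketched in a userspace model. Everything here is an assumption made for illustration: struct port_model and the suppress_sld()/restore_sld() helpers do not exist in the patch, and the register is modelled as a plain integer rather than PCI config space. The bit value mirrors PCI_ERR_UNC_SURPDN (bit 5).

```c
#include <stdbool.h>
#include <stdint.h>

#define ERR_UNC_SURPDN (UINT32_C(1) << 5) /* Surprise Down bit */

/* Hypothetical model of one port's AER Uncorrectable Error Mask state. */
struct port_model {
	uint32_t uncor_mask;     /* current register contents */
	bool     sld_was_masked; /* state of the bit before suppression */
	bool     sld_suppressed; /* true while the temporary mask is in place */
};

/* Mask Surprise Down, remembering whether it was already masked. */
static void suppress_sld(struct port_model *port)
{
	if (port->sld_suppressed)
		return;

	port->sld_was_masked = (port->uncor_mask & ERR_UNC_SURPDN) != 0;
	port->uncor_mask |= ERR_UNC_SURPDN;
	port->sld_suppressed = true;
}

/* Undo suppress_sld(), leaving a pre-existing mask untouched. */
static void restore_sld(struct port_model *port)
{
	if (!port->sld_suppressed)
		return;

	if (!port->sld_was_masked)
		port->uncor_mask &= ~ERR_UNC_SURPDN;
	port->sld_suppressed = false;
}

/* Usage: mask value after one full suppress/restore cycle. */
static uint32_t mask_after_reset_cycle(uint32_t initial)
{
	struct port_model port = { initial, false, false };

	suppress_sld(&port);
	/* ... the cold reset would happen here ... */
	restore_sld(&port);
	return port.uncor_mask;
}
```

Recording the previous state matters because firmware or an administrator may already have masked the bit. A restore that clears it unconditionally would change platform policy rather than undo the driver's own change.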
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
[Severity: High]
According to the PCIe Base Specification, Surprise Down errors are exclusively
detected and reported by Downstream Ports. On Upstream Ports, the Surprise
Down Error Status and Mask bits are reserved and hardwired to 0. Does masking
PCI_ERR_UNC_SURPDN on the Upstream Switch Port actually prevent the downstream
port (the VSP or Root Port) that detects the link drop from generating the
fatal AER event?
[Severity: High]
Is it safe to mutate the AER mask registers directly without verifying OS
native AER ownership? On platforms where firmware retains control of AER
(such as ACPI APEI Firmware First setups), could mutating these registers
directly from the OS conflict with platform management firmware? Is a check
like pcie_aer_is_native(usp) needed here?
> + dev_dbg(&usp->dev, "Surprise Link Down masked for cold reset\n");
> +}
> +#endif /* CONFIG_PCIEAER */
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612080722.26726-8-mallesh.koujalagi@intel.com?part=5
* Re: [PATCH v8 6/6] drm/xe/ras: Add debugfs entry to inject punit error
2026-06-12 8:07 ` [PATCH v8 6/6] drm/xe/ras: Add debugfs entry to inject punit error Mallesh Koujalagi
@ 2026-06-12 8:23 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-12 8:23 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Unconditional registration of the `inject_punit_error` debugfs file exposes PF-only hardware operations to Virtual Functions (VFs) and ignores sysctrl capability checks.
- [Low] Missing forward declaration of `struct dentry` in `xe_ras.h` when `CONFIG_DRM_XE_DEBUG` is disabled.
--
commit 185b28acb4e5add829b3fd14f6a6416e4d2a9fe3
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
drm/xe/ras: Add debugfs entry to inject punit error
Add a write-only debugfs file 'inject_punit_error' under the DRI device
root that allows triggering punit_error_handler() manually for testing.
Writing 1 to the file invokes the handler, which suppresses spurious
Surprise Link Down AER events on slots and declares the
device wedged for a cold reset recovery.
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 15c2fa0d323a6..281ef4dce2188 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
[ ... ]
> +void xe_ras_debugfs_register(struct xe_device *xe, struct dentry *root)
> +{
> + debugfs_create_file("inject_punit_error", 0200, root, xe,
> + &inject_punit_error_fops);
[Severity: Medium]
Could this expose PF-only hardware operations to Virtual Functions (VFs)?
Writing to this debugfs file executes punit_error_handler(), which
modifies the physical AER configuration on the Upstream Switch Port (USP)
via PCI config space and wedges the device for cold-reset recovery. Under
SR-IOV, VFs must not access PF-only hardware resources such as physical
PCIe bridge configuration.
Additionally, this doesn't check if the device actually supports sysctrl
before allowing the error injection. Should this function include an early
return guard like if (IS_SRIOV_VF(xe) || !xe->info.has_sysctrl) before
creating the file?
> +}
> +#endif /* CONFIG_DRM_XE_DEBUG */
[ ... ]
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> index 8d106c708ff19..c9f84ef238c42 100644
> --- a/drivers/gpu/drm/xe/xe_ras.h
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -16,4 +16,11 @@ void xe_ras_counter_threshold_crossed(struct xe_device *xe,
> void xe_ras_init(struct xe_device *xe);
> enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe);
>
> +#ifdef CONFIG_DRM_XE_DEBUG
> +struct dentry;
> +void xe_ras_debugfs_register(struct xe_device *xe, struct dentry *root);
> +#else
> +static inline void xe_ras_debugfs_register(struct xe_device *xe, struct dentry *root) {}
[Severity: Low]
Does this cause a compiler warning when CONFIG_DRM_XE_DEBUG is disabled?
The forward declaration of struct dentry is inside the #ifdef block, but the
#else block uses struct dentry * in the parameter list of the static inline
function. In C, this scopes the struct declaration to the function prototype
itself if it wasn't previously declared.
Should the struct dentry; forward declaration be moved outside the #ifdef
block to ensure it is visible to the inline stub?
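The suggested layout can be tried outside the kernel with a minimal sketch. FEATURE_DEBUG is a hypothetical stand-in for CONFIG_DRM_XE_DEBUG, and ras_debugfs_register() and register_all() are illustrative names, not the driver's functions.

```c
#include <stddef.h>

/*
 * Forward declarations at file scope, outside any #ifdef, so the real
 * prototype and the inline stub both refer to the same incomplete types.
 */
struct dentry;
struct xe_device;

#ifdef FEATURE_DEBUG /* hypothetical stand-in for CONFIG_DRM_XE_DEBUG */
void ras_debugfs_register(struct xe_device *xe, struct dentry *root);
#else
/* Stub used when the debug option is off; it deliberately does nothing. */
static inline void ras_debugfs_register(struct xe_device *xe, struct dentry *root)
{
	(void)xe;
	(void)root;
}
#endif

/* A caller that only passes the pointers along never needs the full types. */
static inline int register_all(struct xe_device *xe, struct dentry *root)
{
	ras_debugfs_register(xe, root);
	return 0;
}
```

If the two file-scope declarations are removed, compilers typically warn that the struct is declared inside a parameter list and will not be visible outside that declaration, which is the situation described above.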
> +#endif
> +
> #endif
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612080722.26726-8-mallesh.koujalagi@intel.com?part=6
* Re: [PATCH v8 1/6] Introduce Xe Uncorrectable Error Handling
2026-06-12 8:07 ` [PATCH v8 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
@ 2026-06-12 8:24 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-12 8:24 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [Critical] Manually calling `pdev->driver->probe()` and `remove()` in `xe_pci_error_slot_reset()` bypasses the device core's lifecycle and synchronization, leading to Use-After-Free, resource leaks, state corruption, and crashes.
- [High] Unbalanced `xe_device_wedged_put()` in PCI error resume path permanently wedges the device.
- [High] The endpoint driver directly modifies the Upstream Switch Port (USP) AER configuration registers, leading to layering violations, unprotected RMW race conditions, and permanently leaked hardware state modifications.
- [High] Lockless checks of the `in_reset` flag create a Time-Of-Check to Time-Of-Use race with the PCI error handler, potentially causing MMIO aborts.
- [Medium] Integer overflow in page offline processing loops allows a compromised firmware to trap the kernel in an infinite loop.
- [Medium] Use of C bitfields to define hardware data structures makes the layout compiler- and architecture-dependent.
- [Medium] Unconditional page offlining due to commented-out address fault handling.
--
commit 428443bc3d7fc4298f440e96ee0492972d733535
Author: Riana Tauro <riana.tauro@intel.com>
Subject: Introduce Xe Uncorrectable Error Handling
This commit introduces uncorrectable error handling for the Xe driver. It
includes PCI error handlers for recovery and reset operations. It also
integrates RAS error processing via system controller mailboxes.
Link: https://patchwork.freedesktop.org/series/160482/
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
[ ... ]
> @@ -1511,7 +1511,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> * If devcoredump not captured and GuC capture for the job is not ready
> * do manual capture first and decide later if we need to use it
> */
> - if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
> + if (!xe_device_is_in_reset(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
> !xe_guc_capture_get_matching_and_lock(q)) {
> /* take force wake before engine register manual capture */
> CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
[Severity: High]
Could this lockless check of the in_reset flag create a race condition with
the PCI error handler?
If this worker thread evaluates xe_device_is_in_reset() as false just before
prepare_device_for_reset() sets the flag and disables the PCI device, the
worker might attempt MMIO access on a disabled device. Will this cause a
master abort or system hang?
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
[ ... ]
> @@ -1102,6 +1104,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> if (xe_display_driver_probe_defer(pdev))
> return -EPROBE_DEFER;
>
> + devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
> + if (!devres_id)
> + return -ENOMEM;
> +
[Severity: Critical]
Is devres_open_group() intentionally left without a matching close call?
All subsequent devres allocations will be tied to this group. As seen below
in xe_pci_error_slot_reset(), this group is forcefully released, which could
leak resources if a manual probe fails.
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
[ ... ]
> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> + struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> + xe_err(xe, "PCI error: detected state = %u\n", state);
> +
> + if (state == pci_channel_io_perm_failure)
> + return PCI_ERS_RESULT_DISCONNECT;
> +
> + /* If the device is already wedged or in survivability mode, do not attempt recovery */
> + if (xe_survivability_mode_is_boot_enabled(xe) || xe_device_wedged(xe))
> + return PCI_ERS_RESULT_DISCONNECT;
> +
> + switch (state) {
> + case pci_channel_io_normal:
> + return PCI_ERS_RESULT_CAN_RECOVER;
[Severity: High]
Does this path need to call xe_device_wedged_get() like the frozen state does?
Returning PCI_ERS_RESULT_CAN_RECOVER here bypasses prepare_device_for_reset().
When the recovery finishes and xe_pci_error_resume() calls
xe_device_wedged_put(), it might underflow the reference count and permanently
wedge the device.
[ ... ]
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
[ ... ]
> + /*
> + * Secondary Bus Reset causes all VRAM state to be lost along with
> + * hardware state. As an initial step, re-probe the device to
> + * re-initialize the driver and hardware.
> + * TODO: optimize by re-initializing only the hardware state and re-creating
> + * kernel BOs.
> + */
> + xe_device_clear_in_reset(xe);
> + pdev->driver->remove(pdev);
> + devres_release_group(&pdev->dev, xe->devres_group_id);
> +
> + if (pdev->driver->probe(pdev, ent))
> + return PCI_ERS_RESULT_DISCONNECT;
[Severity: Critical]
Are we bypassing the Linux device model here? Directly calling
pdev->driver->remove() and probe() misses devres lifecycle management,
device links, PM runtime, and sysfs synchronization.
If the manual probe() fails, it seems like we leak all newly allocated
devm_ resources since the driver core is not involved. Would a later
normal driver unbind cause a use-after-free by calling remove on a
partially initialized device?
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
[ ... ]
> +static int handle_page_offline(struct xe_device *xe, u64 page_address, bool send_offline_cmd)
> +{
> + enum xe_ras_page_action action;
> + int ret = 0;
> +
> + if (!IS_ALIGNED(page_address, XE_PAGE_SIZE)) {
> + xe_err(xe, "sysctrl: Unaligned page address: 0x%llx\n", page_address);
> + return -EINVAL;
> + }
> +
> + /*
> + * TODO: Call function to handle address fault
> + * ret = xe_ttm_vram_handle_addr_fault(xe, page_address);
> + */
> +
> + /*
> + * Handle return code from address fault handling function:
> + * 0: Address is valid and can be offlined
> + * -EIO: Address belongs to a critical BO that cannot be offlined
> + * -EOPNOTSUPP: Address is valid and can be offlined but user policy is not to offline
> + *
> + * For any other non-zero error code, skip offlining.
> + */
> +
> + switch (ret) {
> + case 0:
> + action = XE_RAS_PAGE_ACTION_OFFLINE;
> + break;
[Severity: Medium]
Is it intended to leave the address fault handling function commented out?
Because ret remains 0, the switch statement will unconditionally execute the
success case and set the action to XE_RAS_PAGE_ACTION_OFFLINE. Could this
instruct the firmware to offline memory pages even if they hold critical
kernel structures, potentially crashing the system?
[ ... ]
> +static void get_queued_pages(struct xe_device *xe)
> +{
[ ... ]
> + for (i = 0; i < response.pages_returned && i < XE_RAS_NUM_PAGES; i++)
> + handle_page_offline(xe, response.page_addresses[i], true);
> +
> + count += response.pages_returned;
> + if (!response.pages_returned)
> + break;
> +
> + if (count > response.total_pages) {
[Severity: Medium]
Can the addition to count overflow?
Both count and response.pages_returned are 32-bit unsigned integers. If the
system controller firmware returns a crafted value like 0xFFFFFFFF, count
could wrap around to a small value, bypassing the safety check. Since the
loop lacks an explicit maximum iteration limit, could this trap the kernel
thread in an infinite loop?
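A wrap-safe way to do this accounting can be sketched as follows. This is a userspace illustration only: account_batch(), drain_batches() and the MAX_BATCHES cap are hypothetical and not taken from the patch, and the firmware responses are modelled as a plain array of batch sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap on mailbox round trips; not a value from the patch. */
#define MAX_BATCHES 64u

/*
 * Add one batch to the running total without letting the 32-bit sum wrap.
 * Returns false if the batch would push the total past total_pages.
 */
static bool account_batch(uint32_t *count, uint32_t returned, uint32_t total_pages)
{
	if (*count > total_pages || returned > total_pages - *count)
		return false;

	*count += returned;
	return true;
}

/*
 * Walk a series of firmware-reported batch sizes. Returns the number of
 * pages accounted for, or -1 if a batch is inconsistent with total_pages.
 * The loop stops at an empty batch and never exceeds MAX_BATCHES rounds.
 */
static int64_t drain_batches(const uint32_t *batches, uint32_t nbatches,
			     uint32_t total_pages)
{
	uint32_t count = 0;
	uint32_t i;

	for (i = 0; i < nbatches && i < MAX_BATCHES; i++) {
		if (!batches[i])
			break;
		if (!account_batch(&count, batches[i], total_pages))
			return -1;
	}

	return count;
}
```

Comparing the incoming value against the remaining headroom (total_pages - count) before adding avoids the wrap entirely, and the iteration cap bounds the loop even if the firmware never reports an empty batch.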
[ ... ]
> +static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
> +{
> + struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> + u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
> + struct pci_dev *usp;
> + u16 aer_cap;
[ ... ]
> + /* Initialize Uncorrectable Error Severity Register */
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
> + aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
[Severity: High]
Is it safe for the endpoint driver to directly perform read-modify-write
operations on the Upstream Switch Port AER configuration registers?
Since the upstream bridge is owned by the PCI core and pcieport service
driver, wouldn't these unprotected operations race with concurrent updates
from the AER core or userspace? Also, since these modifications aren't
reverted during driver unload or after reset, will they permanently alter
the behavior of the slot?
> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
> --- a/drivers/gpu/drm/xe/xe_ras_types.h
> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
[ ... ]
> +/**
> + * struct xe_ras_soc_error_source - Source of SoC error
> + */
> +struct xe_ras_soc_error_source {
> + /** @csc: CSC */
> + u32 csc:1;
> + /** @ieh: IEH (Integrated Error Handler) */
> + u32 ieh:1;
> + /** @reserved: Reserved for future use */
> + u32 reserved:30;
> +} __packed;
[Severity: Medium]
Since this structure maps directly to System Controller Mailbox payloads, is
it safe to use C bitfields here?
The packing order of bitfields is compiler-defined and dependent on
architecture endianness. Could this result in incorrect parsing of hardware
error sources on big-endian architectures?
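A layout-independent alternative is to decode the word with explicit masks. The sketch below rests on two assumptions that the quoted patch does not confirm: that the mailbox payload is little-endian, and that csc and ieh occupy bits 0 and 1 in declaration order. The helper names are hypothetical. In kernel code the same idea would normally be written with le32_to_cpu() and BIT()/FIELD_GET().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the declaration order in the quoted struct. */
#define SOC_ERR_SRC_CSC (UINT32_C(1) << 0)
#define SOC_ERR_SRC_IEH (UINT32_C(1) << 1)

/* Assemble a 32-bit word from little-endian payload bytes. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* True if the payload reports CSC as an error source. */
static bool soc_err_from_csc(const uint8_t *payload)
{
	return (load_le32(payload) & SOC_ERR_SRC_CSC) != 0;
}

/* True if the payload reports IEH as an error source. */
static bool soc_err_from_ieh(const uint8_t *payload)
{
	return (load_le32(payload) & SOC_ERR_SRC_IEH) != 0;
}
```

Because the bytes are combined by shifts rather than by reinterpreting memory, the result is the same on little-endian and big-endian hosts and does not depend on how the compiler orders bitfields.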
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612080722.26726-8-mallesh.koujalagi@intel.com?part=1
* Re: [PATCH v8 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery
2026-06-12 8:07 ` [PATCH v8 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
@ 2026-06-12 8:27 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-12 8:27 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Unnecessary header files were added to `xe_ras.c`.
--
commit 5df465d682f75cd912c33257354af24edc73cf6d
Author: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
drm/xe: Handle PUNIT errors by requesting cold-reset recovery
When PUNIT (power management unit) errors are detected that persist across
warm resets, mark the device as wedged with DRM_WEDGE_RECOVERY_COLD_RESET
and notify userspace that a complete device power cycle is required to
restore normal operation.
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 6a5140bc47e52..93a56a0269f1e 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -4,6 +4,8 @@
> */
>
> #include "xe_bo.h"
> +#include "xe_assert.h"
> +#include "xe_device_types.h"
[Severity: Low]
Are these headers necessary here?
Neither the new punit_error_handler() function nor the modified
handle_soc_internal_errors() function appears to use any macros or types
defined exclusively in xe_assert.h or xe_device_types.h.
It looks like xe_device.h is already included below and provides the
necessary types and function declarations like xe_device_set_wedged_method().
Could these additions be dropped to avoid unnecessary compilation dependencies?
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612080722.26726-8-mallesh.koujalagi@intel.com?part=4
* ✓ Xe.CI.BAT: success for Introduce cold reset recovery method (rev8)
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (7 preceding siblings ...)
2026-06-12 8:18 ` ✓ CI.KUnit: success " Patchwork
@ 2026-06-12 9:03 ` Patchwork
2026-06-13 1:18 ` ✓ Xe.CI.FULL: " Patchwork
9 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2026-06-12 9:03 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
== Series Details ==
Series: Introduce cold reset recovery method (rev8)
URL : https://patchwork.freedesktop.org/series/163428/
State : success
== Summary ==
CI Bug Log - changes from xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4_BAT -> xe-pw-163428v8_BAT
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (13 -> 12)
------------------------------
Missing (1): bat-lnl-2
Changes
-------
No changes found
Build changes
-------------
* Linux: xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4 -> xe-pw-163428v8
IGT_8960: 88bd725754990332efcd158b0429f6ac7fb63862 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4: 9ce63e3d9903982c053b125686abcb7a84357ed4
xe-pw-163428v8: 163428v8
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/index.html
* ✓ Xe.CI.FULL: success for Introduce cold reset recovery method (rev8)
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
` (8 preceding siblings ...)
2026-06-12 9:03 ` ✓ Xe.CI.BAT: " Patchwork
@ 2026-06-13 1:18 ` Patchwork
9 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2026-06-13 1:18 UTC (permalink / raw)
To: Mallesh Koujalagi; +Cc: intel-xe
== Series Details ==
Series: Introduce cold reset recovery method (rev8)
URL : https://patchwork.freedesktop.org/series/163428/
State : success
== Summary ==
CI Bug Log - changes from xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4_FULL -> xe-pw-163428v8_FULL
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (2 -> 2)
------------------------------
No changes in participating hosts
Known issues
------------
Here are the changes found in xe-pw-163428v8_FULL that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_big_fb@linear-max-hw-stride-32bpp-rotate-180-hflip:
- shard-lnl: NOTRUN -> [SKIP][1] ([Intel XE#7059] / [Intel XE#7085])
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_big_fb@linear-max-hw-stride-32bpp-rotate-180-hflip.html
* igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180:
- shard-lnl: NOTRUN -> [SKIP][2] ([Intel XE#1124])
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180.html
* igt@kms_ccs@missing-ccs-buffer-y-tiled-gen12-rc-ccs:
- shard-lnl: NOTRUN -> [SKIP][3] ([Intel XE#2887]) +2 other tests skip
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_ccs@missing-ccs-buffer-y-tiled-gen12-rc-ccs.html
* igt@kms_chamelium_hpd@hdmi-hpd-for-each-pipe:
- shard-lnl: NOTRUN -> [SKIP][4] ([Intel XE#373])
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_chamelium_hpd@hdmi-hpd-for-each-pipe.html
* igt@kms_cursor_crc@cursor-random-32x32:
- shard-lnl: NOTRUN -> [SKIP][5] ([Intel XE#1424]) +1 other test skip
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_cursor_crc@cursor-random-32x32.html
* igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size:
- shard-lnl: NOTRUN -> [SKIP][6] ([Intel XE#309] / [Intel XE#7343]) +1 other test skip
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html
* igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions:
- shard-lnl: NOTRUN -> [SKIP][7] ([Intel XE#323] / [Intel XE#6035])
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions.html
* igt@kms_dp_link_training@uhbr-mst:
- shard-lnl: NOTRUN -> [SKIP][8] ([Intel XE#4354] / [Intel XE#7386])
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_dp_link_training@uhbr-mst.html
* igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible:
- shard-lnl: NOTRUN -> [SKIP][9] ([Intel XE#1421])
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible.html
* igt@kms_flip@flip-vs-expired-vblank@d-dp2:
- shard-bmg: [PASS][10] -> [FAIL][11] ([Intel XE#3321]) +1 other test fail
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-10/igt@kms_flip@flip-vs-expired-vblank@d-dp2.html
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-5/igt@kms_flip@flip-vs-expired-vblank@d-dp2.html
* igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytileccs-upscaling:
- shard-lnl: NOTRUN -> [SKIP][12] ([Intel XE#7178] / [Intel XE#7351]) +1 other test skip
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytileccs-upscaling.html
* igt@kms_frontbuffer_tracking@drrshdr-1p-primscrn-indfb-pgflip-blt:
- shard-lnl: NOTRUN -> [SKIP][13] ([Intel XE#6312]) +2 other tests skip
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@drrshdr-1p-primscrn-indfb-pgflip-blt.html
* igt@kms_frontbuffer_tracking@drrshdr-2p-primscrn-pri-shrfb-draw-mmap-wc:
- shard-lnl: NOTRUN -> [SKIP][14] ([Intel XE#7905]) +7 other tests skip
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@drrshdr-2p-primscrn-pri-shrfb-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-pgflip-blt:
- shard-lnl: NOTRUN -> [SKIP][15] ([Intel XE#656] / [Intel XE#7905]) +9 other tests skip
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-pgflip-blt.html
* igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-draw-blt:
- shard-lnl: NOTRUN -> [SKIP][16] ([Intel XE#6312] / [Intel XE#651]) +1 other test skip
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-draw-blt.html
* igt@kms_frontbuffer_tracking@fbcdrrs-argb161616f-draw-mmap-wc:
- shard-lnl: NOTRUN -> [SKIP][17] ([Intel XE#7061] / [Intel XE#7356])
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@fbcdrrs-argb161616f-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@fbcdrrshdr-tiling-y:
- shard-lnl: NOTRUN -> [SKIP][18] ([Intel XE#7399])
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@fbcdrrshdr-tiling-y.html
* igt@kms_frontbuffer_tracking@fbchdr-abgr161616f-draw-blt:
- shard-lnl: NOTRUN -> [SKIP][19] ([Intel XE#7061])
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@fbchdr-abgr161616f-draw-blt.html
* igt@kms_frontbuffer_tracking@psrhdr-1p-primscrn-pri-indfb-draw-blt:
- shard-lnl: NOTRUN -> [SKIP][20] ([Intel XE#7865]) +2 other tests skip
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@psrhdr-1p-primscrn-pri-indfb-draw-blt.html
* igt@kms_hdr@invalid-hdr:
- shard-bmg: [PASS][21] -> [SKIP][22] ([Intel XE#1503])
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-6/igt@kms_hdr@invalid-hdr.html
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-9/igt@kms_hdr@invalid-hdr.html
* igt@kms_hdr@invalid-hdr@pipe-a-hdmi-a-3-xrgb2101010:
- shard-bmg: [PASS][23] -> [SKIP][24] ([Intel XE#7922]) +1 other test skip
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-6/igt@kms_hdr@invalid-hdr@pipe-a-hdmi-a-3-xrgb2101010.html
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-9/igt@kms_hdr@invalid-hdr@pipe-a-hdmi-a-3-xrgb2101010.html
* igt@kms_hdr@invalid-metadata-sizes@pipe-a-hdmi-a-3-xrgb16161616f:
- shard-bmg: [PASS][25] -> [SKIP][26] ([Intel XE#7915]) +1 other test skip
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-1/igt@kms_hdr@invalid-metadata-sizes@pipe-a-hdmi-a-3-xrgb16161616f.html
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-6/igt@kms_hdr@invalid-metadata-sizes@pipe-a-hdmi-a-3-xrgb16161616f.html
* igt@kms_hdr@static-swap:
- shard-lnl: NOTRUN -> [SKIP][27] ([Intel XE#1503] / [Intel XE#7915])
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_hdr@static-swap.html
* igt@kms_hdr@static-swap@pipe-a-edp-1-xrgb2101010:
- shard-lnl: NOTRUN -> [SKIP][28] ([Intel XE#7915]) +1 other test skip
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_hdr@static-swap@pipe-a-edp-1-xrgb2101010.html
* igt@kms_joiner@switch-modeset-ultra-joiner-big-joiner:
- shard-lnl: NOTRUN -> [SKIP][29] ([Intel XE#7173] / [Intel XE#7294])
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_joiner@switch-modeset-ultra-joiner-big-joiner.html
* igt@kms_plane@pixel-format-4-tiled-mtl-rc-ccs-modifier-source-clamping:
- shard-lnl: NOTRUN -> [SKIP][30] ([Intel XE#7283])
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_plane@pixel-format-4-tiled-mtl-rc-ccs-modifier-source-clamping.html
* igt@kms_psr@fbc-pr-primary-page-flip:
- shard-lnl: NOTRUN -> [SKIP][31] ([Intel XE#1406])
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_psr@fbc-pr-primary-page-flip.html
* igt@kms_rotation_crc@primary-4-tiled-reflect-x-180:
- shard-lnl: NOTRUN -> [SKIP][32] ([Intel XE#3414] / [Intel XE#3904] / [Intel XE#7342])
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_rotation_crc@primary-4-tiled-reflect-x-180.html
* igt@kms_setmode@basic@pipe-b-edp-1:
- shard-lnl: [PASS][33] -> [FAIL][34] ([Intel XE#6361]) +2 other tests fail
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-lnl-7/igt@kms_setmode@basic@pipe-b-edp-1.html
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-8/igt@kms_setmode@basic@pipe-b-edp-1.html
* igt@xe_eudebug_online@pagefault-one-of-many:
- shard-lnl: NOTRUN -> [SKIP][35] ([Intel XE#7636]) +2 other tests skip
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_eudebug_online@pagefault-one-of-many.html
* igt@xe_evict@evict-beng-mixed-threads-large-multi-vm:
- shard-lnl: NOTRUN -> [SKIP][36] ([Intel XE#6540] / [Intel XE#688]) +1 other test skip
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_evict@evict-beng-mixed-threads-large-multi-vm.html
* igt@xe_exec_balancer@twice-cm-virtual-userptr:
- shard-lnl: NOTRUN -> [SKIP][37] ([Intel XE#7482]) +4 other tests skip
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_exec_balancer@twice-cm-virtual-userptr.html
* igt@xe_exec_basic@multigpu-no-exec-bindexecqueue-userptr:
- shard-lnl: NOTRUN -> [SKIP][38] ([Intel XE#1392])
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_exec_basic@multigpu-no-exec-bindexecqueue-userptr.html
* igt@xe_exec_fault_mode@many-multi-queue-userptr-invalidate-race-prefetch:
- shard-lnl: NOTRUN -> [SKIP][39] ([Intel XE#7136]) +2 other tests skip
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_exec_fault_mode@many-multi-queue-userptr-invalidate-race-prefetch.html
* igt@xe_exec_multi_queue@one-queue-preempt-mode-dyn-priority-smem:
- shard-lnl: NOTRUN -> [SKIP][40] ([Intel XE#6874]) +5 other tests skip
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_exec_multi_queue@one-queue-preempt-mode-dyn-priority-smem.html
* igt@xe_exec_reset@cm-multi-queue-gt-reset:
- shard-lnl: NOTRUN -> [SKIP][41] ([Intel XE#7866]) +1 other test skip
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_exec_reset@cm-multi-queue-gt-reset.html
* igt@xe_exec_reset@long-spin-many-preempt-threads:
- shard-bmg: [PASS][42] -> [FAIL][43] ([Intel XE#7956])
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-1/igt@xe_exec_reset@long-spin-many-preempt-threads.html
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-6/igt@xe_exec_reset@long-spin-many-preempt-threads.html
* igt@xe_exec_threads@threads-multi-queue-mixed-fd-userptr:
- shard-lnl: NOTRUN -> [SKIP][44] ([Intel XE#7138])
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_exec_threads@threads-multi-queue-mixed-fd-userptr.html
* igt@xe_page_reclaim@invalid-1g:
- shard-lnl: NOTRUN -> [SKIP][45] ([Intel XE#7793])
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_page_reclaim@invalid-1g.html
* igt@xe_pm@d3cold-basic-exec:
- shard-lnl: NOTRUN -> [SKIP][46] ([Intel XE#2284] / [Intel XE#366] / [Intel XE#7370])
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_pm@d3cold-basic-exec.html
* igt@xe_query@multigpu-query-gt-list:
- shard-lnl: NOTRUN -> [SKIP][47] ([Intel XE#944])
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_query@multigpu-query-gt-list.html
* igt@xe_sriov_auto_provisioning@fair-allocation:
- shard-lnl: NOTRUN -> [SKIP][48] ([Intel XE#4130] / [Intel XE#7366])
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@xe_sriov_auto_provisioning@fair-allocation.html
#### Possible fixes ####
* igt@kms_async_flips@alternate-sync-async-flip:
- shard-bmg: [FAIL][49] ([Intel XE#3718] / [Intel XE#6078]) -> [PASS][50]
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-9/igt@kms_async_flips@alternate-sync-async-flip.html
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-1/igt@kms_async_flips@alternate-sync-async-flip.html
* igt@kms_async_flips@alternate-sync-async-flip@pipe-b-dp-2:
- shard-bmg: [FAIL][51] ([Intel XE#6078]) -> [PASS][52]
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-9/igt@kms_async_flips@alternate-sync-async-flip@pipe-b-dp-2.html
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-1/igt@kms_async_flips@alternate-sync-async-flip@pipe-b-dp-2.html
* igt@kms_cursor_legacy@flip-vs-cursor-legacy:
- shard-bmg: [FAIL][53] ([Intel XE#7571]) -> [PASS][54]
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-4/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-7/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
* igt@kms_setmode@basic:
- shard-bmg: [FAIL][55] ([Intel XE#6361]) -> [PASS][56] +1 other test pass
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-8/igt@kms_setmode@basic.html
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-2/igt@kms_setmode@basic.html
#### Warnings ####
* igt@kms_frontbuffer_tracking@psrhdr-argb161616f-draw-render:
- shard-lnl: [ABORT][57] ([Intel XE#8007]) -> [SKIP][58] ([Intel XE#7061])
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-lnl-8/igt@kms_frontbuffer_tracking@psrhdr-argb161616f-draw-render.html
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-lnl-7/igt@kms_frontbuffer_tracking@psrhdr-argb161616f-draw-render.html
* igt@kms_tiled_display@basic-test-pattern-with-chamelium:
- shard-bmg: [SKIP][59] ([Intel XE#2426] / [Intel XE#5848]) -> [SKIP][60] ([Intel XE#2509] / [Intel XE#7437])
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4/shard-bmg-9/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/shard-bmg-1/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
[Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
[Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
[Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
[Intel XE#1421]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1421
[Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
[Intel XE#1503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1503
[Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
[Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
[Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
[Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
[Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
[Intel XE#323]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/323
[Intel XE#3321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3321
[Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
[Intel XE#366]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/366
[Intel XE#3718]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3718
[Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
[Intel XE#3904]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3904
[Intel XE#4130]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4130
[Intel XE#4354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4354
[Intel XE#5848]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5848
[Intel XE#6035]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6035
[Intel XE#6078]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6078
[Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
[Intel XE#6361]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6361
[Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
[Intel XE#6540]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6540
[Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
[Intel XE#6874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6874
[Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
[Intel XE#7059]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7059
[Intel XE#7061]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7061
[Intel XE#7085]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7085
[Intel XE#7136]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7136
[Intel XE#7138]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7138
[Intel XE#7173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7173
[Intel XE#7178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7178
[Intel XE#7283]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7283
[Intel XE#7294]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7294
[Intel XE#7342]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7342
[Intel XE#7343]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7343
[Intel XE#7351]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7351
[Intel XE#7356]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7356
[Intel XE#7366]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7366
[Intel XE#7370]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7370
[Intel XE#7386]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7386
[Intel XE#7399]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7399
[Intel XE#7437]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7437
[Intel XE#7482]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7482
[Intel XE#7571]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7571
[Intel XE#7636]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7636
[Intel XE#7793]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7793
[Intel XE#7865]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7865
[Intel XE#7866]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7866
[Intel XE#7905]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7905
[Intel XE#7915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7915
[Intel XE#7922]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7922
[Intel XE#7956]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7956
[Intel XE#8007]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/8007
[Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944
Build changes
-------------
* Linux: xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4 -> xe-pw-163428v8
IGT_8960: 88bd725754990332efcd158b0429f6ac7fb63862 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
xe-5246-9ce63e3d9903982c053b125686abcb7a84357ed4: 9ce63e3d9903982c053b125686abcb7a84357ed4
xe-pw-163428v8: 163428v8
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-163428v8/index.html
* Re: [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device
2026-06-12 8:07 ` [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
2026-06-12 8:21 ` sashiko-bot
@ 2026-06-15 8:06 ` Tauro, Riana
2026-06-18 13:24 ` Raag Jadav
2 siblings, 0 replies; 17+ messages in thread
From: Tauro, Riana @ 2026-06-15 8:06 UTC (permalink / raw)
To: Mallesh Koujalagi, intel-xe, dri-devel, rodrigo.vivi
Cc: andrealmeid, christian.koenig, airlied, simona.vetter, mripard,
maarten.lankhorst, tzimmermann, anshuman.gupta, badal.nilawar,
karthik.poosa, sk.anirban, raag.jadav
On 12-06-2026 13:37, Mallesh Koujalagi wrote:
> PUNIT errors can only be recovered using a power-cycle. Xe KMD
> sends a uevent to notify userspace to trigger a power cycle.
> On some platforms, the link drop caused by powering the device off and
> back on is reported by hardware as a Surprise Link Down (SLD), which
> AER then escalates as an Uncorrectable Fatal Error. That error fires
> before the device finishes coming back up and defeats the
> very recovery we are attempting.
>
> To keep the expected, recovery-induced link drop from being raised as
> a fatal AER event, mask the Surprise Link Down bit
> (PCI_ERR_UNC_SURPDN) in the upstream port's AER Uncorrectable Error
> Mask register before punit_error_handler() requests the cold reset.
>
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
> v6:
> - Expand commit message to explain why SUR_DN is masked. (Raag/Riana)
> - Check Slot Implemented bit before reading Slot Capabilities, per
> PCIe spec. (Riana)
> - Add debug log.
>
> v7:
> - Handle surprise link down event properly. (Aravind/Riana)
> - Update commit message. (Riana)
> - Correct log message.
>
> v8:
> - Use find_usp_dev() in punit_error_handler() function.
> ---
> drivers/gpu/drm/xe/xe_ras.c | 65 ++++++++++++++++++++++++++++---------
> 1 file changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 93a56a0269f1..15c2fa0d323a 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -209,8 +209,57 @@ static enum xe_ras_recovery_action handle_core_compute_errors(struct xe_ras_erro
> return XE_RAS_RECOVERY_ACTION_RECOVERED;
> }
>
> +static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
> +{
> + struct pci_dev *vsp;
> +
> + /*
> + * Device Hierarchy:
> + *
> + * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
> + */
> + vsp = pci_upstream_bridge(pdev);
> + if (!vsp)
> + return NULL;
> +
> + return pci_upstream_bridge(vsp);
> +}
> +
> +#ifdef CONFIG_PCIEAER
> +static void pcie_suppress_surprise_link_down(struct pci_dev *usp)
> +{
> + u32 aer_uncorr_mask;
> + u16 aer_cap;
> +
> + aer_cap = usp->aer_cap;
> + if (!aer_cap) {
> + dev_dbg(&usp->dev,
> + "AER capability not present\n");
> + return;
> + }
> +
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> + aer_uncorr_mask |= PCI_ERR_UNC_SURPDN;
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
> + dev_dbg(&usp->dev, "Surprise Link Down masked for cold reset\n");
> +}
> +#endif /* CONFIG_PCIEAER */
> +
> static void punit_error_handler(struct xe_device *xe)
> {
> +#ifdef CONFIG_PCIEAER
We can remove the ifdefs and use this instead of usp->aer_cap:
aer_cap = pci_find_ext_capability(usp, PCI_EXT_CAP_ID_ERR);
> + struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> + struct pci_dev *usp;
> +
> + /*
> + * Cold reset power-cycles the slot, dropping the PCIe link. The
> + * slot triggers a spurious Surprise Link Down AER event on the USP.
> + */
Also mention the reason for masking it. Move this comment to the
function that masks it.
Thanks
Riana
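For readers following the two suggestions above, here is a minimal userspace sketch, not kernel code. It models looking up the AER capability by walking the extended capability list (the job pci_find_ext_capability() does in the kernel) and then setting the Surprise Link Down bit in the Uncorrectable Error Mask register. The register offsets and bit values are the ones from include/uapi/linux/pci_regs.h. The `model_*` names and the config-space array are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_CFG_SPACE_SIZE	256	/* extended capabilities start here */
#define PCI_CFG_SPACE_EXP_SIZE	4096
#define PCI_EXT_CAP_ID_ERR	0x01	/* Advanced Error Reporting */
#define PCI_ERR_UNCOR_MASK	0x08	/* Uncorrectable Error Mask */
#define PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */

/* Hypothetical stand-in for struct pci_dev: only a config space image. */
struct model_dev {
	uint8_t cfg[PCI_CFG_SPACE_EXP_SIZE];
};

static uint32_t model_read_dword(const struct model_dev *dev, uint16_t pos)
{
	uint32_t val;

	memcpy(&val, &dev->cfg[pos], sizeof(val));
	return val;
}

static void model_write_dword(struct model_dev *dev, uint16_t pos, uint32_t val)
{
	memcpy(&dev->cfg[pos], &val, sizeof(val));
}

/*
 * Model of an extended capability lookup: each header holds the
 * capability ID in bits 15:0 and the next offset in bits 31:20.
 * Returns the capability offset, or 0 if it is not present.
 */
static uint16_t model_find_ext_capability(const struct model_dev *dev, uint16_t cap)
{
	uint16_t pos = PCI_CFG_SPACE_SIZE;
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0 && pos >= PCI_CFG_SPACE_SIZE) {
		uint32_t header = model_read_dword(dev, pos);

		if (header == 0)
			break;
		if ((header & 0xffff) == cap)
			return pos;
		pos = (header >> 20) & 0xffc;
	}
	return 0;
}

/*
 * A cold reset power-cycles the slot and drops the link on purpose, so
 * the resulting Surprise Link Down is expected. Mask it so it is not
 * raised as a fatal AER event while recovery is in progress.
 * Returns 1 if the bit was masked, 0 if no AER capability was found.
 */
static int model_suppress_surprise_link_down(struct model_dev *usp)
{
	uint16_t aer_cap;
	uint32_t mask;

	assert(usp);
	aer_cap = model_find_ext_capability(usp, PCI_EXT_CAP_ID_ERR);
	if (!aer_cap || aer_cap + PCI_ERR_UNCOR_MASK + 4 > PCI_CFG_SPACE_EXP_SIZE)
		return 0;

	mask = model_read_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK);
	mask |= PCI_ERR_UNC_SURPDN;
	model_write_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, mask);
	return 1;
}
```

Because the lookup goes through the capability list rather than a struct field, nothing here depends on a build-time option. The comment explaining why the bit is masked sits on the function that does the masking.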
> + usp = find_usp_dev(pdev);
> +
> + if (usp)
> + pcie_suppress_surprise_link_down(usp);
> +#endif
> xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
> xe_device_declare_wedged(xe);
> }
> @@ -503,22 +552,6 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
> return XE_RAS_RECOVERY_ACTION_RESET;
> }
>
> -static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
> -{
> - struct pci_dev *vsp;
> -
> - /*
> - * Device Hierarchy:
> - *
> - * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
> - */
> - vsp = pci_upstream_bridge(pdev);
> - if (!vsp)
> - return NULL;
> -
> - return pci_upstream_bridge(vsp);
> -}
> -
> static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
> {
> struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
* Re: [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device
2026-06-12 8:07 ` [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
2026-06-12 8:21 ` sashiko-bot
2026-06-15 8:06 ` Tauro, Riana
@ 2026-06-18 13:24 ` Raag Jadav
2 siblings, 0 replies; 17+ messages in thread
From: Raag Jadav @ 2026-06-18 13:24 UTC (permalink / raw)
To: Mallesh Koujalagi
Cc: intel-xe, dri-devel, rodrigo.vivi, andrealmeid, christian.koenig,
airlied, simona.vetter, mripard, maarten.lankhorst, tzimmermann,
anshuman.gupta, badal.nilawar, riana.tauro, karthik.poosa,
sk.anirban
On Fri, Jun 12, 2026 at 01:37:28PM +0530, Mallesh Koujalagi wrote:
> PUNIT errors can only be recovered using a power-cycle. Xe KMD
> sends a uevent to notify userspace to trigger a power cycle.
> On some platforms, the link drop caused by powering the device off and
> back on is reported by hardware as a Surprise Link Down (SLD), which
> AER then escalates as an Uncorrectable Fatal Error. That error fires
> before the device finishes coming back up and defeats the
> very recovery we are attempting.
>
> To keep the expected, recovery-induced link drop from being raised as
> a fatal AER event, mask the Surprise Link Down bit
> (PCI_ERR_UNC_SURPDN) in the upstream port's AER Uncorrectable Error
> Mask register before punit_error_handler() requests the cold reset.
>
> Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
> ---
> v6:
> - Expand commit message to explain why SUR_DN is masked. (Raag/Riana)
> - Check Slot Implemented bit before reading Slot Capabilities, per
> PCIe spec. (Riana)
> - Add debug log.
>
> v7:
> - Handle surprise link down event properly. (Aravind/Riana)
> - Update commit message. (Riana)
> - Correct log message.
>
> v8:
> - Use find_usp_dev() in punit_error_handler() function.
> ---
> drivers/gpu/drm/xe/xe_ras.c | 65 ++++++++++++++++++++++++++++---------
> 1 file changed, 49 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 93a56a0269f1..15c2fa0d323a 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -209,8 +209,57 @@ static enum xe_ras_recovery_action handle_core_compute_errors(struct xe_ras_erro
> return XE_RAS_RECOVERY_ACTION_RECOVERED;
> }
>
> +static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
> +{
> + struct pci_dev *vsp;
> +
> + /*
> + * Device Hierarchy:
> + *
> + * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
> + */
> + vsp = pci_upstream_bridge(pdev);
> + if (!vsp)
> + return NULL;
> +
> + return pci_upstream_bridge(vsp);
> +}
Unneeded churn. Please make sure the function is already at the top in
the original series.
> +#ifdef CONFIG_PCIEAER
> +static void pcie_suppress_surprise_link_down(struct pci_dev *usp)
> +{
> + u32 aer_uncorr_mask;
> + u16 aer_cap;
> +
> + aer_cap = usp->aer_cap;
> + if (!aer_cap) {
> + dev_dbg(&usp->dev,
> + "AER capability not present\n");
> + return;
> + }
> +
> + pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
> + aer_uncorr_mask |= PCI_ERR_UNC_SURPDN;
> + pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
> + dev_dbg(&usp->dev, "Surprise Link Down masked for cold reset\n");
What about when the device comes back up after a successful recovery? Do we
need to unmask it?
Raag
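The question above is left open in the thread. If unmasking does turn out to be needed, one possible shape is to remember whether the driver itself set the bit and restore only in that case, so a mask set earlier by firmware or another agent is left alone. The sketch below is a plain userspace model operating on a register value, not kernel code. The `sld_*` names and the state struct are hypothetical, and only the PCI_ERR_UNC_SURPDN value comes from include/uapi/linux/pci_regs.h. It does not address whether the USP mask register survives the cold reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value as defined in include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */

/* Hypothetical per-device bookkeeping for the recovery path. */
struct sld_mask_state {
	bool masked_by_us;	/* true only if we changed the bit */
};

/* Return the mask value to write before requesting the cold reset. */
static uint32_t sld_mask(uint32_t uncor_mask, struct sld_mask_state *st)
{
	assert(st);
	st->masked_by_us = !(uncor_mask & PCI_ERR_UNC_SURPDN);
	return uncor_mask | PCI_ERR_UNC_SURPDN;
}

/* Return the mask value to write once the device is back up. */
static uint32_t sld_unmask(uint32_t uncor_mask, struct sld_mask_state *st)
{
	assert(st);
	if (st->masked_by_us) {
		uncor_mask &= ~(uint32_t)PCI_ERR_UNC_SURPDN;
		st->masked_by_us = false;
	}
	return uncor_mask;
}
```

In a driver, the two helpers would wrap the existing read and write of PCI_ERR_UNCOR_MASK. Where the restore would be called from is not something this thread settles.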
> +}
> +#endif /* CONFIG_PCIEAER */
> +
> static void punit_error_handler(struct xe_device *xe)
> {
> +#ifdef CONFIG_PCIEAER
> + struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> + struct pci_dev *usp;
> +
> + /*
> + * Cold reset power-cycles the slot, dropping the PCIe link. The
> + * slot triggers a spurious Surprise Link Down AER event on the USP.
> + */
> + usp = find_usp_dev(pdev);
> +
> + if (usp)
> + pcie_suppress_surprise_link_down(usp);
> +#endif
> xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET);
> xe_device_declare_wedged(xe);
> }
> @@ -503,22 +552,6 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
> return XE_RAS_RECOVERY_ACTION_RESET;
> }
>
> -static struct pci_dev *find_usp_dev(struct pci_dev *pdev)
> -{
> - struct pci_dev *vsp;
> -
> - /*
> - * Device Hierarchy:
> - *
> - * Upstream Switch Port (USP) --> Virtual Switch Port (VSP) --> SGunit (GPU endpoint)
> - */
> - vsp = pci_upstream_bridge(pdev);
> - if (!vsp)
> - return NULL;
> -
> - return pci_upstream_bridge(vsp);
> -}
> -
> static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
> {
> struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> --
> 2.34.1
>
Thread overview: 17+ messages
2026-06-12 8:07 [PATCH v8 0/6] Introduce cold reset recovery method Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 1/6] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-06-12 8:24 ` sashiko-bot
2026-06-12 8:07 ` [PATCH v8 2/6] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 3/6] drm/doc: Document " Mallesh Koujalagi
2026-06-12 8:07 ` [PATCH v8 4/6] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
2026-06-12 8:27 ` sashiko-bot
2026-06-12 8:07 ` [PATCH v8 5/6] drm/xe: Suppress Surprise Link Down on device Mallesh Koujalagi
2026-06-12 8:21 ` sashiko-bot
2026-06-15 8:06 ` Tauro, Riana
2026-06-18 13:24 ` Raag Jadav
2026-06-12 8:07 ` [PATCH v8 6/6] drm/xe/ras: Add debugfs entry to inject punit error Mallesh Koujalagi
2026-06-12 8:23 ` sashiko-bot
2026-06-12 8:16 ` ✗ CI.checkpatch: warning for Introduce cold reset recovery method (rev8) Patchwork
2026-06-12 8:18 ` ✓ CI.KUnit: success " Patchwork
2026-06-12 9:03 ` ✓ Xe.CI.BAT: " Patchwork
2026-06-13 1:18 ` ✓ Xe.CI.FULL: " Patchwork