[PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling

public inbox for intel-xe@lists.freedesktop.org

* [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling
@ 2026-03-02 10:21 Riana Tauro
  2026-03-02 10:21 ` [PATCH v2 01/11] drm/xe/xe_sysctrl: Add System controller patch Riana Tauro
                   ` (14 more replies)
  0 siblings, 15 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:21 UTC
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

This series adds the base support for Xe Uncorrectable Error Handling
on top of the system controller patch [1].

The first four patches implement PCI error recovery callbacks for AER events.
On fatal errors, the device is wedged in error_detected and a Secondary
Bus Reset (SBR) is requested from the PCI core by returning
PCI_ERS_RESULT_NEED_RESET.

On non-fatal errors, the mmio_enabled callback is invoked to query the
error and attempt the required recovery.
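
For illustration only, the callback flow described above can be modelled
in plain userspace C. This is a minimal sketch, not code from the series:
every model_* name is hypothetical and merely stands in for the kernel's
pci_channel_state_t and pci_ers_result_t values, and the mapping of a
permanent failure to a disconnect is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pci_channel_state_t values. */
enum model_channel_state {
	MODEL_IO_NORMAL,	/* non-fatal error, MMIO still usable */
	MODEL_IO_FROZEN,	/* fatal error, I/O to the device is blocked */
	MODEL_IO_PERM_FAILURE,	/* device is considered lost */
};

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum model_ers_result {
	MODEL_CAN_RECOVER,
	MODEL_NEED_RESET,
	MODEL_DISCONNECT,
	MODEL_RECOVERED,
};

struct model_device {
	bool wedged;	/* set once the device has been declared wedged */
};

/* Models error_detected: fatal errors wedge the device and ask for a reset. */
static enum model_ers_result
model_error_detected(struct model_device *dev, enum model_channel_state state)
{
	switch (state) {
	case MODEL_IO_NORMAL:
		/* Non-fatal: let the core call mmio_enabled next. */
		return MODEL_CAN_RECOVER;
	case MODEL_IO_FROZEN:
		dev->wedged = true;
		return MODEL_NEED_RESET;
	case MODEL_IO_PERM_FAILURE:
	default:
		/* Assumption: nothing can be recovered in this state. */
		return MODEL_DISCONNECT;
	}
}

/*
 * Models mmio_enabled: the error has been queried and a recovery attempted.
 * Assumption: a failed local recovery escalates to a reset request.
 */
static enum model_ers_result model_mmio_enabled(bool recovered_locally)
{
	return recovered_locally ? MODEL_RECOVERED : MODEL_NEED_RESET;
}
```

In this model only the frozen (fatal) state wedges the device; the
non-fatal path defers the decision to the mmio_enabled step.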

This series adds support for handling Uncorrectable Core-Compute
and SoC Internal errors.

Core-Compute errors: Uncorrectable Core-Compute errors are classified
into Global and Local errors.
A Global error affects the entire device and requires a reset.
This type of error is not isolated. When an AER is reported and
error_detected is invoked, the driver returns PCI_ERS_RESULT_NEED_RESET.
A Local error is confined to a specific component or context, such as an
engine. These errors can be contained and recovered by resetting
only the affected part without disrupting the rest of the device.

SoC Internal errors: Most uncorrectable SoC internal errors
are recovered using an SBR, apart from CSC firmware and Punit errors.
CSC firmware errors require a firmware flash to be recovered, whereas
Punit errors require a cold reset.
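
As a rough summary of the classification above, the mapping from error
class to recovery action could be sketched as the lookup below. This is
an illustrative userspace model only: the enum and function names are
hypothetical and do not appear in the series.

```c
/* Hypothetical error classes, following the cover letter's classification. */
enum model_error_class {
	MODEL_CORE_COMPUTE_GLOBAL,	/* affects the entire device */
	MODEL_CORE_COMPUTE_LOCAL,	/* confined to e.g. one engine */
	MODEL_SOC_INTERNAL,		/* generic SoC internal error */
	MODEL_SOC_CSC_FIRMWARE,		/* CSC firmware error */
	MODEL_SOC_PUNIT,		/* Punit error */
};

/* Hypothetical recovery actions named in the cover letter. */
enum model_recovery {
	MODEL_RECOVER_LOCAL_RESET,	/* reset only the affected part */
	MODEL_RECOVER_SBR,		/* Secondary Bus Reset */
	MODEL_RECOVER_FW_FLASH,		/* firmware flash */
	MODEL_RECOVER_COLD_RESET,	/* cold reset */
};

/* Map an error class to the recovery action described in the text. */
static enum model_recovery model_recovery_for(enum model_error_class cls)
{
	switch (cls) {
	case MODEL_CORE_COMPUTE_LOCAL:
		return MODEL_RECOVER_LOCAL_RESET;
	case MODEL_SOC_CSC_FIRMWARE:
		return MODEL_RECOVER_FW_FLASH;
	case MODEL_SOC_PUNIT:
		return MODEL_RECOVER_COLD_RESET;
	case MODEL_CORE_COMPUTE_GLOBAL:
	case MODEL_SOC_INTERNAL:
	default:
		/* Device-wide errors fall back to a Secondary Bus Reset. */
		return MODEL_RECOVER_SBR;
	}
}
```

Only the Local Core-Compute case avoids a device-wide action; the CSC
firmware and Punit cases cannot be recovered by an SBR.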

Rev2: Add support for SoC internal errors
      fix review comments

[1] https://patchwork.freedesktop.org/series/159554/

Anoop Vijay (1):
  drm/xe/xe_sysctrl: Add System controller patch

Riana Tauro (10):
  drm/xe/xe_survivability: Decouple survivability info from boot
    survivability
  drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  drm/xe/xe_pci_error: Group all devres to release them on PCIe slot
    reset
  drm/xe: Skip device access during PCI error recovery
  drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  drm/xe/xe_ras: Add structures and commands for Uncorrectable Core
    Compute Errors
  drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  drm/xe/xe_ras: Add structures for SoC Internal errors
  drm/xe/xe_ras: Handle Uncorrectable SoC Internal errors
  drm/xe/xe_pci_error: Process errors in mmio_enabled

 drivers/gpu/drm/xe/Makefile                   |   4 +
 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h     |  36 ++
 drivers/gpu/drm/xe/xe_device.c                |  15 +
 drivers/gpu/drm/xe/xe_device.h                |  15 +
 drivers/gpu/drm/xe/xe_device_types.h          |  12 +
 drivers/gpu/drm/xe/xe_gt.c                    |  11 +-
 drivers/gpu/drm/xe/xe_guc_submit.c            |   9 +-
 drivers/gpu/drm/xe/xe_pci.c                   |   5 +
 drivers/gpu/drm/xe/xe_pci_error.c             | 111 +++++
 drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
 drivers/gpu/drm/xe/xe_ras.c                   | 325 +++++++++++++++
 drivers/gpu/drm/xe/xe_ras.h                   |  16 +
 drivers/gpu/drm/xe/xe_ras_types.h             | 282 +++++++++++++
 drivers/gpu/drm/xe/xe_survivability_mode.c    |  12 +-
 drivers/gpu/drm/xe/xe_sysctrl.c               |  80 ++++
 drivers/gpu/drm/xe/xe_sysctrl.h               |  13 +
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c       | 390 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h       |  35 ++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  55 +++
 drivers/gpu/drm/xe/xe_sysctrl_types.h         |  33 ++
 20 files changed, 1452 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.h
 create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_types.h

-- 
2.47.1



* [PATCH v2 01/11] drm/xe/xe_sysctrl: Add System controller patch
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
@ 2026-03-02 10:21 ` Riana Tauro
  2026-03-02 10:21 ` [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability Riana Tauro
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:21 UTC
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi, Anoop Vijay

From: Anoop Vijay <anoop.c.vijay@intel.com>

DO NOT REVIEW. COMPILATION ONLY
This patch is from https://patchwork.freedesktop.org/series/159554/
Added only for Compilation.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/Makefile                   |   2 +
 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h     |  36 ++
 drivers/gpu/drm/xe/xe_device.c                |   5 +
 drivers/gpu/drm/xe/xe_device_types.h          |   6 +
 drivers/gpu/drm/xe/xe_pci.c                   |   2 +
 drivers/gpu/drm/xe/xe_pci_types.h             |   1 +
 drivers/gpu/drm/xe/xe_sysctrl.c               |  80 ++++
 drivers/gpu/drm/xe/xe_sysctrl.h               |  13 +
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c       | 390 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h       |  35 ++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  42 ++
 drivers/gpu/drm/xe/xe_sysctrl_types.h         |  33 ++
 12 files changed, 645 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
 create mode 100644 drivers/gpu/drm/xe/xe_sysctrl_types.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index ff778fb2d4ff..1890bbd1b28d 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -122,6 +122,8 @@ xe-y += xe_bb.o \
 	xe_step.o \
 	xe_survivability_mode.o \
 	xe_sync.o \
+	xe_sysctrl.o \
+	xe_sysctrl_mailbox.o \
 	xe_tile.o \
 	xe_tile_sysfs.o \
 	xe_tlb_inval.o \
diff --git a/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h b/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
new file mode 100644
index 000000000000..2e91febfa9a2
--- /dev/null
+++ b/drivers/gpu/drm/xe/regs/xe_sysctrl_regs.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_REGS_H_
+#define _XE_SYSCTRL_REGS_H_
+
+#include "xe_regs.h"
+
+#define SYSCTRL_BASE_OFFSET			0xdb000
+#define SYSCTRL_BASE				(SOC_BASE + SYSCTRL_BASE_OFFSET)
+#define SYSCTRL_MAILBOX_INDEX			0x03
+#define SYSCTRL_BAR_LENGTH			0x1000
+
+#define SYSCTRL_MB_CTRL				XE_REG(0x10)
+#define   SYSCTRL_MB_CTRL_RUN_BUSY		REG_BIT(31)
+#define   SYSCTRL_MB_CTRL_IRQ			REG_BIT(30)
+#define   SYSCTRL_MB_CTRL_RUN_BUSY_OUT		REG_BIT(29)
+#define   SYSCTRL_MB_CTRL_PARAM3_MASK		REG_GENMASK(28, 24)
+#define   SYSCTRL_MB_CTRL_PARAM2_MASK		REG_GENMASK(23, 16)
+#define   SYSCTRL_MB_CTRL_PARAM1_MASK		REG_GENMASK(15, 8)
+#define   SYSCTRL_MB_CTRL_COMMAND_MASK		REG_GENMASK(7, 0)
+#define   SYSCTRL_MB_CTRL_MKHI_CMD		REG_FIELD_PREP(SYSCTRL_MB_CTRL_COMMAND_MASK, 5)
+
+#define SYSCTRL_MB_DATA0			XE_REG(0x14)
+#define SYSCTRL_MB_DATA1			XE_REG(0x18)
+#define SYSCTRL_MB_DATA2			XE_REG(0x1C)
+#define SYSCTRL_MB_DATA3			XE_REG(0x20)
+
+#define MKHI_FRAME_PHASE			REG_BIT(24)
+#define MKHI_FRAME_CURRENT_MASK			REG_GENMASK(21, 16)
+#define MKHI_FRAME_TOTAL_MASK			REG_GENMASK(13, 8)
+#define MKHI_FRAME_COMMAND_MASK			REG_GENMASK(7, 0)
+
+#endif /* _XE_SYSCTRL_REGS_H_ */
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 3462645ca13c..1d61bb504e9b 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -65,6 +65,7 @@
 #include "xe_survivability_mode.h"
 #include "xe_sriov.h"
 #include "xe_svm.h"
+#include "xe_sysctrl.h"
 #include "xe_tile.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_ttm_sys_mgr.h"
@@ -985,6 +986,10 @@ int xe_device_probe(struct xe_device *xe)
 	if (err)
 		goto err_unregister_display;
 
+	err = xe_sysctrl_init(xe);
+	if (err)
+		goto err_unregister_display;
+
 	err = xe_device_sysfs_init(xe);
 	if (err)
 		goto err_unregister_display;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index caa8f34a6744..5599534384fa 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -25,6 +25,7 @@
 #include "xe_sriov_vf_types.h"
 #include "xe_sriov_vf_ccs_types.h"
 #include "xe_step_types.h"
+#include "xe_sysctrl_types.h"
 #include "xe_survivability_mode_types.h"
 #include "xe_tile_types.h"
 #include "xe_validation.h"
@@ -203,6 +204,8 @@ struct xe_device {
 		u8 has_soc_remapper_telem:1;
 		/** @info.has_sriov: Supports SR-IOV */
 		u8 has_sriov:1;
+		/** @info.has_sysctrl: Supports System Controller */
+		u8 has_sysctrl:1;
 		/** @info.has_usm: Device has unified shared memory support */
 		u8 has_usm:1;
 		/** @info.has_64bit_timestamp: Device supports 64-bit timestamps */
@@ -471,6 +474,9 @@ struct xe_device {
 	/** @heci_gsc: graphics security controller */
 	struct xe_heci_gsc heci_gsc;
 
+	/** @sc: System Controller */
+	struct xe_sysctrl sc;
+
 	/** @nvm: discrete graphics non-volatile memory */
 	struct intel_dg_nvm_dev *nvm;
 
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 3ac99472d6dd..ad1e5ef2ee89 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -462,6 +462,7 @@ static const struct xe_device_desc cri_desc = {
 	.has_soc_remapper_sysctrl = true,
 	.has_soc_remapper_telem = true,
 	.has_sriov = true,
+	.has_sysctrl = true,
 	.max_gt_per_tile = 2,
 	MULTI_LRC_MASK,
 	.require_force_probe = true,
@@ -761,6 +762,7 @@ static int xe_info_init_early(struct xe_device *xe,
 	xe->info.has_soc_remapper_telem = desc->has_soc_remapper_telem;
 	xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) &&
 		desc->has_sriov;
+	xe->info.has_sysctrl = desc->has_sysctrl;
 	xe->info.skip_guc_pc = desc->skip_guc_pc;
 	xe->info.skip_mtcfg = desc->skip_mtcfg;
 	xe->info.skip_pcode = desc->skip_pcode;
diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
index 47e8a1552c2b..6ffef99b7973 100644
--- a/drivers/gpu/drm/xe/xe_pci_types.h
+++ b/drivers/gpu/drm/xe/xe_pci_types.h
@@ -57,6 +57,7 @@ struct xe_device_desc {
 	u8 has_soc_remapper_sysctrl:1;
 	u8 has_soc_remapper_telem:1;
 	u8 has_sriov:1;
+	u8 has_sysctrl:1;
 	u8 needs_scratch:1;
 	u8 skip_guc_pc:1;
 	u8 skip_mtcfg:1;
diff --git a/drivers/gpu/drm/xe/xe_sysctrl.c b/drivers/gpu/drm/xe/xe_sysctrl.c
new file mode 100644
index 000000000000..430bccbdc3b9
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <drm/drm_managed.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+
+#include "regs/xe_sysctrl_regs.h"
+#include "xe_device.h"
+#include "xe_mmio.h"
+#include "xe_printk.h"
+#include "xe_soc_remapper.h"
+#include "xe_sysctrl.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_types.h"
+
+/**
+ * DOC: System Controller (sysctrl)
+ *
+ * The System Controller (sysctrl) is an embedded microcontroller in Intel GPUs
+ * responsible for managing various low-level platform functions. Communication
+ * between the driver and the System Controller occurs via a mailbox interface,
+ * enabling the exchange of commands and responses.
+ *
+ * This module provides initialization routines and helper functions to interact
+ * with the System Controller through the mailbox.
+ */
+
+static void xe_sysctrl_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+
+	xe->soc_remapper.set_sysctrl_region(xe, 0);
+}
+
+/**
+ * xe_sysctrl_init - Initialize System Controller subsystem
+ * @xe: xe device instance
+ *
+ * Entry point for System Controller initialization, called from xe_device_probe.
+ * This function checks platform support and initializes the system controller.
+ *
+ * Return: 0 on success, error code on failure
+ */
+int xe_sysctrl_init(struct xe_device *xe)
+{
+	struct xe_tile *tile = xe_device_get_root_tile(xe);
+	struct xe_sysctrl *sc = &xe->sc;
+	int ret;
+
+	if (!xe->info.has_sysctrl)
+		return 0;
+
+	if (!xe->soc_remapper.set_sysctrl_region)
+		return -ENODEV;
+
+	xe->soc_remapper.set_sysctrl_region(xe, SYSCTRL_MAILBOX_INDEX);
+
+	ret = devm_add_action_or_reset(xe->drm.dev, xe_sysctrl_fini, xe);
+	if (ret)
+		return ret;
+
+	sc->mmio = devm_kzalloc(xe->drm.dev, sizeof(*sc->mmio), GFP_KERNEL);
+	if (!sc->mmio)
+		return -ENOMEM;
+
+	xe_mmio_init(sc->mmio, tile, tile->mmio.regs, tile->mmio.regs_size);
+	sc->mmio->adj_offset = SYSCTRL_BASE;
+	sc->mmio->adj_limit = U32_MAX;
+
+	ret = drmm_mutex_init(&xe->drm, &sc->cmd_lock);
+	if (ret)
+		return ret;
+
+	xe_sysctrl_mailbox_init(sc);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/xe/xe_sysctrl.h b/drivers/gpu/drm/xe/xe_sysctrl.h
new file mode 100644
index 000000000000..ee7826fe4c98
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_H_
+#define _XE_SYSCTRL_H_
+
+struct xe_device;
+
+int xe_sysctrl_init(struct xe_device *xe);
+
+#endif /* _XE_SYSCTRL_H_ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
new file mode 100644
index 000000000000..15a186a6f057
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <linux/bitfield.h>
+#include <linux/cleanup.h>
+#include <linux/container_of.h>
+#include <linux/errno.h>
+#include <linux/minmax.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include "regs/xe_sysctrl_regs.h"
+#include "xe_device.h"
+#include "xe_device_types.h"
+#include "xe_mmio.h"
+#include "xe_pm.h"
+#include "xe_printk.h"
+#include "xe_sysctrl.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_mailbox_types.h"
+#include "xe_sysctrl_types.h"
+
+#define MKHI_HDR_GROUP_ID_MASK		GENMASK(7, 0)
+#define MKHI_HDR_COMMAND_MASK		GENMASK(14, 8)
+#define MKHI_HDR_IS_RESPONSE		BIT(15)
+#define MKHI_HDR_RESERVED_MASK		GENMASK(23, 16)
+#define MKHI_HDR_RESULT_MASK		GENMASK(31, 24)
+
+#define XE_SYSCTRL_MKHI_HDR_GROUP_ID(hdr) \
+	FIELD_GET(MKHI_HDR_GROUP_ID_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_COMMAND(hdr) \
+	FIELD_GET(MKHI_HDR_COMMAND_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_IS_RESPONSE(hdr) \
+	FIELD_GET(MKHI_HDR_IS_RESPONSE, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_MKHI_HDR_RESULT(hdr) \
+	FIELD_GET(MKHI_HDR_RESULT_MASK, le32_to_cpu((hdr)->data))
+
+static struct xe_device *sc_to_xe(struct xe_sysctrl *sc)
+{
+	return container_of(sc, struct xe_device, sc);
+}
+
+static bool xe_sysctrl_mailbox_wait_bit_clear(struct xe_sysctrl *sc, u32 bit_mask,
+					      unsigned int timeout_ms)
+{
+	int ret;
+
+	ret = xe_mmio_wait32_not(sc->mmio, SYSCTRL_MB_CTRL, bit_mask, bit_mask,
+				 timeout_ms * 1000, NULL, false);
+
+	return ret == 0;
+}
+
+static bool xe_sysctrl_mailbox_wait_bit_set(struct xe_sysctrl *sc, u32 bit_mask,
+					    unsigned int timeout_ms)
+{
+	int ret;
+
+	ret = xe_mmio_wait32(sc->mmio, SYSCTRL_MB_CTRL, bit_mask, bit_mask,
+			     timeout_ms * 1000, NULL, false);
+
+	return ret == 0;
+}
+
+static int xe_sysctrl_mailbox_write_frame(struct xe_sysctrl *sc, const void *frame,
+					  size_t len)
+{
+	static const struct xe_reg regs[] = {
+		SYSCTRL_MB_DATA0, SYSCTRL_MB_DATA1, SYSCTRL_MB_DATA2, SYSCTRL_MB_DATA3
+	};
+	u32 val[SYSCTRL_MB_FRAME_SIZE / sizeof(u32)] = {0};
+	u32 dw = DIV_ROUND_UP(len, sizeof(u32));
+	u32 i;
+
+	memcpy(val, frame, len);
+
+	for (i = 0; i < dw; i++)
+		xe_mmio_write32(sc->mmio, regs[i], val[i]);
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_read_frame(struct xe_sysctrl *sc, void *frame,
+					 size_t len)
+{
+	static const struct xe_reg regs[] = {
+		SYSCTRL_MB_DATA0, SYSCTRL_MB_DATA1, SYSCTRL_MB_DATA2, SYSCTRL_MB_DATA3
+	};
+	u32 val[SYSCTRL_MB_FRAME_SIZE / sizeof(u32)] = {0};
+	u32 dw = DIV_ROUND_UP(len, sizeof(u32));
+	u32 i;
+
+	for (i = 0; i < dw; i++)
+		val[i] = xe_mmio_read32(sc->mmio, regs[i]);
+
+	memcpy(frame, val, len);
+
+	return 0;
+}
+
+static void xe_sysctrl_mailbox_clear_response(struct xe_sysctrl *sc)
+{
+	xe_mmio_rmw32(sc->mmio, SYSCTRL_MB_CTRL, SYSCTRL_MB_CTRL_RUN_BUSY_OUT, 0);
+}
+
+static int xe_sysctrl_mailbox_prepare_command(struct xe_device *xe,
+					      u8 group_id, u8 command,
+					      const void *data_in, size_t data_in_len,
+					      u8 **mbox_cmd, size_t *cmd_size)
+{
+	struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	size_t size;
+	u8 *buffer;
+
+	if (data_in_len > SYSCTRL_MB_MAX_MESSAGE_SIZE - sizeof(*mkhi_hdr)) {
+		xe_err(xe, "sysctrl: Input data too large: %zu bytes\n", data_in_len);
+		return -EINVAL;
+	}
+
+	size = sizeof(*mkhi_hdr) + data_in_len;
+
+	buffer = kmalloc(size, GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	mkhi_hdr = (struct xe_sysctrl_mailbox_mkhi_msg_hdr *)buffer;
+	mkhi_hdr->data = cpu_to_le32(FIELD_PREP(MKHI_HDR_GROUP_ID_MASK, group_id) |
+				     FIELD_PREP(MKHI_HDR_COMMAND_MASK, command & 0x7F) |
+				     FIELD_PREP(MKHI_HDR_IS_RESPONSE, 0) |
+				     FIELD_PREP(MKHI_HDR_RESERVED_MASK, 0) |
+				     FIELD_PREP(MKHI_HDR_RESULT_MASK, 0));
+
+	if (data_in && data_in_len)
+		memcpy(buffer + sizeof(*mkhi_hdr), data_in, data_in_len);
+
+	*mbox_cmd = buffer;
+	*cmd_size = size;
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_send_frames(struct xe_sysctrl *sc,
+					  const u8 *mbox_cmd,
+					  size_t cmd_size, unsigned int timeout_ms)
+{
+	struct xe_device *xe = sc_to_xe(sc);
+	u32 ctrl_reg, total_frames, frame;
+	size_t bytes_sent, frame_size;
+
+	total_frames = DIV_ROUND_UP(cmd_size, SYSCTRL_MB_FRAME_SIZE);
+
+	if (!xe_sysctrl_mailbox_wait_bit_clear(sc, SYSCTRL_MB_CTRL_RUN_BUSY, timeout_ms)) {
+		xe_err(xe, "sysctrl: Mailbox busy\n");
+		return -EBUSY;
+	}
+
+	sc->phase_bit ^= 1;
+	bytes_sent = 0;
+
+	for (frame = 0; frame < total_frames; frame++) {
+		frame_size = min_t(size_t, cmd_size - bytes_sent, SYSCTRL_MB_FRAME_SIZE);
+
+		if (xe_sysctrl_mailbox_write_frame(sc, mbox_cmd + bytes_sent, frame_size)) {
+			xe_err(xe, "sysctrl: Failed to write frame %u\n", frame);
+			sc->phase_bit = 0;
+			return -EIO;
+		}
+
+		ctrl_reg = SYSCTRL_MB_CTRL_RUN_BUSY |
+			   FIELD_PREP(MKHI_FRAME_CURRENT_MASK, frame) |
+			   FIELD_PREP(MKHI_FRAME_TOTAL_MASK, total_frames - 1) |
+			   SYSCTRL_MB_CTRL_MKHI_CMD |
+			   (sc->phase_bit ? MKHI_FRAME_PHASE : 0);
+
+		xe_mmio_write32(sc->mmio, SYSCTRL_MB_CTRL, ctrl_reg);
+
+		if (!xe_sysctrl_mailbox_wait_bit_clear(sc, SYSCTRL_MB_CTRL_RUN_BUSY, timeout_ms)) {
+			xe_err(xe, "sysctrl: Frame %u acknowledgment timeout\n", frame);
+			sc->phase_bit = 0;
+			return -ETIMEDOUT;
+		}
+
+		bytes_sent += frame_size;
+	}
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_process_frame(struct xe_sysctrl *sc, void *out,
+					    size_t frame_size, unsigned int timeout_ms,
+					    bool *done)
+{
+	u32 curr_frame, total_frames, ctrl_reg;
+	struct xe_device *xe = sc_to_xe(sc);
+	int ret;
+
+	if (!xe_sysctrl_mailbox_wait_bit_set(sc, SYSCTRL_MB_CTRL_RUN_BUSY_OUT, timeout_ms)) {
+		xe_err(xe, "sysctrl: Response frame timeout\n");
+		return -ETIMEDOUT;
+	}
+
+	ctrl_reg = xe_mmio_read32(sc->mmio, SYSCTRL_MB_CTRL);
+	total_frames = FIELD_GET(MKHI_FRAME_TOTAL_MASK, ctrl_reg);
+	curr_frame = FIELD_GET(MKHI_FRAME_CURRENT_MASK, ctrl_reg);
+
+	ret = xe_sysctrl_mailbox_read_frame(sc, out, frame_size);
+	if (ret)
+		return ret;
+
+	xe_sysctrl_mailbox_clear_response(sc);
+
+	if (curr_frame == total_frames)
+		*done = true;
+
+	return 0;
+}
+
+static int xe_sysctrl_mailbox_receive_frames(struct xe_sysctrl *sc,
+					     const struct xe_sysctrl_mailbox_mkhi_msg_hdr *req,
+					     void *data_out, size_t data_out_len,
+					     size_t *rdata_len, unsigned int timeout_ms)
+{
+	struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	struct xe_device *xe = sc_to_xe(sc);
+	size_t frame_size, remain;
+	bool done = false;
+	u8 *out;
+	int ret = 0;
+
+	remain = sizeof(*mkhi_hdr) + data_out_len;
+	u8 *buffer __free(kfree) = kzalloc(remain, GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	out = buffer;
+	while (!done && remain) {
+		frame_size = min_t(size_t, remain, SYSCTRL_MB_FRAME_SIZE);
+
+		ret = xe_sysctrl_mailbox_process_frame(sc, out, frame_size, timeout_ms,
+						       &done);
+		if (ret)
+			return ret;
+
+		remain -= frame_size;
+		out += frame_size;
+	}
+
+	mkhi_hdr = (struct xe_sysctrl_mailbox_mkhi_msg_hdr *)buffer;
+
+	if (!XE_SYSCTRL_MKHI_HDR_IS_RESPONSE(mkhi_hdr) ||
+	    XE_SYSCTRL_MKHI_HDR_GROUP_ID(mkhi_hdr) != XE_SYSCTRL_MKHI_HDR_GROUP_ID(req) ||
+	    XE_SYSCTRL_MKHI_HDR_COMMAND(mkhi_hdr) != XE_SYSCTRL_MKHI_HDR_COMMAND(req)) {
+		xe_err(xe, "sysctrl: Response header mismatch\n");
+		return -EPROTO;
+	}
+
+	if (XE_SYSCTRL_MKHI_HDR_RESULT(mkhi_hdr) != 0) {
+		xe_err(xe, "sysctrl: Firmware error: 0x%02lx\n",
+		       XE_SYSCTRL_MKHI_HDR_RESULT(mkhi_hdr));
+		return -EIO;
+	}
+
+	memcpy(data_out, mkhi_hdr + 1, data_out_len);
+	*rdata_len = out - buffer - sizeof(*mkhi_hdr);
+
+	return ret;
+}
+
+static int xe_sysctrl_mailbox_send_command(struct xe_sysctrl *sc,
+					   const u8 *mbox_cmd, size_t cmd_size,
+					   void *data_out, size_t data_out_len,
+					   size_t *rdata_len, unsigned int timeout_ms)
+{
+	const struct xe_sysctrl_mailbox_mkhi_msg_hdr *mkhi_hdr;
+	size_t received;
+	int ret;
+
+	ret = xe_sysctrl_mailbox_send_frames(sc, mbox_cmd, cmd_size, timeout_ms);
+	if (ret)
+		return ret;
+
+	if (!data_out || !rdata_len)
+		return 0;
+
+	mkhi_hdr = (const struct xe_sysctrl_mailbox_mkhi_msg_hdr *)mbox_cmd;
+
+	ret = xe_sysctrl_mailbox_receive_frames(sc, mkhi_hdr, data_out, data_out_len,
+						&received, timeout_ms);
+	if (ret)
+		return ret;
+
+	*rdata_len = received;
+
+	return 0;
+}
+
+/**
+ * xe_sysctrl_mailbox_init - Initialize System Controller mailbox interface
+ * @sc: System controller structure
+ *
+ * Initialize system controller mailbox interface for communication.
+ */
+void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc)
+{
+	u32 ctrl_reg;
+
+	ctrl_reg = xe_mmio_read32(sc->mmio, SYSCTRL_MB_CTRL);
+	sc->phase_bit = (ctrl_reg & MKHI_FRAME_PHASE) ? 1 : 0;
+}
+
+/**
+ * xe_sysctrl_send_command - Send command to System Controller via mailbox
+ * @xe: XE device instance
+ * @cmd: Pointer to xe_sysctrl_mailbox_command structure
+ * @rdata_len: Pointer to store actual response data size (can be NULL)
+ *
+ * Send a command to the System Controller using MKHI protocol. Handles
+ * command preparation, fragmentation, transmission, and response reception.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int xe_sysctrl_send_command(struct xe_device *xe,
+			    struct xe_sysctrl_mailbox_command *cmd,
+			    size_t *rdata_len)
+{
+	struct xe_sysctrl *sc;
+	u8 group_id, command_code;
+	u8 *mbox_cmd = NULL;
+	size_t cmd_size = 0;
+	int ret = 0;
+
+	if (!xe) {
+		pr_err("sysctrl: Invalid device handle\n");
+		return -EINVAL;
+	}
+
+	if (!xe->info.has_sysctrl)
+		return -ENODEV;
+
+	sc = &xe->sc;
+
+	if (!cmd) {
+		xe_err(xe, "sysctrl: Invalid command buffer\n");
+		return -EINVAL;
+	}
+
+	group_id = XE_SYSCTRL_APP_HDR_GROUP_ID(&cmd->header);
+	command_code = XE_SYSCTRL_APP_HDR_COMMAND(&cmd->header);
+
+	if (!cmd->data_in && cmd->data_in_len) {
+		xe_err(xe, "sysctrl: Invalid input parameters\n");
+		return -EINVAL;
+	}
+
+	if (!cmd->data_out && cmd->data_out_len) {
+		xe_err(xe, "sysctrl: Invalid output parameters\n");
+		return -EINVAL;
+	}
+
+	might_sleep();
+
+	ret = xe_sysctrl_mailbox_prepare_command(xe, group_id, command_code,
+						 cmd->data_in, cmd->data_in_len,
+						 &mbox_cmd, &cmd_size);
+	if (ret) {
+		xe_err(xe, "sysctrl: Failed to prepare command: %d\n", ret);
+		return ret;
+	}
+
+	guard(xe_pm_runtime)(xe);
+
+	guard(mutex)(&sc->cmd_lock);
+
+	ret = xe_sysctrl_mailbox_send_command(sc, mbox_cmd, cmd_size,
+					      cmd->data_out, cmd->data_out_len, rdata_len,
+					      SYSCTRL_MB_DEFAULT_TIMEOUT_MS);
+	if (ret)
+		xe_err(xe, "sysctrl: Mailbox command failed: %d\n", ret);
+
+	kfree(mbox_cmd);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
new file mode 100644
index 000000000000..2b64165c8e76
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __XE_SYSCTRL_MAILBOX_H__
+#define __XE_SYSCTRL_MAILBOX_H__
+
+#include <linux/bitfield.h>
+#include <linux/types.h>
+
+struct xe_sysctrl;
+struct xe_device;
+struct xe_sysctrl_mailbox_command;
+
+#define APP_HDR_GROUP_ID_MASK			GENMASK(7, 0)
+#define APP_HDR_COMMAND_MASK			GENMASK(15, 8)
+#define APP_HDR_VERSION_MASK			GENMASK(23, 16)
+#define APP_HDR_RESERVED_MASK			GENMASK(31, 24)
+
+#define XE_SYSCTRL_APP_HDR_GROUP_ID(hdr) \
+	FIELD_GET(APP_HDR_GROUP_ID_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_APP_HDR_COMMAND(hdr) \
+	FIELD_GET(APP_HDR_COMMAND_MASK, le32_to_cpu((hdr)->data))
+
+#define XE_SYSCTRL_APP_HDR_VERSION(hdr) \
+	FIELD_GET(APP_HDR_VERSION_MASK, le32_to_cpu((hdr)->data))
+
+void xe_sysctrl_mailbox_init(struct xe_sysctrl *sc);
+int xe_sysctrl_send_command(struct xe_device *xe,
+			    struct xe_sysctrl_mailbox_command *cmd,
+			    size_t *rdata_len);
+
+#endif /* __XE_SYSCTRL_MAILBOX_H__ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
new file mode 100644
index 000000000000..ce10924c5881
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __XE_SYSCTRL_MAILBOX_TYPES_H__
+#define __XE_SYSCTRL_MAILBOX_TYPES_H__
+
+#include <linux/types.h>
+
+struct xe_sysctrl_mailbox_mkhi_msg_hdr {
+	__le32 data;
+} __packed;
+
+struct xe_sysctrl_mailbox_app_msg_hdr {
+	__le32 data;
+} __packed;
+
+struct xe_sysctrl_mailbox_command {
+	/** @header: Application message header containing command information */
+	struct xe_sysctrl_mailbox_app_msg_hdr header;
+
+	/** @data_in: Pointer to input payload data (can be NULL if no input data) */
+	void *data_in;
+
+	/** @data_in_len: Size of input payload in bytes (0 if no input data) */
+	size_t data_in_len;
+
+	/** @data_out: Pointer to output buffer for response data (can be NULL if no response) */
+	void *data_out;
+
+	/** @data_out_len: Size of output buffer in bytes (0 if no response expected) */
+	size_t data_out_len;
+};
+
+#define SYSCTRL_MB_FRAME_SIZE			16
+#define SYSCTRL_MB_MAX_FRAMES			64
+#define SYSCTRL_MB_MAX_MESSAGE_SIZE		(SYSCTRL_MB_FRAME_SIZE * SYSCTRL_MB_MAX_FRAMES)
+
+#define SYSCTRL_MB_DEFAULT_TIMEOUT_MS		500
+
+#endif /* __XE_SYSCTRL_MAILBOX_TYPES_H__ */
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_types.h b/drivers/gpu/drm/xe/xe_sysctrl_types.h
new file mode 100644
index 000000000000..d4a362564925
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_sysctrl_types.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_SYSCTRL_TYPES_H_
+#define _XE_SYSCTRL_TYPES_H_
+
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct xe_mmio;
+
+/**
+ * struct xe_sysctrl - System Controller driver context
+ */
+struct xe_sysctrl {
+	/** @mmio: MMIO region for system control registers */
+	struct xe_mmio *mmio;
+
+	/** @cmd_lock: Mutex protecting mailbox command operations */
+	struct mutex cmd_lock;
+
+	/**
+	 * @phase_bit: MKHI message boundary phase toggle bit
+	 *
+	 * Phase bit alternates between 0 and 1 for consecutive
+	 * messages to help distinguish message boundaries.
+	 */
+	bool phase_bit;
+};
+
+#endif /* _XE_SYSCTRL_TYPES_H_ */
-- 
2.47.1



* [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
  2026-03-02 10:21 ` [PATCH v2 01/11] drm/xe/xe_sysctrl: Add System controller patch Riana Tauro
@ 2026-03-02 10:21 ` Riana Tauro
  2026-03-02 17:00   ` Raag Jadav
  2026-03-02 10:21 ` [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:21 UTC
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

On CSC runtime firmware errors that require a firmware flash through SPI,
PCODE sets the FDO mode bit in the Capability register.
Currently the survivability_info group is created only for boot
survivability.

Create the survivability_info group for runtime survivability as well, so
that userspace can check the FDO mode sysfs entry.
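
For illustration, a userspace check of that sysfs entry might look like
the sketch below. It is an assumption-laden example, not part of this
patch: the helper names are hypothetical, the caller must supply the full
attribute path for its device, and only the "enabled" string shown in the
documentation is recognised.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if the attribute text reads "enabled", ignoring padding. */
static bool fdo_text_is_enabled(const char *text)
{
	size_t len;

	/* Skip leading blanks. */
	while (*text == ' ' || *text == '\t')
		text++;

	/* Length of the first token, up to whitespace or a newline. */
	len = strcspn(text, " \t\r\n");

	return len == strlen("enabled") && strncmp(text, "enabled", len) == 0;
}

/*
 * Read the attribute at @path (the survivability_info/fdo_mode file of
 * the device, as chosen by the caller).
 * Returns 1 if enabled, 0 if not, -1 if the file cannot be read.
 */
static int fdo_mode_read(const char *path)
{
	char buf[32] = { 0 };
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}

	fclose(f);
	return fdo_text_is_enabled(buf) ? 1 : 0;
}
```

A caller would pass the fdo_mode path of the device in question and treat
a return of 1 as a hint that a firmware flash is needed.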

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_survivability_mode.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
index db64cac39c94..70feb192fa2f 100644
--- a/drivers/gpu/drm/xe/xe_survivability_mode.c
+++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
@@ -98,6 +98,15 @@
  *	# cat /sys/bus/pci/devices/<device>/survivability_mode
  *	  Runtime
  *
+ * On some CSC firmware errors, PCODE sets FDO mode and the only recovery possible is through
+ * firmware flash using SPI driver. Userspace can check if FDO mode is set by checking the below
+ * sysfs entry.
+ *
+ * .. code-block:: shell
+ *
+ *	# cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
+ *       enabled
+ *
  * When such errors occur, userspace is notified with the drm device wedged uevent and runtime
  * survivability mode. User can then initiate a firmware flash using userspace tools like fwupd
  * to restore device to normal operation.
@@ -296,7 +305,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
 	if (ret)
 		return ret;
 
-	if (check_boot_failure(xe)) {
+	/* Survivability info is not required if enabled via configfs */
+	if (!xe_configfs_get_survivability_mode(pdev)) {
 		ret = devm_device_add_group(dev, &survivability_info_group);
 		if (ret)
 			return ret;
-- 
2.47.1



* Re: [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability
  2026-03-02 10:21 ` [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability Riana Tauro
@ 2026-03-02 17:00   ` Raag Jadav
  2026-03-03  8:18     ` Mallesh, Koujalagi
  2026-03-30 13:00     ` Tauro, Riana
  0 siblings, 2 replies; 43+ messages in thread
From: Raag Jadav @ 2026-03-02 17:00 UTC
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Mon, Mar 02, 2026 at 03:51:57PM +0530, Riana Tauro wrote:
> On CSC runtime firmware errors that requires firmware flash through SPI,
> PCODE sets the FDO mode bit in the Capability register.
> Currently the survivability_info group is created only for boot
> survivability.
> 
> Create survivability_info group even for runtime survivability to allow
> userspace to check FDO mode sysfs.
> 
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_survivability_mode.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
> index db64cac39c94..70feb192fa2f 100644
> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
> @@ -98,6 +98,15 @@
>   *	# cat /sys/bus/pci/devices/<device>/survivability_mode
>   *	  Runtime
>   *
> + * On some CSC firmware errors, PCODE sets FDO mode and the only recovery possible is through
> + * firmware flash using SPI driver. Userspace can check if FDO mode is set by checking the below
> + * sysfs entry.
> + *
> + * .. code-block:: shell
> + *
> + *	# cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
> + *       enabled

Currently FDO_INFO is defined as (MAX_SCRATCH_REG + 1), but I couldn't
find this case in survivability_info_attrs_visible(). Or did I miss
something?

Raag

>   * When such errors occur, userspace is notified with the drm device wedged uevent and runtime
>   * survivability mode. User can then initiate a firmware flash using userspace tools like fwupd
>   * to restore device to normal operation.
> @@ -296,7 +305,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
>  	if (ret)
>  		return ret;
>  
> -	if (check_boot_failure(xe)) {
> +	/* Surivivability info is not required if enabled via configfs */
> +	if (!xe_configfs_get_survivability_mode(pdev)) {
>  		ret = devm_device_add_group(dev, &survivability_info_group);
>  		if (ret)
>  			return ret;
> -- 
> 2.47.1
> 


* Re: [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability
  2026-03-02 17:00   ` Raag Jadav
@ 2026-03-03  8:18     ` Mallesh, Koujalagi
  2026-03-30 12:56       ` Tauro, Riana
  2026-03-30 13:00     ` Tauro, Riana
  1 sibling, 1 reply; 43+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-03  8:18 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, Raag Jadav


On 02-03-2026 10:30 pm, Raag Jadav wrote:
> On Mon, Mar 02, 2026 at 03:51:57PM +0530, Riana Tauro wrote:
>> On CSC runtime firmware errors that requires firmware flash through SPI,
>> PCODE sets the FDO mode bit in the Capability register.
>> Currently the survivability_info group is created only for boot
>> survivability.
>>
>> Create survivability_info group even for runtime survivability to allow
>> userspace to check FDO mode sysfs.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_survivability_mode.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
>> index db64cac39c94..70feb192fa2f 100644
>> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
>> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
>> @@ -98,6 +98,15 @@
>>    *	# cat /sys/bus/pci/devices/<device>/survivability_mode
>>    *	  Runtime
>>    *
>> + * On some CSC firmware errors, PCODE sets FDO mode and the only recovery possible is through
>> + * firmware flash using SPI driver. Userspace can check if FDO mode is set by checking the below
>> + * sysfs entry.
>> + *
>> + * .. code-block:: shell
>> + *
>> + *	# cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
>> + *       enabled
> Currently FDO_INFO is defined as (MAX_SCRATCH_REG + 1), but I couldn't
> find this case in survivability_info_attrs_visible(). Or did I miss
> something?
>
> Raag
>
>>    * When such errors occur, userspace is notified with the drm device wedged uevent and runtime
>>    * survivability mode. User can then initiate a firmware flash using userspace tools like fwupd
>>    * to restore device to normal operation.
>> @@ -296,7 +305,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
>>   	if (ret)
>>   		return ret;
>>   
>> -	if (check_boot_failure(xe)) {
>> +	/* Surivivability info is not required if enabled via configfs */

Please fix the typo: s/Surivivability/Survivability/

Thanks

-/Mallesh

>> +	if (!xe_configfs_get_survivability_mode(pdev)) {
>>   		ret = devm_device_add_group(dev, &survivability_info_group);
>>   		if (ret)
>>   			return ret;
>> -- 
>> 2.47.1
>>


* Re: [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability
  2026-03-03  8:18     ` Mallesh, Koujalagi
@ 2026-03-30 12:56       ` Tauro, Riana
  0 siblings, 0 replies; 43+ messages in thread
From: Tauro, Riana @ 2026-03-30 12:56 UTC (permalink / raw)
  To: Mallesh, Koujalagi
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, Raag Jadav


On 3/3/2026 1:48 PM, Mallesh, Koujalagi wrote:
>
> On 02-03-2026 10:30 pm, Raag Jadav wrote:
>> On Mon, Mar 02, 2026 at 03:51:57PM +0530, Riana Tauro wrote:
>>> On CSC runtime firmware errors that requires firmware flash through 
>>> SPI,
>>> PCODE sets the FDO mode bit in the Capability register.
>>> Currently the survivability_info group is created only for boot
>>> survivability.
>>>
>>> Create survivability_info group even for runtime survivability to allow
>>> userspace to check FDO mode sysfs.
>>>
>>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_survivability_mode.c | 12 +++++++++++-
>>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c 
>>> b/drivers/gpu/drm/xe/xe_survivability_mode.c
>>> index db64cac39c94..70feb192fa2f 100644
>>> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
>>> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
>>> @@ -98,6 +98,15 @@
>>>    *    # cat /sys/bus/pci/devices/<device>/survivability_mode
>>>    *      Runtime
>>>    *
>>> + * On some CSC firmware errors, PCODE sets FDO mode and the only 
>>> recovery possible is through
>>> + * firmware flash using SPI driver. Userspace can check if FDO mode 
>>> is set by checking the below
>>> + * sysfs entry.
>>> + *
>>> + * .. code-block:: shell
>>> + *
>>> + *    # cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
>>> + *       enabled
>> Currently FDO_INFO is defined as (MAX_SCRATCH_REG + 1), but I couldn't
>> find this case in survivability_info_attrs_visible(). Or did I miss
>> something?
>>
>> Raag
>>
>>>    * When such errors occur, userspace is notified with the drm 
>>> device wedged uevent and runtime
>>>    * survivability mode. User can then initiate a firmware flash 
>>> using userspace tools like fwupd
>>>    * to restore device to normal operation.
>>> @@ -296,7 +305,8 @@ static int create_survivability_sysfs(struct 
>>> pci_dev *pdev)
>>>       if (ret)
>>>           return ret;
>>>   -    if (check_boot_failure(xe)) {
>>> +    /* Surivivability info is not required if enabled via configfs */
>
> Please fix typo,

Will fix. Thank you

Riana

>
> Thanks
>
> -/Mallesh
>
>>> +    if (!xe_configfs_get_survivability_mode(pdev)) {
>>>           ret = devm_device_add_group(dev, &survivability_info_group);
>>>           if (ret)
>>>               return ret;
>>> -- 
>>> 2.47.1
>>>


* Re: [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability
  2026-03-02 17:00   ` Raag Jadav
  2026-03-03  8:18     ` Mallesh, Koujalagi
@ 2026-03-30 13:00     ` Tauro, Riana
  1 sibling, 0 replies; 43+ messages in thread
From: Tauro, Riana @ 2026-03-30 13:00 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi


On 3/2/2026 10:30 PM, Raag Jadav wrote:
> On Mon, Mar 02, 2026 at 03:51:57PM +0530, Riana Tauro wrote:
>> On CSC runtime firmware errors that requires firmware flash through SPI,
>> PCODE sets the FDO mode bit in the Capability register.
>> Currently the survivability_info group is created only for boot
>> survivability.
>>
>> Create survivability_info group even for runtime survivability to allow
>> userspace to check FDO mode sysfs.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_survivability_mode.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
>> index db64cac39c94..70feb192fa2f 100644
>> --- a/drivers/gpu/drm/xe/xe_survivability_mode.c
>> +++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
>> @@ -98,6 +98,15 @@
>>    *	# cat /sys/bus/pci/devices/<device>/survivability_mode
>>    *	  Runtime
>>    *
>> + * On some CSC firmware errors, PCODE sets FDO mode and the only recovery possible is through
>> + * firmware flash using SPI driver. Userspace can check if FDO mode is set by checking the below
>> + * sysfs entry.
>> + *
>> + * .. code-block:: shell
>> + *
>> + *	# cat /sys/bus/pci/devices/<device>/survivability_info/fdo_mode
>> + *       enabled
> Currently FDO_INFO is defined as (MAX_SCRATCH_REG + 1), but I couldn't
> find this case in survivability_info_attrs_visible(). Or did I miss
> something?

This check is present in survivability_info_attrs_visible().
&attr_fdo_mode.attr.attr is at the last index and is made visible
based on the survivability mode version.
I will be sending this as a separate patch too.

Thanks
Riana
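
To make the visibility rule described above concrete, here is a small standalone C mock of an is_visible-style callback. Everything in it is an assumption made for illustration: the value of MAX_SCRATCH_REG, the FDO_MIN_VERSION threshold, the populated_mask field and the attr_visible() name are hypothetical and do not reflect the driver's actual definitions. Only two points come from this thread: FDO_INFO is MAX_SCRATCH_REG + 1, and the FDO attribute sits at the last index and is gated on the survivability mode version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values; only FDO_INFO = MAX_SCRATCH_REG + 1 is from the thread */
enum {
	MAX_SCRATCH_REG = 7,
	FDO_INFO = MAX_SCRATCH_REG + 1,
	NUM_ATTRS = FDO_INFO + 1,
};

/* Hypothetical minimum version that exposes the FDO attribute */
#define FDO_MIN_VERSION 2

struct surv_state {
	unsigned int version;		/* survivability mode version */
	unsigned int populated_mask;	/* bit n set if scratch reg n has data */
};

/* Decide whether the attribute at index idx should be shown */
bool attr_visible(const struct surv_state *s, size_t idx)
{
	assert(s);

	if (idx >= NUM_ATTRS)
		return false;

	/* The last index is the FDO attribute, gated on the version */
	if (idx == FDO_INFO)
		return s->version >= FDO_MIN_VERSION;

	/* Assumption: scratch attributes are shown only when populated */
	return (s->populated_mask & (1u << idx)) != 0;
}
```

In the kernel the equivalent decision would be made in the attribute group's is_visible callback, which returns the file mode or 0 rather than a bool.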

>
> Raag
>
>>    * When such errors occur, userspace is notified with the drm device wedged uevent and runtime
>>    * survivability mode. User can then initiate a firmware flash using userspace tools like fwupd
>>    * to restore device to normal operation.
>> @@ -296,7 +305,8 @@ static int create_survivability_sysfs(struct pci_dev *pdev)
>>   	if (ret)
>>   		return ret;
>>   
>> -	if (check_boot_failure(xe)) {
>> +	/* Surivivability info is not required if enabled via configfs */
>> +	if (!xe_configfs_get_survivability_mode(pdev)) {
>>   		ret = devm_device_add_group(dev, &survivability_info_group);
>>   		if (ret)
>>   			return ret;
>> -- 
>> 2.47.1
>>


* [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
  2026-03-02 10:21 ` [PATCH v2 01/11] drm/xe/xe_sysctrl: Add System controller patch Riana Tauro
  2026-03-02 10:21 ` [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability Riana Tauro
@ 2026-03-02 10:21 ` Riana Tauro
  2026-03-02 17:37   ` Raag Jadav
  2026-03-04 10:38   ` Mallesh, Koujalagi
  2026-03-02 10:21 ` [PATCH v2 04/11] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:21 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi, Michal Wajdeczko, Matthew Brost, Matt Roper

Add error_detected, mmio_enabled, slot_reset and resume
recovery callbacks to handle PCIe Advanced Error Reporting
(AER) errors.

For fatal errors, the device is wedged and becomes
inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
error_detected to request a Secondary Bus Reset (SBR).

For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
error_detected to trigger the mmio_enabled callback. In this callback,
the device is queried to determine the error cause and attempt
recovery based on the error type.

Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
cleanly removes and reprobes the device to restore functionality.

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
v2: re-order linux headers
    reword error messages
    do not clear in_recovery after remove
    return PCI_ERS_RESULT_DISCONNECT if probe fails (Michal)
    only wedge device do not send uevent (Raag)
    set recovery flag in error_detected and clear on resume
    add default switch case (Mallesh)
---
 drivers/gpu/drm/xe/Makefile          |  1 +
 drivers/gpu/drm/xe/xe_device.h       | 15 +++++
 drivers/gpu/drm/xe/xe_device_types.h |  3 +
 drivers/gpu/drm/xe/xe_pci.c          |  3 +
 drivers/gpu/drm/xe/xe_pci_error.c    | 99 ++++++++++++++++++++++++++++
 5 files changed, 121 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 1890bbd1b28d..417b030e5ce7 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -99,6 +99,7 @@ xe-y += xe_bb.o \
 	xe_page_reclaim.o \
 	xe_pat.o \
 	xe_pci.o \
+	xe_pci_error.o \
 	xe_pci_rebar.o \
 	xe_pcode.o \
 	xe_pm.o \
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 39464650533b..972f43d20f1a 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
 	return container_of(ttm, struct xe_device, ttm);
 }
 
+static inline bool xe_device_is_in_recovery(struct xe_device *xe)
+{
+	return atomic_read(&xe->in_recovery);
+}
+
+static inline void xe_device_set_in_recovery(struct xe_device *xe)
+{
+	atomic_set(&xe->in_recovery, 1);
+}
+
+static inline void xe_device_clear_in_recovery(struct xe_device *xe)
+{
+	atomic_set(&xe->in_recovery, 0);
+}
+
 struct xe_device *xe_device_create(struct pci_dev *pdev,
 				   const struct pci_device_id *ent);
 int xe_device_probe_early(struct xe_device *xe);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 5599534384fa..616d74792902 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -504,6 +504,9 @@ struct xe_device {
 		bool inconsistent_reset;
 	} wedged;
 
+	/** @in_recovery: Indicates if device is in recovery */
+	atomic_t in_recovery;
+
 	/** @bo_device: Struct to control async free of BOs */
 	struct xe_bo_dev {
 		/** @bo_device.async_free: Free worker */
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index ad1e5ef2ee89..825489287f28 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -1313,6 +1313,8 @@ static const struct dev_pm_ops xe_pm_ops = {
 };
 #endif
 
+extern const struct pci_error_handlers xe_pci_error_handlers;
+
 static struct pci_driver xe_pci_driver = {
 	.name = DRIVER_NAME,
 	.id_table = pciidlist,
@@ -1320,6 +1322,7 @@ static struct pci_driver xe_pci_driver = {
 	.remove = xe_pci_remove,
 	.shutdown = xe_pci_shutdown,
 	.sriov_configure = xe_pci_sriov_configure,
+	.err_handler = &xe_pci_error_handlers,
 #ifdef CONFIG_PM_SLEEP
 	.driver.pm = &xe_pm_ops,
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
new file mode 100644
index 000000000000..d4896a4a5014
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+#include <linux/pci.h>
+
+#include <drm/drm_drv.h>
+
+#include "xe_device.h"
+#include "xe_gt.h"
+#include "xe_pci.h"
+#include "xe_uc.h"
+
+static void xe_pci_error_handling(struct pci_dev *pdev)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+	struct xe_gt *gt;
+	u8 id;
+
+	/* Wedge the device to prevent userspace access but don't send the event yet */
+	atomic_set(&xe->wedged.flag, 1);
+
+	for_each_gt(gt, xe, id)
+		xe_gt_declare_wedged(gt);
+
+	pci_disable_device(pdev);
+}
+
+static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	dev_err(&pdev->dev, "Xe Pci error recovery: error detected state %d\n", state);
+
+	xe_device_set_in_recovery(xe);
+
+	switch (state) {
+	case pci_channel_io_normal:
+		return PCI_ERS_RESULT_CAN_RECOVER;
+	case pci_channel_io_frozen:
+		xe_pci_error_handling(pdev);
+		return PCI_ERS_RESULT_NEED_RESET;
+	case pci_channel_io_perm_failure:
+		return PCI_ERS_RESULT_DISCONNECT;
+	default:
+		dev_err(&pdev->dev, "Unknown state %d\n", state);
+		return PCI_ERS_RESULT_NEED_RESET;
+	}
+}
+
+static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
+{
+	dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
+{
+	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	dev_err(&pdev->dev, "Xe Pci error recovery: Slot reset\n");
+
+	pci_restore_state(pdev);
+
+	if (pci_enable_device(pdev)) {
+		dev_err(&pdev->dev,
+			"Cannot re-enable PCI device after reset\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	/*
+	 * Secondary Bus Reset wipes out all device memory
+	 * requiring XE KMD to perform a device removal and reprobe.
+	 */
+	pdev->driver->remove(pdev);
+
+	if (!pdev->driver->probe(pdev, ent))
+		return PCI_ERS_RESULT_RECOVERED;
+
+	return PCI_ERS_RESULT_DISCONNECT;
+}
+
+static void xe_pci_error_resume(struct pci_dev *pdev)
+{
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+
+	dev_info(&pdev->dev, "Xe Pci error recovery: Recovered\n");
+
+	xe_device_clear_in_recovery(xe);
+}
+
+const struct pci_error_handlers xe_pci_error_handlers = {
+	.error_detected	= xe_pci_error_detected,
+	.mmio_enabled	= xe_pci_error_mmio_enabled,
+	.slot_reset	= xe_pci_error_slot_reset,
+	.resume		= xe_pci_error_resume,
+};
-- 
2.47.1
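
The callback flow in this patch can be summarized with the standalone C mock below. It is a simplified sketch, not kernel code: the enum and function names are hypothetical stand-ins for pci_channel_state_t, pci_ers_result_t and the xe callbacks, and mock_recover() only approximates the order in which the PCI core invokes the handlers. The reprobe_ok field is an assumption that stands in for the result of the remove and probe pair in slot_reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's channel states and results */
enum chan_state { CHAN_NORMAL, CHAN_FROZEN, CHAN_PERM_FAILURE };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

struct mock_dev {
	bool in_recovery;	/* set in error_detected, cleared in resume */
	bool wedged;		/* set only for the frozen (fatal) case */
	bool reprobe_ok;	/* assumed outcome of remove + probe */
};

enum ers_result mock_error_detected(struct mock_dev *d, enum chan_state st)
{
	assert(d);
	d->in_recovery = true;

	switch (st) {
	case CHAN_NORMAL:		/* non-fatal: go to mmio_enabled */
		return ERS_CAN_RECOVER;
	case CHAN_FROZEN:		/* fatal: wedge and ask for a reset */
		d->wedged = true;
		return ERS_NEED_RESET;
	case CHAN_PERM_FAILURE:
		return ERS_DISCONNECT;
	default:
		return ERS_NEED_RESET;
	}
}

/* In this patch mmio_enabled only logs and requests a reset */
enum ers_result mock_mmio_enabled(struct mock_dev *d)
{
	(void)d;
	return ERS_NEED_RESET;
}

/* After the bus reset: remove and reprobe, disconnect if probe fails */
enum ers_result mock_slot_reset(struct mock_dev *d)
{
	return d->reprobe_ok ? ERS_RECOVERED : ERS_DISCONNECT;
}

void mock_resume(struct mock_dev *d)
{
	d->in_recovery = false;
}

/* Rough approximation of the order the PCI core calls the handlers in */
enum ers_result mock_recover(struct mock_dev *d, enum chan_state st)
{
	enum ers_result r = mock_error_detected(d, st);

	if (r == ERS_CAN_RECOVER)
		r = mock_mmio_enabled(d);
	if (r == ERS_NEED_RESET)
		r = mock_slot_reset(d);
	if (r == ERS_RECOVERED)
		mock_resume(d);

	return r;
}
```

Note that in this mock, as in the patch, in_recovery stays set when recovery ends in a disconnect, because resume is never reached.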



* Re: [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-03-02 10:21 ` [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
@ 2026-03-02 17:37   ` Raag Jadav
  2026-03-03  5:09     ` Riana Tauro
  2026-03-04 10:38   ` Mallesh, Koujalagi
  1 sibling, 1 reply; 43+ messages in thread
From: Raag Jadav @ 2026-03-02 17:37 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi,
	Michal Wajdeczko, Matthew Brost, Matt Roper

On Mon, Mar 02, 2026 at 03:51:58PM +0530, Riana Tauro wrote:
> Add error_detected, mmio_enabled, slot_reset and resume
> recovery callbacks to handle PCIe Advanced Error Reporting
> (AER) errors.
> 
> For fatal errors, the device is wedged and becomes
> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
> error_detected to request a Secondary Bus Reset (SBR).
> 
> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
> error_detected to trigger the mmio_enabled callback. In this callback,
> the device is queried to determine the error cause and attempt
> recovery based on the error type.
> 
> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
> cleanly removes and reprobes the device to restore functionality.
> 
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
> v2: re-order linux headers
>     reword error messages
>     do not clear in_recovery after remove
>     return PCI_ERS_RESULT_DISCONNECT if probe fails (Michal)
>     only wedge device do not send uevent (Raag)
>     set recovery flag in error_detected and clear on resume
>     add default switch case (Mallesh)
> ---
>  drivers/gpu/drm/xe/Makefile          |  1 +
>  drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>  drivers/gpu/drm/xe/xe_device_types.h |  3 +
>  drivers/gpu/drm/xe/xe_pci.c          |  3 +
>  drivers/gpu/drm/xe/xe_pci_error.c    | 99 ++++++++++++++++++++++++++++
>  5 files changed, 121 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 1890bbd1b28d..417b030e5ce7 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -99,6 +99,7 @@ xe-y += xe_bb.o \
>  	xe_page_reclaim.o \
>  	xe_pat.o \
>  	xe_pci.o \
> +	xe_pci_error.o \
>  	xe_pci_rebar.o \
>  	xe_pcode.o \
>  	xe_pm.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 39464650533b..972f43d20f1a 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>  	return container_of(ttm, struct xe_device, ttm);
>  }
>  
> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
> +{
> +	return atomic_read(&xe->in_recovery);
> +}
> +
> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 1);
> +}
> +
> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
> +{
> +	 atomic_set(&xe->in_recovery, 0);
> +}
> +
>  struct xe_device *xe_device_create(struct pci_dev *pdev,
>  				   const struct pci_device_id *ent);
>  int xe_device_probe_early(struct xe_device *xe);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 5599534384fa..616d74792902 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -504,6 +504,9 @@ struct xe_device {
>  		bool inconsistent_reset;
>  	} wedged;
>  
> +	/** @in_recovery: Indicates if device is in recovery */
> +	atomic_t in_recovery;
> +
>  	/** @bo_device: Struct to control async free of BOs */
>  	struct xe_bo_dev {
>  		/** @bo_device.async_free: Free worker */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index ad1e5ef2ee89..825489287f28 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -1313,6 +1313,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>  };
>  #endif
>  
> +extern const struct pci_error_handlers xe_pci_error_handlers;
> +
>  static struct pci_driver xe_pci_driver = {
>  	.name = DRIVER_NAME,
>  	.id_table = pciidlist,
> @@ -1320,6 +1322,7 @@ static struct pci_driver xe_pci_driver = {
>  	.remove = xe_pci_remove,
>  	.shutdown = xe_pci_shutdown,
>  	.sriov_configure = xe_pci_sriov_configure,
> +	.err_handler = &xe_pci_error_handlers,
>  #ifdef CONFIG_PM_SLEEP
>  	.driver.pm = &xe_pm_ops,
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 000000000000..d4896a4a5014
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <linux/pci.h>
> +
> +#include <drm/drm_drv.h>
> +
> +#include "xe_device.h"
> +#include "xe_gt.h"
> +#include "xe_pci.h"
> +#include "xe_uc.h"
> +
> +static void xe_pci_error_handling(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +	struct xe_gt *gt;
> +	u8 id;
> +
> +	/* Wedge the device to prevent userspace access but don't send the event yet */
> +	atomic_set(&xe->wedged.flag, 1);
> +
> +	for_each_gt(gt, xe, id)
> +		xe_gt_declare_wedged(gt);
> +
> +	pci_disable_device(pdev);
> +}
> +
> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_err(&pdev->dev, "Xe Pci error recovery: error detected state %d\n", state);
> +
> +	xe_device_set_in_recovery(xe);

This looks similar to wedged.flag. If we instead stop exec queues and
cancel/flush all pending work properly, perhaps we won't need this.
Let me explore what can be done here.

> +	switch (state) {
> +	case pci_channel_io_normal:
> +		return PCI_ERS_RESULT_CAN_RECOVER;
> +	case pci_channel_io_frozen:
> +		xe_pci_error_handling(pdev);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	case pci_channel_io_perm_failure:
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	default:
> +		dev_err(&pdev->dev, "Unknown state %d\n", state);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	}
> +}
> +
> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
> +{
> +	dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
> +
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_err(&pdev->dev, "Xe Pci error recovery: Slot reset\n");
> +
> +	pci_restore_state(pdev);
> +
> +	if (pci_enable_device(pdev)) {
> +		dev_err(&pdev->dev,
> +			"Cannot re-enable PCI device after reset\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	/*
> +	 * Secondary Bus Reset wipes out all device memory
> +	 * requiring XE KMD to perform a device removal and reprobe.
> +	 */
> +	pdev->driver->remove(pdev);

A bit fishy, but does the job for now ;)

Raag

> +	if (!pdev->driver->probe(pdev, ent))
> +		return PCI_ERS_RESULT_RECOVERED;
> +
> +	return PCI_ERS_RESULT_DISCONNECT;
> +}
> +
> +static void xe_pci_error_resume(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_info(&pdev->dev, "Xe Pci error recovery: Recovered\n");
> +
> +	xe_device_clear_in_recovery(xe);
> +}
> +
> +const struct pci_error_handlers xe_pci_error_handlers = {
> +	.error_detected	= xe_pci_error_detected,
> +	.mmio_enabled	= xe_pci_error_mmio_enabled,
> +	.slot_reset	= xe_pci_error_slot_reset,
> +	.resume		= xe_pci_error_resume,
> +};
> -- 
> 2.47.1
> 


* Re: [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-03-02 17:37   ` Raag Jadav
@ 2026-03-03  5:09     ` Riana Tauro
  0 siblings, 0 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-03  5:09 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi,
	Michal Wajdeczko, Matthew Brost, Matt Roper



On 3/2/2026 11:07 PM, Raag Jadav wrote:
> On Mon, Mar 02, 2026 at 03:51:58PM +0530, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
>> cleanly removes and reprobes the device to restore functionality.
>>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>> v2: re-order linux headers
>>      reword error messages
>>      do not clear in_recovery after remove
>>      return PCI_ERS_RESULT_DISCONNECT if probe fails (Michal)
>>      only wedge device do not send uevent (Raag)
>>      set recovery flag in error_detected and clear on resume
>>      add default switch case (Mallesh)
>> ---
>>   drivers/gpu/drm/xe/Makefile          |  1 +
>>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>>   drivers/gpu/drm/xe/xe_pci_error.c    | 99 ++++++++++++++++++++++++++++
>>   5 files changed, 121 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index 1890bbd1b28d..417b030e5ce7 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -99,6 +99,7 @@ xe-y += xe_bb.o \
>>   	xe_page_reclaim.o \
>>   	xe_pat.o \
>>   	xe_pci.o \
>> +	xe_pci_error.o \
>>   	xe_pci_rebar.o \
>>   	xe_pcode.o \
>>   	xe_pm.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 39464650533b..972f43d20f1a 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>>   	return container_of(ttm, struct xe_device, ttm);
>>   }
>>   
>> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
>> +{
>> +	return atomic_read(&xe->in_recovery);
>> +}
>> +
>> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
>> +{
>> +	atomic_set(&xe->in_recovery, 1);
>> +}
>> +
>> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
>> +{
>> +	 atomic_set(&xe->in_recovery, 0);
>> +}
>> +
>>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>>   				   const struct pci_device_id *ent);
>>   int xe_device_probe_early(struct xe_device *xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 5599534384fa..616d74792902 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -504,6 +504,9 @@ struct xe_device {
>>   		bool inconsistent_reset;
>>   	} wedged;
>>   
>> +	/** @in_recovery: Indicates if device is in recovery */
>> +	atomic_t in_recovery;
>> +
>>   	/** @bo_device: Struct to control async free of BOs */
>>   	struct xe_bo_dev {
>>   		/** @bo_device.async_free: Free worker */
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index ad1e5ef2ee89..825489287f28 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -1313,6 +1313,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>>   };
>>   #endif
>>   
>> +extern const struct pci_error_handlers xe_pci_error_handlers;
>> +
>>   static struct pci_driver xe_pci_driver = {
>>   	.name = DRIVER_NAME,
>>   	.id_table = pciidlist,
>> @@ -1320,6 +1322,7 @@ static struct pci_driver xe_pci_driver = {
>>   	.remove = xe_pci_remove,
>>   	.shutdown = xe_pci_shutdown,
>>   	.sriov_configure = xe_pci_sriov_configure,
>> +	.err_handler = &xe_pci_error_handlers,
>>   #ifdef CONFIG_PM_SLEEP
>>   	.driver.pm = &xe_pm_ops,
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
>> new file mode 100644
>> index 000000000000..d4896a4a5014
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -0,0 +1,99 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <linux/pci.h>
>> +
>> +#include <drm/drm_drv.h>
>> +
>> +#include "xe_device.h"
>> +#include "xe_gt.h"
>> +#include "xe_pci.h"
>> +#include "xe_uc.h"
>> +
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> +	struct xe_gt *gt;
>> +	u8 id;
>> +
>> +	/* Wedge the device to prevent userspace access but don't send the event yet */
>> +	atomic_set(&xe->wedged.flag, 1);
>> +
>> +	for_each_gt(gt, xe, id)
>> +		xe_gt_declare_wedged(gt);
>> +
>> +	pci_disable_device(pdev);
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
>> +{
>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +	dev_err(&pdev->dev, "Xe Pci error recovery: error detected state %d\n", state);
>> +
>> +	xe_device_set_in_recovery(xe);
> 
> This looks similar to wedged.flag. If we rather stop exec queues and
> cancel/flush all pending work properly, perhaps we won't be needing
> this. Let me explore what can be done here.

This will also have to deal with clearing user BOs. Let me take a look.
Right now the jobs time out. This flag skips the GT reset and devcoredump
so that the device is not accessed.

> 
>> +	switch (state) {
>> +	case pci_channel_io_normal:
>> +		return PCI_ERS_RESULT_CAN_RECOVER;
>> +	case pci_channel_io_frozen:
>> +		xe_pci_error_handling(pdev);
>> +		return PCI_ERS_RESULT_NEED_RESET;
>> +	case pci_channel_io_perm_failure:
>> +		return PCI_ERS_RESULT_DISCONNECT;
>> +	default:
>> +		dev_err(&pdev->dev, "Unknown state %d\n", state);
>> +		return PCI_ERS_RESULT_NEED_RESET;
>> +	}
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>> +{
>> +	dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
>> +
>> +	return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>> +{
>> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +	dev_err(&pdev->dev, "Xe Pci error recovery: Slot reset\n");
>> +
>> +	pci_restore_state(pdev);
>> +
>> +	if (pci_enable_device(pdev)) {
>> +		dev_err(&pdev->dev,
>> +			"Cannot re-enable PCI device after reset\n");
>> +		return PCI_ERS_RESULT_DISCONNECT;
>> +	}
>> +
>> +	/*
>> +	 * Secondary Bus Reset wipes out all device memory
>> +	 * requiring XE KMD to perform a device removal and reprobe.
>> +	 */
>> +	pdev->driver->remove(pdev);
> 
> A bit fishy, but does the job for now ;)

If the FLR changes are merged that would be helpful here.
Thanks for that series.

I will add those changes and test locally. Otherwise, I will incrementally
rework this to separate the xe_device and xe_pci related changes so that we
can call xe_device_probe and remove.

Thanks
Riana

> 
> Raag
> 
>> +	if (!pdev->driver->probe(pdev, ent))
>> +		return PCI_ERS_RESULT_RECOVERED;
>> +
>> +	return PCI_ERS_RESULT_DISCONNECT;
>> +}
>> +
>> +static void xe_pci_error_resume(struct pci_dev *pdev)
>> +{
>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +	dev_info(&pdev->dev, "Xe Pci error recovery: Recovered\n");
>> +
>> +	xe_device_clear_in_recovery(xe);
>> +}
>> +
>> +const struct pci_error_handlers xe_pci_error_handlers = {
>> +	.error_detected	= xe_pci_error_detected,
>> +	.mmio_enabled	= xe_pci_error_mmio_enabled,
>> +	.slot_reset	= xe_pci_error_slot_reset,
>> +	.resume		= xe_pci_error_resume,
>> +};
>> -- 
>> 2.47.1
>>


* Re: [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-03-02 10:21 ` [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
  2026-03-02 17:37   ` Raag Jadav
@ 2026-03-04 10:38   ` Mallesh, Koujalagi
  2026-03-31  5:18     ` Tauro, Riana
  1 sibling, 1 reply; 43+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-04 10:38 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, Michal Wajdeczko,
	Matthew Brost, Matt Roper


On 02-03-2026 03:51 pm, Riana Tauro wrote:
> Add error_detected, mmio_enabled, slot_reset and resume
> recovery callbacks to handle PCIe Advanced Error Reporting
> (AER) errors.
>
> For fatal errors, the device is wedged and becomes
> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
> error_detected to request a Secondary Bus Reset (SBR).
>
> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
> error_detected to trigger the mmio_enabled callback. In this callback,
> the device is queried to determine the error cause and attempt
> recovery based on the error type.
>
> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
> cleanly removes and reprobes the device to restore functionality.
>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
> v2: re-order linux headers
>      reword error messages
>      do not clear in_recovery after remove
>      return PCI_ERS_RESULT_DISCONNECT if probe fails (Michal)
>      only wedge device do not send uevent (Raag)
>      set recovery flag in error_detected and clear on resume
>      add default switch case (Mallesh)
> ---
>   drivers/gpu/drm/xe/Makefile          |  1 +
>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>   drivers/gpu/drm/xe/xe_pci_error.c    | 99 ++++++++++++++++++++++++++++
>   5 files changed, 121 insertions(+)
>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 1890bbd1b28d..417b030e5ce7 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -99,6 +99,7 @@ xe-y += xe_bb.o \
>   	xe_page_reclaim.o \
>   	xe_pat.o \
>   	xe_pci.o \
> +	xe_pci_error.o \
>   	xe_pci_rebar.o \
>   	xe_pcode.o \
>   	xe_pm.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 39464650533b..972f43d20f1a 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -43,6 +43,21 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>   	return container_of(ttm, struct xe_device, ttm);
>   }
>   
> +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
> +{
> +	return atomic_read(&xe->in_recovery);
> +}
> +
> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 1);
> +}
> +
> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
> +{
> +	atomic_set(&xe->in_recovery, 0);
> +}
> +
>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>   				   const struct pci_device_id *ent);
>   int xe_device_probe_early(struct xe_device *xe);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 5599534384fa..616d74792902 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -504,6 +504,9 @@ struct xe_device {
>   		bool inconsistent_reset;
>   	} wedged;
>   
> +	/** @in_recovery: Indicates if device is in recovery */
> +	atomic_t in_recovery;
> +
>   	/** @bo_device: Struct to control async free of BOs */
>   	struct xe_bo_dev {
>   		/** @bo_device.async_free: Free worker */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index ad1e5ef2ee89..825489287f28 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -1313,6 +1313,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>   };
>   #endif
>   
> +extern const struct pci_error_handlers xe_pci_error_handlers;
> +
>   static struct pci_driver xe_pci_driver = {
>   	.name = DRIVER_NAME,
>   	.id_table = pciidlist,
> @@ -1320,6 +1322,7 @@ static struct pci_driver xe_pci_driver = {
>   	.remove = xe_pci_remove,
>   	.shutdown = xe_pci_shutdown,
>   	.sriov_configure = xe_pci_sriov_configure,
> +	.err_handler = &xe_pci_error_handlers,
>   #ifdef CONFIG_PM_SLEEP
>   	.driver.pm = &xe_pm_ops,
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> new file mode 100644
> index 000000000000..d4896a4a5014
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2026 Intel Corporation
> + */
> +#include <linux/pci.h>
> +
> +#include <drm/drm_drv.h>
> +
> +#include "xe_device.h"
> +#include "xe_gt.h"
> +#include "xe_pci.h"
> +#include "xe_uc.h"
> +
> +static void xe_pci_error_handling(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +	struct xe_gt *gt;
> +	u8 id;
> +
What is the behavior when the wedged mode is XE_WEDGED_MODE_NEVER and a
PCI error occurs?
> +	/* Wedge the device to prevent userspace access but don't send the event yet */
> +	atomic_set(&xe->wedged.flag, 1);
> +
> +	for_each_gt(gt, xe, id)
> +		xe_gt_declare_wedged(gt);
> +
> +	pci_disable_device(pdev);
> +}
> +
> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_err(&pdev->dev, "Xe Pci error recovery: error detected state %d\n", state);
> +
> +	xe_device_set_in_recovery(xe);
> +
> +	switch (state) {
> +	case pci_channel_io_normal:
> +		return PCI_ERS_RESULT_CAN_RECOVER;
> +	case pci_channel_io_frozen:
> +		xe_pci_error_handling(pdev);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	case pci_channel_io_perm_failure:

Clear the in_recovery flag before returning disconnect; a dead/unusable
device should not appear to be recovering.

As written, the in_recovery flag stays set, which suggests that recovery is
still in progress.

Thanks

-/Mallesh

> +		return PCI_ERS_RESULT_DISCONNECT;
> +	default:
> +		dev_err(&pdev->dev, "Unknown state %d\n", state);
> +		return PCI_ERS_RESULT_NEED_RESET;
> +	}
> +}
> +
> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
> +{
> +	dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
> +
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
> +{
> +	const struct pci_device_id *ent = pci_match_id(pdev->driver->id_table, pdev);
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_err(&pdev->dev, "Xe Pci error recovery: Slot reset\n");
> +
> +	pci_restore_state(pdev);
> +
> +	if (pci_enable_device(pdev)) {
> +		dev_err(&pdev->dev,
> +			"Cannot re-enable PCI device after reset\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +
> +	/*
> +	 * Secondary Bus Reset wipes out all device memory
> +	 * requiring XE KMD to perform a device removal and reprobe.
> +	 */
> +	pdev->driver->remove(pdev);
> +
> +	if (!pdev->driver->probe(pdev, ent))
> +		return PCI_ERS_RESULT_RECOVERED;
> +
> +	return PCI_ERS_RESULT_DISCONNECT;
> +}
> +
> +static void xe_pci_error_resume(struct pci_dev *pdev)
> +{
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +
> +	dev_info(&pdev->dev, "Xe Pci error recovery: Recovered\n");
> +
> +	xe_device_clear_in_recovery(xe);
> +}
> +
> +const struct pci_error_handlers xe_pci_error_handlers = {
> +	.error_detected	= xe_pci_error_detected,
> +	.mmio_enabled	= xe_pci_error_mmio_enabled,
> +	.slot_reset	= xe_pci_error_slot_reset,
> +	.resume		= xe_pci_error_resume,
> +};

* Re: [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks
  2026-03-04 10:38   ` Mallesh, Koujalagi
@ 2026-03-31  5:18     ` Tauro, Riana
  0 siblings, 0 replies; 43+ messages in thread
From: Tauro, Riana @ 2026-03-31  5:18 UTC (permalink / raw)
  To: Mallesh, Koujalagi, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, Michal Wajdeczko,
	Matthew Brost, Matt Roper


On 3/4/2026 4:08 PM, Mallesh, Koujalagi wrote:
>
> On 02-03-2026 03:51 pm, Riana Tauro wrote:
>> Add error_detected, mmio_enabled, slot_reset and resume
>> recovery callbacks to handle PCIe Advanced Error Reporting
>> (AER) errors.
>>
>> For fatal errors, the device is wedged and becomes
>> inaccessible. Return PCI_ERS_RESULT_NEED_RESET from
>> error_detected to request a Secondary Bus Reset (SBR).
>>
>> For non-fatal errors, return PCI_ERS_RESULT_CAN_RECOVER from
>> error_detected to trigger the mmio_enabled callback. In this callback,
>> the device is queried to determine the error cause and attempt
>> recovery based on the error type.
>>
>> Once the Secondary Bus Reset (SBR) is completed, the slot_reset callback
>> cleanly removes and reprobes the device to restore functionality.
>>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>> v2: re-order linux headers
>>      reword error messages
>>      do not clear in_recovery after remove
>>      return PCI_ERS_RESULT_DISCONNECT if probe fails (Michal)
>>      only wedge device do not send uevent (Raag)
>>      set recovery flag in error_detected and clear on resume
>>      add default switch case (Mallesh)
>> ---
>>   drivers/gpu/drm/xe/Makefile          |  1 +
>>   drivers/gpu/drm/xe/xe_device.h       | 15 +++++
>>   drivers/gpu/drm/xe/xe_device_types.h |  3 +
>>   drivers/gpu/drm/xe/xe_pci.c          |  3 +
>>   drivers/gpu/drm/xe/xe_pci_error.c    | 99 ++++++++++++++++++++++++++++
>>   5 files changed, 121 insertions(+)
>>   create mode 100644 drivers/gpu/drm/xe/xe_pci_error.c
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index 1890bbd1b28d..417b030e5ce7 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -99,6 +99,7 @@ xe-y += xe_bb.o \
>>       xe_page_reclaim.o \
>>       xe_pat.o \
>>       xe_pci.o \
>> +    xe_pci_error.o \
>>       xe_pci_rebar.o \
>>       xe_pcode.o \
>>       xe_pm.o \
>> diff --git a/drivers/gpu/drm/xe/xe_device.h 
>> b/drivers/gpu/drm/xe/xe_device.h
>> index 39464650533b..972f43d20f1a 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -43,6 +43,21 @@ static inline struct xe_device 
>> *ttm_to_xe_device(struct ttm_device *ttm)
>>       return container_of(ttm, struct xe_device, ttm);
>>   }
>>   +static inline bool xe_device_is_in_recovery(struct xe_device *xe)
>> +{
>> +    return atomic_read(&xe->in_recovery);
>> +}
>> +
>> +static inline void xe_device_set_in_recovery(struct xe_device *xe)
>> +{
>> +    atomic_set(&xe->in_recovery, 1);
>> +}
>> +
>> +static inline void xe_device_clear_in_recovery(struct xe_device *xe)
>> +{
>> +     atomic_set(&xe->in_recovery, 0);
>> +}
>> +
>>   struct xe_device *xe_device_create(struct pci_dev *pdev,
>>                      const struct pci_device_id *ent);
>>   int xe_device_probe_early(struct xe_device *xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h 
>> b/drivers/gpu/drm/xe/xe_device_types.h
>> index 5599534384fa..616d74792902 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -504,6 +504,9 @@ struct xe_device {
>>           bool inconsistent_reset;
>>       } wedged;
>>   +    /** @in_recovery: Indicates if device is in recovery */
>> +    atomic_t in_recovery;
>> +
>>       /** @bo_device: Struct to control async free of BOs */
>>       struct xe_bo_dev {
>>           /** @bo_device.async_free: Free worker */
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index ad1e5ef2ee89..825489287f28 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -1313,6 +1313,8 @@ static const struct dev_pm_ops xe_pm_ops = {
>>   };
>>   #endif
>>   +extern const struct pci_error_handlers xe_pci_error_handlers;
>> +
>>   static struct pci_driver xe_pci_driver = {
>>       .name = DRIVER_NAME,
>>       .id_table = pciidlist,
>> @@ -1320,6 +1322,7 @@ static struct pci_driver xe_pci_driver = {
>>       .remove = xe_pci_remove,
>>       .shutdown = xe_pci_shutdown,
>>       .sriov_configure = xe_pci_sriov_configure,
>> +    .err_handler = &xe_pci_error_handlers,
>>   #ifdef CONFIG_PM_SLEEP
>>       .driver.pm = &xe_pm_ops,
>>   #endif
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c 
>> b/drivers/gpu/drm/xe/xe_pci_error.c
>> new file mode 100644
>> index 000000000000..d4896a4a5014
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -0,0 +1,99 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2026 Intel Corporation
>> + */
>> +#include <linux/pci.h>
>> +
>> +#include <drm/drm_drv.h>
>> +
>> +#include "xe_device.h"
>> +#include "xe_gt.h"
>> +#include "xe_pci.h"
>> +#include "xe_uc.h"
>> +
>> +static void xe_pci_error_handling(struct pci_dev *pdev)
>> +{
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +    struct xe_gt *gt;
>> +    u8 id;
>> +
> What is behavior when wedged mode is XE_WEDGED_MODE_NEVER and PCI 
> error occurred.

Even then, to attempt a slot reset we need to stop all ioctls and job
submissions. Here we reuse the wedged flag to do that. We are not asking
userspace to recover, so this should be fine.
My intention was to reuse the wedged flag instead of checking in_recovery
everywhere.

>> +    /* Wedge the device to prevent userspace access but don't send 
>> the event yet */
>> +    atomic_set(&xe->wedged.flag, 1);
>> +
>> +    for_each_gt(gt, xe, id)
>> +        xe_gt_declare_wedged(gt);
>> +
>> +    pci_disable_device(pdev);
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, 
>> pci_channel_state_t state)
>> +{
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    dev_err(&pdev->dev, "Xe Pci error recovery: error detected state 
>> %d\n", state);
>> +
>> +    xe_device_set_in_recovery(xe);
>> +
>> +    switch (state) {
>> +    case pci_channel_io_normal:
>> +        return PCI_ERS_RESULT_CAN_RECOVER;
>> +    case pci_channel_io_frozen:
>> +        xe_pci_error_handling(pdev);
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +    case pci_channel_io_perm_failure:
>
> Clear in_recovery flag before disconnect, dead/unusable device should 
> not appear to be recovering.
>
> in_recovery flag still set that suggest recovery is actively happening. 

Agreed. In the last revision this was correct: in_recovery was set only
before attempting a slot reset.
I will either move it back or return early if the state is disconnect.

Reverting to the previous revision makes more sense here, since in_recovery
is used during job timeouts and GT resets when the device is wedged.

Thanks
Riana

>
> Thanks
>
> -/Mallesh
>
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    default:
>> +        dev_err(&pdev->dev, "Unknown state %d\n", state);
>> +        return PCI_ERS_RESULT_NEED_RESET;
>> +    }
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>> +{
>> +    dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
>> +
>> +    return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>> +{
>> +    const struct pci_device_id *ent = 
>> pci_match_id(pdev->driver->id_table, pdev);
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    dev_err(&pdev->dev, "Xe Pci error recovery: Slot reset\n");
>> +
>> +    pci_restore_state(pdev);
>> +
>> +    if (pci_enable_device(pdev)) {
>> +        dev_err(&pdev->dev,
>> +            "Cannot re-enable PCI device after reset\n");
>> +        return PCI_ERS_RESULT_DISCONNECT;
>> +    }
>> +
>> +    /*
>> +     * Secondary Bus Reset wipes out all device memory
>> +     * requiring XE KMD to perform a device removal and reprobe.
>> +     */
>> +    pdev->driver->remove(pdev);
>> +
>> +    if (!pdev->driver->probe(pdev, ent))
>> +        return PCI_ERS_RESULT_RECOVERED;
>> +
>> +    return PCI_ERS_RESULT_DISCONNECT;
>> +}
>> +
>> +static void xe_pci_error_resume(struct pci_dev *pdev)
>> +{
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +
>> +    dev_info(&pdev->dev, "Xe Pci error recovery: Recovered\n");
>> +
>> +    xe_device_clear_in_recovery(xe);
>> +}
>> +
>> +const struct pci_error_handlers xe_pci_error_handlers = {
>> +    .error_detected    = xe_pci_error_detected,
>> +    .mmio_enabled    = xe_pci_error_mmio_enabled,
>> +    .slot_reset    = xe_pci_error_slot_reset,
>> +    .resume        = xe_pci_error_resume,
>> +};

* [PATCH v2 04/11] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (2 preceding siblings ...)
  2026-03-02 10:21 ` [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
@ 2026-03-02 10:21 ` Riana Tauro
  2026-03-02 10:22 ` [PATCH v2 05/11] drm/xe: Skip device access during PCI error recovery Riana Tauro
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:21 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi, Matthew Brost, Himal Prasad Ghimiray

Add devres grouping to handle device resource cleanup during
PCI error recovery.

Secondary Bus Reset (SBR) is triggered by PCI core when the
error_detected/mmio_enabled callbacks return PCI_ERS_RESULT_NEED_RESET.

Once SBR is complete, the slot_reset callback is triggered. SBR wipes
out all device memory, requiring XE KMD to perform a device removal and
reprobe. Calling xe_pci_remove() alone does not free the allocated
devres. Since there are no exported functions to release all devres,
group the devres allocations and release the entire group during slot
reset to ensure proper cleanup.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c       | 7 +++++++
 drivers/gpu/drm/xe/xe_device_types.h | 3 +++
 drivers/gpu/drm/xe/xe_pci_error.c    | 1 +
 3 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 1d61bb504e9b..eea67cf13539 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -441,6 +441,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 				   const struct pci_device_id *ent)
 {
 	struct xe_device *xe;
+	void *devres_id;
 	int err;
 
 	xe_display_driver_set_hooks(&driver);
@@ -449,10 +450,16 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (err)
 		return ERR_PTR(err);
 
+	devres_id = devres_open_group(&pdev->dev, NULL, GFP_KERNEL);
+	if (!devres_id)
+		return ERR_PTR(-ENOMEM);
+
 	xe = devm_drm_dev_alloc(&pdev->dev, &driver, struct xe_device, drm);
 	if (IS_ERR(xe))
 		return xe;
 
+	xe->devres_group_id = devres_id;
+
 	err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
 			      xe->drm.anon_inode->i_mapping,
 			      xe->drm.vma_offset_manager, 0);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 616d74792902..22dfaec30c58 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -507,6 +507,9 @@ struct xe_device {
 	/** @in_recovery: Indicates if device is in recovery */
 	atomic_t in_recovery;
 
+	/** @devres_group_id: id for devres group */
+	void *devres_group_id;
+
 	/** @bo_device: Struct to control async free of BOs */
 	struct xe_bo_dev {
 		/** @bo_device.async_free: Free worker */
diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
index d4896a4a5014..ba62868f00d4 100644
--- a/drivers/gpu/drm/xe/xe_pci_error.c
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -75,6 +75,7 @@ static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
 	 * requiring XE KMD to perform a device removal and reprobe.
 	 */
 	pdev->driver->remove(pdev);
+	devres_release_group(&pdev->dev, xe->devres_group_id);
 
 	if (!pdev->driver->probe(pdev, ent))
 		return PCI_ERS_RESULT_RECOVERED;
-- 
2.47.1


* [PATCH v2 05/11] drm/xe: Skip device access during PCI error recovery
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (3 preceding siblings ...)
  2026-03-02 10:21 ` [PATCH v2 04/11] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
@ 2026-03-02 10:22 ` Riana Tauro
  2026-03-04 10:59   ` Mallesh, Koujalagi
  2026-03-02 10:22 ` [PATCH v2 06/11] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:22 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi, Matthew Brost, Himal Prasad Ghimiray

When a fatal error occurs and the error_detected callback is
invoked, the device is inaccessible. The error_detected callback
wedges the device, causing the jobs to time out.

The timeout handler acquires forcewake to capture a devcoredump and
triggers a GT reset. Since the device is inaccessible, this causes
errors. Skip all MMIO accesses and the GT reset when the device
is in recovery.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c         | 11 ++++++++---
 drivers/gpu/drm/xe/xe_guc_submit.c |  9 +++++----
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index b455af1e6072..6f41090063bf 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -933,18 +933,23 @@ static void gt_reset_worker(struct work_struct *w)
 
 void xe_gt_reset_async(struct xe_gt *gt)
 {
-	xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
+	struct xe_device *xe = gt_to_xe(gt);
+
+	if (xe_device_is_in_recovery(xe))
+		return;
 
 	/* Don't do a reset while one is already in flight */
 	if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))
 		return;
 
+	xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
+
 	xe_gt_info(gt, "reset queued\n");
 
 	/* Pair with put in gt_reset_worker() if work is enqueued */
-	xe_pm_runtime_get_noresume(gt_to_xe(gt));
+	xe_pm_runtime_get_noresume(xe);
 	if (!queue_work(gt->ordered_wq, &gt->reset.worker))
-		xe_pm_runtime_put(gt_to_xe(gt));
+		xe_pm_runtime_put(xe);
 }
 
 void xe_gt_suspend_prepare(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index ca7aa4f358d0..c25658f1e44b 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1508,7 +1508,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	 * If devcoredump not captured and GuC capture for the job is not ready
 	 * do manual capture first and decide later if we need to use it
 	 */
-	if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
+	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
 	    !xe_guc_capture_get_matching_and_lock(q)) {
 		/* take force wake before engine register manual capture */
 		CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
@@ -1530,8 +1530,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	set_exec_queue_banned(q);
 
 	/* Kick job / queue off hardware */
-	if (!wedged && (exec_queue_enabled(primary) ||
-			exec_queue_pending_disable(primary))) {
+	if (!xe_device_is_in_recovery(xe) && !wedged &&
+	    (exec_queue_enabled(primary) || exec_queue_pending_disable(primary))) {
 		int ret;
 
 		if (exec_queue_reset(primary))
@@ -1599,7 +1599,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 
 	trace_xe_sched_job_timedout(job);
 
-	if (!exec_queue_killed(q))
+	/* Do not access device if in recovery */
+	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q))
 		xe_devcoredump(q, job,
 			       "Timedout job - seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
 			       xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
-- 
2.47.1


* Re: [PATCH v2 05/11] drm/xe: Skip device access during PCI error recovery
  2026-03-02 10:22 ` [PATCH v2 05/11] drm/xe: Skip device access during PCI error recovery Riana Tauro
@ 2026-03-04 10:59   ` Mallesh, Koujalagi
  0 siblings, 0 replies; 43+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-04 10:59 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, Matthew Brost,
	Himal Prasad Ghimiray


On 02-03-2026 03:52 pm, Riana Tauro wrote:
> When a fatal error occurs and the error_detected callback is
> invoked the device is inaccessible. The error_detected callback
> wedges the device causing the jobs to timeout.
>
> The timedout handler acquires forcewake to dump devcoredump and
> triggers a GT reset. Since the device is inaccessible this causes
> errors. Skip all mmio accesses and gt reset when the device
> is in recovery.
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_gt.c         | 11 ++++++++---
>   drivers/gpu/drm/xe/xe_guc_submit.c |  9 +++++----
>   2 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index b455af1e6072..6f41090063bf 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -933,18 +933,23 @@ static void gt_reset_worker(struct work_struct *w)
>   
>   void xe_gt_reset_async(struct xe_gt *gt)
>   {
> -	xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
> +	struct xe_device *xe = gt_to_xe(gt);
> +
> +	if (xe_device_is_in_recovery(xe))
> +		return;

Need to check in_recovery flag in the gt_reset_worker() as well to skip 
GT reset when device in PCI recovery.

Thanks

-/Mallesh

>   	/* Don't do a reset while one is already in flight */
>   	if (!xe_fault_inject_gt_reset() && xe_uc_reset_prepare(&gt->uc))
>   		return;
>   
> +	xe_gt_info(gt, "trying reset from %ps\n", __builtin_return_address(0));
> +
>   	xe_gt_info(gt, "reset queued\n");
>   
>   	/* Pair with put in gt_reset_worker() if work is enqueued */
> -	xe_pm_runtime_get_noresume(gt_to_xe(gt));
> +	xe_pm_runtime_get_noresume(xe);
>   	if (!queue_work(gt->ordered_wq, &gt->reset.worker))
> -		xe_pm_runtime_put(gt_to_xe(gt));
> +		xe_pm_runtime_put(xe);
>   }
>   
>   void xe_gt_suspend_prepare(struct xe_gt *gt)
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index ca7aa4f358d0..c25658f1e44b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1508,7 +1508,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>   	 * If devcoredump not captured and GuC capture for the job is not ready
>   	 * do manual capture first and decide later if we need to use it
>   	 */
> -	if (!exec_queue_killed(q) && !xe->devcoredump.captured &&
> +	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q) && !xe->devcoredump.captured &&
>   	    !xe_guc_capture_get_matching_and_lock(q)) {
>   		/* take force wake before engine register manual capture */
>   		CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
> @@ -1530,8 +1530,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>   	set_exec_queue_banned(q);
>   
>   	/* Kick job / queue off hardware */
> -	if (!wedged && (exec_queue_enabled(primary) ||
> -			exec_queue_pending_disable(primary))) {
> +	if (!xe_device_is_in_recovery(xe) && !wedged &&
> +	    (exec_queue_enabled(primary) || exec_queue_pending_disable(primary))) {
>   		int ret;
>   
>   		if (exec_queue_reset(primary))
> @@ -1599,7 +1599,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>   
>   	trace_xe_sched_job_timedout(job);
>   
> -	if (!exec_queue_killed(q))
> +	/* Do not access device if in recovery */
> +	if (!xe_device_is_in_recovery(xe) && !exec_queue_killed(q))
>   		xe_devcoredump(q, job,
>   			       "Timedout job - seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
>   			       xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 06/11] drm/xe/xe_ras: Initialize Uncorrectable AER Registers
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (4 preceding siblings ...)
  2026-03-02 10:22 ` [PATCH v2 05/11] drm/xe: Skip device access during PCI error recovery Riana Tauro
@ 2026-03-02 10:22 ` Riana Tauro
  2026-03-02 10:22 ` [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:22 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Uncorrectable errors from different endpoints in the device are steered to
the USP, which is a PCI Advanced Error Reporting (AER) compliant device.
Downgrade all the errors to non-fatal to prevent the PCIe bus driver
from triggering a Secondary Bus Reset (SBR). This allows error
detection, containment and recovery in the driver.

The Uncorrectable Error Severity Register has the 'Uncorrectable
Internal Error Severity' set to fatal by default. Set this to
non-fatal and unmask the error.
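
The register sequence can be illustrated with a small user-space model.
This is only a sketch: fake_aer_cap and the fake_* helpers are
hypothetical and stand in for the AER capability and the
pci_read_config_dword()/pci_write_config_dword() calls. The offsets and
the PCI_ERR_UNC_INTN bit are the values from the kernel's pci_regs.h, and
the status register is modelled as write-1-to-clear:

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and bit as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_STATUS	0x04	/* Uncorrectable Error Status */
#define PCI_ERR_UNCOR_MASK	0x08	/* Uncorrectable Error Mask */
#define PCI_ERR_UNCOR_SEVER	0x0c	/* Uncorrectable Error Severity */
#define PCI_ERR_UNC_INTN	0x00400000	/* Uncorrectable Internal Error */

/* Hypothetical model of the AER capability: one 32-bit slot per dword. */
struct fake_aer_cap {
	uint32_t regs[4];
};

uint32_t fake_read(const struct fake_aer_cap *cap, unsigned int off)
{
	assert(off % 4 == 0 && off / 4 < 4);
	return cap->regs[off / 4];
}

void fake_write(struct fake_aer_cap *cap, unsigned int off, uint32_t val)
{
	assert(off % 4 == 0 && off / 4 < 4);

	/* The status register is write-1-to-clear; the others are plain RW. */
	if (off == PCI_ERR_UNCOR_STATUS)
		cap->regs[off / 4] &= ~val;
	else
		cap->regs[off / 4] = val;
}

/* Same three steps as the patch: clear stale status, downgrade, unmask. */
void fake_unmask_and_downgrade(struct fake_aer_cap *cap)
{
	uint32_t val;

	val = fake_read(cap, PCI_ERR_UNCOR_STATUS);
	if (val & PCI_ERR_UNC_INTN)
		fake_write(cap, PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);

	val = fake_read(cap, PCI_ERR_UNCOR_SEVER);
	fake_write(cap, PCI_ERR_UNCOR_SEVER, val & ~PCI_ERR_UNC_INTN);

	val = fake_read(cap, PCI_ERR_UNCOR_MASK);
	fake_write(cap, PCI_ERR_UNCOR_MASK, val & ~PCI_ERR_UNC_INTN);
}
```

Only the internal error bit is touched in each register; other status,
severity and mask bits keep whatever value they had.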

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
v2: clear stale uncorrectable internal status in status register
(Aravind)
---
 drivers/gpu/drm/xe/Makefile    |  1 +
 drivers/gpu/drm/xe/xe_device.c |  3 ++
 drivers/gpu/drm/xe/xe_ras.c    | 78 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ras.h    | 13 ++++++
 4 files changed, 95 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_ras.c
 create mode 100644 drivers/gpu/drm/xe/xe_ras.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 417b030e5ce7..47cc8a572112 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -111,6 +111,7 @@ xe-y += xe_bb.o \
 	xe_pxp_debugfs.o \
 	xe_pxp_submit.o \
 	xe_query.o \
+	xe_ras.o \
 	xe_range_fence.o \
 	xe_reg_sr.o \
 	xe_reg_whitelist.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index eea67cf13539..d2978c52efed 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -60,6 +60,7 @@
 #include "xe_psmi.h"
 #include "xe_pxp.h"
 #include "xe_query.h"
+#include "xe_ras.h"
 #include "xe_shrinker.h"
 #include "xe_soc_remapper.h"
 #include "xe_survivability_mode.h"
@@ -1016,6 +1017,8 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_vsec_init(xe);
 
+	xe_ras_init(xe);
+
 	err = xe_sriov_init_late(xe);
 	if (err)
 		goto err_unregister_display;
diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
new file mode 100644
index 000000000000..bc7615c6c1be
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include "xe_device_types.h"
+#include "xe_ras.h"
+
+#ifdef CONFIG_PCIEAER
+static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
+{
+	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+	struct pci_dev *vsp, *usp;
+	u32 aer_uncorr_mask, aer_uncorr_sev, aer_uncorr_status;
+	u16 aer_cap;
+
+	/* Gfx Device Hierarchy: USP-->VSP-->SGunit */
+	vsp = pci_upstream_bridge(pdev);
+	if (!vsp)
+		return;
+
+	usp = pci_upstream_bridge(vsp);
+	if (!usp)
+		return;
+
+	aer_cap = usp->aer_cap;
+
+	if (!aer_cap)
+		return;
+
+	/*
+	 * Clear any stale Uncorrectable Internal Error Status event in Uncorrectable Error
+	 * Status Register.
+	 */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, &aer_uncorr_status);
+	if (aer_uncorr_status & PCI_ERR_UNC_INTN)
+		pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_STATUS, PCI_ERR_UNC_INTN);
+
+	/*
+	 * All errors are steered to USP which is a PCIe AER Compliant device.
+	 * Downgrade all the errors to non-fatal to prevent PCIe bus driver
+	 * from triggering a Secondary Bus Reset (SBR). This allows error
+	 * detection, containment and recovery in the driver.
+	 *
+	 * The Uncorrectable Error Severity Register has the 'Uncorrectable
+	 * Internal Error Severity' set to fatal by default. Set this to
+	 * non-fatal and unmask the error.
+	 */
+
+	/* Initialize Uncorrectable Error Severity Register */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, &aer_uncorr_sev);
+	aer_uncorr_sev &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_SEVER, aer_uncorr_sev);
+
+	/* Initialize Uncorrectable Error Mask Register */
+	pci_read_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, &aer_uncorr_mask);
+	aer_uncorr_mask &= ~PCI_ERR_UNC_INTN;
+	pci_write_config_dword(usp, aer_cap + PCI_ERR_UNCOR_MASK, aer_uncorr_mask);
+
+	pci_save_state(usp);
+}
+#endif
+
+/**
+ * xe_ras_init - Initialize Xe RAS
+ * @xe: xe device instance
+ *
+ * Initialize Xe RAS
+ */
+void xe_ras_init(struct xe_device *xe)
+{
+	if (!xe->info.has_sysctrl)
+		return;
+
+#ifdef CONFIG_PCIEAER
+	aer_unmask_and_downgrade_internal_error(xe);
+#endif
+}
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
new file mode 100644
index 000000000000..14cb973603e7
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_RAS_H_
+#define _XE_RAS_H_
+
+struct xe_device;
+
+void xe_ras_init(struct xe_device *xe);
+
+#endif
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (5 preceding siblings ...)
  2026-03-02 10:22 ` [PATCH v2 06/11] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
@ 2026-03-02 10:22 ` Riana Tauro
  2026-03-04 16:32   ` Raag Jadav
  2026-03-02 10:22 ` [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:22 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Add the sysctrl commands and response structures for Uncorrectable
Core Compute errors.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
v2: use macro max error details (Mallesh)
    use static_assert
    fix kernel-doc
    remove xe_assert
---
 drivers/gpu/drm/xe/xe_ras.c                   |  56 ++++++++
 drivers/gpu/drm/xe/xe_ras_types.h             | 132 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h |  13 ++
 3 files changed, 201 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_ras_types.h

diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index bc7615c6c1be..3bef589082d7 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -6,6 +6,62 @@
 #include "xe_device_types.h"
 #include "xe_ras.h"
 
+/* Severity classification of detected errors */
+enum xe_ras_severity {
+	XE_RAS_SEVERITY_NOT_SUPPORTED = 0,
+	XE_RAS_SEVERITY_CORRECTABLE,
+	XE_RAS_SEVERITY_UNCORRECTABLE,
+	XE_RAS_SEVERITY_INFORMATIONAL,
+	XE_RAS_SEVERITY_MAX
+};
+
+/* major IP blocks where errors can originate */
+enum xe_ras_component {
+	XE_RAS_COMPONENT_NOT_SUPPORTED = 0,
+	XE_RAS_COMPONENT_DEVICE_MEMORY,
+	XE_RAS_COMPONENT_CORE_COMPUTE,
+	XE_RAS_COMPONENT_RESERVED,
+	XE_RAS_COMPONENT_PCIE,
+	XE_RAS_COMPONENT_FABRIC,
+	XE_RAS_COMPONENT_SOC_INTERNAL,
+	XE_RAS_COMPONENT_MAX
+};
+
+static const char * const xe_ras_severities[] = {
+	[XE_RAS_SEVERITY_NOT_SUPPORTED]		= "Not Supported",
+	[XE_RAS_SEVERITY_CORRECTABLE]		= "Correctable",
+	[XE_RAS_SEVERITY_UNCORRECTABLE]		= "Uncorrectable",
+	[XE_RAS_SEVERITY_INFORMATIONAL]		= "Informational",
+};
+static_assert(ARRAY_SIZE(xe_ras_severities) == XE_RAS_SEVERITY_MAX);
+
+static const char * const xe_ras_components[] = {
+	[XE_RAS_COMPONENT_NOT_SUPPORTED]	= "Not Supported",
+	[XE_RAS_COMPONENT_DEVICE_MEMORY]	= "Device Memory",
+	[XE_RAS_COMPONENT_CORE_COMPUTE]		= "Core Compute",
+	[XE_RAS_COMPONENT_RESERVED]		= "Reserved",
+	[XE_RAS_COMPONENT_PCIE]			= "PCIe",
+	[XE_RAS_COMPONENT_FABRIC]		= "Fabric",
+	[XE_RAS_COMPONENT_SOC_INTERNAL]		= "SoC Internal",
+};
+static_assert(ARRAY_SIZE(xe_ras_components) == XE_RAS_COMPONENT_MAX);
+
+static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
+{
+	if (severity >= XE_RAS_SEVERITY_MAX)
+		return "Unknown Severity";
+
+	return xe_ras_severities[severity];
+}
+
+static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
+{
+	if (comp >= XE_RAS_COMPONENT_MAX)
+		return "Unknown Component";
+
+	return xe_ras_components[comp];
+}
+
 #ifdef CONFIG_PCIEAER
 static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
 {
diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
new file mode 100644
index 000000000000..676755732ef6
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ras_types.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef _XE_RAS_TYPES_H_
+#define _XE_RAS_TYPES_H_
+
+#include <linux/types.h>
+
+#define XE_RAS_NUM_ERROR_ARR		3
+#define XE_RAS_MAX_ERROR_DETAILS	16
+
+/**
+ * struct xe_ras_error_common - Common RAS error class
+ *
+ * This structure contains error severity and component information
+ * across all products
+ */
+struct xe_ras_error_common {
+	/** @severity: Error Severity */
+	u8 severity;
+	/** @component: IP where the error originated */
+	u8 component;
+} __packed;
+
+/**
+ * struct xe_ras_error_unit - Error unit information
+ */
+struct xe_ras_error_unit {
+	/** @tile: Tile identifier */
+	u8 tile;
+	/** @instance: Instance identifier within a component */
+	u32 instance;
+} __packed;
+
+/**
+ * struct xe_ras_error_cause - Error cause information
+ */
+struct xe_ras_error_cause {
+	/** @cause: Cause */
+	u32 cause;
+	/** @reserved: For future use */
+	u8 reserved;
+} __packed;
+
+/**
+ * struct xe_ras_error_product - Error fields that are specific to the product
+ */
+struct xe_ras_error_product {
+	/** @unit: Unit within IP block */
+	struct xe_ras_error_unit unit;
+	/** @error_cause: Cause/checker */
+	struct xe_ras_error_cause error_cause;
+} __packed;
+
+/**
+ * struct xe_ras_error_class - Complete RAS Error Class
+ *
+ * This structure provides the complete error classification by combining
+ * the common error class with the product-specific error class.
+ */
+struct xe_ras_error_class {
+	/** @common: Common error severity and component */
+	struct xe_ras_error_common common;
+	/** @product: Product-specific unit and cause */
+	struct xe_ras_error_product product;
+} __packed;
+
+/**
+ * struct xe_ras_error_array - Details of the error types
+ */
+struct xe_ras_error_array {
+	/** @error_class: Error class */
+	struct xe_ras_error_class error_class;
+	/** @timestamp: Timestamp */
+	u64 timestamp;
+	/** @error_details: Error details specific to the class */
+	u32 error_details[XE_RAS_MAX_ERROR_DETAILS];
+} __packed;
+
+/**
+ * struct xe_ras_get_error_response - Response for XE_SYSCTRL_GET_SOC_ERROR
+ */
+struct xe_ras_get_error_response {
+	/** @num_errors: Number of errors reported in this response */
+	u8 num_errors;
+	/** @additional_errors: Indicates whether more errors are pending */
+	u8 additional_errors;
+	/** @error_arr: Array of up to 3 errors */
+	struct xe_ras_error_array error_arr[XE_RAS_NUM_ERROR_ARR];
+} __packed;
+
+/**
+ * struct xe_ras_compute_error - Error details of Core Compute error
+ */
+struct xe_ras_compute_error {
+	/** @error_log_header: Error Source and type */
+	u32 error_log_header;
+	/** @internal_error_log: Internal Error log */
+	u32 internal_error_log;
+	/** @fabric_log: Fabric Error log */
+	u32 fabric_log;
+	/** @internal_error_addr_log0: Internal Error addr log */
+	u32 internal_error_addr_log0;
+	/** @internal_error_addr_log1: Internal Error addr log */
+	u32 internal_error_addr_log1;
+	/** @packet_log0: Packet log */
+	u32 packet_log0;
+	/** @packet_log1: Packet log */
+	u32 packet_log1;
+	/** @packet_log2: Packet log */
+	u32 packet_log2;
+	/** @packet_log3: Packet log */
+	u32 packet_log3;
+	/** @packet_log4: Packet log */
+	u32 packet_log4;
+	/** @misc_log0: Misc log */
+	u32 misc_log0;
+	/** @misc_log1: Misc log */
+	u32 misc_log1;
+	/** @spare_log0: Spare log */
+	u32 spare_log0;
+	/** @spare_log1: Spare log */
+	u32 spare_log1;
+	/** @spare_log2: Spare log */
+	u32 spare_log2;
+	/** @spare_log3: Spare log */
+	u32 spare_log3;
+} __packed;
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
index ce10924c5881..14e2d7989fcc 100644
--- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
+++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
@@ -8,6 +8,19 @@
 
 #include <linux/types.h>
 
+/**
+ * enum xe_sysctrl_mailbox_command_id - RAS command IDs for GFSP group
+ *
+ * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Get basic error information
+ */
+enum xe_sysctrl_mailbox_command_id {
+	XE_SYSCTRL_CMD_GET_SOC_ERROR = 1
+};
+
+enum xe_sysctrl_group {
+	XE_SYSCTRL_GROUP_GFSP = 1
+};
+
 struct xe_sysctrl_mailbox_mkhi_msg_hdr {
 	__le32 data;
 } __packed;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-03-02 10:22 ` [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
@ 2026-03-04 16:32   ` Raag Jadav
  2026-03-31 16:14     ` Tauro, Riana
  0 siblings, 1 reply; 43+ messages in thread
From: Raag Jadav @ 2026-03-04 16:32 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Mon, Mar 02, 2026 at 03:52:02PM +0530, Riana Tauro wrote:
> Add the sysctrl commands and response structures for Uncorrectable
> Core Compute errors.

...

> +static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
> +{
> +	if (severity >= XE_RAS_SEVERITY_MAX)
> +		return "Unknown Severity";
> +
> +	return xe_ras_severities[severity];

Rather,

	return sev < XE_RAS_SEV_MAX ? xe_ras_severities[sev] : "Unknown";

and since this should ideally never happen, I'd also add a

	xe_assert(xe, sev < XE_RAS_SEV_MAX);
> +}
> +
> +static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
> +{
> +	if (comp >= XE_RAS_COMPONENT_MAX)
> +		return "Unknown Component";
> +
> +	return xe_ras_components[comp];

Ditto.

Raag

> +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-03-04 16:32   ` Raag Jadav
@ 2026-03-31 16:14     ` Tauro, Riana
  2026-04-01  6:25       ` Raag Jadav
  0 siblings, 1 reply; 43+ messages in thread
From: Tauro, Riana @ 2026-03-31 16:14 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi


On 3/4/2026 10:02 PM, Raag Jadav wrote:
> On Mon, Mar 02, 2026 at 03:52:02PM +0530, Riana Tauro wrote:
>> Add the sysctrl commands and response structures for Uncorrectable
>> Core Compute errors.
> ...
>
>> +static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
>> +{
>> +	if (severity >= XE_RAS_SEVERITY_MAX)
>> +		return "Unknown Severity";
>> +
>> +	return xe_ras_severities[severity];
> Rather,
>
> 	return sev < XE_RAS_SEV_MAX ? xe_ras_severities[sev] : "Unknown";
I can make this change
>
> and since this should ideally never happen, I'd also add a
>
> 	xe_assert(xe, sev < XE_RAS_SEV_MAX);
This is only used in debug builds.  Won't be useful anyway..

Thanks
Riana
>> +}
>> +
>> +static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>> +{
>> +	if (comp >= XE_RAS_COMPONENT_MAX)
>> +		return "Unknown Component";
>> +
>> +	return xe_ras_components[comp];
> Ditto.
>
> Raag
>
>> +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-03-31 16:14     ` Tauro, Riana
@ 2026-04-01  6:25       ` Raag Jadav
  2026-04-01  6:39         ` Tauro, Riana
  0 siblings, 1 reply; 43+ messages in thread
From: Raag Jadav @ 2026-04-01  6:25 UTC (permalink / raw)
  To: Tauro, Riana
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Tue, Mar 31, 2026 at 09:44:10PM +0530, Tauro, Riana wrote:
> On 3/4/2026 10:02 PM, Raag Jadav wrote:
> > On Mon, Mar 02, 2026 at 03:52:02PM +0530, Riana Tauro wrote:
> > > Add the sysctrl commands and response structures for Uncorrectable
> > > Core Compute errors.
> > ...
> > 
> > > +static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
> > > +{
> > > +	if (severity >= XE_RAS_SEVERITY_MAX)
> > > +		return "Unknown Severity";
> > > +
> > > +	return xe_ras_severities[severity];
> > Rather,
> > 
> > 	return sev < XE_RAS_SEV_MAX ? xe_ras_severities[sev] : "Unknown";
> I can make this change
> > 
> > and since this should ideally never happen, I'd also add a
> > 
> > 	xe_assert(xe, sev < XE_RAS_SEV_MAX);
> This is only used in debug builds.

Exactly, we need a splat on out of spec behaviours in debug builds.
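
For illustration, the combined form (ternary lookup plus a debug-only
check) could look roughly like the stand-alone sketch below. The enum and
string table are copied from the patch; fake_assert() and FAKE_DEBUG are
hypothetical stand-ins for xe_assert() and a debug build, and the xe
argument is dropped to keep the example self-contained:

```c
#include <assert.h>
#include <stdio.h>

enum xe_ras_severity {
	XE_RAS_SEVERITY_NOT_SUPPORTED = 0,
	XE_RAS_SEVERITY_CORRECTABLE,
	XE_RAS_SEVERITY_UNCORRECTABLE,
	XE_RAS_SEVERITY_INFORMATIONAL,
	XE_RAS_SEVERITY_MAX
};

static const char * const xe_ras_severities[] = {
	[XE_RAS_SEVERITY_NOT_SUPPORTED]	= "Not Supported",
	[XE_RAS_SEVERITY_CORRECTABLE]	= "Correctable",
	[XE_RAS_SEVERITY_UNCORRECTABLE]	= "Uncorrectable",
	[XE_RAS_SEVERITY_INFORMATIONAL]	= "Informational",
};
static_assert(sizeof(xe_ras_severities) / sizeof(xe_ras_severities[0]) ==
	      XE_RAS_SEVERITY_MAX, "severity table out of sync with enum");

/* Hypothetical stand-in for xe_assert(): complains only in debug builds. */
#ifdef FAKE_DEBUG
#define fake_assert(cond) \
	do { if (!(cond)) fprintf(stderr, "assert failed: %s\n", #cond); } while (0)
#else
#define fake_assert(cond) do { (void)sizeof(cond); } while (0)
#endif

const char *severity_to_str(unsigned int sev)
{
	/* Out-of-spec values are flagged in debug builds only. */
	fake_assert(sev < XE_RAS_SEVERITY_MAX);

	/* The bounds check stays in all builds, so the lookup is always safe. */
	return sev < XE_RAS_SEVERITY_MAX ? xe_ras_severities[sev] : "Unknown";
}
```

The assert only adds the debug splat; the production build still falls
back to "Unknown" through the ternary.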

Raag

> Won't be useful anyway..
> 
> Thanks
> Riana
> > > +}
> > > +
> > > +static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
> > > +{
> > > +	if (comp >= XE_RAS_COMPONENT_MAX)
> > > +		return "Unknown Component";
> > > +
> > > +	return xe_ras_components[comp];
> > Ditto.
> > 
> > Raag
> > 
> > > +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
  2026-04-01  6:25       ` Raag Jadav
@ 2026-04-01  6:39         ` Tauro, Riana
  0 siblings, 0 replies; 43+ messages in thread
From: Tauro, Riana @ 2026-04-01  6:39 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi


On 4/1/2026 11:55 AM, Raag Jadav wrote:
> On Tue, Mar 31, 2026 at 09:44:10PM +0530, Tauro, Riana wrote:
>> On 3/4/2026 10:02 PM, Raag Jadav wrote:
>>> On Mon, Mar 02, 2026 at 03:52:02PM +0530, Riana Tauro wrote:
>>>> Add the sysctrl commands and response structures for Uncorrectable
>>>> Core Compute errors.
>>> ...
>>>
>>>> +static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
>>>> +{
>>>> +	if (severity >= XE_RAS_SEVERITY_MAX)
>>>> +		return "Unknown Severity";
>>>> +
>>>> +	return xe_ras_severities[severity];
>>> Rather,
>>>
>>> 	return sev < XE_RAS_SEV_MAX ? xe_ras_severities[sev] : "Unknown";
>> I can make this change
>>> and since this should ideally never happen, I'd also add a
>>>
>>> 	xe_assert(xe, sev < XE_RAS_SEV_MAX);
>> This is only used in debug builds.
> Exactly, we need a splat on out of spec behaviours in debug builds.

Then I will add it in the default switch case.

Riana


>
> Raag
>
>> Won't be useful anyway..
>>
>> Thanks
>> Riana
>>>> +}
>>>> +
>>>> +static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>>>> +{
>>>> +	if (comp >= XE_RAS_COMPONENT_MAX)
>>>> +		return "Unknown Component";
>>>> +
>>>> +	return xe_ras_components[comp];
>>> Ditto.
>>>
>>> Raag
>>>
>>>> +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (6 preceding siblings ...)
  2026-03-02 10:22 ` [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
@ 2026-03-02 10:22 ` Riana Tauro
  2026-03-04 16:52   ` Raag Jadav
  2026-03-06  3:50   ` [v2,08/11] " Purkait, Soham
  2026-03-02 10:22 ` [PATCH v2 09/11] drm/xe/xe_ras: Add structures for SoC Internal errors Riana Tauro
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:22 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Uncorrectable Core-Compute errors are classified into Global and Local
errors.

A Global error is an error that affects the entire device and requires a
reset. This type of error is not isolated. When an AER is reported and
error_detected is invoked, return PCI_ERS_RESULT_NEED_RESET.

A Local error is confined to a specific component or context like an
engine. These errors can be contained and recovered by resetting
only the affected part without disrupting the rest of the device.

Upon detection of an Uncorrectable Local Core-Compute error, an AER is
generated and GuC is notified of the error. The KMD then sets
the context as non-runnable and initiates an engine reset.
(TODO: GuC <-> KMD communication for the error).
Since the error is contained and recovered, PCI error handling
callback returns PCI_ERS_RESULT_RECOVERED.
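
The Global/Local decision and the "keep the most severe action" rule can
be sketched in plain C as follows. This is an illustrative model, not the
driver code: the enum and function names are hypothetical, and only the
bit position (26:25) and the value 2 for a global error are taken from
the defines in this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Bits 26:25 of the error log header, as in COMPUTE_ERROR_SEVERITY_MASK. */
#define SEVERITY_SHIFT		25
#define SEVERITY_MASK		(UINT32_C(0x3) << SEVERITY_SHIFT)
#define GLOBAL_UNCORR_ERROR	2

/* Ordered so that a larger value means a more drastic action. */
enum recovery_action {
	ACTION_RECOVERED = 0,
	ACTION_RESET,
	ACTION_DISCONNECT
};

/* Global errors need a device reset; local ones are contained. */
enum recovery_action classify_compute_error(uint32_t error_log_header)
{
	uint32_t type = (error_log_header & SEVERITY_MASK) >> SEVERITY_SHIFT;

	return type == GLOBAL_UNCORR_ERROR ? ACTION_RESET : ACTION_RECOVERED;
}

/* Keep the most severe action seen across all reported errors. */
enum recovery_action worst_action(const uint32_t *headers, unsigned int count)
{
	enum recovery_action final_action = ACTION_RECOVERED;

	assert(headers || !count);

	for (unsigned int i = 0; i < count; i++) {
		enum recovery_action action = classify_compute_error(headers[i]);

		if (action > final_action)
			final_action = action;
	}

	return final_action;
}
```

Note that action is assigned on every iteration here. In a switch whose
default branch only logs, the variable would need an initial value before
it is compared against the running maximum.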

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
v2: add newline and fix log
    add bounds check (Mallesh)
    add ras specific enum (Raag)
    helper for sysctrl prepare command
    process all errors before deciding recovery action
---
 drivers/gpu/drm/xe/xe_ras.c       | 139 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ras.h       |   3 +
 drivers/gpu/drm/xe/xe_ras_types.h |  16 ++++
 3 files changed, 158 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 3bef589082d7..61c01a4bfadb 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -4,7 +4,14 @@
  */
 
 #include "xe_device_types.h"
+#include "xe_printk.h"
 #include "xe_ras.h"
+#include "xe_ras_types.h"
+#include "xe_sysctrl_mailbox.h"
+#include "xe_sysctrl_mailbox_types.h"
+
+#define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
+#define GLOBAL_UNCORR_ERROR			2
 
 /* Severity classification of detected errors */
 enum xe_ras_severity {
@@ -62,6 +69,138 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
 	return xe_ras_components[comp];
 }
 
+static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
+{
+	struct xe_ras_error_common common_info = error_class->common;
+	struct xe_ras_error_product product_info = error_class->product;
+	u8 tile = product_info.unit.tile;
+	u32 instance = product_info.unit.instance;
+	u32 cause = product_info.error_cause.cause;
+
+	xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x\n",
+	       tile, instance, severity_to_str(xe, common_info.severity),
+	       comp_to_str(xe, common_info.component), cause);
+}
+
+static enum xe_ras_recovery_action handle_compute_errors(struct xe_device *xe,
+							 struct xe_ras_error_array *arr)
+{
+	struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
+	u8 uncorr_type;
+
+	uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
+	log_ras_error(xe, &arr->error_class);
+
+	xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
+	       arr->timestamp, uncorr_type);
+
+	/* Request a RESET if error is global */
+	if (uncorr_type == GLOBAL_UNCORR_ERROR)
+		return XE_RAS_RECOVERY_ACTION_RESET;
+
+	/* Local errors are recovered using an engine reset */
+	return XE_RAS_RECOVERY_ACTION_RECOVERED;
+}
+
+static void xe_ras_prepare_sysctrl_command(struct xe_sysctrl_mailbox_command *command,
+					   u32 cmd_mask, void *request, size_t request_len,
+					   void *response, size_t response_len)
+{
+	struct xe_sysctrl_mailbox_app_msg_hdr hdr = {0};
+	u32 req_hdr;
+
+	req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
+		  FIELD_PREP(APP_HDR_COMMAND_MASK, cmd_mask);
+
+	hdr.data = req_hdr;
+	command->header = hdr;
+	command->data_in = request;
+	command->data_in_len = request_len;
+	command->data_out = response;
+	command->data_out_len = response_len;
+}
+
+/**
+ * xe_ras_process_errors - Process and contain hardware errors
+ * @xe: xe device instance
+ *
+ * Get error details from system controller and return recovery
+ * method. Called only from PCI error handling.
+ *
+ * Returns: recovery action to be taken
+ */
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
+{
+	struct xe_sysctrl_mailbox_command command = {0};
+	struct xe_ras_get_error_response response;
+	enum xe_ras_recovery_action final_action;
+	size_t rlen;
+	int ret;
+
+	/* Default action */
+	final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
+
+	if (!xe->info.has_sysctrl)
+		return XE_RAS_RECOVERY_ACTION_RESET;
+
+	xe_ras_prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
+				       &response, sizeof(response));
+
+	do {
+		memset(&response, 0, sizeof(response));
+		rlen = 0;
+
+		ret = xe_sysctrl_send_command(xe, &command, &rlen);
+		if (ret || !rlen) {
+			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
+			goto err;
+		}
+
+		if (rlen != sizeof(response)) {
+			xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
+			goto err;
+		}
+
+		if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {
+			xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
+			       XE_RAS_NUM_ERROR_ARR);
+			goto err;
+		}
+
+		for (int i = 0; i < response.num_errors; i++) {
+			struct xe_ras_error_array arr = response.error_arr[i];
+			enum xe_ras_recovery_action action;
+			struct xe_ras_error_class error_class;
+			u8 component;
+
+			error_class = arr.error_class;
+			component = error_class.common.component;
+
+			switch (component) {
+			case XE_RAS_COMPONENT_CORE_COMPUTE:
+				action = handle_compute_errors(xe, &arr);
+				break;
+			default:
+				xe_err(xe, "[RAS]: Unknown error component %u\n", component);
+				break;
+			}
+
+			/*
+			 * Retain the highest severity action. Process and log all errors
+			 * and then take appropriate recovery action
+			 */
+			if (action > final_action)
+				final_action = action;
+		}
+
+	} while (response.additional_errors);
+
+	return final_action;
+
+err:
+	return XE_RAS_RECOVERY_ACTION_RESET;
+}
+
 #ifdef CONFIG_PCIEAER
 static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
 {
diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
index 14cb973603e7..e191ab80080c 100644
--- a/drivers/gpu/drm/xe/xe_ras.h
+++ b/drivers/gpu/drm/xe/xe_ras.h
@@ -6,8 +6,11 @@
 #ifndef _XE_RAS_H_
 #define _XE_RAS_H_
 
+#include "xe_ras_types.h"
+
 struct xe_device;
 
 void xe_ras_init(struct xe_device *xe);
+enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
index 676755732ef6..221d07efd84c 100644
--- a/drivers/gpu/drm/xe/xe_ras_types.h
+++ b/drivers/gpu/drm/xe/xe_ras_types.h
@@ -11,6 +11,22 @@
 #define XE_RAS_NUM_ERROR_ARR		3
 #define XE_RAS_MAX_ERROR_DETAILS	16
 
+/**
+ * enum xe_ras_recovery_action - RAS recovery actions
+ *
+ * @XE_RAS_RECOVERY_ACTION_RECOVERED: Error recovered
+ * @XE_RAS_RECOVERY_ACTION_RESET: Requires reset
+ * @XE_RAS_RECOVERY_ACTION_DISCONNECT: Requires disconnect
+ *
+ * This enum defines the possible recovery actions that can be taken in response
+ * to RAS errors.
+ */
+enum xe_ras_recovery_action {
+	XE_RAS_RECOVERY_ACTION_RECOVERED = 0,
+	XE_RAS_RECOVERY_ACTION_RESET,
+	XE_RAS_RECOVERY_ACTION_DISCONNECT
+};
+
 /**
  * struct xe_ras_error_common - Common RAS error class
  *
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-03-02 10:22 ` [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
@ 2026-03-04 16:52   ` Raag Jadav
  2026-03-06 18:37     ` Raag Jadav
  2026-03-31 16:24     ` Tauro, Riana
  2026-03-06  3:50   ` [v2,08/11] " Purkait, Soham
  1 sibling, 2 replies; 43+ messages in thread
From: Raag Jadav @ 2026-03-04 16:52 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Mon, Mar 02, 2026 at 03:52:03PM +0530, Riana Tauro wrote:
> Uncorrectable Core-Compute errors are classified into Global and Local
> errors.
> 
> A Global error is an error that affects the entire device and requires a
> reset. This type of error is not isolated. When an AER is reported and
> error_detected is invoked, return PCI_ERS_RESULT_NEED_RESET.
> 
> A Local error is confined to a specific component or context like an
> engine. These errors can be contained and recovered by resetting
> only the affected part without disrupting the rest of the device.
> 
> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
> generated and GuC is notified of the error. The KMD then sets
> the context as non-runnable and initiates an engine reset.
> (TODO: GuC <->KMD communication for the error).

TODOs are more useful in the code, so we can actually find them ;)

> Since the error is contained and recovered, PCI error handling
> callback returns PCI_ERS_RESULT_RECOVERED.

...

> +enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
> +{
> +	struct xe_sysctrl_mailbox_command command = {0};
> +	struct xe_ras_get_error_response response;
> +	enum xe_ras_recovery_action final_action;
> +	size_t rlen;
> +	int ret;
> +
> +	/* Default action */
> +	final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
> +
> +	if (!xe->info.has_sysctrl)
> +		return XE_RAS_RECOVERY_ACTION_RESET;
> +
> +	xe_ras_prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
> +				       &response, sizeof(response));
> +
> +	do {
> +		memset(&response, 0, sizeof(response));
> +		rlen = 0;
> +
> +		ret = xe_sysctrl_send_command(xe, &command, &rlen);
> +		if (ret || !rlen) {

We'd probably want them to be separate cases so we know what actually
happened. Besides, I think !rlen is redundant here since you're already
handling it below.

> +			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
> +			goto err;
> +		}
> +
> +		if (rlen != sizeof(response)) {
> +			xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");

I'd print rlen as well.
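
A rough sketch of how the two checks might be split, with the length
included in the message. This is a stand-alone model: fake_status and
fake_check_response() are hypothetical names, and fprintf() stands in for
xe_err():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum fake_status {
	FAKE_OK = 0,
	FAKE_SEND_FAILED,
	FAKE_BAD_LENGTH
};

/* Validate one mailbox reply; each failure gets its own message. */
enum fake_status fake_check_response(int ret, size_t rlen, size_t expected)
{
	assert(expected > 0);

	if (ret) {
		fprintf(stderr, "[RAS]: Sysctrl command failed (%d)\n", ret);
		return FAKE_SEND_FAILED;
	}

	/* A zero-length reply is just one more length mismatch. */
	if (rlen != expected) {
		fprintf(stderr, "[RAS]: Sysctrl response length %zu, expected %zu\n",
			rlen, expected);
		return FAKE_BAD_LENGTH;
	}

	return FAKE_OK;
}
```

With this split a separate !rlen test is not needed, since rlen == 0 can
never equal a non-zero expected size.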

> +			goto err;
> +		}
> +
> +		if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {

I'd handle this as part of the for loop below so we at least have the chance
to recover based on initial errors.

> +			xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
> +			       XE_RAS_NUM_ERROR_ARR);
> +			goto err;
> +		}
> +
> +		for (int i = 0; i < response.num_errors; i++) {

		for (int i = 0; i < response.num_errors && i < XE_RAS_NUM_ERROR_ARR; i++)
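
As a stand-alone illustration of the clamped loop: fake_response and
fake_process() are hypothetical, and the per-entry "handling" is reduced
to summing integers so the effect of the clamp is easy to see:

```c
#include <assert.h>

#define FAKE_NUM_ERROR_ARR 3

/* Hypothetical reply: a count from firmware plus a fixed-size array. */
struct fake_response {
	unsigned char num_errors;
	int error_arr[FAKE_NUM_ERROR_ARR];
};

/* Returns the number of entries handled; never reads past the array. */
int fake_process(const struct fake_response *resp, int *sum)
{
	int handled = 0;

	assert(resp && sum);

	for (int i = 0; i < resp->num_errors && i < FAKE_NUM_ERROR_ARR; i++) {
		*sum += resp->error_arr[i];
		handled++;
	}

	return handled;
}
```

A caller may still want to log, or escalate the recovery action, when
num_errors exceeds the array size, since that indicates an out-of-spec
reply even though the valid entries were processed.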

> +			struct xe_ras_error_array arr = response.error_arr[i];
> +			enum xe_ras_recovery_action action;
> +			struct xe_ras_error_class error_class;
> +			u8 component;
> +
> +			error_class = arr.error_class;
> +			component = error_class.common.component;
> +
> +			switch (component) {
> +			case XE_RAS_COMPONENT_CORE_COMPUTE:
> +				action = handle_compute_errors(xe, &arr);
> +				break;
> +			default:
> +				xe_err(xe, "[RAS]: Unknown error component %u\n", component);
> +				break;
> +			}
> +
> +			/*
> +			 * Retain the highest severity action. Process and log all errors
> +			 * and then take appropriate recovery action

Punctuations.

> +			 */
> +			if (action > final_action)
> +				final_action = action;
> +		}
> +
> +	} while (response.additional_errors);

I know we're not NASA but I'd try to have some timeout instead of blindly
trusting the hardware.

Raag

> +	return final_action;
> +
> +err:
> +	return XE_RAS_RECOVERY_ACTION_RESET;
> +}


* Re: [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-03-04 16:52   ` Raag Jadav
@ 2026-03-06 18:37     ` Raag Jadav
  2026-03-31 16:24     ` Tauro, Riana
  1 sibling, 0 replies; 43+ messages in thread
From: Raag Jadav @ 2026-03-06 18:37 UTC (permalink / raw)
  To: Riana Tauro
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Wed, Mar 04, 2026 at 05:52:38PM +0100, Raag Jadav wrote:
> On Mon, Mar 02, 2026 at 03:52:03PM +0530, Riana Tauro wrote:
> > Uncorrectable Core-Compute errors are classified into Global and Local
> > errors.
> > 
> > Global error is an error that affects the entire device requiring a
> > reset. This type of error is not isolated. When an AER is reported and
> > error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
> > 
> > A Local error is confined to a specific component or context like an
> > engine. These errors can be contained and recovered by resetting
> > only the affected part without disrupting the rest of the device.
> > 
> > Upon detection of an Uncorrectable Local Core-Compute error, an AER is
> > generated and GuC is notified of the error. The KMD then sets
> > the context as non-runnable and initiates an engine reset.
> > (TODO: GuC <->KMD communication for the error).

...

> > +	} while (response.additional_errors);
> 
> I know we're not NASA but I'd try to have some timeout instead of blindly
> trusting the hardware.

Or just break on something like MAX_ADDITIONAL_ERRORS if (by any luck)
it's mentioned in spec.
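
A rough userspace sketch of such a cap is below. The MAX_ADDITIONAL_ERRORS
value of 16 is a placeholder and not a number from the spec; struct fake_fw,
fake_fw_query() and drain_errors() are hypothetical names standing in for the
sysctrl round trip.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ADDITIONAL_ERRORS 16	/* placeholder, not taken from any spec */

/* Hypothetical fake firmware: claims "more errors" `pending` times. */
struct fake_fw {
	unsigned int pending;
};

/* Stand-in for one mailbox query; returns the additional_errors flag. */
static bool fake_fw_query(struct fake_fw *fw)
{
	if (!fw->pending)
		return false;

	fw->pending--;
	return true;
}

/*
 * Drain the error queue, giving up after MAX_ADDITIONAL_ERRORS round trips.
 * Returns the number of queries issued, or -1 if the cap was hit while the
 * firmware still claimed more errors.
 */
static int drain_errors(struct fake_fw *fw)
{
	unsigned int count = 0;
	bool additional_errors;

	assert(fw);

	do {
		additional_errors = fake_fw_query(fw);
		count++;
	} while (additional_errors && count < MAX_ADDITIONAL_ERRORS);

	return additional_errors ? -1 : (int)count;
}
```

On hitting the cap the caller can fall back to the reset path, same as the
other error cases.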

Raag


* Re: [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-03-04 16:52   ` Raag Jadav
  2026-03-06 18:37     ` Raag Jadav
@ 2026-03-31 16:24     ` Tauro, Riana
  2026-04-01  6:34       ` Raag Jadav
  1 sibling, 1 reply; 43+ messages in thread
From: Tauro, Riana @ 2026-03-31 16:24 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi


On 3/4/2026 10:22 PM, Raag Jadav wrote:
> On Mon, Mar 02, 2026 at 03:52:03PM +0530, Riana Tauro wrote:
>> Uncorrectable Core-Compute errors are classified into Global and Local
>> errors.
>>
>> Global error is an error that affects the entire device requiring a
>> reset. This type of error is not isolated. When an AER is reported and
>> error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
>>
>> A Local error is confined to a specific component or context like an
>> engine. These errors can be contained and recovered by resetting
>> only the affected part without disrupting the rest of the device.
>>
>> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
>> generated and GuC is notified of the error. The KMD then sets
>> the context as non-runnable and initiates an engine reset.
>> (TODO: GuC <->KMD communication for the error).
> TODOs are more useful in the code, so we can actually find them ;)


Yeah, but the code will not be in xe_ras. I will add a comment on why we
return recovered for a local uncorrectable GT error.

>
>> Since the error is contained and recovered, PCI error handling
>> callback returns PCI_ERS_RESULT_RECOVERED.
> ...
>
>> +enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
>> +{
>> +	struct xe_sysctrl_mailbox_command command = {0};
>> +	struct xe_ras_get_error_response response;
>> +	enum xe_ras_recovery_action final_action;
>> +	size_t rlen;
>> +	int ret;
>> +
>> +	/* Default action */
>> +	final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
>> +
>> +	if (!xe->info.has_sysctrl)
>> +		return XE_RAS_RECOVERY_ACTION_RESET;
>> +
>> +	xe_ras_prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
>> +				       &response, sizeof(response));
>> +
>> +	do {
>> +		memset(&response, 0, sizeof(response));
>> +		rlen = 0;
>> +
>> +		ret = xe_sysctrl_send_command(xe, &command, &rlen);
>> +		if (ret || !rlen) {
> We'd probably want them to be separate cases so we know what actually
> happened. Besides, I think !rlen is redundant here since you're already
> handling it below.
Sure, will fix this.
>
>> +			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
>> +			goto err;
>> +		}
>> +
>> +		if (rlen != sizeof(response)) {
>> +			xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
> I'd print rlen as well.
>
>> +			goto err;
>> +		}
>> +
>> +		if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {
> I'd handle this as part of the for loop below so we at least have the chance
> to recover based on the initial errors.
Yeah, makes sense. Will add it as part of the loop.
>
>> +			xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
>> +			       XE_RAS_NUM_ERROR_ARR);
>> +			goto err;
>> +		}
>> +
>> +		for (int i = 0; i < response.num_errors; i++) {
> 		for (int i = 0; i < response.num_errors && i < XE_RAS_NUM_ERROR_ARR; i++)
>
>> +			struct xe_ras_error_array arr = response.error_arr[i];
>> +			enum xe_ras_recovery_action action;
>> +			struct xe_ras_error_class error_class;
>> +			u8 component;
>> +
>> +			error_class = arr.error_class;
>> +			component = error_class.common.component;
>> +
>> +			switch (component) {
>> +			case XE_RAS_COMPONENT_CORE_COMPUTE:
>> +				action = handle_compute_errors(xe, &arr);
>> +				break;
>> +			default:
>> +				xe_err(xe, "[RAS]: Unknown error component %u\n", component);
>> +				break;
>> +			}
>> +
>> +			/*
>> +			 * Retain the highest severity action. Process and log all errors
>> +			 * and then take appropriate recovery action
> Punctuations.
>
>> +			 */
>> +			if (action > final_action)
>> +				final_action = action;
>> +		}
>> +
>> +	} while (response.additional_errors);
> I know we're not NASA but I'd try to have some timeout instead of blindly
> trusting the hardware.

additional_errors is an indication, so it will be 0 or 1.
A timeout is already present in sysctrl_send_command, so another one is
unnecessary here.

Also, before every sysctrl command we are setting response to 0.

Thanks
Riana


>
> Raag
>
>> +	return final_action;
>> +
>> +err:
>> +	return XE_RAS_RECOVERY_ACTION_RESET;
>> +}


* Re: [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-03-31 16:24     ` Tauro, Riana
@ 2026-04-01  6:34       ` Raag Jadav
  2026-04-01  6:47         ` Tauro, Riana
  0 siblings, 1 reply; 43+ messages in thread
From: Raag Jadav @ 2026-04-01  6:34 UTC (permalink / raw)
  To: Tauro, Riana
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi

On Tue, Mar 31, 2026 at 09:54:32PM +0530, Tauro, Riana wrote:
> On 3/4/2026 10:22 PM, Raag Jadav wrote:
> > On Mon, Mar 02, 2026 at 03:52:03PM +0530, Riana Tauro wrote:
> > > Uncorrectable Core-Compute errors are classified into Global and Local
> > > errors.

...

> > > +	} while (response.additional_errors);
> > I know we're not NASA but I'd try to have some timeout instead of blindly
> > trusting the hardware.
> 
> additional_errors is an indication, so it will be 0 or 1.
> A timeout is already present in sysctrl_send_command, so another one is
> unnecessary here.
> 
> Also, before every sysctrl command we are setting response to 0.

If the firmware returns the same values repeatedly, we'll be stuck here
indefinitely. I know it's an extreme corner case and probably a firmware
bug, but that's not an excuse for the driver to not handle it.

I've handled it[1] as flooding, not ideal but convenient for now.

[1] https://lore.kernel.org/intel-xe/20260331102346.1034100-3-raag.jadav@intel.com/

Raag


* Re: [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-04-01  6:34       ` Raag Jadav
@ 2026-04-01  6:47         ` Tauro, Riana
  0 siblings, 0 replies; 43+ messages in thread
From: Tauro, Riana @ 2026-04-01  6:47 UTC (permalink / raw)
  To: Raag Jadav
  Cc: intel-xe, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, ravi.kishore.koppuravuri, mallesh.koujalagi


On 4/1/2026 12:04 PM, Raag Jadav wrote:
> On Tue, Mar 31, 2026 at 09:54:32PM +0530, Tauro, Riana wrote:
>> On 3/4/2026 10:22 PM, Raag Jadav wrote:
>>> On Mon, Mar 02, 2026 at 03:52:03PM +0530, Riana Tauro wrote:
>>>> Uncorrectable Core-Compute errors are classified into Global and Local
>>>> errors.
> ...
>
>>>> +	} while (response.additional_errors);
>>> I know we're not NASA but I'd try to have some timeout instead of blindly
>>> trusting the hardware.
>> additional_errors is an indication, so it will be 0 or 1.
>> A timeout is already present in sysctrl_send_command, so another one is
>> unnecessary here.
>>
>> Also, before every sysctrl command we are setting response to 0.
> If the firmware returns the same values repeatedly, we'll be stuck here
> indefinitely. I know it's an extreme corner case and probably a firmware
> bug, but that's not an excuse for the driver to not handle it.
>
> I've handled it[1] as flooding, not ideal but convenient for now.

In your case you do have a max count of 16, but for this command there
is no such count.
Let me check with fw and add a check.

Thanks
Riana

>
> [1] https://lore.kernel.org/intel-xe/20260331102346.1034100-3-raag.jadav@intel.com/
>
> Raag


* Re: [v2,08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-03-02 10:22 ` [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
  2026-03-04 16:52   ` Raag Jadav
@ 2026-03-06  3:50   ` Purkait, Soham
  2026-03-31 16:16     ` Tauro, Riana
  1 sibling, 1 reply; 43+ messages in thread
From: Purkait, Soham @ 2026-03-06  3:50 UTC (permalink / raw)
  To: Riana Tauro, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, mallesh.koujalagi

Hi Riana,

On 02-03-2026 15:52, Riana Tauro wrote:
> Uncorrectable Core-Compute errors are classified into Global and Local
> errors.
>
> Global error is an error that affects the entire device requiring a
> reset. This type of error is not isolated. When an AER is reported and
> error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
>
> A Local error is confined to a specific component or context like an
> engine. These errors can be contained and recovered by resetting
> only the affected part without disrupting the rest of the device.
>
> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
> generated and GuC is notified of the error. The KMD then sets
> the context as non-runnable and initiates an engine reset.
> (TODO: GuC <->KMD communication for the error).
> Since the error is contained and recovered, PCI error handling
> callback returns PCI_ERS_RESULT_RECOVERED.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
> v2: add newline and fix log
>      add bounds check (Mallesh)
>      add ras specific enum (Raag)
>      helper for sysctrl prepare command
>      process all errors before deciding recovery action
> ---
>   drivers/gpu/drm/xe/xe_ras.c       | 139 ++++++++++++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_ras.h       |   3 +
>   drivers/gpu/drm/xe/xe_ras_types.h |  16 ++++
>   3 files changed, 158 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 3bef589082d7..61c01a4bfadb 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -4,7 +4,14 @@
>    */
>   
>   #include "xe_device_types.h"
> +#include "xe_printk.h"
>   #include "xe_ras.h"
> +#include "xe_ras_types.h"
> +#include "xe_sysctrl_mailbox.h"
> +#include "xe_sysctrl_mailbox_types.h"
> +
> +#define COMPUTE_ERROR_SEVERITY_MASK		GENMASK(26, 25)
> +#define GLOBAL_UNCORR_ERROR			2
>   
>   /* Severity classification of detected errors */
>   enum xe_ras_severity {
> @@ -62,6 +69,138 @@ static inline const char *comp_to_str(struct xe_device *xe, u32 comp)
>   	return xe_ras_components[comp];
>   }
>   
> +static void log_ras_error(struct xe_device *xe, struct xe_ras_error_class *error_class)
> +{
> +	struct xe_ras_error_common common_info = error_class->common;
> +	struct xe_ras_error_product product_info = error_class->product;
> +	u8 tile = product_info.unit.tile;
> +	u32 instance = product_info.unit.instance;
> +	u32 cause = product_info.error_cause.cause;
> +
> +	xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected Cause: 0x%x\n",
> +	       tile, instance, severity_to_str(xe, common_info.severity),
> +	       comp_to_str(xe, common_info.component), cause);
> +}
> +
> +static enum xe_ras_recovery_action handle_compute_errors(struct xe_device *xe,
> +							 struct xe_ras_error_array *arr)
> +{
> +	struct xe_ras_compute_error *error_info = (struct xe_ras_compute_error *)arr->error_details;
> +	u8 uncorr_type;
> +
> +	uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, error_info->error_log_header);
> +	log_ras_error(xe, &arr->error_class);
> +
> +	xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu Uncorrected error type %u\n",
> +	       arr->timestamp, uncorr_type);
> +
> +	/* Request a RESET if error is global */
> +	if (uncorr_type == GLOBAL_UNCORR_ERROR)
> +		return XE_RAS_RECOVERY_ACTION_RESET;
> +
> +	/* Local errors are recovered using an engine reset */
> +	return XE_RAS_RECOVERY_ACTION_RECOVERED;
> +}
> +
> +static void xe_ras_prepare_sysctrl_command(struct xe_sysctrl_mailbox_command *command,

You can drop the prefix for static functions.

Thanks,
Soham

> +					   u32 cmd_mask, void *request, size_t request_len,
> +					   void *response, size_t response_len)
> +{
> +	struct xe_sysctrl_mailbox_app_msg_hdr hdr = {0};
> +	u32 req_hdr;
> +
> +	req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, XE_SYSCTRL_GROUP_GFSP) |
> +		  FIELD_PREP(APP_HDR_COMMAND_MASK, cmd_mask);
> +
> +	hdr.data = req_hdr;
> +	command->header = hdr;
> +	command->data_in = request;
> +	command->data_in_len = request_len;
> +	command->data_out = response;
> +	command->data_out_len = response_len;
> +}
> +
> +/**
> + * xe_ras_process_errors - Process and contain hardware errors
> + * @xe: xe device instance
> + *
> + * Get error details from system controller and return recovery
> + * method. Called only from PCI error handling.
> + *
> + * Returns: recovery action to be taken
> + */
> +enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
> +{
> +	struct xe_sysctrl_mailbox_command command = {0};
> +	struct xe_ras_get_error_response response;
> +	enum xe_ras_recovery_action final_action;
> +	size_t rlen;
> +	int ret;
> +
> +	/* Default action */
> +	final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
> +
> +	if (!xe->info.has_sysctrl)
> +		return XE_RAS_RECOVERY_ACTION_RESET;
> +
> +	xe_ras_prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
> +				       &response, sizeof(response));
> +
> +	do {
> +		memset(&response, 0, sizeof(response));
> +		rlen = 0;
> +
> +		ret = xe_sysctrl_send_command(xe, &command, &rlen);
> +		if (ret || !rlen) {
> +			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
> +			goto err;
> +		}
> +
> +		if (rlen != sizeof(response)) {
> +			xe_err(xe, "[RAS]: Sysctrl response does not match len!!\n");
> +			goto err;
> +		}
> +
> +		if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {
> +			xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
> +			       XE_RAS_NUM_ERROR_ARR);
> +			goto err;
> +		}
> +
> +		for (int i = 0; i < response.num_errors; i++) {
> +			struct xe_ras_error_array arr = response.error_arr[i];
> +			enum xe_ras_recovery_action action;
> +			struct xe_ras_error_class error_class;
> +			u8 component;
> +
> +			error_class = arr.error_class;
> +			component = error_class.common.component;
> +
> +			switch (component) {
> +			case XE_RAS_COMPONENT_CORE_COMPUTE:
> +				action = handle_compute_errors(xe, &arr);
> +				break;
> +			default:
> +				xe_err(xe, "[RAS]: Unknown error component %u\n", component);
> +				break;
> +			}
> +
> +			/*
> +			 * Retain the highest severity action. Process and log all errors
> +			 * and then take appropriate recovery action
> +			 */
> +			if (action > final_action)
> +				final_action = action;
> +		}
> +
> +	} while (response.additional_errors);
> +
> +	return final_action;
> +
> +err:
> +	return XE_RAS_RECOVERY_ACTION_RESET;
> +}
> +
>   #ifdef CONFIG_PCIEAER
>   static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
>   {
> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
> index 14cb973603e7..e191ab80080c 100644
> --- a/drivers/gpu/drm/xe/xe_ras.h
> +++ b/drivers/gpu/drm/xe/xe_ras.h
> @@ -6,8 +6,11 @@
>   #ifndef _XE_RAS_H_
>   #define _XE_RAS_H_
>   
> +#include "xe_ras_types.h"
> +
>   struct xe_device;
>   
>   void xe_ras_init(struct xe_device *xe);
> +enum xe_ras_recovery_action  xe_ras_process_errors(struct xe_device *xe);
>   
>   #endif
> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
> index 676755732ef6..221d07efd84c 100644
> --- a/drivers/gpu/drm/xe/xe_ras_types.h
> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
> @@ -11,6 +11,22 @@
>   #define XE_RAS_NUM_ERROR_ARR		3
>   #define XE_RAS_MAX_ERROR_DETAILS	16
>   
> +/**
> + * enum xe_ras_recovery_action - RAS recovery actions
> + *
> + * @XE_RAS_RECOVERY_ACTION_RECOVERED: Error recovered
> + * @XE_RAS_RECOVERY_ACTION_RESET: Requires reset
> + * @XE_RAS_RECOVERY_ACTION_DISCONNECT: Requires disconnect
> + *
> + * This enum defines the possible recovery actions that can be taken in response
> + * to RAS errors.
> + */
> +enum xe_ras_recovery_action {
> +	XE_RAS_RECOVERY_ACTION_RECOVERED = 0,
> +	XE_RAS_RECOVERY_ACTION_RESET,
> +	XE_RAS_RECOVERY_ACTION_DISCONNECT
> +};
> +
>   /**
>    * struct xe_ras_error_common - Common RAS error class
>    *


* Re: [v2,08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
  2026-03-06  3:50   ` [v2,08/11] " Purkait, Soham
@ 2026-03-31 16:16     ` Tauro, Riana
  0 siblings, 0 replies; 43+ messages in thread
From: Tauro, Riana @ 2026-03-31 16:16 UTC (permalink / raw)
  To: Purkait, Soham, intel-xe
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, mallesh.koujalagi


On 3/6/2026 9:20 AM, Purkait, Soham wrote:
> Hi Riana,
>
> On 02-03-2026 15:52, Riana Tauro wrote:
>> Uncorrectable Core-Compute errors are classified into Global and Local
>> errors.
>>
>> Global error is an error that affects the entire device requiring a
>> reset. This type of error is not isolated. When an AER is reported and
>> error_detected is invoked return PCI_ERS_RESULT_NEED_RESET.
>>
>> A Local error is confined to a specific component or context like an
>> engine. These errors can be contained and recovered by resetting
>> only the affected part without disrupting the rest of the device.
>>
>> Upon detection of an Uncorrectable Local Core-Compute error, an AER is
>> generated and GuC is notified of the error. The KMD then sets
>> the context as non-runnable and initiates an engine reset.
>> (TODO: GuC <->KMD communication for the error).
>> Since the error is contained and recovered, PCI error handling
>> callback returns PCI_ERS_RESULT_RECOVERED.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>> v2: add newline and fix log
>>      add bounds check (Mallesh)
>>      add ras specific enum (Raag)
>>      helper for sysctrl prepare command
>>      process all errors before deciding recovery action
>> ---
>>   drivers/gpu/drm/xe/xe_ras.c       | 139 ++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_ras.h       |   3 +
>>   drivers/gpu/drm/xe/xe_ras_types.h |  16 ++++
>>   3 files changed, 158 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index 3bef589082d7..61c01a4bfadb 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -4,7 +4,14 @@
>>    */
>>     #include "xe_device_types.h"
>> +#include "xe_printk.h"
>>   #include "xe_ras.h"
>> +#include "xe_ras_types.h"
>> +#include "xe_sysctrl_mailbox.h"
>> +#include "xe_sysctrl_mailbox_types.h"
>> +
>> +#define COMPUTE_ERROR_SEVERITY_MASK        GENMASK(26, 25)
>> +#define GLOBAL_UNCORR_ERROR            2
>>     /* Severity classification of detected errors */
>>   enum xe_ras_severity {
>> @@ -62,6 +69,138 @@ static inline const char *comp_to_str(struct 
>> xe_device *xe, u32 comp)
>>       return xe_ras_components[comp];
>>   }
>>   +static void log_ras_error(struct xe_device *xe, struct 
>> xe_ras_error_class *error_class)
>> +{
>> +    struct xe_ras_error_common common_info = error_class->common;
>> +    struct xe_ras_error_product product_info = error_class->product;
>> +    u8 tile = product_info.unit.tile;
>> +    u32 instance = product_info.unit.instance;
>> +    u32 cause = product_info.error_cause.cause;
>> +
>> +    xe_err(xe, "[RAS]: Tile%u, Instance %u, %s %s Error detected 
>> Cause: 0x%x\n",
>> +           tile, instance, severity_to_str(xe, common_info.severity),
>> +           comp_to_str(xe, common_info.component), cause);
>> +}
>> +
>> +static enum xe_ras_recovery_action handle_compute_errors(struct 
>> xe_device *xe,
>> +                             struct xe_ras_error_array *arr)
>> +{
>> +    struct xe_ras_compute_error *error_info = (struct 
>> xe_ras_compute_error *)arr->error_details;
>> +    u8 uncorr_type;
>> +
>> +    uncorr_type = FIELD_GET(COMPUTE_ERROR_SEVERITY_MASK, 
>> error_info->error_log_header);
>> +    log_ras_error(xe, &arr->error_class);
>> +
>> +    xe_err(xe, "[RAS]: Core Compute Error: timestamp %llu 
>> Uncorrected error type %u\n",
>> +           arr->timestamp, uncorr_type);
>> +
>> +    /* Request a RESET if error is global */
>> +    if (uncorr_type == GLOBAL_UNCORR_ERROR)
>> +        return XE_RAS_RECOVERY_ACTION_RESET;
>> +
>> +    /* Local errors are recovered using an engine reset */
>> +    return XE_RAS_RECOVERY_ACTION_RECOVERED;
>> +}
>> +
>> +static void xe_ras_prepare_sysctrl_command(struct 
>> xe_sysctrl_mailbox_command *command,
>
> You can drop the prefix for static functions.

Sure, will fix this.

Thanks
Riana

>
> Thanks,
> Soham
>
>> +                       u32 cmd_mask, void *request, size_t request_len,
>> +                       void *response, size_t response_len)
>> +{
>> +    struct xe_sysctrl_mailbox_app_msg_hdr hdr = {0};
>> +    u32 req_hdr;
>> +
>> +    req_hdr = FIELD_PREP(APP_HDR_GROUP_ID_MASK, 
>> XE_SYSCTRL_GROUP_GFSP) |
>> +          FIELD_PREP(APP_HDR_COMMAND_MASK, cmd_mask);
>> +
>> +    hdr.data = req_hdr;
>> +    command->header = hdr;
>> +    command->data_in = request;
>> +    command->data_in_len = request_len;
>> +    command->data_out = response;
>> +    command->data_out_len = response_len;
>> +}
>> +
>> +/**
>> + * xe_ras_process_errors - Process and contain hardware errors
>> + * @xe: xe device instance
>> + *
>> + * Get error details from system controller and return recovery
>> + * method. Called only from PCI error handling.
>> + *
>> + * Returns: recovery action to be taken
>> + */
>> +enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
>> +{
>> +    struct xe_sysctrl_mailbox_command command = {0};
>> +    struct xe_ras_get_error_response response;
>> +    enum xe_ras_recovery_action final_action;
>> +    size_t rlen;
>> +    int ret;
>> +
>> +    /* Default action */
>> +    final_action = XE_RAS_RECOVERY_ACTION_RECOVERED;
>> +
>> +    if (!xe->info.has_sysctrl)
>> +        return XE_RAS_RECOVERY_ACTION_RESET;
>> +
>> +    xe_ras_prepare_sysctrl_command(&command, 
>> XE_SYSCTRL_CMD_GET_SOC_ERROR, NULL, 0,
>> +                       &response, sizeof(response));
>> +
>> +    do {
>> +        memset(&response, 0, sizeof(response));
>> +        rlen = 0;
>> +
>> +        ret = xe_sysctrl_send_command(xe, &command, &rlen);
>> +        if (ret || !rlen) {
>> +            xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
>> +            goto err;
>> +        }
>> +
>> +        if (rlen != sizeof(response)) {
>> +            xe_err(xe, "[RAS]: Sysctrl response does not match 
>> len!!\n");
>> +            goto err;
>> +        }
>> +
>> +        if (response.num_errors > XE_RAS_NUM_ERROR_ARR) {
>> +            xe_err(xe, "[RAS]: Number of errors out of bound (%d)\n",
>> +                   XE_RAS_NUM_ERROR_ARR);
>> +            goto err;
>> +        }
>> +
>> +        for (int i = 0; i < response.num_errors; i++) {
>> +            struct xe_ras_error_array arr = response.error_arr[i];
>> +            enum xe_ras_recovery_action action;
>> +            struct xe_ras_error_class error_class;
>> +            u8 component;
>> +
>> +            error_class = arr.error_class;
>> +            component = error_class.common.component;
>> +
>> +            switch (component) {
>> +            case XE_RAS_COMPONENT_CORE_COMPUTE:
>> +                action = handle_compute_errors(xe, &arr);
>> +                break;
>> +            default:
>> +                xe_err(xe, "[RAS]: Unknown error component %u\n", 
>> component);
>> +                break;
>> +            }
>> +
>> +            /*
>> +             * Retain the highest severity action. Process and log 
>> all errors
>> +             * and then take appropriate recovery action
>> +             */
>> +            if (action > final_action)
>> +                final_action = action;
>> +        }
>> +
>> +    } while (response.additional_errors);
>> +
>> +    return final_action;
>> +
>> +err:
>> +    return XE_RAS_RECOVERY_ACTION_RESET;
>> +}
>> +
>>   #ifdef CONFIG_PCIEAER
>>   static void aer_unmask_and_downgrade_internal_error(struct 
>> xe_device *xe)
>>   {
>> diff --git a/drivers/gpu/drm/xe/xe_ras.h b/drivers/gpu/drm/xe/xe_ras.h
>> index 14cb973603e7..e191ab80080c 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.h
>> +++ b/drivers/gpu/drm/xe/xe_ras.h
>> @@ -6,8 +6,11 @@
>>   #ifndef _XE_RAS_H_
>>   #define _XE_RAS_H_
>>   +#include "xe_ras_types.h"
>> +
>>   struct xe_device;
>>     void xe_ras_init(struct xe_device *xe);
>> +enum xe_ras_recovery_action  xe_ras_process_errors(struct xe_device 
>> *xe);
>>     #endif
>> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h 
>> b/drivers/gpu/drm/xe/xe_ras_types.h
>> index 676755732ef6..221d07efd84c 100644
>> --- a/drivers/gpu/drm/xe/xe_ras_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
>> @@ -11,6 +11,22 @@
>>   #define XE_RAS_NUM_ERROR_ARR        3
>>   #define XE_RAS_MAX_ERROR_DETAILS    16
>>   +/**
>> + * enum xe_ras_recovery_action - RAS recovery actions
>> + *
>> + * @XE_RAS_RECOVERY_ACTION_RECOVERED: Error recovered
>> + * @XE_RAS_RECOVERY_ACTION_RESET: Requires reset
>> + * @XE_RAS_RECOVERY_ACTION_DISCONNECT: Requires disconnect
>> + *
>> + * This enum defines the possible recovery actions that can be taken 
>> in response
>> + * to RAS errors.
>> + */
>> +enum xe_ras_recovery_action {
>> +    XE_RAS_RECOVERY_ACTION_RECOVERED = 0,
>> +    XE_RAS_RECOVERY_ACTION_RESET,
>> +    XE_RAS_RECOVERY_ACTION_DISCONNECT
>> +};
>> +
>>   /**
>>    * struct xe_ras_error_common - Common RAS error class
>>    *


* [PATCH v2 09/11] drm/xe/xe_ras: Add structures for SoC Internal errors
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (7 preceding siblings ...)
  2026-03-02 10:22 ` [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
@ 2026-03-02 10:22 ` Riana Tauro
  2026-03-10 13:02   ` Mallesh, Koujalagi
  2026-03-02 10:22 ` [PATCH v2 10/11] drm/xe/xe_ras: Handle Uncorrectable " Riana Tauro
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:22 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Add response structures for SoC Internal errors.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_ras_types.h | 134 ++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
index 221d07efd84c..466db9f47127 100644
--- a/drivers/gpu/drm/xe/xe_ras_types.h
+++ b/drivers/gpu/drm/xe/xe_ras_types.h
@@ -145,4 +145,138 @@ struct xe_ras_compute_error {
 	u32 spare_log3;
 } __packed;
 
+/**
+ * struct xe_ras_soc_error_source - Source of SOC error
+ */
+struct xe_ras_soc_error_source {
+	/** @csc: CSC error */
+	u32 csc:1;
+	/** @soc: SOC error */
+	u32 soc:1;
+	/** @reserved: Reserved for future use */
+	u32 reserved:30;
+} __packed;
+
+/**
+ * struct xe_ras_soc_error - SOC error details
+ */
+struct xe_ras_soc_error {
+	/** @error_source: Error Source */
+	struct xe_ras_soc_error_source error_source;
+	/** @additional_details: Additional details */
+	u32 additional_details[15];
+} __packed;
+
+/**
+ * struct xe_ras_csc_error - CSC error details
+ */
+struct xe_ras_csc_error {
+	/** @hec_uncorr_err_status: CSC error */
+	u32 hec_uncorr_err_status;
+	/** @hec_uncorr_fw_err_dw0: CSC f/w error */
+	u32 hec_uncorr_fw_err_dw0;
+} __packed;
+
+/**
+ * struct xe_ras_ieh_error - SoC IEH error details
+ */
+struct xe_ras_ieh_error {
+	/** @ieh_instance: IEH instance */
+	u32 ieh_instance:2;
+	/** @reserved: Reserved for future use */
+	u32 reserved:30;
+	union {
+		/** @global_error_status: Global error status */
+		u32 global_error_status;
+		/** @error_sources_ieh0: Error sources for IEH0 */
+		struct {
+			/** @psf0_psf1_npk: PSF0, PSF1, NPK */
+			u32 psf0_psf1_npk:1;
+			/** @punit: PUNIT */
+			u32 punit:1;
+			/** @reserved_2: Reserved */
+			u32 reserved_2:1;
+			/** @oobmsm: OOBMSM */
+			u32 oobmsm:1;
+			/** @i2c: I2C */
+			u32 i2c:1;
+			/** @pciess_gpma: PCIESS GPMA */
+			u32 pciess_gpma:1;
+			/** @lpioss_pma: LPIOSS PMA */
+			u32 lpioss_pma:1;
+			/** @fabss0_pma: FabSS0 PMA */
+			u32 fabss0_pma:1;
+			/** @fabss1_pma: FabSS1 PMA */
+			u32 fabss1_pma:1;
+			/** @reserved_9: Reserved */
+			u32 reserved_9:1;
+			/** @reserved_10: Reserved */
+			u32 reserved_10:1;
+			/** @reserved_11: Reserved */
+			u32 reserved_11:1;
+			/** @reserved_12: Reserved */
+			u32 reserved_12:1;
+			/** @reserved_13: Reserved */
+			u32 reserved_13:1;
+			/** @memss_ieh1: MEMSS -> IEH1 */
+			u32 memss_ieh1:1;
+			/** @memss_ieh2: MEMSS -> IEH2 */
+			u32 memss_ieh2:1;
+			/** @saf0_mhb0: SAF0 MHB0 */
+			u32 saf0_mhb0:1;
+			/** @saf0_mhb1: SAF0 MHB1 */
+			u32 saf0_mhb1:1;
+			 /** @saf0_mhb2: SAF0 MHB2 */
+			u32 saf0_mhb2:1;
+			/** @saf0_mhb3: SAF0 MHB3 */
+			u32 saf0_mhb3:1;
+			/** @saf0_mhb4: SAF0 MHB4 */
+			u32 saf0_mhb4:1;
+			/** @saf0_mhb5: SAF0 MHB5 */
+			u32 saf0_mhb5:1;
+			/** @saf0_mhb6: SAF0 MHB6 */
+			u32 saf0_mhb6:1;
+			/** @saf0_mhb7: SAF0 MHB7 */
+			u32 saf0_mhb7:1;
+			/** @saf1_mhb0: SAF1 MHB0 */
+			u32 saf1_mhb0:1;
+			/** @saf1_mhb1: SAF1 MHB1 */
+			u32 saf1_mhb1:1;
+			/** @saf1_mhb2: SAF1 MHB2 */
+			u32 saf1_mhb2:1;
+			/** @saf1_mhb3: SAF1 MHB3 */
+			u32 saf1_mhb3:1;
+			/** @saf1_mhb4: SAF1 MHB4 */
+			u32 saf1_mhb4:1;
+			/** @saf1_mhb5: SAF1 MHB5 */
+			u32 saf1_mhb5:1;
+			/** @saf1_mhb6: SAF1 MHB6 */
+			u32 saf1_mhb6:1;
+			/** @saf1_mhb7: SAF1 MHB7 */
+			u32 saf1_mhb7:1;
+		} error_sources_ieh0;
+	};
+
+	/** @lerr_status_ieh0: Local error status of IEH0 */
+	struct {
+		/** @reserved_0: Reserved for future use */
+		u32 reserved_0:1;
+		/** @psf0: PSF0 */
+		u32 psf0:1;
+		/** @psf1: PSF1 */
+		u32 psf1:1;
+		/** @reserved_26: Reserved */
+		u32 reserved_26:26;
+		/** @npk: NPK */
+		u32 npk:1;
+		/** @reserved_30: Reserved */
+		u32 reserved_30:2;
+	} lerr_status_ieh0;
+
+	/** @gerr_mask: Global error mask */
+	u32 gerr_mask;
+	/** @additional_info: Additional information */
+	u32 additional_info[10];
+} __packed;
+
 #endif
-- 
2.47.1



* Re: [PATCH v2 09/11] drm/xe/xe_ras: Add structures for SoC Internal errors
  2026-03-02 10:22 ` [PATCH v2 09/11] drm/xe/xe_ras: Add structures for SoC Internal errors Riana Tauro
@ 2026-03-10 13:02   ` Mallesh, Koujalagi
  2026-03-11 14:51     ` Riana Tauro
  0 siblings, 1 reply; 43+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-10 13:02 UTC (permalink / raw)
  To: Riana Tauro
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, intel-xe


On 02-03-2026 03:52 pm, Riana Tauro wrote:
> Add response structures for SoC Internal errors.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ras_types.h | 134 ++++++++++++++++++++++++++++++
>   1 file changed, 134 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
> index 221d07efd84c..466db9f47127 100644
> --- a/drivers/gpu/drm/xe/xe_ras_types.h
> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
> @@ -145,4 +145,138 @@ struct xe_ras_compute_error {
>   	u32 spare_log3;
>   } __packed;
>   
Please add a glossary of the hardware components to make this easier to understand.
> +/**
> + * struct xe_ras_soc_error_source - Source of SOC error
> + */
> +struct xe_ras_soc_error_source {
> +	/** @csc: CSC error */
> +	u32 csc:1;
> +	/** @soc: SOC error */
> +	u32 soc:1;
> +	/** @reserved: Reserved for future use */
> +	u32 reserved:30;
> +} __packed;
> +
> +/**
> + * struct xe_ras_soc_error - SOC error details
> + */
> +struct xe_ras_soc_error {
> +	/** @error_source: Error Source */
> +	struct xe_ras_soc_error_source error_source;
> +	/** @additional_details: Additional details */
> +	u32 additional_details[15];
Use a macro for the array size.
> +} __packed;
> +
> +/**
> + * struct xe_ras_csc_error - CSC error details
> + */
> +struct xe_ras_csc_error {
> +	/** @hec_uncorr_err_status: CSC error */
> +	u32 hec_uncorr_err_status;
> +	/** @hec_uncorr_fw_err_dw0: CSC f/w error */
> +	u32 hec_uncorr_fw_err_dw0;
> +} __packed;
> +
> +/**
> + * struct xe_ras_ieh_error - SoC IEH error details
> + */
> +struct xe_ras_ieh_error {
> +	/** @ieh_instance: IEH instance */
> +	u32 ieh_instance:2;
> +	/** @reserved: Reserved for future use */
> +	u32 reserved:30;
> +	union {
> +		/** @global_error_status: Global error status */
> +		u32 global_error_status;
> +		/** @error_sources_ieh0: Error sources for IEH0 */
> +		struct {
> +			/** @psf0_psf1_npk: PSF0, PSF1, NPK */
> +			u32 psf0_psf1_npk:1;
> +			/** @punit: PUNIT */
> +			u32 punit:1;
> +			/** @reserved_2: Reserved */
> +			u32 reserved_2:1;
> +			/** @oobmsm: OOBMSM */
> +			u32 oobmsm:1;
> +			/** @i2c: I2C */
> +			u32 i2c:1;
> +			/** @pciess_gpma: PCIESS GPMA */
> +			u32 pciess_gpma:1;
> +			/** @lpioss_pma: LPIOSS PMA */
> +			u32 lpioss_pma:1;
> +			/** @fabss0_pma: FabSS0 PMA */
> +			u32 fabss0_pma:1;
> +			/** @fabss1_pma: FabSS1 PMA */
> +			u32 fabss1_pma:1;
> +			/** @reserved_9: Reserved */
> +			u32 reserved_9:1;
> +			/** @reserved_10: Reserved */
> +			u32 reserved_10:1;
> +			/** @reserved_11: Reserved */
> +			u32 reserved_11:1;
> +			/** @reserved_12: Reserved */
> +			u32 reserved_12:1;
> +			/** @reserved_13: Reserved */
> +			u32 reserved_13:1;
> +			/** @memss_ieh1: MEMSS -> IEH1 */
> +			u32 memss_ieh1:1;
> +			/** @memss_ieh2: MEMSS -> IEH2 */
> +			u32 memss_ieh2:1;
> +			/** @saf0_mhb0: SAF0 MHB0 */
> +			u32 saf0_mhb0:1;
> +			/** @saf0_mhb1: SAF0 MHB1 */
> +			u32 saf0_mhb1:1;
> +			 /** @saf0_mhb2: SAF0 MHB2 */
Please remove the extra leading space.
> +			u32 saf0_mhb2:1;
> +			/** @saf0_mhb3: SAF0 MHB3 */
> +			u32 saf0_mhb3:1;
> +			/** @saf0_mhb4: SAF0 MHB4 */
> +			u32 saf0_mhb4:1;
> +			/** @saf0_mhb5: SAF0 MHB5 */
> +			u32 saf0_mhb5:1;
> +			/** @saf0_mhb6: SAF0 MHB6 */
> +			u32 saf0_mhb6:1;
> +			/** @saf0_mhb7: SAF0 MHB7 */
> +			u32 saf0_mhb7:1;
> +			/** @saf1_mhb0: SAF1 MHB0 */
> +			u32 saf1_mhb0:1;
> +			/** @saf1_mhb1: SAF1 MHB1 */
> +			u32 saf1_mhb1:1;
> +			/** @saf1_mhb2: SAF1 MHB2 */
> +			u32 saf1_mhb2:1;
> +			/** @saf1_mhb3: SAF1 MHB3 */
> +			u32 saf1_mhb3:1;
> +			/** @saf1_mhb4: SAF1 MHB4 */
> +			u32 saf1_mhb4:1;
> +			/** @saf1_mhb5: SAF1 MHB5 */
> +			u32 saf1_mhb5:1;
> +			/** @saf1_mhb6: SAF1 MHB6 */
> +			u32 saf1_mhb6:1;
> +			/** @saf1_mhb7: SAF1 MHB7 */
> +			u32 saf1_mhb7:1;
> +		} error_sources_ieh0;
> +	};
> +
> +	/** @lerr_status_ieh0: Local error status of IEH0 */
> +	struct {
> +		/** @reserved_0: Reserved for future use */
> +		u32 reserved_0:1;
> +		/** @psf0: PSF0 */
> +		u32 psf0:1;
> +		/** @psf1: PSF1 */
> +		u32 psf1:1;
> +		/** @reserved_26: Reserved */
> +		u32 reserved_26:26;
These are reserved bits 3 to 28, right? The name needs to change.
> +		/** @npk: NPK */
> +		u32 npk:1;
> +		/** @reserved_30: Reserved */
> +		u32 reserved_30:2;
These are reserved bits 30 to 31, right?
> +	} lerr_status_ieh0;
> +
> +	/** @gerr_mask: Global error mask */
> +	u32 gerr_mask;
> +	/** @additional_info: Additional information */
> +	u32 additional_info[10];

Same comment as above: use a macro for the array size.

Thanks

-/Mallesh

> +} __packed;
> +
>   #endif


* Re: [PATCH v2 09/11] drm/xe/xe_ras: Add structures for SoC Internal errors
  2026-03-10 13:02   ` Mallesh, Koujalagi
@ 2026-03-11 14:51     ` Riana Tauro
  0 siblings, 0 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-11 14:51 UTC (permalink / raw)
  To: Mallesh, Koujalagi
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, intel-xe



On 3/10/2026 6:32 PM, Mallesh, Koujalagi wrote:
> 
> On 02-03-2026 03:52 pm, Riana Tauro wrote:
>> Add response structures for SoC Internal errors.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ras_types.h | 134 ++++++++++++++++++++++++++++++
>>   1 file changed, 134 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/ 
>> xe_ras_types.h
>> index 221d07efd84c..466db9f47127 100644
>> --- a/drivers/gpu/drm/xe/xe_ras_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
>> @@ -145,4 +145,138 @@ struct xe_ras_compute_error {
>>       u32 spare_log3;
>>   } __packed;
> Please add a glossary of the hardware components to make this easier to understand.

Sorry, I didn't get this.

>> +/**
>> + * struct xe_ras_soc_error_source - Source of SOC error
>> + */
>> +struct xe_ras_soc_error_source {
>> +    /** @csc: CSC error */
>> +    u32 csc:1;
>> +    /** @soc: SOC error */
>> +    u32 soc:1;
>> +    /** @reserved: Reserved for future use */
>> +    u32 reserved:30;
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_soc_error - SOC error details
>> + */
>> +struct xe_ras_soc_error {
>> +    /** @error_source: Error Source */
>> +    struct xe_ras_soc_error_source error_source;
>> +    /** @additional_details: Additional details */
>> +    u32 additional_details[15];
> Use a macro for the array size.

This is the remaining part of the existing macro:

+#define XE_RAS_MAX_ERROR_DETAILS	16

This size is not used across files, and adding a macro
for every array creates unnecessary defines for a single
file. If we explicitly change or use this array,
we can add one.
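
A minimal sketch of how the 15 relates to the existing macro, assuming
the error details buffer is XE_RAS_MAX_ERROR_DETAILS dwords long and the
first dword holds the error source. The names below are simplified
stand-ins written in userspace C, not the driver's definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XE_RAS_MAX_ERROR_DETAILS quoted above. */
#define MAX_ERROR_DETAILS 16

/* Stand-in for struct xe_ras_soc_error_source: one dword of flags. */
struct soc_error_source {
	uint32_t csc:1;
	uint32_t soc:1;
	uint32_t reserved:30;
};

/* Stand-in for struct xe_ras_soc_error. */
struct soc_error {
	struct soc_error_source error_source;
	/* One dword is taken by error_source, the rest is payload. */
	uint32_t additional_details[MAX_ERROR_DETAILS - 1];
};

/* Fails the build if the layout no longer fills the details buffer. */
static_assert(sizeof(struct soc_error) == MAX_ERROR_DETAILS * sizeof(uint32_t),
	      "soc_error must span exactly MAX_ERROR_DETAILS dwords");

/* Number of dwords occupied by the whole SoC error response. */
static size_t soc_error_dwords(void)
{
	return sizeof(struct soc_error) / sizeof(uint32_t);
}
```

Deriving the size this way would avoid a second define while keeping the
two sizes tied together, if that relationship is what is intended.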

>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_csc_error - CSC error details
>> + */
>> +struct xe_ras_csc_error {
>> +    /** @hec_uncorr_err_status: CSC error */
>> +    u32 hec_uncorr_err_status;
>> +    /** @hec_uncorr_fw_err_dw0: CSC f/w error */
>> +    u32 hec_uncorr_fw_err_dw0;
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_ieh_error - SoC IEH error details
>> + */
>> +struct xe_ras_ieh_error {
>> +    /** @ieh_instance: IEH instance */
>> +    u32 ieh_instance:2;
>> +    /** @reserved: Reserved for future use */
>> +    u32 reserved:30;
>> +    union {
>> +        /** @global_error_status: Global error status */
>> +        u32 global_error_status;
>> +        /** @error_sources_ieh0: Error sources for IEH0 */
>> +        struct {
>> +            /** @psf0_psf1_npk: PSF0, PSF1, NPK */
>> +            u32 psf0_psf1_npk:1;
>> +            /** @punit: PUNIT */
>> +            u32 punit:1;
>> +            /** @reserved_2: Reserved */
>> +            u32 reserved_2:1;
>> +            /** @oobmsm: OOBMSM */
>> +            u32 oobmsm:1;
>> +            /** @i2c: I2C */
>> +            u32 i2c:1;
>> +            /** @pciess_gpma: PCIESS GPMA */
>> +            u32 pciess_gpma:1;
>> +            /** @lpioss_pma: LPIOSS PMA */
>> +            u32 lpioss_pma:1;
>> +            /** @fabss0_pma: FabSS0 PMA */
>> +            u32 fabss0_pma:1;
>> +            /** @fabss1_pma: FabSS1 PMA */
>> +            u32 fabss1_pma:1;
>> +            /** @reserved_9: Reserved */
>> +            u32 reserved_9:1;
>> +            /** @reserved_10: Reserved */
>> +            u32 reserved_10:1;
>> +            /** @reserved_11: Reserved */
>> +            u32 reserved_11:1;
>> +            /** @reserved_12: Reserved */
>> +            u32 reserved_12:1;
>> +            /** @reserved_13: Reserved */
>> +            u32 reserved_13:1;
>> +            /** @memss_ieh1: MEMSS -> IEH1 */
>> +            u32 memss_ieh1:1;
>> +            /** @memss_ieh2: MEMSS -> IEH2 */
>> +            u32 memss_ieh2:1;
>> +            /** @saf0_mhb0: SAF0 MHB0 */
>> +            u32 saf0_mhb0:1;
>> +            /** @saf0_mhb1: SAF0 MHB1 */
>> +            u32 saf0_mhb1:1;
>> +             /** @saf0_mhb2: SAF0 MHB2 */
> Please remove the extra leading space.

Thanks. Will remove it.

>> +            u32 saf0_mhb2:1;
>> +            /** @saf0_mhb3: SAF0 MHB3 */
>> +            u32 saf0_mhb3:1;
>> +            /** @saf0_mhb4: SAF0 MHB4 */
>> +            u32 saf0_mhb4:1;
>> +            /** @saf0_mhb5: SAF0 MHB5 */
>> +            u32 saf0_mhb5:1;
>> +            /** @saf0_mhb6: SAF0 MHB6 */
>> +            u32 saf0_mhb6:1;
>> +            /** @saf0_mhb7: SAF0 MHB7 */
>> +            u32 saf0_mhb7:1;
>> +            /** @saf1_mhb0: SAF1 MHB0 */
>> +            u32 saf1_mhb0:1;
>> +            /** @saf1_mhb1: SAF1 MHB1 */
>> +            u32 saf1_mhb1:1;
>> +            /** @saf1_mhb2: SAF1 MHB2 */
>> +            u32 saf1_mhb2:1;
>> +            /** @saf1_mhb3: SAF1 MHB3 */
>> +            u32 saf1_mhb3:1;
>> +            /** @saf1_mhb4: SAF1 MHB4 */
>> +            u32 saf1_mhb4:1;
>> +            /** @saf1_mhb5: SAF1 MHB5 */
>> +            u32 saf1_mhb5:1;
>> +            /** @saf1_mhb6: SAF1 MHB6 */
>> +            u32 saf1_mhb6:1;
>> +            /** @saf1_mhb7: SAF1 MHB7 */
>> +            u32 saf1_mhb7:1;
>> +        } error_sources_ieh0;
>> +    };
>> +
>> +    /** @lerr_status_ieh0: Local error status of IEH0 */
>> +    struct {
>> +        /** @reserved_0: Reserved for future use */
>> +        u32 reserved_0:1;
>> +        /** @psf0: PSF0 */
>> +        u32 psf0:1;
>> +        /** @psf1: PSF1 */
>> +        u32 psf1:1;
>> +        /** @reserved_26: Reserved */
>> +        u32 reserved_26:26;
> These are reserved bits 3 to 28, right? The name needs to change.

Will just number them reserved 0, 1, 2 instead of naming them after bit positions.

>> +        /** @npk: NPK */
>> +        u32 npk:1;
>> +        /** @reserved_30: Reserved */
>> +        u32 reserved_30:2;
> These are reserved bits 30 to 31, right?
>> +    } lerr_status_ieh0;
>> +
>> +    /** @gerr_mask: Global error mask */
>> +    u32 gerr_mask;
>> +    /** @additional_info: Additional information */
>> +    u32 additional_info[10];
> 
> Same comment as above: use a macro for the array size.

Replied above

Thanks
Riana

> 
> Thanks
> 
> -/Mallesh
> 
>> +} __packed;
>> +
>>   #endif



* [PATCH v2 10/11] drm/xe/xe_ras: Handle Uncorrectable SoC Internal errors
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (8 preceding siblings ...)
  2026-03-02 10:22 ` [PATCH v2 09/11] drm/xe/xe_ras: Add structures for SoC Internal errors Riana Tauro
@ 2026-03-02 10:22 ` Riana Tauro
  2026-03-10 13:29   ` Mallesh, Koujalagi
  2026-03-02 10:22 ` [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:22 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Some critical errors, such as CSC firmware and PUNIT errors, are reported
under SoC Internal Errors.

CSC errors are classified as hardware errors and firmware errors.
Hardware errors can be recovered using an SBR whereas firmware errors
are critical and require a firmware flash. On such errors, the device
will be wedged and runtime survivability mode will be enabled to notify
userspace that a firmware flash is required.

PUNIT uncorrectable errors can only be recovered through a cold reset.
TODO: Wedge device and notify userspace that a cold reset is required.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_ras.c | 52 +++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
index 61c01a4bfadb..f35d77654c8f 100644
--- a/drivers/gpu/drm/xe/xe_ras.c
+++ b/drivers/gpu/drm/xe/xe_ras.c
@@ -7,6 +7,7 @@
 #include "xe_printk.h"
 #include "xe_ras.h"
 #include "xe_ras_types.h"
+#include "xe_survivability_mode.h"
 #include "xe_sysctrl_mailbox.h"
 #include "xe_sysctrl_mailbox_types.h"
 
@@ -102,6 +103,54 @@ static enum xe_ras_recovery_action handle_compute_errors(struct xe_device *xe,
 	return XE_RAS_RECOVERY_ACTION_RECOVERED;
 }
 
+static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
+							      struct xe_ras_error_array *arr)
+{
+	struct xe_ras_soc_error *error_info = (struct xe_ras_soc_error *)arr->error_details;
+	struct xe_ras_soc_error_source source = error_info->error_source;
+	struct xe_ras_error_common common_info = arr->error_class.common;
+	enum xe_ras_recovery_action action;
+
+	/* Default action */
+	action = XE_RAS_RECOVERY_ACTION_RESET;
+
+	log_ras_error(xe, &arr->error_class);
+
+	if (source.csc) {
+		struct xe_ras_csc_error *csc_error = (struct xe_ras_csc_error *)error_info->additional_details;
+
+		/*
+		 * CSC uncorrectable errors are classified as hardware errors and firmware errors.
+		 * CSC firmware errors are critical errors that can be recovered only by firmware
+		 * update via SPI driver. PCODE enables FDO mode and sets the bit in the capability
+		 * register. On receiving this error, the driver enables runtime survivability mode
+		 * which notifies userspace that a firmware update is required.
+		 */
+		if (csc_error->hec_uncorr_fw_err_dw0) {
+			xe_err(xe, "[RAS]: CSC %s error detected: 0x%x\n",
+			       severity_to_str(xe, common_info.severity),
+			       csc_error->hec_uncorr_fw_err_dw0);
+			xe_survivability_mode_runtime_enable(xe);
+			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
+		}
+	}
+
+	if (source.soc) {
+		struct xe_ras_ieh_error *ieh_error = (struct xe_ras_ieh_error *)error_info->additional_details;
+
+		if (ieh_error->error_sources_ieh0.punit) {
+			xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
+			       severity_to_str(xe, common_info.severity),
+			       ieh_error->error_sources_ieh0.punit);
+			/** TODO: Add PUNIT error handling */
+			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
+		}
+	}
+
+	/* For other SOC internal errors, request a reset as recovery mechanism */
+	return action;
+}
+
 static void xe_ras_prepare_sysctrl_command(struct xe_sysctrl_mailbox_command *command,
 					   u32 cmd_mask, void *request, size_t request_len,
 					   void *response, size_t response_len)
@@ -180,6 +229,9 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
 			case XE_RAS_COMPONENT_CORE_COMPUTE:
 				action = handle_compute_errors(xe, &arr);
 				break;
+			case XE_RAS_COMPONENT_SOC_INTERNAL:
+				action = handle_soc_internal_errors(xe, &arr);
+				break;
 			default:
 				xe_err(xe, "[RAS]: Unknown error component %u\n", component);
 				break;
-- 
2.47.1



* Re: [PATCH v2 10/11] drm/xe/xe_ras: Handle Uncorrectable SoC Internal errors
  2026-03-02 10:22 ` [PATCH v2 10/11] drm/xe/xe_ras: Handle Uncorrectable " Riana Tauro
@ 2026-03-10 13:29   ` Mallesh, Koujalagi
  2026-03-11 14:55     ` Riana Tauro
  0 siblings, 1 reply; 43+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-10 13:29 UTC (permalink / raw)
  To: Riana Tauro
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, intel-xe


On 02-03-2026 03:52 pm, Riana Tauro wrote:
> Some critical errors, such as CSC firmware and PUNIT errors, are reported
> under SoC Internal Errors.
>
> CSC errors are classified as hardware errors and firmware errors.
> Hardware errors can be recovered using an SBR whereas firmware errors
> are critical and require a firmware flash. On such errors, the device
> will be wedged and runtime survivability mode will be enabled to notify
> userspace that a firmware flash is required.
>
> PUNIT uncorrectable errors can only be recovered through a cold reset.
> TODO: Wedge device and notify userspace that a cold reset is required.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ras.c | 52 +++++++++++++++++++++++++++++++++++++
>   1 file changed, 52 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
> index 61c01a4bfadb..f35d77654c8f 100644
> --- a/drivers/gpu/drm/xe/xe_ras.c
> +++ b/drivers/gpu/drm/xe/xe_ras.c
> @@ -7,6 +7,7 @@
>   #include "xe_printk.h"
>   #include "xe_ras.h"
>   #include "xe_ras_types.h"
> +#include "xe_survivability_mode.h"
>   #include "xe_sysctrl_mailbox.h"
>   #include "xe_sysctrl_mailbox_types.h"
>   
> @@ -102,6 +103,54 @@ static enum xe_ras_recovery_action handle_compute_errors(struct xe_device *xe,
>   	return XE_RAS_RECOVERY_ACTION_RECOVERED;
>   }
>   
> +static enum xe_ras_recovery_action handle_soc_internal_errors(struct xe_device *xe,
> +							      struct xe_ras_error_array *arr)
> +{
> +	struct xe_ras_soc_error *error_info = (struct xe_ras_soc_error *)arr->error_details;
> +	struct xe_ras_soc_error_source source = error_info->error_source;
> +	struct xe_ras_error_common common_info = arr->error_class.common;
> +	enum xe_ras_recovery_action action;
> +
> +	/* Default action */
> +	action = XE_RAS_RECOVERY_ACTION_RESET;
> +
> +	log_ras_error(xe, &arr->error_class);
> +
> +	if (source.csc) {
> +		struct xe_ras_csc_error *csc_error = (struct xe_ras_csc_error *)error_info->additional_details;
> +
> +		/*
> +		 * CSC uncorrectable errors are classified as hardware errors and firmware errors.
> +		 * CSC firmware errors are critical errors that can be recovered only by firmware
> +		 * update via SPI driver. PCODE enables FDO mode and sets the bit in the capability
> +		 * register. On receiving this error, the driver enables runtime survivability mode
> +		 * which notifies userspace that a firmware update is required.
> +		 */
> +		if (csc_error->hec_uncorr_fw_err_dw0) {
> +			xe_err(xe, "[RAS]: CSC %s error detected: 0x%x\n",
> +			       severity_to_str(xe, common_info.severity),
> +			       csc_error->hec_uncorr_fw_err_dw0);
> +			xe_survivability_mode_runtime_enable(xe);
> +			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
> +		}
> +	}
> +
> +	if (source.soc) {
> +		struct xe_ras_ieh_error *ieh_error = (struct xe_ras_ieh_error *)error_info->additional_details;
> +
> +		if (ieh_error->error_sources_ieh0.punit) {
> +			xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
> +			       severity_to_str(xe, common_info.severity),
> +			       ieh_error->error_sources_ieh0.punit);
> +			/** TODO: Add PUNIT error handling */
> +			action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
> +		}
> +	}
> +

Use else if for the source.soc check. If both the source.csc and
source.soc bits are set (even if that is not expected to happen), the
same additional_details buffer is cast to two different structure
types, which may cause undefined behavior.
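
A rough sketch of the suggested shape, using simplified stand-in types in
userspace C. All names are hypothetical, and the PUNIT bit position is an
assumption taken from the order of the bitfields in the patch. The payload
is interpreted as exactly one layout, chosen by the source bits:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the response structures in this series. */
struct soc_error_source {
	uint32_t csc:1;
	uint32_t soc:1;
	uint32_t reserved:30;
};

struct csc_error {
	uint32_t hec_uncorr_err_status;
	uint32_t hec_uncorr_fw_err_dw0;
};

struct ieh_error {
	uint32_t ieh_instance_word;   /* ieh_instance plus reserved bits */
	uint32_t global_error_status; /* overlays the IEH0 error sources */
};

struct soc_error {
	struct soc_error_source source;
	uint32_t additional_details[15];
};

/* Assumption: PUNIT is bit 1 of the global error status. */
#define IEH_PUNIT_BIT (1u << 1)

enum recovery_action { ACTION_RECOVERED, ACTION_RESET, ACTION_DISCONNECT };

static_assert(sizeof(struct ieh_error) <=
	      sizeof(((struct soc_error *)0)->additional_details),
	      "payload layouts must fit in additional_details");

static enum recovery_action classify_soc_error(const struct soc_error *err)
{
	if (err->source.csc) {
		struct csc_error csc;

		/* Copy instead of casting, so only one layout is ever read. */
		memcpy(&csc, err->additional_details, sizeof(csc));
		if (csc.hec_uncorr_fw_err_dw0)
			return ACTION_DISCONNECT;
	} else if (err->source.soc) {
		struct ieh_error ieh;

		memcpy(&ieh, err->additional_details, sizeof(ieh));
		if (ieh.global_error_status & IEH_PUNIT_BIT)
			return ACTION_DISCONNECT;
	}

	/* Everything else falls back to a reset request. */
	return ACTION_RESET;
}
```

With else if, a response that sets both bits is only ever decoded as a
CSC payload, so the second dword is never read as an IEH status word.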

Thanks

-/Mallesh

> +	/* For other SOC internal errors, request a reset as recovery mechanism */
> +	return action;
> +}
> +
>   static void xe_ras_prepare_sysctrl_command(struct xe_sysctrl_mailbox_command *command,
>   					   u32 cmd_mask, void *request, size_t request_len,
>   					   void *response, size_t response_len)
> @@ -180,6 +229,9 @@ enum xe_ras_recovery_action xe_ras_process_errors(struct xe_device *xe)
>   			case XE_RAS_COMPONENT_CORE_COMPUTE:
>   				action = handle_compute_errors(xe, &arr);
>   				break;
> +			case XE_RAS_COMPONENT_SOC_INTERNAL:
> +				action = handle_soc_internal_errors(xe, &arr);
> +				break;
>   			default:
>   				xe_err(xe, "[RAS]: Unknown error component %u\n", component);
>   				break;


* Re: [PATCH v2 10/11] drm/xe/xe_ras: Handle Uncorrectable SoC Internal errors
  2026-03-10 13:29   ` Mallesh, Koujalagi
@ 2026-03-11 14:55     ` Riana Tauro
  0 siblings, 0 replies; 43+ messages in thread
From: Riana Tauro @ 2026-03-11 14:55 UTC (permalink / raw)
  To: Mallesh, Koujalagi
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, intel-xe



On 3/10/2026 6:59 PM, Mallesh, Koujalagi wrote:
> 
> On 02-03-2026 03:52 pm, Riana Tauro wrote:
>> Some critical errors, such as CSC firmware and PUNIT errors, are reported
>> under SoC Internal Errors.
>>
>> CSC errors are classified as hardware errors and firmware errors.
>> Hardware errors can be recovered using an SBR whereas firmware errors
>> are critical and require a firmware flash. On such errors, the device
>> will be wedged and runtime survivability mode will be enabled to notify
>> userspace that a firmware flash is required.
>>
>> PUNIT uncorrectable errors can only be recovered through a cold reset.
>> TODO: Wedge device and notify userspace that a cold reset is required.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ras.c | 52 +++++++++++++++++++++++++++++++++++++
>>   1 file changed, 52 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index 61c01a4bfadb..f35d77654c8f 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -7,6 +7,7 @@
>>   #include "xe_printk.h"
>>   #include "xe_ras.h"
>>   #include "xe_ras_types.h"
>> +#include "xe_survivability_mode.h"
>>   #include "xe_sysctrl_mailbox.h"
>>   #include "xe_sysctrl_mailbox_types.h"
>> @@ -102,6 +103,54 @@ static enum xe_ras_recovery_action 
>> handle_compute_errors(struct xe_device *xe,
>>       return XE_RAS_RECOVERY_ACTION_RECOVERED;
>>   }
>> +static enum xe_ras_recovery_action handle_soc_internal_errors(struct 
>> xe_device *xe,
>> +                                  struct xe_ras_error_array *arr)
>> +{
>> +    struct xe_ras_soc_error *error_info = (struct xe_ras_soc_error 
>> *)arr->error_details;
>> +    struct xe_ras_soc_error_source source = error_info->error_source;
>> +    struct xe_ras_error_common common_info = arr->error_class.common;
>> +    enum xe_ras_recovery_action action;
>> +
>> +    /* Default action */
>> +    action = XE_RAS_RECOVERY_ACTION_RESET;
>> +
>> +    log_ras_error(xe, &arr->error_class);
>> +
>> +    if (source.csc) {
>> +        struct xe_ras_csc_error *csc_error = (struct xe_ras_csc_error 
>> *)error_info->additional_details;
>> +
>> +        /*
>> +         * CSC uncorrectable errors are classified as hardware errors 
>> and firmware errors.
>> +         * CSC firmware errors are critical errors that can be 
>> recovered only by firmware
>> +         * update via SPI driver. PCODE enables FDO mode and sets the 
>> bit in the capability
>> +         * register. On receiving this error, the driver enables 
>> runtime survivability mode
>> +         * which notifies userspace that a firmware update is required.
>> +         */
>> +        if (csc_error->hec_uncorr_fw_err_dw0) {
>> +            xe_err(xe, "[RAS]: CSC %s error detected: 0x%x\n",
>> +                   severity_to_str(xe, common_info.severity),
>> +                   csc_error->hec_uncorr_fw_err_dw0);
>> +            xe_survivability_mode_runtime_enable(xe);
>> +            action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
>> +        }
>> +    }
>> +
>> +    if (source.soc) {
>> +        struct xe_ras_ieh_error *ieh_error = (struct xe_ras_ieh_error 
>> *)error_info->additional_details;
>> +
>> +        if (ieh_error->error_sources_ieh0.punit) {
>> +            xe_err(xe, "[RAS]: PUNIT %s error detected: 0x%x\n",
>> +                   severity_to_str(xe, common_info.severity),
>> +                   ieh_error->error_sources_ieh0.punit);
>> +            /** TODO: Add PUNIT error handling */
>> +            action = XE_RAS_RECOVERY_ACTION_DISCONNECT;
>> +        }
>> +    }
>> +
> 
> Use else if for the source.soc check. If both the source.csc and
> source.soc bits are set (even if that is not expected to happen), the
> same additional_details buffer is cast to two different structure
> types, which may cause undefined behavior.

Yes, you are correct. Thanks for catching this.
I can just return the action.

Though I will double-check whether two error bits can be set in a single response.

Thanks
Riana

> 
> Thanks
> 
> -/Mallesh
> 
>> +    /* For other SOC internal errors, request a reset as recovery 
>> mechanism */
>> +    return action;
>> +}
>> +
>>   static void xe_ras_prepare_sysctrl_command(struct 
>> xe_sysctrl_mailbox_command *command,
>>                          u32 cmd_mask, void *request, size_t request_len,
>>                          void *response, size_t response_len)
>> @@ -180,6 +229,9 @@ enum xe_ras_recovery_action 
>> xe_ras_process_errors(struct xe_device *xe)
>>               case XE_RAS_COMPONENT_CORE_COMPUTE:
>>                   action = handle_compute_errors(xe, &arr);
>>                   break;
>> +            case XE_RAS_COMPONENT_SOC_INTERNAL:
>> +                action = handle_soc_internal_errors(xe, &arr);
>> +                break;
>>               default:
>>                   xe_err(xe, "[RAS]: Unknown error component %u\n", 
>> component);
>>                   break;



* [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (9 preceding siblings ...)
  2026-03-02 10:22 ` [PATCH v2 10/11] drm/xe/xe_ras: Handle Uncorrectable " Riana Tauro
@ 2026-03-02 10:22 ` Riana Tauro
  2026-03-11  7:10   ` Mallesh, Koujalagi
  2026-03-02 16:10 ` ✗ CI.checkpatch: warning for Introduce Xe Uncorrectable Error Handling (rev2) Patchwork
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Riana Tauro @ 2026-03-02 10:22 UTC (permalink / raw)
  To: intel-xe
  Cc: riana.tauro, anshuman.gupta, rodrigo.vivi, aravind.iddamsetty,
	badal.nilawar, raag.jadav, ravi.kishore.koppuravuri,
	mallesh.koujalagi

Query the system controller when any non-fatal error occurs to
determine the type of the error, then contain and recover from it.

The system controller is queried in the mmio_enabled callback.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
v2: use ras recovery enum (Raag)
---
 drivers/gpu/drm/xe/xe_pci_error.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
index ba62868f00d4..35bc5b8cee99 100644
--- a/drivers/gpu/drm/xe/xe_pci_error.c
+++ b/drivers/gpu/drm/xe/xe_pci_error.c
@@ -9,6 +9,7 @@
 #include "xe_device.h"
 #include "xe_gt.h"
 #include "xe_pci.h"
+#include "xe_ras.h"
 #include "xe_uc.h"
 
 static void xe_pci_error_handling(struct pci_dev *pdev)
@@ -26,6 +27,12 @@ static void xe_pci_error_handling(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 }
 
+static pci_ers_result_t ras_recovery_action_to_pci_result[] = {
+	[XE_RAS_RECOVERY_ACTION_RECOVERED] = PCI_ERS_RESULT_RECOVERED,
+	[XE_RAS_RECOVERY_ACTION_RESET] = PCI_ERS_RESULT_NEED_RESET,
+	[XE_RAS_RECOVERY_ACTION_DISCONNECT] = PCI_ERS_RESULT_DISCONNECT,
+};
+
 static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 {
 	struct xe_device *xe = pdev_to_xe_device(pdev);
@@ -50,9 +57,13 @@ static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_
 
 static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
 {
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+	enum xe_ras_recovery_action action;
+
 	dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
+	action = xe_ras_process_errors(xe);
 
-	return PCI_ERS_RESULT_NEED_RESET;
+	return ras_recovery_action_to_pci_result[action];
 }
 
 static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
-- 
2.47.1



* Re: [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled
  2026-03-02 10:22 ` [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
@ 2026-03-11  7:10   ` Mallesh, Koujalagi
  2026-03-11 14:39     ` Riana Tauro
  0 siblings, 1 reply; 43+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-11  7:10 UTC (permalink / raw)
  To: Riana Tauro
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, intel-xe


On 02-03-2026 03:52 pm, Riana Tauro wrote:
> Query the system controller when any non-fatal error occurs to
> determine the type of the error, then contain and recover from it.
>
> The system controller is queried in the mmio_enabled callback.
>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
> v2: use ras recovery enum (Raag)
> ---
>   drivers/gpu/drm/xe/xe_pci_error.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/xe_pci_error.c
> index ba62868f00d4..35bc5b8cee99 100644
> --- a/drivers/gpu/drm/xe/xe_pci_error.c
> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
> @@ -9,6 +9,7 @@
>   #include "xe_device.h"
>   #include "xe_gt.h"
>   #include "xe_pci.h"
> +#include "xe_ras.h"
>   #include "xe_uc.h"
>   
>   static void xe_pci_error_handling(struct pci_dev *pdev)
> @@ -26,6 +27,12 @@ static void xe_pci_error_handling(struct pci_dev *pdev)
>   	pci_disable_device(pdev);
>   }
Please explain the rationale for this mapping.
>   
> +static pci_ers_result_t ras_recovery_action_to_pci_result[] = {
The mapping array should be const, since it is never modified.
> +	[XE_RAS_RECOVERY_ACTION_RECOVERED] = PCI_ERS_RESULT_RECOVERED,
> +	[XE_RAS_RECOVERY_ACTION_RESET] = PCI_ERS_RESULT_NEED_RESET,
> +	[XE_RAS_RECOVERY_ACTION_DISCONNECT] = PCI_ERS_RESULT_DISCONNECT,
> +};
> +
>   static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
>   {
>   	struct xe_device *xe = pdev_to_xe_device(pdev);
> @@ -50,9 +57,13 @@ static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, pci_channel_
>   
>   static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>   {
> +	struct xe_device *xe = pdev_to_xe_device(pdev);
> +	enum xe_ras_recovery_action action;
Add an action max entry at the end of the enum so the value can be validated.
> +
>   	dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
> +	action = xe_ras_process_errors(xe);
What happens if RAS processing takes a significant amount of time?
>   
> -	return PCI_ERS_RESULT_NEED_RESET;

Add an array bounds check before indexing with action, to avoid an
out-of-bounds array access.

Thanks,

-/Mallesh
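
For illustration, the suggestions above (a const table, a MAX sentinel and a
bounds check) could be combined roughly as in the standalone sketch below. It
is not part of the posted patch. The enum definitions are local stand-ins so
that it compiles outside the kernel, and XE_RAS_RECOVERY_ACTION_MAX, the
helper name xe_ras_action_to_pci_result() and the NEED_RESET fallback are
assumptions. The static_assert mirrors the pattern that checkpatch reports in
xe_ras.c for this series.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel definitions, so the sketch builds alone. */
typedef enum {
	PCI_ERS_RESULT_NEED_RESET = 3,
	PCI_ERS_RESULT_DISCONNECT = 4,
	PCI_ERS_RESULT_RECOVERED = 5,
} pci_ers_result_t;

enum xe_ras_recovery_action {
	XE_RAS_RECOVERY_ACTION_RECOVERED,
	XE_RAS_RECOVERY_ACTION_RESET,
	XE_RAS_RECOVERY_ACTION_DISCONNECT,
	XE_RAS_RECOVERY_ACTION_MAX,	/* assumed sentinel, not in the posted patch */
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Map each RAS recovery action to the result returned to the PCI core. */
static const pci_ers_result_t ras_recovery_action_to_pci_result[] = {
	/* Error was contained and recovered, no reset needed. */
	[XE_RAS_RECOVERY_ACTION_RECOVERED] = PCI_ERS_RESULT_RECOVERED,
	/* Device needs a reset, requested from the PCI core. */
	[XE_RAS_RECOVERY_ACTION_RESET] = PCI_ERS_RESULT_NEED_RESET,
	/* Needs system intervention, e.g. firmware flash or cold reset. */
	[XE_RAS_RECOVERY_ACTION_DISCONNECT] = PCI_ERS_RESULT_DISCONNECT,
};

/* Fail the build if an action is added without a table entry. */
static_assert(ARRAY_SIZE(ras_recovery_action_to_pci_result) ==
	      XE_RAS_RECOVERY_ACTION_MAX,
	      "every recovery action needs a PCI result");

/* Hypothetical helper: bounds-checked lookup, falling back to a reset. */
static pci_ers_result_t
xe_ras_action_to_pci_result(enum xe_ras_recovery_action action)
{
	if ((unsigned int)action >= ARRAY_SIZE(ras_recovery_action_to_pci_result))
		return PCI_ERS_RESULT_NEED_RESET;

	return ras_recovery_action_to_pci_result[action];
}
```

With such a helper, mmio_enabled would return
xe_ras_action_to_pci_result(action) instead of indexing the table directly.
Falling back to NEED_RESET keeps the behaviour the callback had before this
patch for any unexpected value.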

> +	return ras_recovery_action_to_pci_result[action];
>   }
>   
>   static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled
  2026-03-11  7:10   ` Mallesh, Koujalagi
@ 2026-03-11 14:39     ` Riana Tauro
  2026-03-12  8:08       ` Mallesh, Koujalagi
  0 siblings, 1 reply; 43+ messages in thread
From: Riana Tauro @ 2026-03-11 14:39 UTC (permalink / raw)
  To: Mallesh, Koujalagi
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, intel-xe

Hi Mallesh

On 3/11/2026 12:40 PM, Mallesh, Koujalagi wrote:
> 
> On 02-03-2026 03:52 pm, Riana Tauro wrote:
>> Query system controller when any non fatal error occurs to check
>> the type of the error, contain and recover.
>>
>> The system controller is queried in the mmio_enabled callback.
>>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>> v2: use ras recovery enum (Raag)
>> ---
>>   drivers/gpu/drm/xe/xe_pci_error.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/ 
>> xe_pci_error.c
>> index ba62868f00d4..35bc5b8cee99 100644
>> --- a/drivers/gpu/drm/xe/xe_pci_error.c
>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>> @@ -9,6 +9,7 @@
>>   #include "xe_device.h"
>>   #include "xe_gt.h"
>>   #include "xe_pci.h"
>> +#include "xe_ras.h"
>>   #include "xe_uc.h"
>>   static void xe_pci_error_handling(struct pci_dev *pdev)
>> @@ -26,6 +27,12 @@ static void xe_pci_error_handling(struct pci_dev 
>> *pdev)
>>       pci_disable_device(pdev);
>>   }
> Explain the mapping rationale

Do you mean in the commit message?
In the code, the naming is self-explanatory.

>> +static pci_ers_result_t ras_recovery_action_to_pci_result[] = {
> The mapping array should be constant, never modified.
>> +    [XE_RAS_RECOVERY_ACTION_RECOVERED] = PCI_ERS_RESULT_RECOVERED,
>> +    [XE_RAS_RECOVERY_ACTION_RESET] = PCI_ERS_RESULT_NEED_RESET,
>> +    [XE_RAS_RECOVERY_ACTION_DISCONNECT] = PCI_ERS_RESULT_DISCONNECT,
>> +};
>> +
>>   static pci_ers_result_t xe_pci_error_detected(struct pci_dev *pdev, 
>> pci_channel_state_t state)
>>   {
>>       struct xe_device *xe = pdev_to_xe_device(pdev);
>> @@ -50,9 +57,13 @@ static pci_ers_result_t 
>> xe_pci_error_detected(struct pci_dev *pdev, pci_channel_
>>   static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev *pdev)
>>   {
>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>> +    enum xe_ras_recovery_action action;
> Add action_max @ end of enum to validate.

Answered below, but for consistency I can add max.

>> +
>>       dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
>> +    action = xe_ras_process_errors(xe);
> What will happen, if RAS processing takes significant time?

The system controller times out after a duration. Refer to the patch:
https://patchwork.freedesktop.org/patch/710637/?series=159554&rev=10

If ret is negative in process_errors,

+		ret = xe_sysctrl_send_command(xe, &command, &rlen);
+		if (ret || !rlen) {
+			xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
+			goto err;
+		}
+

we request a reset, assuming a system controller (SC) error.
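
A minimal standalone sketch of that error path, for illustration only. The
real xe_ras_process_errors() and xe_sysctrl_send_command() are not
reproduced here: struct fake_sysctrl, fake_sysctrl_send_command() and
sketch_process_errors() are hypothetical stand-ins, and the RECOVERED return
on success is a placeholder for the response decoding done in the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

enum xe_ras_recovery_action {
	XE_RAS_RECOVERY_ACTION_RECOVERED,
	XE_RAS_RECOVERY_ACTION_RESET,
	XE_RAS_RECOVERY_ACTION_DISCONNECT,
};

/* Hypothetical stand-in: canned status and response length of a command. */
struct fake_sysctrl {
	int ret;	/* 0 on success, negative errno such as -ETIMEDOUT */
	size_t rlen;	/* number of response bytes received */
};

/* Hypothetical stand-in for the system controller command interface. */
static int fake_sysctrl_send_command(const struct fake_sysctrl *sc, size_t *rlen)
{
	*rlen = sc->rlen;
	return sc->ret;
}

static enum xe_ras_recovery_action
sketch_process_errors(const struct fake_sysctrl *sc)
{
	size_t rlen = 0;
	int ret;

	assert(sc);

	ret = fake_sysctrl_send_command(sc, &rlen);
	if (ret || !rlen) {
		/* Timeout, transport failure or empty response. */
		fprintf(stderr, "[RAS]: Sysctrl error ret %d\n", ret);
		goto err;
	}

	/* Placeholder: the real code decodes the response to pick an action. */
	return XE_RAS_RECOVERY_ACTION_RECOVERED;

err:
	/* Assume a system controller error and ask for a reset. */
	return XE_RAS_RECOVERY_ACTION_RESET;
}
```

In this sketch a timeout therefore never blocks mmio_enabled beyond the
system controller's own timeout: the callback ends up returning the result
mapped from XE_RAS_RECOVERY_ACTION_RESET.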


>> -    return PCI_ERS_RESULT_NEED_RESET;
> 
> Use array bound check before using action to avoid out of bounds array 
> access

The action here comes from xe_ras_process_errors(), which also returns the
enum. This action is chosen explicitly by the driver and does not come from
the user or firmware.

Adding a check here seems unnecessary.

Thanks
Riana

> 
> Thanks,
> 
> -/Mallesh
> 
>> +    return ras_recovery_action_to_pci_result[action];
>>   }
>>   static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled
  2026-03-11 14:39     ` Riana Tauro
@ 2026-03-12  8:08       ` Mallesh, Koujalagi
  0 siblings, 0 replies; 43+ messages in thread
From: Mallesh, Koujalagi @ 2026-03-12  8:08 UTC (permalink / raw)
  To: Riana Tauro
  Cc: anshuman.gupta, rodrigo.vivi, aravind.iddamsetty, badal.nilawar,
	raag.jadav, ravi.kishore.koppuravuri, intel-xe


On 11-03-2026 08:09 pm, Riana Tauro wrote:
> Hi Mallesh
>
> On 3/11/2026 12:40 PM, Mallesh, Koujalagi wrote:
>>
>> On 02-03-2026 03:52 pm, Riana Tauro wrote:
>>> Query system controller when any non fatal error occurs to check
>>> the type of the error, contain and recover.
>>>
>>> The system controller is queried in the mmio_enabled callback.
>>>
>>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>>> ---
>>> v2: use ras recovery enum (Raag)
>>> ---
>>>   drivers/gpu/drm/xe/xe_pci_error.c | 13 ++++++++++++-
>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_pci_error.c b/drivers/gpu/drm/xe/ 
>>> xe_pci_error.c
>>> index ba62868f00d4..35bc5b8cee99 100644
>>> --- a/drivers/gpu/drm/xe/xe_pci_error.c
>>> +++ b/drivers/gpu/drm/xe/xe_pci_error.c
>>> @@ -9,6 +9,7 @@
>>>   #include "xe_device.h"
>>>   #include "xe_gt.h"
>>>   #include "xe_pci.h"
>>> +#include "xe_ras.h"
>>>   #include "xe_uc.h"
>>>   static void xe_pci_error_handling(struct pci_dev *pdev)
>>> @@ -26,6 +27,12 @@ static void xe_pci_error_handling(struct pci_dev 
>>> *pdev)
>>>       pci_disable_device(pdev);
>>>   }
>> Explain the mapping rationale
>
> Do you mean commit message?
> In the code the naming is self explanatory

Not the commit message. Add a brief code comment on
ras_recovery_action_to_pci_result describing how RAS recovery actions map
to PCI AER results.

For example, DISCONNECT: requires system intervention (FW update, cold reset),

since we return PCI DISCONNECT for CSC firmware errors that need a flash
update.

Thanks

-/Mallesh

>
>>> +static pci_ers_result_t ras_recovery_action_to_pci_result[] = {
>> The mapping array should be constant, never modified.
>>> +    [XE_RAS_RECOVERY_ACTION_RECOVERED] = PCI_ERS_RESULT_RECOVERED,
>>> +    [XE_RAS_RECOVERY_ACTION_RESET] = PCI_ERS_RESULT_NEED_RESET,
>>> +    [XE_RAS_RECOVERY_ACTION_DISCONNECT] = PCI_ERS_RESULT_DISCONNECT,
>>> +};
>>> +
>>>   static pci_ers_result_t xe_pci_error_detected(struct pci_dev 
>>> *pdev, pci_channel_state_t state)
>>>   {
>>>       struct xe_device *xe = pdev_to_xe_device(pdev);
>>> @@ -50,9 +57,13 @@ static pci_ers_result_t 
>>> xe_pci_error_detected(struct pci_dev *pdev, pci_channel_
>>>   static pci_ers_result_t xe_pci_error_mmio_enabled(struct pci_dev 
>>> *pdev)
>>>   {
>>> +    struct xe_device *xe = pdev_to_xe_device(pdev);
>>> +    enum xe_ras_recovery_action action;
>> Add action_max @ end of enum to validate.
>
> Answered below, but for consistency i can add max
>
>>> +
>>>       dev_err(&pdev->dev, "Xe Pci error recovery: MMIO enabled\n");
>>> +    action = xe_ras_process_errors(xe);
>> What will happen, if RAS processing takes significant time?
>
> System controller times out after a duration. Refer patch
> https://patchwork.freedesktop.org/patch/710637/?series=159554&rev=10
>
> If the ret is negative in process_errors
>
> +        ret = xe_sysctrl_send_command(xe, &command, &rlen);
> +        if (ret || !rlen) {
> +            xe_err(xe, "[RAS]: Sysctrl error ret %d\n", ret);
> +            goto err;
> +        }
> +
>
> We request for a reset assuming a SC error.
>
>
>>> -    return PCI_ERS_RESULT_NEED_RESET;
>>
>> Use array bound check before using action to avoid out of bounds 
>> array access
>
> The action here comes from a xe_ras_process_errors which also returns
> enum. This action is explicitly sent and is not coming from user
> or firmware.
>
> Adding a check here seems unnecessary.
>
> Thanks
> Riana
>
>>
>> Thanks,
>>
>> -/Mallesh
>>
>>> +    return ras_recovery_action_to_pci_result[action];
>>>   }
>>>   static pci_ers_result_t xe_pci_error_slot_reset(struct pci_dev *pdev)
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* ✗ CI.checkpatch: warning for Introduce Xe Uncorrectable Error Handling (rev2)
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (10 preceding siblings ...)
  2026-03-02 10:22 ` [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
@ 2026-03-02 16:10 ` Patchwork
  2026-03-02 16:11 ` ✓ CI.KUnit: success " Patchwork
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2026-03-02 16:10 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling (rev2)
URL   : https://patchwork.freedesktop.org/series/160482/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
1f57ba1afceae32108bd24770069f764d940a0e4
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit af594481a53e24cd7bedefcf31ead2d9ff64b9d3
Author: Riana Tauro <riana.tauro@intel.com>
Date:   Mon Mar 2 15:52:06 2026 +0530

    drm/xe/xe_pci_error: Process errors in mmio_enabled
    
    Query system controller when any non fatal error occurs to check
    the type of the error, contain and recover.
    
    The system controller is queried in the mmio_enabled callback.
    
    Signed-off-by: Riana Tauro <riana.tauro@intel.com>
+ /mt/dim checkpatch 01a56db3d9c2632675c23778a174b989aac4e050 drm-intel
0ed4ca338105 drm/xe/xe_sysctrl: Add System controller patch
-:26: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#26: 
new file mode 100644

-:781: ERROR:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Anoop Vijay <anoop.c.vijay@intel.com>'

total: 1 errors, 1 warnings, 0 checks, 699 lines checked
a2b045dfc4ca drm/xe/xe_survivability: Decouple survivability info from boot survivability
6787619d6b29 drm/xe/xe_pci_error: Implement PCI error recovery callbacks
-:101: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#101: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 151 lines checked
624b371e52c4 drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset
59d6156957c6 drm/xe: Skip device access during PCI error recovery
b904b169337e drm/xe/xe_ras: Initialize Uncorrectable AER Registers
-:52: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#52: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 113 lines checked
23db84843498 drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors
-:47: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#47: FILE: drivers/gpu/drm/xe/xe_ras.c:36:
+};
+static_assert(ARRAY_SIZE(xe_ras_severities) == XE_RAS_SEVERITY_MAX);

-:58: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#58: FILE: drivers/gpu/drm/xe/xe_ras.c:47:
+};
+static_assert(ARRAY_SIZE(xe_ras_components) == XE_RAS_COMPONENT_MAX);

-:80: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#80: 
new file mode 100644

total: 0 errors, 1 warnings, 2 checks, 213 lines checked
d7876791b623 drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors
62e967557cf8 drm/xe/xe_ras: Add structures for SoC Internal errors
7971db29a95f drm/xe/xe_ras: Handle Uncorrectable SoC Internal errors
-:50: WARNING:LONG_LINE: line length of 111 exceeds 100 columns
#50: FILE: drivers/gpu/drm/xe/xe_ras.c:120:
+		struct xe_ras_csc_error *csc_error = (struct xe_ras_csc_error *)error_info->additional_details;

-:69: WARNING:LONG_LINE: line length of 111 exceeds 100 columns
#69: FILE: drivers/gpu/drm/xe/xe_ras.c:139:
+		struct xe_ras_ieh_error *ieh_error = (struct xe_ras_ieh_error *)error_info->additional_details;

total: 0 errors, 2 warnings, 0 checks, 70 lines checked
af594481a53e drm/xe/xe_pci_error: Process errors in mmio_enabled



^ permalink raw reply	[flat|nested] 43+ messages in thread

* ✓ CI.KUnit: success for Introduce Xe Uncorrectable Error Handling (rev2)
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (11 preceding siblings ...)
  2026-03-02 16:10 ` ✗ CI.checkpatch: warning for Introduce Xe Uncorrectable Error Handling (rev2) Patchwork
@ 2026-03-02 16:11 ` Patchwork
  2026-03-02 16:48 ` ✓ Xe.CI.BAT: " Patchwork
  2026-03-02 18:29 ` ✗ Xe.CI.FULL: failure " Patchwork
  14 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2026-03-02 16:11 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling (rev2)
URL   : https://patchwork.freedesktop.org/series/160482/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[16:10:23] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[16:10:27] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[16:10:58] Starting KUnit Kernel (1/1)...
[16:10:58] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[16:10:58] ================== guc_buf (11 subtests) ===================
[16:10:58] [PASSED] test_smallest
[16:10:58] [PASSED] test_largest
[16:10:58] [PASSED] test_granular
[16:10:58] [PASSED] test_unique
[16:10:58] [PASSED] test_overlap
[16:10:58] [PASSED] test_reusable
[16:10:58] [PASSED] test_too_big
[16:10:58] [PASSED] test_flush
[16:10:58] [PASSED] test_lookup
[16:10:58] [PASSED] test_data
[16:10:58] [PASSED] test_class
[16:10:58] ===================== [PASSED] guc_buf =====================
[16:10:58] =================== guc_dbm (7 subtests) ===================
[16:10:58] [PASSED] test_empty
[16:10:58] [PASSED] test_default
[16:10:58] ======================== test_size  ========================
[16:10:58] [PASSED] 4
[16:10:58] [PASSED] 8
[16:10:58] [PASSED] 32
[16:10:58] [PASSED] 256
[16:10:58] ==================== [PASSED] test_size ====================
[16:10:58] ======================= test_reuse  ========================
[16:10:58] [PASSED] 4
[16:10:58] [PASSED] 8
[16:10:58] [PASSED] 32
[16:10:58] [PASSED] 256
[16:10:58] =================== [PASSED] test_reuse ====================
[16:10:58] =================== test_range_overlap  ====================
[16:10:58] [PASSED] 4
[16:10:58] [PASSED] 8
[16:10:58] [PASSED] 32
[16:10:58] [PASSED] 256
[16:10:58] =============== [PASSED] test_range_overlap ================
[16:10:58] =================== test_range_compact  ====================
[16:10:58] [PASSED] 4
[16:10:58] [PASSED] 8
[16:10:58] [PASSED] 32
[16:10:58] [PASSED] 256
[16:10:58] =============== [PASSED] test_range_compact ================
[16:10:58] ==================== test_range_spare  =====================
[16:10:58] [PASSED] 4
[16:10:58] [PASSED] 8
[16:10:58] [PASSED] 32
[16:10:58] [PASSED] 256
[16:10:58] ================ [PASSED] test_range_spare =================
[16:10:58] ===================== [PASSED] guc_dbm =====================
[16:10:58] =================== guc_idm (6 subtests) ===================
[16:10:58] [PASSED] bad_init
[16:10:58] [PASSED] no_init
[16:10:58] [PASSED] init_fini
[16:10:58] [PASSED] check_used
[16:10:58] [PASSED] check_quota
[16:10:58] [PASSED] check_all
[16:10:58] ===================== [PASSED] guc_idm =====================
[16:10:58] ================== no_relay (3 subtests) ===================
[16:10:58] [PASSED] xe_drops_guc2pf_if_not_ready
[16:10:58] [PASSED] xe_drops_guc2vf_if_not_ready
[16:10:58] [PASSED] xe_rejects_send_if_not_ready
[16:10:58] ==================== [PASSED] no_relay =====================
[16:10:58] ================== pf_relay (14 subtests) ==================
[16:10:58] [PASSED] pf_rejects_guc2pf_too_short
[16:10:58] [PASSED] pf_rejects_guc2pf_too_long
[16:10:58] [PASSED] pf_rejects_guc2pf_no_payload
[16:10:58] [PASSED] pf_fails_no_payload
[16:10:58] [PASSED] pf_fails_bad_origin
[16:10:58] [PASSED] pf_fails_bad_type
[16:10:58] [PASSED] pf_txn_reports_error
[16:10:58] [PASSED] pf_txn_sends_pf2guc
[16:10:58] [PASSED] pf_sends_pf2guc
[16:10:58] [SKIPPED] pf_loopback_nop
[16:10:58] [SKIPPED] pf_loopback_echo
[16:10:58] [SKIPPED] pf_loopback_fail
[16:10:58] [SKIPPED] pf_loopback_busy
[16:10:58] [SKIPPED] pf_loopback_retry
[16:10:58] ==================== [PASSED] pf_relay =====================
[16:10:58] ================== vf_relay (3 subtests) ===================
[16:10:58] [PASSED] vf_rejects_guc2vf_too_short
[16:10:58] [PASSED] vf_rejects_guc2vf_too_long
[16:10:58] [PASSED] vf_rejects_guc2vf_no_payload
[16:10:58] ==================== [PASSED] vf_relay =====================
[16:10:58] ================ pf_gt_config (9 subtests) =================
[16:10:58] [PASSED] fair_contexts_1vf
[16:10:58] [PASSED] fair_doorbells_1vf
[16:10:58] [PASSED] fair_ggtt_1vf
[16:10:58] ====================== fair_vram_1vf  ======================
[16:10:58] [PASSED] 3.50 GiB
[16:10:58] [PASSED] 11.5 GiB
[16:10:58] [PASSED] 15.5 GiB
[16:10:58] [PASSED] 31.5 GiB
[16:10:58] [PASSED] 63.5 GiB
[16:10:58] [PASSED] 13.9 GiB
[16:10:58] ================== [PASSED] fair_vram_1vf ==================
[16:10:58] ================ fair_vram_1vf_admin_only  =================
[16:10:58] [PASSED] 3.50 GiB
[16:10:58] [PASSED] 11.5 GiB
[16:10:58] [PASSED] 15.5 GiB
[16:10:58] [PASSED] 31.5 GiB
[16:10:58] [PASSED] 63.5 GiB
[16:10:58] [PASSED] 13.9 GiB
[16:10:58] ============ [PASSED] fair_vram_1vf_admin_only =============
[16:10:58] ====================== fair_contexts  ======================
[16:10:58] [PASSED] 1 VF
[16:10:58] [PASSED] 2 VFs
[16:10:58] [PASSED] 3 VFs
[16:10:58] [PASSED] 4 VFs
[16:10:58] [PASSED] 5 VFs
[16:10:58] [PASSED] 6 VFs
[16:10:58] [PASSED] 7 VFs
[16:10:58] [PASSED] 8 VFs
[16:10:58] [PASSED] 9 VFs
[16:10:58] [PASSED] 10 VFs
[16:10:58] [PASSED] 11 VFs
[16:10:58] [PASSED] 12 VFs
[16:10:58] [PASSED] 13 VFs
[16:10:58] [PASSED] 14 VFs
[16:10:58] [PASSED] 15 VFs
[16:10:58] [PASSED] 16 VFs
[16:10:58] [PASSED] 17 VFs
[16:10:58] [PASSED] 18 VFs
[16:10:58] [PASSED] 19 VFs
[16:10:58] [PASSED] 20 VFs
[16:10:58] [PASSED] 21 VFs
[16:10:58] [PASSED] 22 VFs
[16:10:58] [PASSED] 23 VFs
[16:10:58] [PASSED] 24 VFs
[16:10:58] [PASSED] 25 VFs
[16:10:58] [PASSED] 26 VFs
[16:10:58] [PASSED] 27 VFs
[16:10:58] [PASSED] 28 VFs
[16:10:58] [PASSED] 29 VFs
[16:10:58] [PASSED] 30 VFs
[16:10:58] [PASSED] 31 VFs
[16:10:58] [PASSED] 32 VFs
[16:10:58] [PASSED] 33 VFs
[16:10:58] [PASSED] 34 VFs
[16:10:58] [PASSED] 35 VFs
[16:10:58] [PASSED] 36 VFs
[16:10:58] [PASSED] 37 VFs
[16:10:58] [PASSED] 38 VFs
[16:10:58] [PASSED] 39 VFs
[16:10:58] [PASSED] 40 VFs
[16:10:58] [PASSED] 41 VFs
[16:10:58] [PASSED] 42 VFs
[16:10:58] [PASSED] 43 VFs
[16:10:58] [PASSED] 44 VFs
[16:10:58] [PASSED] 45 VFs
[16:10:58] [PASSED] 46 VFs
[16:10:58] [PASSED] 47 VFs
[16:10:58] [PASSED] 48 VFs
[16:10:58] [PASSED] 49 VFs
[16:10:58] [PASSED] 50 VFs
[16:10:58] [PASSED] 51 VFs
[16:10:58] [PASSED] 52 VFs
[16:10:58] [PASSED] 53 VFs
[16:10:58] [PASSED] 54 VFs
[16:10:58] [PASSED] 55 VFs
[16:10:58] [PASSED] 56 VFs
[16:10:58] [PASSED] 57 VFs
[16:10:58] [PASSED] 58 VFs
[16:10:58] [PASSED] 59 VFs
[16:10:58] [PASSED] 60 VFs
[16:10:58] [PASSED] 61 VFs
[16:10:58] [PASSED] 62 VFs
[16:10:58] [PASSED] 63 VFs
[16:10:58] ================== [PASSED] fair_contexts ==================
[16:10:58] ===================== fair_doorbells  ======================
[16:10:58] [PASSED] 1 VF
[16:10:58] [PASSED] 2 VFs
[16:10:58] [PASSED] 3 VFs
[16:10:58] [PASSED] 4 VFs
[16:10:58] [PASSED] 5 VFs
[16:10:58] [PASSED] 6 VFs
[16:10:58] [PASSED] 7 VFs
[16:10:58] [PASSED] 8 VFs
[16:10:58] [PASSED] 9 VFs
[16:10:58] [PASSED] 10 VFs
[16:10:58] [PASSED] 11 VFs
[16:10:58] [PASSED] 12 VFs
[16:10:58] [PASSED] 13 VFs
[16:10:58] [PASSED] 14 VFs
[16:10:58] [PASSED] 15 VFs
[16:10:58] [PASSED] 16 VFs
[16:10:58] [PASSED] 17 VFs
[16:10:58] [PASSED] 18 VFs
[16:10:58] [PASSED] 19 VFs
[16:10:58] [PASSED] 20 VFs
[16:10:58] [PASSED] 21 VFs
[16:10:58] [PASSED] 22 VFs
[16:10:58] [PASSED] 23 VFs
[16:10:58] [PASSED] 24 VFs
[16:10:58] [PASSED] 25 VFs
[16:10:58] [PASSED] 26 VFs
[16:10:58] [PASSED] 27 VFs
[16:10:58] [PASSED] 28 VFs
[16:10:58] [PASSED] 29 VFs
[16:10:58] [PASSED] 30 VFs
[16:10:58] [PASSED] 31 VFs
[16:10:58] [PASSED] 32 VFs
[16:10:58] [PASSED] 33 VFs
[16:10:58] [PASSED] 34 VFs
[16:10:58] [PASSED] 35 VFs
[16:10:58] [PASSED] 36 VFs
[16:10:58] [PASSED] 37 VFs
[16:10:58] [PASSED] 38 VFs
[16:10:58] [PASSED] 39 VFs
[16:10:58] [PASSED] 40 VFs
[16:10:58] [PASSED] 41 VFs
[16:10:58] [PASSED] 42 VFs
[16:10:58] [PASSED] 43 VFs
[16:10:58] [PASSED] 44 VFs
[16:10:58] [PASSED] 45 VFs
[16:10:58] [PASSED] 46 VFs
[16:10:58] [PASSED] 47 VFs
[16:10:58] [PASSED] 48 VFs
[16:10:58] [PASSED] 49 VFs
[16:10:58] [PASSED] 50 VFs
[16:10:58] [PASSED] 51 VFs
[16:10:58] [PASSED] 52 VFs
[16:10:58] [PASSED] 53 VFs
[16:10:58] [PASSED] 54 VFs
[16:10:58] [PASSED] 55 VFs
[16:10:58] [PASSED] 56 VFs
[16:10:58] [PASSED] 57 VFs
[16:10:58] [PASSED] 58 VFs
[16:10:58] [PASSED] 59 VFs
[16:10:58] [PASSED] 60 VFs
[16:10:58] [PASSED] 61 VFs
[16:10:58] [PASSED] 62 VFs
[16:10:58] [PASSED] 63 VFs
[16:10:58] ================= [PASSED] fair_doorbells ==================
[16:10:58] ======================== fair_ggtt  ========================
[16:10:58] [PASSED] 1 VF
[16:10:58] [PASSED] 2 VFs
[16:10:58] [PASSED] 3 VFs
[16:10:58] [PASSED] 4 VFs
[16:10:58] [PASSED] 5 VFs
[16:10:58] [PASSED] 6 VFs
[16:10:58] [PASSED] 7 VFs
[16:10:58] [PASSED] 8 VFs
[16:10:58] [PASSED] 9 VFs
[16:10:58] [PASSED] 10 VFs
[16:10:58] [PASSED] 11 VFs
[16:10:58] [PASSED] 12 VFs
[16:10:58] [PASSED] 13 VFs
[16:10:58] [PASSED] 14 VFs
[16:10:58] [PASSED] 15 VFs
[16:10:58] [PASSED] 16 VFs
[16:10:58] [PASSED] 17 VFs
[16:10:58] [PASSED] 18 VFs
[16:10:58] [PASSED] 19 VFs
[16:10:58] [PASSED] 20 VFs
[16:10:58] [PASSED] 21 VFs
[16:10:58] [PASSED] 22 VFs
[16:10:58] [PASSED] 23 VFs
[16:10:58] [PASSED] 24 VFs
[16:10:58] [PASSED] 25 VFs
[16:10:58] [PASSED] 26 VFs
[16:10:58] [PASSED] 27 VFs
[16:10:58] [PASSED] 28 VFs
[16:10:58] [PASSED] 29 VFs
[16:10:58] [PASSED] 30 VFs
[16:10:58] [PASSED] 31 VFs
[16:10:58] [PASSED] 32 VFs
[16:10:58] [PASSED] 33 VFs
[16:10:58] [PASSED] 34 VFs
[16:10:58] [PASSED] 35 VFs
[16:10:58] [PASSED] 36 VFs
[16:10:58] [PASSED] 37 VFs
[16:10:58] [PASSED] 38 VFs
[16:10:58] [PASSED] 39 VFs
[16:10:58] [PASSED] 40 VFs
[16:10:58] [PASSED] 41 VFs
[16:10:58] [PASSED] 42 VFs
[16:10:58] [PASSED] 43 VFs
[16:10:58] [PASSED] 44 VFs
[16:10:58] [PASSED] 45 VFs
[16:10:58] [PASSED] 46 VFs
[16:10:58] [PASSED] 47 VFs
[16:10:58] [PASSED] 48 VFs
[16:10:58] [PASSED] 49 VFs
[16:10:58] [PASSED] 50 VFs
[16:10:58] [PASSED] 51 VFs
[16:10:58] [PASSED] 52 VFs
[16:10:58] [PASSED] 53 VFs
[16:10:58] [PASSED] 54 VFs
[16:10:58] [PASSED] 55 VFs
[16:10:58] [PASSED] 56 VFs
[16:10:58] [PASSED] 57 VFs
[16:10:58] [PASSED] 58 VFs
[16:10:58] [PASSED] 59 VFs
[16:10:58] [PASSED] 60 VFs
[16:10:58] [PASSED] 61 VFs
[16:10:58] [PASSED] 62 VFs
[16:10:58] [PASSED] 63 VFs
[16:10:58] ==================== [PASSED] fair_ggtt ====================
[16:10:58] ======================== fair_vram  ========================
[16:10:58] [PASSED] 1 VF
[16:10:58] [PASSED] 2 VFs
[16:10:58] [PASSED] 3 VFs
[16:10:58] [PASSED] 4 VFs
[16:10:58] [PASSED] 5 VFs
[16:10:58] [PASSED] 6 VFs
[16:10:58] [PASSED] 7 VFs
[16:10:58] [PASSED] 8 VFs
[16:10:58] [PASSED] 9 VFs
[16:10:58] [PASSED] 10 VFs
[16:10:58] [PASSED] 11 VFs
[16:10:58] [PASSED] 12 VFs
[16:10:58] [PASSED] 13 VFs
[16:10:58] [PASSED] 14 VFs
[16:10:58] [PASSED] 15 VFs
[16:10:58] [PASSED] 16 VFs
[16:10:58] [PASSED] 17 VFs
[16:10:58] [PASSED] 18 VFs
[16:10:58] [PASSED] 19 VFs
[16:10:58] [PASSED] 20 VFs
[16:10:58] [PASSED] 21 VFs
[16:10:58] [PASSED] 22 VFs
[16:10:58] [PASSED] 23 VFs
[16:10:58] [PASSED] 24 VFs
[16:10:58] [PASSED] 25 VFs
[16:10:58] [PASSED] 26 VFs
[16:10:58] [PASSED] 27 VFs
[16:10:58] [PASSED] 28 VFs
[16:10:58] [PASSED] 29 VFs
[16:10:58] [PASSED] 30 VFs
[16:10:58] [PASSED] 31 VFs
[16:10:58] [PASSED] 32 VFs
[16:10:58] [PASSED] 33 VFs
[16:10:58] [PASSED] 34 VFs
[16:10:58] [PASSED] 35 VFs
[16:10:58] [PASSED] 36 VFs
[16:10:58] [PASSED] 37 VFs
[16:10:58] [PASSED] 38 VFs
[16:10:58] [PASSED] 39 VFs
[16:10:58] [PASSED] 40 VFs
[16:10:58] [PASSED] 41 VFs
[16:10:58] [PASSED] 42 VFs
[16:10:58] [PASSED] 43 VFs
[16:10:58] [PASSED] 44 VFs
[16:10:58] [PASSED] 45 VFs
[16:10:58] [PASSED] 46 VFs
[16:10:58] [PASSED] 47 VFs
[16:10:58] [PASSED] 48 VFs
[16:10:58] [PASSED] 49 VFs
[16:10:58] [PASSED] 50 VFs
[16:10:58] [PASSED] 51 VFs
[16:10:58] [PASSED] 52 VFs
[16:10:58] [PASSED] 53 VFs
[16:10:58] [PASSED] 54 VFs
[16:10:58] [PASSED] 55 VFs
[16:10:58] [PASSED] 56 VFs
[16:10:58] [PASSED] 57 VFs
[16:10:58] [PASSED] 58 VFs
[16:10:58] [PASSED] 59 VFs
[16:10:58] [PASSED] 60 VFs
[16:10:58] [PASSED] 61 VFs
[16:10:58] [PASSED] 62 VFs
[16:10:58] [PASSED] 63 VFs
[16:10:58] ==================== [PASSED] fair_vram ====================
[16:10:58] ================== [PASSED] pf_gt_config ===================
[16:10:58] ===================== lmtt (1 subtest) =====================
[16:10:58] ======================== test_ops  =========================
[16:10:58] [PASSED] 2-level
[16:10:58] [PASSED] multi-level
[16:10:58] ==================== [PASSED] test_ops =====================
[16:10:58] ====================== [PASSED] lmtt =======================
[16:10:58] ================= pf_service (11 subtests) =================
[16:10:58] [PASSED] pf_negotiate_any
[16:10:58] [PASSED] pf_negotiate_base_match
[16:10:58] [PASSED] pf_negotiate_base_newer
[16:10:58] [PASSED] pf_negotiate_base_next
[16:10:58] [SKIPPED] pf_negotiate_base_older
[16:10:58] [PASSED] pf_negotiate_base_prev
[16:10:58] [PASSED] pf_negotiate_latest_match
[16:10:58] [PASSED] pf_negotiate_latest_newer
[16:10:58] [PASSED] pf_negotiate_latest_next
[16:10:58] [SKIPPED] pf_negotiate_latest_older
[16:10:58] [SKIPPED] pf_negotiate_latest_prev
[16:10:58] =================== [PASSED] pf_service ====================
[16:10:58] ================= xe_guc_g2g (2 subtests) ==================
[16:10:58] ============== xe_live_guc_g2g_kunit_default  ==============
[16:10:58] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[16:10:58] ============== xe_live_guc_g2g_kunit_allmem  ===============
[16:10:58] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[16:10:58] =================== [SKIPPED] xe_guc_g2g ===================
[16:10:58] =================== xe_mocs (2 subtests) ===================
[16:10:58] ================ xe_live_mocs_kernel_kunit  ================
[16:10:58] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[16:10:58] ================ xe_live_mocs_reset_kunit  =================
[16:10:58] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[16:10:58] ==================== [SKIPPED] xe_mocs =====================
[16:10:58] ================= xe_migrate (2 subtests) ==================
[16:10:58] ================= xe_migrate_sanity_kunit  =================
[16:10:58] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[16:10:58] ================== xe_validate_ccs_kunit  ==================
[16:10:58] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[16:10:58] =================== [SKIPPED] xe_migrate ===================
[16:10:58] ================== xe_dma_buf (1 subtest) ==================
[16:10:58] ==================== xe_dma_buf_kunit  =====================
[16:10:58] ================ [SKIPPED] xe_dma_buf_kunit ================
[16:10:58] =================== [SKIPPED] xe_dma_buf ===================
[16:10:58] ================= xe_bo_shrink (1 subtest) =================
[16:10:58] =================== xe_bo_shrink_kunit  ====================
[16:10:58] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[16:10:58] ================== [SKIPPED] xe_bo_shrink ==================
[16:10:58] ==================== xe_bo (2 subtests) ====================
[16:10:58] ================== xe_ccs_migrate_kunit  ===================
[16:10:58] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[16:10:58] ==================== xe_bo_evict_kunit  ====================
[16:10:58] =============== [SKIPPED] xe_bo_evict_kunit ================
[16:10:58] ===================== [SKIPPED] xe_bo ======================
[16:10:58] ==================== args (13 subtests) ====================
[16:10:58] [PASSED] count_args_test
[16:10:58] [PASSED] call_args_example
[16:10:58] [PASSED] call_args_test
[16:10:58] [PASSED] drop_first_arg_example
[16:10:58] [PASSED] drop_first_arg_test
[16:10:58] [PASSED] first_arg_example
[16:10:58] [PASSED] first_arg_test
[16:10:58] [PASSED] last_arg_example
[16:10:58] [PASSED] last_arg_test
[16:10:58] [PASSED] pick_arg_example
[16:10:58] [PASSED] if_args_example
[16:10:58] [PASSED] if_args_test
[16:10:58] [PASSED] sep_comma_example
[16:10:58] ====================== [PASSED] args =======================
[16:10:58] =================== xe_pci (3 subtests) ====================
[16:10:58] ==================== check_graphics_ip  ====================
[16:10:58] [PASSED] 12.00 Xe_LP
[16:10:58] [PASSED] 12.10 Xe_LP+
[16:10:58] [PASSED] 12.55 Xe_HPG
[16:10:58] [PASSED] 12.60 Xe_HPC
[16:10:58] [PASSED] 12.70 Xe_LPG
[16:10:58] [PASSED] 12.71 Xe_LPG
[16:10:58] [PASSED] 12.74 Xe_LPG+
[16:10:58] [PASSED] 20.01 Xe2_HPG
[16:10:58] [PASSED] 20.02 Xe2_HPG
[16:10:58] [PASSED] 20.04 Xe2_LPG
[16:10:58] [PASSED] 30.00 Xe3_LPG
[16:10:58] [PASSED] 30.01 Xe3_LPG
[16:10:58] [PASSED] 30.03 Xe3_LPG
[16:10:58] [PASSED] 30.04 Xe3_LPG
[16:10:58] [PASSED] 30.05 Xe3_LPG
[16:10:58] [PASSED] 35.10 Xe3p_LPG
[16:10:58] [PASSED] 35.11 Xe3p_XPC
[16:10:58] ================ [PASSED] check_graphics_ip ================
[16:10:58] ===================== check_media_ip  ======================
[16:10:58] [PASSED] 12.00 Xe_M
[16:10:58] [PASSED] 12.55 Xe_HPM
[16:10:58] [PASSED] 13.00 Xe_LPM+
[16:10:58] [PASSED] 13.01 Xe2_HPM
[16:10:58] [PASSED] 20.00 Xe2_LPM
[16:10:58] [PASSED] 30.00 Xe3_LPM
[16:10:58] [PASSED] 30.02 Xe3_LPM
[16:10:58] [PASSED] 35.00 Xe3p_LPM
[16:10:58] [PASSED] 35.03 Xe3p_HPM
[16:10:58] ================= [PASSED] check_media_ip ==================
[16:10:58] =================== check_platform_desc  ===================
[16:10:58] [PASSED] 0x9A60 (TIGERLAKE)
[16:10:58] [PASSED] 0x9A68 (TIGERLAKE)
[16:10:58] [PASSED] 0x9A70 (TIGERLAKE)
[16:10:58] [PASSED] 0x9A40 (TIGERLAKE)
[16:10:58] [PASSED] 0x9A49 (TIGERLAKE)
[16:10:58] [PASSED] 0x9A59 (TIGERLAKE)
[16:10:58] [PASSED] 0x9A78 (TIGERLAKE)
[16:10:58] [PASSED] 0x9AC0 (TIGERLAKE)
[16:10:58] [PASSED] 0x9AC9 (TIGERLAKE)
[16:10:58] [PASSED] 0x9AD9 (TIGERLAKE)
[16:10:58] [PASSED] 0x9AF8 (TIGERLAKE)
[16:10:58] [PASSED] 0x4C80 (ROCKETLAKE)
[16:10:58] [PASSED] 0x4C8A (ROCKETLAKE)
[16:10:58] [PASSED] 0x4C8B (ROCKETLAKE)
[16:10:58] [PASSED] 0x4C8C (ROCKETLAKE)
[16:10:58] [PASSED] 0x4C90 (ROCKETLAKE)
[16:10:58] [PASSED] 0x4C9A (ROCKETLAKE)
[16:10:58] [PASSED] 0x4680 (ALDERLAKE_S)
[16:10:58] [PASSED] 0x4682 (ALDERLAKE_S)
[16:10:58] [PASSED] 0x4688 (ALDERLAKE_S)
[16:10:58] [PASSED] 0x468A (ALDERLAKE_S)
[16:10:58] [PASSED] 0x468B (ALDERLAKE_S)
[16:10:58] [PASSED] 0x4690 (ALDERLAKE_S)
[16:10:58] [PASSED] 0x4692 (ALDERLAKE_S)
[16:10:58] [PASSED] 0x4693 (ALDERLAKE_S)
[16:10:58] [PASSED] 0x46A0 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46A1 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46A2 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46A3 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46A6 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46A8 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46AA (ALDERLAKE_P)
[16:10:58] [PASSED] 0x462A (ALDERLAKE_P)
[16:10:58] [PASSED] 0x4626 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x4628 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46B0 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46B1 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46B2 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46B3 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46C0 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46C1 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46C2 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46C3 (ALDERLAKE_P)
[16:10:58] [PASSED] 0x46D0 (ALDERLAKE_N)
[16:10:58] [PASSED] 0x46D1 (ALDERLAKE_N)
[16:10:58] [PASSED] 0x46D2 (ALDERLAKE_N)
[16:10:58] [PASSED] 0x46D3 (ALDERLAKE_N)
[16:10:58] [PASSED] 0x46D4 (ALDERLAKE_N)
[16:10:58] [PASSED] 0xA721 (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7A1 (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7A9 (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7AC (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7AD (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA720 (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7A0 (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7A8 (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7AA (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA7AB (ALDERLAKE_P)
[16:10:58] [PASSED] 0xA780 (ALDERLAKE_S)
[16:10:58] [PASSED] 0xA781 (ALDERLAKE_S)
[16:10:58] [PASSED] 0xA782 (ALDERLAKE_S)
[16:10:58] [PASSED] 0xA783 (ALDERLAKE_S)
[16:10:58] [PASSED] 0xA788 (ALDERLAKE_S)
[16:10:58] [PASSED] 0xA789 (ALDERLAKE_S)
[16:10:58] [PASSED] 0xA78A (ALDERLAKE_S)
[16:10:58] [PASSED] 0xA78B (ALDERLAKE_S)
[16:10:58] [PASSED] 0x4905 (DG1)
[16:10:58] [PASSED] 0x4906 (DG1)
[16:10:58] [PASSED] 0x4907 (DG1)
[16:10:58] [PASSED] 0x4908 (DG1)
[16:10:58] [PASSED] 0x4909 (DG1)
[16:10:58] [PASSED] 0x56C0 (DG2)
[16:10:58] [PASSED] 0x56C2 (DG2)
[16:10:58] [PASSED] 0x56C1 (DG2)
[16:10:58] [PASSED] 0x7D51 (METEORLAKE)
[16:10:58] [PASSED] 0x7DD1 (METEORLAKE)
[16:10:58] [PASSED] 0x7D41 (METEORLAKE)
[16:10:58] [PASSED] 0x7D67 (METEORLAKE)
[16:10:58] [PASSED] 0xB640 (METEORLAKE)
[16:10:58] [PASSED] 0x56A0 (DG2)
[16:10:58] [PASSED] 0x56A1 (DG2)
[16:10:58] [PASSED] 0x56A2 (DG2)
[16:10:58] [PASSED] 0x56BE (DG2)
[16:10:58] [PASSED] 0x56BF (DG2)
[16:10:58] [PASSED] 0x5690 (DG2)
[16:10:58] [PASSED] 0x5691 (DG2)
[16:10:58] [PASSED] 0x5692 (DG2)
[16:10:58] [PASSED] 0x56A5 (DG2)
[16:10:58] [PASSED] 0x56A6 (DG2)
[16:10:58] [PASSED] 0x56B0 (DG2)
[16:10:58] [PASSED] 0x56B1 (DG2)
[16:10:58] [PASSED] 0x56BA (DG2)
[16:10:58] [PASSED] 0x56BB (DG2)
[16:10:58] [PASSED] 0x56BC (DG2)
[16:10:58] [PASSED] 0x56BD (DG2)
[16:10:58] [PASSED] 0x5693 (DG2)
[16:10:58] [PASSED] 0x5694 (DG2)
[16:10:58] [PASSED] 0x5695 (DG2)
[16:10:58] [PASSED] 0x56A3 (DG2)
[16:10:58] [PASSED] 0x56A4 (DG2)
[16:10:58] [PASSED] 0x56B2 (DG2)
[16:10:58] [PASSED] 0x56B3 (DG2)
[16:10:58] [PASSED] 0x5696 (DG2)
[16:10:58] [PASSED] 0x5697 (DG2)
[16:10:58] [PASSED] 0xB69 (PVC)
[16:10:58] [PASSED] 0xB6E (PVC)
[16:10:58] [PASSED] 0xBD4 (PVC)
[16:10:58] [PASSED] 0xBD5 (PVC)
[16:10:58] [PASSED] 0xBD6 (PVC)
[16:10:58] [PASSED] 0xBD7 (PVC)
[16:10:58] [PASSED] 0xBD8 (PVC)
[16:10:58] [PASSED] 0xBD9 (PVC)
[16:10:58] [PASSED] 0xBDA (PVC)
[16:10:58] [PASSED] 0xBDB (PVC)
[16:10:58] [PASSED] 0xBE0 (PVC)
[16:10:58] [PASSED] 0xBE1 (PVC)
[16:10:58] [PASSED] 0xBE5 (PVC)
[16:10:58] [PASSED] 0x7D40 (METEORLAKE)
[16:10:58] [PASSED] 0x7D45 (METEORLAKE)
[16:10:58] [PASSED] 0x7D55 (METEORLAKE)
[16:10:58] [PASSED] 0x7D60 (METEORLAKE)
[16:10:58] [PASSED] 0x7DD5 (METEORLAKE)
[16:10:58] [PASSED] 0x6420 (LUNARLAKE)
[16:10:58] [PASSED] 0x64A0 (LUNARLAKE)
[16:10:58] [PASSED] 0x64B0 (LUNARLAKE)
[16:10:58] [PASSED] 0xE202 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE209 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE20B (BATTLEMAGE)
[16:10:58] [PASSED] 0xE20C (BATTLEMAGE)
[16:10:58] [PASSED] 0xE20D (BATTLEMAGE)
[16:10:58] [PASSED] 0xE210 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE211 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE212 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE216 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE220 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE221 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE222 (BATTLEMAGE)
[16:10:58] [PASSED] 0xE223 (BATTLEMAGE)
[16:10:58] [PASSED] 0xB080 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB081 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB082 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB083 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB084 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB085 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB086 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB087 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB08F (PANTHERLAKE)
[16:10:58] [PASSED] 0xB090 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB0A0 (PANTHERLAKE)
[16:10:58] [PASSED] 0xB0B0 (PANTHERLAKE)
[16:10:58] [PASSED] 0xFD80 (PANTHERLAKE)
[16:10:58] [PASSED] 0xFD81 (PANTHERLAKE)
[16:10:58] [PASSED] 0xD740 (NOVALAKE_S)
[16:10:58] [PASSED] 0xD741 (NOVALAKE_S)
[16:10:58] [PASSED] 0xD742 (NOVALAKE_S)
[16:10:58] [PASSED] 0xD743 (NOVALAKE_S)
[16:10:58] [PASSED] 0xD744 (NOVALAKE_S)
[16:10:58] [PASSED] 0xD745 (NOVALAKE_S)
[16:10:58] [PASSED] 0x674C (CRESCENTISLAND)
[16:10:58] [PASSED] 0xD750 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD751 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD752 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD753 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD754 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD755 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD756 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD757 (NOVALAKE_P)
[16:10:58] [PASSED] 0xD75F (NOVALAKE_P)
[16:10:58] =============== [PASSED] check_platform_desc ===============
[16:10:58] ===================== [PASSED] xe_pci ======================
[16:10:58] =================== xe_rtp (2 subtests) ====================
[16:10:58] =============== xe_rtp_process_to_sr_tests  ================
[16:10:58] [PASSED] coalesce-same-reg
[16:10:58] [PASSED] no-match-no-add
[16:10:58] [PASSED] match-or
[16:10:58] [PASSED] match-or-xfail
[16:10:58] [PASSED] no-match-no-add-multiple-rules
[16:10:58] [PASSED] two-regs-two-entries
[16:10:58] [PASSED] clr-one-set-other
[16:10:58] [PASSED] set-field
[16:10:58] [PASSED] conflict-duplicate
[16:10:58] [PASSED] conflict-not-disjoint
[16:10:58] [PASSED] conflict-reg-type
[16:10:58] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[16:10:58] ================== xe_rtp_process_tests  ===================
[16:10:58] [PASSED] active1
[16:10:58] [PASSED] active2
[16:10:58] [PASSED] active-inactive
[16:10:58] [PASSED] inactive-active
[16:10:58] [PASSED] inactive-1st_or_active-inactive
[16:10:58] [PASSED] inactive-2nd_or_active-inactive
[16:10:58] [PASSED] inactive-last_or_active-inactive
[16:10:58] [PASSED] inactive-no_or_active-inactive
[16:10:58] ============== [PASSED] xe_rtp_process_tests ===============
[16:10:58] ===================== [PASSED] xe_rtp ======================
[16:10:58] ==================== xe_wa (1 subtest) =====================
[16:10:58] ======================== xe_wa_gt  =========================
[16:10:58] [PASSED] TIGERLAKE B0
[16:10:58] [PASSED] DG1 A0
[16:10:58] [PASSED] DG1 B0
[16:10:58] [PASSED] ALDERLAKE_S A0
[16:10:58] [PASSED] ALDERLAKE_S B0
[16:10:58] [PASSED] ALDERLAKE_S C0
[16:10:58] [PASSED] ALDERLAKE_S D0
[16:10:58] [PASSED] ALDERLAKE_P A0
[16:10:58] [PASSED] ALDERLAKE_P B0
[16:10:58] [PASSED] ALDERLAKE_P C0
[16:10:58] [PASSED] ALDERLAKE_S RPLS D0
[16:10:58] [PASSED] ALDERLAKE_P RPLU E0
[16:10:58] [PASSED] DG2 G10 C0
[16:10:58] [PASSED] DG2 G11 B1
[16:10:58] [PASSED] DG2 G12 A1
[16:10:58] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[16:10:58] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[16:10:58] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[16:10:58] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[16:10:58] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[16:10:58] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[16:10:58] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[16:10:58] ==================== [PASSED] xe_wa_gt =====================
[16:10:58] ====================== [PASSED] xe_wa ======================
[16:10:58] ============================================================
[16:10:58] Testing complete. Ran 597 tests: passed: 579, skipped: 18
[16:10:58] Elapsed time: 35.306s total, 4.288s configuring, 30.401s building, 0.610s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[16:10:58] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[16:11:00] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[16:11:24] Starting KUnit Kernel (1/1)...
[16:11:24] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[16:11:24] ============ drm_test_pick_cmdline (2 subtests) ============
[16:11:24] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[16:11:24] =============== drm_test_pick_cmdline_named  ===============
[16:11:24] [PASSED] NTSC
[16:11:24] [PASSED] NTSC-J
[16:11:24] [PASSED] PAL
[16:11:24] [PASSED] PAL-M
[16:11:24] =========== [PASSED] drm_test_pick_cmdline_named ===========
[16:11:24] ============== [PASSED] drm_test_pick_cmdline ==============
[16:11:24] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[16:11:24] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[16:11:24] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[16:11:24] =========== drm_validate_clone_mode (2 subtests) ===========
[16:11:24] ============== drm_test_check_in_clone_mode  ===============
[16:11:24] [PASSED] in_clone_mode
[16:11:24] [PASSED] not_in_clone_mode
[16:11:24] ========== [PASSED] drm_test_check_in_clone_mode ===========
[16:11:24] =============== drm_test_check_valid_clones  ===============
[16:11:24] [PASSED] not_in_clone_mode
[16:11:24] [PASSED] valid_clone
[16:11:24] [PASSED] invalid_clone
[16:11:24] =========== [PASSED] drm_test_check_valid_clones ===========
[16:11:24] ============= [PASSED] drm_validate_clone_mode =============
[16:11:24] ============= drm_validate_modeset (1 subtest) =============
[16:11:24] [PASSED] drm_test_check_connector_changed_modeset
[16:11:24] ============== [PASSED] drm_validate_modeset ===============
[16:11:24] ====== drm_test_bridge_get_current_state (2 subtests) ======
[16:11:24] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[16:11:24] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[16:11:24] ======== [PASSED] drm_test_bridge_get_current_state ========
[16:11:24] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[16:11:24] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[16:11:24] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[16:11:24] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[16:11:24] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[16:11:24] ============== drm_bridge_alloc (2 subtests) ===============
[16:11:24] [PASSED] drm_test_drm_bridge_alloc_basic
[16:11:24] [PASSED] drm_test_drm_bridge_alloc_get_put
[16:11:24] ================ [PASSED] drm_bridge_alloc =================
[16:11:24] ============= drm_cmdline_parser (40 subtests) =============
[16:11:24] [PASSED] drm_test_cmdline_force_d_only
[16:11:24] [PASSED] drm_test_cmdline_force_D_only_dvi
[16:11:24] [PASSED] drm_test_cmdline_force_D_only_hdmi
[16:11:24] [PASSED] drm_test_cmdline_force_D_only_not_digital
[16:11:24] [PASSED] drm_test_cmdline_force_e_only
[16:11:24] [PASSED] drm_test_cmdline_res
[16:11:24] [PASSED] drm_test_cmdline_res_vesa
[16:11:24] [PASSED] drm_test_cmdline_res_vesa_rblank
[16:11:24] [PASSED] drm_test_cmdline_res_rblank
[16:11:24] [PASSED] drm_test_cmdline_res_bpp
[16:11:24] [PASSED] drm_test_cmdline_res_refresh
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[16:11:24] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[16:11:24] [PASSED] drm_test_cmdline_res_margins_force_on
[16:11:24] [PASSED] drm_test_cmdline_res_vesa_margins
[16:11:24] [PASSED] drm_test_cmdline_name
[16:11:24] [PASSED] drm_test_cmdline_name_bpp
[16:11:24] [PASSED] drm_test_cmdline_name_option
[16:11:24] [PASSED] drm_test_cmdline_name_bpp_option
[16:11:24] [PASSED] drm_test_cmdline_rotate_0
[16:11:24] [PASSED] drm_test_cmdline_rotate_90
[16:11:24] [PASSED] drm_test_cmdline_rotate_180
[16:11:24] [PASSED] drm_test_cmdline_rotate_270
[16:11:24] [PASSED] drm_test_cmdline_hmirror
[16:11:24] [PASSED] drm_test_cmdline_vmirror
[16:11:24] [PASSED] drm_test_cmdline_margin_options
[16:11:24] [PASSED] drm_test_cmdline_multiple_options
[16:11:24] [PASSED] drm_test_cmdline_bpp_extra_and_option
[16:11:24] [PASSED] drm_test_cmdline_extra_and_option
[16:11:24] [PASSED] drm_test_cmdline_freestanding_options
[16:11:24] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[16:11:24] [PASSED] drm_test_cmdline_panel_orientation
[16:11:24] ================ drm_test_cmdline_invalid  =================
[16:11:24] [PASSED] margin_only
[16:11:24] [PASSED] interlace_only
[16:11:24] [PASSED] res_missing_x
[16:11:24] [PASSED] res_missing_y
[16:11:24] [PASSED] res_bad_y
[16:11:24] [PASSED] res_missing_y_bpp
[16:11:24] [PASSED] res_bad_bpp
[16:11:24] [PASSED] res_bad_refresh
[16:11:24] [PASSED] res_bpp_refresh_force_on_off
[16:11:24] [PASSED] res_invalid_mode
[16:11:24] [PASSED] res_bpp_wrong_place_mode
[16:11:24] [PASSED] name_bpp_refresh
[16:11:24] [PASSED] name_refresh
[16:11:24] [PASSED] name_refresh_wrong_mode
[16:11:24] [PASSED] name_refresh_invalid_mode
[16:11:24] [PASSED] rotate_multiple
[16:11:24] [PASSED] rotate_invalid_val
[16:11:24] [PASSED] rotate_truncated
[16:11:24] [PASSED] invalid_option
[16:11:24] [PASSED] invalid_tv_option
[16:11:24] [PASSED] truncated_tv_option
[16:11:24] ============ [PASSED] drm_test_cmdline_invalid =============
[16:11:24] =============== drm_test_cmdline_tv_options  ===============
[16:11:24] [PASSED] NTSC
[16:11:24] [PASSED] NTSC_443
[16:11:24] [PASSED] NTSC_J
[16:11:24] [PASSED] PAL
[16:11:24] [PASSED] PAL_M
[16:11:24] [PASSED] PAL_N
[16:11:24] [PASSED] SECAM
[16:11:24] [PASSED] MONO_525
[16:11:24] [PASSED] MONO_625
[16:11:24] =========== [PASSED] drm_test_cmdline_tv_options ===========
[16:11:24] =============== [PASSED] drm_cmdline_parser ================
[16:11:24] ========== drmm_connector_hdmi_init (20 subtests) ==========
[16:11:24] [PASSED] drm_test_connector_hdmi_init_valid
[16:11:24] [PASSED] drm_test_connector_hdmi_init_bpc_8
[16:11:24] [PASSED] drm_test_connector_hdmi_init_bpc_10
[16:11:24] [PASSED] drm_test_connector_hdmi_init_bpc_12
[16:11:24] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[16:11:24] [PASSED] drm_test_connector_hdmi_init_bpc_null
[16:11:24] [PASSED] drm_test_connector_hdmi_init_formats_empty
[16:11:24] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[16:11:24] === drm_test_connector_hdmi_init_formats_yuv420_allowed  ===
[16:11:24] [PASSED] supported_formats=0x9 yuv420_allowed=1
[16:11:24] [PASSED] supported_formats=0x9 yuv420_allowed=0
[16:11:24] [PASSED] supported_formats=0x3 yuv420_allowed=1
[16:11:24] [PASSED] supported_formats=0x3 yuv420_allowed=0
[16:11:24] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[16:11:24] [PASSED] drm_test_connector_hdmi_init_null_ddc
[16:11:24] [PASSED] drm_test_connector_hdmi_init_null_product
[16:11:24] [PASSED] drm_test_connector_hdmi_init_null_vendor
[16:11:24] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[16:11:24] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[16:11:24] [PASSED] drm_test_connector_hdmi_init_product_valid
[16:11:24] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[16:11:24] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[16:11:24] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[16:11:24] ========= drm_test_connector_hdmi_init_type_valid  =========
[16:11:24] [PASSED] HDMI-A
[16:11:24] [PASSED] HDMI-B
[16:11:24] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[16:11:24] ======== drm_test_connector_hdmi_init_type_invalid  ========
[16:11:24] [PASSED] Unknown
[16:11:24] [PASSED] VGA
[16:11:24] [PASSED] DVI-I
[16:11:24] [PASSED] DVI-D
[16:11:24] [PASSED] DVI-A
[16:11:24] [PASSED] Composite
[16:11:24] [PASSED] SVIDEO
[16:11:24] [PASSED] LVDS
[16:11:24] [PASSED] Component
[16:11:24] [PASSED] DIN
[16:11:24] [PASSED] DP
[16:11:24] [PASSED] TV
[16:11:24] [PASSED] eDP
[16:11:24] [PASSED] Virtual
[16:11:24] [PASSED] DSI
[16:11:24] [PASSED] DPI
[16:11:24] [PASSED] Writeback
[16:11:24] [PASSED] SPI
[16:11:24] [PASSED] USB
[16:11:24] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[16:11:24] ============ [PASSED] drmm_connector_hdmi_init =============
[16:11:24] ============= drmm_connector_init (3 subtests) =============
[16:11:24] [PASSED] drm_test_drmm_connector_init
[16:11:24] [PASSED] drm_test_drmm_connector_init_null_ddc
[16:11:24] ========= drm_test_drmm_connector_init_type_valid  =========
[16:11:24] [PASSED] Unknown
[16:11:24] [PASSED] VGA
[16:11:24] [PASSED] DVI-I
[16:11:24] [PASSED] DVI-D
[16:11:24] [PASSED] DVI-A
[16:11:24] [PASSED] Composite
[16:11:24] [PASSED] SVIDEO
[16:11:24] [PASSED] LVDS
[16:11:24] [PASSED] Component
[16:11:24] [PASSED] DIN
[16:11:24] [PASSED] DP
[16:11:24] [PASSED] HDMI-A
[16:11:24] [PASSED] HDMI-B
[16:11:24] [PASSED] TV
[16:11:24] [PASSED] eDP
[16:11:24] [PASSED] Virtual
[16:11:24] [PASSED] DSI
[16:11:24] [PASSED] DPI
[16:11:24] [PASSED] Writeback
[16:11:24] [PASSED] SPI
[16:11:24] [PASSED] USB
[16:11:24] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[16:11:24] =============== [PASSED] drmm_connector_init ===============
[16:11:24] ========= drm_connector_dynamic_init (6 subtests) ==========
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_init
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_init_properties
[16:11:24] ===== drm_test_drm_connector_dynamic_init_type_valid  ======
[16:11:24] [PASSED] Unknown
[16:11:24] [PASSED] VGA
[16:11:24] [PASSED] DVI-I
[16:11:24] [PASSED] DVI-D
[16:11:24] [PASSED] DVI-A
[16:11:24] [PASSED] Composite
[16:11:24] [PASSED] SVIDEO
[16:11:24] [PASSED] LVDS
[16:11:24] [PASSED] Component
[16:11:24] [PASSED] DIN
[16:11:24] [PASSED] DP
[16:11:24] [PASSED] HDMI-A
[16:11:24] [PASSED] HDMI-B
[16:11:24] [PASSED] TV
[16:11:24] [PASSED] eDP
[16:11:24] [PASSED] Virtual
[16:11:24] [PASSED] DSI
[16:11:24] [PASSED] DPI
[16:11:24] [PASSED] Writeback
[16:11:24] [PASSED] SPI
[16:11:24] [PASSED] USB
[16:11:24] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[16:11:24] ======== drm_test_drm_connector_dynamic_init_name  =========
[16:11:24] [PASSED] Unknown
[16:11:24] [PASSED] VGA
[16:11:24] [PASSED] DVI-I
[16:11:24] [PASSED] DVI-D
[16:11:24] [PASSED] DVI-A
[16:11:24] [PASSED] Composite
[16:11:24] [PASSED] SVIDEO
[16:11:24] [PASSED] LVDS
[16:11:24] [PASSED] Component
[16:11:24] [PASSED] DIN
[16:11:24] [PASSED] DP
[16:11:24] [PASSED] HDMI-A
[16:11:24] [PASSED] HDMI-B
[16:11:24] [PASSED] TV
[16:11:24] [PASSED] eDP
[16:11:24] [PASSED] Virtual
[16:11:24] [PASSED] DSI
[16:11:24] [PASSED] DPI
[16:11:24] [PASSED] Writeback
[16:11:24] [PASSED] SPI
[16:11:24] [PASSED] USB
[16:11:24] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[16:11:24] =========== [PASSED] drm_connector_dynamic_init ============
[16:11:24] ==== drm_connector_dynamic_register_early (4 subtests) =====
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[16:11:24] ====== [PASSED] drm_connector_dynamic_register_early =======
[16:11:24] ======= drm_connector_dynamic_register (7 subtests) ========
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[16:11:24] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[16:11:24] ========= [PASSED] drm_connector_dynamic_register ==========
[16:11:24] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[16:11:24] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[16:11:24] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[16:11:24] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[16:11:24] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[16:11:24] ========== drm_test_get_tv_mode_from_name_valid  ===========
[16:11:24] [PASSED] NTSC
[16:11:24] [PASSED] NTSC-443
[16:11:24] [PASSED] NTSC-J
[16:11:24] [PASSED] PAL
[16:11:24] [PASSED] PAL-M
[16:11:24] [PASSED] PAL-N
[16:11:24] [PASSED] SECAM
[16:11:24] [PASSED] Mono
[16:11:24] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[16:11:24] [PASSED] drm_test_get_tv_mode_from_name_truncated
[16:11:24] ============ [PASSED] drm_get_tv_mode_from_name ============
[16:11:24] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[16:11:24] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[16:11:24] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[16:11:24] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[16:11:24] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[16:11:24] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[16:11:24] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[16:11:24] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[16:11:24] [PASSED] VIC 96
[16:11:24] [PASSED] VIC 97
[16:11:24] [PASSED] VIC 101
[16:11:24] [PASSED] VIC 102
[16:11:24] [PASSED] VIC 106
[16:11:24] [PASSED] VIC 107
[16:11:24] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[16:11:24] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[16:11:24] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[16:11:24] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[16:11:24] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[16:11:24] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[16:11:24] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[16:11:24] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[16:11:24] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[16:11:24] [PASSED] Automatic
[16:11:24] [PASSED] Full
[16:11:24] [PASSED] Limited 16:235
[16:11:24] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[16:11:24] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[16:11:24] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[16:11:24] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[16:11:24] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[16:11:24] [PASSED] RGB
[16:11:24] [PASSED] YUV 4:2:0
[16:11:24] [PASSED] YUV 4:2:2
[16:11:24] [PASSED] YUV 4:4:4
[16:11:24] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[16:11:24] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[16:11:24] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[16:11:24] ============= drm_damage_helper (21 subtests) ==============
[16:11:24] [PASSED] drm_test_damage_iter_no_damage
[16:11:24] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[16:11:24] [PASSED] drm_test_damage_iter_no_damage_src_moved
[16:11:24] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[16:11:24] [PASSED] drm_test_damage_iter_no_damage_not_visible
[16:11:24] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[16:11:24] [PASSED] drm_test_damage_iter_no_damage_no_fb
[16:11:24] [PASSED] drm_test_damage_iter_simple_damage
[16:11:24] [PASSED] drm_test_damage_iter_single_damage
[16:11:24] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[16:11:24] [PASSED] drm_test_damage_iter_single_damage_outside_src
[16:11:24] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[16:11:24] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[16:11:24] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[16:11:24] [PASSED] drm_test_damage_iter_single_damage_src_moved
[16:11:24] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[16:11:24] [PASSED] drm_test_damage_iter_damage
[16:11:24] [PASSED] drm_test_damage_iter_damage_one_intersect
[16:11:24] [PASSED] drm_test_damage_iter_damage_one_outside
[16:11:24] [PASSED] drm_test_damage_iter_damage_src_moved
[16:11:24] [PASSED] drm_test_damage_iter_damage_not_visible
[16:11:24] ================ [PASSED] drm_damage_helper ================
[16:11:24] ============== drm_dp_mst_helper (3 subtests) ==============
[16:11:24] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[16:11:24] [PASSED] Clock 154000 BPP 30 DSC disabled
[16:11:24] [PASSED] Clock 234000 BPP 30 DSC disabled
[16:11:24] [PASSED] Clock 297000 BPP 24 DSC disabled
[16:11:24] [PASSED] Clock 332880 BPP 24 DSC enabled
[16:11:24] [PASSED] Clock 324540 BPP 24 DSC enabled
[16:11:24] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[16:11:24] ============== drm_test_dp_mst_calc_pbn_div  ===============
[16:11:24] [PASSED] Link rate 2000000 lane count 4
[16:11:24] [PASSED] Link rate 2000000 lane count 2
[16:11:24] [PASSED] Link rate 2000000 lane count 1
[16:11:24] [PASSED] Link rate 1350000 lane count 4
[16:11:24] [PASSED] Link rate 1350000 lane count 2
[16:11:24] [PASSED] Link rate 1350000 lane count 1
[16:11:24] [PASSED] Link rate 1000000 lane count 4
[16:11:24] [PASSED] Link rate 1000000 lane count 2
[16:11:24] [PASSED] Link rate 1000000 lane count 1
[16:11:24] [PASSED] Link rate 810000 lane count 4
[16:11:24] [PASSED] Link rate 810000 lane count 2
[16:11:24] [PASSED] Link rate 810000 lane count 1
[16:11:24] [PASSED] Link rate 540000 lane count 4
[16:11:24] [PASSED] Link rate 540000 lane count 2
[16:11:24] [PASSED] Link rate 540000 lane count 1
[16:11:24] [PASSED] Link rate 270000 lane count 4
[16:11:24] [PASSED] Link rate 270000 lane count 2
[16:11:24] [PASSED] Link rate 270000 lane count 1
[16:11:24] [PASSED] Link rate 162000 lane count 4
[16:11:24] [PASSED] Link rate 162000 lane count 2
[16:11:24] [PASSED] Link rate 162000 lane count 1
[16:11:24] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[16:11:24] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[16:11:24] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[16:11:24] [PASSED] DP_POWER_UP_PHY with port number
[16:11:24] [PASSED] DP_POWER_DOWN_PHY with port number
[16:11:24] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[16:11:24] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[16:11:24] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[16:11:24] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[16:11:24] [PASSED] DP_QUERY_PAYLOAD with port number
[16:11:24] [PASSED] DP_QUERY_PAYLOAD with VCPI
[16:11:24] [PASSED] DP_REMOTE_DPCD_READ with port number
[16:11:24] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[16:11:24] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[16:11:24] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[16:11:24] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[16:11:24] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[16:11:24] [PASSED] DP_REMOTE_I2C_READ with port number
[16:11:24] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[16:11:24] [PASSED] DP_REMOTE_I2C_READ with transactions array
[16:11:24] [PASSED] DP_REMOTE_I2C_WRITE with port number
[16:11:24] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[16:11:24] [PASSED] DP_REMOTE_I2C_WRITE with data array
[16:11:24] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[16:11:24] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[16:11:24] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[16:11:24] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[16:11:24] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[16:11:24] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[16:11:24] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[16:11:24] ================ [PASSED] drm_dp_mst_helper ================
[16:11:24] ================== drm_exec (7 subtests) ===================
[16:11:24] [PASSED] sanitycheck
[16:11:24] [PASSED] test_lock
[16:11:24] [PASSED] test_lock_unlock
[16:11:24] [PASSED] test_duplicates
[16:11:24] [PASSED] test_prepare
[16:11:24] [PASSED] test_prepare_array
[16:11:24] [PASSED] test_multiple_loops
[16:11:24] ==================== [PASSED] drm_exec =====================
[16:11:24] =========== drm_format_helper_test (17 subtests) ===========
[16:11:24] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[16:11:24] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[16:11:24] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[16:11:24] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[16:11:24] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[16:11:24] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[16:11:24] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[16:11:24] ============= drm_test_fb_xrgb8888_to_bgr888  ==============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:24] [PASSED] well_known_colors
[16:11:24] [PASSED] destination_pitch
[16:11:24] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[16:11:24] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[16:11:24] [PASSED] single_pixel_source_buffer
[16:11:24] [PASSED] single_pixel_clip_rectangle
[16:11:25] [PASSED] well_known_colors
[16:11:25] [PASSED] destination_pitch
[16:11:25] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[16:11:25] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[16:11:25] [PASSED] single_pixel_source_buffer
[16:11:25] [PASSED] single_pixel_clip_rectangle
[16:11:25] [PASSED] well_known_colors
[16:11:25] [PASSED] destination_pitch
[16:11:25] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[16:11:25] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[16:11:25] [PASSED] single_pixel_source_buffer
[16:11:25] [PASSED] single_pixel_clip_rectangle
[16:11:25] [PASSED] well_known_colors
[16:11:25] [PASSED] destination_pitch
[16:11:25] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[16:11:25] ============== drm_test_fb_xrgb8888_to_mono  ===============
[16:11:25] [PASSED] single_pixel_source_buffer
[16:11:25] [PASSED] single_pixel_clip_rectangle
[16:11:25] [PASSED] well_known_colors
[16:11:25] [PASSED] destination_pitch
[16:11:25] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[16:11:25] ==================== drm_test_fb_swab  =====================
[16:11:25] [PASSED] single_pixel_source_buffer
[16:11:25] [PASSED] single_pixel_clip_rectangle
[16:11:25] [PASSED] well_known_colors
[16:11:25] [PASSED] destination_pitch
[16:11:25] ================ [PASSED] drm_test_fb_swab =================
[16:11:25] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[16:11:25] [PASSED] single_pixel_source_buffer
[16:11:25] [PASSED] single_pixel_clip_rectangle
[16:11:25] [PASSED] well_known_colors
[16:11:25] [PASSED] destination_pitch
[16:11:25] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[16:11:25] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[16:11:25] [PASSED] single_pixel_source_buffer
[16:11:25] [PASSED] single_pixel_clip_rectangle
[16:11:25] [PASSED] well_known_colors
[16:11:25] [PASSED] destination_pitch
[16:11:25] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[16:11:25] ================= drm_test_fb_clip_offset  =================
[16:11:25] [PASSED] pass through
[16:11:25] [PASSED] horizontal offset
[16:11:25] [PASSED] vertical offset
[16:11:25] [PASSED] horizontal and vertical offset
[16:11:25] [PASSED] horizontal offset (custom pitch)
[16:11:25] [PASSED] vertical offset (custom pitch)
[16:11:25] [PASSED] horizontal and vertical offset (custom pitch)
[16:11:25] ============= [PASSED] drm_test_fb_clip_offset =============
[16:11:25] =================== drm_test_fb_memcpy  ====================
[16:11:25] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[16:11:25] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[16:11:25] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[16:11:25] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[16:11:25] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[16:11:25] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[16:11:25] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[16:11:25] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[16:11:25] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[16:11:25] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[16:11:25] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[16:11:25] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[16:11:25] =============== [PASSED] drm_test_fb_memcpy ================
[16:11:25] ============= [PASSED] drm_format_helper_test ==============
[16:11:25] ================= drm_format (18 subtests) =================
[16:11:25] [PASSED] drm_test_format_block_width_invalid
[16:11:25] [PASSED] drm_test_format_block_width_one_plane
[16:11:25] [PASSED] drm_test_format_block_width_two_plane
[16:11:25] [PASSED] drm_test_format_block_width_three_plane
[16:11:25] [PASSED] drm_test_format_block_width_tiled
[16:11:25] [PASSED] drm_test_format_block_height_invalid
[16:11:25] [PASSED] drm_test_format_block_height_one_plane
[16:11:25] [PASSED] drm_test_format_block_height_two_plane
[16:11:25] [PASSED] drm_test_format_block_height_three_plane
[16:11:25] [PASSED] drm_test_format_block_height_tiled
[16:11:25] [PASSED] drm_test_format_min_pitch_invalid
[16:11:25] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[16:11:25] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[16:11:25] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[16:11:25] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[16:11:25] [PASSED] drm_test_format_min_pitch_two_plane
[16:11:25] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[16:11:25] [PASSED] drm_test_format_min_pitch_tiled
[16:11:25] =================== [PASSED] drm_format ====================
[16:11:25] ============== drm_framebuffer (10 subtests) ===============
[16:11:25] ========== drm_test_framebuffer_check_src_coords  ==========
[16:11:25] [PASSED] Success: source fits into fb
[16:11:25] [PASSED] Fail: overflowing fb with x-axis coordinate
[16:11:25] [PASSED] Fail: overflowing fb with y-axis coordinate
[16:11:25] [PASSED] Fail: overflowing fb with source width
[16:11:25] [PASSED] Fail: overflowing fb with source height
[16:11:25] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[16:11:25] [PASSED] drm_test_framebuffer_cleanup
[16:11:25] =============== drm_test_framebuffer_create  ===============
[16:11:25] [PASSED] ABGR8888 normal sizes
[16:11:25] [PASSED] ABGR8888 max sizes
[16:11:25] [PASSED] ABGR8888 pitch greater than min required
[16:11:25] [PASSED] ABGR8888 pitch less than min required
[16:11:25] [PASSED] ABGR8888 Invalid width
[16:11:25] [PASSED] ABGR8888 Invalid buffer handle
[16:11:25] [PASSED] No pixel format
[16:11:25] [PASSED] ABGR8888 Width 0
[16:11:25] [PASSED] ABGR8888 Height 0
[16:11:25] [PASSED] ABGR8888 Out of bound height * pitch combination
[16:11:25] [PASSED] ABGR8888 Large buffer offset
[16:11:25] [PASSED] ABGR8888 Buffer offset for inexistent plane
[16:11:25] [PASSED] ABGR8888 Invalid flag
[16:11:25] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[16:11:25] [PASSED] ABGR8888 Valid buffer modifier
[16:11:25] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[16:11:25] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[16:11:25] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[16:11:25] [PASSED] NV12 Normal sizes
[16:11:25] [PASSED] NV12 Max sizes
[16:11:25] [PASSED] NV12 Invalid pitch
[16:11:25] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[16:11:25] [PASSED] NV12 different  modifier per-plane
[16:11:25] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[16:11:25] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[16:11:25] [PASSED] NV12 Modifier for inexistent plane
[16:11:25] [PASSED] NV12 Handle for inexistent plane
[16:11:25] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[16:11:25] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[16:11:25] [PASSED] YVU420 Normal sizes
[16:11:25] [PASSED] YVU420 Max sizes
[16:11:25] [PASSED] YVU420 Invalid pitch
[16:11:25] [PASSED] YVU420 Different pitches
[16:11:25] [PASSED] YVU420 Different buffer offsets/pitches
[16:11:25] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[16:11:25] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[16:11:25] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[16:11:25] [PASSED] YVU420 Valid modifier
[16:11:25] [PASSED] YVU420 Different modifiers per plane
[16:11:25] [PASSED] YVU420 Modifier for inexistent plane
[16:11:25] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[16:11:25] [PASSED] X0L2 Normal sizes
[16:11:25] [PASSED] X0L2 Max sizes
[16:11:25] [PASSED] X0L2 Invalid pitch
[16:11:25] [PASSED] X0L2 Pitch greater than minimum required
[16:11:25] [PASSED] X0L2 Handle for inexistent plane
[16:11:25] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[16:11:25] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[16:11:25] [PASSED] X0L2 Valid modifier
[16:11:25] [PASSED] X0L2 Modifier for inexistent plane
[16:11:25] =========== [PASSED] drm_test_framebuffer_create ===========
[16:11:25] [PASSED] drm_test_framebuffer_free
[16:11:25] [PASSED] drm_test_framebuffer_init
[16:11:25] [PASSED] drm_test_framebuffer_init_bad_format
[16:11:25] [PASSED] drm_test_framebuffer_init_dev_mismatch
[16:11:25] [PASSED] drm_test_framebuffer_lookup
[16:11:25] [PASSED] drm_test_framebuffer_lookup_inexistent
[16:11:25] [PASSED] drm_test_framebuffer_modifiers_not_supported
[16:11:25] ================= [PASSED] drm_framebuffer =================
[16:11:25] ================ drm_gem_shmem (8 subtests) ================
[16:11:25] [PASSED] drm_gem_shmem_test_obj_create
[16:11:25] [PASSED] drm_gem_shmem_test_obj_create_private
[16:11:25] [PASSED] drm_gem_shmem_test_pin_pages
[16:11:25] [PASSED] drm_gem_shmem_test_vmap
[16:11:25] [PASSED] drm_gem_shmem_test_get_sg_table
[16:11:25] [PASSED] drm_gem_shmem_test_get_pages_sgt
[16:11:25] [PASSED] drm_gem_shmem_test_madvise
[16:11:25] [PASSED] drm_gem_shmem_test_purge
[16:11:25] ================== [PASSED] drm_gem_shmem ==================
[16:11:25] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[16:11:25] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420  =======
[16:11:25] [PASSED] Automatic
[16:11:25] [PASSED] Full
[16:11:25] [PASSED] Limited 16:235
[16:11:25] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[16:11:25] [PASSED] drm_test_check_disable_connector
[16:11:25] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[16:11:25] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[16:11:25] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[16:11:25] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[16:11:25] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[16:11:25] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[16:11:25] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[16:11:25] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[16:11:25] [PASSED] drm_test_check_output_bpc_dvi
[16:11:25] [PASSED] drm_test_check_output_bpc_format_vic_1
[16:11:25] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[16:11:25] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[16:11:25] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[16:11:25] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[16:11:25] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[16:11:25] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[16:11:25] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[16:11:25] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[16:11:25] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[16:11:25] [PASSED] drm_test_check_broadcast_rgb_value
[16:11:25] [PASSED] drm_test_check_bpc_8_value
[16:11:25] [PASSED] drm_test_check_bpc_10_value
[16:11:25] [PASSED] drm_test_check_bpc_12_value
[16:11:25] [PASSED] drm_test_check_format_value
[16:11:25] [PASSED] drm_test_check_tmds_char_value
[16:11:25] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[16:11:25] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[16:11:25] [PASSED] drm_test_check_mode_valid
[16:11:25] [PASSED] drm_test_check_mode_valid_reject
[16:11:25] [PASSED] drm_test_check_mode_valid_reject_rate
[16:11:25] [PASSED] drm_test_check_mode_valid_reject_max_clock
[16:11:25] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[16:11:25] = drm_atomic_helper_connector_hdmi_infoframes (5 subtests) =
[16:11:25] [PASSED] drm_test_check_infoframes
[16:11:25] [PASSED] drm_test_check_reject_avi_infoframe
[16:11:25] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_8
[16:11:25] [PASSED] drm_test_check_reject_hdr_infoframe_bpc_10
[16:11:25] [PASSED] drm_test_check_reject_audio_infoframe
[16:11:25] === [PASSED] drm_atomic_helper_connector_hdmi_infoframes ===
[16:11:25] ================= drm_managed (2 subtests) =================
[16:11:25] [PASSED] drm_test_managed_release_action
[16:11:25] [PASSED] drm_test_managed_run_action
[16:11:25] =================== [PASSED] drm_managed ===================
[16:11:25] =================== drm_mm (6 subtests) ====================
[16:11:25] [PASSED] drm_test_mm_init
[16:11:25] [PASSED] drm_test_mm_debug
[16:11:25] [PASSED] drm_test_mm_align32
[16:11:25] [PASSED] drm_test_mm_align64
[16:11:25] [PASSED] drm_test_mm_lowest
[16:11:25] [PASSED] drm_test_mm_highest
[16:11:25] ===================== [PASSED] drm_mm ======================
[16:11:25] ============= drm_modes_analog_tv (5 subtests) =============
[16:11:25] [PASSED] drm_test_modes_analog_tv_mono_576i
[16:11:25] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[16:11:25] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[16:11:25] [PASSED] drm_test_modes_analog_tv_pal_576i
[16:11:25] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[16:11:25] =============== [PASSED] drm_modes_analog_tv ===============
[16:11:25] ============== drm_plane_helper (2 subtests) ===============
[16:11:25] =============== drm_test_check_plane_state  ================
[16:11:25] [PASSED] clipping_simple
[16:11:25] [PASSED] clipping_rotate_reflect
[16:11:25] [PASSED] positioning_simple
[16:11:25] [PASSED] upscaling
[16:11:25] [PASSED] downscaling
[16:11:25] [PASSED] rounding1
[16:11:25] [PASSED] rounding2
[16:11:25] [PASSED] rounding3
[16:11:25] [PASSED] rounding4
[16:11:25] =========== [PASSED] drm_test_check_plane_state ============
[16:11:25] =========== drm_test_check_invalid_plane_state  ============
[16:11:25] [PASSED] positioning_invalid
[16:11:25] [PASSED] upscaling_invalid
[16:11:25] [PASSED] downscaling_invalid
[16:11:25] ======= [PASSED] drm_test_check_invalid_plane_state ========
[16:11:25] ================ [PASSED] drm_plane_helper =================
[16:11:25] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[16:11:25] ====== drm_test_connector_helper_tv_get_modes_check  =======
[16:11:25] [PASSED] None
[16:11:25] [PASSED] PAL
[16:11:25] [PASSED] NTSC
[16:11:25] [PASSED] Both, NTSC Default
[16:11:25] [PASSED] Both, PAL Default
[16:11:25] [PASSED] Both, NTSC Default, with PAL on command-line
[16:11:25] [PASSED] Both, PAL Default, with NTSC on command-line
[16:11:25] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[16:11:25] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[16:11:25] ================== drm_rect (9 subtests) ===================
[16:11:25] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[16:11:25] [PASSED] drm_test_rect_clip_scaled_not_clipped
[16:11:25] [PASSED] drm_test_rect_clip_scaled_clipped
[16:11:25] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[16:11:25] ================= drm_test_rect_intersect  =================
[16:11:25] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[16:11:25] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[16:11:25] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[16:11:25] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[16:11:25] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[16:11:25] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[16:11:25] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[16:11:25] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[16:11:25] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[16:11:25] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[16:11:25] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[16:11:25] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[16:11:25] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[16:11:25] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[16:11:25] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[16:11:25] ============= [PASSED] drm_test_rect_intersect =============
[16:11:25] ================ drm_test_rect_calc_hscale  ================
[16:11:25] [PASSED] normal use
[16:11:25] [PASSED] out of max range
[16:11:25] [PASSED] out of min range
[16:11:25] [PASSED] zero dst
[16:11:25] [PASSED] negative src
[16:11:25] [PASSED] negative dst
[16:11:25] ============ [PASSED] drm_test_rect_calc_hscale ============
[16:11:25] ================ drm_test_rect_calc_vscale  ================
[16:11:25] [PASSED] normal use
[16:11:25] [PASSED] out of max range
[16:11:25] [PASSED] out of min range
[16:11:25] [PASSED] zero dst
[16:11:25] [PASSED] negative src
[16:11:25] [PASSED] negative dst
[16:11:25] ============ [PASSED] drm_test_rect_calc_vscale ============
[16:11:25] ================== drm_test_rect_rotate  ===================
[16:11:25] [PASSED] reflect-x
[16:11:25] [PASSED] reflect-y
[16:11:25] [PASSED] rotate-0
[16:11:25] [PASSED] rotate-90
[16:11:25] [PASSED] rotate-180
[16:11:25] [PASSED] rotate-270
[16:11:25] ============== [PASSED] drm_test_rect_rotate ===============
[16:11:25] ================ drm_test_rect_rotate_inv  =================
[16:11:25] [PASSED] reflect-x
[16:11:25] [PASSED] reflect-y
[16:11:25] [PASSED] rotate-0
[16:11:25] [PASSED] rotate-90
[16:11:25] [PASSED] rotate-180
[16:11:25] [PASSED] rotate-270
[16:11:25] ============ [PASSED] drm_test_rect_rotate_inv =============
[16:11:25] ==================== [PASSED] drm_rect =====================
[16:11:25] ============ drm_sysfb_modeset_test (1 subtest) ============
[16:11:25] ============ drm_test_sysfb_build_fourcc_list  =============
[16:11:25] [PASSED] no native formats
[16:11:25] [PASSED] XRGB8888 as native format
[16:11:25] [PASSED] remove duplicates
[16:11:25] [PASSED] convert alpha formats
[16:11:25] [PASSED] random formats
[16:11:25] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[16:11:25] ============= [PASSED] drm_sysfb_modeset_test ==============
[16:11:25] ================== drm_fixp (2 subtests) ===================
[16:11:25] [PASSED] drm_test_int2fixp
[16:11:25] [PASSED] drm_test_sm2fixp
[16:11:25] ==================== [PASSED] drm_fixp =====================
[16:11:25] ============================================================
[16:11:25] Testing complete. Ran 621 tests: passed: 621
[16:11:25] Elapsed time: 26.106s total, 1.652s configuring, 24.283s building, 0.136s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[16:11:25] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[16:11:26] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[16:11:36] Starting KUnit Kernel (1/1)...
[16:11:36] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[16:11:36] ================= ttm_device (5 subtests) ==================
[16:11:36] [PASSED] ttm_device_init_basic
[16:11:36] [PASSED] ttm_device_init_multiple
[16:11:36] [PASSED] ttm_device_fini_basic
[16:11:36] [PASSED] ttm_device_init_no_vma_man
[16:11:36] ================== ttm_device_init_pools  ==================
[16:11:36] [PASSED] No DMA allocations, no DMA32 required
[16:11:36] [PASSED] DMA allocations, DMA32 required
[16:11:36] [PASSED] No DMA allocations, DMA32 required
[16:11:36] [PASSED] DMA allocations, no DMA32 required
[16:11:36] ============== [PASSED] ttm_device_init_pools ==============
[16:11:36] =================== [PASSED] ttm_device ====================
[16:11:36] ================== ttm_pool (8 subtests) ===================
[16:11:36] ================== ttm_pool_alloc_basic  ===================
[16:11:36] [PASSED] One page
[16:11:36] [PASSED] More than one page
[16:11:36] [PASSED] Above the allocation limit
[16:11:36] [PASSED] One page, with coherent DMA mappings enabled
[16:11:36] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[16:11:36] ============== [PASSED] ttm_pool_alloc_basic ===============
[16:11:36] ============== ttm_pool_alloc_basic_dma_addr  ==============
[16:11:36] [PASSED] One page
[16:11:36] [PASSED] More than one page
[16:11:36] [PASSED] Above the allocation limit
[16:11:36] [PASSED] One page, with coherent DMA mappings enabled
[16:11:36] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[16:11:36] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[16:11:36] [PASSED] ttm_pool_alloc_order_caching_match
[16:11:36] [PASSED] ttm_pool_alloc_caching_mismatch
[16:11:36] [PASSED] ttm_pool_alloc_order_mismatch
[16:11:36] [PASSED] ttm_pool_free_dma_alloc
[16:11:36] [PASSED] ttm_pool_free_no_dma_alloc
[16:11:36] [PASSED] ttm_pool_fini_basic
[16:11:36] ==================== [PASSED] ttm_pool =====================
[16:11:36] ================ ttm_resource (8 subtests) =================
[16:11:36] ================= ttm_resource_init_basic  =================
[16:11:36] [PASSED] Init resource in TTM_PL_SYSTEM
[16:11:36] [PASSED] Init resource in TTM_PL_VRAM
[16:11:36] [PASSED] Init resource in a private placement
[16:11:36] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[16:11:36] ============= [PASSED] ttm_resource_init_basic =============
[16:11:36] [PASSED] ttm_resource_init_pinned
[16:11:36] [PASSED] ttm_resource_fini_basic
[16:11:36] [PASSED] ttm_resource_manager_init_basic
[16:11:36] [PASSED] ttm_resource_manager_usage_basic
[16:11:36] [PASSED] ttm_resource_manager_set_used_basic
[16:11:36] [PASSED] ttm_sys_man_alloc_basic
[16:11:36] [PASSED] ttm_sys_man_free_basic
[16:11:36] ================== [PASSED] ttm_resource ===================
[16:11:36] =================== ttm_tt (15 subtests) ===================
[16:11:36] ==================== ttm_tt_init_basic  ====================
[16:11:36] [PASSED] Page-aligned size
[16:11:36] [PASSED] Extra pages requested
[16:11:36] ================ [PASSED] ttm_tt_init_basic ================
[16:11:36] [PASSED] ttm_tt_init_misaligned
[16:11:36] [PASSED] ttm_tt_fini_basic
[16:11:36] [PASSED] ttm_tt_fini_sg
[16:11:36] [PASSED] ttm_tt_fini_shmem
[16:11:36] [PASSED] ttm_tt_create_basic
[16:11:36] [PASSED] ttm_tt_create_invalid_bo_type
[16:11:36] [PASSED] ttm_tt_create_ttm_exists
[16:11:36] [PASSED] ttm_tt_create_failed
[16:11:36] [PASSED] ttm_tt_destroy_basic
[16:11:36] [PASSED] ttm_tt_populate_null_ttm
[16:11:36] [PASSED] ttm_tt_populate_populated_ttm
[16:11:36] [PASSED] ttm_tt_unpopulate_basic
[16:11:36] [PASSED] ttm_tt_unpopulate_empty_ttm
[16:11:36] [PASSED] ttm_tt_swapin_basic
[16:11:36] ===================== [PASSED] ttm_tt ======================
[16:11:36] =================== ttm_bo (14 subtests) ===================
[16:11:36] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[16:11:36] [PASSED] Cannot be interrupted and sleeps
[16:11:36] [PASSED] Cannot be interrupted, locks straight away
[16:11:36] [PASSED] Can be interrupted, sleeps
[16:11:36] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[16:11:36] [PASSED] ttm_bo_reserve_locked_no_sleep
[16:11:36] [PASSED] ttm_bo_reserve_no_wait_ticket
[16:11:36] [PASSED] ttm_bo_reserve_double_resv
[16:11:36] [PASSED] ttm_bo_reserve_interrupted
[16:11:36] [PASSED] ttm_bo_reserve_deadlock
[16:11:36] [PASSED] ttm_bo_unreserve_basic
[16:11:36] [PASSED] ttm_bo_unreserve_pinned
[16:11:36] [PASSED] ttm_bo_unreserve_bulk
[16:11:36] [PASSED] ttm_bo_fini_basic
[16:11:36] [PASSED] ttm_bo_fini_shared_resv
[16:11:36] [PASSED] ttm_bo_pin_basic
[16:11:36] [PASSED] ttm_bo_pin_unpin_resource
[16:11:36] [PASSED] ttm_bo_multiple_pin_one_unpin
[16:11:36] ===================== [PASSED] ttm_bo ======================
[16:11:36] ============== ttm_bo_validate (21 subtests) ===============
[16:11:36] ============== ttm_bo_init_reserved_sys_man  ===============
[16:11:36] [PASSED] Buffer object for userspace
[16:11:36] [PASSED] Kernel buffer object
[16:11:36] [PASSED] Shared buffer object
[16:11:36] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[16:11:36] ============== ttm_bo_init_reserved_mock_man  ==============
[16:11:36] [PASSED] Buffer object for userspace
[16:11:36] [PASSED] Kernel buffer object
[16:11:36] [PASSED] Shared buffer object
[16:11:36] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[16:11:36] [PASSED] ttm_bo_init_reserved_resv
[16:11:36] ================== ttm_bo_validate_basic  ==================
[16:11:36] [PASSED] Buffer object for userspace
[16:11:36] [PASSED] Kernel buffer object
[16:11:36] [PASSED] Shared buffer object
[16:11:36] ============== [PASSED] ttm_bo_validate_basic ==============
[16:11:36] [PASSED] ttm_bo_validate_invalid_placement
[16:11:36] ============= ttm_bo_validate_same_placement  ==============
[16:11:36] [PASSED] System manager
[16:11:36] [PASSED] VRAM manager
[16:11:36] ========= [PASSED] ttm_bo_validate_same_placement ==========
[16:11:36] [PASSED] ttm_bo_validate_failed_alloc
[16:11:36] [PASSED] ttm_bo_validate_pinned
[16:11:36] [PASSED] ttm_bo_validate_busy_placement
[16:11:36] ================ ttm_bo_validate_multihop  =================
[16:11:36] [PASSED] Buffer object for userspace
[16:11:36] [PASSED] Kernel buffer object
[16:11:36] [PASSED] Shared buffer object
[16:11:36] ============ [PASSED] ttm_bo_validate_multihop =============
[16:11:36] ========== ttm_bo_validate_no_placement_signaled  ==========
[16:11:36] [PASSED] Buffer object in system domain, no page vector
[16:11:36] [PASSED] Buffer object in system domain with an existing page vector
[16:11:36] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[16:11:36] ======== ttm_bo_validate_no_placement_not_signaled  ========
[16:11:36] [PASSED] Buffer object for userspace
[16:11:36] [PASSED] Kernel buffer object
[16:11:36] [PASSED] Shared buffer object
[16:11:36] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[16:11:36] [PASSED] ttm_bo_validate_move_fence_signaled
[16:11:36] ========= ttm_bo_validate_move_fence_not_signaled  =========
[16:11:36] [PASSED] Waits for GPU
[16:11:36] [PASSED] Tries to lock straight away
[16:11:36] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[16:11:36] [PASSED] ttm_bo_validate_happy_evict
[16:11:36] [PASSED] ttm_bo_validate_all_pinned_evict
[16:11:36] [PASSED] ttm_bo_validate_allowed_only_evict
[16:11:36] [PASSED] ttm_bo_validate_deleted_evict
[16:11:36] [PASSED] ttm_bo_validate_busy_domain_evict
[16:11:36] [PASSED] ttm_bo_validate_evict_gutting
[16:11:36] [PASSED] ttm_bo_validate_recrusive_evict
[16:11:36] ================= [PASSED] ttm_bo_validate =================
[16:11:36] ============================================================
[16:11:36] Testing complete. Ran 101 tests: passed: 101
[16:11:36] Elapsed time: 11.136s total, 1.612s configuring, 9.307s building, 0.185s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel




* ✓ Xe.CI.BAT: success for Introduce Xe Uncorrectable Error Handling (rev2)
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (12 preceding siblings ...)
  2026-03-02 16:11 ` ✓ CI.KUnit: success " Patchwork
@ 2026-03-02 16:48 ` Patchwork
  2026-03-02 18:29 ` ✗ Xe.CI.FULL: failure " Patchwork
  14 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2026-03-02 16:48 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling (rev2)
URL   : https://patchwork.freedesktop.org/series/160482/
State : success

== Summary ==

CI Bug Log - changes from xe-4638-200559e195414731c83ff6da6b34209dbef51227_BAT -> xe-pw-160482v2_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (14 -> 14)
------------------------------

  No changes in participating hosts


Changes
-------

  No changes found


Build changes
-------------

  * Linux: xe-4638-200559e195414731c83ff6da6b34209dbef51227 -> xe-pw-160482v2

  IGT_8776: 8776
  xe-4638-200559e195414731c83ff6da6b34209dbef51227: 200559e195414731c83ff6da6b34209dbef51227
  xe-pw-160482v2: 160482v2

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/index.html

[-- Attachment #2: Type: text/html, Size: 1419 bytes --]


* ✗ Xe.CI.FULL: failure for Introduce Xe Uncorrectable Error Handling (rev2)
  2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
                   ` (13 preceding siblings ...)
  2026-03-02 16:48 ` ✓ Xe.CI.BAT: " Patchwork
@ 2026-03-02 18:29 ` Patchwork
  14 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2026-03-02 18:29 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 32968 bytes --]

== Series Details ==

Series: Introduce Xe Uncorrectable Error Handling (rev2)
URL   : https://patchwork.freedesktop.org/series/160482/
State : failure

== Summary ==

CI Bug Log - changes from xe-4638-200559e195414731c83ff6da6b34209dbef51227_FULL -> xe-pw-160482v2_FULL
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-160482v2_FULL absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-160482v2_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (2 -> 2)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-160482v2_FULL:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_cursor_legacy@flip-vs-cursor-legacy:
    - shard-bmg:          NOTRUN -> [FAIL][1]
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html

  * igt@xe_fault_injection@vm-bind-fail-xe_vma_ops_alloc:
    - shard-bmg:          [PASS][2] -> [ABORT][3]
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-bmg-6/igt@xe_fault_injection@vm-bind-fail-xe_vma_ops_alloc.html
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-3/igt@xe_fault_injection@vm-bind-fail-xe_vma_ops_alloc.html

  
Known issues
------------

  Here are the changes found in xe-pw-160482v2_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_async_flips@async-flip-with-page-flip-events-linear:
    - shard-lnl:          [PASS][4] -> [FAIL][5] ([Intel XE#5993]) +3 other tests fail
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-lnl-4/igt@kms_async_flips@async-flip-with-page-flip-events-linear.html
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-2/igt@kms_async_flips@async-flip-with-page-flip-events-linear.html

  * igt@kms_big_fb@linear-32bpp-rotate-90:
    - shard-lnl:          NOTRUN -> [SKIP][6] ([Intel XE#1407]) +1 other test skip
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_big_fb@linear-32bpp-rotate-90.html

  * igt@kms_big_fb@x-tiled-32bpp-rotate-90:
    - shard-bmg:          NOTRUN -> [SKIP][7] ([Intel XE#2327])
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_big_fb@x-tiled-32bpp-rotate-90.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180:
    - shard-bmg:          NOTRUN -> [SKIP][8] ([Intel XE#1124]) +1 other test skip
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip:
    - shard-lnl:          NOTRUN -> [SKIP][9] ([Intel XE#1124]) +2 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip.html

  * igt@kms_bw@linear-tiling-1-displays-1920x1080p:
    - shard-bmg:          [PASS][10] -> [SKIP][11] ([Intel XE#367]) +1 other test skip
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-bmg-5/igt@kms_bw@linear-tiling-1-displays-1920x1080p.html
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-1/igt@kms_bw@linear-tiling-1-displays-1920x1080p.html

  * igt@kms_bw@linear-tiling-2-displays-3840x2160p:
    - shard-lnl:          NOTRUN -> [SKIP][12] ([Intel XE#367])
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_bw@linear-tiling-2-displays-3840x2160p.html

  * igt@kms_ccs@bad-pixel-format-4-tiled-mtl-rc-ccs-cc:
    - shard-lnl:          NOTRUN -> [SKIP][13] ([Intel XE#2887]) +3 other tests skip
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_ccs@bad-pixel-format-4-tiled-mtl-rc-ccs-cc.html

  * igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-c-dp-2:
    - shard-bmg:          NOTRUN -> [SKIP][14] ([Intel XE#2652]) +12 other tests skip
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-1/igt@kms_ccs@crc-primary-rotation-180-4-tiled-lnl-ccs@pipe-c-dp-2.html

  * igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs:
    - shard-lnl:          NOTRUN -> [SKIP][15] ([Intel XE#3432])
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_ccs@crc-primary-suspend-4-tiled-dg2-mc-ccs.html

  * igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs:
    - shard-bmg:          NOTRUN -> [SKIP][16] ([Intel XE#2887]) +1 other test skip
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs.html

  * igt@kms_chamelium_audio@dp-audio:
    - shard-bmg:          NOTRUN -> [SKIP][17] ([Intel XE#2252]) +2 other tests skip
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_chamelium_audio@dp-audio.html

  * igt@kms_chamelium_color@ctm-limited-range:
    - shard-bmg:          NOTRUN -> [SKIP][18] ([Intel XE#2325])
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_chamelium_color@ctm-limited-range.html

  * igt@kms_chamelium_color@gamma:
    - shard-lnl:          NOTRUN -> [SKIP][19] ([Intel XE#306])
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_chamelium_color@gamma.html

  * igt@kms_chamelium_frames@vga-frame-dump:
    - shard-lnl:          NOTRUN -> [SKIP][20] ([Intel XE#373]) +1 other test skip
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_chamelium_frames@vga-frame-dump.html

  * igt@kms_color_pipeline@plane-lut1d-post-ctm3x4@pipe-a-plane-0:
    - shard-lnl:          NOTRUN -> [FAIL][21] ([Intel XE#7305]) +9 other tests fail
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_color_pipeline@plane-lut1d-post-ctm3x4@pipe-a-plane-0.html

  * igt@kms_content_protection@atomic@pipe-a-dp-2:
    - shard-bmg:          NOTRUN -> [FAIL][22] ([Intel XE#1178] / [Intel XE#3304])
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-7/igt@kms_content_protection@atomic@pipe-a-dp-2.html

  * igt@kms_content_protection@dp-mst-type-0-suspend-resume:
    - shard-lnl:          NOTRUN -> [SKIP][23] ([Intel XE#6974])
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_content_protection@dp-mst-type-0-suspend-resume.html

  * igt@kms_content_protection@lic-type-0@pipe-a-dp-1:
    - shard-bmg:          NOTRUN -> [FAIL][24] ([Intel XE#3304]) +1 other test fail
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-5/igt@kms_content_protection@lic-type-0@pipe-a-dp-1.html

  * igt@kms_content_protection@uevent-hdcp14@pipe-a-dp-2:
    - shard-bmg:          NOTRUN -> [FAIL][25] ([Intel XE#6707])
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-1/igt@kms_content_protection@uevent-hdcp14@pipe-a-dp-2.html

  * igt@kms_cursor_crc@cursor-offscreen-512x170:
    - shard-lnl:          NOTRUN -> [SKIP][26] ([Intel XE#2321])
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_cursor_crc@cursor-offscreen-512x170.html

  * igt@kms_cursor_crc@cursor-random-64x21:
    - shard-bmg:          NOTRUN -> [SKIP][27] ([Intel XE#2320])
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_cursor_crc@cursor-random-64x21.html

  * igt@kms_cursor_crc@cursor-rapid-movement-32x10:
    - shard-lnl:          NOTRUN -> [SKIP][28] ([Intel XE#1424])
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_cursor_crc@cursor-rapid-movement-32x10.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size:
    - shard-lnl:          NOTRUN -> [SKIP][29] ([Intel XE#309]) +1 other test skip
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@flip-vs-cursor-crc-legacy:
    - shard-bmg:          NOTRUN -> [ABORT][30] ([Intel XE#5545] / [Intel XE#6652])
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_cursor_legacy@flip-vs-cursor-crc-legacy.html

  * igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size:
    - shard-lnl:          NOTRUN -> [SKIP][31] ([Intel XE#323])
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size.html

  * igt@kms_dsc@dsc-fractional-bpp-with-bpc:
    - shard-lnl:          NOTRUN -> [SKIP][32] ([Intel XE#2244])
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_dsc@dsc-fractional-bpp-with-bpc.html

  * igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible:
    - shard-lnl:          NOTRUN -> [SKIP][33] ([Intel XE#1421])
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_flip@2x-flip-vs-dpms-on-nop-interruptible.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-bmg:          [PASS][34] -> [INCOMPLETE][35] ([Intel XE#2049] / [Intel XE#2597]) +1 other test incomplete
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-bmg-8/igt@kms_flip@flip-vs-suspend-interruptible.html
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-6/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling:
    - shard-lnl:          NOTRUN -> [SKIP][36] ([Intel XE#7178])
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling.html

  * igt@kms_flip_scaled_crc@flip-p016-linear-to-p016-linear-reflect-x:
    - shard-lnl:          NOTRUN -> [SKIP][37] ([Intel XE#7179])
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_flip_scaled_crc@flip-p016-linear-to-p016-linear-reflect-x.html

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-cur-indfb-draw-render:
    - shard-bmg:          NOTRUN -> [SKIP][38] ([Intel XE#2311]) +5 other tests skip
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-cur-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscreen-pri-indfb-draw-render:
    - shard-bmg:          NOTRUN -> [SKIP][39] ([Intel XE#4141]) +2 other tests skip
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_frontbuffer_tracking@fbc-1p-offscreen-pri-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-tiling-y:
    - shard-bmg:          NOTRUN -> [SKIP][40] ([Intel XE#2352])
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_frontbuffer_tracking@fbc-tiling-y.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-modesetfrombusy:
    - shard-lnl:          NOTRUN -> [SKIP][41] ([Intel XE#6312] / [Intel XE#651]) +3 other tests skip
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_frontbuffer_tracking@fbcdrrs-modesetfrombusy.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-indfb-pgflip-blt:
    - shard-lnl:          NOTRUN -> [SKIP][42] ([Intel XE#656]) +9 other tests skip
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-abgr161616f-draw-blt:
    - shard-lnl:          NOTRUN -> [SKIP][43] ([Intel XE#7061]) +1 other test skip
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_frontbuffer_tracking@fbcpsr-abgr161616f-draw-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-cur-indfb-onoff:
    - shard-bmg:          NOTRUN -> [SKIP][44] ([Intel XE#2313]) +4 other tests skip
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-cur-indfb-onoff.html

  * igt@kms_pipe_stress@stress-xrgb8888-ytiled:
    - shard-lnl:          NOTRUN -> [SKIP][45] ([Intel XE#4329] / [Intel XE#6912])
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_pipe_stress@stress-xrgb8888-ytiled.html

  * igt@kms_plane@pixel-format-4-tiled-dg2-rc-ccs-modifier:
    - shard-lnl:          NOTRUN -> [SKIP][46] ([Intel XE#7283])
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_plane@pixel-format-4-tiled-dg2-rc-ccs-modifier.html

  * igt@kms_plane@pixel-format-y-tiled-gen12-mc-ccs-modifier:
    - shard-bmg:          NOTRUN -> [SKIP][47] ([Intel XE#7283])
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_plane@pixel-format-y-tiled-gen12-mc-ccs-modifier.html

  * igt@kms_plane_multiple@2x-tiling-4:
    - shard-lnl:          NOTRUN -> [SKIP][48] ([Intel XE#4596])
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_plane_multiple@2x-tiling-4.html

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-b:
    - shard-bmg:          NOTRUN -> [SKIP][49] ([Intel XE#2763] / [Intel XE#6886]) +4 other tests skip
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-b.html

  * igt@kms_pm_backlight@bad-brightness:
    - shard-bmg:          NOTRUN -> [SKIP][50] ([Intel XE#870])
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_pm_backlight@bad-brightness.html

  * igt@kms_pm_dc@dc5-retention-flops:
    - shard-lnl:          NOTRUN -> [SKIP][51] ([Intel XE#3309])
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_pm_dc@dc5-retention-flops.html

  * igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf:
    - shard-lnl:          NOTRUN -> [SKIP][52] ([Intel XE#2893] / [Intel XE#4608])
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf@pipe-a-edp-1:
    - shard-lnl:          NOTRUN -> [SKIP][53] ([Intel XE#4608]) +1 other test skip
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_psr2_sf@fbc-psr2-overlay-plane-move-continuous-exceed-fully-sf@pipe-a-edp-1.html

  * igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-sf:
    - shard-bmg:          NOTRUN -> [SKIP][54] ([Intel XE#1489])
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-sf.html

  * igt@kms_psr2_su@page_flip-nv12:
    - shard-lnl:          NOTRUN -> [SKIP][55] ([Intel XE#1128])
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_psr2_su@page_flip-nv12.html

  * igt@kms_psr@pr-basic:
    - shard-lnl:          NOTRUN -> [SKIP][56] ([Intel XE#1406])
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_psr@pr-basic.html

  * igt@kms_psr@pr-sprite-plane-onoff:
    - shard-bmg:          NOTRUN -> [SKIP][57] ([Intel XE#2234] / [Intel XE#2850]) +1 other test skip
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_psr@pr-sprite-plane-onoff.html

  * igt@kms_rotation_crc@multiplane-rotation:
    - shard-bmg:          NOTRUN -> [FAIL][58] ([Intel XE#6946])
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_rotation_crc@multiplane-rotation.html

  * igt@kms_rotation_crc@primary-y-tiled-reflect-x-0:
    - shard-lnl:          NOTRUN -> [SKIP][59] ([Intel XE#1127])
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_rotation_crc@primary-y-tiled-reflect-x-0.html

  * igt@kms_rotation_crc@primary-yf-tiled-reflect-x-180:
    - shard-bmg:          NOTRUN -> [SKIP][60] ([Intel XE#2330])
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_rotation_crc@primary-yf-tiled-reflect-x-180.html

  * igt@kms_rotation_crc@sprite-rotation-270:
    - shard-lnl:          NOTRUN -> [SKIP][61] ([Intel XE#3414] / [Intel XE#3904])
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_rotation_crc@sprite-rotation-270.html

  * igt@kms_sharpness_filter@filter-dpms:
    - shard-bmg:          NOTRUN -> [SKIP][62] ([Intel XE#6503])
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_sharpness_filter@filter-dpms.html

  * igt@kms_vrr@max-min:
    - shard-bmg:          NOTRUN -> [SKIP][63] ([Intel XE#1499])
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@kms_vrr@max-min.html

  * igt@kms_vrr@negative-basic:
    - shard-lnl:          NOTRUN -> [SKIP][64] ([Intel XE#1499])
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@kms_vrr@negative-basic.html

  * igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1:
    - shard-lnl:          [PASS][65] -> [FAIL][66] ([Intel XE#2142]) +1 other test fail
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-lnl-8/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-1/igt@kms_vrr@seamless-rr-switch-virtual@pipe-a-edp-1.html

  * igt@xe_compute_preempt@compute-preempt-many-vram:
    - shard-lnl:          NOTRUN -> [SKIP][67] ([Intel XE#5191] / [Intel XE#7316])
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_compute_preempt@compute-preempt-many-vram.html

  * igt@xe_eudebug@discovery-race-sigint:
    - shard-lnl:          NOTRUN -> [SKIP][68] ([Intel XE#4837]) +1 other test skip
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_eudebug@discovery-race-sigint.html

  * igt@xe_eudebug_online@breakpoint-many-sessions-tiles:
    - shard-bmg:          NOTRUN -> [SKIP][69] ([Intel XE#4837] / [Intel XE#6665])
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@xe_eudebug_online@breakpoint-many-sessions-tiles.html

  * igt@xe_eudebug_online@writes-caching-vram-bb-vram-target-vram:
    - shard-lnl:          NOTRUN -> [SKIP][70] ([Intel XE#4837] / [Intel XE#6665])
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_eudebug_online@writes-caching-vram-bb-vram-target-vram.html

  * igt@xe_evict@evict-beng-small-multi-vm:
    - shard-lnl:          NOTRUN -> [SKIP][71] ([Intel XE#6540] / [Intel XE#688]) +2 other tests skip
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_evict@evict-beng-small-multi-vm.html

  * igt@xe_exec_balancer@once-parallel-userptr-invalidate:
    - shard-lnl:          NOTRUN -> [SKIP][72] ([Intel XE#7482]) +5 other tests skip
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_exec_balancer@once-parallel-userptr-invalidate.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-rebind:
    - shard-bmg:          NOTRUN -> [SKIP][73] ([Intel XE#2322]) +2 other tests skip
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-rebind.html

  * igt@xe_exec_basic@multigpu-once-rebind:
    - shard-lnl:          NOTRUN -> [SKIP][74] ([Intel XE#1392]) +1 other test skip
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_exec_basic@multigpu-once-rebind.html

  * igt@xe_exec_fault_mode@many-multi-queue-userptr-rebind-prefetch:
    - shard-bmg:          NOTRUN -> [SKIP][75] ([Intel XE#7136]) +2 other tests skip
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@xe_exec_fault_mode@many-multi-queue-userptr-rebind-prefetch.html

  * igt@xe_exec_fault_mode@once-multi-queue-userptr-rebind-imm:
    - shard-lnl:          NOTRUN -> [SKIP][76] ([Intel XE#7136]) +2 other tests skip
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_exec_fault_mode@once-multi-queue-userptr-rebind-imm.html

  * igt@xe_exec_multi_queue@few-execs-preempt-mode-fault-basic:
    - shard-bmg:          NOTRUN -> [SKIP][77] ([Intel XE#6874]) +4 other tests skip
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@xe_exec_multi_queue@few-execs-preempt-mode-fault-basic.html

  * igt@xe_exec_multi_queue@many-execs-preempt-mode-fault-close-fd-smem:
    - shard-lnl:          NOTRUN -> [SKIP][78] ([Intel XE#6874]) +6 other tests skip
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_exec_multi_queue@many-execs-preempt-mode-fault-close-fd-smem.html

  * igt@xe_exec_threads@threads-multi-queue-cm-fd-userptr:
    - shard-bmg:          NOTRUN -> [SKIP][79] ([Intel XE#7138]) +2 other tests skip
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@xe_exec_threads@threads-multi-queue-cm-fd-userptr.html

  * igt@xe_exec_threads@threads-multi-queue-fd-userptr:
    - shard-lnl:          NOTRUN -> [SKIP][80] ([Intel XE#7138])
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_exec_threads@threads-multi-queue-fd-userptr.html

  * igt@xe_multigpu_svm@mgpu-latency-basic:
    - shard-lnl:          NOTRUN -> [SKIP][81] ([Intel XE#6964])
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_multigpu_svm@mgpu-latency-basic.html

  * igt@xe_pm@s3-exec-after:
    - shard-lnl:          NOTRUN -> [SKIP][82] ([Intel XE#584])
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_pm@s3-exec-after.html

  * igt@xe_pm_residency@aspm_link_residency:
    - shard-bmg:          [PASS][83] -> [SKIP][84] ([Intel XE#7258])
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-bmg-7/igt@xe_pm_residency@aspm_link_residency.html
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-6/igt@xe_pm_residency@aspm_link_residency.html

  * igt@xe_pmu@fn-engine-activity-load:
    - shard-lnl:          NOTRUN -> [SKIP][85] ([Intel XE#4650])
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_pmu@fn-engine-activity-load.html

  * igt@xe_pxp@pxp-stale-bo-bind-post-suspend:
    - shard-bmg:          NOTRUN -> [SKIP][86] ([Intel XE#4733])
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-2/igt@xe_pxp@pxp-stale-bo-bind-post-suspend.html

  * igt@xe_query@multigpu-query-mem-usage:
    - shard-lnl:          NOTRUN -> [SKIP][87] ([Intel XE#944])
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_query@multigpu-query-mem-usage.html

  * igt@xe_sriov_auto_provisioning@selfconfig-reprovision-reduce-numvfs:
    - shard-lnl:          NOTRUN -> [SKIP][88] ([Intel XE#4130])
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-7/igt@xe_sriov_auto_provisioning@selfconfig-reprovision-reduce-numvfs.html

  
#### Possible fixes ####

  * igt@kms_pm_dc@deep-pkgc:
    - shard-lnl:          [FAIL][89] ([Intel XE#2029] / [Intel XE#7314]) -> [PASS][90]
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-lnl-5/igt@kms_pm_dc@deep-pkgc.html
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-3/igt@kms_pm_dc@deep-pkgc.html

  * igt@kms_vrr@cmrr@pipe-a-edp-1:
    - shard-lnl:          [FAIL][91] ([Intel XE#4459]) -> [PASS][92] +1 other test pass
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-lnl-4/igt@kms_vrr@cmrr@pipe-a-edp-1.html
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-lnl-6/igt@kms_vrr@cmrr@pipe-a-edp-1.html

  
#### Warnings ####

  * igt@kms_tiled_display@basic-test-pattern:
    - shard-bmg:          [FAIL][93] ([Intel XE#1729]) -> [SKIP][94] ([Intel XE#2426])
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-bmg-6/igt@kms_tiled_display@basic-test-pattern.html
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-8/igt@kms_tiled_display@basic-test-pattern.html

  * igt@kms_tiled_display@basic-test-pattern-with-chamelium:
    - shard-bmg:          [SKIP][95] ([Intel XE#2426]) -> [SKIP][96] ([Intel XE#2509])
   [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-bmg-8/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-4/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html

  * igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv:
    - shard-bmg:          [ABORT][97] ([Intel XE#5466]) -> [ABORT][98] ([Intel XE#5466] / [Intel XE#6652])
   [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4638-200559e195414731c83ff6da6b34209dbef51227/shard-bmg-2/igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv.html
   [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/shard-bmg-8/igt@xe_fault_injection@probe-fail-guc-xe_guc_ct_send_recv.html

  
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1127]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1127
  [Intel XE#1128]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1128
  [Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
  [Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1407]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1407
  [Intel XE#1421]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1421
  [Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
  [Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
  [Intel XE#2029]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2029
  [Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
  [Intel XE#2142]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2142
  [Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
  [Intel XE#2244]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2244
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
  [Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
  [Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
  [Intel XE#2325]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2325
  [Intel XE#2327]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2327
  [Intel XE#2330]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2330
  [Intel XE#2352]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2352
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
  [Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
  [Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
  [Intel XE#2763]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2763
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
  [Intel XE#2893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2893
  [Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
  [Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
  [Intel XE#323]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/323
  [Intel XE#3304]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3304
  [Intel XE#3309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3309
  [Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
  [Intel XE#3432]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3432
  [Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
  [Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
  [Intel XE#3904]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3904
  [Intel XE#4130]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4130
  [Intel XE#4141]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4141
  [Intel XE#4329]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4329
  [Intel XE#4459]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4459
  [Intel XE#4596]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4596
  [Intel XE#4608]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4608
  [Intel XE#4650]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4650
  [Intel XE#4733]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4733
  [Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
  [Intel XE#5191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5191
  [Intel XE#5466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5466
  [Intel XE#5545]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5545
  [Intel XE#584]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/584
  [Intel XE#5993]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5993
  [Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
  [Intel XE#6503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6503
  [Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
  [Intel XE#6540]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6540
  [Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
  [Intel XE#6652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6652
  [Intel XE#6665]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6665
  [Intel XE#6707]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6707
  [Intel XE#6874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6874
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
  [Intel XE#6886]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6886
  [Intel XE#6912]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6912
  [Intel XE#6946]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6946
  [Intel XE#6964]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6964
  [Intel XE#6974]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6974
  [Intel XE#7061]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7061
  [Intel XE#7136]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7136
  [Intel XE#7138]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7138
  [Intel XE#7178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7178
  [Intel XE#7179]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7179
  [Intel XE#7258]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7258
  [Intel XE#7283]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7283
  [Intel XE#7305]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7305
  [Intel XE#7314]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7314
  [Intel XE#7316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7316
  [Intel XE#7482]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/7482
  [Intel XE#870]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/870
  [Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944


Build changes
-------------

  * Linux: xe-4638-200559e195414731c83ff6da6b34209dbef51227 -> xe-pw-160482v2

  IGT_8776: 8776
  xe-4638-200559e195414731c83ff6da6b34209dbef51227: 200559e195414731c83ff6da6b34209dbef51227
  xe-pw-160482v2: 160482v2

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-160482v2/index.html

Thread overview: 43+ messages
2026-03-02 10:21 [PATCH v2 00/11] Introduce Xe Uncorrectable Error Handling Riana Tauro
2026-03-02 10:21 ` [PATCH v2 01/11] drm/xe/xe_sysctrl: Add System controller patch Riana Tauro
2026-03-02 10:21 ` [PATCH v2 02/11] drm/xe/xe_survivability: Decouple survivability info from boot survivability Riana Tauro
2026-03-02 17:00   ` Raag Jadav
2026-03-03  8:18     ` Mallesh, Koujalagi
2026-03-30 12:56       ` Tauro, Riana
2026-03-30 13:00     ` Tauro, Riana
2026-03-02 10:21 ` [PATCH v2 03/11] drm/xe/xe_pci_error: Implement PCI error recovery callbacks Riana Tauro
2026-03-02 17:37   ` Raag Jadav
2026-03-03  5:09     ` Riana Tauro
2026-03-04 10:38   ` Mallesh, Koujalagi
2026-03-31  5:18     ` Tauro, Riana
2026-03-02 10:21 ` [PATCH v2 04/11] drm/xe/xe_pci_error: Group all devres to release them on PCIe slot reset Riana Tauro
2026-03-02 10:22 ` [PATCH v2 05/11] drm/xe: Skip device access during PCI error recovery Riana Tauro
2026-03-04 10:59   ` Mallesh, Koujalagi
2026-03-02 10:22 ` [PATCH v2 06/11] drm/xe/xe_ras: Initialize Uncorrectable AER Registers Riana Tauro
2026-03-02 10:22 ` [PATCH v2 07/11] drm/xe/xe_ras: Add structures and commands for Uncorrectable Core Compute Errors Riana Tauro
2026-03-04 16:32   ` Raag Jadav
2026-03-31 16:14     ` Tauro, Riana
2026-04-01  6:25       ` Raag Jadav
2026-04-01  6:39         ` Tauro, Riana
2026-03-02 10:22 ` [PATCH v2 08/11] drm/xe/xe_ras: Add support for Uncorrectable Core-Compute errors Riana Tauro
2026-03-04 16:52   ` Raag Jadav
2026-03-06 18:37     ` Raag Jadav
2026-03-31 16:24     ` Tauro, Riana
2026-04-01  6:34       ` Raag Jadav
2026-04-01  6:47         ` Tauro, Riana
2026-03-06  3:50   ` [v2,08/11] " Purkait, Soham
2026-03-31 16:16     ` Tauro, Riana
2026-03-02 10:22 ` [PATCH v2 09/11] drm/xe/xe_ras: Add structures for SoC Internal errors Riana Tauro
2026-03-10 13:02   ` Mallesh, Koujalagi
2026-03-11 14:51     ` Riana Tauro
2026-03-02 10:22 ` [PATCH v2 10/11] drm/xe/xe_ras: Handle Uncorrectable " Riana Tauro
2026-03-10 13:29   ` Mallesh, Koujalagi
2026-03-11 14:55     ` Riana Tauro
2026-03-02 10:22 ` [PATCH v2 11/11] drm/xe/xe_pci_error: Process errors in mmio_enabled Riana Tauro
2026-03-11  7:10   ` Mallesh, Koujalagi
2026-03-11 14:39     ` Riana Tauro
2026-03-12  8:08       ` Mallesh, Koujalagi
2026-03-02 16:10 ` ✗ CI.checkpatch: warning for Introduce Xe Uncorrectable Error Handling (rev2) Patchwork
2026-03-02 16:11 ` ✓ CI.KUnit: success " Patchwork
2026-03-02 16:48 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-02 18:29 ` ✗ Xe.CI.FULL: failure " Patchwork
